Split strings into pieces based on regular expression

DESCRIPTION:

Split character strings into pieces, using a regular expression that matches either the parts to split by or the parts to keep.

USAGE:

strsplit(x, split, extended=T, fixed=F, ignore.case=F, perl=F, keep=F, subpattern=0)

REQUIRED ARGUMENTS:

x
A vector of character strings to split.
split
A regular expression to split by (see SEE ALSO for the pattern syntax). By default, the parts of each input string that match split are cut out and the remaining pieces are returned. If keep=TRUE, the matching parts themselves are returned instead.
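
The two modes of split can be sketched in base R, which shares the S syntax but whose strsplit() has no keep argument. The helper split_modes() below is hypothetical, not part of S-PLUS or R: it finds every match with gregexpr() and then uses regmatches() to return either the text between the matches (keep = FALSE) or the matches themselves (keep = TRUE). It is only an illustration of the idea; edge cases such as matches at the very start of a string may not behave exactly like S-PLUS strsplit.

```r
# Hypothetical helper, not part of S-PLUS or R: a sketch of the two modes
# of S-PLUS strsplit() built from base R's gregexpr() and regmatches().
split_modes <- function(x, split, keep = FALSE) {
  # One vector of match positions per input string
  m <- gregexpr(split, x)
  # invert = TRUE returns the text between matches (split mode);
  # invert = FALSE returns the matched text itself (keep mode)
  regmatches(x, m, invert = !keep)
}

# Split mode: pieces are "Hello", then "Two" and "words"
split_modes(c("Hello", "Two words"), "[[:space:]]")
# Keep mode: the matches "20", "5", "100", "4" are returned
split_modes("20 at $5 or 100 at $4", "[0-9]+", keep = TRUE)
```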

OPTIONAL ARGUMENTS:

extended
If TRUE, split is interpreted as an "extended" regular expression. Otherwise, it is interpreted as an "obsolete" (or "basic") regular expression.
fixed
If TRUE, split is treated as a literal string to match rather than as a regular expression; no characters are special.
ignore.case
If TRUE, upper- and lower-case letters are considered identical when matching split against x.
perl
If TRUE, split is interpreted as a Perl regular expression. (This is not yet supported.)
keep
If TRUE, the parts of the input strings that match split are returned. Otherwise, the matching parts are treated as separators and omitted.
subpattern
If split contains parenthesized subpatterns, subpattern=n means that the nth matched subpattern, rather than the whole match, is used: as the separator by default, or as the part returned when keep=TRUE. Subpatterns are numbered by counting their left parentheses from the left. The default, subpattern=0, uses the match for the entire pattern.
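
The combination of keep=TRUE with subpattern and ignore.case can be sketched in the same way, again in base R as a stand-in for S-PLUS. The helper keep_subpattern() is hypothetical: it finds every whole match with gregexpr(), then re-matches each one with regexec() to recover its parenthesized groups, where element 1 is the whole match and element n + 1 is subpattern n. It covers keep mode only, not splitting by a subpattern.

```r
# Hypothetical helper, not part of S-PLUS or R: return subpattern n of
# every match of `pattern` in each string, mimicking keep=TRUE, subpattern=n.
keep_subpattern <- function(x, pattern, n = 0, ignore.case = FALSE) {
  # Whole matches, one character vector per input string
  matches <- regmatches(x, gregexpr(pattern, x, ignore.case = ignore.case))
  lapply(matches, function(m) {
    if (length(m) == 0) return(character(0))
    # Re-match each whole match to recover its parenthesized groups;
    # element 1 is the full match, element n + 1 is subpattern n
    groups <- regmatches(m, regexec(pattern, m, ignore.case = ignore.case))
    vapply(groups, function(g) g[n + 1], character(1), USE.NAMES = FALSE)
  })
}

# Same idea as the last example below: the digits after each "$"
keep_subpattern("20 at $5 at or 100 at $4", "\\$([0-9]+)", n = 1)
# Case-insensitive matching: "USD" and "usd" both match
keep_subpattern("USD 5 or usd 7", "usd ([0-9]+)", n = 1, ignore.case = TRUE)
```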

VALUE:

A list with the same length as x. Each element is a character vector containing the pieces of the corresponding element of x.
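
The shape of this value can be illustrated with base R's strsplit(), used here as an assumed stand-in for S-PLUS: R's version lacks keep, subpattern, extended and ignore.case, but it likewise returns a list holding one character vector of pieces per element of x. The sketch shows how such a list is typically inspected and flattened.

```r
# Base R's strsplit() returns the same shape of value: a list with one
# character vector of pieces per element of x.
pieces <- strsplit(c("Hello", "Two words", "a b c"), "[[:space:]]")
length(pieces)    # 3: one element per input string
lengths(pieces)   # 1 2 3: number of pieces in each element
pieces[[2]]       # "Two" "words": the pieces of the second string
unlist(pieces)    # all pieces flattened into one character vector
```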

SEE ALSO:

For details of the syntax of the various kinds of regular expression patterns, see .

EXAMPLES:

# Split on whitespace
strsplit(c("Hello", "Two words"), "[[:space:]]")
# [[1]]:
# [1] "Hello"
#  
# [[2]]:
# [1] "Two"   "words"

# Find all numbers in a string
number.pattern <-
   "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
strsplit("1.2, 0, .1, -.1e2, 2.3e4, and 2e-5", number.pattern, keep=T)
# [[1]]:
# [1] "1.2"   "0"     ".1"    "-.1e2" "2.3e4" "2e-5" 

# Split into single characters
strsplit("Hello", "")
# [[1]]:
# [1] "H" "e" "l" "l" "o"
strsplit("Hello", ".", keep=T) # gives same result

# Keep only subpattern 1 (the digits after each "$")
strsplit("20 at $5 at or 100 at $4", "\\$([0-9]+)", keep=TRUE, subpattern=1)
# [[1]]:
# [1] "5" "4"