Search for Pattern in Text

DESCRIPTION:

Searches a vector of character strings (or a bdCharacter) for elements that match a pattern given as a regular expression.

USAGE:

grep(pattern, text, value=FALSE,
     extended = TRUE, perl = FALSE, fixed = FALSE,
     ignore.case = FALSE, subpattern = 0)

REQUIRED ARGUMENTS:

pattern
character string specifying the pattern to search for. The interpretation of pattern is controlled by the values of the extended, perl, fixed, ignore.case, and subpattern arguments. See help(regexpr) for details.
text
vector of character strings or a bdCharacter.

OPTIONAL ARGUMENTS:

value
A logical scalar. If FALSE then return the indices of the matched elements of text. If TRUE return the matched elements themselves.
...
All other arguments, the most useful being ignore.case, extended, and fixed, are passed to regexpr, where they affect how the pattern argument is interpreted.

VALUE:

If value is FALSE, a numeric vector giving the indices of the elements of text that matched pattern (numeric(0) means there are no matches). This return value can be used as a subscript to retrieve the matching elements of text. If value is TRUE, the matching elements of text are returned (converted to character data if they were not character data already).
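
As a minimal sketch of the two return forms described above, the following uses a made-up vector named fruit (an assumption for illustration only, not part of this help file) and only the pattern, text, and value arguments:

```r
# 'fruit' is a hypothetical example vector, not a built-in data set
fruit <- c("apple", "banana", "cherry")

idx <- grep("an", fruit)          # indices of the matching elements
idx
  # [1] 2
fruit[idx]                        # use the indices as a subscript
  # [1] "banana"
grep("an", fruit, value=TRUE)     # same elements, returned directly
  # [1] "banana"
length(grep("zz", fruit))         # no matches gives a zero-length result
  # [1] 0
```

Subscripting with the indices and asking for value=TRUE should give the same elements here; the index form is the one to keep if the positions are needed later.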

DETAILS:

The pattern argument specifies a regular expression. Look at the help file for regexpr for details.

NOTE:

grep calls the regexpr function, which uses the same pattern-matching language (resembling that of the Unix/Linux grep command) on all platforms. The matching is done in C code, not by a call to any operating system command.

By default, regexpr sets the argument extended=TRUE to specify that the pattern is treated as an extended regular expression. This default affects the characters '+', '(', and '|', among others. To treat the pattern as a basic (or 'obsolete') regular expression, add the argument extended=FALSE. Alternatively, use the argument fixed=TRUE if the pattern is not a regular expression but just a literal string to match. Or leave extended=TRUE and put a double backslash (\\) before the affected character to cause it to be taken literally.
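
The difference between these choices can be sketched with a pattern containing '+'. The vector expr below is a hypothetical example introduced here for illustration; only the default (extended) interpretation, fixed=TRUE, and the double-backslash escape are shown:

```r
# 'expr' is a hypothetical example vector, not part of the original help file
expr <- c("a+b", "aab", "ab")

grep("a+b", expr)                # extended regex: one or more "a", then "b"
  # [1] 2 3
grep("a+b", expr, fixed=TRUE)    # literal string "a+b"
  # [1] 1
grep("a\\+b", expr)              # escaped '+' is taken literally
  # [1] 1
```

When the pattern is a plain string that happens to contain special characters, fixed=TRUE is usually the simpler choice, since nothing needs to be escaped.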

Earlier versions of S-PLUS supplied a version of grep that was different on different platforms. This version is still available as oldGrep.

EXAMPLES:

grep("ia$", state.name, value=TRUE)
  # returns all states that end in "ia"
grep("I", state.name, value=TRUE, ignore.case=TRUE)
  # returns all states containing "I" or "i"
grep("^[AEIOUY].*[aeiouy]$", state.name, value=TRUE) 
  # returns states that begin and end with a vowel
grep("^[AEIOUY]|[aeiouy]$", state.name, value=TRUE) 
  # returns states that begin or end with a vowel
grep("[aeiouy]{3,}", state.name, value=TRUE)
  # names with 3 or more vowels in a row
grep("^([^aeiouy][aeiouy]+)*$", state.name, ignore.case=TRUE, value=TRUE)
  # names made up entirely of consonant-plus-vowel(s) groups
  # (so the name must start with a consonant)

grep("^\\+", c("+1","-10","+3","0"))
  # items starting with a plus sign

  # using a backslash with grep in S-PLUS: 
str <- c("SP500","S.P500") 
grep("^S.", str)  # S followed by any character
  # [1] 1 2 
grep("^S\.", str) # same as above because S removes the \ 
  # [1] 1 2 
grep("^S\\.", str) # S followed by a period
  # [1] 2