sed program entirely within S-PLUS (function sedit). The substring.location function returns the first and last position numbers that a substring occupies in a larger string. The substring2<- function does the opposite of the built-in function substring. It is named substring2 because for S-Plus 5.x there is a built-in function substring, but it does not handle multiple replacements in a single string. replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", provided the "ANYTHING" passes an optional user-specified test function. Here, the "yyyy" string is searched for from right to left to handle balancing parentheses, etc.
numeric.string and all.digits are two examples of test functions. They check, respectively, whether each of a vector of strings is a legal numeric value or contains only the digits 0-9. For the case where old="*$" or "^*", or for replace.substring.wild with the same values of old or with front=TRUE or back=TRUE, sedit (if wild.literal=FALSE) and replace.substring.wild will edit the largest substring satisfying test.
substring2 is just a copy of substring so that substring2<- will work.
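The right-to-left search for the closing string can be illustrated with a minimal base R sketch. The helper name wild_replace_sketch and all of its arguments are hypothetical and are not part of this package. The sketch takes the literal text before and after the wild card as separate arguments instead of parsing a pattern containing "*", and it does not claim to reproduce the package's actual implementation.

```r
# Hypothetical helper (not part of the package): replace
# "prefix ANYTHING suffix" with "new_prefix ANYTHING new_suffix",
# provided ANYTHING passes the optional test function.
wild_replace_sketch <- function(text, prefix, suffix, new_prefix, new_suffix,
                                test = function(s) TRUE) {
  # Leftmost occurrence of the literal prefix (-1 if absent).
  start <- as.integer(regexpr(prefix, text, fixed = TRUE))
  if (start < 0) return(text)
  after_prefix <- start + nchar(prefix)

  # All non-overlapping occurrences of the literal suffix; keep those that
  # begin after the prefix and take the rightmost one.
  hits <- as.integer(gregexpr(suffix, text, fixed = TRUE)[[1]])
  hits <- hits[hits >= after_prefix]
  if (length(hits) == 0) return(text)
  end <- max(hits)

  # The text matched by the wild card must pass the test function.
  wild <- substr(text, after_prefix, end - 1)
  if (!isTRUE(test(wild))) return(text)

  paste0(substr(text, 1, start - 1), new_prefix, wild, new_suffix,
         substr(text, end + nchar(suffix), nchar(text)))
}

wild_replace_sketch("this is a cat", "this", "cat", "that", "dog")
# "that is a dog"
wild_replace_sketch("f(a(b)c) tail", "f(", ")", "g[", "]")
# "g[a(b)c] tail"
```

In the second call, taking the rightmost ")" rather than the first one keeps the inner "(b)" inside the wild card text, which is the balancing behavior the description refers to.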
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto    # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last=1e6)
substring2(text, first, last) <- value
text: a vector of character strings for sedit, substring2, substring2<-, or a single character string for substring.location, replace.substring.wild.
from: a vector of character strings to translate from, for sedit. A single asterisk wild card may be used, meaning allow any sequence of characters (subject to the test function, if any) in place of the "*". An element of from may begin with "^" to force the match to begin at the beginning of text, and an element of from can end with "$" to force the match to end at the end of text.
to: a vector of character strings to translate to, for sedit. If a corresponding element in from had an "*", the element in to may also have an "*". Only single asterisks are allowed. If to is not the same length as from, the rep function is used to make it the same length.
string: a single character string, for substring.location, numeric.string, all.digits.
first: a vector of integers specifying the first position to replace, for substring2<-. first may also be a vector of character strings that are passed to sedit to use as patterns for replacing substrings with setto. See one of the last examples below.
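last: a vector of integers specifying the ending positions of the character substrings to be replaced. The default is to go to the end of the string. When first is character, last must be omitted.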
setto: a character string or vector of character strings used as replacements, in substring2<-.
old: a character string to translate from, for replace.substring.wild. May be "*$" or "^*" or any string containing a single "*" but not beginning with "^" or ending with "$".
new: a character string to translate to, for replace.substring.wild.
test: a function of a vector of character strings returning a logical vector whose elements are TRUE or FALSE according to whether that string element qualifies as the wild card string, for sedit, replace.substring.wild.
wild.literal: set to TRUE to not treat asterisks as wild cards and to not look for "^" or "$" in old.
restrict: a vector of two integers for substring.location which specifies a range to which the search for matches should be restricted.
front: specifying front=TRUE and old="*" is the same as specifying old="^*".
back: specifying back=TRUE and old="*" is the same as specifying old="*$".
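To make the role of a test function concrete, here is a minimal base R sketch of two functions with the same intent as numeric.string and all.digits. The names numeric_string_sketch and all_digits_sketch are hypothetical; they are written from the descriptions on this page and are not the package's own definitions, so edge cases (for example empty strings or surrounding blanks) may be handled differently.

```r
# Hypothetical stand-ins for test functions (not the package's code).
# Each takes a character vector and returns a logical vector.

# TRUE where the string can be converted to a number by as.numeric().
numeric_string_sketch <- function(string) {
  !is.na(suppressWarnings(as.numeric(string)))
}

# TRUE where the string is non-empty and consists only of the digits 0-9.
all_digits_sketch <- function(string) {
  grepl("^[0-9]+$", string)
}

numeric_string_sketch(c("1.2", " 1.5 ", "abc"))
all_digits_sketch(c("23", "2.3", "x23"))
```

Any function with this shape, a character vector in and a logical vector out, could be supplied as the test argument in the same way as the user-written qualify function in the examples.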
sedit returns a vector of character strings the same length as text. substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches. replace.substring.wild returns a single character string. numeric.string and all.digits return a single logical value.
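The shape of the value returned by substring.location can be pictured with a short base R sketch built on gregexpr. The function name substring_location_sketch is hypothetical and is not the package's implementation. It ignores the restrict argument, finds only non-overlapping matches, and returns empty vectors when nothing matches, all of which are choices of this sketch and may differ from the package.

```r
# Hypothetical sketch (not the package's code): locate every literal,
# non-overlapping occurrence of `string` in the single string `text`.
substring_location_sketch <- function(text, string) {
  hits <- gregexpr(string, text, fixed = TRUE)[[1]]
  # gregexpr() signals "no match" with a single -1.
  if (hits[1] < 0) {
    return(list(first = integer(0), last = integer(0)))
  }
  first <- as.integer(hits)
  last <- first + as.integer(attr(hits, "match.length")) - 1L
  list(first = first, last = last)
}

substring_location_sketch("abcdefgabc", "ab")
# first is 1 8 and last is 2 9
```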
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
f.harrell@vanderbilt.edu
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x
substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))
replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')
qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
                       'lost*million', test=numeric.string)
x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))
sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)
replace.substring.wild("abcdefabcdef", "d*f", "xy")
x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x