sed program entirely within S-PLUS (function sedit). The substring.location function returns the first and last position numbers that a substring occupies in a larger string. The substring2<- function does the opposite of the built-in function substring: it replaces part of a string rather than extracting it. It is named substring2 because for S-Plus 5.x there is a built-in function substring, but it does not handle multiple replacements in a single string.
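To make the idea of first and last positions concrete, here is a minimal base R sketch. The helper name locate_sketch is hypothetical and is not part of the package, and returning empty vectors when nothing matches is an assumption of the sketch rather than documented behaviour of substring.location.

```r
# Hypothetical helper, not part of the package: slide a window of
# nchar(string) characters across text and keep the start positions
# where the window equals string.
locate_sketch <- function(text, string) {
  n <- nchar(string)
  if (n == 0L || n > nchar(text)) {
    # Assumption of this sketch: report "no match" as empty vectors.
    return(list(first = integer(0), last = integer(0)))
  }
  starts <- seq_len(nchar(text) - n + 1L)
  hits <- starts[substring(text, starts, starts + n - 1L) == string]
  list(first = hits, last = hits + n - 1L)
}

pos <- locate_sketch("abcdefgabc", "ab")
pos$first  # 1 8
pos$last   # 2 9
```

Each match is reported as a pair: the position of its first character and the position of its last character.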
replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", if the "ANYTHING" passes an optional user-specified test function. Here, the "yyyy" string is searched for from right to left to handle balancing parentheses, etc.
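The "change xxxx*yyyy to aaaa*bbbb" rule can be sketched in base R as follows. This is an illustration only, not the package's implementation: the names split_at_star, wild_replace_sketch and is_numeric_sketch are hypothetical, the sketch handles a single string and a single "*", and it ignores "^", "$", front and back.

```r
# Hypothetical helpers, not the package's implementation.

# Split a pattern into the literal text before and after its "*".
split_at_star <- function(pattern) {
  star <- as.integer(regexpr("*", pattern, fixed = TRUE))
  if (star == -1L) return(NULL)
  list(before = substr(pattern, 1L, star - 1L),
       after  = substr(pattern, star + 1L, nchar(pattern)))
}

wild_replace_sketch <- function(text, old, new, test = function(s) TRUE) {
  o <- split_at_star(old)
  if (is.null(o)) stop("old must contain a single '*'")

  # Leftmost occurrence of the literal text before the "*".
  if (nzchar(o$before)) {
    pre_start <- as.integer(regexpr(o$before, text, fixed = TRUE))
    if (pre_start == -1L) return(text)
  } else {
    pre_start <- 1L
  }
  pre_end <- pre_start + nchar(o$before) - 1L

  # Rightmost occurrence of the literal text after the "*".
  if (nzchar(o$after)) {
    cand <- as.integer(gregexpr(o$after, text, fixed = TRUE)[[1]])
    cand <- cand[cand > pre_end]
    if (length(cand) == 0L) return(text)
    suf_start <- max(cand)
  } else {
    suf_start <- nchar(text) + 1L
  }

  # The wild-card text must pass the test, otherwise nothing changes.
  middle <- substr(text, pre_end + 1L, suf_start - 1L)
  if (!isTRUE(test(middle))) return(text)

  n <- split_at_star(new)
  replacement <- if (is.null(n)) new else paste0(n$before, middle, n$after)
  paste0(substr(text, 1L, pre_start - 1L),
         replacement,
         substr(text, suf_start + nchar(o$after), nchar(text)))
}

# A simple test function: does the wild-card text parse as a number?
is_numeric_sketch <- function(s) !is.na(suppressWarnings(as.numeric(s)))

wild_replace_sketch("this is a cat", "this*cat", "that*dog")
wild_replace_sketch("He won 1.5 million $", "won*million", "lost*million",
                    test = is_numeric_sketch)
wild_replace_sketch("f((a)+b)", "(*)", "[*]")
```

In the last call the closing ")" is taken from the right-hand end, so the wild card covers the whole nested expression "(a)+b" instead of stopping at the first ")".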
numeric.string and all.digits are two examples of test functions; they check, respectively, whether each of a vector of strings is a legal numeric or contains only the digits 0-9. For the case where old="*$" or "^*", or for replace.substring.wild with the same values of old or with front=TRUE or back=TRUE, sedit (if wild.literal=FALSE) and replace.substring.wild will edit the largest substring satisfying test.
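The two kinds of test function described above can be approximated in base R. The definitions below are hypothetical stand-ins written for illustration, not the package's own code; in particular, treating the empty string as failing both checks is an assumption of the sketch.

```r
# Hypothetical stand-ins, not the package's own definitions.

# TRUE where a string consists only of the digits 0-9.
all_digits_sketch <- function(s) grepl("^[0-9]+$", s)

# TRUE where a string can be converted to a number.
numeric_string_sketch <- function(s) !is.na(suppressWarnings(as.numeric(s)))

x <- c("23", "1.5", "2x")
all_digits_sketch(x)      # TRUE FALSE FALSE
numeric_string_sketch(x)  # TRUE TRUE FALSE
```

Any function with this shape, taking character strings and returning TRUE or FALSE for each, could serve as a test.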
substring2 is just a copy of substring so that substring2<- will work.
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last=1e6)
substring2(text, first, last) <- value
text: a vector of character strings for sedit, substring2, substring2<-, or a single character string for substring.location, replace.substring.wild.
from: a vector of character strings to translate from, for sedit. A single asterisk wild card, meaning allow any sequence of characters (subject to the test function, if any) in place of the "*". An element of from may begin with "^" to force the match to begin at the beginning of text, and an element of from can end with "$" to force the match to end at the end of text.
to: a vector of character strings to translate to, for sedit. If a corresponding element in from had an "*", the element in to may also have an "*". Only single asterisks are allowed. If to is not the same length as from, the rep function is used to make it the same length.
string: a single character string, for substring.location, numeric.string, all.digits.
first: the first position to replace, for substring2<-. first may also be a vector of character strings that are passed to sedit to use as patterns for replacing substrings with setto. See one of the last examples below.
last: the last position to replace. When first is character, last must be omitted.
setto: the replacement character string or strings, for substring2<-.
old: a character string to translate from, for replace.substring.wild. May be "*$" or "^*" or any string containing a single "*" but not beginning with "^" or ending with "$".
new: a character string to translate to, for replace.substring.wild.
test: a function of a vector of character strings returning a logical vector whose elements are TRUE or FALSE according to whether that string element qualifies as the wild card string for sedit, replace.substring.wild.
wild.literal: set to TRUE to not treat asterisks as wild cards and to not look for "^" or "$" in old.
restrict: a vector of two integers for substring.location which specifies a range to which the search for matches should be restricted.
front: specifying front=TRUE and old="*" is the same as specifying old="^*".
back: specifying back=TRUE and old="*" is the same as specifying old="*$".
sedit returns a vector of character strings the same length as text.
substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches.
replace.substring.wild returns a single character string.
numeric.string and all.digits return a single logical value.
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
f.harrell@vanderbilt.edu
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x

substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))

replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')

qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million', 'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million', 'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million', 'lost*million', test=numeric.string)

x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))

sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)

replace.substring.wild("abcdefabcdef", "d*f", "xy")

x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x