scan(file="", what=numeric(), n=<<see below>>, sep="",
multi.line=F, flush=F, append=F, skip=0, widths=NULL,
strip.white=<<see below>>,
scan.as.integer=F, locale)
If file is missing or empty (""), data is read from standard input. In this case, scan prompts with the index for the next data item, and input can be terminated by a blank line. For more details on reading data from connections, see the help file for file.
A vector of mode "numeric", "character", or "complex", or a list of vectors of these modes. Objects of mode "logical" are not allowed. The scan function reads all fields in the file as data of the same mode as what. Thus, what=character() or what="" reads data as character fields. If what is missing, scan interprets all fields as numeric. If what is a list, then each record is considered to have length(what) fields, and the mode of each field is the mode of the corresponding component in what. When widths is given as a vector of length greater than one, what must be a list the same length as widths.
If n is omitted, scan reads to the end of file, or to an empty line if reading from standard input.
"\t" for tabs
or "\n" for newlines. If omitted, any amount of white space (blanks, tabs, and possibly newlines) can separate fields. If the widths argument is specified, sep specifies the separator to insert into fixed-format records. By default, sep="".
If multi.line=FALSE, all fields must appear on one line. If scan reaches the end of a line without reading all the fields, an error occurs. Thus, the number of fields on each line must be a multiple of the length of what, unless flush=TRUE. This is useful for checking that no fields have been omitted. If multi.line=TRUE, reading continues and the positions of newlines are disregarded. By default, multi.line=FALSE.
If flush=TRUE, the scan function flushes to the end of the line after reading the last of the fields requested. This allows you to include comments that are not read by scan after the last field. It also prevents multiple sets of items from being placed on one line. By default, flush=FALSE.
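The effect of flush can be illustrated with a minimal sketch. The scratch file and its contents are made up for illustration, and the sketch assumes that tempfile, cat, and unlink are available for creating and removing it.

```r
# Hypothetical scratch file: two numeric fields per line, then a free-text comment.
f <- tempfile()
cat("1 2 first point\n3 4 second point\n", file=f)

# flush=TRUE discards the rest of each line once both requested fields are read,
# so the trailing comments are never interpreted as data.
pts <- scan(f, what=list(x=0, y=0), flush=TRUE)

pts$x   # first field of each line
pts$y   # second field of each line
unlink(f)
```

Without flush=TRUE, the trailing words would be read as further fields and could not be interpreted as numeric.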
If append=TRUE, the returned object includes all of the elements in the what argument, with the input data for the respective fields appended to each component. If append=FALSE, the data in what is ignored and only the modes matter. By default, append=FALSE.
By default, skip=0 and reading begins at the top of the file.
what argument.
The widths argument provides for common fixed-format input. If widths is not NULL, then as scan reads the characters in a record, it automatically inserts a sep character after reading widths[1] characters; widths[1] represents the width of the first field. The scan function then inserts another sep after widths[2] characters, and so on, allowing the record to be read as if your input was originally delimited by the sep character. The default sep used when widths is supplied is "\001" (binary 1); if your input contains this character, you should set the sep argument to a character that is not contained anywhere in the input.
One caveat: the widths vector you specify must correspond exactly to field widths in your input. If they do not, you may get "field undecipherable" errors in seemingly odd places, or the input may be silently but incorrectly digested.
By default, widths=NULL. Note that if widths has a length greater than one, the what argument must be a list of the same length.
what argument.
The strip.white argument allows you to strip leading and trailing white space from character fields; scan always strips numeric fields in this way. If strip.white is not NULL, it must be either of length 1, in which case the single logical value tells whether to strip all fields read, or it must be the same length as what, in which case the logical vector tells which fields to strip. For example, if strip.white[1]=TRUE and field 1 is character, scan strips the leading and trailing white space from field 1.
If widths is specified, strip.white=TRUE by default and all fields are stripped. Otherwise, strip.white=NULL by default and no fields are stripped. If you read free-format input by leaving sep unspecified, then strip.white has no effect.
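As a rough illustration of the behavior described above, the following sketch reads the same comma-separated file with and without stripping. The file contents and the names f, raw, and stripped are hypothetical.

```r
# Hypothetical scratch file with stray blanks around the character field.
f <- tempfile()
cat("alice , 1\n bob,2\n", file=f)

# With a separator and no strip.white, character fields keep their blanks.
raw <- scan(f, what=list(name="", n=0), sep=",")

# A single logical value applies to every field read.
stripped <- scan(f, what=list(name="", n=0), sep=",", strip.white=TRUE)

raw$name        # blanks retained
stripped$name   # leading and trailing blanks removed
unlink(f)
```

The numeric field n is the same in both results, since numeric fields are always stripped.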
what argument. The default, scan.as.integer=FALSE, treats integer values in what as double precision, while TRUE treats them as integers. This argument exists because previous versions of S-PLUS parsed what=1 as double precision, whereas it is now parsed as an integer.
Sys.setlocale function.
If given, read numbers as if you were in the given locale.
what argument if it is present, and a numeric vector if what is omitted.
It is possible to read files that contain more than one mode by specifying a list as the what argument. For example, if the fields in the file myfile are alternately numeric and character, the command scan(myfile, what=list(0,"")) reads them and returns an object of mode "list" that has a numeric vector and a character vector as its two elements. The elements of what can be anything, as long as you have numbers where you want numeric fields, character data where you want character fields, and complex numbers where you want complex fields. A NULL component in what causes the corresponding field to be skipped during input. The elements are used only to decide the kind of field, unless append=TRUE.
Note that scan retains the names attribute of the list, if any. Thus, the command z <- scan(myfile, what=list(pop=0, city="")) allows you to refer to z$pop and z$city.
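The named-list form can be tried on a small scratch file. This is only a sketch: the file contents are invented for illustration, and myfile here is a hypothetical variable holding a temporary file name.

```r
# Hypothetical data: each line holds one numeric field and one character field.
myfile <- tempfile()
cat("1200 Springfield\n850 Riverton\n", file=myfile)

# Each record has length(what) = 2 fields; the list names carry over to the result.
z <- scan(myfile, what=list(pop=0, city=""))

z$pop    # numeric vector
z$city   # character vector
unlink(myfile)
```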
Any numeric field containing the characters NA is returned as a missing value. If the field separator (the sep argument) is given and the field is empty, the returned value is NA for a numeric or complex field and "" for a character field.
The main use of separators is to allow white space inside character fields. For example, suppose in the command above that the numeric field is to be followed by a tab, with text filling out the rest of the line. The command z <- scan(myfile, what=list(pop=0, city=""), sep="\t") allows blanks in the city name. With no separator, arbitrary white space can be included by quoting the whole string. With a separator, quotes are not used; if the separator character is to be included in a string, it must be escaped by a preceding backslash.
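A small sketch of tab-separated input follows. It shows blanks kept inside a character field and an empty numeric field read as a missing value. The data values and the scratch file are made up for illustration.

```r
# Hypothetical tab-separated file; the second line has an empty numeric field.
myfile <- tempfile()
cat("100\tNew York\n\tSan Jose\n200\tLos Angeles\n", file=myfile)

# With sep="\t", blanks are part of the city name rather than field breaks.
z <- scan(myfile, what=list(pop=0, city=""), sep="\t")

z$city          # names keep their embedded blanks
is.na(z$pop)    # the empty numeric field is read as NA
unlink(myfile)
```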
Fields of mode "logical" cannot be read directly. Instead, read them as character fields and convert them by using expressions such as x=="T".
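The conversion might look like the following sketch, which assumes the file stores logical values as the single letters T and F. The file name flagfile and its contents are hypothetical.

```r
# Hypothetical file of logical values written as T and F.
flagfile <- tempfile()
cat("T F T\n", file=flagfile)

# Read the fields as character data, then compare against "T".
x <- scan(flagfile, what="")
flags <- x == "T"

flags   # logical vector
unlink(flagfile)
```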
Any field that cannot be interpreted according to the mode(s) supplied to scan causes an error.
The scan function employs C scan formats to read numeric data, rather than using the S-PLUS parser (the parse function). Exponential notation must use "e"; numbers that use "d" or other letters will be read incorrectly. You will need to change your data from the "d" notation to the "e" notation, for instance with the sed utility in UNIX.
As it reads more and more records, scan allocates more space to accommodate the growing vectors. If you supply a what argument that is identical in size to the result you expect, S-PLUS uses that space and does not have to perform memory allocations. This may produce significant memory savings when dealing with large files of data.
The make.fields function preprocesses files that have fixed-format fields and places separators after each field. It can be used as a separate step instead of using the widths argument with scan. The advantage of using widths is that you do not need to create any temporary files.
The read.table function reads data from a file and returns a data frame. It is often a better choice than scan if the data are in a regular table format with rows of equal length. The count.fields function returns the number of fields in each line of a file, which is useful for determining if read.table is appropriate. The count.fields function is also helpful when using scan to return a list, to check that the number of fields in each line is a proper multiple of the length of what.
The readline function is another function that accepts data interactively.
# Read numeric values from standard input.
num <- scan()
# Read a label and two numeric fields to make a matrix.
z <- scan("myfile", list(name="", 0, 0))
mat <- cbind(z[[2]], z[[3]])
dimnames(mat) <- list(z$name, c("X","Y"))
# Like previous, but make columns integer
z <- scan("myfile", list(name="", 0, 0), scan.as.integer = T)
# Read in a vector of character data.
personnel <- scan("person", what="")
# Create a list with two NULL components, a character component,
# and a numeric component. Fields are separated by tabs.
ff <- scan("myfile", what=list(NULL, name="", data=0, NULL),
           multi.line=T, sep="\t")
# Delete NULL components from ff.
ff <- ff[sapply(ff, length) > 0]
# Save in single precision, skip the first five lines of the file.
scan("myfile", single(0), skip=5)
# Example of reading a fixed format file using the widths and
# strip.white arguments. Blanks are read as NA for numeric fields.
# Assignment can be suppressed for a field using NULL in the what argument.
# For this example, the file 'dfile' contains the following lines:
# 01giraffe.9346H01-04
# 88donkey .1220M00-15
# 77ant         L04-04
# 20gerbil .1220L01-12
# 22swallow.2333L01-03
# 12lemming     L01-23
mydf.what <- list(code=0, name="", x=0, s="", n1=0, NULL, n2=0)
mydf.widths <- c(2, 7, 5, 1, 2, 1, 2)
# strip.white defaults to TRUE if widths is specified.
# You can also use strip.white = c(F, T, F, F, F, F, F).
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths)
mydf
# This produces the following output:
# $code:
# [1] 1 88 77 20 22 12
# $name:
# [1] "giraffe" "donkey" "ant" "gerbil" "swallow" "lemming"
# $x:
# [1] 0.9346 0.1220 NA 0.1220 0.2333 NA
# $s:
# [1] "H" "M" "L" "L" "L" "L"
# $n1:
# [1] 1 0 4 1 1 1
# [[6]]:
# NULL
# $n2:
# [1] 4 15 4 12 3 23
# Now with strip.white argument:
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths, strip.white=F)
mydf$name
# This produces a list just like the one above, except
# the columns are not stripped:
# [1] "giraffe" "donkey " "ant    " "gerbil " "swallow" "lemming"