scan(file="", what=numeric(), n=<<see below>>, sep="", multi.line=F, flush=F, append=F, skip=0, widths=NULL, strip.white=<<see below>>, scan.as.integer=F, locale)
file: If file is missing or empty (""), data is read from standard input. In this case, scan prompts with the index for the next data item, and input can be terminated by a blank line. For more details on reading data from connections, see the help file for file.
what: a vector of mode "numeric", "character", or "complex", or a list of vectors of these modes. Objects of mode "logical" are not allowed. The scan function reads all fields in the file as data of the same mode as what. Thus, what=character() or what="" reads data as character fields. If what is missing, scan interprets all fields as numeric. If what is a list, then each record is considered to have length(what) fields, and the mode of each field is the mode of the corresponding component in what. When widths is given as a vector of length greater than one, what must be a list of the same length as widths.
n: If omitted, reading continues to the end of file, or to an empty line if reading from standard input.
sep: the field separator, such as "\t" for tabs or "\n" for newlines. If omitted, any amount of white space (blanks, tabs, and possibly newlines) can separate fields. If the widths argument is specified, sep specifies the separator to insert into fixed-format records. By default, sep="".
multi.line: If multi.line=FALSE, all fields of a record must appear on one line. If scan reaches the end of a line without reading all the fields, an error occurs. Thus, the number of fields on each line must be a multiple of the length of what, unless flush=TRUE. This is useful for checking that no fields have been omitted. If multi.line=TRUE, reading continues and the positions of newlines are disregarded. By default, multi.line=FALSE.
flush: If flush=TRUE, the scan function flushes to the end of the line after reading the last of the fields requested. This allows you to include comments, which scan does not read, after the last field. It also prevents multiple sets of items from being placed on one line. By default, flush=FALSE.
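As an illustration of flush=TRUE, the following minimal sketch reads two numeric fields per line and ignores the commentary that follows them. The temporary file and its contents are invented for the example.

```r
# Hypothetical data: two numeric fields followed by free-text comments.
tmp <- tempfile()
cat("1 2 first pair\n3 4 second pair\n", file = tmp)

# flush=TRUE discards the rest of each line after the two requested fields.
z <- scan(tmp, what = list(a = 0, b = 0), flush = TRUE)
unlink(tmp)
```

Here z$a and z$b hold the first and second field of each line; the trailing text never reaches scan as data.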
append: If append=TRUE, the returned object includes all of the elements in the what argument, with the input data for the respective fields appended to each component. If append=FALSE, the data in what is ignored and only the modes matter. By default, append=FALSE.
skip: By default, skip=0 and reading begins at the top of the file.
widths: ... what argument.
The widths argument provides for common fixed-format input. If widths is not NULL, then as scan reads the characters in a record, it automatically inserts a sep character after reading widths[1] characters; widths[1] represents the width of the first field. The scan function then inserts another sep after the next widths[2] characters, and so on, allowing the record to be read as if your input were originally delimited by the sep character. The default sep used when widths is supplied is "\001" (binary 1); if your input contains this character, you should set the sep argument to a character that is not contained anywhere in the input. One caveat: the widths vector you specify must correspond exactly to the field widths in your input. If it does not, you may get "field undecipherable" errors in seemingly odd places, or the input may be silently but incorrectly digested. By default, widths=NULL. Note that if widths has a length greater than one, the what argument must be a list of the same length.
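The separator insertion described above can be sketched outside of scan. The helper insert.seps below is hypothetical (it is not part of S-PLUS) and only mimics the idea for a single record: it cuts the record at the cumulative field widths and joins the pieces with sep. It does not add a separator after the last field.

```r
# Hypothetical helper (not part of S-PLUS): mimic the separator insertion
# that the widths argument performs, for one fixed-format record.
insert.seps <- function(record, widths, sep = "\001") {
    ends <- cumsum(widths)          # last character position of each field
    starts <- ends - widths + 1     # first character position of each field
    paste(substring(record, starts, ends), collapse = sep)
}

# A visible separator makes the result easier to inspect than "\001".
delimited <- insert.seps("01giraffe.9346H01-04", c(2, 7, 5, 1, 2, 1, 2), sep = "|")
```

If any width is off by one, every later field shifts with it, which is why mismatched widths tend to produce errors far from the actual mistake.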
strip.white: ... what argument.
The strip.white argument allows you to strip leading and trailing white space from character fields; scan always strips numeric fields in this way. If strip.white is not NULL, it must either have length 1, in which case the single logical value tells whether to strip all fields read, or have the same length as what, in which case the logical vector tells which fields to strip. For example, if strip.white[1]=TRUE and field 1 is character, scan strips the leading and trailing white space from field 1. If widths is specified, strip.white=TRUE by default and all fields are stripped. Otherwise, strip.white=NULL by default and no fields are stripped. If you read free-format input by leaving sep unspecified, then strip.white has no effect.
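A minimal sketch of the effect of strip.white on character fields when a separator is given. The comma-separated temporary file is made up for the example.

```r
# Hypothetical data: character fields padded with trailing blanks.
tmp <- tempfile()
cat("ant   ,4\ngerbil ,12\n", file = tmp)
fields <- list(name = "", n = 0)

# Without stripping, the padding stays in the character field.
padded <- scan(tmp, what = fields, sep = ",", strip.white = FALSE)

# A single logical value applies to every field read.
trimmed <- scan(tmp, what = fields, sep = ",", strip.white = TRUE)
unlink(tmp)
```

The numeric component is the same in both results, since numeric fields are always stripped.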
scan.as.integer: ... what argument. The default, scan.as.integer=FALSE, treats integer values in what as double precision, while scan.as.integer=TRUE treats them as integers. This argument exists because previous versions of S-PLUS parsed what=1 as double precision, whereas it is now parsed as an integer.
locale: ... Sys.setlocale function. If given, numbers are read as if you were in the given locale.
The scan function returns an object like the what argument if it is present, and a numeric vector if what is omitted.
It is possible to read files that contain more than one mode by specifying a list as the what argument. For example, if the fields in the file myfile are alternately numeric and character, the command
    scan(myfile, what=list(0,""))
reads them and returns an object of mode "list" that has a numeric vector and a character vector as its two elements. The elements of what can be anything, as long as you have numbers where you want numeric fields, character data where you want character fields, and complex numbers where you want complex fields. A NULL component in what causes the corresponding field to be skipped during input. The elements are used only to decide the kind of field, unless append=TRUE.
Note that scan retains the names attribute of the list, if any. Thus, the command
    z <- scan(myfile, what=list(pop=0, city=""))
allows you to refer to z$pop and z$city.
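A small worked instance of reading mixed modes into a named list. The temporary file stands in for myfile, and the populations and city names are invented.

```r
# Hypothetical data: alternating numeric and character fields.
tmp <- tempfile()
cat("1200 Springfield\n850 Riverton\n", file = tmp)

# The names in what carry over to the result.
z <- scan(tmp, what = list(pop = 0, city = ""))
unlink(tmp)

total.pop <- sum(z$pop)   # numeric component
n.cities <- length(z$city) # character component
```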
Any numeric field containing the characters NA is returned as a missing value. If the field separator (the sep argument) is given and the field is empty, the returned value is NA for a numeric or complex field and "" for a character field.
The main use of separators is to allow white space inside character fields. For example, suppose in the command above that the numeric field is to be followed by a tab, with text filling out the rest of the line. The command
    z <- scan(myfile, what=list(pop=0, city=""), sep="\t")
allows blanks in the city name. With no separator, arbitrary white space can be included by quoting the whole string. With a separator, quotes are not used; if the separator character is to be included in a string, it must be escaped by a preceding backslash.
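The following sketch combines the two points above: a tab separator lets the city name contain blanks, and an empty numeric field comes back as NA. The file contents are made up for illustration.

```r
# Hypothetical data: a numeric field, a tab, then text to the end of the line.
# The second record has an empty numeric field.
tmp <- tempfile()
cat("1200\tNew Haven\n\tSan Jose\n", file = tmp)

z <- scan(tmp, what = list(pop = 0, city = ""), sep = "\t")
unlink(tmp)

missing.pop <- is.na(z$pop)   # flags the record whose numeric field was empty
```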
Fields of mode "logical" cannot be read directly. Instead, read them as character fields and convert them by using expressions such as x=="T".
Any field that cannot be interpreted according to the mode(s) supplied to scan causes an error.
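The conversion of logical fields can be sketched as follows, assuming the file encodes true and false as the single letters T and F. The file itself is hypothetical.

```r
# Hypothetical data: logical values stored as the letters T and F.
tmp <- tempfile()
cat("T F T T\n", file = tmp)

x <- scan(tmp, what = "")   # read the fields as character
flags <- x == "T"           # convert to a logical vector
unlink(tmp)
```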
The scan function employs C scan formats to read numeric data, rather than using the S-PLUS parser (the parse function). Exponential notation must use "e"; numbers that use "d" or other letters will be read incorrectly. You will need to change your data from the "d" notation to the "e" notation with, for instance, the sed utility in UNIX.
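As an alternative to preprocessing with sed, the conversion can be done in the session by reading the fields as character data first. This is only a sketch: it assumes a gsub function is available (as it is in R), and the input file is invented.

```r
# Hypothetical data: exponents written with "d" instead of "e".
tmp <- tempfile()
cat("1.5d3 2.5d-2\n", file = tmp)

raw <- scan(tmp, what = "")    # read as character so nothing is misparsed
fixed <- gsub("d", "e", raw)   # assumes gsub is available, as in R
x <- as.numeric(fixed)
unlink(tmp)
```

This is safe only when "d" cannot appear in the fields for any other reason.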
As it reads more and more records, scan allocates more space to accommodate the growing vectors. If you supply a what argument that is identical in size to the result you expect, S-PLUS uses that space and does not have to perform memory allocations. This may produce significant memory savings when dealing with large files of data.
The make.fields function preprocesses files that have fixed-format fields and places separators after each field. It can be used as a separate step instead of using the widths argument with scan. The advantage of using widths is that you do not need to create any temporary files.
The read.table function reads data from a file and returns a data frame. It is often a better choice than scan if the data are in a regular table format with rows of equal length.
The count.fields function returns the number of fields in each line of a file, which is useful for determining if read.table is appropriate. The count.fields function is also helpful when using scan to return a list, to check whether the number of fields in each line is a proper multiple of the length of what.
The readline function also accepts data interactively.
# Read numeric values from standard input.
num <- scan()

# Read a label and two numeric fields to make a matrix.
z <- scan("myfile", list(name="", 0, 0))
mat <- cbind(z[[2]], z[[3]])
dimnames(mat) <- list(z$name, c("X","Y"))

# Like previous, but make columns integer.
z <- scan("myfile", list(name="", 0, 0), scan.as.integer = T)

# Read in a vector of character data.
personnel <- scan("person", what="")

# Create a list with two NULL components, a character component,
# and a numeric component. Fields are separated by tabs.
ff <- scan("myfile", what=list(NULL, name="", data=0, NULL), multi.line=T, sep="\t")
# Delete NULL components from ff.
ff <- ff[sapply(ff, length) > 0]

# Save in single precision, skip the first five lines of the file.
scan("myfile", single(0), skip=5)

# Example of reading a fixed format file using the widths and
# strip.white arguments. Blanks are read as NA for numeric fields.
# Assignment can be suppressed for a field using NULL in the what argument.
# For this example, the file 'dfile' contains the following lines:
# 01giraffe.9346H01-04
# 88donkey .1220M00-15
# 77ant         L04-04
# 20gerbil .1220L01-12
# 22swallow.2333L01-03
# 12lemming     L01-23
mydf.what <- list(code=0, name="", x=0, s="", n1=0, NULL, n2=0)
mydf.widths <- c(2, 7, 5, 1, 2, 1, 2)
# strip.white defaults to TRUE if widths is specified.
# You can also use strip.white = c(F, T, F, F, F, F, F).
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths)
mydf
# This produces the following output:
# $code:
# [1]  1 88 77 20 22 12
# $name:
# [1] "giraffe" "donkey"  "ant"     "gerbil"  "swallow" "lemming"
# $x:
# [1] 0.9346 0.1220     NA 0.1220 0.2333     NA
# $s:
# [1] "H" "M" "L" "L" "L" "L"
# $n1:
# [1] 1 0 4 1 1 1
# [[6]]:
# NULL
# $n2:
# [1]  4 15  4 12  3 23

# Now with the strip.white argument:
mydf <- scan("dfile", what=mydf.what, widths=mydf.widths, strip.white=F)
mydf$name
# The result is a list just like the one above, except that
# the character fields are not stripped. The name component is:
# [1] "giraffe" "donkey " "ant    " "gerbil " "swallow" "lemming"