read.table(file, header=<<see below>>, sep, row.names=NULL, col.names=<<see below>>, as.is=<<see below>>, na.strings="NA", skip=0, stringsAsFactors=<<see below>>, locale)
sep
, or the file may be fixed format
with the fields starting at fixed points within each row.
TRUE
, then the first line of the file is used as the
variable names of the resulting data frame.
The default is
FALSE
, unless there is one less field in the first line of
the file than in the second line.
"\t"
for tab.
If omitted, any amount of white space (blanks or tabs) can separate fields.
To read fixed format files, make
sep
a numeric vector giving the
initial columns of the fields.
row.names
is missing, the function will use the first nonnumeric field with
no duplicates as the row names.
If no such field exists, the row names are
1:nrow(x)
.
You can force this last version, regardless of suitable fields to
use as row names, by giving
row.names=NULL
.
Row names, wherever they come from, must be unique.
"V"
and the field number are pasted together.
Variable names, wherever they come from, must be unique.
Variable names will be converted
to syntactic names before assignment, but
not if they came from an explicit
col.names
argument.
as.is=FALSE
, non-numeric fields are turned into factors, except if they
are used as row names.
The argument will be replicated as needed to be of length equal to the
number of fields; thus,
as.is=FALSE
converts all character fields.
Converting them can sometimes save time
and memory and make it easier to use the statistical modelling functions.
as.is
argument.
The default is
TRUE
, unless one sets
options(stringsAsFactors=FALSE)
.
na.strings
will be excluded from
the levels of the factor,
so that if any of the character data were one of the strings in
na.strings
the corresponding element of the factor would be NA.
Also, in a numeric column, these strings will be converted to NA.
Sys.setlocale
function.
If given, read numbers as if you were in the given locale.
header==T
)
and as many variables as the file has fields (or one less if one variable was
used for row names).
Fields are initially read in as character data.
If all the items in a field are numeric, the corresponding variable is numeric.
Otherwise, it is character, except as controlled by the
as.is
argument.
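As a rough illustration of the conversion rules above, the following sketch writes a small file and reads it twice, once with as.is=FALSE and once with as.is=TRUE. The file name cars.demo and its contents are invented for this illustration, and as.is is given explicitly so that the result does not depend on any default setting.

```r
# Hypothetical file (invented for this sketch): the header has one fewer
# field than the data lines, so the first field supplies the row names.
cat("Price Country Type",
    "Acura_Integra 11950 Japan Small",
    "Dodge_Colt 6851 Japan Small",
    "",
    file = "cars.demo", sep = "\n")

cars.f <- read.table("cars.demo", header = TRUE, as.is = FALSE)
cars.c <- read.table("cars.demo", header = TRUE, as.is = TRUE)

is.numeric(cars.f$Price)      # every item is numeric, so the variable is numeric
is.factor(cars.f$Country)     # non-numeric field converted to a factor
is.character(cars.c$Country)  # as.is=TRUE leaves the field as character
```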
All lines must have the same number of fields (except the header, which can
have one less if the first field is to be used for row names).
This function should be compared to
scan
;
read.table
tries much harder to interpret the input data automatically,
figuring out the number of variables and whether fields are numeric.
It also produces a more structured object as output.
The price for this, aside from
read.table
being somewhat slower,
is that the input data must themselves be more regular and
that
read.table
decides what to do with each field,
except for the use of the
as.is
argument.
With
scan
, input lines do not need to correspond to one complete
set of fields, and the user decides what mode each field should have.
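To make the contrast concrete, here is a hedged sketch that reads the same small file both ways. The file cars.cmp and its contents are invented for this illustration. With scan the caller states the mode of each field through what; with read.table the modes are inferred from the data.

```r
# Hypothetical file (invented for this sketch): a header and two data lines.
cat("Model Price Country",
    "Acura_Integra 11950 Japan",
    "Dodge_Colt 6851 Japan",
    "",
    file = "cars.cmp", sep = "\n")

# scan: the caller gives the mode of each field in 'what'
# ("" for character, 0 for numeric) and skips the header line by hand.
by.scan <- scan("cars.cmp", what = list(Model = "", Price = 0, Country = ""),
                skip = 1)

# read.table: the number of variables and their modes are worked out from
# the data. header is set explicitly because every line has three fields,
# and row.names=NULL keeps Model as an ordinary variable.
by.table <- read.table("cars.cmp", header = TRUE, row.names = NULL)

names(by.scan)
names(by.table)
```

The scan result is a plain list with one component per field, while read.table returns a data frame.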
Overall,
read.table
will usually be the easy way to construct
data frames from tables.
If it doesn't do what you want,
consider the functions
scan
,
make.fields
,
or
count.fields
, as well as text-editing tools and languages outside S-PLUS.
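When read.table fails because lines have differing numbers of fields, count.fields can help locate the offending lines. The sketch below is illustrative only: the file cars.bad is invented, and it assumes that count.fields called with just a file name counts white-space-separated fields on each line.

```r
# Hypothetical file (invented for this sketch): the third line lacks a field.
cat("Acura_Integra 11950 Japan",
    "Dodge_Colt 6851 Japan",
    "Ford_Escort 6319",
    "",
    file = "cars.bad", sep = "\n")

nf <- count.fields("cars.bad")       # number of fields on each line
bad <- seq(along = nf)[nf != nf[1]]  # lines whose count differs from line 1
bad
```

Lines reported in bad can then be repaired in a text editor before calling read.table again.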
# Example 1: Fields are in fixed columns separated by variable white space.
# Fields have internal white space. First two lines of file "cars":
# Price Country Reliability Mileage Type
# Acura Integra 4 11950 Japan Much better NA Small
# Give sep argument a vector of column numbers.
# First line has same no. of fields as all the rest, so
# extract row labels using scan, then set header explicitly:
cars.names <- scan("cars", what="", flush = T, widths=30, skip = 1, strip = TRUE)
cars <- read.table("cars", header = TRUE, row.names = cars.names, sep = c(30, 36, 46, 58, 66))

# Example 2: Fields are separated by ~ character; header defaults
# to TRUE, and first character field is automatically assigned to
# the row labels. First two lines of file:
# ~Price~Country~Reliability~Mileage~Type
# Acura Integra 4~11950~Japan~Much better~NA~Small
cars <- read.table("cars.tab", sep = "~")

# Example 3: Fields are separated by variable white space.
# There is no internal white space, so you need not specify sep.
# Use na.strings to specify string used for missing data.
# First three lines of file:
# Price Country Reliability Mileage Type
# Acura_Integra_4 11950 Japan Excelent N/A Small
# Dodge_Colt_4 6851 Japan N/A N/A Small
cars <- read.table("cars.na", na.strings="N/A")