Create a Data Frame by Reading a Table

DESCRIPTION:

Reads from a file or connection in table format and creates a data frame with the same number of rows as there are lines in the file, and the same number of variables as there are fields in the file.

USAGE:

read.table(file, header=<<see below>>, sep, row.names=NULL, 
       col.names=<<see below>>, as.is=<<see below>>, na.strings="NA", skip=0,
       stringsAsFactors=<<see below>>, locale) 

REQUIRED ARGUMENTS:

file
the text file or S-PLUS connection from which to read the data. The file should contain one line per row of the table. The fields may be separated by the character in sep, or the file may be fixed format with the fields starting at fixed points within each row.

OPTIONAL ARGUMENTS:

header
logical flag: if TRUE, then the first line of the file is used as the variable names of the resulting data frame. The default is FALSE, unless there is one less field in the first line of the file than in the second line.
sep
the field separator (single character), often "\t" for tab. If omitted, any amount of white space (blanks or tabs) can separate fields. To read fixed format files, make sep a numeric vector giving the initial columns of the fields.
row.names
optional specification of the row names for the data frame. If provided, it can give the actual row names, as a vector of length equal to the number of rows, or it can be a single number or character string. In the latter case, the argument indicates which variable in the data frame to use as row names (the variable will then be dropped from the frame). If row.names is missing, the function will use the first nonnumeric field with no duplicates as the row names. If no such field exists, the row names are 1:nrow(x). You can force this last version, regardless of suitable fields to use as row names, by giving row.names=NULL. Row names, wherever they come from, must be unique.
col.names
optional names for the variables. If missing, the header information, if any, is used; if all else fails, "V" and the field number are pasted together. Variable names, wherever they come from, must be unique. Variable names will be converted to syntactic names before assignment, but not if they came from an explicit col.names argument.
as.is
control over conversions to factor objects. If as.is=FALSE, non-numeric fields are turned into factors, except if they are used as row names. The argument will be replicated as needed to be of length equal to the number of fields; thus, as.is=FALSE converts all character fields. Converting them can sometimes save time and memory and make it easier to use the statistical modelling functions.
stringsAsFactors
An alternate (now preferred) name for the inverse of the as.is argument. The default is TRUE, unless one sets options(stringsAsFactors=FALSE).
na.strings
character vector; when character data are converted to factor data, the strings in na.strings are excluded from the levels of the factor, so any element equal to one of these strings becomes NA in the factor. In a numeric column, these strings are also converted to NA.
skip
the number of lines in the file to skip before reading data.
locale
character string as used in the Sys.setlocale function. If given, read numbers as if you were in the given locale.
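
As an illustration of how several of the optional arguments combine, the following is a minimal sketch rather than a definitive recipe. The file contents are hypothetical, and the sketch assumes that tempfile and cat are available to create a small scratch file. It skips one leading comment line, reads the header, treats "N/A" as missing, keeps character fields unconverted, and takes row names from the first field.

```r
# Hypothetical scratch file: one line to skip, a header, two data rows.
tmp <- tempfile()
cat("generated for illustration only\n",
    "Model~Price~Country~Mileage\n",
    "Acura Integra 4~11950~Japan~N/A\n",
    "Dodge Colt 4~6851~Japan~33\n",
    file = tmp, sep = "")

# skip=1 drops the first line; header=TRUE uses the next line as names;
# row.names=1 uses the first field as row names and drops it from the frame.
cars <- read.table(tmp, header = TRUE, sep = "~", skip = 1,
                   na.strings = "N/A", as.is = TRUE, row.names = 1)
```

With these settings the Mileage entry for the first row should be NA, and Country should remain character data because as.is=TRUE.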

VALUE:

a data frame with as many rows as the file has lines (or one less if header==T) and as many variables as the file has fields (or one less if one variable was used for row names). Fields are initially read in as character data. If all the items in a field are numeric, the corresponding variable is numeric. Otherwise, it is character, except as controlled by the as.is argument. All lines must have the same number of fields (except the header, which can have one less if the first field is to be used for row names).
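
The row and variable counts described above can be checked on a small file. This is a sketch under stated assumptions: the file contents are hypothetical, and tempfile and cat are assumed to be available for writing the scratch file. The header line has one less field than the data lines, so the header is detected automatically and the first field supplies the row names.

```r
# Hypothetical file: header with 3 fields, two data lines with 4 fields each.
tmp <- tempfile()
cat("Price~Country~Type\n",
    "Acura Integra 4~11950~Japan~Small\n",
    "Dodge Colt 4~6851~Japan~Small\n",
    file = tmp, sep = "")

# Fields per line: the header should have one less than the data lines.
n.fields <- count.fields(tmp, sep = "~")

# Three lines minus the header gives 2 rows;
# four fields minus the row-name field gives 3 variables.
cars <- read.table(tmp, sep = "~")
cars.dim <- dim(cars)
```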

DETAILS:

This function should be compared to scan; read.table tries much harder to interpret the input data automatically, figuring out the number of variables and whether fields are numeric. It also produces a more structured object as output. The price for this, aside from read.table being somewhat slower, is that the input data must themselves be more regular and that read.table decides what to do with each field, except for the use of the as.is argument. With scan, input lines do not need to correspond to one complete set of fields, and the user decides what mode each field should have. Overall, read.table will usually be the easy way to construct data frames from tables. If it doesn't do what you want, consider the functions scan, make.fields, or count.fields, as well as text-editing tools and languages outside S-PLUS.
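
To make the contrast with scan concrete, here is a minimal sketch in which the caller, not the function, decides the mode of each field through the what argument. The file contents and the component names model, price, and country are hypothetical, and tempfile and cat are assumed to be available for writing the scratch file.

```r
# Hypothetical file with three fields per line and no header.
tmp <- tempfile()
cat("Acura Integra 4~11950~Japan\n",
    "Dodge Colt 4~6851~Japan\n",
    file = tmp, sep = "")

# what= gives one template per field: "" for character, 0 for numeric.
fields <- scan(tmp, what = list(model = "", price = 0, country = ""),
               sep = "~")
```

The result is a plain list with one component per field, not a data frame, so any row names or factor conversion would have to be applied by hand.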

EXAMPLES:

# Example 1: Fields are in fixed columns separated by variable white space. 
# Fields have internal white space.  First two lines of file "cars": 
#                      Price   Country Reliability Mileage   Type 
#      Acura Integra 4 11950     Japan Much better      NA  Small 
# Give sep argument a vector of column numbers. 
# First line has same no. of fields as all the rest, so   
# extract row labels using scan, then set header explicitly: 
cars.names <- scan("cars", what="", flush = T, widths=30, 
     skip = 1, strip = TRUE) 
cars <- read.table("cars", header = TRUE, row.names = cars.names, 
          sep = c(30, 36, 46, 58, 66)) 
# Example 2:  Fields are separated by ~ character; header defaults  
# to TRUE, and first character field is automatically assigned to  
# the row labels.  First two lines of file: 
# ~Price~Country~Reliability~Mileage~Type 
#       Acura Integra 4~11950~Japan~Much better~NA~Small 
cars <- read.table("cars.tab", sep = "~") 
# Example 3:  Fields are separated by variable white space.   
# There is no internal white space, so you need not specify sep. 
# Use na.strings to specify string used for missing data. 
# First three lines of file: 
# Price                           Country Reliability     Mileage Type 
# Acura_Integra_4         11950   Japan   Excelent        N/A     Small 
# Dodge_Colt_4            6851    Japan   N/A     N/A     Small 
cars <- read.table("cars.na", na.strings = "N/A")