Read ASCII Files Containing Spatial Contiguity Information

DESCRIPTION:

Reads ASCII files containing spatial data with variable-length records and creates an object of class spatial.neighbor.

USAGE:

read.neighbor(filename, field.names="region.id", region.id= 
             length(field.names), first.neighbor=length(field.names)+1, 
             keep=T, which.numeric= rep(T, length(field.names)), char=F, 
             sep=<<see below>>, skip=0, ...)  

REQUIRED ARGUMENTS:

filename
character string giving the directory path and name of an existing file containing the spatial information to be read in. If no path is given, the file is assumed to be in the current directory. The file is assumed to have one record (row) per spatial unit, and each record must contain neighborhood (contiguity) information. The number of fields per record may vary, but some information in addition to the neighbors may appear consistently from record to record. The varying record length is assumed to be due entirely to the varying number of neighbors that each region (spatial unit) may have.

OPTIONAL ARGUMENTS:

field.names
character vector containing the names of the "fixed" fields (in the sense that there is exactly one field per record for all records) in the ASCII file. This vector MUST include the name that we want to give to the region identifier. The default assumes that there are no fields other than the neighborhood information, and hence determines that the only "fixed" field is the region identifier named "region.id".
region.id
character string: which of the variables in field.names denotes the region identifier? This must be an element of field.names. It may also be an integer between 1 and length(field.names) giving its index in field.names. The default assumes that the "fixed" fields are laid out first and that the region identifier is the last of these.
first.neighbor
integer denoting the column that contains the first neighbor in the record. It defaults to length(field.names)+1, which implies that the "fixed" fields appear first in each record. This argument allows the varying number of neighbor fields to be placed anywhere in the record, so long as they start at the same column (first.neighbor) in every record.
keep
logical flag: should the information in the fixed fields be kept or discarded? If TRUE, a data frame is generated with the "fixed" information; if FALSE, only the spatial neighborhood information is returned, as an object of class "spatial.neighbor". Default is TRUE.
which.numeric
a logical vector. Which fixed variables are to be considered numeric? This must be the same length as field.names. Its components must be TRUE if the corresponding variable in field.names is numeric, FALSE otherwise. This must be used if any character variables appear in the data or an error will be generated. Default is all numeric fields.
char
logical flag: is the neighborhood information all character? Defaults to FALSE, since the most common case is when all neighbors are numeric. If this flag is set to TRUE, the resulting object implies a mapping from the character ids onto integers representing the indices of the spatial neighbor matrix.
sep
separator (single character), often "\t" for tab or "\n" for newline. If omitted, any amount of white space (blanks, tabs, and possibly newlines) can separate fields. Same as argument sep for the S-PLUS function scan.
skip
the number of initial lines of the file that should be skipped prior to reading. Must be used if the file filename contains extraneous information such as a header. Same as argument skip for the S-PLUS function scan.
...
Routine scan is used to read all of the data as character strings. Other arguments to scan that can be used include widths and strip.white. See the help file for this function for more information on these two arguments.

VALUE:

If keep=T, a list with two named components:
data
a data frame containing the variables listed in argument field.names.
nhbr
an object of class "spatial.neighbor" with the neighborhood information read. Note that this function forces all weights to be equal to 1.

If keep=F then an object of class "spatial.neighbor" is returned. This is the same as component nhbr above.

DETAILS:

read.neighbor is designed to make the input of spatial information into SPATIALSTATS simple for the most common situation.

The file in filename is assumed to consist of one record for each region or spatial unit. Each of these records (arranged as rows of filename) contains a fixed number of variables, and a varying number of neighboring region identifiers.

As a first step, all data is read as a single vector of character strings. The information provided in arguments field.names and region.id (and first.neighbor, if specified) is then used to convert the data to the resulting data frame and spatial neighbor objects.
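
The two steps above can be pictured with the following sketch. It is written in R-compatible S and is not the actual implementation of read.neighbor: the helper name sketchNeighborPairs and its arguments are hypothetical, the row.id, col.id and weights column names are an assumption about how the neighbor pairs might be laid out, and only the default layout (fixed fields first, white space as separator) is handled.

```r
# Hypothetical helper, not part of SPATIALSTATS: turn whitespace-separated
# records into (row.id, col.id) pairs, assuming the fixed fields come first.
sketchNeighborPairs <- function(lines, n.fixed = 1, region.col = n.fixed) {
  row.id <- character(0)
  col.id <- character(0)
  for (rec in lines) {
    # Step 1: everything is treated as character strings.
    fields <- strsplit(rec, " +")[[1]]
    fields <- fields[fields != ""]
    id <- fields[region.col]
    # Step 2: whatever follows the fixed fields is taken to be a neighbor id.
    nbrs <- character(0)
    if (length(fields) > n.fixed)
      nbrs <- fields[(n.fixed + 1):length(fields)]
    row.id <- c(row.id, rep(id, length(nbrs)))
    col.id <- c(col.id, nbrs)
  }
  # All weights are set to 1, as stated under VALUE.
  data.frame(row.id = as.numeric(row.id), col.id = as.numeric(col.id),
             weights = rep(1, length(row.id)))
}

# The four records of file "test" from the EXAMPLES section.
nb.pairs <- sketchNeighborPairs(c("1 2 3 4", "2 1 3", "3 2 1 4", "4 1 3"))
```

Each row of nb.pairs pairs a region with one of its neighbors, so a region with three neighbors contributes three rows.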

If specified, argument which.numeric can be used to determine which variables (columns in the returned data component) should be of mode numeric or of mode character.
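
When char=T, the mapping from character identifiers onto integer indices (see argument char) can be pictured with match. This is only a sketch: it assumes that indices follow the order in which the regions appear in the file, which this help file does not state, and all variable names are illustrative.

```r
# Illustrative values only: five regions and a few neighbor entries.
region.ids <- c("A", "B", "C", "D", "E")   # one per record, in file order
nbr.from <- c("A", "A", "A", "B", "B")     # region owning each neighbor entry
nbr.to   <- c("B", "C", "D", "A", "E")     # neighbor named in that entry

# match() returns the position of each id in region.ids, which gives
# integer row and column indices into the neighbor matrix.
row.id <- match(nbr.from, region.ids)
col.id <- match(nbr.to, region.ids)
cbind(row.id, col.id)
```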

For example, suppose a record contains the following fields:

city, county, county.id, n1, n2, ..., nk, comments1, comments2

where n1, n2, ..., nk are fields containing region identifiers for the neighbors of the current region (k may differ from record to record), county.id is the field containing an identifier for the current region, and the remaining variables are covariates. In this case, the necessary arguments could be declared as follows:

field.names = c("city", "county", "county.id", "comments1", "comments2")

region.id = "county.id"

first.neighbor = 4

which.numeric = c(F, F, T, F, F)

and the neighbor fields (n1, n2, ..., nk) do not need listing.
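
To make this layout concrete, the sketch below shows how a single record of this form could be partitioned once first.neighbor and the number of fixed fields are known. The record contents and the helper name sketchPartition are invented for illustration, and the logic is an assumption based on the layout described above, not the code that read.neighbor actually runs.

```r
# Hypothetical helper: separate the fixed fields from the neighbor fields
# of one record that has already been split into character strings.
sketchPartition <- function(fields, n.fixed, first.neighbor) {
  # Only the number of neighbors varies from record to record.
  n.nbrs <- length(fields) - n.fixed
  nbr.pos <- first.neighbor + seq_len(n.nbrs) - 1
  fixed.pos <- setdiff(seq_along(fields), nbr.pos)
  list(fixed = fields[fixed.pos], neighbors = fields[nbr.pos])
}

# city, county, county.id, three neighbors, comments1, comments2
rec <- c("Seattle", "King", "33", "12", "17", "40", "ok", "none")
parts <- sketchPartition(rec, n.fixed = 5, first.neighbor = 4)
# parts$fixed:     "Seattle" "King" "33" "ok" "none"
# parts$neighbors: "12" "17" "40"
```

Because the fixed fields number five in every record, the count of neighbors follows from the record length alone, which is why the neighbor fields need no names.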

EXAMPLES:

# Given the file "test" formatted as follows: 
#   1 2 3 4 
#   2 1 3 
#   3 2 1 4 
#   4 1 3 
cat(file="test", sep="\n", 
  c("1 2 3 4", "2 1 3", "3 2 1 4", "4 1 3")) 
read.neighbor("test") 
### 
# Now if file "test1" is: 
#   "bill" 1 2 3 4 
#   "bob" 2 1 3 
#   "mary" 3 2 1 4 
#   "pat" 4 1 3 
cat(file='test1', sep='\n',  
  c('"bill" 1 2 3 4', '"bob" 2 1 3', '"mary" 3 2 1 4', '"pat" 4 1 3')) 
read.neighbor("test1",field.names=c("surveyor","id"),which.numeric=c(F,T)) 
### 
# Another file is "test2": 
#   1 2 3 4 "bill" 
#   2 1 3 "bob" 
#   3 2 1 4 "mary" 
#   4 1 3 "pat" 
cat(file='test2', sep='\n', 
  c('1 2 3 4 "bill"', '2 1 3 "bob"', '3 2 1 4 "mary"', '4 1 3 "pat"')) 
read.neighbor("test2", field.names=c("id","surveyor"), region.id="id",
              first.neighbor=2, which.numeric=c(T,F))
# Or we may have a different separator and a header, as in file "test3": 
#   Neighbourhood information for Survey 5 
#   Id Neighbours Surveyor 
#   1,2,3,4,bill 
#   2,1,3,bob 
#   3,2,1,4,mary 
#   4,1,3,pat 
cat(file='test3', sep='\n', 
  c('Neighbourhood information for Survey 5', 
    'Id Neighbours Surveyor', 
    '1,2,3,4,bill', '2,1,3,bob', '3,2,1,4,mary', '4,1,3,pat')) 
read.neighbor("test3", field.names=c("id","surveyor"), region.id="id", 
              first.neighbor=2, which.numeric=c(T,F), sep=",", skip=2)  
# Use logical flag char=T if the neighborhood information is character 
# For example, say that we have regions A,B,C,D and E in "test4": 
#   A 23 B C D 
#   B 12 A E 
#   C 07 
#   D 02 B E 
#   E 17 A D 
cat(file='test4', sep='\n', 
  c('A 23 B C D', 'B 12 A E', 'C 07', 'D 02 B E', 'E 17 A D')) 
read.neighbor("test4",c("County","Record #"), region.id=1, char=T) 
# read.neighbor("sids.dat", field.names=c("Name","id","nwbirths.ft", 
#              "sid.ft","births","sid","nwbirths"), region.id="id", 
#              which.numeric=c(F,T,T,T,T,T,T)) 
# (Note that the file "sids.dat" is not distributed with SpatialStats)