slm
that represents a fit of a spatial
linear (generalized least squares) regression model.
slm(formula, cov.family, data=<<see below>>, subset=<<see below>>, spatial.arglist=NULL, na.action=na.fail, model=F, x=F, y=F, contrasts=NULL, ...)
+
operators, on the right.
"cov.family"
giving the spatial covariance family to
be fit. Valid values are:
CAR
(conditional auto-regression),
SAR
(simultaneous auto-regression), or
MA
(moving average). These are
S-PLUS objects containing functions required by the slm fitting algorithm.
The covariance model is defined by argument
cov.family
and is further
defined by the variables listed in argument
spatial.arglist
.
subset
arguments.
If this is missing, then the variables in the formula should be on the
search list. This may also be a single number to handle some special
cases -- see NAMES below for details.
cov.family
. Instead of
entering these arguments individually,
spatial.arglist
is used
to allow the algorithms to be generalized to different kinds of
models. For all of the models currently fit by
slm
, the
spatial.arglist
argument contains the following variables:
neighbor
- an object of class
"spatial.neighbor"
containing the
neighbors and weights to be used when defining the covariance model
(see
spatial.neighbor
).
region.id
- a vector containing the rows currently available in the
spatial neighbor object. Argument
region.id
must be given whenever
argument
subset
is given and rows have previously been removed from
the spatial
neighbor
object. This is described below in the DETAILS
section. Also see the help file for
spatial.subset
.
weights
- the
cov.family
uses the
neighbor
argument to
determine a covariance matrix for the residuals. All current types
for
cov.family
allow the specification of a diagonal matrix of
weights in the parameterization of the covariance matrix. See the
cov.family
for the parameterization. If specified, vector
weights
contains these diagonal values. If omitted, weights equal
to 1 are used.
start
- vector of starting values for the optimization algorithm. Since
a profile likelihood is optimized, only starting values for the covariance
matrix parameters (vector
parameters
in the output) can be provided. If
not provided, these typically default to zero, but this depends upon the
cov.family
.
print.level
- if
TRUE
, then the function evaluations are printed as the
optimization algorithm proceeds. This can be quite useful for checking on
convergence of the algorithm to the maximum likelihood estimates.
subset
argument has been used. The default
(with
na.fail
) is to create an error if any missing values are
found. A possible alternative is
na.omit
, which deletes observations
that contain one or more missing values.
TRUE
, the model frame is returned in component
model
.
TRUE
, the model matrix is returned in component
x
.
TRUE
, the response is returned in component
y
.
TRUE
, the QR decomposition of the model matrix is returned
in component
qr
.
slm.nlminb
and which affect the iterative estimation algorithm.
In particular, various algorithmic control values can be passed, along
with the lower and upper bounds of the parameters.
"slm"
. Objects of class
"slm"
contain most elements available in class
"lm"
objects (but they
do not inherit from class
"lm"
objects), and they also contain items
returned by the function
nlminb
. These elements are as follows:
singular.ok
was
true, there will be missing values in the coefficients corresponding
to inestimable coefficients.
cov.family.object
help file or the
CAR
,
SAR
, or
MA
help files
for more information.
X
contains the independent variables, and
beta
contains the coefficients of the linear model.
R
, the columns
of
R
will have been pivoted, and missing values will have been
inserted in the coefficients. The upper-left
rank
rows and columns
of
R
are the nonsingular part of the fit, and the remaining columns
of the first
rank
rows give the aliasing information (see
alias
).
i
th element of the list is the vector saying which coefficients
correspond to the
i
th term. It may be of length 0 if there were no
estimable effects for the term.
model=TRUE
.
x=TRUE
.
y=TRUE
.
spatial.arglist
) used in the
model.
slm
fits maximum likelihood estimates of spatial
regression models (these are equivalent to generalized least squares
estimates) using finite difference derivatives and a quasi-Newton
optimization algorithm. In such models one assumes a linear model,
E(y | x) = x beta,
for the means of the dependent variable given the fixed
covariate values, but the errors are assumed to arise from a
multivariate normal distribution with a covariance structure as
specified by the covariance structure model
cov.family
.
See the help files for the
MA
,
CAR
and
SAR
objects for types of
covariance structures available (for the usual model based on
independent errors,
lm
may be used).
The sparse matrix routines of Kundert (1988) are used in solving
linear systems and computing determinants required by the likelihood
function. The use of these routines makes the algorithm much
more efficient than would otherwise be the case. Even so, the
CPU time required by the algorithm can be quite large, so lattices
with more than, say, 200 to 400 regions should be handled carefully
to ensure that CPU time will be available.
A profile likelihood is computed. In this likelihood an equation for
the linear model parameters (beta) is obtained for known covariance
model parameters. Substituting this equation back into the
likelihood, the "profile" likelihood is obtained as a function of the
covariance model parameters alone. Because a profile likelihood is
used, there is a relatively small number of parameters to optimize,
making the use of finite difference derivatives more attractive.
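In symbols, the profiling step can be sketched as follows. This is the standard Gaussian derivation, not something taken from the slm source. It assumes the error covariance is written as sigma^2 Sigma(theta), where theta holds the covariance model parameters, and that the scale sigma^2 is profiled out along with beta. The symbols Sigma, theta, sigma^2 and n (the number of regions) are assumptions of this sketch; the exact parameterization used by each cov.family may differ.

```latex
\begin{align*}
\hat\beta(\theta) &= \left(X^{\top}\Sigma(\theta)^{-1}X\right)^{-1} X^{\top}\Sigma(\theta)^{-1}y, \\
\hat\sigma^{2}(\theta) &= \frac{1}{n}\,\bigl(y - X\hat\beta(\theta)\bigr)^{\top}\Sigma(\theta)^{-1}\bigl(y - X\hat\beta(\theta)\bigr), \\
\ell_{p}(\theta) &= -\frac{n}{2}\log\bigl(2\pi\,\hat\sigma^{2}(\theta)\bigr) - \frac{1}{2}\log\bigl|\Sigma(\theta)\bigr| - \frac{n}{2}.
\end{align*}
```

Only theta remains as an argument of the last expression, which is why the optimizer works over a small number of parameters.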
Subsetting operations on the spatial data frame are more difficult
because the spatial neighbor object must also be subset. This means
that a correspondence must be maintained between the "data" object
which contains the fixed covariates and the "neighbor" object which
maintains information about neighbor relationships. The
region.id
variable of argument
spatial.arglist
provides this
correspondence. In the following, for the sake of clarity, we suppose
that the linear model is specified via a data frame argument
data
.
Vector
region.id
must be the same length as the vectors in the
linear model, and the i-th element of
region.id
must "name" the
region for the i-th row of
data
in exactly the same manner that the
row.id
and
col.id
values in the
"spatial.neighbor"
object name a
region. Then the elements of
region.id
are keys to the
row.id
and
col.id
columns of object
neighbor
. If rows of object
data
are
removed, the names of these rows are given by the elements of object
region.id
, and these names are the same names as are used in the
row.id
and
col.id
columns of object
neighbor
. Then rows in
neighbor
can be removed by the subsetting operation.
If the
subset
argument is present, it is evaluated in
the context of the data frame, like the terms in
formula
.
It is also used in the
computation of subsets for any of the arguments contained in
spatial.arglist
, including variables
neighbor
and
region.id
. The
specific action of
subset
on the model arguments is as follows:
the model frame is computed on
all rows, then the appropriate subset is extracted.
A variety of special cases make such an interpretation
desirable (e.g., the use of
lag
or other functions that may need
more than the data used in the fit to be fully defined).
On the other hand, if you meant the subset to avoid computing
undefined values or to escape warning messages, you may be surprised.
For example,
slm(y ~ log(x), cov.family = SAR, data = mydata, subset = x > 0)
will still generate warnings from
log
. If this is a problem, do
the subsetting on the data frame directly:
slm(y ~ log(x), cov.family = SAR, data = mydata[mydata$x > 0,])
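The ordering described above (evaluate the formula on all rows, then extract the subset) can be mimicked outside S-PLUS. The following Python sketch is only an analogy and does not show how slm is implemented. The data values and both helper functions are hypothetical. Note one difference: Python's math.log raises an error for non-positive input, whereas the text above says S-PLUS log generates warnings.

```python
import math

# Hypothetical data: one covariate containing a non-positive value.
x = [2.0, -1.0, 4.0, 0.5]
keep = [xi > 0 for xi in x]  # plays the role of subset = x > 0


def transform_then_subset(values, mask):
    """Evaluate log on every row first, then drop rows (the ordering slm uses)."""
    logged = [math.log(v) for v in values]  # fails on the non-positive row
    return [lv for lv, k in zip(logged, mask) if k]


def subset_then_transform(values, mask):
    """Drop rows first, then evaluate log (like subsetting the data frame directly)."""
    kept = [v for v, k in zip(values, mask) if k]
    return [math.log(v) for v in kept]


try:
    transform_then_subset(x, keep)
    all_rows_ok = True
except ValueError:
    # math.log(-1.0) raises ValueError (math domain error).
    all_rows_ok = False

safe = subset_then_transform(x, keep)
```

Here all_rows_ok ends up False because the transformation touched the excluded row, while safe holds the three logged values from the rows that pass the filter.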
The
subset
argument acts on variable
neighbor
of the
spatial.arglist
argument as follows:
Let
region.id
of
spatial.arglist
identify the row
numbers in
neighbor
of
spatial.arglist
corresponding to the rows of the data frame given in argument
data
.
Then
region.id[subset]
is a listing of the row and column numbers to be
used in
neighbor
. Rows and columns of
neighbor
not in the vector
region.id[subset]
are removed.
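The rule above can be sketched in a few lines. This Python fragment is an illustration under stated assumptions, not the slm code: the neighbor object is modeled as a plain list of (row_id, col_id, weight) triples, the subset is a logical vector, and the function name subset_neighbor is hypothetical.

```python
# Hypothetical neighbor table: (row_id, col_id, weight) triples for 4 regions.
neighbor = [
    (1, 2, 1.0), (2, 1, 1.0),
    (2, 3, 1.0), (3, 2, 1.0),
    (3, 4, 1.0), (4, 3, 1.0),
]
region_id = [1, 2, 3, 4]            # names the region for each data row
subset = [True, False, True, True]  # logical subset: drop the second data row


def subset_neighbor(neighbor, region_id, subset):
    """Keep only neighbor pairs whose row and column regions both survive."""
    kept = {rid for rid, keep in zip(region_id, subset) if keep}
    return [(r, c, w) for (r, c, w) in neighbor if r in kept and c in kept]


reduced = subset_neighbor(neighbor, region_id, subset)
# reduced == [(3, 4, 1.0), (4, 3, 1.0)]
```

Dropping region 2 removes every pair that mentions it, so region 1 is left with no neighbors in this toy lattice. Checking for such isolated regions after subsetting may be worthwhile.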
As in
lm
, the
formula
argument is passed around
unevaluated; that is, the variables mentioned in the formula in
slm
will be
defined when the model frame is computed, not when
slm
is
initially called. In particular, if
data
is given, all these names
should be defined as variables in that data frame.
Generic functions such as
print
have methods to show the results of
the fit.
NAMES.
Variables occurring in a formula are evaluated differently from
arguments to S-PLUS functions, because the formula is an object
that is passed around unevaluated from one function to another.
The functions such as
slm
that finally arrange to evaluate the
variables in the formula try to establish a context based on the
data
argument. (More precisely, the function
model.frame.default
does the actual evaluation, assuming that its caller behaves in
the way described here.) If the
data
argument to
slm
is missing or
is an object (typically, a data frame), then the local context for
variable names is the frame of the function that called
slm
, or the
top-level expression frame if the user called
slm
directly.
Names in the formula can refer to variables in the local context as well
as global variables or variables in the
data
object.
The
data
argument can also be a number, in which case that number defines
the local context. This can arise, for example, if a function is
written to call
slm
, perhaps in a loop, but the local context is
definitely
not that function. In this case, the function can set
data
to
sys.parent()
, and the local context will be the next function up the
calling stack. See the third example below. A numeric value for
data
can also be supplied if a local context is being explicitly created by
a call to
new.frame
. Notice that supplying
data
as a number
implies that this is the
only local context; local variables in any other function will not be
available when the model frame is evaluated. This is potentially
subtle. Fortunately, it is not something the ordinary user of
slm
needs to worry about. It is relevant for those writing functions that
call
slm
or other such model-fitting functions.
Cliff, A. D. and Ord, J. K. (1981).
Spatial Processes - Models and Applications.
Pion Limited. London.
Cressie, N. A. C. (1993).
Statistics for Spatial Data.
(Revised Edition). Wiley, New York.
Haining, R. (1990).
Spatial Data Analysis in the Social and Environmental Sciences.
Cambridge University Press. Cambridge.
Kundert, Kenneth S. and Sangiovanni-Vincentelli, Alberto (1988).
A Sparse Linear Equation Solver.
Department of EE and CS, University of California, Berkeley.
Ripley, B. D. (1981).
Spatial Statistics.
Wiley, New York.
There is a vast literature on spatial regression and generalized least
squares; the references above are just a small sample of what is
available.
sids.maslm <- slm(sid.ft ~ nwbirths.ft, cov.family=MA, data=sids,
    spatial.arglist=list(neighbor=sids.neighbor))

sids.sarslm <- slm(sid.ft ~ nwbirths.ft, cov.family=SAR, data=sids,
    subset=c(-5,-1), spatial.arglist=list(neighbor=sids.neighbor,
    region.id=1:100, weights=1/sids$births))

# myfit calls slm, using the caller to myfit as the local context
# for variables in the formula (see aov for an actual example)
myfit <- function(formula, cov.family, data=sys.parent(), ...) {
    .. ..
    fit <- slm(formula, cov.family, data, ...)
    .. ..
}