Returns an object of class "lm", "mlm", or "bdLm" that represents a linear model fit.
lm(formula, data=<<see below>>, weights=<<see below>>, subset=<<see below>>, na.action=na.fail, method="qr", model=F, x=F, y=F, contrasts=NULL, ...)
The formula argument is a formula object, with the response on the left of the
~ operator
and the terms, separated by
+ operators,
on the right.
The response may be a single numeric variable or a matrix.
The data argument is an optional data frame in which to interpret
the variables named in the formula, subset, and weights arguments.
If data is a bdFrame, then the function
bdLm is called.
See the DETAILS section below for additional information and
restrictions when using lm with a bdFrame.
The data argument may also be a single number, to handle some special cases;
see below for details.
If data is missing,
the variables in the model formula should be in the search path.
The weights argument is an optional vector of observation weights;
its length must be the same as the number of observations.
The weights must be nonnegative,
and it is recommended that they be strictly positive,
since zero weights are ambiguous.
To exclude particular observations from the model,
use the
subset argument instead of zero weights.
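The advice to drop observations with subset rather than zero weights can be checked with a small sketch. It is written in plain S syntax that we assume runs in S-PLUS as well as in R (whose lm shares this interface); the data frame d and its columns are hypothetical. A fit that uses subset should give the same coefficients as a fit to the reduced data frame.

```r
# Hypothetical toy data; the last observation is to be excluded.
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.1, 2.3, 2.8, 4.2, 4.9, 9.0))

# Exclude the observation with subset instead of giving it weight zero.
fit.sub <- lm(y ~ x, data = d, subset = x < 6)

# The same fit computed directly on the reduced data frame.
fit.red <- lm(y ~ x, data = d[d$x < 6, ])

coef(fit.sub)
coef(fit.red)
```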
The na.action argument is a function to filter missing data.
It is applied to the
model.frame
after any
subset argument has been applied.
The default is
na.fail,
which returns an error if any missing values are found.
An alternative is
na.exclude,
which deletes observations that contain one or more missing values.
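A short sketch of the two missing-value filters follows. It uses plain S syntax that we assume runs in S-PLUS and in R; the data frame d is hypothetical. With na.exclude the incomplete row is dropped, so the coefficients match a fit to the complete rows only, while na.fail stops with an error.

```r
# Hypothetical data with one missing response value.
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.0, 4.1, NA, 8.2, 9.9))

# With na.action = na.fail the fit stops with an error because y has an NA:
#   lm(y ~ x, data = d, na.action = na.fail)

# na.exclude removes the incomplete row before fitting.
fit <- lm(y ~ x, data = d, na.action = na.exclude)

# For comparison, fit the complete rows only.
fit.cc <- lm(y ~ x, data = d[!is.na(d$y), ])

coef(fit)
coef(fit.cc)
```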
The method argument gives the least squares fitting method to be used.
The options are
"qr",
"svd", and
"chol",
and the default is
"qr".
The method
"model.frame" simply
returns the model frame.
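The "model.frame" method is useful for inspecting exactly which data a fit would use. The sketch below assumes plain S syntax that runs in S-PLUS and in R; the data frame d is hypothetical, and the "svd" and "chol" options are not shown because we have only checked this against the "qr" and "model.frame" behavior.

```r
# Hypothetical data.
d <- data.frame(x = c(1, 2, 3, 4), y = c(1.5, 3.1, 4.4, 6.2))

# No fit is computed: lm returns only the model frame,
# with one column for the response and one for the term log(x).
mf <- lm(y ~ log(x), data = d, method = "model.frame")

dim(mf)
names(mf)
```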
If model is TRUE,
then the model frame is returned in the
model
component of the fitted object.
If x is TRUE,
then the model matrix is returned in the
x
component of the fitted object.
If y is TRUE,
then the response is returned in the
y
component of the fitted object.
If qr is TRUE,
then the QR decomposition of the model matrix is returned
in the
qr component of the fitted object.
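The following sketch requests all four optional components in one call. It is written in plain S syntax that we assume runs in S-PLUS and in R; the data frame d and its columns dose and resp are hypothetical (named so they are not confused with the x and y arguments).

```r
# Hypothetical data.
d <- data.frame(dose = c(1, 2, 3, 4, 5),
                resp = c(1.2, 1.9, 3.2, 3.8, 5.1))

fit <- lm(resp ~ dose, data = d,
          model = TRUE, x = TRUE, y = TRUE, qr = TRUE)

fit$model     # the model frame
fit$x         # the model matrix: an intercept column plus dose
fit$y         # the response vector
fit$qr$rank   # rank of the model matrix, from its QR decomposition
```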
Any additional arguments (...) are passed to lm.fit and the functions it calls.
Two possibilities are
singular.ok=T,
which instructs the fitting algorithm to continue in the presence
of over-determined models,
and
tolerance,
which specifies the tolerance level for over-determined models.
The default tolerance is 1e-07.
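A sketch of a singular (over-determined) model, in which one predictor is an exact multiple of another, is shown below. It assumes plain S syntax that runs in S-PLUS and in R; the data frame d is hypothetical, and the tolerance argument is deliberately not used because we have not checked how each implementation names it. With singular.ok = TRUE the fit proceeds, and the rank of the fit shows that only two of the three coefficients can be estimated.

```r
# Hypothetical data in which b is an exact multiple of a.
d <- data.frame(a = c(1, 2, 3, 4, 5),
                y = c(2.2, 3.9, 6.1, 8.0, 9.8))
d$b <- 2 * d$a

# Continue fitting even though the model matrix is rank-deficient.
fit <- lm(y ~ a + b, data = d, singular.ok = TRUE)

fit$rank    # 2 rather than 3: b cannot be estimated separately from a
coef(fit)   # how the aliased coefficient is reported may differ by version
```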
The value returned is an object of class
"lm"
or
"mlm" representing the fit.
See
lm.object for details.
If the response is a matrix,
then the returned object is of class
"mlm".
In this case, the coefficients, residuals, and effects are also matrices,
with columns corresponding to the individual response variables.
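A minimal sketch of a fit with a matrix response is shown below, in plain S syntax that we assume runs in S-PLUS and in R; the data frame d and its columns are hypothetical. Each column of the coefficient matrix should equal the coefficients from fitting that response on its own.

```r
# Hypothetical data with two response variables.
d <- data.frame(x  = c(1, 2, 3, 4, 5),
                y1 = c(1.0, 2.1, 2.9, 4.2, 5.0),
                y2 = c(3.0, 2.6, 2.1, 1.4, 1.1))

# cbind() makes the response a two-column matrix.
fit <- lm(cbind(y1, y2) ~ x, data = d)

class(fit)             # includes "mlm"
coef(fit)              # rows are terms, columns are responses
dim(residuals(fit))    # one column of residuals per response

# Column y1 matches a separate single-response fit.
coef(lm(y1 ~ x, data = d))
```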
If the
data argument is a bdFrame then the function
bdLm
is immediately called by
lm.
The
bdLm function does not support all the arguments that
lm
does.
See the
bdLm help file
for more information.
The
formula argument is passed around
unevaluated; that is,
the variables in the formula are defined when the model frame is computed,
and not when
lm is initially called.
In particular, if
data is given,
the variables in
formula
should generally be defined as variables in
data.
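Because the formula is not evaluated until the model frame is built, a formula object can be created before any of its variables exist. The sketch below uses plain S syntax that we assume runs in S-PLUS and in R; the formula f and the data frame d are hypothetical.

```r
# The formula is created before x or y exist anywhere.
f <- y ~ x

# The variables are found in d only when the model frame is computed.
d <- data.frame(x = c(1, 2, 3, 4), y = c(2.0, 2.9, 4.1, 5.0))
fit1 <- lm(f, data = d)

# Equivalent to writing the formula directly in the call.
fit2 <- lm(y ~ x, data = d)

coef(fit1)
coef(fit2)
```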
Because they are passed unevaluated from one function to another,
variables in a model formula are evaluated differently
from ordinary arguments to S-PLUS functions.
Functions such as
lm that are able
to evaluate the formula variables try to establish a context
based on the
data argument.
More precisely, the function
model.frame.default
does the actual evaluation,
assuming that its caller behaves in the way described here.
If the
data argument
to
lm is missing
or is an object (typically, a data frame),
then the local context for variable names is the frame of the function
that called
lm.
If the user called
lm directly,
the local context for variable names is the top-level expression frame.
Names in the model formula can refer to variables in the local context,
as well as to global variables
or variables in the
data object.
The
data argument can also be a number,
in which case it defines the local context.
This can arise, for example, if a function is written
to call
lm
but the local context is definitely not that function's frame.
In this case, the function can set
data
to
sys.parent(),
and the local context will be the next function up in the calling stack.
See the last example below for an illustration of this.
A numeric value for
data can
also be supplied if a local context is explicitly created
by a call to
new.frame.
Note that supplying
data as a number
implies that it is the only local context;
local variables in any other function will not be available
when the model frame is evaluated.
This is potentially subtle.
Fortunately, it is not something the ordinary user
of
lm needs to worry about.
It is relevant, however, for those writing functions
that call
lm
(or other similar model-fitting functions).
The
subset argument,
like the terms in the model formula,
is evaluated in the context of the
data argument,
if present.
The specific action of
subset is as follows:
the model frame, including
weights
and
subset,
is computed on all rows of the data set
and then the appropriate subset is extracted.
A variety of special cases make such an interpretation desirable.
For example, functions such as
lag
may need more than the data used in the fit to be fully defined.
On the other hand, if you use
subset
to avoid computing undefined values or to escape warning messages,
you may be surprised.
For example,
lm(y ~ log(x), data=mydata, subset=x > 0)
still generates warnings from
log.
To avoid this, do the subsetting on the data frame directly:
lm(y ~ log(x), data=mydata[mydata$x > 0, ])
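On data where both calls succeed, the two forms above fit the same observations, so they should give the same coefficients; only the first evaluates log on the excluded rows. The sketch below uses plain S syntax that we assume runs in S-PLUS and in R; the contents of mydata are hypothetical.

```r
# Hypothetical data; the first row has a negative x.
mydata <- data.frame(x = c(-1, 0.5, 1, 2, 4, 8),
                     y = c(0.0, 0.2, 1.1, 1.9, 3.2, 4.0))

# log(x) is computed on every row before the subset is taken,
# so log() warns about the negative value even though that row is dropped.
fit.a <- lm(y ~ log(x), data = mydata, subset = x > 0)

# Subsetting the data frame first means log() only sees positive values.
fit.b <- lm(y ~ log(x), data = mydata[mydata$x > 0, ])

coef(fit.a)
coef(fit.b)
```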
Generic functions such as
print
and
summary have methods
for showing the results of a fit.
See
lm.object for a description
of the fit components.
The functions
residuals,
coefficients,
and
effects should be used to extract components,
rather than subscripting them directly
from the
lm.object.
The extractor functions take correct account of special circumstances,
such as overdetermined models.
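A minimal sketch of the extractor functions follows, in plain S syntax that we assume runs in S-PLUS and in R; the data frame d is hypothetical. For a model with an intercept the residuals sum to (numerically) zero, and the effects, being an orthogonal transformation of the response, have one entry per observation.

```r
# Hypothetical data.
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.3, 1.8, 3.4, 3.9, 5.2, 5.8))
fit <- lm(y ~ x, data = d)

b <- coefficients(fit)   # instead of subscripting fit$coefficients
r <- residuals(fit)      # instead of subscripting fit$residuals
e <- effects(fit)        # orthogonal effects from the QR decomposition

b
sum(r)                   # close to zero, since the model has an intercept
length(e)                # one effect per observation
```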
S-PLUS implements observation weights
through the
weights argument
to most regression functions.
Observation weights are appropriate when the variances
of individual observations are inversely proportional to the weights.
For a set of weights
wi,
one interpretation is that the ith observation
is the average of
wi other observations,
each having the same predictors and (unknown) variance.
This is the interpretation of the weights included
in the
claims example below.
Another situation in which these types of weights arise is
when the relative precision of the observations is known in advance.
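The "average of wi observations" interpretation can be checked directly: replacing each group of observations that share a predictor value by its mean, and weighting that mean by the group size, gives the same least squares coefficients as fitting the raw data. The sketch uses plain S syntax that we assume runs in S-PLUS and in R; the data frames raw and avg are hypothetical.

```r
# Hypothetical raw data: three groups sharing the predictor values 1, 2, 3.
raw <- data.frame(x = c(1, 1, 2, 2, 2, 3),
                  y = c(1.0, 1.4, 2.1, 2.5, 2.3, 3.6))
fit.raw <- lm(y ~ x, data = raw)

# Collapse each group to its mean, keeping the group size as the weight.
avg <- data.frame(x = sort(unique(raw$x)),
                  y = as.vector(tapply(raw$y, raw$x, mean)),
                  n = as.vector(table(raw$x)))
fit.avg <- lm(y ~ x, data = avg, weights = n)

coef(fit.raw)
coef(fit.avg)
```

The coefficients agree because the within-group sum of squares does not depend on the coefficients, so only the weighted between-group part is minimized.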
It is important to note that an observation weight is not the same
as a frequency, or case weight,
which represents the number of times a particular observation is repeated.
It is possible to include frequencies as
a
weights argument
to an S-PLUS regression function;
although this produces the correct coefficients for the model,
inference tools such as standard errors, p-values,
and confidence intervals are incorrect.
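The sketch below contrasts frequency weights with explicitly repeated rows, in plain S syntax that we assume runs in S-PLUS and in R; the data frame u and its freq column are hypothetical. The coefficients agree, but the residual degrees of freedom (and therefore the standard errors and tests) do not.

```r
# Hypothetical data: each distinct row was observed freq times.
u <- data.frame(x = c(1, 2, 3, 4),
                y = c(1.2, 1.9, 3.3, 3.8),
                freq = c(3, 1, 2, 2))
fit.w <- lm(y ~ x, data = u, weights = freq)

# Expand the rows by their frequencies and fit without weights.
long <- u[rep(1:nrow(u), u$freq), ]
fit.long <- lm(y ~ x, data = long)

coef(fit.w)
coef(fit.long)            # same coefficients
fit.w$df.residual         # 4 rows - 2 coefficients = 2
fit.long$df.residual      # 8 rows - 2 coefficients = 6
```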
In addition, S-PLUS does not currently support weighted regression
when the absolute precision of the observations is known.
This situation arises often in physics and engineering,
when the uncertainty associated with a particular measurement
is known in advance due to properties of the measuring procedure or device.
If you know the absolute precision of your observations,
it is possible to supply them
to the
weights argument.
This computes the correct coefficients for your model,
but the standard errors and other inference tools will be incorrect.
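When the standard deviations of the observations are known, the coefficient covariance under weights 1/s^2 is the inverse of X'WX, with no scale factor to estimate. The sketch below computes those standard errors by hand and shows that the ones reported by summary differ from them by the factor summary(fit)$sigma. It uses plain S syntax that we assume runs in S-PLUS and in R; the data frame d and its known standard deviations s are hypothetical.

```r
# Hypothetical measurements with known standard deviations s.
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.8, 6.3, 7.9, 10.2),
                s = c(0.1, 0.2, 0.1, 0.3, 0.2))
d$w <- 1 / d$s^2
fit <- lm(y ~ x, data = d, weights = w)

# Standard errors that treat s as known: sqrt(diag((X'WX)^-1)).
X <- model.matrix(fit)
W <- diag(d$w)
se.known <- sqrt(diag(solve(t(X) %*% W %*% X)))

# summary() rescales by an estimated residual standard error instead.
se.lm <- summary(fit)$coefficients[, 2]

se.lm / se.known    # each ratio equals summary(fit)$sigma
```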
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Draper, N. R. and Smith, H. (1981). Applied Regression Analysis (second edition). New York: Wiley.
Myers, R. H. (1986). Classical and Modern Regression with Applications. Boston: Duxbury.
Rousseeuw, P. J. and Leroy, A. (1987). Robust Regression and Outlier Detection. New York: Wiley.
Seber, G. A. F. (1977). Linear Regression Analysis. New York: Wiley.
Weisberg, S. (1985). Applied Linear Regression (second edition). New York: Wiley.
There is a vast literature available on regression; the references above are just a small sample. The book by Myers is an introductory text that includes a discussion of many of the recent advances in regression technology. The Seber book is at a higher mathematical level and covers much of the classical theory of least squares.
lm(freeny.y ~ freeny.x)
lm(Fuel ~ . , data=fuel.frame)
# formulas have intercepts by default, so include
# a -1 for regression without an intercept.
lm(Mileage ~ Weight - 1, data=fuel.frame)
# example of weighted regression
lm(cost ~ age + type + car.age, data=claims,
weights=number, na.action=na.exclude)
# myfit calls lm, using the caller to myfit
# as the local context for variables in the formula
# (see aov for an actual example)
myfit <- function(formula, data=sys.parent(), ...) {
.. ..
fit <- lm(formula, data, ...)
.. ..
}