Returns an object of class "lm" or "mlm" that represents a fit of a linear model. The lm function is generic (see Methods); method functions can be written to handle specific classes of data. Classes which already have methods for this function include: model.list.
USAGE.
lm(formula, data=<<see below>>, weights, subset, na.action, method="qr", model=F, x=F, y=F, contrasts=NULL, ...)

ARGUMENTS.
formula: a formula object, with the response on the left of a ~ operator, and the terms, separated by + operators, on the right.
data: a data frame in which to interpret the variables named in the formula, or in the subset and the weights argument. If this is missing, then the variables in the formula should be on the search list. This may also be a single number to handle some special cases -- see below for details.
weights: vector of observation weights for a weighted least-squares fit; the length of weights must be the same as the number of observations. The weights must be nonnegative, and it is strongly recommended that they be strictly positive, since zero weights are ambiguous, compared to use of the subset argument.
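For instance, the following hedged sketch requests a weighted fit; mydata, the response y, the predictors x1 and x2, and the weight column w are all illustrative names, not objects described on this page.

    # w is a column of mydata holding one strictly positive weight per observation
    fit <- lm(y ~ x1 + x2, mydata, weights = w)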
na.action: a function to filter missing data, applied to the model frame after any subset argument has been used. The default (with na.fail) is to create an error if any missing values are found. A possible alternative is na.omit, which deletes observations that contain one or more missing values.
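As a sketch (reusing the claims data that appears in the examples below), the following call drops incomplete observations instead of stopping with an error:

    # na.omit silently removes rows of claims containing missing values;
    # the default na.fail would signal an error instead
    fit <- lm(cost ~ age + car.age, claims, na.action = na.omit)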
"qr"
. Or may indicate that a data structure is to be returned
before fitting. The method
"model.frame"
simply returns the model
frame, and
"model.list"
returns the model list. In this latter case,
the fitting method may included as well, in case the model list is to
be fit later (by a call to
).
For example,
c("model.list", "svd")
(the order is not important). This is the only case in
which a vector is recognized.
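A brief sketch of both uses, borrowing the fuel.frame example from the end of this page:

    # return the model frame only, without fitting
    mf <- lm(Fuel ~ ., fuel.frame, method = "model.frame")
    # return a model list that records "svd" as the fitting method to use later
    ml <- lm(Fuel ~ ., fuel.frame, method = c("model.list", "svd"))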
model: if TRUE, the model frame is returned in component model.

x: if TRUE, the model matrix is returned in component x.

y: if TRUE, the response is returned in component y.

qr: if TRUE, the QR decomposition of the model matrix is returned in component qr.
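For instance, the following hedged sketch keeps these pieces with the fit so they can be examined afterwards (again borrowing the fuel.frame example from below):

    fit <- lm(Fuel ~ ., fuel.frame, model = T, x = T, y = T)
    fit$x      # the model matrix
    fit$y      # the response
    fit$model  # the model frame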
...: additional arguments passed on to the fitting method. Two possibilities are singular.ok=T, to instruct the fitting to continue in the presence of over-determined models, and tolerance (default 1e-07), to change the tolerance for determining when models are over-determined.
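A sketch of passing these through (the particular tolerance value here is illustrative only):

    # allow an over-determined model to be fit, with a larger tolerance
    # for deciding when the model is over-determined
    fit <- lm(Fuel ~ ., fuel.frame, singular.ok = T, tolerance = 1e-6)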
"lm"
or
"mlm"
representing the fit, or of class
"model.frame"
or
"model.list"
if signalled by the
method
argument.
See
,
,
,
and
for details.
DETAILS.
The formula argument is passed around unevaluated; that is, the variables mentioned in the formula will be defined when the model frame is computed, not when lm is initially called. In particular, if data is given, all these names should generally be defined as variables in that data frame.
The subset argument, like the terms in formula, is evaluated in the context of the data frame, if present. The specific action of the argument is as follows: the model frame, including weights and subset, is computed on all the rows, and then the appropriate subset is extracted. A variety of special cases make such an interpretation desirable (e.g., the use of lag or other functions that may need more than the data used in the fit to be fully defined). On the other hand, if you meant the subset to avoid computing undefined values or to escape warning messages, you may be surprised. For example,

    lm(y ~ log(x), mydata, subset = x > 0)

will still generate warnings from log. If this is a problem, do the subsetting on the data frame directly:

    lm(y ~ log(x), mydata[mydata$x > 0, ])
Generic functions such as print and summary have methods to show the results of the fit. See the documentation for the fitted object for its components, but extractor functions (coefficients, residuals, and so on) should be used rather than extracting the components directly, since these functions take correct account of special circumstances, such as overdetermined models.
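A minimal sketch of this style of access, assuming a fit to the fuel.frame data used in the examples below:

    fit <- lm(Fuel ~ ., fuel.frame)
    summary(fit)        # display a summary of the fit
    coefficients(fit)   # extract the coefficients
    residuals(fit)      # extract the residuals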
The response may be a single numeric variable or a matrix. In the latter case, the coefficients, residuals, and fitted values will also be matrices, with columns corresponding to the response variables. In either case, the object inherits from class "lm". For multivariate response, the first element of the class is "mlm".
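As a hypothetical sketch, suppose ymat is a numeric matrix whose columns are the response variables and x1 and x2 are predictors, all available on the search list (these names are illustrative, not objects described on this page):

    # a multivariate fit: the result inherits from "mlm" as well as "lm"
    mfit <- lm(ymat ~ x1 + x2)
    coefficients(mfit)   # a matrix with one column per response variable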
NAMES.
Variables occurring in a formula are evaluated differently from
arguments to S-PLUS functions, because the formula is an object
that is passed around unevaluated from one function to another.
Functions such as lm that finally arrange to evaluate the variables in the formula try to establish a context based on the data argument. (More precisely, the function that computes the model frame does the actual evaluation, assuming that its caller behaves in the way described here.)
If the data argument to lm is missing or is an object (typically, a data frame), then the local context for variable names is the frame of the function that called lm, or the top-level expression frame if you called lm directly. Names in the formula can refer to variables in the local context as well as global variables or variables in the data object.

The data argument can also be a number, in which case that number defines the local context. This can arise, for example, if a function is written to call lm, perhaps in a loop, but the local context is definitely not that function. In this case, the function can set data to sys.parent(), and the local context will be the next function up the calling stack. See the third example below.
A numeric value for data can also be supplied if a local context is being explicitly created by a call to new.frame. Notice that supplying data as a number implies that this is the only local context; local variables in any other function will not be available when the model frame is evaluated. This is potentially subtle. Fortunately, it is not something the ordinary user of lm needs to worry about. It is relevant for those writing functions that call lm or other such model-fitting functions.
REFERENCES.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980), Regression Diagnostics, Wiley, New York.
Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, second edition, Wiley, New York.
Myers, R. H. (1986), Classical and Modern Regression with Applications, Duxbury, Boston.
Rousseeuw, P. J. and Leroy, A. (1987), Robust Regression and Outlier Detection, Wiley, New York.
Seber, G. A. F. (1977), Linear Regression Analysis, Wiley, New York.
Weisberg, S. (1985), Applied Linear Regression, Second Edition, Wiley, New York.
There is a vast literature on regression; the references above are just a small sample of what is available. The book by Myers is an introductory text that includes a discussion of many of the recent advances in regression technology. The Seber book is at a higher mathematical level and covers much of the classical theory of least squares.
EXAMPLES.
lm(Fuel ~ . , fuel.frame)

lm(cost ~ age + type + car.age, claims,
   weights = number, na.action = na.omit)

lm(freeny.y ~ freeny.x)

# myfit calls lm, using the caller to myfit
# as the local context for variables in the formula
# (see aov for an actual example)
myfit <- function(formula, data = sys.parent(), ...) {
        ..
        ..
        fit <- lm(formula, data, ...)
        ..
        ..
}