formula
.
This class of objects represents the structural models in all model-fitting
functions, and is also used in a number of other functions, particularly
for plots.
Formula objects are created by a call to the ~ operator.
This is typically done inside a call to a model fitting function, but
need not be.
Generic functions that have methods specific to "formula" include:
alias, deriv, dput, formula, pairs, plot, print, update.
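As a brief illustration of one of these methods, the sketch below applies update() directly to a formula rather than to a fitted model. The names y, x and z are placeholders and are never evaluated. The code was written against R's implementation of the S language and is assumed, not verified, to behave the same way in S-PLUS.

```r
# Hypothetical variable names; none of them needs to exist.
base.form <- y ~ x

# In the second argument, "." stands for the corresponding side of base.form.
new.form <- update(base.form, . ~ . + z)

print(new.form)
```

The result is again a formula, so it can be passed to a model-fitting function or updated further.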
A formula is a call (i.e., of mode "call") to the ~ operator.
Thus, considered as a list it has (generally) three components.
The first component is the name ~, the second component is the
response, and the third component is the explanatory variables.
It is possible to build a formula without a response, in which case it must be processed before being used in the typical fashion.
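The component structure described above can be inspected directly. The following is a minimal sketch with placeholder names (y, x, z); it was checked against R and is assumed to carry over to S-PLUS unchanged.

```r
# Two-sided formula: the name ~, the response, the explanatory part.
two.sided <- y ~ x + z
mode(two.sided)      # "call"
length(two.sided)    # 3
two.sided[[1]]       # the name ~
two.sided[[2]]       # the response, y
two.sided[[3]]       # the explanatory part, x + z

# One-sided formula: no response, so only two components.
one.sided <- ~ x + z
length(one.sided)    # 2
```

Code that pulls out the response by position therefore needs to check the length first, since a one-sided formula has no second component to serve as a response.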
Formulas are their own value; that is, they represent an expression
calling the operator ~, but evaluating this expression just returns
the expression itself.
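A small sketch of this behavior, assuming the two made-up names below are not defined anywhere in the session. It was checked against R; S-PLUS is assumed to behave the same way.

```r
# Neither name is defined, yet creating the formula succeeds,
# because the operands of ~ are not evaluated.
f <- no.such.response ~ no.such.predictor

# Evaluating the formula again hands back the same unevaluated call.
g <- eval(f)
print(g)
```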
The purpose of formula objects is to supply the essential information to
fit models, produce plots, etc., in a readable form that can be passed
around, stored in other objects, and manipulated to determine the
terms and response of a model.
Names in the formula will eventually be interpreted as objects,
often as variables in a data frame.
This interpretation, however, takes place only after the related
subexpressions have been extracted from the formula object.
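To make the timing concrete, the sketch below builds a formula first and only later resolves its names against a data frame with model.frame(). The data frame d and its values are invented for illustration; the code was checked against R and is assumed to work the same way in S-PLUS.

```r
# Small made-up data frame; the values are for illustration only.
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1), x = c(1, 2, 3, 4))

f <- y ~ x                       # no lookup of y or x happens here
mf <- model.frame(f, data = d)   # y and x are now found as columns of d

names(mf)
mf$y
```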
Operators in formulas have the same precedence as anywhere else in S-PLUS.
Operators that are especially significant in formulas are:
+, -, *, /, :, ^ and %in%.
The functions
and
are also used, as is
.
(period).
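Because ordinary operator precedence applies, a formula parses like any other expression. For instance, : binds more tightly than +, so a + b:c groups as a + (b:c). The sketch below inspects the parsed right-hand side; the names are placeholders, and the code was checked against R and assumed to match S-PLUS.

```r
f <- y ~ a + b:c
rhs <- f[[3]]        # right-hand side of the formula
rhs[[1]]             # outermost operator: +
rhs[[2]]             # first operand: a
rhs[[3]]             # second operand: b:c

# Parentheses change the grouping, exactly as in ordinary arithmetic.
f2 <- y ~ (a + b):c
f2[[3]][[1]]         # outermost operator is now :
```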
##### General rules for formulas

    T ~ F        T is modeled as F, where F may include other terms
    Fa + Fb      Include both Fa and Fb in the model
    Fa - Fb      Include all of Fa except what is in Fb in the model
    Fa : Fb      The interaction between Fa and Fb
    Fa * Fb      Fa + Fb + Fa : Fb
    Fb %in% Fa   Fb is nested within Fa
    Fa / Fb      Fa + Fb %in% Fa
    F^m          All terms in F crossed to order m
    .            In update(), the previous set; in other formulas, all
                 variables (except those on the other side of ~)

In these rules, F may be the name of a variable, an expression such as
bs(x, df=4) that returns a variable or matrix, or a formula subexpression
built using the same syntax.

# Examples

    attach(fuel.frame)  # or use data=fuel.frame in calls below
    lm(Fuel ~ Weight)
    lm(Fuel ~ Weight + Disp.)
    lm(Fuel ~ Weight - 1)     # no intercept
    lm(Fuel ~ Weight + Type)  # Type is a factor; this gives a single slope,
                              # and separate intercepts for each level of Type
    lm(Fuel ~ Weight * Type)  # Separate intercepts and slopes for each level
    lm(Fuel ~ Weight + Type + Weight:Type)  # equivalent to previous
    lm(Fuel ~ Weight * Type - Weight:Type)  # equivalent to Weight + Type
    lm(Fuel ~ I(Weight + Disp.))            # force use of ordinary arithmetic
    fit1 <- lm(Fuel ~ ., data=fuel.frame)   # . = all variables except Fuel
    fit2 <- update(fit1, . ~ . - Mileage)   # remove Mileage as a variable

    # Formulas may be saved and used later
    my.form <- pre.mean ~ spinsp + devtime
    aov(my.form, wafer)
    tree(my.form, wafer)

    # Exceptions
    # There are some unusual aspects of formulas, which seem to occur
    # primarily when individual terms are multiplied or raised to powers.
    # It is probably best to avoid multiplying a term with itself,
    # to list all desired powers explicitly, or use poly().
    A <- 1:5
    B <- 5:1
    model.matrix(~A:B)         # (Intercept) A:B
    model.matrix(~A*B)         # (Intercept) A B A:B
    model.matrix(~(A+B)^2)     # (Intercept) A B A:B
    # Main effects and interaction, no quadratic terms.
    model.matrix(~A^2)         # (Intercept) I(A^2)
    # Quadratic term, no interaction -- A^2 is interpreted as I(A^2)
    model.matrix(~I(A^2))      # (Intercept) I(A^2)
    # Equivalent to previous.
    model.matrix(~A*A)         # (Intercept) A
    # Gives only main effect, not quadratic term
    model.matrix(~A + I(A^2))  # (Intercept) A I(A^2)
    # linear and quadratic term
    model.matrix(~poly(A,2))   # (Intercept) poly(A, 2)1 poly(A, 2)2
    # linear and quadratic term - orthogonal
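The expansion rules listed under "General rules for formulas" can also be checked without data or a model fit, by asking terms() which term labels a formula expands to. This is a sketch only: term.labels is a hypothetical helper defined here, a, b, c and y are placeholder names, and the outputs were checked against R. The sketch deliberately avoids / and a power of a single variable, where the labels reported by S-PLUS (as documented above) and R are known or likely to differ.

```r
# Hypothetical helper: return the expanded term labels of a formula.
term.labels <- function(form) attr(terms(form), "term.labels")

crossed  <- term.labels(y ~ a * b)          # "a" "b" "a:b"
spelled  <- term.labels(y ~ a + b + a:b)    # same terms written out
removed  <- term.labels(y ~ a * b - a:b)    # "a" "b"
order.2  <- term.labels(y ~ (a + b + c)^2)  # main effects and two-way terms

print(crossed)
print(removed)
print(order.2)
```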