formula.
This class of objects represents the structural models in all model-fitting
functions, and is also used in a number of other functions, particularly
for plots.
These are created by a call to the
~ operator.
This is typically done inside a call to a model fitting function, but
need not be.
Generic functions that have methods specific to
"formula" include:
alias, deriv, dput, formula, pairs, plot, print, update.
A formula is a call (i.e., of mode
"call") to the
~ operator.
Thus, considered as a list it has (generally) three components.
The first component is the name
~, the second component is the
response, and the third component is the explanatory variables.
It is possible to build a formula without a response, in which case it must be processed before being used in the typical fashion.
Formulas are their own value; that is, they represent an expression
calling the operator
~, but evaluating this expression just returns
the expression itself.
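
The structure described above can be inspected directly. The following is a minimal sketch, written for R as a closely related dialect of S; the variable names y, x and z are placeholders chosen for illustration, and an S-PLUS session may print these objects differently.

```r
# Sketch only: written for R; S-PLUS output may differ in detail.
f <- y ~ x + z            # two-sided formula; y, x and z need not exist yet

mode(f)                   # "call"
length(f)                 # 3
as.character(f[[1]])      # "~"      (the operator name)
deparse(f[[2]])           # "y"      (the response)
deparse(f[[3]])           # "x + z"  (the explanatory variables)

g <- ~ x + z              # one-sided formula: no response component
length(g)                 # 2
deparse(g[[2]])           # "x + z"
```

Because the components are stored unevaluated, the formula can be built and taken apart before any of the named objects exist.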
The purpose of formula objects is to supply the essential information to
fit models, produce plots, etc., in a readable form that can be passed
around, stored in other objects, and manipulated to determine the
terms and response of a model.
Names in the formula will eventually be interpreted as objects,
often as variables in a data frame.
This interpretation, however, takes place only after the related subexpressions
have been extracted from the formula object.
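
As a rough illustration of this delayed interpretation, the sketch below (R syntax; the data frame d and its columns are made up for the example) extracts a subexpression from a formula and only then evaluates it against a data frame.

```r
# Sketch only: d is a hypothetical data frame, not one from this help file.
d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8), x = c(1, 2, 3, 4))

f <- y ~ log(x)           # nothing is looked up at this point

all.vars(f)               # "y" "x"  (names only, no data needed)

rhs  <- f[[3]]            # the unevaluated call log(x)
vals <- eval(rhs, d)      # x is now interpreted as a column of d

mf <- model.frame(f, data = d)   # model-fitting functions do this internally
names(mf)                 # "y" "log(x)"
```

The same formula could be evaluated against a different data frame without being rebuilt, which is what allows formulas to be stored and reused.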
Operators in formulas have the same precedence as anywhere else in S-PLUS.
Operators that are especially significant in formulas are:
+, -, *, /, :, ^ and %in%.
The functions
and
are also used, as is
. (period).
##### General rules for formulas
T ~ F        T is modeled as F, where F may include other terms
Fa + Fb      Include both Fa and Fb in the model
Fa - Fb      Include all of Fa except what is in Fb in the model
Fa : Fb      The interaction between Fa and Fb
Fa * Fb      Fa + Fb + Fa : Fb
Fb %in% Fa   Fb is nested within Fa
Fa / Fb      Fa + Fb %in% Fa
F^m          All terms in F crossed to order m
.            In update(), the previous set; in other formulas, all
             variables (except those on the other side of ~)
In these rules, F may be the name of a variable,
an expression such as bs(x, df=4) that returns a variable or matrix,
or a formula subexpression built using the same syntax.
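
One way to check how these rules expand a given formula is to ask terms() for its term labels. The sketch below is written for R and uses a hypothetical helper, labels_of, together with placeholder names a, b, c and y; S-PLUS may label nested terms differently (for example with %in% rather than a colon).

```r
# Sketch only: labels_of is a hypothetical helper, not part of any library.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(~ a + b)          # "a" "b"
labels_of(~ a * b)          # "a" "b" "a:b"
labels_of(~ a * b - a:b)    # "a" "b"
labels_of(~ a / b)          # "a" "a:b"   (b nested within a, as labeled by R)
labels_of(~ (a + b + c)^2)  # "a" "b" "c" "a:b" "a:c" "b:c"

# In update(), a period stands for the corresponding part of the old formula.
new.form <- update(y ~ a + b, . ~ . - b)
deparse(new.form)           # "y ~ a"
```

No data are needed for these calls, since only the symbolic structure of the formula is being expanded.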
# Examples
attach(fuel.frame) # or use data=fuel.frame in calls below
lm(Fuel ~ Weight)
lm(Fuel ~ Weight + Disp.)
lm(Fuel ~ Weight - 1) # no intercept
lm(Fuel ~ Weight + Type) # Type is a factor; this gives a single slope,
# and separate intercepts for each level of Type
lm(Fuel ~ Weight * Type) # Separate intercepts and slopes for each level
lm(Fuel ~ Weight + Type + Weight:Type) # equivalent to previous
lm(Fuel ~ Weight * Type - Weight:Type) # equivalent to Weight + Type
lm(Fuel ~ I(Weight + Disp.)) # force use of ordinary arithmetic
fit1 <- lm(Fuel ~ ., data=fuel.frame) # . = all variables except Fuel
fit2 <- update(fit1, . ~ . - Mileage) # remove Mileage as a variable
# Formulas may be saved and used later
my.form <- pre.mean ~ spinsp + devtime
aov(my.form, wafer)
tree(my.form, wafer)
# Exceptions
# There are some unusual aspects of formulas, which seem to occur
# primarily when individual terms are multiplied or raised to powers.
# It is probably best to avoid multiplying a term by itself; instead,
# list all desired powers explicitly with I(), or use poly().
A <- 1:5
B <- 5:1
model.matrix(~A:B) # (Intercept) A:B
model.matrix(~A*B) # (Intercept) A B A:B
model.matrix(~(A+B)^2) # (Intercept) A B A:B
# Main effects and interaction, no quadratic terms.
model.matrix(~A^2) # (Intercept) I(A^2)
# Quadratic term, no interaction -- A^2 is interpreted as I(A^2)
model.matrix(~I(A^2)) # (Intercept) I(A^2)
# Equivalent to previous.
model.matrix(~A*A) # (Intercept) A
# Gives only main effect, not quadratic term
model.matrix(~A + I(A^2)) # (Intercept) A I(A^2)
# linear and quadratic term
model.matrix(~poly(A,2)) # (Intercept) poly(A, 2)1 poly(A, 2)2
# linear and quadratic term - orthogonal