tree: Fit a Tree-Based Model

DESCRIPTION

Returns a tree object from a specified formula and data.
USAGE

tree(formula, data=<<see below>>, weights=<<see below>>, subset=<<see below>>,
     na.action=na.fail, method="recursive.partition", control=<<see below>>,
     model=NULL, x=F, y=T, ...)
ARGUMENTS

formula
    a formula expression of the form response ~ predictors.

data
    a data frame in which to interpret the variables named in the
    formula, subset, and weights arguments. If this is missing, the
    variables should be on the search list. This can also be a number
    which indicates the frame in which to look for the data.

weights
    vector of observation weights. A zero weight does not remove an
    observation from the fit; the subset argument is preferred for
    deleting observations. By default, an unweighted analysis is
    performed.

subset
    expression saying that only a subset of the rows of the data
    should be used in the fit.

na.action
    a function to filter missing data from the model frame. This is
    applied to the model.frame after any subset argument has been
    used. The default (with na.fail) is to create an error if any
    missing values are found. A possible alternative is na.exclude,
    which deletes observations that contain one or more missing
    values.

method
    character string giving the fitting method; the default,
    "recursive.partition", fits a tree. If method is "model.frame",
    the function returns the model frame used to build the tree.

control
    a list of control parameters for the fitting algorithm. See
    tree.control for their names and default values. These can also
    be set as arguments to tree itself. (You should not set the nobs
    argument to tree.control, as that is set by tree and is the
    number of observations used to build the tree.)

model
    if a model frame is supplied here, the formula and data arguments
    are ignored, and model is used to define the model.

x
    if TRUE, the model.matrix is returned.

y
    if TRUE, the response variable is returned.

...
    further arguments that are passed to tree.control.
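As an illustration of how several of these arguments combine, here is a sketch; the subset shown and the minsize value are arbitrary, and minsize is assumed here to be one of the parameters documented in tree.control:

# sketch: combine subset, na.action, and a control parameter
z <- tree(skips ~ ., data = solder.balance,
          subset = 1:500,           # fit to the first 500 rows only (illustrative)
          na.action = na.exclude,   # drop observations with missing values
          minsize = 5)              # passed through ... to tree.control (assumed name)

# with method="model.frame" the model frame is returned rather than a tree
mf <- tree(skips ~ ., data = solder.balance, method = "model.frame")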
VALUE

An object of class tree is returned. See tree.object for details.
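For example, assuming the components described in tree.object (such as the per-node frame component), the returned object can be inspected directly:

z.solder <- tree(skips ~ ., data = solder.balance)
z.solder$frame    # one row per node: split variable, deviance, fitted value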
DETAILS

The model is fitted using binary recursive partitioning, whereby the data
are successively split along coordinate axes of the predictor variables so
that, at any node, the split which maximally distinguishes the response
variable in the left and right branches is selected. Splitting continues
until nodes are pure or the data are too sparse; terminal nodes are called
leaves, while the initial node is called the root.
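The split selection can be sketched for a numeric response as follows. This is an illustration of the deviance-reduction idea only, not the internal implementation (which also handles factor predictors, weights, and the stopping rules above); best.split is a hypothetical helper, not part of this function.

# hypothetical sketch: choose the cutpoint on a numeric predictor x that
# most reduces the Gaussian deviance (sum of squares) of the response y
best.split <- function(x, y) {
  dev <- function(v) sum((v - mean(v))^2)       # node deviance
  u <- sort(unique(x))
  cuts <- (u[-1] + u[-length(u)]) / 2           # midpoints between values
  gains <- sapply(cuts, function(cc)
    dev(y) - dev(y[x <= cc]) - dev(y[x > cc]))  # deviance reduction of the split
  list(cutpoint = cuts[which.max(gains)], gain = max(gains))
}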
If the response variable is a factor,
the tree is called a classification tree.
The model used for classification assumes that the response variable
follows a multinomial
distribution.
The weights are not used in the computation of the deviance in
classification trees.
If the response variable is numeric, the tree
is called a regression tree. The model used for regression
assumes that the numeric response variable has a
normal (Gaussian) distribution.
The weights are used if they are specified.
See Statistical Models in S for a more detailed discussion of the
difference between regression and classification trees.
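The class of the response alone thus determines which kind of tree is grown. For example (using the kyphosis and solder.balance data frames from the examples below, where Kyphosis is a factor and skips is numeric):

z.class <- tree(Kyphosis ~ Age + Number + Start, data = kyphosis)  # classification
z.reg   <- tree(skips ~ ., data = solder.balance)                  # regression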
This function allows up to 128 levels for factor response variables.
Factor predictor variables are limited to 32 levels, because a factor
predictor with k levels generates 2^(k-1) - 1 candidate splits that must
be examined, which imposes severe demands on the system; at k = 32 that
is already 2^31 - 1, over two billion splits.
The fitted model can be examined by print, summary, and plot. Its
contents can be extracted using predict, residuals, deviance, and
formula. It can be modified using update. Other generic functions that
have methods for tree objects are text, identify, browser, and [.
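A typical session combining these generics might look as follows (sketch):

z <- tree(skips ~ ., data = solder.balance)
summary(z)       # size and deviance of the fitted tree
plot(z)          # draw the tree
text(z)          # label the splits and leaves
predict(z)       # fitted values for the training data
residuals(z)     # response minus fitted values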
REFERENCES

Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984).
Classification and Regression Trees. Wadsworth International Group,
Belmont, CA.

Chambers, J. M., and Hastie, T. J. (1991). Statistical Models in S,
p. 414.
EXAMPLES

# fit regression tree to all variables
z.solder <- tree(skips ~ ., data = solder.balance)

# fit classification tree to data in kyphosis data frame
z.kyphosis <- tree(kyphosis)