arbor
model.
arbor(formula, data, weights, subset, na.action=na.arbor, method, cppFunctions, model=F, x=F, y=T, parms, control=arbor.control(), cost, nRandomSplitVars=0, ...)
lm function.
y is missing, but keeps those in which one or more predictors are missing.
"anova", "poisson", "class", "exp", or a list, which implies a user-specified method written in S-PLUS.
If method is missing, then the routine tries to make an intelligent guess. However, the code cannot distinguish between the two-column response input for Poisson and longitudinal data. For multi-column input the method must be specified.
If y is a survival object, then method="exp" is assumed; if y is a factor, then method="class" is assumed; otherwise method="anova" is assumed. It is wisest to specify the method directly, especially as more criteria are added to the function. See the manual for details on user-specified methods.
The "longitudinal" method is not implemented in this version of arbor.
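The guessing rule for a missing method can be read as a short decision chain. The sketch below is only an illustration of that reading, not arbor's actual code: guess_method is a hypothetical helper, and the class name "Surv" for a survival object is an assumption.

```r
# Hypothetical helper, not part of arbor: one reading of the documented rule.
guess_method <- function(y) {
  # Assumption: survival objects carry the class "Surv".
  if (inherits(y, "Surv")) {
    return("exp")
  }
  # Two-column Poisson and longitudinal responses look alike,
  # so a multi-column response is never guessed.
  if (is.matrix(y) && ncol(y) > 1) {
    stop("multi-column response: specify 'method' explicitly")
  }
  if (is.factor(y)) {
    return("class")
  }
  "anova"
}

guess_method(factor(c("absent", "present")))  # "class"
guess_method(c(1.2, 3.4, 5.6))                # "anova"
```

Passing method explicitly in the call to arbor avoids depending on this kind of inference.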
split, eval and error functions to be used in the partitioning algorithm. The split function is used to decide the best split for a node. error is used to compute the error at an individual observation. eval is used to compute the prediction value at a node and the node error. The default, which depends on method, is used for any function not specified.
model is a model frame (likely from an earlier call to the arbor function), then this frame is used rather than constructing new data.
x matrix in the result.
Anova and longitudinal methods have no parameters.
For Poisson splitting, the list components can include the coefficient of variation of the prior distribution on the rates (component shrink), and an error method (component method). method can be either "deviance" or "sqrt", and defaults to "deviance". shrink can be any positive numeric value. The default for shrink is 1 when method="deviance" and 0 when method="sqrt".
Exponential splitting uses the same parameter options as Poisson.
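The interaction between the two Poisson components is easy to misread, because the default for shrink depends on method. The following sketch resolves the defaults exactly as stated above; poisson_parms is a hypothetical helper written for illustration and is not part of arbor.

```r
# Hypothetical helper, not part of arbor: fills in the documented defaults.
poisson_parms <- function(parms = list()) {
  method <- parms[["method"]]
  if (is.null(method)) method <- "deviance"
  if (is.na(match(method, c("deviance", "sqrt")))) {
    stop("method must be \"deviance\" or \"sqrt\"")
  }
  shrink <- parms[["shrink"]]
  if (is.null(shrink)) {
    # Default depends on the error method: 1 for "deviance", 0 for "sqrt".
    shrink <- if (method == "deviance") 1 else 0
  } else if (!is.numeric(shrink) || length(shrink) != 1 || shrink <= 0) {
    # A user-supplied value must be a single positive number.
    stop("shrink must be a positive numeric value")
  }
  list(method = method, shrink = shrink)
}

poisson_parms()                        # method "deviance", shrink 1
poisson_parms(list(method = "sqrt"))   # method "sqrt", shrink 0
```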
For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be "gini" or "information". The default priors are proportional to the data counts, the losses default to 1, and the split defaults to "gini".
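To make the constraints on the classification components concrete, the sketch below checks a prior vector and a loss matrix against the stated rules and evaluates the two splitting indices. It assumes the usual definitions from the CART literature (Gini as 1 - sum(p^2), information as -sum(p * log(p))); whether arbor computes them in exactly this form is not stated here. The function names are hypothetical.

```r
# Assumed textbook definitions; p is a vector of class proportions.
gini <- function(p) 1 - sum(p^2)
information <- function(p) {
  p <- p[p > 0]            # treat 0 * log(0) as 0
  -sum(p * log(p))
}

# Hypothetical validator for the prior and loss components of parms.
check_class_parms <- function(prior, loss) {
  ok_prior <- all(prior > 0) && abs(sum(prior) - 1) < 1e-8
  off_diag <- loss[row(loss) != col(loss)]
  ok_loss <- all(diag(loss) == 0) && all(off_diag > 0)
  ok_prior && ok_loss
}

p <- c(0.65, 0.35)
gini(p)                    # 0.455
information(p)
check_class_parms(prior = p, loss = matrix(c(0, 1, 1, 0), 2))  # TRUE
```

Both indices are zero for a pure node and largest when the classes are equally likely, so they usually rank candidate splits similarly.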
arbor algorithm.
forest().
arbor.control may also be specified in the call to arbor.
arbor, a superset of class tree.
Atkinson and Therneau (1997). An Introduction to Recursive Partitioning Using the RPART Routines. Technical Report.
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, No. 3, 199-231.
Breiman, L. (2001). Random Forests. University of California Statistics Dept. Tech. Report.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Monterey: Wadsworth and Brooks/Cole.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. New York: Springer.
fit <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis)
fit2 <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis,
              parms=list(prior=c(.65, .35), split='information'))
fit3 <- arbor(Kyphosis ~ Age + Number + Start, data=kyphosis,
              control=arbor.control(cp=.05))
par(mfrow=c(1,2))
plot(fit)
text(fit, use.n=T)
plot(fit2)
text(fit2, use.n=T)

# return the model frame and use it in a new fit
fit4 <- arbor(cbind(time,status) ~ inst + age + sex + ph.ecog + ph.karno +
              pat.karno + meal.cal + wt.loss,
              method="poisson", data=lung, model=T)
fit5 <- arbor(model=fit4$model, method=fit4$method, cp=.001, xval=0)