prune.tree {tree} | R Documentation
Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits.
prune.tree(tree, k = NULL, best = NULL, newdata, nwts, method = c("deviance", "misclass")[1], loss, eps = 1e-3)
prune.misclass(tree, k = NULL, best = NULL, newdata, nwts, loss, eps = 1e-3)
tree |
fitted model object of class "tree". This is assumed to be the result of some function that produces an object with the same named components as that returned by the tree() function. |
k |
cost-complexity parameter defining either a specific subtree of tree (k a scalar) or the (optional) sequence of subtrees minimizing the cost-complexity measure (k a vector). If missing, k is determined algorithmically. |
best |
integer requesting the size (i.e. number of terminal nodes) of a specific subtree in the cost-complexity sequence to be returned. This is an alternative to selecting a subtree by supplying a scalar cost-complexity parameter k. If the sequence contains no tree of the requested size, the next largest is returned. |
newdata |
data frame upon which the sequence of cost-complexity subtrees is evaluated. If missing, the data used to grow the tree are used. |
nwts |
weights for the newdata cases. |
method |
character string denoting the measure of node heterogeneity used to guide cost-complexity pruning. For regression trees, only the default, "deviance", is accepted. For classification trees, the default is "deviance" and the alternative is "misclass" (number of misclassifications or total loss). |
loss |
a matrix giving for each true class (row) the numeric loss of predicting the class (column). The classes should be in the order of the levels of the response. It is conventional for a loss matrix to have a zero diagonal. |
eps |
a lower bound for the probabilities, used to compute deviances if events of predicted probability zero occur in newdata. |
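For the loss argument, the following base-R sketch builds a loss matrix for a hypothetical three-class response and computes the total loss of a set of predictions, which is presumably the quantity that method = "misclass" minimizes when a loss matrix is supplied. The level names, the cost of 5 and the helper total_loss are all made up for illustration; they are not part of the tree package. With 0-1 loss (zero diagonal, ones elsewhere) the total loss reduces to the number of misclassifications.

```r
# Hypothetical three-class response; rows (true class) and columns
# (predicted class) must follow the order of levels() of the response.
lev <- c("low", "mid", "high")

# Start from 0-1 loss: zero on the diagonal, one elsewhere.
loss <- matrix(1, nrow = 3, ncol = 3,
               dimnames = list(true = lev, predicted = lev))
diag(loss) <- 0

# Illustrative assumption: predicting "low" when the truth is "high" costs 5.
loss["high", "low"] <- 5

# Hypothetical helper: look up loss[true, predicted] for each case and sum.
total_loss <- function(truth, pred, loss) {
  truth <- factor(truth, levels = rownames(loss))
  pred  <- factor(pred,  levels = colnames(loss))
  sum(loss[cbind(as.integer(truth), as.integer(pred))])
}

truth <- c("low", "high", "mid",  "high")
pred  <- c("low", "low",  "high", "high")
total_loss(truth, pred, loss)  # 0 + 5 + 1 + 0 = 6
```

Such a matrix would then be passed as loss = loss, together with method = "misclass" (or through prune.misclass), when pruning a classification tree.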
Determines a nested sequence of subtrees of the supplied tree by recursively "snipping" off the least important splits, based upon the cost-complexity measure. prune.misclass is an abbreviation for prune.tree(method = "misclass") for use with cv.tree.
If k is supplied, the optimal subtree for that value is returned.
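The cost-complexity measure is not written out on this page. In the standard weakest-link formulation, which this page appears to follow, it trades the total deviance of a subtree against its size; the notation below (D, |T|, T_t, g) is introduced here for illustration and is not taken from the package.

```latex
% D(T): total deviance of subtree T (method = "deviance"), or number of
%       misclassifications / total loss (method = "misclass")
% |T| : number of terminal nodes of T (its "size")
\begin{align*}
  D_k(T) &= D(T) + k\,\lvert T \rvert \\
% Collapsing the branch T_t below internal node t into a single leaf
% changes D by D(t) - D(T_t) >= 0 and |T| by -(|T_t| - 1), so D_k(T)
% does not increase once k >= g(t):
  g(t) &= \frac{D(t) - D(T_t)}{\lvert T_t \rvert - 1}
\end{align*}
```

Snipping the internal node with the smallest g(t) first, and repeating, yields the nested sequence of subtrees; the thresholds at which each snip happens correspond to the k values reported for the sequence.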
If k is supplied and is a scalar, a tree object is returned that minimizes the cost-complexity measure for that k. If best is supplied, a tree object of size best is returned. Otherwise, an object of class tree.sequence is returned. The object contains the following components:
size |
number of terminal nodes in each tree in the cost-complexity pruning sequence. |
deviance |
total deviance of each tree in the cost-complexity pruning sequence. |
k |
the value of the cost-complexity pruning parameter of each tree in the sequence. |
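The two selection rules described above (a scalar k, or a requested size best) can be mimicked on the size and deviance vectors of a pruning sequence. The following base-R sketch uses made-up numbers and hypothetical helpers (pick_by_k, pick_by_best); it restates the documented rules and is not the package's own code, which operates on the tree itself. Falling back to the largest tree when best exceeds every size in the sequence is an assumption.

```r
# Hypothetical pruning sequence, laid out like the size and deviance
# components of a "tree.sequence" object (largest tree first).
seq_sizes <- c(10, 7, 5, 3, 1)
seq_dev   <- c(120, 128, 140, 170, 260)

# Scalar k: index of the subtree minimising deviance + k * size.
pick_by_k <- function(size, dev, k) {
  which.min(dev + k * size)
}

# best: index of the tree of the requested size or, if there is none,
# of the smallest tree that is larger than the request.
pick_by_best <- function(size, best) {
  ok <- which(size >= best)
  if (length(ok) == 0) return(which.max(size))  # assumption: fall back to the largest tree
  ok[which.min(size[ok])]
}

seq_sizes[pick_by_k(seq_sizes, seq_dev, 5)]  # 7 (128 + 5 * 7 = 163 is the minimum)
seq_sizes[pick_by_best(seq_sizes, 4)]        # 5 (no tree of size 4, next largest is 5)
```

Larger values of k penalise size more heavily and therefore select smaller trees, which is why the sequence is nested.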
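library(tree)   # provides tree(), cv.tree() and prune.tree()
library(MASS)   # provides the fgl (forensic glass) data
data(fgl)
fgl.tr <- tree(type ~ ., fgl)
plot(print(fgl.tr))
# average the cross-validated deviance over five runs of cv.tree
fgl.cv <- cv.tree(fgl.tr, , prune.tree)
for(i in 2:5) fgl.cv$dev <- fgl.cv$dev + cv.tree(fgl.tr, , prune.tree)$dev
fgl.cv$dev <- fgl.cv$dev/5
plot(fgl.cv)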