influence(data, statistic, args.stat, group, subject, label, statisticNames, assign.frame1 = F, weights, epsilon = 0.001, unbiased = F, returnL = F, save.group, save.subject, subjectDivide = F, modifiedStatistic)
weights; other arguments may be passed using args.stat.
mean(x,trim=.2).
If data is given by name (e.g. data=x) then use that name in the expression, otherwise (e.g. data=air[,4]) use the name data in the expression.
If data is a data frame, the expression may involve variables in the data frame.
statistic when calculating the statistic.
data, for stratified sampling or multiple-sample problems.
Sampling is done separately for each group (determined by unique values of this vector).
If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
data; if present then subjects (determined by unique values of this vector) are resampled rather than individual observations.
If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
If group is also present, subject must be nested within group (each subject must be in only one group).
assign.frame1=T
if all estimates are identical (this is slower).
subject. Otherwise the weights are taken to be ordered with respect to the sorted values of subject.
If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. The default implies equal weights.
If TRUE then standard error estimates are computed using a divisor of (n-1) instead of n; then squared standard error estimates are more nearly unbiased.
If TRUE then only the L matrix is returned, rather than the list described below.
If TRUE then group and subject vectors, respectively, are saved in the returned object. Both defaults are TRUE if n<=10000.
If TRUE then the weight for each subject is divided among observations for that subject before calculating the statistic; if FALSE the subject weight is replicated to observations for that subject.
Also, if TRUE and weights contains observation weights, then initial subject weights will be the sums of weights for the observations.
If statistic is an expression that calls a function with a "hidden" weights argument, then pass this to indicate how to call your function. See below.
c("influence", "resamp"), with components call, observed, replicates, estimate, B, n, dim.obs, L, epsilon, defaultLabel, and perhaps (depending on whether sampling by group, subject, etc.) label, groupSizes, group, subject, modifiedStatistic, replicates2, and epsilon2.
see
for components not described below:
statistic evaluated at distance epsilon in each direction from weights. If sampling by subject, the rows are named with the unique values of subject.
subject. Includes attributes "method" (which is set to "influence") and "epsilon".
replicates, and estimated bias and standard error.
In addition, if weights is missing, columns containing estimates of acceleration, z0, and cq used by other bootstrap procedures.
The empirical influence values measure the effect on statistic of perturbing the empirical (weighted) distribution represented by data. The ith influence value is essentially the derivative in the "direction" of the ith observation (or subject, if sampling by subject). The derivatives are approximated with finite difference quotients by reweighting the original distribution.
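
As a rough illustration of the finite-difference idea described above, the following base S sketch computes influence values for a weighted mean by reweighting. It does not call influence and is not claimed to match its internals: the helper wmean, the sample vector x, and the particular perturbation (shrink every weight by 1 - eps, then add eps to observation i) are assumptions made for this example. The last lines show one standard way to turn influence values into a standard error, using a divisor of n or (n-1).

```r
# Base S sketch (assumed example, not the library implementation)
x <- c(2.1, 3.4, 1.9, 5.6, 4.0)
n <- length(x)
w0 <- rep(1/n, n)                        # equal weights: the empirical distribution
wmean <- function(x, w) sum(w * x) / sum(w)   # hypothetical weighted statistic
eps <- 0.001

L <- numeric(n)
for(i in 1:n) {
    w <- (1 - eps) * w0                  # shrink all weights ...
    w[i] <- w[i] + eps                   # ... and move mass eps toward observation i
    L[i] <- (wmean(x, w) - wmean(x, w0)) / eps   # finite difference quotient
}

# Standard errors from the influence values
se.biased   <- sqrt(sum(L^2)) / n                # divisor n
se.unbiased <- sqrt(sum(L^2) / (n * (n - 1)))    # divisor (n-1)
```

For the mean, the values in L agree with x - mean(x) up to rounding, which gives a quick check that the reweighting is set up as intended.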
The name "Splus.resamp.weights" is reserved for internal use by influence. To avoid naming conflicts, that name cannot be used as a variable name in data, if data is a data frame.
When statistic is an expression, for example mean(x), a modified expression mean(x, weights = Splus.resamp.weights) is created.
Only calls to functions that have an argument named weights are modified; e.g. sum(x)/length(x) would fail because sum does not have a weights argument.
If your expression calls a function with a "hidden" weights argument (e.g. one that accepts weights as part of the ... list), then use the modifiedStatistic argument to specify that, e.g. modifiedStatistic = myFun(x, weights = Splus.resamp.weights).
An expression such as mean(y[a==1]) is converted to mean(y[a==1], weights = Splus.resamp.weights), which will fail because the weights vector was not subscripted along with y.
In cases such as these, pass a function that performs the desired calculations, or use
modifiedStatistic = mean(y[a==1], weights = Splus.resamp.weights[a==1])
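
The subscripting problem above can be seen without the resampling machinery. The sketch below uses a hypothetical weighted-mean helper, wmean, and made-up vectors y, a and w (all assumptions for illustration); it only shows why the weights must be subset with the same condition as the data.

```r
# Hypothetical helper: a weighted mean that insists on matching lengths
wmean <- function(x, weights) {
    if(length(weights) != length(x))
        stop("weights and x differ in length")
    sum(weights * x) / sum(weights)
}

y <- c(10, 20, 30, 40)
a <- c(1, 2, 1, 2)
w <- rep(1/4, 4)            # stands in for the full-length weights vector

# wmean(y[a == 1], weights = w)        # would stop: 2 values but 4 weights
res <- wmean(y[a == 1], weights = w[a == 1])   # weights subscripted along with y
```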
For statistics which are not smooth functions of weights, derivatives calculated using small values of epsilon will be unstable.
Consider a larger value of epsilon for such statistics, e.g. epsilon=1/sqrt(n) (the "butcher knife").
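
To see why a small epsilon can be uninformative for a non-smooth statistic, the base S sketch below applies finite differences to a weighted median. The helpers wmedian and fd.influence, the data, and the perturbation scheme are assumptions for illustration only and are not taken from the library.

```r
# Hypothetical weighted median: smallest x whose cumulative weight reaches 1/2
wmedian <- function(x, w) {
    o <- order(x)
    cw <- cumsum(w[o]) / sum(w)
    x[o][min((1:length(x))[cw >= 0.5])]
}

# Finite-difference influence values by reweighting (assumed scheme)
fd.influence <- function(x, stat, eps) {
    n <- length(x)
    w0 <- rep(1/n, n)
    L <- numeric(n)
    for(i in 1:n) {
        w <- (1 - eps) * w0
        w[i] <- w[i] + eps
        L[i] <- (stat(x, w) - stat(x, w0)) / eps
    }
    L
}

x <- 1:9
L.small <- fd.influence(x, wmedian, eps = 0.001)              # all zero: the median never moves
L.large <- fd.influence(x, wmedian, eps = 1/sqrt(length(x)))  # the "butcher knife"
```

With the small step, no single reweighting shifts enough mass to change the median, so every difference quotient is zero; the larger step produces values that vary across observations.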
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.
Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, Society for Industrial and Applied Mathematics, Philadelphia.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, San Francisco: Chapman & Hall.
Hesterberg, T.C. (1995), "Tail-Specific Linear Approximations for Efficient Bootstrap Simulations," Journal of Computational and Graphical Statistics, 4, 113-133.
influence can fail if statistic calls a modeling function like lm. See
for details.
More details on many arguments, see .
Print, summarize, plot: , , , .
Description of the object, extract parts: , , .
Confidence intervals: , .
Modify an "influence" object: .
For an annotated list of functions in the package, including other high-level resampling functions, see: .
# Influence in robust estimation
set.seed(1); x <- rcauchy(40)
influence.obj <- influence(x, location.m)
plot(x, influence.obj$L) # outliers have less influence

# influence function is useful for linear approximations
obj <- bootstrap(x, location.m, B=200, save.indices=T)
plot(indexMeans(influence.obj$L, obj$indices), obj$replicates)

# Use extra quantities for BCa interval
limits.bca(obj, acceleration = influence.obj$estimate$accel,
           z0 = influence.obj$estimate$z0)

# Sampling by subject (type of auto)
influence(fuel.frame, mean(Fuel), subject = Type)$L