linearApproxReg(replicates, indices, n=max(indices), model.mat, formula, data, group, subject, weights, transform=T, df=3, details=T, ...)
replicates: matrix with B rows (the number of bootstrap samples) and one or more columns (for univariate or multivariate statistics).
indices: matrix with B columns, and normally with n rows (the number of observations or subjects in the original data).
n: number of observations or subjects in the original data; the default is max(indices), but it is better to supply it.
formula: a formula with a ~ operator and the terms, separated by + operators, on the right. Used to create the model matrix; if supplied then model.mat is ignored.
group: the group vector, if the original resampling was by group.
subject: the subject vector, if the original resampling was by subject.
weights: vector of length B, importance sampling weights.
transform: if TRUE (the default), then after an initial regression, transform the response variable (the replicates) to obtain a more linear relationship with the predicted values, and perform a second regression.
details: if TRUE (the default), then attach the multiple correlation of the (transformed) replicates and the predicted values as an attribute when returning the linear approximation values.
In the univariate case, a vector L such that
    replicates[i] ~= c + mean(L[indices[,i]])
where c is the statistic value for the observed data. In the multivariate case this relationship holds for each column. There are n rows, where n is the original number of observations or subjects, and p columns, where the statistic is p-valued. In the subject case the row names of the result are the unique values of the subject argument taken from the original resampling call.
The results are normalized to sum to zero (by group, if sampling by group; see below).
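As a rough illustration of this normalization, the base R sketch below centers a vector of linear approximation values so that it sums to zero within each group. The data and the names L.raw and L.centered are made up for the example, and subtracting group means is an assumed way of obtaining the stated property, not necessarily what linearApproxReg does internally.

```r
# Hypothetical values for illustration; not output from linearApproxReg.
L.raw <- c(1.2, -0.4, 0.9, 2.0, -1.5, 0.3)
group <- c("a", "a", "a", "b", "b", "b")

# Subtract each group's mean so the values sum to zero within every group
# (assumed normalization; ave() returns the group mean for each element).
L.centered <- L.raw - ave(L.raw, group)

# Per-group sums are zero up to floating-point rounding.
group.sums <- tapply(L.centered, group, sum)
```

Because every group sums to zero, the full vector also sums to zero.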
If details==TRUE the result has a "correlation" attribute giving the multiple correlation between the (transformed) bootstrap replicates and the linear approximation.
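To make the relationship replicates[i] ~= c + mean(L[indices[,i]]) concrete, here is a minimal base R sketch of a regression-based linear approximation. It is not the S+Resample implementation: it skips the transform step, groups, subjects, and weights, and the helper names (boot.means, beta, approx) are chosen for illustration only. The statistic is the sample mean, for which the approximation is exact.

```r
# Base R sketch under simplifying assumptions; not the library's code.
set.seed(1)
n <- 50     # number of observations
B <- 400    # number of bootstrap samples
x <- rnorm(n)

# indices: n rows, B columns; column i holds the i-th resample's indices.
indices <- matrix(sample(n, n * B, replace = TRUE), nrow = n, ncol = B)

# replicates: the statistic (here the mean) on each bootstrap sample.
replicates <- apply(indices, 2, function(idx) mean(x[idx]))

# Bootstrapped sample means of each model matrix column (B x 1 here).
model.mat <- cbind(x = x)
boot.means <- matrix(NA_real_, nrow = B, ncol = ncol(model.mat))
for (i in seq_len(B)) {
  boot.means[i, ] <- colMeans(model.mat[indices[, i], , drop = FALSE])
}

# Regress the replicates on the bootstrapped means (with an intercept).
fit <- lm.fit(cbind(1, boot.means), replicates)
beta <- fit$coefficients[-1]

# Linear approximation values: centered model matrix times the slopes,
# so that L sums to zero.
centered <- sweep(model.mat, 2, colMeans(model.mat))
L <- drop(centered %*% beta)

# Check the defining relationship, with c the observed statistic.
c.obs <- mean(x)
approx <- c.obs + apply(indices, 2, function(idx) mean(L[idx]))
max.err <- max(abs(replicates - approx))
```

For a nonlinear statistic the match would be approximate rather than exact; the "correlation" attribute described above summarizes how close it is.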
This function is normally called by resampGetL.bootstrap, but may also be called directly.
The model.mat matrix should have one row for each observation (or for each subject). An initial column of 1's is optional (it is added if not present). The matrix should contain columns which together have a high "multiple correlation" with the statistic of interest. For example, if the statistic is var(x), then cbind(x, x^2) or cbind(x, (x-mean(x))^2) would be suitable. Here "multiple correlation" is between the original bootstrapped statistic (replicates) and the (multivariate) bootstrapped sample means of the model matrix, using the same bootstrap indices. In other words, you can view each column of the model matrix as a set of data whose sample mean is bootstrapped; these sample means should have high multiple correlation with the actual statistics in order for the resulting linear approximations to be accurate.
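The multiple correlation described above can be checked by hand. The base R sketch below assumes simulated normal data and uses illustrative names (boot.means, mult.cor, cor.x.only). It bootstraps var(x), bootstraps the column means of cbind(x, x^2) with the same indices, and computes the multiple correlation from a regression. For comparison it also computes the correlation using x alone. This is a sketch, not the code used by linearApproxReg.

```r
# Base R sketch with simulated data; names are illustrative.
set.seed(2)
n <- 50     # number of observations
B <- 400    # number of bootstrap samples
x <- rnorm(n)
indices <- matrix(sample(n, n * B, replace = TRUE), nrow = n, ncol = B)

# Bootstrapped statistic of interest.
replicates <- apply(indices, 2, function(idx) var(x[idx]))

# Bootstrapped sample means of each model matrix column (B x 2 matrix),
# using the same bootstrap indices.
model.mat <- cbind(x, x^2)
boot.means <- t(apply(indices, 2, function(idx) colMeans(model.mat[idx, ])))

# Multiple correlation: correlation between the replicates and their
# fitted values from a regression on the bootstrapped means.
mult.cor <- cor(replicates, fitted(lm(replicates ~ boot.means)))

# With x alone, the multiple correlation reduces to |cor|.
cor.x.only <- abs(cor(replicates, boot.means[, 1]))
```

Adding the x^2 column cannot lower the multiple correlation, and for a variance-like statistic it would typically raise it substantially, which is why the text suggests including it.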
The indices argument normally has n rows. However, it may have more or fewer when bootstrap sampling with a size not equal to the original sample size. In permutation testing for two-sample problems, it may instead contain the indices corresponding to just one of the samples.
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, San Francisco: Chapman & Hall.
Hesterberg, T.C. (1995), "Tail-Specific Linear Approximations for Efficient Bootstrap Simulations," Journal of Computational and Graphical Statistics, 4, 113-133.
Hesterberg, T.C. and Ellis, S.J. (1999), "Linear Approximations for Functional Statistics in Large-Sample Applications," Technical Report No. 86, http://www.insightful.com/Hesterberg
### Example: correlation for bivariate data
set.seed(1); x <- rmvnorm(100, d=2, rho=.5)
bfit2 <- bootstrap(x, cor(x[,1], x[,2]), save.indices=T)
L1 <- resampGetL(bfit2)  # "ace" method
L2 <- resampGetL(bfit2, model.mat = cbind(x, x^2, x[,1]*x[,2]))
L3 <- linearApproxReg(bfit2$replicates, bfit2$indices)
L4 <- linearApproxReg(bfit2$replicates, bfit2$indices,
                      model.mat = cbind(x, x^2, x[,1]*x[,2]))
# L1 and L3 are identical; L2 and L4 are identical.