residuals(fit,type="score") function implemented, such as lrm, cph,
coxph, and ordinary linear models (ols). The fit must have specified
the x=TRUE and y=TRUE options for certain models.
Observations in different clusters are assumed to be independent.
For the special case where every cluster contains one observation, the
corrected covariance matrix returned is the "sandwich" estimator
(see Lin and Wei). This is a consistent estimate of the covariance matrix
even if the model is misspecified (e.g. heteroscedasticity, underdispersion,
wrong covariate form).
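The one-observation-per-cluster case can be sketched in base R for a least-squares fit. This is only an illustration of the general "sandwich" form on simulated data, assuming an ordinary lm fit; the helper name sandwich_vcov and the variables X, e, bread and meat are hypothetical and are not taken from robcov's own code.

```r
# Minimal base-R sketch of the Huber-White "sandwich" covariance for a
# least-squares fit (illustrative only; not the code used by robcov).
set.seed(2)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + 2 * x)  # heteroscedastic errors
fit <- lm(y ~ x)

sandwich_vcov <- function(fit) {
  X <- model.matrix(fit)         # design matrix, one row per observation
  e <- residuals(fit)            # ordinary residuals
  bread <- solve(crossprod(X))   # (X'X)^{-1}
  meat  <- crossprod(X * e)      # sum over i of e_i^2 * x_i x_i'
  bread %*% meat %*% bread
}

V_robust <- sandwich_vcov(fit)
V_model  <- vcov(fit)            # usual estimate, assumes constant variance
sqrt(diag(V_robust)) / sqrt(diag(V_model))  # ratio of standard errors
```

Ratios far from 1 suggest that the constant-variance covariance matrix is not trustworthy for these data.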
For the special case of ols fits, robcov can compute the improved
(especially for small samples) Efron estimator that adjusts for
natural heterogeneity of residuals (see Long and Ervin (2000)
estimator HC3).
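The HC3 adjustment described by Long and Ervin (2000) can be sketched in base R as below. It divides each residual by one minus its leverage before forming the middle of the sandwich. Whether method="efron" computes exactly this formula is an assumption based on the HC3 reference above; the names V_hc0 and V_hc3 are hypothetical.

```r
# Sketch of the HC3 estimator (Long and Ervin, 2000) for a least-squares
# fit: each residual e_i is replaced by e_i / (1 - h_i), where h_i is the
# leverage of observation i. Illustrative only; not robcov's own code.
set.seed(3)
n <- 25                                   # small sample, where HC3 matters most
x <- rnorm(n)
y <- 1 + x + rnorm(n, sd = exp(x / 2))    # heteroscedastic errors
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)                       # leverages, diagonal of the hat matrix
bread <- solve(crossprod(X))              # (X'X)^{-1}

V_hc0 <- bread %*% crossprod(X * e) %*% bread              # plain sandwich
V_hc3 <- bread %*% crossprod(X * (e / (1 - h))) %*% bread  # HC3
diag(V_hc3) / diag(V_hc0)                 # variance inflation from HC3
```

Because each leverage lies between 0 and 1, the HC3 variances are never smaller than the plain sandwich variances.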
robcov(fit, cluster, method=c('huber','efron'))
fit is a fit object from the Design() series.
cluster may be any type of vector (factor, character, integer). NAs are
not allowed. Unique values of cluster indicate possibly correlated
groupings of observations. Note that the data used in the fit and stored
in fit$x and fit$y may have had observations containing missing values
deleted. It is assumed that if any NAs were removed during the original
model fitting, an naresid function exists to restore NAs so that the
rows of the score matrix coincide with cluster. If cluster is omitted,
it defaults to the integers 1,2,...,n to obtain the "sandwich" robust
covariance matrix estimate.
method may be set to "efron" for ols fits (only). The default is the
Huber-White estimator of the covariance matrix.
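The role of cluster can be sketched in base R for a least-squares fit: per-observation score contributions are summed within each cluster before the middle of the sandwich is formed, and omitting cluster reduces to the one-observation-per-cluster case. The helper cluster_vcov is hypothetical, applies no small-sample correction, does not handle NAs, and is not robcov's own code.

```r
# Sketch: cluster-adjusted sandwich covariance for a least-squares fit.
set.seed(4)
n_clusters <- 40
id <- sample(seq_len(n_clusters), 200, replace = TRUE)
u  <- rnorm(n_clusters)                   # effect shared within a cluster
x  <- rnorm(200)
y  <- 1 + x + u[id] + rnorm(200)          # responses correlated within cluster
fit <- lm(y ~ x)

cluster_vcov <- function(fit, cluster = seq_len(nrow(model.matrix(fit)))) {
  X  <- model.matrix(fit)
  U  <- X * residuals(fit)                # score contribution of each observation
  Uc <- rowsum(U, group = cluster)        # sum the scores within each cluster
  bread <- solve(crossprod(X))            # (X'X)^{-1}
  bread %*% crossprod(Uc) %*% bread
}

V_cluster <- cluster_vcov(fit, id)        # clusters treated as independent units
V_indep   <- cluster_vcov(fit)            # cluster omitted: 1, 2, ..., n
diag(V_cluster) / diag(V_indep)           # variance ratios due to clustering
```

With cluster left at its default, every row is its own cluster and the result equals the plain sandwich estimate.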
The returned fit object has the element orig.var added. orig.var is
the covariance matrix of the original fit. Also, the original var
component is replaced with the new Huberized estimates.
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
Huber, PJ. Proc Fifth Berkeley Symposium Math Stat 1:221–33, 1967.
White, H. Econometrica 50:1–25, 1982.
Lin, DY, Wei, LJ. JASA 84:1074–8, 1989.
Rogers, W. Stata Technical Bulletin STB-8, p. 15–17, 1992.
Rogers, W. Stata Release 3 Manual, deff, loneway, huber, hreg, hlogit functions.
Long, JS, Ervin, LH. The American Statistician 54:217–224, 2000.
# A dataset contains a variable number of observations per subject,
# and all observations are laid out in separate rows. The responses
# represent whether or not a given segment of the coronary arteries
# is occluded. Segments of arteries may not operate independently
# in the same patient. We assume a "working independence model" to
# get estimates of the coefficients, i.e., that estimates assuming
# independence are reasonably efficient. The job is then to get
# unbiased estimates of variances and covariances of these estimates.

library(Design)   # provides lrm, ols, lsp, pol and robcov

set.seed(1)
n.subjects <- 30
ages  <- rnorm(n.subjects, 50, 15)
sexes <- factor(sample(c('female','male'), n.subjects, TRUE))
logit <- (ages-50)/5
prob  <- plogis(logit)                  # true prob not related to sex
id <- sample(1:n.subjects, 300, TRUE)   # subjects sampled multiple times
table(table(id))                        # frequencies of number of obs/subject
age <- ages[id]
sex <- sexes[id]
# In truth, observations within subject are independent:
y <- ifelse(runif(300) <= prob[id], 1, 0)
f <- lrm(y ~ lsp(age,50)*sex, x=TRUE, y=TRUE)
g <- robcov(f, id)
diag(g$var)/diag(f$var)   # ratio of cluster-adjusted to original variances
anova(g)                  # cluster-adjusted Wald statistics
# fastbw(g)               # cluster-adjusted backward elimination
plot(g, age=30:70, sex='female')   # cluster-adjusted confidence bands

# Get design effects based on inflation of the variances when compared
# with robust estimates which ignore clustering
g2 <- robcov(f)
diag(g$var)/diag(g2$var)

# Get design effects based on pooled tests of factors in model
anova(g2)[,1] / anova(g)[,1]

# A dataset contains one observation per subject, but there may be
# heteroscedasticity or other model misspecification. Obtain
# the robust sandwich estimator of the covariance matrix.
# f <- ols(y ~ pol(age,3), x=TRUE, y=TRUE)
# f.adj <- robcov(f)