Computes simultaneous or non-simultaneous confidence intervals or bounds for specified linear combinations (multiple comparisons) of linear model parameters. The generic function multicomp has methods multicomp.lm and multicomp.default.

multicomp(x, ...)
multicomp.lm(x, focus=NULL, adjust=NULL, lmat=NULL, comparisons="mca",
             alpha=0.05, bounds="both", error.type="fwe", method="best.fast",
             crit.point=NULL, Srank=NULL, control=NULL, simsize=NULL, plot=F,
             labels=NULL, valid.check=T, est.check=T)
multicomp.default(x, vmat, lmat=diag(length(x)), comparisons="mca",
             df.residual=Inf, alpha=0.05, bounds="both", error.type="fwe",
             method="best.fast", crit.point=NULL, Srank=NULL, control=NULL,
             simsize=NULL, plot=F, labels=NULL, valid.check=T, ylabel=NULL)
x: for multicomp.lm, an object inheriting from class "lm". Typically, this is the result of a previous call to lm or aov. For multicomp.default, x is a numeric vector of estimates.
vmat: the estimated covariance matrix of x. This is required only for multicomp.default, when x is a numeric vector.
focus: a character string naming the focus factor, that is, the factor whose (adjusted) means are to be compared. The focus, adjust, lmat, and/or comparisons arguments together specify the linear combinations. If focus, adjust, and lmat are all NULL, the first factor in the model (if any) is used as the focus factor. This argument applies only to multicomp.lm, when x is a model object.
adjust: a list whose names are those of factors and numeric covariates in the model (other than the focus factor), giving specified adjustment values for them. If adjust=NULL, the adjustment values are the average over the levels of every non-focus factor, and the grand average values of numeric covariates. Several combinations of values may be specified for each factor and covariate; adjusted means for the focus factor are computed for every combination specified in the adjust list. This argument applies only to multicomp.lm, when x is a model object.
lmat: an optional matrix; each column of lmat specifies a linear combination to be estimated under the "textbook parameterization" of the linear model (see below for details). Factor levels are ordered alphabetically within factor variables, so the linear combinations should be defined accordingly. Specifying lmat directly overrides the focus and adjust arguments.
"mca"
for all pairwise differences;
"mcc"
for pairwise differences between a single adjusted mean of the
focus
factor or
lmat
column and the remaining adjusted means (see the
control
argument below);
and,
"none"
if the adjusted means or
lmat
columns themselves are of interest without further differencing.
Any
comparisons
value besides the three keywords has the same effect as
"none"
.
If several adjustment combinations are specified in the
adjust
list, the differences given by
comparisons
are applied within each of the combinations.
df.residual: the residual degrees of freedom used in computing critical points (for example, in qt(1-alpha/2, df.residual) for method="lsd"). This argument applies only to multicomp.default.
alpha: 1-alpha is the desired joint confidence level if error.type="fwe", and the comparison-wise confidence level if error.type="cwe". The default is alpha=0.05.
"both"
for two-sided intervals;
"lower"
for intervals with infinite upper bounds but sharper lower bounds than those obtained with
bounds="both"
;
and,
"upper"
for intervals with infinite lower bounds but sharper upper bounds than those obtained with
bounds="both"
.
Mixed bounds can be achieved by specifying, for example,
bounds="upper"
and then providing the negative versions of combinations for which lower bounds are desired in
lmat
.
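A minimal sketch of this device, reusing the fuel.frame data and the lmat construction from the examples at the end of this page (which columns are negated is chosen purely for illustration):

lm.fuel <- lm(Fuel~Type, data=fuel.frame)
lmat.mixed <- rbind(rep(0, 5), contr.sum(6))  # columns estimate each Type minus Van
lmat.mixed[, 1:2] <- -lmat.mixed[, 1:2]       # negated columns yield lower bounds on the original contrasts
multicomp(lm.fuel, comparisons="none", lmat=lmat.mixed, bounds="upper")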
"fwe"
specifies family-wise error rate protection, so the probability that all bounds hold is at least
1-alpha
. The
"cwe"
option specifies comparison-wise error rate protection, so the probability that any one preselected bound holds is
1-alpha
.
"best.fast"
. Available methods at this time are:
"lsd"
: Fishers unprotected lsd method. The critical point for two-sided intervals is the
upper-alpha/2
Student's t-value,
qt(1-alpha/2, df.residual)
. For confidence bounds, the critical point is the
upper-alpha
point. This is the only method available if
error.type="cwe"
, and it is not available if
error.type="fwe"
.
"tukey"
: Tukey's method. If the linear combinations specify all pairwise differences between several quantities, the critical point is the Tukey studentized-range quantile scaled by
sqrt(2)
. When more than three quantities are to be compared, validity of the Tukey method is checked using the Hayter (1989) sufficient condition, unless the user specifies
valid.check=F
.
"dunnett"
: Dunnett's method. If
comparisons="mcc"
, the critical point for comparisons-with-control intervals or bounds is computed, if valid (see Dunnett (1964)). The validity condition requires the covariance matrix of the treatment-control differences to be equivalent to that of a one-factor model (allowing unequal sample sizes). The user may override the validity check by specifying
valid.check=F
.
"sidak"
: Sidak's method.
If
bounds="both"
, the critical point is the
upper-alpha
quantile from the maximum absolute value of
k
"uncorrelated" multivariate t random variables (see Sidak (1967)).
If
bounds="upper"
or
bounds="lower"
, and the estimators specified by
lmat
and/or
comparisons
are uncorrelated (or
valid.check=F
), then the critical point is the corresponding quantile of the maximum without taking absolute values.
"bon"
: Bonferroni's method. If a total of
m
bounds are to be computed (counting each interval as 2 bounds), the critical point is
qt(1-alpha/m, df.residual)
.
"scheffe"
: Scheffe's method. If the rank of the covariance matrix of the estimators for the linear combinations is
Srank
, the critical point is
sqrt(Srank*qf(1-alpha, Srank, df.residual))
.
"sim"
: Simulation method. An approximate critical point is generated using the simulation-based method described in Edwards and Berry (1987). This may take a few seconds or more of computer time, depending on the size of the problem. With the default simulation size, the critical point generated gives a family-wise error rate within 10% of the specified
alpha
, with 99% confidence. See the
simsize
argument below.
"best.fast"
: The default method. The smallest valid critical point among the valid "fast" methods (i.e., those excluding
method="sim"
) is computed.
"best"
: the smallest critical point among all the valid methods above is computed.
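As a numerical sketch of the closed-form critical points quoted above (the degrees of freedom, number of bounds, and Srank used here are arbitrary values chosen only for illustration; multicomp performs these computations internally):

alpha <- 0.05
df.resid <- 54                                # assumed residual degrees of freedom
m <- 12                                       # e.g. six two-sided intervals count as 12 bounds
Srank <- 5                                    # assumed rank of the estimators' covariance matrix
qt(1 - alpha/2, df.resid)                     # "lsd": two-sided comparison-wise point
qt(1 - alpha/m, df.resid)                     # "bon": Bonferroni point for m bounds
sqrt(Srank * qf(1 - alpha, Srank, df.resid))  # "scheffe": Scheffe point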
crit.point: if no value of the method argument suits the user, a value for the critical point may be specified directly with crit.point. In this case, ensuring the validity of the critical point is the user's responsibility. When crit.point is given, alpha and error.type are merely labels in the output object, and may have no meaning.
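For example (a minimal sketch, reusing the fuel.frame data from the examples below; the value 2.9 is an arbitrary user-supplied critical point, not one computed by multicomp):

lm.fuel <- lm(Fuel~Type, data=fuel.frame)
# supply the critical point directly; alpha and error.type are then labels only
multicomp(lm.fuel, crit.point=2.9)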
control: the column of lmat to be treated as the control when comparisons="mcc". The default is the last column of lmat.
"sim"
method of computing critical points. The default choice provides intervals or bounds that have family-wise error rates within 10% of the nominal
alpha
, with probability 0.99. This amounts to simulation sizes in the tens of thousands for most cases. Smaller simulation sizes are not recommended; see Edwards and Berry (1987).
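A minimal sketch (the explicit simulation size of 20000 is chosen here only for illustration; the default is usually preferable):

lm.fuel <- lm(Fuel~Type, data=fuel.frame)
# simulation-based critical point with an explicitly chosen, large simulation size
multicomp(lm.fuel, method="sim", simsize=20000)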
plot: if TRUE, a plot of the calculated intervals is displayed on the current graphics device. Alternatively, the output object can be used as an argument to the generic function plot.
labels: an optional character vector of labels for the linear combinations. If not supplied, multicomp attempts to generate sensible labels.
valid.check: if TRUE, the validity of the specified critical point calculation method is checked. If the validity condition fails, multicomp terminates with an error message.
est.check: if TRUE, estimability of the desired linear combinations is checked. If the condition fails, multicomp terminates with an error message. Note that in certain cases, too much rounding in the lmat entries can cause the estimability condition to fail. This argument applies only to multicomp.lm.
ylabel: an optional label for the response. This argument applies only to multicomp.default.
"multicomp"
with the components:
labels
argument.
alpha
.
error.type
.
method="sim"
.
plot=T
, a plot is displayed on the current graphics device.
The textbook parameterization used by multicomp.lm is designed to facilitate the specification of meaningful linear combinations of the model parameters. For example, if Trt is a factor with 3 levels, the parameters for the one-way anova model Y ~ Trt are (mu, Trt1, Trt2, Trt3). For a 3x2 factorial design in A and B, the model Y ~ A*B has the parameters (mu, A1, A2, A3, B1, B2, A1B1, A1B2, A2B1, A2B2, A3B1, A3B2).
Note that these are overspecifications, and not every linear combination of these parameters is estimable.
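To make the coefficient layout concrete, here is a minimal sketch for the hypothetical one-way model Y ~ Trt above: each lmat column lists coefficients on (mu, Trt1, Trt2, Trt3), so the mean of level i is mu + Trti and pairwise differences involve only the Trt terms.

# columns of lmat are coefficients on (mu, Trt1, Trt2, Trt3)
lmat.means <- cbind("Trt1"      = c(1, 1, 0, 0),   # adjusted mean of level Trt1
                    "Trt2"      = c(1, 0, 1, 0),
                    "Trt3"      = c(1, 0, 0, 1))
lmat.diffs <- cbind("Trt1-Trt2" = c(0, 1, -1, 0),  # pairwise differences of levels
                    "Trt1-Trt3" = c(0, 1, 0, -1),
                    "Trt2-Trt3" = c(0, 0, 1, -1))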
Warning: the lm function orders the levels of a factor alphabetically, according to the character strings used to identify the levels. This is therefore the order of the factor's effects in the textbook parameterization as well. If you are in doubt about the textbook parameterization for your model, invoke multicomp.lm with your lm object and inspect the row labels of the lmat element in the output list.
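A minimal sketch of that inspection, reusing the fuel.frame analysis of covariance from the examples below (and assuming, as described above, that the output list carries an lmat element with labeled rows):

lmancova.fuel <- lm(Fuel~Type+Weight, data=fuel.frame)
mc <- multicomp(lmancova.fuel, comparisons="none", plot=F)
# row labels of the lmat element name the textbook parameters
dimnames(mc$lmat)[[1]]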
When specified, focus and adjust attempt to generate an lmat consisting of coefficients of adjusted means for the levels of the focus factor, at each combination in the adjust list. Once lmat is specified, either directly through the lmat argument or indirectly through focus and adjust, it may be further modified by comparisons.
If you are in doubt about the lmat matrix in your call to multicomp, carefully examine the lmat element of the output list.
After lmat is specified, the linear combinations are checked for estimability by verifying that each is (up to a tolerance) a linear combination of the rows of the design matrix that corresponds to the textbook parameterization. After checking estimability, multicomp.lm generates estimates of the linear combinations of interest, along with their covariance matrix. The critical point and the intervals or bounds are then computed by a call to multicomp.default. The computed intervals are of the form t(lmat) %*% x +/- crit.point * sqrt(t(lmat) %*% vmat %*% lmat).
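A minimal sketch of this computation done by hand, reusing the summary statistics from the last example below and, purely for illustration, the comparison-wise ("lsd") critical point for a single contrast:

bvec <- c(4.167655, 4.967794, 4.601413, 3.27338, 3.957606, 5.313283)
vmat <- (.422^2) * diag(1/c(15, 3, 13, 13, 9, 7))
lmat <- matrix(c(1, 0, 0, 0, 0, -1), ncol=1)      # Compact minus Van
est <- t(lmat) %*% bvec                           # point estimate of the contrast
se  <- sqrt(t(lmat) %*% vmat %*% lmat)            # its standard error
crit <- qt(1 - 0.05/2, 54)                        # comparison-wise critical point
c(lower = est - crit * se, upper = est + crit * se)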
The Tukey, Dunnett, and one-sided Sidak methods for critical point computations have not been shown to be valid for all choices of the covariance matrix of the estimators, t(lmat) %*% vmat %*% lmat.
If this matrix is larger than 3x3, the Hayter (1989) sufficient condition is used to check the validity of Tukey's critical point.
For Dunnett's method, the matrix is checked to see if it is of a form resulting from all-to-one comparisons of uncorrelated estimators.
Sidak's method is always valid for two-sided intervals and is exact when the estimators are uncorrelated; for one-sided bounds, uncorrelated estimators are also the condition that validates the Sidak method.
These conditions are sufficient for the validity of these methods, but in most cases they are not necessary.
The expert user is therefore encouraged to use valid.check=F or a direct crit.point specification (at their own risk) to override the built-in safety measures.
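For instance, a minimal sketch of such an override (at the user's own risk, as noted above), reusing the fuel.frame data from the examples below:

lm.fuel <- lm(Fuel~Type, data=fuel.frame)
# insist on Tukey's critical point even when the Hayter condition cannot be verified
multicomp(lm.fuel, method="tukey", valid.check=F)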
Dunnett, C.W. (1964). New tables for multiple comparisons with a control. Biometrics, 20: 482-491.
Edwards, D. and Berry, J.J. (1987). The efficiency of simulation-based multiple comparisons. Biometrics, 43: 913-928.
Hayter, A.J. (1989). Pairwise comparisons of generally correlated means. Journal of the American Statistical Association, 84: 208-213.
Hochberg, Y. and Tamhane, A.C. (1987). Multiple Comparison Procedures. New York: Wiley.
Hsu, J.C. (1996). Multiple Comparisons: Theory and Methods. London: Chapman and Hall.
Sidak, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62: 626-633.
# all-pairwise comparisons in a one-way anova via the Tukey-Kramer method
lm.fuel <- lm(Fuel~Type, data=fuel.frame)
mc.fuel <- multicomp(lm.fuel)
print(mc.fuel)
plot(mc.fuel)

# 90% simultaneous upper bounds for comparisons of all other models
# with the Van by Dunnett's method
lm.fuel <- lm(Fuel~Type, data=fuel.frame)
multicomp(lm.fuel, comparisons="mcc", method="dunnett", bounds="upper",
          plot=T, alpha=.10)

# 95% simultaneous lower bounds for comparisons of all other models
# with Compact cars. The focus factor is specified directly.
lm.fuel <- lm(Fuel~Type, data=fuel.frame)
multicomp(lm.fuel, focus="Type", comparisons="mcc", bounds="lower",
          control=1, plot=T)

# use lmat to directly specify mcc intervals with the Van;
# method="best.fast" automatically selects Dunnett.
lm.fuel <- lm(Fuel~Type, data=fuel.frame)
lmat.mcc <- rbind(rep(0, 5), contr.sum(6))
mcclabels <- c("Compact-Van", "Large-Van", "Medium-Van", "Small-Van", "Sporty-Van")
multicomp(lm.fuel, comparisons="none", lmat=lmat.mcc, labels=mcclabels)

# all-pairwise comparisons of adjusted means in an
# Analysis of Covariance.
# method="best" chooses the simulation-based critical point.
lmancova.fuel <- lm(Fuel~Type+Weight, data=fuel.frame)
multicomp(lmancova.fuel, method="best")

# non-simultaneous intervals for adjusted mean fuel mileage,
# adjusting to both Weight=2500 and Weight=3500
lmancova.fuel <- lm(Fuel~Type+Weight, data=fuel.frame)
multicomp(lmancova.fuel, adjust=list(Weight=c(2500, 3500)),
          comparisons="none", method="lsd", error.type="cwe")

# all-pairwise comparisons using summary statistics
bvec <- c(4.167655, 4.967794, 4.601413, 3.27338, 3.957606, 5.313283)
names(bvec) <- c("Compact", "Large", "Medium", "Small", "Sporty", "Van")
vmat <- (.422^2)*diag(1/c(15, 3, 13, 13, 9, 7))
mcout <- multicomp(bvec, vmat, df.residual=54, ylabel="Fuel")
print(mcout)
plot(mcout)