t.test(x, y=NULL, alternative="two.sided", mu=0, paired=F, var.equal=F, conf.level=.95, treatment)
ARGUMENTS

x: A numeric vector. NAs and Infs are allowed but will be removed.
y: An optional numeric vector. NAs and Infs are allowed but will be removed. If paired=TRUE, then x and y must have the same length, and observation pairs (x[i], y[i]) with at least one NA or Inf will be removed.
"greater"
,
"less"
or
"two.sided"
, or just the initial letter of each, indicating the specification
of the alternative hypothesis. For the one-sample and paired t-tests,
alternative
refers to the true mean of the parent population in relation to the hypothesized value
mu
.
For two-sample t-tests,
alternative
refers
to the difference between the true population mean for
x
and that for
y
, in relation to
mu
.
paired: If TRUE, x and y are considered as paired vectors.
var.equal: If TRUE, the variances of the parent populations of x and y are assumed equal. Argument var.equal should be supplied only for the two-sample (i.e., unpaired) tests.
treatment: A vector with the same length as x, with two unique values. This is a grouping variable used to split x into two samples. If supplied, then y should not be used. The t-statistic numerator is mean1 - mean2, where mean1 corresponds to the first unique value in treatment.
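As a sketch of how treatment might be used (the data and group labels below are invented for illustration):

weight <- c(4.2, 4.7, 5.1, 3.9, 6.0, 5.8, 6.3, 5.5)
diet <- c("low", "low", "low", "low", "high", "high", "high", "high")
t.test(weight, treatment=diet)
# Splits 'weight' into two samples according to 'diet'. Since "low"
# is the first unique value in 'diet', the numerator of the
# t-statistic is the mean of the "low" group minus the mean of
# the "high" group.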
"htest"
, containing the following components:
names
attribute
"t"
.
statistic
.
Component
parameters
has
names
attribute
"df"
.
conf.level
. When
alternative
is not
"two.sided"
, the confidence interval will be half-infinite, to reflect
the interpretation of a confidence interval as the set of all values
k
for which one
would not reject the null hypothesis that the true mean or difference in means is
k
.
Here infinity will be represented by
NA
.
estimate
has a
names
attribute describing its elements.
mu
. Component
null.value
has a
names
attribute describing its elements.
alternative
:
"greater"
,
"less"
or
"two.sided"
.
x
and
y
.
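For example (a sketch; the data are random draws), a one-sided test produces a half-infinite interval whose infinite endpoint is reported as NA:

x <- rnorm(20, mean=1)
out <- t.test(x, alternative="greater")
out$conf.int
# The lower bound is finite; the upper bound, which is
# conceptually infinite, appears as NA.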
DETAILS

For the one-sample t-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the paired t-test, the null hypothesis is that the population mean of the difference x - y is equal to mu. For the two-sample t-tests, the null hypothesis is that the population mean for x minus that for y is mu.

The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", "two.sided").
A t-statistic has a t distribution if the underlying populations are normal, the variances are equal, and you set var.equal=TRUE. These conditions are never satisfied in practice. More importantly, the actual distribution is approximately a t distribution if the sample sizes are reasonably large, the distributions are not skewed, and you set var.equal=FALSE.

You should set var.equal=TRUE only if you have good reason to believe the variances are equal. You should beware of using var.test to provide that evidence, as the F test for comparing variances is not robust against non-normality. "To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" (Box 1953, p. 333).
The effect of skewness cancels out in a two-sample problem with equal sample sizes where the underlying populations have the same variance and skewness. In one-sample problems, or when sample sizes differ, the effect of skewness on the distribution of the t-statistic disappears very slowly as the sample size increases, at the rate O(1/sqrt(n)).
The t-test and the associated confidence interval are quite robust with respect to level toward heavy-tailed non-Gaussian distributions (e.g., data with outliers). However, the t-test is quite non-robust with respect to power, and the confidence interval is quite non-robust with respect to average length, toward these same types of distributions.
(a) One-Sample t-Test.
The arguments y, paired and var.equal determine the type of test. If y is NULL, a one-sample t-test is carried out with x. Here statistic is given by:

t = (mean(x) - mu) / ( sqrt(var(x)) / sqrt(length(x)) )

If x was drawn from a normal population, t has a t-distribution with length(x) - 1 degrees of freedom under the null hypothesis.
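The statistic can be checked against this formula directly (a sketch with illustrative random data):

x <- rnorm(12)
t.manual <- (mean(x) - 0) / (sqrt(var(x)) / sqrt(length(x)))
# t.manual equals t.test(x, mu=0)$statistic, with
# length(x) - 1 = 11 degrees of freedom.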
(b) Paired t-Test.
If y is not NULL and paired=TRUE, a paired t-test is performed; here statistic is defined through

t = (mean(d) - mu) / ( sqrt(var(d)) / sqrt(length(d)) )

where d is the vector of differences x - y. Under the null hypothesis, t follows a t-distribution with length(d) - 1 degrees of freedom, assuming normality of the differences d.
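Again the formula can be verified by hand (illustrative random data):

x <- rnorm(10)
y <- rnorm(10)
d <- x - y
t.manual <- (mean(d) - 0) / (sqrt(var(d)) / sqrt(length(d)))
# t.manual equals t.test(x, y, paired=T, mu=0)$statistic, with
# length(d) - 1 = 9 degrees of freedom.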
(c) Equal-Variance Two-Sample t-Test.
If y is not NULL and paired=FALSE, either a pooled-variance or an ordinary ("Welch modified") two-sample t-test is performed, depending on whether var.equal is TRUE or FALSE. For the pooled-variance t-test, statistic is

t = (mean(x) - mean(y) - mu) / s1,

with

s1 = sp * sqrt(1/nx + 1/ny),
sp = sqrt( ( (nx-1)*var(x) + (ny-1)*var(y) ) / (nx + ny - 2) ),
nx = length(x), ny = length(y).

Assuming that x and y come from normal populations with equal variances, t has a t-distribution with nx + ny - 2 degrees of freedom under the null hypothesis.
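As a check, the pooled statistic can be computed step by step (illustrative random data):

x <- rnorm(10)
y <- rnorm(12)
nx <- length(x)
ny <- length(y)
sp <- sqrt(((nx-1)*var(x) + (ny-1)*var(y)) / (nx + ny - 2))
s1 <- sp * sqrt(1/nx + 1/ny)
t.manual <- (mean(x) - mean(y) - 0) / s1
# t.manual equals t.test(x, y, var.equal=T)$statistic, with
# nx + ny - 2 = 20 degrees of freedom.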
(d) Welch Modified Two-Sample t-Test.
If y is not NULL, paired=FALSE and var.equal=FALSE, the Welch modified two-sample t-test is performed. In this case statistic is

t = (mean(x) - mean(y) - mu) / s2

with

s2 = sqrt( var(x)/nx + var(y)/ny ), nx = length(x), ny = length(y).

If x and y come from normal populations, the distribution of t under the null hypothesis can be approximated by a t-distribution with (non-integral) degrees of freedom

1 / ( (c^2)/(nx-1) + ((1-c)^2)/(ny-1) )

where c = var(x) / (nx * s2^2).
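The approximate degrees of freedom can likewise be computed by hand; the quantity c of the formula above is renamed cc below to avoid masking the function c() (illustrative random data):

x <- rnorm(10)
y <- rnorm(12)
nx <- length(x)
ny <- length(y)
s2 <- sqrt(var(x)/nx + var(y)/ny)
cc <- var(x) / (nx * s2^2)
df <- 1 / (cc^2/(nx-1) + (1-cc)^2/(ny-1))
# df matches the (non-integral) degrees of freedom reported by
# t.test(x, y), the Welch test, since var.equal=F by default.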
In all cases, if the distributions are not normal but sample sizes are large, then t-distributions hold approximately (under certain regularity conditions). However, large sample sizes are no help if using the pooled-variance test and the variances are not equal.
For each of the above tests, an expression for the related confidence interval (returned component conf.int) can be obtained in the usual way by inverting the expression for the test statistic. Note however that, as explained under the description of conf.int, the confidence interval will be half-infinite when alternative is not "two.sided"; infinity will be represented by NA.
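For instance, the two-sided one-sample interval is the sample mean plus or minus a t quantile times the standard error (a sketch with illustrative random data):

x <- rnorm(15)
n <- length(x)
se <- sqrt(var(x)/n)
ci <- mean(x) + c(-1, 1) * qt(0.975, n - 1) * se
# ci agrees with t.test(x)$conf.int at the default
# conf.level of .95.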
REFERENCES

Box, G. E. P. (1953). "Non-normality and Tests on Variances." Biometrika, 318-335.

Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
EXAMPLES

x <- rnorm(12)
t.test(x)
# Two-sided one-sample t-test. The null hypothesis is
# that the population mean for 'x' is zero. The
# alternative hypothesis states that it is either greater
# or less than zero. A confidence interval for the
# population mean will be computed.

data.before <- c(31, 20, 18, 17, 9, 8, 10, 7)
data.after <- c(18, 17, 14, 11, 10, 7, 5, 6)
t.test(data.after, data.before, alternative="less", paired=T)
# One-sided paired t-test. The null hypothesis is that
# the population mean "before" and the one "after" are
# the same, or equivalently that the mean change ("after"
# minus "before") is zero. The alternative hypothesis is
# that the mean "after" is less than the one "before",
# or equivalently that the mean change is negative. A
# confidence interval for the mean change will be
# computed.

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5)
t.test(x, y, conf.level=0.90)
# Two-sided two-sample t-test. The null
# hypothesis is that the population means for 'x' and 'y'
# are the same. The alternative hypothesis is that they
# are not. The confidence interval for the difference in
# true means ('x' minus 'y') will have a confidence level
# of 0.90.

t.test(x, y, var.equal=T, mu=2)
# Two-sided pooled-variance two-sample t-test.
# This assumes that the two population variances are equal.
# The null hypothesis is that the population mean for 'x'
# minus that for 'y' is 2.
# The alternative hypothesis is that this difference
# is not 2. A confidence interval for the true difference
# will be computed.