Student's t-Tests

DESCRIPTION:

Performs a one-sample, two-sample, or paired t-test.

USAGE:

t.test(x, y=NULL, alternative="two.sided", mu=0, paired=F, 
        var.equal=F, conf.level=.95, treatment)
 

REQUIRED ARGUMENTS:

x
numeric vector. NAs and Infs are allowed but will be removed.

OPTIONAL ARGUMENTS:

y
numeric vector. NAs and Infs are allowed but will be removed. If paired=TRUE, then x and y must have the same length, and observation pairs (x[i], y[i]) with at least one NA or Inf will be removed.
alternative
character string, one of "greater", "less" or "two.sided", or just the initial letter of each, indicating the specification of the alternative hypothesis. For the one-sample and paired t-tests, alternative refers to the true mean of the parent population in relation to the hypothesized value mu. For two-sample t-tests, alternative refers to the difference between the true population mean for x and that for y, in relation to mu.
mu
a single number representing the value of the mean or difference in means specified by the null hypothesis.
paired
logical flag: if TRUE, x and y are considered as paired vectors.
var.equal
logical flag: if TRUE, the variances of the parent populations of x and y are assumed equal. Argument var.equal should be supplied only for the two-sample (i.e., unpaired) tests.
conf.level
confidence level for the returned confidence interval, restricted to lie between zero and one.
treatment
a vector the same length as x, with two unique values. This is a grouping variable used to split x into two samples. If supplied then y should not be used. The t-statistic numerator is mean1-mean2, where mean1 corresponds to the first unique value in treatment.
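
The way treatment splits x can be sketched as follows. This is only an illustration, not the function's internal code: the data are made up, the names groups, sample1, sample2 and numerator are hypothetical, and it assumes that "first unique value" means the first value to appear in treatment, as returned by unique().

```r
# Illustrative data: six observations in two groups labelled "A" and "B".
x         <- c(5.2, 6.1, 4.8, 6.5, 5.0, 6.3)
treatment <- c("A", "B", "A", "B", "A", "B")

# unique() keeps values in order of first appearance (assumed convention).
groups  <- unique(treatment)
sample1 <- x[treatment == groups[1]]   # observations for the first value
sample2 <- x[treatment == groups[2]]   # observations for the second value

# Numerator of the t-statistic: mean1 - mean2.
numerator <- mean(sample1) - mean(sample2)
```

Under this reading, reordering the data so that "B" appears first would flip the sign of the numerator.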

VALUE:

A list of class "htest", containing the following components:
statistic
the t-statistic, with names attribute "t".
parameters
the degrees of freedom of the t-distribution associated with statistic. Component parameters has names attribute "df".
p.value
the p-value for the test.
conf.int
a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level. When alternative is not "two.sided", the confidence interval will be half-infinite, to reflect the interpretation of a confidence interval as the set of all values k for which one would not reject the null hypothesis that the true mean or difference in means is k. Here infinity will be represented by NA.
estimate
vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements.
null.value
the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu. Component null.value has a names attribute describing its elements.
alternative
records the value of the input argument alternative: "greater", "less" or "two.sided".
method
character string giving the name of the test used.
data.name
a character string (vector of length 1) containing the actual names of the input vectors x and y.

NULL HYPOTHESIS:

For the one-sample t-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the paired t-test, the null hypothesis is that the population mean of the difference x - y is equal to mu. For the two-sample t-tests, the null hypothesis is that the population mean for x minus that for y is mu.

The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", "two.sided").
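
As a rough sketch of how the three alternatives translate into p-values, suppose a t-statistic and its degrees of freedom have already been computed as described under DETAILS. The values of tstat and df below are made up for illustration, and the variable names are hypothetical; this is the standard textbook construction rather than a transcript of the function's code.

```r
tstat <- 2.1   # illustrative t-statistic
df    <- 10    # illustrative degrees of freedom

# alternative="greater": large positive t is evidence against the null.
p.greater <- 1 - pt(tstat, df)

# alternative="less": large negative t is evidence against the null.
p.less <- pt(tstat, df)

# alternative="two.sided": large |t| in either direction.
p.two.sided <- 2 * pt(-abs(tstat), df)
```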

TEST ASSUMPTIONS:

A t-statistic has an exact t-distribution if the underlying populations are normal, the variances are equal, and you set var.equal=TRUE. These conditions are never satisfied exactly in practice. More importantly, the actual distribution is approximately a t-distribution if the sample sizes are reasonably large, the distributions are not skewed, and you set var.equal=FALSE.

You should set var.equal=TRUE only if you have good reason to believe the variances are equal. You should beware of using var.test to provide that evidence, as the F test for comparing variances is not robust against non-normality. "To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" (Box, 1953, p. 333).

The effect of skewness cancels out in a two-sample problem with equal sample sizes where the underlying populations have the same variance and skewness. In one-sample problems, or when sample sizes differ, the effect of skewness on the distribution of the t-statistic disappears very slowly as the sample size increases, at the rate O(1/sqrt(n)).

The t-test and the associated confidence interval are quite robust with respect to level toward heavy-tailed non-Gaussian distributions (e.g., data with outliers). However, the t-test is quite non-robust with respect to power, and the confidence interval is quite non-robust with respect to average length, toward these same types of distributions.

DETAILS:

(a) One-Sample t-Test.

The arguments y, paired and var.equal determine the type of test. If y is NULL, a one-sample t-test is carried out with x. Here statistic is given by:

t = (mean(x) - mu) / ( sqrt(var(x)) / sqrt(length(x)) )

If x was drawn from a normal population, t has a t-distribution with length(x) - 1 degrees of freedom under the null hypothesis.
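
The formula above can be checked by hand with basic functions. The following is a minimal sketch on made-up data; the names n, tstat, df and p.two.sided are illustrative only.

```r
# Illustrative data and hypothesized mean.
x  <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
mu <- 5

n     <- length(x)
tstat <- (mean(x) - mu) / (sqrt(var(x)) / sqrt(n))
df    <- n - 1

# Two-sided p-value from the t-distribution with df degrees of freedom.
p.two.sided <- 2 * pt(-abs(tstat), df)
```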

(b) Paired t-Test.

If y is not NULL and paired=TRUE, a paired t-test is performed; here statistic is defined through

t = (mean(d) - mu) / ( sqrt(var(d)) / sqrt(length(d)) )

where d is the vector of differences x - y. Under the null hypothesis, t follows a t-distribution with length(d) - 1 degrees of freedom, assuming normality of the differences d.
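
A hand computation of the paired statistic, including the removal of incomplete pairs, might look like the sketch below. The data are made up, the names keep, d, tstat and df are illustrative, and for brevity only NAs are filtered here (the function itself is documented as also removing Infs).

```r
# Illustrative paired data; NA marks a missing value.
x  <- c(12.1, 11.4, NA, 13.0, 12.7, 11.9)
y  <- c(11.8, 11.0, 12.2, 12.1, NA, 11.5)
mu <- 0

# Drop every pair in which either member is missing.
keep <- !is.na(x) & !is.na(y)
d    <- x[keep] - y[keep]

tstat <- (mean(d) - mu) / (sqrt(var(d)) / sqrt(length(d)))
df    <- length(d) - 1
```

Note that a missing value in either vector costs the whole pair, so the degrees of freedom are based on the number of complete pairs.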

(c) Equal-Variance Two-Sample t-Test.

If y is not NULL and paired=FALSE, either a pooled-variance or a Welch modified two-sample t-test is performed, depending on whether var.equal is TRUE or FALSE, respectively. For the pooled-variance t-test, statistic is

t = (mean(x) - mean(y) - mu) / s1,

with
s1 = sp * sqrt(1/nx + 1/ny),
sp = sqrt( ( (nx-1)*var(x) + (ny-1)*var(y) ) / (nx + ny - 2) ),
nx = length(x),  ny = length(y).

Assuming that x and y come from normal populations with equal variances, t has a t-distribution with nx + ny - 2 degrees of freedom under the null hypothesis.
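
The pooled-variance quantities can be computed step by step, as in this sketch. It reuses the two samples from the EXAMPLES section; the variable names simply mirror the symbols in the formulas above.

```r
x  <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y  <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
mu <- 0

nx <- length(x)
ny <- length(y)

# Pooled standard deviation and standard error of the difference in means.
sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
s1 <- sp * sqrt(1/nx + 1/ny)

tstat <- (mean(x) - mean(y) - mu) / s1
df    <- nx + ny - 2
```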

(d) Welch Modified Two-Sample t-Test.

If y is not NULL, paired=FALSE and var.equal=FALSE, the Welch modified two-sample t-test is performed. In this case statistic is

t = (mean(x) - mean(y) - mu) / s2

with
s2 = sqrt( var(x)/nx + var(y)/ny ),
nx = length(x),  ny = length(y).

If x and y come from normal populations, the distribution of t under the null hypothesis can be approximated by a t-distribution with (non-integral) degrees of freedom
1 / ( (c^2)/(nx-1) + ((1-c)^2)/(ny-1) )

where
c = var(x) / (nx * s2^2).
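
A sketch of the Welch computation on the two samples from the EXAMPLES section is given below. The variable cc stands for the quantity c in the formula above; it is renamed here only to avoid confusion with the function c().

```r
x  <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y  <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
mu <- 0

nx <- length(x)
ny <- length(y)

# Standard error without pooling the two variances.
s2    <- sqrt(var(x)/nx + var(y)/ny)
tstat <- (mean(x) - mean(y) - mu) / s2

# Approximate (generally non-integral) degrees of freedom.
cc <- var(x) / (nx * s2^2)
df <- 1 / (cc^2/(nx - 1) + (1 - cc)^2/(ny - 1))
```

The resulting df always lies between min(nx, ny) - 1 and nx + ny - 2, so it never exceeds the pooled-variance degrees of freedom.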

In all cases, if the distributions are not normal but sample sizes are large, then t-distributions hold approximately (under certain regularity conditions). However, large sample sizes are no help if using the pooled-variance test and the variances are not equal.

CONFIDENCE INTERVALS:

For each of the above tests, an expression for the related confidence interval (returned component conf.int) can be obtained in the usual way by inverting the expression for the test statistic. Note however that, as explained under the description of conf.int, the confidence interval will be half-infinite when alternative is not "two.sided"; infinity will be represented by NA.
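
For the one-sample case, the inversion can be sketched as follows. The data are made up, the names se, ci.two.sided, ci.greater and ci.less are illustrative, and the use of NA for the unbounded end simply follows the convention described above.

```r
x          <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)   # illustrative data
conf.level <- 0.95
alpha      <- 1 - conf.level

n  <- length(x)
df <- n - 1
se <- sqrt(var(x)) / sqrt(n)

# alternative="two.sided": symmetric interval around the sample mean.
ci.two.sided <- mean(x) + c(-1, 1) * qt(1 - alpha/2, df) * se

# alternative="greater": lower bound only; the upper end is infinite (NA).
ci.greater <- c(mean(x) - qt(conf.level, df) * se, NA)

# alternative="less": upper bound only; the lower end is infinite (NA).
ci.less <- c(NA, mean(x) + qt(conf.level, df) * se)
```

The paired and two-sample intervals follow the same pattern, with mean(x) replaced by the relevant estimate and se and df replaced by the standard error and degrees of freedom of the corresponding test.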

REFERENCES:

Box, G. E. P. (1953). "Non-normality and Tests on Variances." Biometrika, 40, 318-335.

Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.

Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

EXAMPLES:

x <- rnorm(12) 
t.test(x)                  
        # Two-sided one-sample t-test. The null hypothesis is  
        # that the population mean for 'x' is zero. The 
        # alternative hypothesis states that it is either greater 
        # or less than zero. A confidence interval for the 
        # population mean will be computed. 
 
data.before <- c(31, 20, 18, 17, 9, 8, 10, 7) 
data.after <- c(18, 17, 14, 11, 10, 7, 5, 6) 
t.test(data.after, data.before, alternative="less", paired=T)
        # One-sided paired t-test. The null hypothesis is that 
        # the population mean "before" and the one "after" are 
        # the same, or equivalently that the mean change ("after"
        # minus "before") is zero. The alternative hypothesis is 
        # that the mean "after" is less than the one "before", 
        # or equivalently that the mean change is negative. A 
        # confidence interval for the mean change will be
        # computed.

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8) 
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5) 
t.test(x, y, conf.level=0.90)
        # Two-sided Welch modified two-sample t-test (the default,
        # since var.equal=F). The null hypothesis is that the
        # population means for 'x' and 'y' are the same. The
        # alternative hypothesis is that they are not. The
        # confidence interval for the difference in true means
        # ('x' minus 'y') will have a confidence level of 0.90.
 
t.test(x, y, mu=2, var.equal=T)
        # Two-sided pooled-variance two-sample t-test.
        # This assumes that the two population variances are equal.
        # The null hypothesis is that the population mean for 'x' 
        # minus that for 'y' is 2.
        # The alternative hypothesis is that this difference
        # is not 2. A confidence interval for the true difference
        # will be computed.