ks.gof(x, y = NULL, alternative = "two.sided", distribution = "normal", ...)
NAs and Infs are allowed in both x and y, but will be removed.
alternative can be one of "greater", "less" or "two.sided", or just the initial letter of each. Two exceptions are the normal and exponential distributions when the parameters are estimated; in this case, only the "two.sided" alternative is tested.
For the one-sample KS test, alternative refers to the relation between the empirical distribution function of x and the hypothesized distribution. Usually, you must supply the parameters of the hypothesized distribution; the only exceptions are the parameters of the normal and exponential distributions, which may be estimated from the data, in which case alternative can only be "two.sided".
For the two-sample KS test, alternative refers to the relationship between the empirical distribution function of x and that of y.
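For instance, a one-sided two-sample call might look like the following sketch ("g" is the initial-letter abbreviation of "greater"):

# Sketch: two-sample test with a one-sided alternative, using the
# initial-letter abbreviation "g" for "greater".
x <- rnorm(50)
y <- rnorm(50, mean = 0.5)
ks.gof(x, y, alternative = "g")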
distribution can be one of: "normal", "beta", "cauchy", "chisquare", "exponential", "f", "gamma", "lognormal", "logistic", "t", "uniform", "weibull", "binomial", "geometric", "hypergeometric", "negbinomial", "poisson", or "wilcoxon". If y is specified, this argument is ignored. You need only supply the first characters that uniquely specify the distribution name. For example, "logn" and "logi" uniquely specify the lognormal and logistic distributions.
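As a sketch, an abbreviated name can be combined with the parameters of the hypothesized distribution, which are passed through the "..." argument (the parameter names below, meanlog and sdlog, are those of the usual lognormal probability functions and are an assumption here):

# Sketch: abbreviated distribution name with explicit parameters.
z <- rlnorm(50, meanlog = 0, sdlog = 1)
ks.gof(z, distribution = "logn", meanlog = 0, sdlog = 1)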
an object of class "htest", containing the components of the test. The test statistic has the names attribute "ks", and the estimate component has a names attribute describing its elements.
Let G(x) denote a distribution function.
For the one-sample KS test, the null hypothesis corresponding to the "two.sided" ("less", "greater") alternative is that G(x) equals (is greater than, is less than) the true distribution function of x.
Usually the parameters of G(x) must be specified.
For the normal and exponential distributions,
the parameters may be estimated,
in which case the null hypothesis is composite: that x is drawn from one of a family of normal (or exponential) distributions.
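A sketch of the two cases follows (it assumes, as in the EXAMPLES section, that omitting the parameters requests estimation, and that the normal parameters are passed as mean and sd):

# Sketch: composite versus simple null for the normal distribution.
z <- rnorm(100, mean = 5, sd = 2)
ks.gof(z, distribution = "normal")                     # composite (Lilliefors-type) null: parameters estimated
ks.gof(z, distribution = "normal", mean = 5, sd = 2)   # simple null: parameters supplied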
For the two-sample KS test, the null hypothesis is that the true distribution functions of x and y are equal. The alternative hypothesis in the one-sample test is that the true distribution function of x is not equal to ("two.sided"), is less than ("less"), or is greater than ("greater") the hypothesized distribution function. For the two-sample test, the alternative hypothesis is that the true distribution functions of x and y are not equal.
Whenever ks.gof performs only the two-sided test, you may approximate the one-sided probability by half the two-sided probability. This approximation is good for small p-values (say, less than 0.10), but is unreliable for large p-values.
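For example (a sketch; p.value is assumed to be the usual "htest" component name):

# Sketch: halve the two-sided p-value to approximate a one-sided
# p-value for a composite null. Trust the result only when it is
# small, roughly below 0.10.
fit <- ks.gof(rexp(200), distribution = "exponential")
p.one.sided <- fit$p.value / 2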
In the one-sample problem, the test statistic corresponding to the "two.sided" alternative is the greatest absolute vertical distance between the empirical distribution function of x and the hypothesized distribution function, each evaluated at the sample points. The test statistic corresponding to the "less" alternative is the greatest vertical distance attained by the hypothesized distribution function over the empirical distribution function of x. The test statistic corresponding to the "greater" alternative is just the opposite: the greatest vertical distance attained by the empirical distribution function of x over the hypothesized distribution function. For the two-sample problem, the test statistics are defined by replacing the hypothesized distribution function above with the empirical distribution function of y.
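These definitions can be checked by hand; the sketch below assumes a fully specified standard normal null and that the parameters are passed as mean and sd:

# Sketch: one-sample statistics computed directly from the definitions above.
z <- sort(rnorm(25))
n <- length(z)
G <- pnorm(z)                     # hypothesized distribution function at the sample points
edf.right <- (1:n) / n            # EDF just to the right of each observation
edf.left  <- (0:(n - 1)) / n      # EDF just to the left of each observation
D.greater <- max(edf.right - G)   # "greater": EDF above the hypothesized d.f.
D.less    <- max(G - edf.left)    # "less": hypothesized d.f. above the EDF
D.two     <- max(D.greater, D.less)                    # "two.sided" statistic
ks.gof(z, distribution = "normal", mean = 0, sd = 1)   # should agree on the statistic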
The data are assumed to be measured on at least an ordinal scale. The one sample test further assumes that x is a random sample. The two sample test further assumes that the samples are mutually independent random samples. Note: the two sample test gives exact results only if the underlying distributions are continuous.
A variety of algorithms are used to calculate p-values.
Some of these calculate analytic or asymptotic approximations.
For the one-sample test, the choice of algorithm depends on the alternative tested, the sample size, and whether the hypothesized distribution is continuous.
A brief list of the algorithms used, and their references, follows.
For continuous distributions and small sample sizes, the algorithm of
Pomeranz (1973) is used to test the two-sided alternative, and the Birnbaum and Tingey (1951) algorithm is used to test the other alternatives.
For large samples, Smirnov (1948) derives the distribution of the test statistic for the two-sided alternative; for the other alternatives, a simple transformation of the test statistic is asymptotically chi-squared with 2 degrees of freedom (Kendall and Stuart, 1979).
For discontinuous distributions and small sample sizes, Conover (1980) gives the exact distribution of the test statistics for the one-sided alternatives. The sum of these one-sided p-values approximates the p-value of the test statistic for the two-sided alternative. This approximation is close to the true critical level in most cases, and the error is conservative.
There are no p-values calculated for large samples from discontinuous distributions. Instead, we recommend using the chi-square test (the S-PLUS function chisq.gof).
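For example, a large discrete sample might be handled as sketched below (the chisq.gof call form is assumed to parallel ks.gof; consult its help file for the exact arguments):

# Sketch: large sample from a discrete (Poisson) distribution,
# tested with chisq.gof rather than ks.gof.
k <- rpois(1000, lambda = 4)
chisq.gof(k, distribution = "poisson", lambda = 4)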
Dallal and Wilkinson (1986) give an analytic approximation to the p-values for the statistic testing composite normality against the two-sided alternative. They derive this approximation for p-values less than 0.10. Stephens (1970) outlines a procedure for composite exponentiality. This approximation is accurate to two significant figures: in the two-sided test, for p-values less than 0.38; in one-sided tests, for p-values less than 0.26. For any p-value outside the range of good approximation, ks.gof sets the p-value to 0.5 and issues a warning message.
Finally, Kim and Jennrich (1973) give an algorithm to calculate the exact distribution of the two-sided two-sample test statistic for various sample sizes. In other cases, they recommend approximations using Smirnov's asymptotic distribution, after applying continuity corrections to the scaled test statistic.
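For reference, Smirnov's asymptotic two-sided tail probability has the familiar series form, sketched here without the continuity corrections applied by the function:

# Sketch: asymptotic two-sided tail probability for the scaled
# two-sample statistic d = sqrt(n*m/(n+m)) * D:
#   P ~ 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 * k^2 * d^2)
smirnov.tail <- function(d, k.max = 100) {
    k <- 1:k.max
    2 * sum((-1)^(k - 1) * exp(-2 * k^2 * d^2))
}
smirnov.tail(1.36)    # about 0.05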
Birnbaum, Z. W. and Tingey, F. H. (1951).
One-sided confidence contours for probability distribution functions.
Annals of Mathematical Statistics
22, 592-596.
Conover, W. J. (1980).
Practical Nonparametric Statistics.
New York: John Wiley and Sons, Chapter 6.
Dallal, G. E. and Wilkinson, L. (1986).
An analytic approximation to the distribution of Lilliefors' test for normality.
The American Statistician
40, 294-296.
Kendall, M. G. and Stuart, A. (1979).
The Advanced Theory of Statistics, Vol. 2: Inference and Relationship
(4th edition).
New York: Oxford University Press.
Kim, P. J. and Jennrich, R. I. (1973).
Tables of the exact sampling distribution of the two sample
Kolmogorov-Smirnov criterion.
In
Selected Tables in Mathematical Statistics, Vol. 1.
H. L. Harter and D. B. Owen, eds.
Providence, Rhode Island: American Mathematical Society.
Pomeranz, J. (1973).
Exact cumulative distribution of the Kolmogorov-Smirnov statistic
for small samples (Algorithm 487).
Collected Algorithms from CACM.
Smirnov, N. V. (1948).
Table for estimating the goodness of fit of empirical distributions.
Annals of Mathematical Statistics
19, 279-281.
Stephens, M. A. (1970).
Use of the Kolmogorov-Smirnov, Cramer-von Mises and
Related Statistics Without Extensive Tables.
Journal of the Royal Statistical Society, Series B,
32, 115-122.
Stephens, M. A. (1986).
Tests based on EDF statistics.
In
Goodness-of-Fit Techniques.
D'Agostino, R. B. and Stephens, M. A., eds.
New York: Marcel Dekker.
# one sample
z <- rnorm(100)
ks.gof(z, distribution = "normal")              # hypothesize a normal distn.
ks.gof(z, distribution = "chisquare", df = 2)   # hypothesize a chisquare distn.

# two sample
x <- rnorm(90)
y <- rnorm(8, mean = 2.0, sd = 1)
ks.gof(x, y)