Hypergeometric Distribution

DESCRIPTION:

Density, cumulative probability, quantiles and random generation for the Hypergeometric discrete distribution.

USAGE:

dhyper(q, m, n, k, log = FALSE) 
phyper(q, m, n, k) 
qhyper(p, m, n, k) 
rhyper(nn, m, n, k, bigdata=F) 

REQUIRED ARGUMENTS:

q
vector or bdVector of quantiles. These are values of a random variable giving the number of red balls in a sample of size k drawn without replacement from an urn containing m red balls and n black balls.
p
vector or bdVector of probabilities. Missing values (NAs) are allowed. Values should lie between 0 and 1.
nn
sample size. If length(nn) is 1, nn random hypergeometrically distributed values are returned; if length(nn) is larger than 1, length(nn) random values are returned.
m
number of red balls in the urn. This could be a vector or bdVector with non-negative integer elements.
n
number of black balls in the urn. This could also be a vector or bdVector with non-negative integer elements.
k
number of balls drawn from an urn with m red and n black balls. This can be a vector or bdVector like m and n.

OPTIONAL ARGUMENTS:

bigdata
a logical value; if TRUE, an object of type bdVector is returned. Otherwise, a vector object is returned. This argument can be used only if the bigdata library section has been loaded.
log
a logical scalar; if TRUE, dhyper will return the log of the density, not the density itself.
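The log argument matters most when choose(N, k) is too large for floating-point arithmetic. The following minimal Python sketch is for illustration only and is not the S-PLUS implementation. It computes the log density through log-gamma, so no huge binomial coefficient has to be formed. The names dhyper_log and log_choose are hypothetical.

```python
from math import comb, exp, lgamma

def log_choose(a, b):
    """log(choose(a, b)) computed with log-gamma; assumes 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def dhyper_log(q, m, n, k):
    """Log of the hypergeometric density; -inf outside the support."""
    if q < max(0, k - n) or q > min(m, k):
        return float("-inf")
    return log_choose(m, q) + log_choose(n, k - q) - log_choose(m + n, k)

# Small case: exponentiating recovers the exact density 36/120 = 0.3.
exact = comb(4, 2) * comb(6, 5) / comb(10, 7)
print(exp(dhyper_log(2, 4, 6, 7)), exact)

# Large urn: choose(2000, 1000) exceeds the largest double (about 1.8e308),
# so a floating-point product of choose() values would overflow, but the
# log density is an ordinary float.
print(dhyper_log(500, 1000, 1000, 1000))
```

Outside the support, the sketch returns -inf, which is the log of a zero probability.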

VALUE:

dhyper returns probability values (log-probabilities if log = TRUE). The other functions return vectors or bdVectors of cumulative probabilities (phyper), quantiles (qhyper), or random samples (rhyper) from the Hypergeometric distribution.
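This help file does not define how qhyper inverts the cumulative probabilities of a discrete distribution. The following Python sketch assumes the usual convention: the result is the smallest q in the support with phyper(q) >= p. It shows how the three deterministic functions relate under that assumption. The Python functions only mirror the S-PLUS names and are not the S-PLUS code.

```python
from math import comb

def dhyper(q, m, n, k):
    """P(X = q) for X ~ Hypergeometric(m red, n black, k drawn)."""
    if q < max(0, k - n) or q > min(m, k):
        return 0.0
    return comb(m, q) * comb(n, k - q) / comb(m + n, k)

def phyper(q, m, n, k):
    """P(X <= q), summing the density over the support up to q."""
    return sum(dhyper(x, m, n, k) for x in range(max(0, k - n), q + 1))

def qhyper(p, m, n, k):
    """Smallest q in the support with phyper(q) >= p (assumed convention)."""
    lo, hi = max(0, k - n), min(m, k)
    for q in range(lo, hi + 1):
        # A small tolerance guards against rounding in the running sum.
        if phyper(q, m, n, k) >= p - 1e-12:
            return q
    return hi

print([qhyper(p, 4, 6, 7) for p in (0.0, 0.25, 0.5, 0.9, 1.0)])  # [1, 2, 3, 4, 4]
```

With m = 4, n = 6, k = 7 the support is 1 to 4, and the cumulative probabilities are 4/120, 40/120, 100/120 and 1. Those values give the printed quantiles.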

SIDE EFFECTS:

The function rhyper creates the dataset .Random.seed if it does not already exist; otherwise, its value is updated.

DETAILS:

Missing values (NAs) and +-Infs are allowed as components of q, p, or nn, but not in the vectors or bdVectors of parameters. If q, m, n, or k are vectors or bdVectors of different lengths, each is replicated cyclically to the length of the longest. The values of q, m, n, and k are rounded to the nearest integer before any calculations are made.
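The recycling and rounding rules can be sketched in Python as shown below. The helpers recycle and dhyper_vec are hypothetical names for illustration only. Python's round() sends exact halves to the even neighbour, and this help file does not say how S-PLUS breaks ties. NA and Inf handling is omitted.

```python
from itertools import cycle, islice
from math import comb

def recycle(*args):
    """Replicate each argument cyclically to the length of the longest one.
    Scalars are treated as length-1 vectors."""
    vecs = [list(a) if isinstance(a, (list, tuple)) else [a] for a in args]
    longest = max(len(v) for v in vecs)
    return [list(islice(cycle(v), longest)) for v in vecs]

def dhyper_vec(q, m, n, k):
    """Vectorized density with cyclic recycling and rounding (a sketch)."""
    out = []
    for qi, mi, ni, ki in zip(*recycle(q, m, n, k)):
        # Round every value to the nearest integer first (ties go to even
        # in Python, which is an assumption about the tie rule).
        qi, mi, ni, ki = (round(x) for x in (qi, mi, ni, ki))
        if max(0, ki - ni) <= qi <= min(mi, ki):
            out.append(comb(mi, qi) * comb(ni, ki - qi) / comb(mi + ni, ki))
        else:
            out.append(0.0)          # outside the support
    return out

# Mirrors dhyper(rep(3,3), m=c(5,8,12), n=4, k=4): n and k are recycled.
print(dhyper_vec([3, 3, 3], [5, 8, 12], 4, 4))
# 2.6 is rounded to 3 before the density is evaluated.
print(dhyper_vec(2.6, 4, 6, 7) == dhyper_vec(3, 4, 6, 7))
```

The three densities in the first call are 40/126, 224/495 and 880/1820.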

BACKGROUND:

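The Hypergeometric distribution can be described by an urn model with m red and n black balls, from which k balls are drawn without replacement. Every sequence of k draws yielding q red and k-q black balls has the same probability. The distribution is similar to the Binomial distribution, but the sample is drawn without replacement from a finite population. Drawing with replacement instead gives a Binomial count with success probability m/(m+n).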
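The urn model can be simulated directly. The Python sketch below uses the standard library random module, not the S-PLUS generator. It draws k balls without replacement and compares the observed frequencies with the exact density. It also draws the same number of balls with replacement to show the Binomial contrast. The helper names rhyper_once and rbinom_once are hypothetical.

```python
import random
from math import comb

def rhyper_once(m, n, k, rng):
    """Draw k balls without replacement from m red and n black; count the red."""
    urn = [1] * m + [0] * n          # 1 = red, 0 = black
    return sum(rng.sample(urn, k))

def rbinom_once(m, n, k, rng):
    """Same urn, but each ball is put back before the next draw."""
    return sum(rng.random() < m / (m + n) for _ in range(k))

def dhyper(q, m, n, k):
    """Exact density choose(m, q) * choose(n, k-q) / choose(m+n, k)."""
    if q < max(0, k - n) or q > min(m, k):
        return 0.0
    return comb(m, q) * comb(n, k - q) / comb(m + n, k)

rng = random.Random(1)               # fixed seed so the run is reproducible
m, n, k, reps = 4, 6, 7, 20000
without = [rhyper_once(m, n, k, rng) for _ in range(reps)]
with_repl = [rbinom_once(m, n, k, rng) for _ in range(reps)]

# Columns: q, simulated frequency, exact density, with-replacement frequency.
for q in range(0, k + 1):
    print(q, without.count(q) / reps, round(dhyper(q, m, n, k), 4),
          with_repl.count(q) / reps)
```

Without replacement, the count stays within max(0, k-n) to min(m, k), which is 1 to 4 here. With replacement, the count can reach k = 7.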

A hypergeometric variable also arises from a 2 by 2 table with row marginal totals m and n and column marginal totals k and N-k, where N = m+n is the grand total. If the cell counts are independent Poisson variables with no association between rows and columns, then the count in the upper left cell, conditional on these margins, is hypergeometric. By symmetry between rows and columns, phyper(q, m, n, k) = phyper(q, k, N-k, m).
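The row/column symmetry can be checked with exact rational arithmetic. The Python sketch below uses fractions.Fraction so that no rounding is involved. It compares phyper(q, m, n, k) with phyper(q, k, N-k, m) over every small urn up to a chosen size. The function names mirror the S-PLUS ones, but symmetry_holds is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def dhyper(q, m, n, k):
    """Exact density as a Fraction; zero outside max(0, k-n) <= q <= min(m, k)."""
    if q < max(0, k - n) or q > min(m, k):
        return Fraction(0)
    return Fraction(comb(m, q) * comb(n, k - q), comb(m + n, k))

def phyper(q, m, n, k):
    """P(X <= q) as an exact Fraction."""
    return sum((dhyper(x, m, n, k) for x in range(0, q + 1)), Fraction(0))

def symmetry_holds(max_balls):
    """Check phyper(q, m, n, k) == phyper(q, k, N-k, m) on a small grid."""
    for m in range(max_balls + 1):
        for n in range(max_balls + 1):
            N = m + n
            for k in range(N + 1):
                for q in range(-1, N + 2):
                    if phyper(q, m, n, k) != phyper(q, k, N - k, m):
                        return False
    return True

print(symmetry_holds(6))                          # True
print(phyper(3, 4, 6, 7), phyper(3, 7, 3, 4))     # 5/6 5/6
```

The second line reproduces the pair phyper(0:5,4,6,7) and phyper(0:5,7,3,4) from the examples at q = 3.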

The range of the distribution is max(0, k-n) <= q <= min(m, k), and the density is

p(q, m, n, k) = choose(m, q) * choose(n, k-q) / choose(N, k).

The expected value is m * k / N, and the variance is m * n * k * (N-k) / (N^2 * (N-1)).
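The range, density, mean, and variance formulas can be verified by brute force. The Python sketch below sums the density exactly over the support with fractions.Fraction. It then compares the total, mean, and variance with the closed forms for every urn with 2 <= N <= 12. The helper names moments and formulas_hold are hypothetical.

```python
from fractions import Fraction
from math import comb

def moments(m, n, k):
    """Exact total probability, mean and variance by summing over the support."""
    N = m + n
    support = range(max(0, k - n), min(m, k) + 1)
    probs = {q: Fraction(comb(m, q) * comb(n, k - q), comb(N, k)) for q in support}
    total = sum(probs.values())
    mean = sum(q * p for q, p in probs.items())
    var = sum((q - mean) ** 2 * p for q, p in probs.items())
    return total, mean, var

def formulas_hold(max_balls):
    """Compare the sums with m*k/N and m*n*k*(N-k)/(N^2*(N-1))."""
    for m in range(max_balls + 1):
        for n in range(max_balls + 1):
            N = m + n
            if N < 2:
                continue            # the variance formula divides by N-1
            for k in range(N + 1):
                total, mean, var = moments(m, n, k)
                if total != 1 or mean != Fraction(m * k, N):
                    return False
                if var != Fraction(m * n * k * (N - k), N**2 * (N - 1)):
                    return False
    return True

print(moments(4, 6, 7))      # (Fraction(1, 1), Fraction(14, 5), Fraction(14, 25))
print(formulas_hold(6))      # True
```

For m = 4, n = 6, k = 7, the closed forms give 28/10 = 14/5 and 504/900 = 14/25. These agree with the sums.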

For details on the uniform random number generator implemented in S-PLUS, see the set.seed help file.



EXAMPLES:

cumsum(dhyper(0:5,4,6,7))   # cumulative distribution function 
phyper(0:5,4,6,7)           # same thing
phyper(0:5,7,3,4)           # same thing, by symmetry of rows and columns 
rhyper(10,4,6,7)            # 10 random values
dhyper(rep(3,3), m=c(5,8,12), n=4, k=4)   # P(q = 3) for three urns; n and k are recycled