pDiscreteMean(q, values, size = <<see below>>, weights = NULL, group = NULL, conv.factor = 0, ...)
qDiscreteMean(p, values, size = <<see below>>, weights = NULL, group = NULL, conv.factor = 0, ...)
dDiscreteMean(x, values, size = <<see below>>, weights = NULL, group = NULL, conv.factor = 0, ...)
saddlepointP(tau, L, size = <<see below>>, weights = NULL, group = NULL, mean = T, conv.factor = 0)
saddlepointD(tau, L, size = <<see below>>, weights = NULL, group = NULL, mean = T, conv.factor = 0)
saddlepointPSolve(p, L, size = <<see below>>, weights = NULL, group = NULL, mean = T, conv.factor = 0, initial, tol = 1E-6, tol.tau = tol, maxiter = 100)
size: number of observations in the sample; the default is n (the length of L). See "DETAILS", below.
weights: vector of length n; if supplied then sampling is with these (unequal) probabilities on the values in L.
group: vector of length n indicating stratified sampling or multiple-group problems; unique values of this vector determine the groups. In the current implementation, only one of group and size may be supplied.
mean: logical; if TRUE then calculations are for the sample mean, or the sum of sample means for groups. If FALSE, then calculations are for the sample sum, or the sum of group sample sums.
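initial: vector of the same length as p; initial values used in iteratively solving for tau.
tol: tolerance for solving for tau, on the scale of p.
tol.tau: tolerance for solving for tau, on the scale of tau.
maxiter: maximum number of iterations for finding values of tau which bracket the solution for each p (after the root is bracketed, additional iterations may be performed).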
...: for pDiscreteMean and dDiscreteMean these are any other arguments acceptable to tiltMeanSolve. For qDiscreteMean, these are any other arguments acceptable to saddlepointPSolve.
Density (dDiscreteMean and saddlepointD), probability (pDiscreteMean and saddlepointP), quantile (qDiscreteMean), or saddlepoint tilting parameter (saddlepointPSolve) for the mean or sum of random values from a discrete distribution.
The output is a vector of the same length as the primary input (p, q, x, or tau).
"density" is a misnomer, as the distribution is not continuous. However, if the values in the discrete distribution are themselves drawn from a continuous distribution, then this distribution is practically continuous (Hall 1986); the "density" is the density for a continuous approximation to the distribution.
Suppose that Y is the mean of size observations sampled with replacement from L. Then (tiltMean(tau, L), saddlepointP(tau, L, size)) are parametric equations in tau that trace the saddlepoint estimate of the cumulative distribution function of Y.
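The parametric construction can be illustrated outside the library. The following is a minimal Python sketch, not the library's implementation: it assumes equal weights, takes tau to be the exponential tilting parameter applied to each observation (the library's scaling of tau may differ), and uses the plain r* form Phi(r + log(u/r)/r) without the modification near the center. The names cgf_terms and r_star_cdf are hypothetical.

```python
import math

def cgf_terms(tau, values):
    """Return K(tau), K'(tau), K''(tau) for one draw from equally weighted values."""
    n = len(values)
    shift = max(tau * v for v in values)           # stabilise the exponentials
    e = [math.exp(tau * v - shift) for v in values]
    total = sum(e)
    k0 = math.log(total / n) + shift               # cumulant generating function
    k1 = sum(v * w for v, w in zip(values, e)) / total            # tilted mean
    k2 = sum((v - k1) ** 2 * w for v, w in zip(values, e)) / total  # tilted variance
    return k0, k1, k2

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def r_star_cdf(tau, values, size=None):
    """r* estimate of P(mean of `size` draws <= tilted mean at tau); tau != 0."""
    if tau == 0:
        raise ValueError("tau must be nonzero: r and u both vanish at the center")
    size = len(values) if size is None else size
    k0, k1, k2 = cgf_terms(tau, values)
    r = math.copysign(math.sqrt(max(0.0, 2.0 * size * (tau * k1 - k0))), tau)
    u = tau * math.sqrt(size * k2)
    return normal_cdf(r + math.log(u / r) / r)

# Trace the curve: x-coordinate is the tilted mean, y-coordinate the cdf estimate.
values = [0.2, 0.5, 0.9, 1.4, 2.1, 3.0]
taus = [-1.0, -0.5, -0.25, 0.25, 0.5, 1.0]
curve = [(cgf_terms(t, values)[1], r_star_cdf(t, values)) for t in taus]
for x, p in curve:
    print(round(x, 4), round(p, 4))
```

Each tau yields one point on the estimated cdf, so no root finding is needed to draw the whole curve.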
If group is supplied, then calculations are for the distribution of the sum of group means, or the sum of group sums. Arbitrary sample sizes within groups are not supported. In the sum of group means case, the tilting parameter used for group g is tau / (n[g]/n), where n[g] is the number of values in group g; this is consistent with tiltMean.
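As a small illustration of the per-group scaling tau / (n[g]/n), the Python sketch below computes the tilting parameter for each group from a vector of group labels. The function name group_tilting_parameters is hypothetical and is not part of the library.

```python
from collections import Counter

def group_tilting_parameters(tau, group):
    """Map each group label g to tau / (n[g] / n)."""
    n = len(group)
    counts = Counter(group)                      # n[g] for each distinct label
    return {g: tau / (n_g / n) for g, n_g in counts.items()}

# Three groups of sizes 10, 20 and 10, as in the stratified sampling example.
group = [1] * 10 + [2] * 20 + [3] * 10
params = group_tilting_parameters(0.5, group)
print(params)   # {1: 2.0, 2: 1.0, 3: 2.0}
```

Smaller groups receive a larger tilting parameter, because each of their observations carries more weight in the group mean.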
The standard saddlepoint estimate for the density is due to Daniels (1954); see also Kolassa (1997). The cumulative distribution function estimate used here is formula (3.8) in Barndorff-Nielsen (1986), often referred to as the "r*" approximation in the literature. This is similar to the Lugannani and Rice saddlepoint approximation (see Kolassa 1997). The cdf approximation is modified to avoid numerical problems in the center.
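For orientation, the textbook forms of these approximations are sketched below. The notation is an assumption introduced here, not taken from the library: K(t) = log sum_i w_i exp(t L_i) is the cumulant generating function of a single observation, n is the sample size, t is the tilting parameter applied to each observation, and y = K'(t) is the corresponding value of the mean. The library's scaling of tau and its modification near the center are not reproduced.

```latex
\begin{align*}
\hat f(y) &= \sqrt{\frac{n}{2\pi K''(t)}}\;\exp\bigl\{ n\,[K(t) - t\,y] \bigr\}, \\
r &= \operatorname{sgn}(t)\,\sqrt{2n\,[t\,y - K(t)]}, \qquad u = t\,\sqrt{n\,K''(t)}, \\
\hat F_{\mathrm{LR}}(y) &= \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{u}\right), \\
\hat F_{r^*}(y) &= \Phi(r^*), \qquad r^* = r + \frac{1}{r}\,\log\frac{u}{r}.
\end{align*}
```

Both cdf forms are singular at t = 0, where r and u vanish together, which is why some special handling in the center is needed.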
These estimates are for continuous distributions, though here they are applied to discrete distributions. If the sample is reasonably large and the observations (L) are not lattice-valued this should not matter, but for small samples the estimates may break down, and for lattice-valued observations (e.g. integers) the estimates do not reflect the discrete steps in the actual cdf.
The conv.factor argument convolves the distribution of the sum (or mean) of size observations (chosen from L with probabilities weights) with a single normally distributed observation with variance conv.factor*var(L,weights,unbiased=F). This serves three purposes. First, it provides some smoothing.
Second, it inflates the variance of the distribution, and may be used to get (nearly) unbiased variances. Recall that the usual estimate of sample variance (var(x,unbiased=T)) uses a denominator of (n-1) rather than n, where n is the sample size; this corresponds to a variance inflation factor of n/(n-1). Here, when L holds n independent observations without weights from a distribution with variance sigma^2, the expected value of the variance for the mean of size observations is ((n-1)/n) * sigma^2 * (size + conv.factor)/size^2. With size=n and conv.factor=n/(n-1) that simplifies to sigma^2/n.
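The simplification can be checked with exact arithmetic. The Python sketch below evaluates the multiplier of sigma^2 in the expression ((n-1)/n) * sigma^2 * (size + conv.factor)/size^2; the function name expected_variance_factor is hypothetical.

```python
from fractions import Fraction

def expected_variance_factor(n, size, conv_factor):
    """Multiplier of sigma^2: ((n-1)/n) * (size + conv_factor) / size^2."""
    return Fraction(n - 1, n) * (Fraction(size) + conv_factor) / Fraction(size) ** 2

n = 30
no_inflation = expected_variance_factor(n, n, 0)
inflated = expected_variance_factor(n, n, Fraction(n, n - 1))
print(no_inflation)   # 29/900, slightly below 1/30
print(inflated)       # 1/30, i.e. sigma^2 / n
```

With conv.factor = 0 the variance is too small by the factor (n-1)/n; the choice conv.factor = n/(n-1) removes that bias exactly.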
Third, the argument makes estimates reliable in extreme cases, when size is very small and L or weights is skewed (see "EXAMPLES"). Saddlepoint density and distribution estimates break down in the tails for all discrete distributions when size is fixed: the density approximation approaches infinity as tau approaches plus or minus infinity; the r* cdf approximation approaches 0 as tau approaches infinity and 1 as tau approaches negative infinity (the Lugannani-Rice approximation approaches negative infinity as tau approaches positive infinity and positive infinity as tau approaches negative infinity). In most examples, however, the approximations fail only in extreme regions of the tails, and may not fail at all up to machine precision. If results are questionable, set conv.factor to a small positive value, say 0.1, to get the correct tail behavior.
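The tail behavior of the density approximation can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions rather than the library's code: it treats the sum of size equally weighted draws, takes t as the tilting parameter per observation, and models the convolution by adding a normal term 0.5 * c * t^2 to the cumulant generating function of the sum, with c = conv_factor times the variance of the values (denominator n). The function name saddlepoint_density_of_sum is hypothetical.

```python
import math

def saddlepoint_density_of_sum(t, values, size, conv_factor=0.0):
    """Daniels-type density estimate for the sum, evaluated at the point K_S'(t)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n      # denominator n, not n - 1
    shift = max(t * v for v in values)                   # stabilise the exponentials
    e = [math.exp(t * v - shift) for v in values]
    total = sum(e)
    k0 = math.log(total / n) + shift
    k1 = sum(v * w for v, w in zip(values, e)) / total
    k2 = sum((v - k1) ** 2 * w for v, w in zip(values, e)) / total
    c = conv_factor * var                                # variance of the added normal
    K = size * k0 + 0.5 * c * t * t                      # cgf of the convolved sum
    K1 = size * k1 + c * t
    K2 = size * k2 + c
    return math.exp(K - t * K1) / math.sqrt(2.0 * math.pi * K2)

L = [0, 3, 6, 10]
for t in (4.0, 8.0):
    print(t,
          saddlepoint_density_of_sum(t, L, size=4),
          saddlepoint_density_of_sum(t, L, size=4, conv_factor=0.1))
```

Without the convolution the tilted variance shrinks toward zero in the tail, so the estimate grows without bound; with a positive conv.factor the second derivative stays bounded away from zero and the estimate decays.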
saddlepointP produces a warning if the cdf approximation is decreasing at any tau value.
pDiscreteMean calls tiltMeanSolve to calculate tau for given quantiles, then calls saddlepointP. dDiscreteMean calls tiltMeanSolve, then saddlepointD, and qDiscreteMean calls saddlepointPSolve, then tiltMean.
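The first step of the pDiscreteMean and dDiscreteMean chains, finding the tau whose tilted mean equals a given quantile, can be sketched as follows. This Python sketch uses plain bisection with equal weights and tau as the per-observation tilting parameter; the method actually used by tiltMeanSolve is not described here, and the names tilt_mean and tilt_mean_solve are hypothetical.

```python
import math

def tilt_mean(tau, values):
    """Mean of values under weights proportional to exp(tau * value)."""
    shift = max(tau * v for v in values)
    w = [math.exp(tau * v - shift) for v in values]
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)

def tilt_mean_solve(q, values, iterations=200):
    """Find tau with tilt_mean(tau, values) == q by bisection."""
    if not min(values) < q < max(values):
        raise ValueError("q must lie strictly between min(values) and max(values)")
    lo, hi = -1.0, 1.0
    while tilt_mean(lo, values) > q:     # widen until the root is bracketed
        lo *= 2.0
    while tilt_mean(hi, values) < q:
        hi *= 2.0
    for _ in range(iterations):          # the tilted mean is increasing in tau
        mid = 0.5 * (lo + hi)
        if tilt_mean(mid, values) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

values = [0.2, 0.5, 0.9, 1.4, 2.1, 3.0]
tau = tilt_mean_solve(1.0, values)
print(tau, tilt_mean(tau, values))
```

The resulting tau would then be passed to the cdf or density approximation, mirroring the order of calls described above.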
saddlepointPSolve uses a bracketed secant method to iteratively solve for tau.
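One way such a solver can be organized is sketched below in Python: first widen an interval until it brackets the root, then apply secant steps that keep the root bracketed. This is an assumption-laden illustration, not the library's algorithm; it uses the Illinois variant of false position, and the name bracketed_secant and the parameter tol_x (playing the role of tol.tau) are hypothetical. It is shown on the standard normal cdf rather than on saddlepointP.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bracketed_secant(f, target, lo=-1.0, hi=1.0, tol=1e-6, tol_x=None, maxiter=100):
    """Solve f(x) = target for an increasing function f."""
    tol_x = tol if tol_x is None else tol_x
    g = lambda x: f(x) - target
    g_lo, g_hi = g(lo), g(hi)
    width = hi - lo
    # Phase 1: widen the interval (at most maxiter times) until it brackets the root.
    for _ in range(maxiter):
        if g_lo > 0.0:
            hi, g_hi = lo, g_lo
            width *= 2.0
            lo = hi - width
            g_lo = g(lo)
        elif g_hi < 0.0:
            lo, g_lo = hi, g_hi
            width *= 2.0
            hi = lo + width
            g_hi = g(hi)
        else:
            break
    if not (g_lo <= 0.0 <= g_hi):
        raise ValueError("root not bracketed within maxiter iterations")
    # Phase 2: secant steps inside the bracket (Illinois modification).
    side = 0
    x = 0.5 * (lo + hi)
    for _ in range(10 * maxiter):
        if g_hi == g_lo:
            return 0.5 * (lo + hi)
        x = hi - g_hi * (hi - lo) / (g_hi - g_lo)
        g_x = g(x)
        if abs(g_x) <= tol or (hi - lo) <= tol_x:
            return x
        if g_x < 0.0:
            lo, g_lo = x, g_x
            if side == -1:
                g_hi *= 0.5          # retained endpoint used twice: damp it
            side = -1
        else:
            hi, g_hi = x, g_x
            if side == 1:
                g_lo *= 0.5
            side = 1
    return x

root = bracketed_secant(normal_cdf, 0.975, tol=1e-10)
print(root)   # close to the 97.5% standard normal quantile
```

The two tolerances correspond to the two stopping scales: tol bounds the error in the function value (the scale of p), tol_x bounds the width of the bracket (the scale of the unknown).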
Barndorff-Nielsen, O.E. (1986), "Inference on full or partial parameters based on the standardized signed log likelihood ratio," Biometrika, 73, 307-322.
Daniels, H.E. (1954), "Saddlepoint approximations in statistics," Ann. Math. Statist., 25, 631-650.
Hall, P. (1986), "On the number of bootstrap simulations required to construct a confidence interval," Annals of Statistics, 14, 1453-1462.
Hesterberg, T.C. (1994), "Saddlepoint Quantiles and Distribution Curves, with Bootstrap Applications," Computational Statistics, 9(3), 207-212.
Kolassa, J.E. (1997), Series Approximation Methods in Statistics, second edition, Springer-Verlag, Lecture Notes in Statistics, no. 88.
set.seed(0)
x <- rexp(30)
p <- c(.01, .025, .05, .5, .95, .975, .99)
tau <- saddlepointPSolve(p, x)
plot(tiltMean(tau, x)$q, p)
# saddlepoint distribution curve
tau2 <- seq(min(tau), max(tau), length = 200)
lines(tiltMean(tau2, x)$q, saddlepointP(tau2, x))

# variance decreases as sample size increases (use qDiscreteMean)
points(qDiscreteMean(p, x, size = 50), p, col = 3)
p2 <- seq(.01, .99, by = .005)
lines(qDiscreteMean(p2, x, size = 50), p2, col = 3)

# Find the saddlepoint cdf and density estimates at particular x values
q <- 1:40/20  # quantile values .05, .1, ..., 2
plot(q, dDiscreteMean(q, x), type = "l")  # density
lines(q, pDiscreteMean(q, x), col = 2)  # cdf

# Stratified sampling
set.seed(0)
gs <- c(10, 20, 10)
L1 <- rnorm(gs[1], mean = 0, sd = 1)
L2 <- rnorm(gs[2], mean = 1, sd = 1)
L3 <- rnorm(gs[3], mean = 2, sd = 3)
L <- c(L1, L2, L3)
group <- rep(1:3, gs)
p <- 1:9/10
plot(qDiscreteMean(p, L = L, group = group), p)
p2 <- seq(.1, .9, by = .01)
lines(qDiscreteMean(p2, L = L, group = group), p2)

# An example showing failure of the approximations in the tails
L <- c(0,3,6,10)
taup <- c(seq(-4,-1,length=10),seq(-1,1,length=50),seq(1,4,length=10))
taud <- seq(-3,3,length=100)
plot(taup, saddlepointP(taup, L), type = "l")  # cdf: warning messages
lines(taud, saddlepointD(taud, L), col = 3)  # density

# Improve with a convolution
plot(taup, saddlepointP(taup, L, conv.factor = .1), type = "l")  # cdf
lines(taud, saddlepointD(taud, L, conv.factor = .1), col = 3)  # density