concomitants(x, ...)
concomitants(x, y, qfun, args.qfun = NULL, qx = NULL, df = 3, weights = NULL)
concomitants.default
.
x, containing the empirical distribution to be adjusted.
qfun(runif(n)) should give random values from the known distribution for x.
qfun. For example, if qfun=qnorm, then this could be list(mean=2, sd=3).
x. Supply either qx or qfun.
If supplied, this should contain values from the known distribution for x. This defaults to qfun(ppoints(n)), where n is the length of x.
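As a rough illustration of the default described above, the following base R sketch forms qfun(ppoints(n)) for a normal qfun. Passing args.qfun to qfun through do.call() is an assumption made for illustration; how the extra arguments actually reach qfun is not stated here, and the variable qx.direct is hypothetical.

```r
# Sketch only: how a default qx could be formed from qfun and args.qfun.
# Passing args.qfun through do.call() is an assumption for illustration.
set.seed(0)
x <- rnorm(100, mean = 2, sd = 3)     # empirical values of x
n <- length(x)

qfun <- qnorm                          # quantile function of the known distribution
args.qfun <- list(mean = 2, sd = 3)    # extra arguments for qfun

# Evenly spaced probabilities in (0, 1), mapped through the quantile function
qx <- do.call(qfun, c(list(ppoints(n)), args.qfun))

# The same values written out directly
qx.direct <- qnorm(ppoints(n), mean = 2, sd = 3)
```

The resulting qx is sorted in increasing order, so it has to be matched to the ordering of x before it is compared with x point by point.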
x and y.
This should be at least 2; a linear relationship results if df=2, while a smoothing spline is used for larger values.
NULL (indicating no weights) or a vector the same length as x, containing probabilities for a weighted distribution of x and y.
y, but adjusted based on the difference between x and qx. This is basically y plus (prediction for y given qx) minus (prediction for y given x); in the linear case this reduces to y + beta * (qx - x).
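The linear case above can be sketched in base R without weights. This is not the implementation used by concomitants; it assumes that beta is the least-squares slope of y on x (taken here from lm()) and that qx is first rearranged into the order of x, as the Examples do with qx.o.

```r
# Sketch of the linear case: adj.y = y + beta * (qx - x).
# Not the package implementation; lm() supplies the slope (an assumption).
set.seed(0)
n <- 100
x <- rnorm(n)
y <- 0.95 * x + sqrt(1 - 0.95^2) * rnorm(n)

qx <- qnorm(ppoints(n))         # values from the known distribution, sorted

# Arrange qx in the order of x: the smallest x is paired with the
# smallest qx, the second smallest with the second smallest, and so on.
qx.o <- numeric(n)
qx.o[order(x)] <- qx

beta <- coef(lm(y ~ x))[["x"]]  # estimated slope of y on x
adj.y <- y + beta * (qx.o - x)  # shift each y by the predicted effect of moving x to qx
```

If y were an exact linear function of x, say y = 2 + 3 * x, the same formula would return 2 + 3 * qx.o, so all of the sampling error in x would be removed from y.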
Methods may return other objects; in particular, returns an object of class concomitants.bootstrap that inherits from "bootstrap".
This implementation uses smooth.spline to allow the relationship between x and y to be curvilinear.
The higher the correlation between x and a smooth monotone transformation of y, the more accurate the result is.
With a perfect nonlinear relationship (conditional variance of y given x equal to zero) the result would be equal to qfuny(ppoints(n)), where qfuny is the quantile function for y (aside from errors due to imperfect estimation of the nonlinear relationship).
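The spline-based adjustment and the perfect-relationship case described above can be sketched with smooth.spline() from R's stats package. This is an illustration under stated assumptions, not the implementation used by concomitants: y is an exact function of x with no noise, there are no weights, and the vector qfuny below stands in for qfuny(ppoints(n)), which is valid here only because the chosen curve is increasing over the range of qx.

```r
# Sketch of the curvilinear case: y plus (prediction given qx)
# minus (prediction given x), with predictions from a smoothing spline.
# Not the package implementation.
set.seed(0)
n <- 100
x <- rnorm(n)
y <- x + x^2 / 9                 # exact curved relationship, no noise

qx <- qnorm(ppoints(n))          # values from the known distribution, sorted
qx.o <- numeric(n)
qx.o[order(x)] <- qx             # qx arranged in the order of x

fit <- smooth.spline(x, y, df = 4)
pred.x  <- predict(fit, x)$y     # prediction for y given x
pred.qx <- predict(fit, qx.o)$y  # prediction for y given qx
adj.y <- y + pred.qx - pred.x

# Quantiles of y implied by the exact relationship (increasing on this range)
qfuny <- qx + qx^2 / 9

# Largest gap between the sorted adjusted values and the implied quantiles
max.gap <- max(abs(sort(adj.y) - qfuny))
```

The value of max.gap reflects only the error in estimating the curve, which is the caveat noted in parentheses above.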
If weights are present, then we presume that x and y were obtained by importance sampling or some other mechanism that yields weighted samples.
Let F be the target distribution and G the design distribution for x; i.e. mean(x <= a) = G(a) and mean(x <= a, weights=weights) = F(a). In this case, qfun should be the inverse of F, and the weighted distribution of qx should be approximately F (the unweighted qx values correspond to G).
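The distinction between G and F can be illustrated with a small importance-sampling sketch in base R. The design distribution N(0, 2^2), the target distribution N(0, 1), and the helper wecdf are all assumptions chosen for illustration; base R's mean() has no weights argument, so the weighted proportion is computed explicitly.

```r
# Sketch of unweighted versus weighted empirical distributions.
# Design G and target F below are illustrative assumptions.
set.seed(0)
n <- 1000
x <- rnorm(n, sd = 2)               # sampled from the design distribution G = N(0, 2^2)
w <- dnorm(x) / dnorm(x, sd = 2)    # importance weights toward the target F = N(0, 1)
w <- w / sum(w)                     # normalize so the weights are probabilities

# Hypothetical helper: weighted proportion of observations at or below a
wecdf <- function(a, x, w) sum(w * (x <= a)) / sum(w)

a <- 1
G.hat <- mean(x <= a)       # estimates G(a) = pnorm(1, sd = 2)
F.hat <- wecdf(a, x, w)     # estimates F(a) = pnorm(1)
```

With equal weights the two estimates coincide, which corresponds to the weights = NULL case.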
The output y values are from the weighted distribution for y.
Do, K. and Hall, P. (1992), "Distribution Estimation using Concomitants of Order Statistics, with Application to Monte Carlo Simulation for the Bootstrap," Journal of the Royal Statistical Society Series B, 54(2), 595-607.
Efron, B. (1990), "More Efficient Bootstrap Computations," Journal of the American Statistical Association, 85, 79-89.
Hesterberg, T.C. (1995), "Tail-Specific Linear Approximations for Efficient Bootstrap Simulations," Journal of Computational and Graphical Statistics, 4, 113-133.
Hesterberg, T.C. (1997), "Fast Bootstrapping by Combining Importance Sampling and Concomitants," Computing Science and Statistics, 29(2), 72-78.
set.seed(0)
x <- rnorm(100)
y <- .95*x + sqrt(1-.95^2) * rnorm(100)
qx <- qnorm(ppoints(100))
adj.y <- concomitants(x, y, qx=qx)
# qx.o holds the values of qx arranged in the same order as x
qx.o <- qx
qx.o[order(x)] <- qx
plot(x, y)
arrows(x, y, x2=qx.o, y2=adj.y, size=.05, col=5)
# The arrows run from the original unadjusted points to
# the adjusted values. In the original data there are too
# many large values of x; given the relationship between
# x and y, this probably means that the values of y are
# also too large. The arrowheads are at the adjusted points.

# Show the empirical and theoretical values of x
axis(3, labels=F, at=x)
axis(3, labels=F, at=qx, tck=.02, col=5)
# Show the empirical and adjusted values of y
axis(4, labels=F, at=y)
axis(4, labels=F, at=adj.y, tck=.02, col=5)

# Normal probability plots, for empirical y and adjusted y
par(mfrow=c(2,1))
qqnorm(y); abline(0,1)
qqnorm(adj.y); abline(0,1)
par(mfrow=c(1,1))
# Note that the adjusted values of y are closer to the
# exact distribution

# Nonlinear relationship
set.seed(1)
y <- x + x^2/9 + .05*rnorm(100)
plot(x, y)
adj.y <- concomitants(x, y, qx=qx, df=4)
arrows(x, y, x2=qx.o, y2=adj.y, size=.05, col=5)
# The adjustments follow the curve of the relationship
axis(3, labels=F, at=x)
axis(3, labels=F, at=qx, tck=.02, col=5)
axis(4, labels=F, at=y)
axis(4, labels=F, at=adj.y, tck=.02, col=5)

# Nonlinear relationship, with weights
# (x2, w2, qx2 and qx2.o must already exist; they are not defined in this example)
set.seed(1)
y2 <- x2 + x2^2/9 + .05*rnorm(100)
adj.y2 <- concomitants(x2, y2, weights = w2, qx=qx2, df=4)
plot(x2, y2)
arrows(x2, y2, x2=qx2.o, y2=adj.y2, size=.05, col=5)
# The adjustments follow the curve of the relationship
axis(3, labels=F, at=x2)
axis(3, labels=F, at=qx2, tck=.02, col=5)
axis(4, labels=F, at=y2)
axis(4, labels=F, at=adj.y2, tck=.02, col=5)