Concomitants variance reduction

DESCRIPTION:

Adjust an empirical distribution to control for the difference between the empirical distribution of a covariate and its known distribution. This is a generic function; methods include concomitants.default.

USAGE:

concomitants(x, ...) 
concomitants(x, y, qfun, args.qfun = NULL, qx = NULL, 
             df = 3, weights = NULL) 

REQUIRED ARGUMENTS:

x
a vector, containing values of a covariate.
...
additional arguments, passed to methods. The remaining arguments here are for the default method, concomitants.default.
y
a vector, the same length as x, containing the empirical distribution to be adjusted.
qfun
a quantile function -- qfun(runif(n)) should give random values from the known distribution for x.
args.qfun
a list, containing additional arguments to pass to qfun. For example, if qfun=qnorm, then this could be list(mean=2, sd=3).
qx
a vector the same length as x. Supply either qx or qfun. If supplied, qx should contain values from the known distribution for x. It defaults to qfun(ppoints(n)), where n is the length of x.
df
scalar real value, giving the degrees of freedom used in estimating the relationship between x and y. This should be at least 2; df=2 gives a linear relationship, while larger values use a smoothing spline.
weights
NULL (indicating no weights) or a vector the same length as x, containing probabilities for a weighted distribution of x and y.
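
As a rough illustration of how qfun, args.qfun and qx relate, the sketch below builds the documented default qfun(ppoints(n)) by hand. The way the extra arguments are passed (through do.call) is an assumption made for this sketch, not a description of the internal code; it is written to run in R.

```r
# Illustration only: combining qfun and args.qfun to get the default qx.
# The use of do.call here is an assumption, not the library's own code.
n <- 100
qfun <- qnorm
args.qfun <- list(mean = 2, sd = 3)

# Same values as qnorm(ppoints(n), mean = 2, sd = 3)
qx <- do.call(qfun, c(list(ppoints(n)), args.qfun))
```

Supplying qx directly, as the examples for this function do, skips this step.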

VALUE:

a vector like y, but adjusted based on the difference between x and qx. This is basically y plus (prediction for y given qx) minus (prediction for y given x); in the linear case this reduces to y + beta * (qx - x).
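
The linear case of the formula above can be reproduced by hand. The following is a minimal sketch, not the code of concomitants: the helper name linear.adjust is hypothetical, beta is assumed to be the least-squares slope of y on x, and the sorted qx values are assumed to be matched to x by rank.

```r
# Hypothetical helper, for illustration only (not part of the library).
linear.adjust <- function(x, y, qx) {
  beta <- coef(lm(y ~ x))[[2]]     # assumed: least-squares slope of y on x
  qx.o <- numeric(length(x))
  qx.o[order(x)] <- sort(qx)       # assumed: match sorted qx to x by rank
  y + beta * (qx.o - x)            # y + beta * (qx - x)
}

set.seed(0)
x <- rnorm(100)
y <- .95 * x + sqrt(1 - .95^2) * rnorm(100)
adj.y <- linear.adjust(x, y, qnorm(ppoints(100)))
```

Each point keeps its residual from the fitted line; only the fitted part moves from x to the matching known quantile.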

Methods may return other objects; in particular, one method returns an object of class concomitants.bootstrap that inherits from "bootstrap".

DETAILS:

This implementation uses smooth.spline to allow the relationship between x and y to be curvilinear.

The higher the correlation between x and a smooth monotone transformation of y, the more accurate the result is. With a perfect nonlinear relationship (conditional variance of y given x equal to zero) the result would be equal to qfuny(ppoints(n)) where qfuny is the quantile function for y (aside from errors due to imperfect estimation of the nonlinear relationship).
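
The curvilinear adjustment described above can be sketched with smooth.spline directly. This is only an illustration under stated assumptions: spline.adjust is a hypothetical name, the sorted qx values are assumed to be matched to x by rank, and the actual function may differ in details such as the handling of df=2 and of weights.

```r
# Hypothetical helper, for illustration only (not part of the library).
spline.adjust <- function(x, y, qx, df = 3) {
  qx.o <- numeric(length(x))
  qx.o[order(x)] <- sort(qx)             # assumed: match qx to x by rank
  fit <- smooth.spline(x, y, df = df)    # smooth estimate of E[y | x]
  # y + (prediction for y given qx) - (prediction for y given x)
  y + predict(fit, qx.o)$y - predict(fit, x)$y
}

set.seed(1)
x <- rnorm(100)
y <- x + x^2/9 + .05 * rnorm(100)
adj.y <- spline.adjust(x, y, qnorm(ppoints(100)), df = 4)
```

Because the residual of each point from the fitted curve is carried over unchanged, a small conditional variance of y given x leaves little noise in the adjusted values.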

If weights are present, then we presume that x and y were obtained by importance sampling or some other mechanism that yields weighted samples. Let F be the target distribution and G the design distribution for x; i.e. mean(x <= a)  = G(a) and mean(x <= a, weights=weights)  = F(a). In this case, qfun should be the inverse of F, and the weighted distribution of qx should be approximately F (the unweighted qx values correspond to G). The output consists of values from the weighted distribution for y.
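
To make the roles of F and G concrete, here is a small importance-sampling sketch. The choice of G = N(0.5, 1) as the design distribution and F = N(0, 1) as the target is an assumption made purely for illustration, and the weighted mean is written out as a sum so that the code runs in R.

```r
# Illustration only: design G = N(0.5, 1), target F = N(0, 1) (assumed).
set.seed(2)
n <- 1000
x <- rnorm(n, mean = 0.5)               # sample from the design distribution G
w <- dnorm(x) / dnorm(x, mean = 0.5)    # importance ratios f(x) / g(x)
w <- w / sum(w)                         # probabilities that sum to 1

a <- 0
G.hat <- mean(x <= a)                   # unweighted: estimates G(a) = pnorm(-0.5)
F.hat <- sum(w * (x <= a))              # weighted: estimates F(a) = pnorm(0)
```

With these weights, qfun would be qnorm with its default arguments, since that is the inverse of F.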

REFERENCES:

Do, K. and Hall, P. (1992), "Distribution Estimation using Concomitants of Order Statistics, with Application to Monte Carlo Simulation for the Bootstrap," Journal of the Royal Statistical Society Series B, 54(2), 595-607.

Efron, B. (1990), "More Efficient Bootstrap Computations," Journal of the American Statistical Association, 85, 79-89.

Hesterberg, T.C. (1995), "Tail-Specific Linear Approximations for Efficient Bootstrap Simulations," Journal of Computational and Graphical Statistics, 4, 113-133.

Hesterberg, T.C. (1997), "Fast Bootstrapping by Combining Importance Sampling and Concomitants," Computing Science and Statistics, 29(2), 72-78.

EXAMPLES:

set.seed(0) 
x <- rnorm(100) 
y <- .95*x+sqrt(1-.95^2) * rnorm(100) 
qx <- qnorm(ppoints(100)) 
adj.y <- concomitants(x, y, qx=qx) 
 
qx.o <- qx 
qx.o[order(x)] <- qx 
plot(x, y) 
arrows(x, y, x2=qx.o, y2=adj.y, size=.05, col=5) 
# The arrows run from the original unadjusted points to 
# the adjusted values.  In the original data there are too 
# many large values of x; given the relationship between 
# x and y, this probably means that the values of y are 
# also too large.  The arrowheads are at the adjusted points. 
 
# Show the empirical and theoretical values of x 
axis(3, labels=F, at=x) 
axis(3, labels=F, at=qx, tck=.02, col=5) 
 
# Show the empirical and adjusted values of y 
axis(4, labels=F, at=y) 
axis(4, labels=F, at=adj.y, tck=.02, col=5) 
 
# Normal probability plots, for empirical y and adjusted y 
par(mfrow=c(2,1)) 
qqnorm(y); abline(0,1) 
qqnorm(adj.y); abline(0,1) 
par(mfrow=c(1,1)) 
# Note that the adjusted values of y are closer to the 
# exact distribution 
 
# Nonlinear relationship 
set.seed(1) 
y <- x + x^2/9 + .05*rnorm(100) 
plot(x,y) 
adj.y <- concomitants(x, y, qx=qx, df=4) 
arrows(x, y, x2=qx.o, y2=adj.y, size=.05, col=5) 
# The adjustments follow the curve of the relationship 
axis(3, labels=F, at=x) 
axis(3, labels=F, at=qx, tck=.02, col=5) 
axis(4, labels=F, at=y) 
axis(4, labels=F, at=adj.y, tck=.02, col=5) 
 
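# Nonlinear relationship, with weights 
# NOTE: x2, w2, qx2 and qx2.o are not defined in the original example. 
# The setup below is an assumed importance-sampling design, with 
# design distribution N(0.5, 1) and target distribution N(0, 1). 
set.seed(1) 
x2 <- rnorm(100, mean=.5) 
w2 <- dnorm(x2) / dnorm(x2, mean=.5) 
w2 <- w2 / sum(w2) 
w2.s <- w2[order(x2)] 
qx2 <- qnorm(cumsum(w2.s) - w2.s/2)  # weighted analog of qnorm(ppoints(100)) 
qx2.o <- qx2 
qx2.o[order(x2)] <- qx2 
y2 <- x2+x2^2/9 + .05*rnorm(100) 
adj.y2 <- concomitants(x2, y2, weights = w2, qx=qx2, df=4) 
plot(x2, y2) 
arrows(x2, y2, x2=qx2.o, y2=adj.y2, size=.05, col=5) 
# The adjustments follow the curve of the relationship 
axis(3, labels=F, at=x2) 
axis(3, labels=F, at=qx2, tck=.02, col=5) 
axis(4, labels=F, at=y2) 
axis(4, labels=F, at=adj.y2, tck=.02, col=5)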