weights vector is a vector of the same length as x, containing frequency counts that in effect expand x by these counts. weights can also be sampling weights, in which case setting normwt to TRUE will often be appropriate. This rescales weights so that they sum to the number of non-missing elements in x. normwt=TRUE thus reflects the fact that the true sample size is the length of the x vector and not the sum of the original values of weights (which would be appropriate had normwt=FALSE). When weights is all ones, the estimates are identical to unweighted estimates (unless one of the non-default quantile estimation options is specified to wtd.quantile). When missing data have already been deleted for x, weights, and (in the case of wtd.loess.noiter) y, specifying na.rm=FALSE will save computation time. Omitting the weights argument or specifying NULL or a zero-length vector will result in the usual unweighted estimates.
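As an illustration of the frequency-count interpretation described above, the following base-R sketch checks that a weighted mean and a weighted variance agree with the ordinary estimates computed on the expanded data, and shows the rescaling that normwt=TRUE is described as performing. It does not call the wtd.* functions; the toy data, the variable names, and the variance divisor sum(w) - 1 are assumptions made for the illustration.

```r
# Hypothetical toy data: x values with integer frequency counts
x <- c(2, 4, 4, 7)
w <- c(3, 1, 2, 4)

# Expanding x by its counts gives the data set the weights stand for
expanded <- rep(x, times = w)

# Weighted mean and variance, treating w as frequency counts
# (assumption: divisor sum(w) - 1, to match var() on the expanded data)
wmean <- sum(w * x) / sum(w)
wvar  <- sum(w * (x - wmean)^2) / (sum(w) - 1)

print(all.equal(wmean, mean(expanded)))
print(all.equal(wvar, var(expanded)))

# Rescaling as described for normwt=TRUE: weights then sum to length(x)
w_norm <- w * length(x) / sum(w)
print(sum(w_norm))
```

With sampling weights the expanded data set is fictitious, which is why the rescaled weights, summing to the actual number of observations, are often the more appropriate choice.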
wtd.mean, wtd.var, and wtd.quantile compute weighted means, variances, and quantiles, respectively. wtd.ecdf computes a weighted empirical distribution function. wtd.table computes a weighted frequency table (although only one stratification variable is supported at present). wtd.rank computes weighted ranks, using mid-ranks for ties. This can be used to obtain Wilcoxon tests and rank correlation coefficients. wtd.loess.noiter is a weighted version of loess.smooth when no iterations for outlier rejection are desired. This results in especially good smoothing when y is binary.
num.denom.setup is a utility function that allows one to deal with observations containing numbers of events and numbers of trials, by outputting two observations when the number of events and the number of non-events (trials - events) both exceed zero. A vector of subscripts is generated that will do the proper duplications of observations, and a new binary variable y is created along with the usual cell frequencies (weights) for each of the y=0, y=1 cells per observation.
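The restructuring described for num.denom.setup can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual source: the helper name expand_events is hypothetical, the element names subs, weights, and y follow the description above, and records with a missing count are simply dropped.

```r
# Hypothetical helper: turn (events, trials) records into weighted
# binary observations, one row per non-empty y=1 or y=0 cell.
expand_events <- function(num, denom) {
  stopifnot(length(num) == length(denom))
  ok <- which(!is.na(num) & !is.na(denom))   # usable records only
  events    <- num[ok]
  nonevents <- denom[ok] - num[ok]
  has_event    <- events > 0
  has_nonevent <- nonevents > 0
  list(
    subs    = c(ok[has_event], ok[has_nonevent]),            # row subscripts
    weights = c(events[has_event], nonevents[has_nonevent]), # cell frequencies
    y       = c(rep(1, sum(has_event)), rep(0, sum(has_nonevent)))
  )
}

# Data are 10/12, NA/999, 20/20, 0/25, 15/35
num   <- c(10, NA, 20, 0, 15)
denom <- c(12, 999, 20, 25, 35)
w <- expand_events(num, denom)
print(w)
```

Record 1 contributes two rows (10 events, 2 non-events), record 3 only a y=1 row, record 4 only a y=0 row, and record 2 is dropped because its event count is missing.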
wtd.mean(x, weights=NULL, normwt="ignored", na.rm=TRUE)
wtd.var(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.quantile(x, weights=NULL, probs=c(0, .25, .5, .75, 1),
             type=c('quantile','(i-1)/(n-1)','i/(n+1)','i/n'),
             normwt=FALSE, na.rm=TRUE)
wtd.ecdf(x, weights=NULL,
         type=c('i/n','(i-1)/(n-1)','i/(n+1)'),
         normwt=FALSE, na.rm=TRUE)
wtd.table(x, weights=NULL, type=c('list','table'),
          normwt=FALSE, na.rm=TRUE)
wtd.rank(x, weights=NULL, normwt=FALSE, na.rm=TRUE)
wtd.loess.noiter(x, y, weights=rep(1,n), robust=rep(1,n),
                 span=2/3, degree=1, cell=.13333,
                 type=c('all','ordered all','evaluate'),
                 evaluation=100, na.rm=TRUE)
num.denom.setup(num, denom)
(x may be a category or factor vector for wtd.table)
Specify normwt=TRUE to make weights sum to length(x) after deletion of NAs.
Set na.rm=FALSE to suppress checking for NAs.
For wtd.quantile, type defaults to "quantile" to use the same interpolated order statistic method as quantile. Set type to "(i-1)/(n-1)", "i/(n+1)", or "i/n" to use the inverse of the empirical distribution function, using, respectively, (wt - 1)/T, wt/(T+1), or wt/T, where wt is the cumulative weight and T is the total weight (usually total sample size). These three values of type are the possibilities for wtd.ecdf.
For wtd.table the default type is "list", meaning that the function is to return a list containing two vectors: x is the sorted unique values of x and sum.of.weights is the sum of weights for that x. This is the default so that you don't have to convert the names attribute of the result that can be obtained with type="table" to a numeric variable when x was originally numeric. type="table" for wtd.table results in an object that has the same structure as those returned from table.
For wtd.loess.noiter the default type is "all", indicating that the function is to return a list containing all the original values of x (including duplicates and without sorting) and the smoothed y values corresponding to them. Set type="ordered all" to sort by x, and type="evaluate" to evaluate the smooth only at evaluation equally spaced points between the observed limits of x.
x
See loess.smooth. The default is a linear fit (degree=1) with 100 evaluation points (if type="evaluate").
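The inverse-ECDF idea behind the non-default type values for wtd.quantile can be sketched in base R for the "i/n" case (wt/T). This is only an illustration under stated assumptions, not the wtd.ecdf or wtd.quantile source: the helper name weighted_ecdf is hypothetical, and the quantile is taken as the smallest x whose cumulative proportion reaches the requested probability, which is one common convention.

```r
# Hypothetical helper: weighted ECDF using wt / T, where wt is the
# cumulative weight and T the total weight. Weights of duplicate x
# values are combined before cumulating.
weighted_ecdf <- function(x, weights) {
  ux <- sort(unique(x))
  sw <- vapply(ux, function(v) as.numeric(sum(weights[x == v])), numeric(1))
  list(x = ux, sum.of.weights = sw, ecdf = cumsum(sw) / sum(sw))
}

x <- c(3, 1, 2, 2, 5)
w <- c(1, 2, 1, 3, 1)
e <- weighted_ecdf(x, w)
print(e$ecdf)

# Inverse ECDF at p = 0.5: smallest x with cumulative proportion >= 0.5
median_inv <- e$x[which(e$ecdf >= 0.5)[1]]
print(median_inv)
```

The x and sum.of.weights elements mirror the two vectors described for the "list" form of wtd.table, which is why a single pass over the unique values can serve both the table and the distribution function.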
The functions correctly combine weights of observations having duplicate values of x before computing estimates.
wtd.rank does not handle NAs as elegantly as rank if weights is specified.
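To make the combining of duplicate values concrete, here is a base-R sketch of weighted mid-ranks under the frequency-count interpretation. It is an assumption-laden illustration rather than the wtd.rank source: the variable names are made up, and the mid-rank of a value is taken as its cumulative weight minus half of (its weight minus one), which is the average rank that value would receive in the expanded data.

```r
# Hypothetical toy data with a duplicated x value (2 appears twice)
x <- c(3, 1, 2, 2, 5)
w <- c(1, 2, 1, 3, 1)

# Combine weights of duplicate x values first
ux <- sort(unique(x))
sw <- vapply(ux, function(v) as.numeric(sum(w[x == v])), numeric(1))

# Mid-rank of each unique value, then map back to the original order
midrank <- cumsum(sw) - 0.5 * (sw - 1)
r <- midrank[match(x, ux)]
print(r)

# Cross-check against ordinary average ranks on the expanded data
expanded <- rep(x, times = w)
r_check <- rank(expanded)[match(x, expanded)]
print(all.equal(r, r_check))
```

Both observations with x equal to 2 receive the same rank, because their weights (1 and 3) are pooled into a single cell of weight 4 before ranking.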
wtd.mean and wtd.var return scalars. wtd.quantile returns a vector the same length as probs. wtd.ecdf returns a list whose elements x and ecdf correspond to unique sorted values of x. If the first CDF estimate is greater than zero, a point (min(x),0) is placed at the beginning of the estimates. See above for wtd.table. wtd.rank returns a vector the same length as x (after removal of NAs, depending on na.rm). See above for wtd.loess.noiter.
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
f.harrell@vanderbilt.edu
Research Triangle Institute (1995): SUDAAN User's Manual, Release 6.40, pp. 8–16 to 8–17.
library(Hmisc)   # provides the wtd.* functions, describe, cut2, summarize, llist

set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
# describe uses wtd.mean, wtd.quantile, wtd.table
xg <- cut2(x, g=4)
table(xg)
wtd.table(xg, wts, type='table')

# Here is a method for getting stratified weighted means
y <- runif(500)
g <- function(y) wtd.mean(y[,1], y[,2])
summarize(cbind(y, wts), llist(xg), g, stat.name='y')

# Restructure data to generate a dichotomous response variable
# from records containing numbers of events and numbers of trials
num   <- c(10,NA,20,0,15)   # data are 10/12 NA/999 20/20 0/25 15/35
denom <- c(12,999,20,25,35)
w     <- num.denom.setup(num, denom)
w
# attach(my.data.frame[w$subs,])