data.frame, resamp, series, bdFrame, bdTimeSeries, and bdSignalSeries objects.
colMeans(x, na.rm=F, dims=1, weights, freq, n)
colSums(x, na.rm=F, dims=1, weights, freq, n)
colVars(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
colStdevs(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
rowMeans(x, na.rm=F, dims=1, weights, freq, n)
rowSums(x, na.rm=F, dims=1, weights, freq, n)
rowVars(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
rowStdevs(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
sd(x, na.rm=F)
timeSeries object, or an object for which a method has been written.
If FALSE, missing values (NA) in the input result in missing values in the corresponding elements of the output. If TRUE, missing values are omitted from the calculations.
If x is an array with more than two dimensions (say 5), dims determines which dimensions are summarized. For example, if dims=3, rowMeans returns a 3-dimensional array consisting of the means across the remaining 2 dimensions, and colMeans returns a 2-dimensional array consisting of the means across the first 3 dimensions.
For a big data object (handled by the big data versions of colMeans, colSums, colVars, and colStdevs), only dims=1 can be specified. Any other value is not allowed.
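To make the dims rule concrete, here is a minimal pure-Python sketch (not S-PLUS code) for a 2x3x4 array stored as nested lists x[i][j][k]. The helper names col_means_dims2 and row_means_dims2 are hypothetical; they mimic colMeans(x, dims=2), which averages over the first two dimensions, and rowMeans(x, dims=2), which keeps the first two dimensions and averages over the last one.

```python
# Hypothetical illustration of the dims semantics; not the S-PLUS implementation.

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def col_means_dims2(x):
    """Like colMeans(x, dims=2): average over dimensions 1 and 2,
    leaving one mean per index of the last dimension."""
    d1, d2, d3 = len(x), len(x[0]), len(x[0][0])
    return [mean(x[i][j][k] for i in range(d1) for j in range(d2))
            for k in range(d3)]

def row_means_dims2(x):
    """Like rowMeans(x, dims=2): keep dimensions 1 and 2 and
    average over the remaining (last) dimension."""
    return [[mean(x[i][j]) for j in range(len(x[0]))] for i in range(len(x))]

# Entries 0..23 in column-major order, mirroring array(0:23, dim=c(2,3,4)).
d1, d2, d3 = 2, 3, 4
x = [[[i + d1 * j + d1 * d2 * k for k in range(d3)] for j in range(d2)]
     for i in range(d1)]

cm = col_means_dims2(x)  # [2.5, 8.5, 14.5, 20.5]: a vector of length 4
rm = row_means_dims2(x)  # [[9.0, 11.0, 13.0], [10.0, 12.0, 14.0]]: a 2x3 matrix
```

The shapes match the comments in the examples at the end of this page: colMeans(x, dims=2) gives a vector of length 4 and rowMeans(x, dims=2) gives a 2x3 matrix.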
If TRUE, variances are sample variances, i.e. sum((x-mean(x))^2)/(n-1), where n is the length of the vector. This is unbiased if the values in x are obtained by simple random sampling. If FALSE, the definition sum((x-mean(x))^2)/n is used.
If TRUE, unnormalized sums of squares are returned, with no division by either n or (n-1). In this case unbiased is ignored.
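The three definitions controlled by unbiased and SumSquares can be summarized in a small Python sketch. The function var_sketch is hypothetical and simply restates the formulas above for a plain vector; it is not the S-PLUS code.

```python
# Hypothetical restatement of the unbiased / SumSquares definitions.
def var_sketch(x, unbiased=True, sum_squares=False):
    n = len(x)
    m = sum(x) / n
    ss = sum((v - m) ** 2 for v in x)   # sum((x-mean(x))^2)
    if sum_squares:                     # SumSquares=T: no division, unbiased ignored
        return ss
    return ss / (n - 1) if unbiased else ss / n

x = [1.0, 2.0, 3.0, 4.0]
print(var_sketch(x))                    # 5/3, about 1.667
print(var_sketch(x, unbiased=False))    # 1.25
print(var_sketch(x, sum_squares=True))  # 5.0
```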
x (number of rows or columns for colMeans and rowMeans, respectively, if x is a matrix).
If present, the argument unbiased is ignored, and the definition used is sum(weights * (x-mean(x, weights=weights))^2) if SumSquares=T, and sum(weights * (x-mean(x, weights=weights))^2)/sum(weights) otherwise.
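A minimal Python sketch of the weighted definition for a single vector follows. It assumes that mean(x, weights=w) is the usual weighted mean sum(w*x)/sum(w); the function name weighted_var_sketch is hypothetical.

```python
# Hypothetical sketch; assumes mean(x, weights=w) == sum(w*x)/sum(w).
def weighted_var_sketch(x, weights, sum_squares=False):
    total_w = sum(weights)
    m = sum(w * v for w, v in zip(weights, x)) / total_w  # weighted mean
    ss = sum(w * (v - m) ** 2 for w, v in zip(weights, x))
    # There is no unbiased argument: with weights it is ignored.
    return ss if sum_squares else ss / total_w

x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]
weighted_var_sketch(x, w)                    # 2.75 / 4 = 0.6875
weighted_var_sketch(x, w, sum_squares=True)  # 2.75
```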
x.
If present, the kth row of x is repeated freq[k] times. The effect is similar to that of the weights argument, except that freq does not cause the unbiased argument to be ignored, and division is by (sum(freq)-1) rather than (n-1).
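The claim that freq behaves like repeating observations can be checked with a short Python sketch. Both freq_var_sketch and plain_var are hypothetical helpers. The first applies integer frequencies directly and divides by sum(freq)-1 when unbiased. The second is an ordinary variance applied to the explicitly expanded data.

```python
# Hypothetical sketch: integer frequencies versus explicitly repeated values.
def freq_var_sketch(x, freq, unbiased=True):
    n_eff = sum(freq)
    m = sum(f * v for f, v in zip(freq, x)) / n_eff
    ss = sum(f * (v - m) ** 2 for f, v in zip(freq, x))
    return ss / (n_eff - 1) if unbiased else ss / n_eff  # unbiased still honored

def plain_var(x, unbiased=True):
    n = len(x)
    m = sum(x) / n
    ss = sum((v - m) ** 2 for v in x)
    return ss / (n - 1) if unbiased else ss / n

x = [1.0, 2.0, 3.0]
freq = [1, 1, 2]
expanded = [v for v, f in zip(x, freq) for _ in range(f)]  # [1.0, 2.0, 3.0, 3.0]

a = freq_var_sketch(x, freq)  # 2.75 / 3
b = plain_var(expanded)       # same value
```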
The result is a vector, or an array if x is an array and the value of dims implies that the result has at least two dimensions.
If n is supplied, a vector without names is returned (dims is ignored). Otherwise the result has names or dimnames if these are found in x.
colVars(x) is equivalent to diag(var(x)) if x is a matrix, but is faster (and keeps the column names). Supplying n improves speed, largely because names are discarded. However, the primary use of n is to compute summaries for a vector without first turning it into an array.
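The effect of n on a plain vector can be sketched in Python: x is treated as a column-major matrix with n rows, so column means average consecutive groups of n values and row means average every nth value. The helpers col_means_n and row_means_n are hypothetical, and rejecting a length that is not a multiple of n is this sketch's own choice, not documented S-PLUS behavior.

```python
# Hypothetical sketch of colMeans(x, n=n) and rowMeans(x, n=n) for a vector.
def col_means_n(x, n):
    """Means of consecutive groups of n values (columns of an n-row matrix)."""
    if len(x) % n != 0:
        raise ValueError("this sketch requires len(x) to be a multiple of n")
    return [sum(x[j:j + n]) / n for j in range(0, len(x), n)]

def row_means_n(x, n):
    """Means of every nth value, one mean per starting offset (one per row)."""
    if len(x) % n != 0:
        raise ValueError("this sketch requires len(x) to be a multiple of n")
    ncol = len(x) // n
    return [sum(x[i::n]) / ncol for i in range(n)]

x = list(range(1, 11))     # like 1:10
print(col_means_n(x, 5))   # [3.0, 8.0]
print(row_means_n(x, 5))   # [3.5, 4.5, 5.5, 6.5, 7.5]
```

No intermediate matrix is built here, which matches the stated purpose of n.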
Variances are computed by the numerically accurate corrected two-pass method described in Chan, Golub, and LeVeque (1983). Summations are done by adding results for groups of size 256, then adding the group sums; this is motivated by the numerically accurate pairwise summation method described in the same article.
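The following Python sketch shows the two ideas named above. It is an illustration, not the S-PLUS source. grouped_sum adds blocks of 256 values and then adds the block sums. corrected_two_pass_var computes the mean in a first pass, then forms sum((x-m)^2) - (sum(x-m))^2/n in a second pass. The correction term cancels much of the rounding error left in the computed mean. All function names are hypothetical.

```python
# Hypothetical sketch of grouped summation plus the corrected two-pass variance.
GROUP = 256  # group size stated in the text

def grouped_sum(values):
    """Sum each group of GROUP consecutive values, then add the group sums."""
    group_sums = [sum(values[i:i + GROUP]) for i in range(0, len(values), GROUP)]
    return sum(group_sums)

def corrected_two_pass_var(x, unbiased=True):
    n = len(x)
    m = grouped_sum(x) / n                    # pass 1: the mean
    dev = [v - m for v in x]                  # pass 2: deviations from the mean
    # sum of squared deviations minus the correction (sum of deviations)^2 / n
    ss = grouped_sum([d * d for d in dev]) - grouped_sum(dev) ** 2 / n
    return ss / (n - 1) if unbiased else ss / n

# Data with a large common offset, where a one-pass sum-of-squares formula
# would suffer from cancellation.
x = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(corrected_two_pass_var(x))  # 30.0
```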
Chan, T., Golub, G., and LeVeque, R. (1983). Algorithms for computing the sample variance: analysis and recommendations. The American Statistician, 37, 242-247.
x <- matrix(1:12, 4)
rowMeans(x)
colMeans(x)

## Summaries for regular subsets of a vector
x <- 1:10
colMeans(x, n=5) # groups of 5 consecutive observations
rowMeans(x, n=5) # groups of every fifth observation

## Higher-dimensional array
x <- array(runif(24), dim=c(2,3,4))
rowMeans(x) # vector of length 2
rowMeans(x, dims=2) # 2x3 matrix
apply(x, 1:2, mean) # same as previous
colMeans(x) # 3x4 matrix
colMeans(x, dims=2) # vector of length 4
colMeans(aperm(x, c(2,1,3))) # 2x4 matrix
colVars(x[1,,]) # vector of length 4
diag(var(x[1,,])) # same as previous

### Investigate the distribution of the sample mean and t-statistic
### when the underlying population is not normal
x <- rexp(1000 * 20) # 1000 samples of size 20
means <- colMeans(x, n=20)
stdevs <- colStdevs(x, n=20)
qqnorm(means)
plot(means, stdevs) # these would be independent for a normal population
qqnorm( (means - 1) / stdevs )
# The first three lines in that study could be replaced with
x <- matrix(rexp(1000 * 20), 20) # 1000 samples of size 20
means <- colMeans(x)
stdevs <- colStdevs(x)

### Bootstrap the sample mean
y <- runif(10)
indices <- sample(1:10, 10*1000, replace=T) # 1000 samples
# One way: make use of the argument "n"
colMeans(y[indices], n=10)
# Alternative (slower)
boot.y <- y[indices]
dim(boot.y) <- c(10, 1000)
colMeans(boot.y)
# Same as previous, but much slower
apply(boot.y, 2, mean)