data.frame, resamp, series, bdFrame, bdTimeSeries, and bdSignalSeries objects.
colMeans(x, na.rm=F, dims=1, weights, freq, n)
colSums(x, na.rm=F, dims=1, weights, freq, n)
colVars(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
colStdevs(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
rowMeans(x, na.rm=F, dims=1, weights, freq, n)
rowSums(x, na.rm=F, dims=1, weights, freq, n)
rowVars(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
rowStdevs(x, na.rm=F, dims=1, unbiased=T, SumSquares=F, weights, freq, n)
sd(x, na.rm=F)
The input x may be a timeSeries object, or an object for which a method has been written.
If na.rm is FALSE, missing values (NA) in the input result in missing values in the corresponding elements of the output. If TRUE, missing values are omitted from the calculations.
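As a small illustration of na.rm, the sketch below uses base R's colMeans, which has an na.rm argument with the same meaning; it is an ordinary-matrix example and does not exercise the S-PLUS methods for the object classes listed above.

```r
# A 3 x 2 matrix with one missing value in the second column
m <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 3)

colMeans(m)                 # second element is NA because of the missing value
colMeans(m, na.rm = TRUE)   # NA is dropped: column means are 2 and 5
```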
If x is an array with more than two dimensions (say 5), dims determines which dimensions are summarized. For example, with dims=3 the result of rowMeans is a 3-dimensional array consisting of the means across the remaining 2 dimensions, and the result of colMeans is a 2-dimensional array consisting of the means across the first 3 dimensions.
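The shapes produced by dims can be checked with base R, whose rowMeans and colMeans also take a dims argument. This sketch builds a hypothetical 2 x 3 x 4 x 5 x 6 array and compares each result against the equivalent apply() call.

```r
# A 5-dimensional array with dimensions 2 x 3 x 4 x 5 x 6
x <- array(seq_len(2 * 3 * 4 * 5 * 6), dim = c(2, 3, 4, 5, 6))

row_res <- rowMeans(x, dims = 3)  # keeps dimensions 1-3, averages over 4-5
col_res <- colMeans(x, dims = 3)  # averages over dimensions 1-3, keeps 4-5

dim(row_res)  # 2 3 4
dim(col_res)  # 5 6

# Cross-check against apply(), which keeps the margins it is given
all.equal(row_res, apply(x, 1:3, mean))
all.equal(col_res, apply(x, 4:5, mean))
```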
For a big data object (for example, with the big data versions of colMeans, colSums, colVars, and colStdevs), dims=1 is the only value allowed.
If unbiased is TRUE, variances are sample variances, that is, sum((x-mean(x))^2)/(n-1), where n is the length of the vector. This estimate is unbiased if the values in x are obtained by simple random sampling. If FALSE, the definition sum((x-mean(x))^2)/n is used.
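The two divisors can be compared directly on a small vector. This base R sketch evaluates both formulas quoted above and checks that base R's var() uses the n-1 divisor.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sample_var <- sum((x - mean(x))^2) / (n - 1)  # definition used when unbiased=T
pop_var    <- sum((x - mean(x))^2) / n        # definition used when unbiased=F

pop_var                          # 4
all.equal(sample_var, var(x))    # base R var() also divides by n - 1
```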
If SumSquares is TRUE, unnormalized sums of squares are returned, with no division by either n or (n-1). In that case unbiased is ignored.
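Column-wise sums of squares, as returned with SumSquares=T, can be sketched in base R by centering each column and summing the squared deviations. Dividing by n-1 then recovers the column sample variances; the matrix here is made up for illustration.

```r
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)

centered <- sweep(m, 2, colMeans(m))  # subtract each column's mean
ss <- colSums(centered^2)             # unnormalized sums of squares
ss                                    # 5 500

# Dividing by n - 1 gives the sample variance of each column
all.equal(ss / (nrow(m) - 1), apply(m, 2, var))
```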
The weights vector has one element per observation in x (the number of rows or columns for colMeans and rowMeans, respectively, if x is a matrix). If weights is present, unbiased is ignored and the definition used is sum(weights * (x-mean(x, weights=weights))^2) if SumSquares=T, and sum(weights * (x-mean(x, weights=weights))^2)/sum(weights) otherwise.
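A minimal sketch of the two weighted definitions for a single vector follows. The helper name weighted_var is hypothetical, and base R's weighted.mean stands in for the mean(x, weights=weights) call quoted above, since base R's mean() has no weights argument.

```r
# Hypothetical helper mirroring the weighted definitions quoted above
weighted_var <- function(x, weights, SumSquares = FALSE) {
  wmean <- weighted.mean(x, weights)     # weighted mean of x
  ss <- sum(weights * (x - wmean)^2)     # weighted sum of squares
  if (SumSquares) ss else ss / sum(weights)
}

x <- c(1, 2, 3, 4)
weighted_var(x, c(1, 1, 1, 1))                     # equal weights: divides by 4, gives 1.25
weighted_var(x, c(1, 1, 1, 1), SumSquares = TRUE)  # 5
weighted_var(x, c(2, 1, 1, 0))                     # zero weight drops the last value
```

Note that with equal unit weights the divisor is n, not n-1, which matches the statement that unbiased is ignored when weights is present.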
The freq vector has one element per observation in x. If present, the kth row of x is repeated freq[k] times. The effect is similar to the weights argument, except that freq does not cause the unbiased argument to be ignored, and the division is by (sum(freq)-1) rather than (n-1).
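The repetition described for freq can be checked in base R by expanding the data with rep() and comparing against a direct frequency-weighted computation that divides by sum(freq)-1. The data are invented for illustration.

```r
x    <- c(1, 2, 3)
freq <- c(2, 1, 1)                 # observation k counts freq[k] times

expanded <- rep(x, times = freq)   # 1 1 2 3
var(expanded)                      # divides by length(expanded) - 1 = sum(freq) - 1

# The same value computed directly from x and freq
fmean <- sum(freq * x) / sum(freq)
ss    <- sum(freq * (x - fmean)^2)
ss / (sum(freq) - 1)
```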
The result is an array if x is an array and the value of dims implies that the result has at least two dimensions. If n is supplied, a vector without names is returned (dims is ignored). Otherwise the result has names or dimnames if these are found in x.
colVars(x) is equivalent to diag(var(x)) if x is a matrix, but is faster (and uses column names). Supplying n improves speed, largely because names are discarded. However, the primary use of n is to compute summaries for a vector without first turning it into an array.
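Both points can be sketched in base R, which has neither colVars nor an n argument. Column variances are computed with apply(x, 2, var) and compared with diag(var(x)); the grouping that n provides is emulated by explicitly reshaping the vector into a matrix with n rows, which is exactly the copy that n avoids. The reshaping is an assumption consistent with the comments in the examples below ("groups of 5 consecutive observations", "groups of every fifth observation").

```r
x <- matrix(c(1, 3, 2, 8, 5, 7, 4, 6, 9), nrow = 3)

# Column variances, one var() call per column
col_vars <- apply(x, 2, var)
all.equal(col_vars, diag(var(x)))   # same values as the diagonal of var(x)

# Emulating n = 5 by reshaping a vector into a 5-row matrix
v <- 1:10
colMeans(matrix(v, nrow = 5))   # 3 8: groups of 5 consecutive values
rowMeans(matrix(v, nrow = 5))   # 3.5 4.5 5.5 6.5 7.5: every fifth value
```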
Variances are computed by the numerically accurate corrected two-pass method described in Chan, Golub, and LeVeque (1983). Sums are computed by adding the values in groups of 256 and then adding the group sums; this is motivated by the numerically accurate pairwise summation method described in the same article.
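The following base R sketch shows the two ideas for a single vector: a blocked sum with groups of 256, and the corrected two-pass variance, which subtracts (sum of deviations)^2/n from the sum of squared deviations to cancel rounding error in the computed mean. The helper names blocked_sum and two_pass_var are hypothetical, and this is not the S-PLUS implementation. Base R's own sum() may already accumulate in extended precision, so the blocking here is illustrative.

```r
# Hypothetical helper: sum in blocks of 256, then add the block sums
blocked_sum <- function(x, block = 256) {
  groups <- ceiling(seq_along(x) / block)  # block index for each element
  block_sums <- tapply(x, groups, sum)     # sum within each block
  sum(block_sums)                          # then add the block sums
}

# Hypothetical helper: corrected two-pass sample variance
two_pass_var <- function(x) {
  n <- length(x)
  xbar <- blocked_sum(x) / n                      # pass 1: the mean
  d <- x - xbar                                   # pass 2: deviations
  ss <- blocked_sum(d^2) - blocked_sum(d)^2 / n   # correction term
  ss / (n - 1)
}

set.seed(1)
x <- 1e6 + runif(1000)   # a large offset stresses naive one-pass formulas
all.equal(two_pass_var(x), var(x))
```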
Chan, T., Golub, G., and LeVeque, R. (1983). Algorithms for computing the sample variance: analysis and recommendations. The American Statistician, 37, 242-247.
x <- matrix(1:12, 4)
rowMeans(x)
colMeans(x)

## Summaries for regular subsets of a vector
x <- 1:10
colMeans(x, n=5)  # groups of 5 consecutive observations
rowMeans(x, n=5)  # groups of every fifth observation

## Higher-dimensional array
x <- array(runif(24), dim=c(2,3,4))
rowMeans(x)                    # vector of length 2
rowMeans(x, dims=2)            # 2x3 matrix
apply(x, 1:2, mean)            # same as previous
colMeans(x)                    # 3x4 matrix
colMeans(x, dims=2)            # vector of length 4
colMeans(aperm(x, c(2,1,3)))   # 2x4 matrix
colVars(x[1,,])                # vector of length 4
diag(var(x[1,,]))              # same as previous

### Investigate the distribution of the sample mean and t-statistic
### when the underlying population is not normal
x <- rexp(1000 * 20)           # 1000 samples of size 20
means <- colMeans(x, n=20)
stdevs <- colStdevs(x, n=20)
qqnorm(means)
plot(means, stdevs)  # these would be independent for a normal population
qqnorm( (means - 1) / stdevs )
# The first three lines in that study could be replaced with
x <- matrix(rexp(1000 * 20), 20)  # 1000 samples of size 20
means <- colMeans(x)
stdevs <- colStdevs(x)

### Bootstrap the sample mean
y <- runif(10)
indices <- sample(1:10, 10*1000, replace=T)  # 1000 samples
# One way: make use of the argument "n"
colMeans(y[indices], n=10)
# Alternative (slower)
boot.y <- y[indices]
dim(boot.y) <- c(10, 1000)
colMeans(boot.y)
# Same as previous, but much slower
apply(boot.y, 2, mean)