Apply a Function to Sections of an Array

DESCRIPTION:

Returns a vector or array by applying a specified function to sections of an array.

USAGE:

apply(X, MARGIN, FUN, ...)

REQUIRED ARGUMENTS:

X
an array (not a data frame; see Note below). Missing values (NAs) are allowed if FUN accepts them.
MARGIN
the subscripts over which the function is to be applied. For example, if X is a matrix, MARGIN=1 indicates rows and MARGIN=2 indicates columns. If the dimensions of X are named, then the names can be used to specify MARGIN. Note that MARGIN tells which dimensions of X are retained in the result.
FUN
a function to be applied to the specified array sections, or a character string giving the name of the function. The character form is necessary only for functions with unusual names, such as "%*%".

OPTIONAL ARGUMENTS:

...
any arguments to FUN. They are passed unchanged, names included, to each call of FUN.

VALUE:

If each call to FUN returns a vector of length N, and N>1, apply returns an array of dimension c(N, dim(X)[MARGIN]). If N==1 and MARGIN has length > 1, the value is an array of dimension dim(X)[MARGIN]. Otherwise, apply returns a vector.
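
The three cases above can be checked on a small array. This is a minimal sketch written for R, whose apply is assumed here to follow the same rules; the data and the names x3, r1, r2, and r3 are illustrative only.

```r
# Illustrative 2 x 3 x 4 array (hypothetical data).
x3 <- array(1:24, dim = c(2, 3, 4))

# FUN returns a vector of length N = 2, so the result has
# dimension c(N, dim(x3)[1]).
r1 <- apply(x3, 1, range)
dim(r1)

# N == 1 and MARGIN has length 2, so the result has
# dimension dim(x3)[c(1, 3)].
r2 <- apply(x3, c(1, 3), sum)
dim(r2)

# N == 1 and MARGIN has length 1, so the result is a plain vector
# with no dim attribute.
r3 <- apply(x3, 1, sum)
is.null(dim(r3))
```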

DETAILS:

A subarray is extracted from X for each combination of the levels of the subscripts named in MARGIN. The function FUN is invoked for each of these subarrays, and the results (if any) are concatenated into a new array.

Each section of the input array is passed as the first argument to an invocation of FUN. It is passed without a keyword modifier, so, by attaching keywords in ..., it is possible to make the array section correspond to any argument of FUN. See the examples section of the lapply help file for details. The arguments to apply have unusual upper-case names so they do not conflict with names that might be used by FUN.
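
As a sketch of the keyword technique described above, the following passes one argument by name so that the array section is matched to the other. It is written for R and assumes the same argument-matching behavior; the function shift and its argument names center and values are hypothetical.

```r
# Hypothetical two-argument function: subtracts 'center' from 'values'.
shift <- function(center, values) values - center

m <- matrix(1:6, nrow = 2)

# 'center' is supplied by keyword through ..., so each column of m
# is matched to the remaining argument, 'values'.
shifted <- apply(m, 2, shift, center = 1)
shifted
```

Without the keyword, each column would be matched to center, the first argument of shift.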

In the second example below, z is a 4-way array with the dimension vector (2,3,4,5). The expression apply(z, c(1,3), sum) computes the 2 by 4 matrix obtained by summing over the second and fourth extents of z; i.e., sum is called 8 times, each time on 15 values.

The sorting examples below show the difference between row and column operations when the result returned by FUN is a vector. Each vector returned by FUN runs along the first dimension of the result (that is, it becomes a column), so a row sort must be transposed to restore the original orientation.

If the results returned by FUN are of different dimensions or lengths, apply returns a list rather than an array, as shown in the which() example below.

Avoiding Scoping Problems with apply():

Under S scoping rules, FUN cannot see objects that are local to the function calling an *apply() function. Pass any such object to the *apply() function as an additional argument. For example:

  > p <- 3
  > V0 <- matrix( ... t(apply(M,1,function(x, p) x*diag(p), p = p)) ...


For more information about S scoping rules, see the chapter "Data Management" in the Programmer's Guide, or see Chapter 3, "The S Language: Advanced Aspects", in S Programming, Venables & Ripley, 2000.

NOTE:

The functions lapply and sapply apply a function to each element of a list. The function tapply applies a function to a ragged array defined by categories. The sweep function for arrays is similar to apply. To perform explicit looping, use for, while, or repeat.

If X is a data frame and MARGIN is 2, then apply invokes sapply. If X is a data frame and MARGIN is anything else, then X is coerced to a matrix using as.matrix. This may give surprising results. If there are any factor columns, then as.matrix converts all columns to character, and any missing values are converted to the ordinary character string "NA". sapply will try to convert all of the outputs of FUN(X[[i]]) to the same mode and will convert factors to integers. For more predictable results, operate on columns directly with lapply or sapply. For rows, consider explicitly converting X to a numerical matrix first.
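
The coercion described above can be seen on a small data frame. This is a minimal sketch written for R, where the row-wise coercion through as.matrix is assumed to behave as described here; the data frame df and the names row.modes and col.modes are hypothetical.

```r
# Hypothetical data frame with one numeric and one factor column.
df <- data.frame(size = c(1.5, 2.5), kind = factor(c("a", "b")))

# Row-wise apply first coerces df to a matrix. The factor column forces
# every cell to character, so FUN never sees numeric data.
row.modes <- apply(df, 1, mode)
row.modes

# Working column by column with sapply keeps each column's own storage:
# 'size' stays numeric, and the factor 'kind' is seen as its integer codes.
col.modes <- sapply(df, mode)
col.modes
```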

ISSUES:

When FUN is a generic function f that calls .Internal instead of UseMethod, apply will not find the correct method for it and will use the default method instead. Example functions that fall into this category include dim, +, and as.vector. In these cases, it is best to replace FUN=f by FUN=function(x, ...) f(x, ...) in the call to apply. For example, replace FUN="+" by FUN=function(e1,e2) e1+e2; replace FUN=dim by FUN=function(x) dim(x). This issue applies to any function that takes another function as an argument, not just to apply.

When FUN is a generic function, lapply and sapply will sometimes not find the proper function when you call lapply(X,FUN). The call works if you use lapply(X,function(x)FUN(x)) instead. This issue applies to any function that takes another function as an argument, not just to lapply and sapply.

EXAMPLES:

# 25% trimmed column means. The result is a vector of length ncol(x).
x <- matrix(20:1, ncol=5)
apply(x, 2, mean, trim=.25)

# This apply command returns the equivalent of
# z[,1,,1] + z[,1,,2] + z[,1,,3] + z[,1,,4] + z[,1,,5] +
# z[,2,,1] + z[,2,,2] + z[,2,,3] + z[,2,,4] + z[,2,,5] +
# z[,3,,1] + z[,3,,2] + z[,3,,3] + z[,3,,4] + z[,3,,5]
z <- array(c(1:24, 101:124, 201:224, 301:324, 401:424), dim=c(2,3,4,5))
apply(z, c(1, 3), sum)

# Common operations, for which you don't need apply()
colSums(x)
colMeans(x)
colVars(x)
colStdevs(x)
# There are corresponding row operations

# Sort the columns of x.
apply(x, 2, sort)
# Transpose the result of row sort.
t(apply(x, 1, sort))

# Find the names of the dimensions.
names(dimnames(barley.exposed))
apply(barley.exposed, "cultivar", mean)

# This command returns a list, since the lengths of results are not equal.
apply(matrix(1:12, nrow=4), 2, function(x) which(x <= 6))

# Example using floor() as the generic function
# (the bigdata library section must be loaded).
# MARGIN=2 (columns) is assumed here.
apply(as.bdFrame(fuel.frame)[,-5], 2, function(x) floor(x))

# If you call apply() from another function, and get a message
# about an object not found, it is probably because the object is
# local to the function and is not visible globally.
# For example, this fails:
x <- rmvnorm(300, d=5)
colTrimmedMeans <- function(x, myTrim){
  f <- function(x) mean(x, trim = myTrim)
  apply(x, 2, f)
}
# colTrimmedMeans(x, .2)
# The previous line would fail, because myTrim is local to colTrimmedMeans.
#
# The recommended solution is to use ...
# The inner function (f) must take an additional argument, and the
# outer function (colTrimmedMeans) should pass that argument
# when calling apply:
colTrimmedMeans <- function(x, myTrim){
  f <- function(x, myTrim) mean(x, trim = myTrim)
  apply(x, 2, f, myTrim = myTrim)
}
colTrimmedMeans(x, .2)