Subscript a Data Frame

DESCRIPTION:

Extract or replace values from a data frame object.

USAGE:

x[j]
x[j] <- value
x[[j]]
x[[j]] <- value
x[i, j, drop=<<see below>>]
x[i, j] <- value
x[[i, j]]
x[[i, j]] <- value

REQUIRED ARGUMENTS:

x
an object inheriting from class "data.frame".

OPTIONAL ARGUMENTS:

i, j
subscript expressions used to identify the elements to extract or replace. The expressions may be empty, which corresponds to all possible subscripts and therefore extracts/replaces the entire data frame (or one entire dimension of the data frame). The expressions may also be logical, numeric, or character. Numeric subscripts should be integers, such as the output from : (the sequence operator).

If a single argument is given that is not a matrix or list ( x[j] or x[[j]] ), then x is treated as a frame or list and the index j is assumed to index the variables of the data frame.

If the subscripting looks matrix-like (i.e., x[i,j] , x[i,] , or x[,j] ), then i and j apply to the rows (observations) and columns (variables), respectively. The methods treat x as a matrix.

drop
a logical flag controlling whether dimensions of length 1 are dropped from the return object. By default, single columns returned by the subscript operators are dropped to the corresponding variable but single rows remain data frames. If drop=FALSE, single columns remain data frames. If drop=TRUE, single rows become ordinary lists.

value
the replacement value for the relevant piece of the data frame. It is recommended that this be a data frame if you wish to replace data in more than one variable, unless value is a constant. The replacement value can also be an atomic vector or a list. For double subscripts, value should not be a list or data frame.

VALUE:

The extraction functions [ and [[ return the data formed by the designated elements of the data frame x . When extracting a single row ( x[i,] ) or a single variable ( x[,j] ), the returned data may be dropped from a data frame to a list or a variable, respectively, depending on drop. The expression x[[j]] returns a single variable unless j is a matrix (see below).
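
The extraction forms can be sketched as follows. This is a minimal illustration, assuming a small data frame built with data.frame(); the names dat, id, and score are invented for the example, and the comments restate the behavior described in this file rather than recorded output.

```r
# Hypothetical data frame for illustration; not one of the supplied data sets.
dat <- data.frame(id = 1:4, score = c(2.5, 3.0, 4.5, 1.0))

v1 <- dat[["score"]]    # list-like: the variable itself
d1 <- dat["score"]      # list-like: a data frame holding one variable
v2 <- dat[, "score"]    # matrix-like: single column, dropped to the variable by default
r1 <- dat[2, ]          # matrix-like: single row, remains a data frame by default
e1 <- dat[2, "score"]   # matrix-like: row 2 of the variable score
```

The list-like forms ( x[j] and x[[j]] ) take one subscript that indexes variables; the matrix-like forms take a row subscript and a column subscript, either of which may be empty.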

SIDE EFFECTS:

The replacement functions [<- and [[<- replace the data formed by the designated elements of the data frame x .

DETAILS:

Arguments are passed to the subscript operators by position and not name. The exception to this is drop, which must be passed by name. The drop argument is ignored if only a single subscript is given. For example, x[j, drop=T] returns a data frame, not a vector.

When you replace one or more columns of a data frame, S-PLUS checks that the new values have the right number of rows and generally tries to ensure that the assignment leaves the data frame in a valid state. With x[i,j] <- value, the characteristics (class, mode, dimension, attributes) of variables in the result depend on the characteristics of the variables in both x and value . With x[[j]] <- value, x[j] <- value, and x[,j] <- value, they depend solely on value.

For data frames containing matrices, these operators handle column subscripts as if each matrix were a single column. For example, if Y has 3 variables, the third of which is a matrix with 5 columns, then Y[,3] is dropped to a matrix with 5 columns and Y[,4] is undefined. With x[i,j] <- value, if x contains matrices then both of the following are true:

(1) The i subscript must refer solely to existing rows. (When x contains no matrices, the dimensions of the data frame are expanded as necessary.)

(2) Elements of value are assigned to columns of x without regard to the dimensions of matrices in x. TIBCO recommends either that value be a data frame with matrices having the same dimensions as those in x[i,j] , or that double subscripting be used for one variable at a time. For example, Y[[3]][i,] <- 2*Y[[3]][i,].
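
The one-variable-at-a-time recommendation can be sketched as below. The data frame Y and its variable names are hypothetical, and the sketch assumes that assigning a matrix with [[<- stores it as a single matrix variable.

```r
# Hypothetical data frame Y whose third variable is a 3 x 2 matrix.
Y <- data.frame(a = 1:3, b = c(10, 20, 30))
Y[["m"]] <- matrix(1:6, nrow = 3)   # assumption: stored as one matrix variable

i <- 1:2                            # refers to existing rows only
Y[[3]][i, ] <- 2 * Y[[3]][i, ]      # double the selected rows of the matrix variable
m <- Y[[3]]                         # the updated matrix
```

Because the replacement goes through Y[[3]], the matrix is subscripted with ordinary matrix rules and its dimensions are respected.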

A special case for x[j] is when j is a matrix, either logical with the same dimensions as x, or numeric with two columns where the first column refers to rows and the second to column numbers. In this situation, all columns of x are coerced to a common data type and matrix variables are converted to multiple columns before subscripting with the matrix j. Matrix subscripts are not allowed for the replacement operation x[j] <- value.
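
A minimal sketch of both kinds of matrix subscript follows. The data frame dat and the index objects idx and keep are invented for the example.

```r
# Hypothetical all-numeric data frame, so coercion to a common type is harmless.
dat <- data.frame(a = 1:3, b = c(10, 20, 30))

# Two-column numeric matrix: column 1 holds row numbers, column 2 column numbers.
idx <- cbind(c(1, 3), c(2, 1))
picked <- dat[idx]      # elements at (row 1, column 2) and (row 3, column 1)

# Logical matrix with the same dimensions as dat.
keep <- matrix(c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE), nrow = 3)
kept <- dat[keep]       # elements where keep is TRUE
```

In both cases the result is a vector of elements, not a data frame, because the columns are first coerced to a common type.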

Attributes of x other than row.names and dup.row.names are lost if columns are subscripted (i.e., x[j], x[,j] , x[i,j] ).

NOTES:

To select a single variable from a data frame (for example, in a loop), x[[j]] is faster than x[,j] .
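
For instance, a loop over variables might be written as in this sketch. The data frame dat and the vector totals are hypothetical, and no timing is claimed here.

```r
# Hypothetical data frame; sum each variable in turn.
dat <- data.frame(a = 1:3, b = c(10, 20, 30))

totals <- numeric(length(names(dat)))
for (j in seq(along = names(dat))) {
    totals[j] <- sum(dat[[j]])   # x[[j]] rather than x[, j]
}
```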

A common programming error is to neglect to specify drop=FALSE when using matrix-like subscripts. Use x[, <subset>, drop=FALSE] or x[<subset>] to ensure the result is a data frame, even when <subset> references only one variable. Using x[<subset>] is slightly faster.
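
The two safe forms can be sketched as small helper functions. The names select.vars and select.vars2 are hypothetical and are not part of S-PLUS.

```r
# Hypothetical helpers: return the requested variables, always as a data frame.
select.vars  <- function(x, vars) x[vars]                  # list-like form
select.vars2 <- function(x, vars) x[, vars, drop = FALSE]  # matrix-like form

dat <- data.frame(a = 1:3, b = c(10, 20, 30))
one <- select.vars(dat, "a")    # one variable requested; still a data frame
two <- select.vars2(dat, "a")   # same guarantee via drop=FALSE
bad <- dat[, "a"]               # default drop: the variable, not a data frame
```

Code that goes on to use names() or row subscripts on the result works with one and two however many variables are requested, but may fail with bad.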

Using x[i,] or x[i,j] can be slow if there are many rows and duplicate values in i. This is because creating unique row names is a slow process. If attr(x, "dup.row.names") is not NULL , subscripting is faster.

EXAMPLES:

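# Extract 10 randomly chosen rows (all variables) of solder.
solder[sample(900,10), ]

# Add a variable h, the ratio of Weight to Disp., to fuel.frame.
attach(fuel.frame)
fuel.frame[, "h"] <- Weight/Disp.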