Split Data by Groups

DESCRIPTION:

Returns a list with one component for each unique value of group. Each component is a vector, data frame, or bdFrame containing the values (or rows) of data that correspond to that value of group.

USAGE:

split(data, group) 

REQUIRED ARGUMENTS:

data
vector, matrix, bdFrame, or data frame containing the values to be grouped. If data is a vector, its values are grouped according to the group argument; if data is a matrix, bdFrame, or data frame, its rows are grouped instead. Missing values (NAs) are allowed.
group
vector or factor variable giving the groups for the data values. If group is shorter than data, it is replicated to the length of data or, if data is a matrix, bdFrame, or data frame, to the number of rows in data. If group is longer than data, a warning is issued and some components of the result have zero length. Missing values are not accepted.
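The sketch below illustrates the two argument rules above: rows of a data frame are grouped, and a short group vector is recycled. It is written in R rather than S-PLUS, and it assumes that S-PLUS split behaves the same way for these simple inputs. The data frame scores and its columns are hypothetical.

```r
# Sketch in R; `scores` and its columns are hypothetical example data.
scores <- data.frame(id = 1:4, score = c(10, 20, 30, 40))

# For a data frame, split groups rows: rows 1 and 3 go to "a", rows 2 and 4 to "b"
by_row <- split(scores, c("a", "b", "a", "b"))
print(by_row$a)

# A group shorter than the data is recycled: c(1, 2) acts like c(1, 2, 1, 2)
recycled <- split(1:4, c(1, 2))
print(recycled)   # component "1" holds 1 3, component "2" holds 2 4
```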

VALUE:

list in which each component contains all data values associated with a particular value of group. For example, if the third value in group is 12, the third value (or row) in data is placed in a list component with all other data values that have group values of 12.
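A minimal R sketch of the example in the previous paragraph. The vectors group and values are hypothetical, and R is assumed to match S-PLUS here. The third value (30) lands in the component for group 12 together with every other value whose group is 12.

```r
# Hypothetical data: the third and fourth elements of `group` are both 12
group  <- c(7, 3, 12, 12, 3)
values <- c(10, 20, 30, 40, 50)

by_group <- split(values, group)
by_group[["12"]]   # 30 40: the values whose group is 12
by_group[["3"]]    # 20 50: the values whose group is 3
```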

Within each group, data values keep the order in which they originally appeared in data. The names of the list components are the corresponding group values if group is a numeric or character vector, or the corresponding levels if group is a factor variable.
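The following R sketch shows both properties for a numeric group: within each component the values stay in their original order, and the component names are the group values. The vectors x and g are hypothetical, and S-PLUS is assumed to behave the same way.

```r
# Hypothetical data with a numeric grouping vector
x <- c(5, 1, 4, 2, 3)
g <- c(2, 2, 1, 1, 2)

by_num <- split(x, g)
by_num[["2"]]    # 5 1 3: original order kept within the group
by_num[["1"]]    # 4 2
names(by_num)    # "1" "2": the group values, used as component names
```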

DETAILS:

A common use for split is to create a list of vectors, one per group, which boxplot accepts as input. To compute a summary statistic for each group, a combination of factor and tapply is usually preferred to using split followed by sapply. See the examples below.
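The sketch below compares the two approaches in R, using small hypothetical vectors named after the usage and age columns in the examples (the values are invented). Both give the same group means; tapply does the grouping and the summary in one call.

```r
# Hypothetical data standing in for the usage and age variables
usage <- c(3, 5, 4, 8, 6)
age   <- c("young", "old", "young", "old", "old")

# split the values into groups, then apply mean to each component
via_split  <- sapply(split(usage, age), mean)

# tapply on a factor groups and summarizes in one step
via_tapply <- tapply(usage, factor(age), mean)

via_split    # old = 19/3, young = 3.5
```

The intermediate list from split(usage, age) is also the form that boxplot accepts.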

If group is not a factor variable, split converts it to a factor before grouping, with levels sort(unique(group)). If you want a different order for the levels, convert the group vector to a factor and define the levels explicitly before passing it to split.
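A short R sketch of the level-ordering rule, assuming S-PLUS orders the components the same way. The vector sizes is hypothetical. Without explicit levels the components follow sort(unique(sizes)); with a factor whose levels are given, they follow those levels.

```r
# Hypothetical grouping vector
sizes <- c("small", "large", "medium", "small")

# Default: levels are sort(unique(sizes)), so the order is alphabetical
default_names <- names(split(1:4, sizes))   # "large" "medium" "small"

# Explicit levels fix the order of the components
size_f  <- factor(sizes, levels = c("small", "medium", "large"))
by_size <- split(1:4, size_f)
names(by_size)                               # "small" "medium" "large"
by_size$small                                # 1 4
```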

EXAMPLES:

split(c("Martin", "Mary", "Matt"), c("M", "F", "M")) 

split(fuel.frame, fuel.frame$Type)
boxplot(split(fuel.frame$Fuel, fuel.frame$Type), notch = TRUE) 

# mean usage by age group
attach(market.frame)
sapply(split(usage, age), mean)
# alternative computation
tapply(usage, list(age), mean)

# component for each month
split(ship, cycle(ship))

# survival time by sex
attach(lung)
split(time, sex)