Count Association Rule Items

DESCRIPTION:

Count the occurrences of items within a set of transactions, without storing all of the different items in memory. This can be used to avoid memory problems when generating association rules if there are many different possible items.

This function requires the bigdata library section to be loaded.

USAGE:

bd.assoc.rules.get.item.counts(data,
                               input.format="item.list", 
                               item.columns=NULL,
                               id.columns=character(0),
                               id.sort=T)

REQUIRED ARGUMENTS:

data
The input data to be analyzed. It can be a data.frame or bdFrame. The input.format argument specifies how transactions are read from this object.

OPTIONAL ARGUMENTS:

input.format
This specifies how transaction items are represented in the data object. It must be one of the four strings "item.list", "column.value", "column.flag", or "transaction.id". The four formats are described in the documentation for .
item.columns
The names or numbers of the data columns containing items. If NULL, all columns in data are items.
id.columns
The names or numbers of the data columns identifying the transaction in the "transaction.id" input format.
id.sort
If this argument is true, the input data will be sorted by the "id.columns" columns before processing the "transaction.id" input format. If the input data is already sorted, this argument can be set to false to avoid sorting the data.

VALUE:

A bdFrame with the following columns:

"item" : An item string.

"count" : The number of transactions the item appears in.

"totalTransactions" : The total number of transactions read. This will be the same for every output row.

DETAILS:

This function is called automatically by when its prescan.items argument is true, to select the items to be processed, without storing all of the possible items in memory.

The arguments to this function are a subset of the arguments to , and are interpreted exactly the same way as in the other function.

This function is defined as a separate function to allow the user to (1) generate the list of items, (2) filter this list to produce a vector of "interesting" items, and (3) pass this vector of items as the init.items argument to .
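
The filtering in step (2) can be sketched as follows. This is a minimal illustration, not part of the library: the counts object is a hand-written stand-in for the result of this function after conversion to an ordinary data.frame, its values are illustrative only, and the names min.support and interesting.items and the 0.5 threshold are arbitrary assumptions.

```r
# Hypothetical stand-in for the output of bd.assoc.rules.get.item.counts,
# held as an ordinary data.frame. The values are illustrative only.
counts <- data.frame(item=c("A","B","C","D"),
                     count=c(2,3,4,1),
                     totalTransactions=rep(5,4),
                     stringsAsFactors=F)

# Step 2: keep the items that appear in at least half of the transactions.
# The 0.5 threshold is an arbitrary choice for this sketch.
min.support <- 0.5
support <- counts$count / counts$totalTransactions
interesting.items <- as.character(counts$item[support >= min.support])

# Step 3: interesting.items is the vector to supply as the init.items argument.
```

Dividing count by totalTransactions gives the fraction of transactions containing each item, which is why totalTransactions is repeated on every output row.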

SEE ALSO:

.

EXAMPLES:

bd.assoc.rules.get.item.counts(
    data.frame(aa=c("A","A","B","B","B"),
               bb=c("C","C","C","C","D"),
               stringsAsFactors=F),
    input.format="item.list")
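
To make the meaning of the output columns concrete, the following sketch computes the same quantities by hand for the data above using only base functions. It assumes that, in the "item.list" format, each row is one transaction and each column value in that row is an item; that reading is an assumption here, and the names tx, all.items, item.counts, and result are illustrative. It holds every item in memory, so it is only suitable for small inputs.

```r
# Same data as in the example above, as an ordinary data.frame.
tx <- data.frame(aa=c("A","A","B","B","B"),
                 bb=c("C","C","C","C","D"),
                 stringsAsFactors=F)

# Collect the distinct items of each row (assumed to be one transaction),
# so that an item is counted at most once per transaction.
all.items <- character(0)
for (i in 1:nrow(tx)) {
    row.items <- unique(as.character(unlist(tx[i, ])))
    all.items <- c(all.items, row.items)
}

# Tabulate how many transactions each item appears in.
item.counts <- table(all.items)
result <- data.frame(item=names(item.counts),
                     count=as.vector(item.counts),
                     totalTransactions=nrow(tx),
                     stringsAsFactors=F)
```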