Big Data Objects

DESCRIPTION:

The bdFrame and bdVector classes represent big data objects in S-PLUS.

DETAILS:

The bdFrame class is like a data frame: it is a rectangular data structure with rows and columns, whose columns can contain different types of data. A bdFrame object can have an arbitrary number of rows, because the data is not held in memory; rather, it is stored in an external cache file on disk. Functions for manipulating big data objects read small parts of the external cache file at a time.

A bdVector object is like a vector, but with arbitrary length. Special cases include bdCharacter, bdFactor, bdNumeric, and bdTimeDate.

The bdFrame and bdVector objects are opaque: their slots should not be accessed directly, but only through access functions such as names. New bdFrame and bdVector objects are created using big data functions; for example, as.bdFrame creates a bdFrame from a standard data frame.

The big data library introduces a new set of "bd" functions designed to work efficiently on large data. These functions minimize the number of passes made through the data. These are the functions to use for best performance.

In addition, big data object methods are available for many commonly-used S-PLUS functions. These methods allow you to use the same functions on bdFrame and bdVector objects that you would use on data.frame and vector objects. These methods are typically more flexible than the "bd" functions, but they can be less efficient.
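As a minimal sketch of this, the calls below apply the same standard functions to a data frame and to its big data counterpart. It uses only functions named in this help file (as.bdFrame, names, nrow) and the fuel.frame data set; the variable name fuel.bd is a hypothetical choice for illustration.

```s
# fuel.bd is a hypothetical name for the bdFrame copy of fuel.frame.
fuel.bd <- as.bdFrame(fuel.frame)

# The same calls work on the standard data frame and on the bdFrame.
names(fuel.frame)
names(fuel.bd)

nrow(fuel.frame)
nrow(fuel.bd)
```

The calls on fuel.bd dispatch to the big data object methods, so code written for data.frame objects may need little or no change.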

For best performance, it is important to write code that minimizes the number of passes through the data. A key tool for doing this is to use the function to perform multiple transformations during a single pass through the data, rather than applying a series of separate transformations during multiple passes. See the help file for details.

You might find it useful to convert data or subsets of data from big data objects to standard data objects, or vice versa. To change a big data object to a standard data object, use the function bd.coerce and pass the big data object as the argument. To change a standard data frame to a bdFrame, use the function as.bdFrame, and pass the standard data frame as the argument. For example, as.bdFrame(fuel.frame) converts fuel.frame from a standard data frame to a bdFrame, regardless of its size. When you convert a big data object to a standard data frame, make sure your computer has enough memory to store and manage the data frame.
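The round trip described above can be sketched as follows. Only as.bdFrame, bd.coerce, and fuel.frame come from this help file; the variable names fuel.big and fuel.std are hypothetical.

```s
# Standard data frame -> bdFrame (data moves to an external cache file).
fuel.big <- as.bdFrame(fuel.frame)

# bdFrame -> standard data frame. The whole data set is read into
# memory here, so do this only when it is small enough to fit.
fuel.std <- bd.coerce(fuel.big)
```

For a data set as small as fuel.frame, the conversion back is safe; for a genuinely large bdFrame, it is usually better to subset first and then convert only the subset.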

Note that for standard functions such as nrow, you can access the help topic by calling the function help(nrow) or by clicking a link for nrow. To see the actual method used, type getMethods("nrow") or selectMethod("nrow", "bdFrame").

For a more detailed list of the functions available in the Big Data library, see the Big Data Library User's Guide, Appendix A.

Vector Classes:








Data Frame Object Classes:

Series Object Classes:


Model Object Classes:




Simple Properties:



Displaying bdFrame Data:



Exploring Data:



Manipulating Data:



















Subsetting Rows:






Evaluating Expressions: