The bdFrame and bdVector classes represent big data objects in S-PLUS.
The bdFrame class is like a data frame: it is a rectangular data structure with rows and columns, where the columns contain different types of data. A bdFrame object can have an arbitrary number of rows, because the data is not held in memory; rather, it is stored in an external cache file on disk. Functions for manipulating big data objects read small parts of the external cache file at a time.
A bdVector object is like a vector, but with arbitrary length. Special cases include bdCharacter, bdFactor, bdNumeric, and bdTimeDate.
The bdFrame and bdVector objects are opaque: their slots should not be accessed directly, but only through access functions such as names.
New bdFrame and bdVector objects are
created using functions such as
,
,
, and
,
or other big data functions.
The big data library introduces a new set of "bd" functions designed to work efficiently on large data. Because these functions minimize the number of passes made through the data, they are the functions to use for best performance.
In addition, big data object methods are available for many commonly used S-PLUS functions. These methods allow you to use the same functions on bdFrame and bdVector objects that you would use on data.frame and vector objects. These methods are typically more flexible than the "bd" functions, but they can be less efficient.
For best performance, it is important that you write code minimizing the number of passes through the data. A key tool for doing this is to use the function to perform multiple transformations during a single pass through the data, rather than applying a series of separate transformations during multiple passes through the data. See the help file for details.
You might find it useful to convert data or subsets of data from big data objects to standard data objects, or vice versa. To change a big data object to a standard data object, use the function bd.coerce and pass the big data object as the argument. To change a standard data frame to a bdFrame, use the function as.bdFrame and pass the standard data frame as the argument. For example, as.bdFrame(fuel.frame) converts fuel.frame from a standard data frame to a bdFrame, regardless of its size. When you convert a big data object to a standard data frame, make sure your computer has enough memory to store and manage the data frame.
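The conversions described above can be sketched as a short round trip. This is a minimal illustration, not a definitive recipe: it uses only functions named in this help file (as.bdFrame, bd.coerce, names, nrow) and the fuel.frame data set from the example above. The object names big.fuel and small.fuel are hypothetical, and the library(bigdata) call is an assumption about how the Big Data library is loaded in your installation.

```s
# Assumption: the Big Data library is loaded this way in your installation.
library(bigdata)

# Convert the standard data frame to a bdFrame (data moves to a disk cache).
big.fuel <- as.bdFrame(fuel.frame)

# Inspect the bdFrame through access functions rather than its slots.
names(big.fuel)
nrow(big.fuel)

# Convert back to a standard data frame. The result is held in memory,
# so do this only when the data is small enough to fit.
small.fuel <- bd.coerce(big.fuel)
```

Because fuel.frame is small, the round trip is safe here; with a genuinely large bdFrame, consider converting only a subset of the data.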
Note that for standard functions such as nrow, you can access the help topic by calling the function help(nrow) or by clicking a link for nrow. To see the actual method used, type getMethods("nrow") or selectMethod("nrow", "bdFrame").
For a more detailed list of the functions available in the Big Data library, see the Big Data Library User's Guide, Appendix A.
Subsetting Rows