Sample Rows

DESCRIPTION:

Sample rows from a dataset, using one of several methods.

This function requires the bigdata library section to be loaded.

USAGE:

bd.sample(data, n=NULL,
           percent=NULL, method="simple",
           stratify.column=NULL, equal.sizes=F,
           seed=NULL)

REQUIRED ARGUMENTS:

data
input data set: a bdFrame or data.frame.
n
if given, number of rows to be output.
percent
if given (and n is NULL), percentage of rows to be output (1-100).

OPTIONAL ARGUMENTS:

method
sampling method, one of:
"simple"
simple random sampling.
"everyNRows"
sample every N rows.
"firstNRows"
sample first N rows.
"stratified"
stratified sampling.

stratify.column
if method is "stratified", name or number of a stratification column. Must be a categorical column.
equal.sizes
if TRUE, stratified sampling attempts to select the same number of samples within each of the levels of the stratification column. If the number of rows requested within a level is greater than the number of rows that have that level, the rows with that level will be sampled with replacement.
If FALSE, stratified sampling attempts to select, within each level of the stratification column, a number of samples proportional to the number of rows that have that level. The resulting sample sizes may not be exactly proportional, particularly for stratification levels with only a few rows.
seed
if NULL, a new random seed is used on every call.
If an integer, that value is used as the seed.
The default value sets the seed based on the S-PLUS random seed.

VALUE:

A bdFrame or data.frame, of the same type as data, containing a sample of the rows of the input data set.

DETAILS:

EXAMPLES:

## Take a simple random sample.
bd.sample(fuel.frame, percent=10)
## Take a stratified sample, where each of the
## values of Type appears the same number of times.
bd.sample(fuel.frame, n=24, method="stratified",
           stratify.column="Type", equal.sizes=T)
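## The calls below are an illustrative sketch built only from the
## arguments documented above. They are not part of the original
## examples, and the seed value 42 is an arbitrary choice.
## Take the first 10 rows.
bd.sample(fuel.frame, n=10, method="firstNRows")
## Take a stratified sample of about 20 percent of the rows, with
## per-level sample sizes roughly proportional to the number of rows
## at each level of Type. With a fixed integer seed, repeated calls
## should return the same sample.
bd.sample(fuel.frame, percent=20, method="stratified",
           stratify.column="Type", equal.sizes=F, seed=42)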