This function requires the bigdata library section to be loaded.
bd.sample(data, n=NULL, percent=NULL, method="simple", stratify.column=NULL, equal.sizes=F, seed=NULL)
data: a bdFrame or data.frame.
n: number of rows to be output.
percent: (if n is NULL) percentage of rows to be output (1-100).
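method: sampling method. "simple" performs simple random sampling; "stratified" samples within the levels of a stratification column.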
stratify.column: if method is "stratified", name or number of a stratification column. Must be a categorical column.
equal.sizes: if TRUE, stratified sampling attempts to select the same number of samples within each level of the stratification column. If the number of rows requested within a level is greater than the number of rows that have that level, the rows with that level are sampled with replacement. If FALSE, stratified sampling attempts to select a number of samples within each level that is proportional to the number of rows containing that level. These sample sizes may not be exact, particularly when sampling from stratification levels with only a few rows.
seed: random seed. If NULL, uses a new random seed every time.
A bdFrame or data.frame of the same type as data, containing a sample of the rows within the input dataset.
## Take a simple random sample.
bd.sample(fuel.frame, percent=10)
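To make the relationship between n and percent concrete, the following is a minimal base R sketch of simple random sampling. It is not the bigdata implementation: the helper simple.sample and the toy dataset are hypothetical, and both the rounding of the percentage to a whole number of rows and the choice to sample without replacement are assumptions made for illustration.

```r
# Hypothetical helper, not part of the bigdata library: simple random
# sampling of rows, with the sample size given as a count or a percentage.
simple.sample <- function(data, n = NULL, percent = NULL, seed = NULL) {
    if (is.null(n) && is.null(percent)) stop("supply either n or percent")
    if (!is.null(seed)) set.seed(seed)   # fixed seed gives a repeatable sample
    total <- nrow(data)
    if (is.null(n)) {
        # Assumption: the percentage is rounded to the nearest whole row.
        n <- round(total * percent / 100)
    }
    n <- min(n, total)                   # cannot draw more rows than exist
    rows <- sample.int(total, n)         # draw row indices without replacement
    data[sort(rows), , drop = FALSE]     # keep the original row order
}

# Small made-up dataset for illustration.
toy <- data.frame(id = 1:50, value = (1:50)^2)
picked <- simple.sample(toy, percent = 10, seed = 1)
nrow(picked)   # 5 rows, i.e. 10 percent of 50
```

Passing the same seed twice in this sketch returns the same rows, which mirrors the purpose of the seed argument described above.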
## Take a stratified sample, where each of the
## values of Type appears the same number of times.
bd.sample(fuel.frame, n=24, method="stratified", stratify.column="Type", equal.sizes=T)
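The difference between equal.sizes=T and equal.sizes=F can be illustrated with a short base R sketch. This is only an approximation of the behavior described above, not the bigdata implementation: the helper stratified.sample and the toy dataset are hypothetical, and rounding each per-level target to a whole number of rows is an assumption.

```r
# Hypothetical helper, not part of the bigdata library: stratified sampling
# over the levels of one categorical column.
stratified.sample <- function(data, n, column, equal.sizes = FALSE, seed = NULL) {
    if (!is.null(seed)) set.seed(seed)
    # Row indices grouped by stratification level (empty levels dropped).
    groups <- split(seq_len(nrow(data)), data[[column]], drop = TRUE)
    sizes <- sapply(groups, length)
    if (equal.sizes) {
        # Same target for every level (assumption: rounded to whole rows).
        target <- rep(round(n / length(groups)), length(groups))
    } else {
        # Target proportional to the number of rows in each level.
        target <- round(n * sizes / sum(sizes))
    }
    rows <- unlist(lapply(seq_along(groups), function(i) {
        idx <- groups[[i]]
        # Sample with replacement only when a level has too few rows.
        pick <- sample.int(length(idx), target[i], replace = target[i] > length(idx))
        idx[pick]
    }))
    data[rows, , drop = FALSE]
}

# Small made-up dataset: 12 rows of type A, 6 of type B, 2 of type C.
toy <- data.frame(id = 1:20, type = rep(c("A", "B", "C"), c(12, 6, 2)))

equal <- stratified.sample(toy, n = 12, column = "type", equal.sizes = TRUE, seed = 1)
table(equal$type)   # 4 rows per level; C has only 2 rows, so it repeats

prop <- stratified.sample(toy, n = 10, column = "type", seed = 1)
table(prop$type)    # A: 6, B: 3, C: 1
```

Because each target is rounded separately, the total number of rows returned by this sketch can differ slightly from n, which is consistent with the note that sample sizes may not be exact for levels with only a few rows.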