Calculate Univariate Statistics

DESCRIPTION:

This function calculates a set of univariate statistics for the columns of a dataset. The output can optionally be grouped by one or more categorical columns.

This function requires the bigdata library section to be loaded.

USAGE:

bd.univariate(data, columns=NULL, by.columns=NULL, all=T, n=all,
              nmiss=all, mean=all, median=all, mode=all, min=all,
              max=all, range=all, var=all, stdev=all, stderr=all,
              cv=all, skewness=all, kurtosis=all, q1=all, q3=all,
              iqr=all, quantiles=all, probs=c(99, 95, 90, 10, 5, 1),
              sum=all, uss=all, css=all, t=all, pr.t=all, msign=all,
              pr.msign=all, sign.rank=all, pr.sign.rank=all, mu=NULL)

REQUIRED ARGUMENTS:

data
a data.frame or bdFrame object.

OPTIONAL ARGUMENTS:

columns
identifies the columns for which univariate statistics are calculated. Defaults to all columns.
by.columns
identifies the categorical columns by which the statistics are grouped. Defaults to no columns (no grouping).
all
Default value for statistic arguments. By default all statistics are computed. To selectively exclude statistics, use all=T and set the arguments for the statistics to F. To selectively include statistics, use all=F and set the arguments for the statistics to T.
n
if TRUE, the count is included in the output.
nmiss
if TRUE, the missing value count is included in the output.
mean
if TRUE, the mean is included in the output.
median
if TRUE, the median is included in the output.
mode
if TRUE, the mode is included in the output.
min
if TRUE, the minimum is included in the output.
max
if TRUE, the maximum is included in the output.
range
if TRUE, the range is included in the output.
var
if TRUE, the variance is included in the output.
stdev
if TRUE, the standard deviation is included in the output.
stderr
if TRUE, the standard error is included in the output.
cv
if TRUE, the coefficient of variation is included in the output.
skewness
if TRUE, the skewness is included in the output.
kurtosis
if TRUE, the kurtosis is included in the output.
q1
if TRUE, the first quartile is included in the output.
q3
if TRUE, the third quartile is included in the output.
iqr
if TRUE, the interquartile range (the third quartile minus the first quartile) is included in the output.
quantiles
if TRUE, the quantiles corresponding to the probs values are included in the output.
probs
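specifies the probability values for which quantiles are desired. As in the default and the examples below, the values are given as percentages (for example, 95 for the 95th percentile). Used only if quantiles is TRUE.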
sum
if TRUE, the sum is included in the output.
uss
if TRUE, the uncorrected sum of squares is included in the output.
css
if TRUE, the corrected sum of squares is included in the output.
t
if TRUE, the Student's t statistic is included in the output.
pr.t
if TRUE, the prob > |t| is included in the output.
msign
if TRUE, the sign statistic is included in the output.
pr.msign
if TRUE, the prob > |msign| is included in the output.
sign.rank
if TRUE, the signed rank statistic is included in the output.
pr.sign.rank
if TRUE, the prob > |sign.rank| is included in the output.
mu
numeric value used in the Student's t, signed rank, and sign statistics. Defaults to the column mean.

VALUE:

If a single set of univariate statistics is computed, a named list is returned. If several sets are computed (for example, statistics for several columns, or for one column grouped by by.columns), a matrix of named lists is returned.

SEE ALSO:

EXAMPLES:

# Include the 33rd and 66th percentiles in the univariate statistics for Fuel
bd.univariate(data=fuel.frame, columns="Fuel", probs=c(33,66))

# Include the 33rd and 66th percentiles in the univariate statistics
#   for all columns grouped by Type
bd.univariate(data=fuel.frame, by.columns="Type", probs=c(33,66))
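
# The calls below are sketches that use only the arguments documented
#   above; they assume Fuel is a numeric column of fuel.frame.

# Selectively include statistics: turn everything off with all=F, then
#   request only the mean, median, and standard deviation for Fuel
bd.univariate(data=fuel.frame, columns="Fuel", all=F, mean=T,
              median=T, stdev=T)

# Selectively exclude statistics: keep the default all=T and switch off
#   the mode and the quantiles for Fuel grouped by Type
bd.univariate(data=fuel.frame, columns="Fuel", by.columns="Type",
              mode=F, quantiles=F)

# Request only the Student's t statistic and its probability, using
#   mu=4 (an arbitrary value chosen purely for illustration)
bd.univariate(data=fuel.frame, columns="Fuel", all=F, t=T, pr.t=T, mu=4)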