Calculate Univariate Statistics
DESCRIPTION:
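This function calculates a configurable set of univariate statistics (counts, moments, quantiles, and location tests) for the columns of a dataset.
The output can optionally be grouped by the values of one or more categorical columns.
This function requires the bigdata library section to be loaded.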
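As a rough reference for what the individual statistics measure, the following Python sketch computes most of them for a single column and treats None as a missing value. It is not the bigdata implementation. The n - 1 variance divisor, the percentage form of cv, and the bias-adjusted skewness and kurtosis formulas are assumptions, and bd.univariate may use different conventions. The function name describe is hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> dict:
    """Univariate statistics for one column; None marks a missing value.

    Assumes at least four non-missing values and a nonzero mean.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    mean = sum(x) / n
    css = sum((v - mean) ** 2 for v in x)   # corrected sum of squares
    var = css / (n - 1)                     # sample variance, n - 1 divisor (assumption)
    sd = math.sqrt(var)
    z3 = sum(((v - mean) / sd) ** 3 for v in x)
    z4 = sum(((v - mean) / sd) ** 4 for v in x)
    return {
        "n": n,
        "nmiss": len(values) - n,
        "mean": mean,
        "median": statistics.median(x),
        "mode": statistics.mode(x),
        "min": min(x),
        "max": max(x),
        "range": max(x) - min(x),
        "sum": sum(x),
        "uss": sum(v * v for v in x),       # uncorrected sum of squares
        "css": css,
        "var": var,
        "stdev": sd,
        "stderr": sd / math.sqrt(n),        # standard error of the mean
        "cv": 100 * sd / mean,              # as a percentage (assumption)
        # Bias-adjusted sample skewness and excess kurtosis (assumed conventions).
        "skewness": n / ((n - 1) * (n - 2)) * z3,
        "kurtosis": (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                     - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))),
    }
```

For example, describe([2, 4, 4, 4, 5, 5, 7, 9, None]) reports n = 8, nmiss = 1, mean = 5, and a median of 4.5.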
USAGE:
bd.univariate(data, columns=NULL, by.columns=NULL, all=T, n=all,
nmiss=all, mean=all, median=all, mode=all, min=all,
max=all, range=all, var=all, stdev=all, stderr=all,
cv=all, skewness=all, kurtosis=all, q1=all, q3=all,
iqr=all, quantiles=all, probs=c(99, 95, 90, 10, 5, 1),
sum=all, uss=all, css=all, t=all, pr.t=all, msign=all,
pr.msign=all, sign.rank=all, pr.sign.rank=all, mu=NULL)
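Every statistic argument in the usage line defaults to the value of all, so one switch turns the whole set on or off and individual arguments override it. The Python sketch below mimics that defaulting pattern for a hypothetical subset of the flags; resolve_flags and STATS are illustrative names, not part of the bigdata library.

```python
# Hypothetical subset of the statistic flags named in the usage line.
STATS = ("n", "nmiss", "mean", "median", "stdev")


def resolve_flags(all_: bool = True, **overrides: bool) -> dict:
    """Every flag defaults to `all_` (the analogue of n=all, mean=all, ...);
    flags passed explicitly override that default."""
    unknown = set(overrides) - set(STATS)
    if unknown:
        raise TypeError(f"unknown statistic flag(s): {sorted(unknown)}")
    return {name: overrides.get(name, all_) for name in STATS}
```

Here resolve_flags(all_=False, mean=True, stdev=True) plays the role of all=F, mean=T, stdev=T. The trailing underscore only avoids shadowing Python's built-in all.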
REQUIRED ARGUMENTS:
- data
-
a data.frame or bdFrame object.
OPTIONAL ARGUMENTS:
- columns
-
identifies the columns (for example, by name) for which univariate statistics are calculated. Defaults to all columns.
- by.columns
-
identifies the categorical columns whose values define the groups for which statistics are calculated. Defaults to no columns (no grouping).
- all
-
Default value for each of the logical statistic arguments (n through pr.sign.rank in the usage line). Because all=T by default, every statistic is computed unless it is turned off.
To selectively exclude statistics, use all=T and set the arguments for those statistics to F.
To selectively include statistics, use all=F and set the arguments for those statistics to T.
- n
-
if TRUE, the count is included in the output.
- nmiss
-
if TRUE, the missing value count is included in the output.
- mean
-
if TRUE, the mean is included in the output.
- median
-
if TRUE, the median is included in the output.
- mode
-
if TRUE, the mode is included in the output.
- min
-
if TRUE, the minimum is included in the output.
- max
-
if TRUE, the maximum is included in the output.
- range
-
if TRUE, the range (maximum minus minimum) is included in the output.
- var
-
if TRUE, the variance is included in the output.
- stdev
-
if TRUE, the standard deviation is included in the output.
- stderr
-
if TRUE, the standard error of the mean is included in the output.
- cv
-
if TRUE, the coefficient of variation is included in the output.
- skewness
-
if TRUE, the skewness is included in the output.
- kurtosis
-
if TRUE, the kurtosis is included in the output.
- q1
-
if TRUE, the first quartile is included in the output.
- q3
-
if TRUE, the third quartile is included in the output.
- iqr
-
if TRUE, the interquartile range (third quartile minus first quartile) is included in the output.
- quantiles
-
if TRUE, the quantiles corresponding to the probs values are included in the output.
- probs
-
specifies the probabilities, expressed as percentages (for example, 95 for the 95th percentile), for which quantiles are computed. Used only if quantiles is TRUE.
- sum
-
if TRUE, the sum is included in the output.
- uss
-
if TRUE, the uncorrected sum of squares (sum of the squared values) is included in the output.
- css
-
if TRUE, the corrected sum of squares (sum of the squared deviations from the mean) is included in the output.
- t
-
if TRUE, the Student's t statistic for testing whether the mean equals mu is included in the output.
- pr.t
-
if TRUE, the two-sided p-value for the t statistic (Pr > |t|) is included in the output.
- msign
-
if TRUE, the sign statistic is included in the output.
- pr.msign
-
if TRUE, the p-value for the sign statistic (Pr > |msign|) is included in the output.
- sign.rank
-
if TRUE, the signed rank statistic is included in the output.
- pr.sign.rank
-
if TRUE, the p-value for the signed rank statistic (Pr > |sign.rank|) is included in the output.
- mu
-
numeric null-hypothesis value used to compute Student's t, the sign statistic, and the signed rank statistic. If NULL (the default), the column mean is used.
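The three location statistics that depend on mu can be sketched in Python as below. This help page does not define msign or sign.rank; the sketch assumes the SAS PROC UNIVARIATE conventions (M = (n+ - n-)/2 and S = sum of positive signed ranks - n(n+1)/4, after dropping values equal to mu, with average ranks for ties), which bd.univariate may or may not follow. The p-values (pr.t, pr.msign, pr.sign.rank) are omitted because they need the t, binomial, and signed-rank distributions. The names location_tests and average_ranks are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def location_tests(x: Sequence[float], mu: Optional[float] = None) -> dict:
    n = len(x)
    mean = statistics.fmean(x)
    if mu is None:
        mu = mean  # mirrors the documented default (column mean)
    t = (mean - mu) / (statistics.stdev(x) / math.sqrt(n))
    d = [v - mu for v in x if v != mu]  # drop values equal to mu (assumed convention)
    n_pos = sum(1 for v in d if v > 0)
    msign = (n_pos - (len(d) - n_pos)) / 2
    ranks = average_ranks([abs(v) for v in d])
    n0 = len(d)
    sign_rank = sum(r for r, v in zip(ranks, d) if v > 0) - n0 * (n0 + 1) / 4
    return {"t": t, "msign": msign, "sign.rank": sign_rank}
```

For x = [1, 2, 3, 4, 5, 6] and mu = 2, this gives msign = 1.5 and sign.rank = 6. With mu left at the documented default, the column mean, t is exactly 0 by construction.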
VALUE:
If a single set of univariate statistics is computed (one column, no grouping), a named list is returned.
If several sets are computed
(for example, statistics for several columns, or for one column grouped by by.columns),
a matrix of named lists is returned.
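One way to picture the grouped return shape is one set of statistics per (group, column) cell. The Python analogue below uses a dict keyed by (group, column) in place of the S-PLUS matrix of named lists; it is not the object bd.univariate returns, and grouped_stats, the row layout, and the statistics chosen are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, median
from typing import Dict, List, Sequence, Tuple


def grouped_stats(rows: Sequence[dict], columns: Sequence[str],
                  by: str) -> Dict[Tuple[str, str], dict]:
    """One small dict of statistics per (group value, column) pair."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)  # partition rows by the grouping column
    result = {}
    for key, members in groups.items():
        for col in columns:
            vals = [r[col] for r in members]
            result[(key, col)] = {"n": len(vals), "mean": fmean(vals), "median": median(vals)}
    return result
```

With two numeric columns and two groups, the result has four cells, mirroring the "several sets" case described above.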
SEE ALSO:
EXAMPLES:
# Include the 33rd and 66th percentiles in the univariate statistics for Fuel
bd.univariate(data=fuel.frame, columns="Fuel", probs=c(33,66))
# Include the 33rd and 66th percentiles in the univariate statistics
# for all columns grouped by Type
bd.univariate(data=fuel.frame, by.columns="Type", probs=c(33,66))
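Because probs is given in percent, probs=c(33,66) requests the 33rd and 66th percentiles. This help page does not say which quantile definition bd.univariate uses; the Python sketch below assumes linear interpolation between order statistics (the same rule as R's default type 7 quantile), purely to show how percentages map to quantiles. The function name percent_quantiles is hypothetical.

```python
import math
from typing import Dict, Sequence


def percent_quantiles(values: Sequence[float], probs: Sequence[float]) -> Dict[float, float]:
    """Quantiles for probabilities given as percentages, by linear interpolation.

    Assumes a non-empty input.
    """
    x = sorted(values)
    n = len(x)
    out = {}
    for p in probs:
        if not 0 <= p <= 100:
            raise ValueError(f"probability {p} is outside 0-100")
        h = (n - 1) * p / 100      # fractional 0-based position in the sorted data
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)
        out[p] = x[lo] + (h - lo) * (x[hi] - x[lo])
    return out
```

Under this rule, probs of 25, 50, and 75 reproduce q1, the median, and q3, and their difference q3 - q1 is the iqr.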