Summary Statistics
The Summary Statistics dialog provides basic univariate summaries for continuous variables and counts for categorical variables. Summaries may be calculated within groups based on one or more grouping variables.
To generate summary statistics
Choose Statistics > Data Summaries > Summary Statistics. The dialog shown below appears.
Data page
In the Summary Statistics dialog, the Data page has the following options:
Data
Data Set
Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.
Variable
Select the columns of the data set to include in the analysis. To include all columns, select ALL. Making no selection has the same effect since ALL is the default value.
Summaries by Group
Select one or more grouping variables from the scrolling list, which shows the variable names for the selected data set.
Maximum Unique Numeric Values
Specify the maximum number of unique values for a numeric grouping variable. Numeric grouping variables with more distinct values than this number are binned. The default value is 10.
Number of Bins for Numeric Values
Specify the number of bins, and therefore groups, to create when the Group Column is numeric and contains more unique values than the number specified in Maximum Unique Numeric Values. The default value is 6.
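The grouping rule above can be illustrated with a small Python sketch. It assumes equal-width bins; the dialog may cut the range differently (for example, at quantiles), so treat this as an illustration of the idea rather than the actual S-PLUS rule. The names `bin_labels` and `split_by_group` are hypothetical, and the defaults 10 and 6 mirror Maximum Unique Numeric Values and Number of Bins for Numeric Values.

```python
from collections import defaultdict


def bin_labels(values, max_unique=10, n_bins=6):
    """Label each value of a numeric grouping column with its group.

    Columns with at most max_unique distinct values are used as-is;
    otherwise the range is cut into n_bins equal-width intervals
    (an assumption, not necessarily the dialog's rule).
    """
    distinct = set(values)
    if len(distinct) <= max_unique:
        return list(values)
    lo, hi = min(distinct), max(distinct)
    width = (hi - lo) / n_bins
    # Clamp the bin index so the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def split_by_group(rows, labels):
    """Collect rows into one list per group label (one subset per group)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return dict(groups)
```

With 30 distinct values and the defaults, `bin_labels(list(range(30)))` produces six groups of five values each, and summaries would then be computed separately for each group.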
Results
Enter the name of the object in which to save the results of the analysis.
Summarize Categorical Variables
Select this to include summaries of the categorical variables (factors) in the data set. The summary for each factor lists its levels and the number of values at each level.
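A categorical summary is a count per level. The Python sketch below shows the idea; it assumes missing values are stored as `None` and left out of the counts, and it reports levels in order of first appearance, whereas an S-PLUS factor has its own defined level order. The function name `summarize_factor` is hypothetical.

```python
from collections import Counter


def summarize_factor(values):
    """Return (level, count) pairs for a categorical column.

    Assumption: None marks a missing value and is not counted.
    Levels appear in order of first occurrence.
    """
    counts = Counter(v for v in values if v is not None)
    return list(counts.items())
```

For example, `summarize_factor(["a", "b", "a", None, "c", "a"])` yields one pair per level: `a` with 3 values, `b` with 1, and `c` with 1.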
Print Results
Select this to print out the results of the analysis in the designated output window.
Statistics page
In the Summary Statistics dialog, the Statistics page has the following options:
Mean
Mean
Generate the mean value, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, for each numeric column of the data set, where $x_1, \dots, x_n$ are the values in the column.
Std. Error of Mean
Generate the standard error of the mean, $s/\sqrt{n}$, for each numeric column of the data set, where $s$ is the standard deviation and $n$ is the number of values in the column.
Conf. Limits for Mean
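Generate the confidence limits for the mean, $\bar{x} \pm t\left(\frac{1+c}{2},\, n-1\right)\frac{s}{\sqrt{n}}$, for each numeric column of the data set, where $t(p, \nu)$ is the function that returns the $p$ quantile of the t-distribution with $\nu$ degrees of freedom, $c$ is the desired confidence level (for example, 0.95), and $n-1$ is the number of degrees of freedom in the t-distribution.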
Conf. Level
Select from the dropdown list the confidence level used to compute the confidence limits for the mean.
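The three statistics in the Mean group can be sketched in Python as follows. This is not the S-PLUS implementation. To stay self-contained, the sketch computes the t quantile numerically: it integrates the t density with Simpson's rule and inverts the result by bisection. A library routine such as SciPy's `scipy.stats.t.ppf` could be used instead. It assumes missing values are stored as `None` and are dropped. The function names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """p quantile of the t-distribution for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_summary(values, conf_level=0.95):
    """Mean, standard error, and two-sided t confidence limits for one column."""
    x = [v for v in values if v is not None]  # assumption: None marks a missing value
    n = len(x)
    mean = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(n)   # stdev uses the n - 1 divisor
    t = t_quantile((1 + conf_level) / 2, n - 1)
    return {"mean": mean, "se": se, "lower": mean - t * se, "upper": mean + t * se}
```

For the column `[2, 4, 4, 4, 5, 5, 7, 9]`, the mean is 5 and the standard error is about 0.756. The 95% limits use the 0.975 quantile with 7 degrees of freedom.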
Quantiles
Minimum
Generate the minimum value for each numeric column of the data set.
First Quartile
Generate the first quartile value for each numeric column of the data set.
Median
Generate the median value for each numeric column of the data set.
Third Quartile
Generate the third quartile value for each numeric column of the data set.
Maximum
Generate the maximum value for each numeric column of the data set.
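The five quantile statistics can be sketched with Python's standard `statistics` module. The sketch assumes quartiles are computed by linear interpolation between order statistics, which is what `statistics.quantiles(..., method="inclusive")` does. S-PLUS may use a different quantile definition, so quartiles of small samples could differ. The function name `quantile_summary` is hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics


def quantile_summary(values):
    """Minimum, quartiles, median, and maximum of one numeric column."""
    x = sorted(v for v in values if v is not None)  # assumption: None is missing
    # "inclusive" interpolates linearly between order statistics.
    q1, med, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {"min": x[0], "q1": q1, "median": med, "q3": q3, "max": x[-1]}
```

For the values 1 through 10, the first quartile is 3.25, the median is 5.5, and the third quartile is 7.75.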
Scale
Variance
Generate the variance estimate, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, for each numeric column of the data set.
Std. Deviation
Generate the standard deviation, $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, for each numeric column of the data set. This value uses the $n-1$ divisor; it is the square root of the unbiased variance estimate.
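Python's `statistics.variance` and `statistics.stdev` use the same $n-1$ divisor, so they make a convenient cross-check for the Scale statistics. The function name `scale_summary` is hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics


def scale_summary(values):
    """Sample variance and standard deviation with the n - 1 divisor."""
    x = [v for v in values if v is not None]  # assumption: None marks a missing value
    return {
        "variance": statistics.variance(x),  # sum of squared deviations / (n - 1)
        "std_dev": statistics.stdev(x),      # square root of that variance
    }
```

For `[2, 4, 4, 4, 5, 5, 7, 9]` the variance is 32/7 (about 4.571). `statistics.pvariance`, which divides by $n$ instead, gives 4 for the same data.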
Shape
Skewness
Calculate skewness for each numeric column of the data set using Fisher's $G_1$ measure:
$$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{m_3}{m_2^{3/2}}, \qquad m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k.$$
Kurtosis
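Calculate kurtosis for each numeric column of the data set using Fisher's $G_2$ measure:
$$G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\left(\frac{m_4}{m_2^2} - 3\right) + 6\right], \qquad m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k.$$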
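Fisher's $G_1$ and $G_2$ can be computed directly from the sample central moments $m_k$. The sketch below uses the standard textbook definitions of these measures. The function names are hypothetical, and the inputs are assumed to contain no missing values. $G_1$ needs at least 3 values and $G_2$ at least 4.

```python
import math


def central_moment(x, k):
    """k-th sample central moment m_k, using divisor n."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)


def fisher_g1(x):
    """Fisher's G1 skewness: the moment ratio g1 with a small-sample adjustment."""
    n = len(x)
    g1 = central_moment(x, 3) / central_moment(x, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def fisher_g2(x):
    """Fisher's G2 excess kurtosis (near 0 for normally distributed data)."""
    n = len(x)
    g2 = central_moment(x, 4) / central_moment(x, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
```

For the symmetric column `[1, 2, 3, 4, 5]`, $G_1$ is 0 and $G_2$ is -1.2. The right-skewed column `[1, 2, 3, 10]` gives a positive $G_1$ of about 1.764.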
Other Statistics
Number of Rows
Generate the number of rows for each numeric column of the data set.
Number of Missing Rows
Generate the number of missing values (NAs) in each numeric column of the data set.
Total Sum
Generate the sum of all numeric values in each column of the data set.
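The three counts in Other Statistics can be sketched as follows. The sketch makes two assumptions: missing values (NAs) are stored as `None`, and Number of Rows counts every row, including missing ones, while Total Sum skips the missing values. The dialog's exact handling of NAs may differ. The function name `other_summary` is hypothetical.

```python
def other_summary(values):
    """Row count, missing-value count, and total of one numeric column."""
    present = [v for v in values if v is not None]  # assumption: None marks an NA
    return {
        "rows": len(values),                     # every row, including missing ones
        "missing": len(values) - len(present),   # number of NAs
        "sum": sum(present),                     # total of the non-missing values
    }
```

For `[1.5, None, 2.5, None, 6]`, this reports 5 rows, 2 missing values, and a sum of 10.0.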
Related S-Plus language functions
min, median, mean, summary, var, max, print.table