Summary Statistics

The Summary Statistics dialog provides basic univariate summaries for continuous variables and counts for categorical variables. Summaries may be calculated within groups based on one or more grouping variables.

To generate summary statistics

Choose Statistics > Data Summaries > Summary Statistics. The dialog shown below appears.

Data page

[Figure: Summary Statistics dialog, Data page]

In the Summary Statistics dialog, the Data page has the following options:

Data

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Variable

Select the columns of the data set to include in the analysis. To include all columns, select ALL. Making no selection has the same effect since ALL is the default value.

Summaries by Group

Group Variables

Select one or more variables to group the summaries by. The variable names for the selected data set appear in the scrolling list.

Maximum Unique Numeric Values

Specify the maximum number of unique values allowed for a numeric group variable. Numeric group variables with more distinct values than this number are binned. The default value is 10.

Number of Bins for Numeric Values

Specify the number of bins to create when a group variable is numeric and contains more unique values than the number specified in Maximum Unique Numeric Values. The default value is 6.
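The interaction between the two settings above can be illustrated with a short Python sketch (not S-PLUS code). The function name bin_group_values and the sample data are hypothetical, and the equal-width binning rule is an assumption: this topic does not state how the dialog places bin boundaries.

```python
def bin_group_values(values, max_unique=10, n_bins=6):
    """Return a group label for each value.

    Assumption: equal-width bins over the observed range. The dialog's
    actual binning rule is not documented in this topic.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_unique:
        # Few enough distinct values: each value is its own group.
        return list(values)
    lo, hi = distinct[0], distinct[-1]
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Clamp so the maximum lands in the last bin, not a bin of its own.
        labels.append(min(int((v - lo) / width), n_bins - 1))
    return labels


ages = [18, 21, 25, 30, 34, 41, 47, 52, 58, 63, 70, 78]
groups = bin_group_values(ages)            # 12 distinct values > 10: binned
unbinned = bin_group_values([1, 2, 2, 3])  # 3 distinct values: left as-is
print(groups)
print(unbinned)
```

With the defaults, a numeric group variable is left alone until it has more than 10 distinct values, at which point it is reduced to 6 groups.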

Results

Save As

Enter the name of the object in which to save the results of the analysis.

Summarize Categorical Variables

Include summaries of the categorical variables (factors) in the data set. The summary for each factor column lists its levels and the number of values at each level.
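As a rough illustration of a categorical summary, the following Python sketch (not S-PLUS code) counts the values at each level of a hypothetical factor column named fuel. The layout of the dialog's actual output may differ.

```python
from collections import Counter

# Hypothetical factor column with three levels.
fuel = ["gas", "diesel", "gas", "gas", "electric", "diesel"]

# Map each level to the number of values at that level.
level_counts = Counter(fuel)

for level in sorted(level_counts):
    print(level, level_counts[level])
```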

Print Results

Select this option to print the results of the analysis in the designated output window.

Statistics page

[Figure: Summary Statistics dialog, Statistics page]

In the Summary Statistics dialog, the Statistics page has the following options:

Mean

Mean

Generate the mean value for each numeric column of the data set.

Std. Error of Mean

Generate the standard error of the mean, $s / \sqrt{n}$, for each numeric column of the data set, where $s$ is the standard deviation and $n$ is the number of observations.
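The calculation can be checked by hand with a small Python sketch (not S-PLUS code). The sample values are made up for illustration, and the sketch assumes the column has no missing values.

```python
import math
import statistics

# Hypothetical numeric column with no missing values.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

n = len(x)
mean = statistics.mean(x)
s = statistics.stdev(x)          # sample standard deviation (n - 1 divisor)
std_error = s / math.sqrt(n)     # standard error of the mean

print(mean, std_error)
```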

Conf. Limits for Mean

Generate the confidence limits for the mean, $\bar{x} \pm \mathrm{qt}\left(\frac{1+\alpha}{2},\, n-1\right) \cdot s/\sqrt{n}$, for each numeric column of the data set, where $\mathrm{qt}$ is the function that returns the quantiles of the t-distribution, $\alpha$ is the desired confidence level, $n-1$ is the number of degrees of freedom in the t-distribution, and $s$ is the standard deviation.

Conf. Level

Select the confidence level to be used for the confidence limit for the mean from the dropdown list.
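A minimal Python sketch (not S-PLUS code) of two-sided confidence limits for the mean follows. The helper mean_conf_limits is hypothetical. Because the Python standard library has no t-distribution quantile function, the quantile is passed in as an argument; the value 2.5706 used here is the tabulated 0.975 quantile for 5 degrees of freedom, which corresponds to a 95% confidence level with six observations.

```python
import math
import statistics


def mean_conf_limits(x, t_quantile):
    """Return (lower, upper) limits: mean -/+ t_quantile * s / sqrt(n).

    t_quantile is the t-distribution quantile with n - 1 degrees of
    freedom for the chosen confidence level, supplied by the caller.
    """
    n = len(x)
    mean = statistics.mean(x)
    half_width = t_quantile * statistics.stdev(x) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical numeric column with no missing values.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
lower, upper = mean_conf_limits(x, t_quantile=2.5706)
print(lower, upper)
```

A higher confidence level uses a larger quantile and so gives wider limits.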

Quantiles

Minimum

Generate the minimum value for each numeric column of the data set.

First Quartile

Generate the first quartile value for each numeric column of the data set.

Median

Generate the median value for each numeric column of the data set.

Third Quartile

Generate the third quartile value for each numeric column of the data set.

Maximum

Generate the maximum value for each numeric column of the data set.
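The five quantile summaries can be reproduced approximately with the Python sketch below (not S-PLUS code). Quartile values depend on the interpolation rule. The sketch uses linear interpolation between order statistics (method="inclusive"), which is an assumption; this topic does not say which rule the dialog applies.

```python
import statistics

# Hypothetical numeric column with no missing values.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

# Cut points that split the sorted data into four equal parts.
q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")

five_numbers = (min(x), q1, median, q3, max(x))
print(five_numbers)
```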

Scale

Variance

Generate the variance estimate, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, for each numeric column of the data set.

Std. Deviation

Generate the standard deviation, $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, for each numeric column of the data set. This value is calculated using the $n-1$ divisor, that is, as the square root of the unbiased variance estimate.
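The effect of the n-1 divisor can be seen in a short Python sketch (not S-PLUS code). The sample values are hypothetical. The library functions and the hand calculation agree because statistics.variance and statistics.stdev also divide by n-1.

```python
import math
import statistics

# Hypothetical numeric column with no missing values.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

variance = statistics.variance(x)   # divides by n - 1
std_dev = statistics.stdev(x)       # square root of the value above

# The same variance computed directly from the definition.
n = len(x)
xbar = sum(x) / n
manual_variance = sum((v - xbar) ** 2 for v in x) / (n - 1)

print(variance, std_dev, manual_variance)
```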

Shape

Skewness

Calculate skewness for each numeric column of the data set. This value is calculated using Fisher's G1 measure: $G_1 = \frac{\sqrt{n(n-1)}}{n-2} \cdot \frac{m_3}{m_2^{3/2}}$, where $n$ is the number of rows and $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$ is the $k$th sample central moment.

Kurtosis

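Calculate kurtosis for each numeric column of the data set. This value is calculated using Fisher's G2 measure: $G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\left(\frac{m_4}{m_2^2} - 3\right) + 6\right]$, where $n$ is the number of rows and $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$ is the $k$th sample central moment.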
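The two shape statistics can be sketched in Python (not S-PLUS code) using the commonly cited definitions of Fisher's G1 and G2. The function names and sample data are hypothetical, and it is an assumption that the dialog uses exactly these forms. G1 needs at least three values and G2 at least four.

```python
import math


def central_moment(x, k):
    """k-th sample central moment, with divisor n."""
    n = len(x)
    xbar = sum(x) / n
    return sum((v - xbar) ** k for v in x) / n


def fisher_g1(x):
    """Skewness: sqrt(n(n-1)) / (n-2) * m3 / m2^(3/2)."""
    n = len(x)
    g1 = central_moment(x, 3) / central_moment(x, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def fisher_g2(x):
    """Excess kurtosis: (n-1) / ((n-2)(n-3)) * ((n+1) * g2 + 6)."""
    n = len(x)
    g2 = central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]

print(fisher_g1(symmetric), fisher_g2(symmetric))
print(fisher_g1(right_skewed), fisher_g2(right_skewed))
```

Symmetric data gives a skewness of zero, and a long right tail gives a positive value.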

Other Statistics

Number of Rows

Generate the number of rows, $n$, for each numeric column of the data set.

Number of Missing Rows

Generate the number of missing values (NAs) in each numeric column of the data set.

Total Sum

Generate the sum of all numeric values in each column of the data set.
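The three statistics in this group can be illustrated with the Python sketch below (not S-PLUS code). The column and the helper is_missing are hypothetical; None and NaN stand in for NA. Whether the dialog counts missing values in the number of rows, and whether it drops them before summing, is not stated in this topic; the sketch assumes both.

```python
import math

# Hypothetical numeric column; None and NaN represent missing values.
column = [3.5, None, 2.0, float("nan"), 4.5]


def is_missing(v):
    """True for None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


n_rows = len(column)                                   # all rows
n_missing = sum(1 for v in column if is_missing(v))    # missing rows only
total = sum(v for v in column if not is_missing(v))    # sum of present values

print(n_rows, n_missing, total)
```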

Related S-Plus language functions

min, median, mean, summary, var, max, print.table