Boxplots

DESCRIPTION:

Produces side by side boxplots from a number of vectors. The boxplots can be made to display the variability of the median, and can have variable widths to represent differences in sample size. Boxplots are sometimes called box-and-whisker plots. See bwplot for the Trellis graphics version of boxplots.

USAGE:

boxplot(x, ..., range=1.0, width=<<see below>>, varwidth=F,  
      names=<<see below>>, plot=T, notch=F, style.bxp=list(),  
      boxwex=.5, boxcol=3, medchar=F, medpch=NA, medline=T, 
      medlwd=5, medcol=0, confint=F, confcol=2, confangle=45, 
      confdensity=25, confnotch=F, whisklty=2, staplelty=1, 
      staplewex=1, staplehex=1, outchar=F, outpch=NA, 
      outline=T, outwex=1) 

REQUIRED ARGUMENTS:

x, ...
vectors or lists containing numeric components (e.g., the output of split). Missing values (NA) are allowed.

OPTIONAL ARGUMENTS:

The following arguments must be specified in the name=value form, and the names cannot be abbreviated.
range
controls the strategy for the whiskers and the detached points beyond the whiskers. See the Details section below.
width
vector of relative box widths. See also the varwidth argument. The default is that all widths are the same.
varwidth
if TRUE, box widths are proportional to the square root of the number of observations for the box. This is ignored if width is specified.
names
vector of names for the groups. If omitted, names used in labeling the plot are taken from the names attribute of the first list of data.
plot
if TRUE, the boxplot is produced; otherwise, the calculated summaries of the arguments are invisibly returned.
notch
if TRUE, notched boxes are drawn. Notches show approximate 95% confidence limits for the median. (NOTE: The notch parameter is provided primarily for backward compatibility. See the confint, confnotch, confcol, confangle and confdensity parameters below for more versatile control over the display of confidence intervals.)
style.bxp
character string or list indicating the style of the boxplot. If specified as a character string, the string is appended to "bxp." to get the name of a dataset which is a list. Component names of this list should match the names of the parameters below; the component values serve as the defaults for the corresponding parameters (i.e., other arguments supplied to the function override the style.bxp component values). Standard style.bxp option values include "splus" (new S-PLUS style), "att" (new AT&T style) and "old".
boxwex
box width expansion. The widths of the boxes, of the staples (whisker end caps), and of the outliers (if drawn as lines) are proportional to this parameter. The default is 0.5, but the "att" and "old" styles set this to 1.
border
logical flag that specifies whether the border of the boxplot will be drawn. Border can also be specified as a single color value or a vector of color values. For example, border="blue" draws the border in the specified color, and border=1:3 can be used to draw borders of multiple boxplots using a vector of three colors.
col
fill color used for boxes. If col is not specified, boxcol will be used (see below). The col argument can be specified as a single color value or a vector of color values. For example, col="blue" fills the box with the specified color, and col=1:3 can be used to fill multiple boxes using a vector of three colors.
boxcol
filled box color(s). If one number is supplied, the box is filled with the indicated color. If a vector of two non-negative numbers is supplied, the area below the median is filled with the first color and the area above the median is filled with the second color. A color of 0 can be used to designate filling with the background color. A specification of boxcol=-1 is used to designate "no fill" at all. The default is to fill with color 3, but the "att" and "old" styles set this for no filling. This argument supports using colors by name as well as number.
medchar
logical flag indicating whether to show the median as a plotted character. This parameter is implicitly set to TRUE if a medpch parameter is supplied. The default is FALSE, but the "att" style implicitly sets the default to TRUE (by specifying medpch).
medpch
median plotting character. Setting this parameter implicitly sets the medchar parameter to TRUE. The special value, NA, can be used to indicate the current plotting character (par("pch")). The default is NA, but the "att" style sets the default to 16 (filled octagon).
medline
logical flag indicating whether to show the median as a line across the box. This parameter is implicitly set to TRUE if the medlwd parameter is supplied. The default is TRUE, but the "att" style sets it to FALSE.
medlwd
median line width. Setting this parameter implicitly sets the medline parameter to TRUE. The special value, NA, is used to indicate the current line width (par("lwd")). The default is 5, and the "old" and "att" styles set it to 5.
medcol
the color of the median line or character. The special value, NA, indicates the current plotting color (par("col")). The default is 0 (the background color), but the "old" and "att" styles set the default to NA. This argument supports using colors by name as well as number.
confint
if TRUE, confidence intervals are shown. Notches show approximate 95% confidence limits for the median. How the confidence intervals are displayed is determined by the confnotch, confcol, confangle and confdensity parameters. The 95% level for the confidence interval cannot be changed.
confnotch
confidence interval notch logical flag. If TRUE, confidence intervals are notched. The default is FALSE, but the "old" and "att" styles set this parameter to TRUE.
confcol
confidence interval color. If supplied, confidence intervals are filled with the indicated color. The default is 2, but the "old" and "att" styles set it to -1 (no filling). This argument supports using colors by name as well as number.
confangle
confidence interval hatching angle. If supplied, confidence intervals are hatched at the indicated angle, in degrees. If confdensity is supplied and confangle is not, confangle defaults to 45.
confdensity
confidence interval hatching density. If supplied, confidence intervals are hatched at the indicated density, in lines per inch. If confangle is supplied and confdensity is not, confdensity defaults to 25.
whisklty
whisker line type. The special value, NA, indicates the current line type (par("lty")). The default is 2 (dotted line), but the "old" and "att" styles set it to 4 (dashed line).
staplelty
staple (whisker end cap) line type. The special value, NA, indicates the current line type (par("lty")). The default is 1 (solid line), but the "att" style sets the default to 4 (dashed line).
staplewex
staple width expansion. Proportional to the box width. The default is 1, but the "old" style sets the default to 0.125.
staplehex
staple height expansion. Proportional to a standard height of about 1/100th the height of the plotting area. The default is 1 but the "old" style sets the default to 0.
outchar
logical flag indicating whether to show the outliers as plotted characters. This parameter is implicitly set to TRUE if an outpch parameter is supplied. The default is FALSE, but the "old" style sets it to TRUE, and the "att" style implicitly sets it to TRUE (by setting outpch).
outpch
outlier plotting character. Setting this parameter implicitly sets the outchar parameter to TRUE. The special value, NA, indicates the current plotting character (par("pch")). The default is NA, but the "att" style sets the default to 1 (an octagon).
outline
logical flag indicating whether to show the outliers as horizontal lines. This parameter is implicitly set to TRUE if the outwex parameter is supplied. The default is TRUE, but the "old" and "att" styles set it to FALSE.
outwex
outlier line width expansion, proportional to the box width. The default is 1.

Graphical parameters may also be supplied as arguments to this function. In addition, high-level graphics arguments (for example main, sub, and ylab, as used in the Examples below) may be supplied to this function. However, boxplot always uses linear axes: the log and [xy]axt arguments are ignored. To plot on another scale, transform your data before calling boxplot with axes=F, then use the axis function to add an axis labeled to reflect the transformation.
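The transformation approach described above can be sketched as follows. The vectors x1 and x2 are hypothetical data invented for illustration, and the choice of log10 and of tick positions at powers of ten is only one possible labeling.

```r
# Hypothetical positive, right-skewed samples (not from the original text)
x1 <- c(1.2, 3.5, 10, 48, 250, 900)
x2 <- c(5, 8, 20, 75, 300, 4000)

# Draw the boxplots on the log10 scale, suppressing the default axes
boxplot(log10(x1), log10(x2), axes=F)

# Add a y-axis whose tick labels are in the original units
ticks <- 1:3
tick.labels <- as.character(10^ticks)
axis(2, at=ticks, labels=tick.labels)
box()
```

The ticks are placed at positions on the transformed scale but labeled with the untransformed values, so the plot reads in the original units.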

VALUE:

If plot is TRUE, the function bxp is invoked with the components listed below, plus the optional width, varwidth, notch, and style arguments (and associated parameters), to produce the plot. Note that bxp returns a vector of x-coordinates of box centers.

If plot is FALSE, an invisible list with the components listed below is returned. These statistics are calculated excluding NA and Inf values.

stats
matrix (of size 5 by the number of boxes) giving the upper extreme (excluding outliers), upper hinge, median, lower hinge, and lower extreme (excluding outliers) for each box. By default, any point farther than 1.5 times (upper hinge - lower hinge) from the nearer hinge is considered an outlier. See the Details section below and the range argument above.
n
the number of observations in each group. This is zero if all values are NA or Inf.
conf
matrix (of size 2 by the number of boxes) giving approximate 95% confidence limits for the median. The limits are functions of the quartiles, so a few outliers have little effect on them.
out
optional vector of outlying points (outliers). See the Details section below.
group
vector giving the box to which each point in out belongs.
names
names for each box (see argument names above).
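
As an illustration of the returned components, the following sketch computes the summaries without drawing anything. The vectors a and b are hypothetical data; because the list is returned invisibly, it must be assigned to be inspected.

```r
# Hypothetical samples; the NA is excluded from the statistics
a <- c(2.1, 3.4, 1.9, 5.6, 4.2, NA)
b <- c(7.3, 6.8, 9.1, 8.4, 25.0)

# plot=F suppresses drawing; assign the invisibly returned list
bp <- boxplot(a, b, plot=F)

bp$stats   # 5 rows (extremes, hinges, median), one column per box
bp$n       # number of observations in each box
bp$conf    # 2 rows of approximate 95% limits for the median
bp$out     # outlying points, if any
bp$group   # box to which each element of bp$out belongs
```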

SIDE EFFECTS:

If plot is TRUE, a plot is created on the current graphics device.

Outlier lines and points are always drawn in color 1 of the palette.

DETAILS:

A boxplot displays the center half of the data (the box) with the median marked. The top and bottom of the box are defined by the hinges (see below). By default, whiskers are drawn to the most extreme data value that is not beyond a standard span from the hinges. Points beyond the end of the whiskers (outliers) are drawn individually. Giving range=0 forces whiskers to the full data range. Any positive value of range multiplies the standard span by this amount. The standard span is 1.5*(upper hinge - lower hinge).
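
The whisker rule can be sketched in S as below. The function whisker.ends is a hypothetical helper written for illustration, not part of boxplot; it takes the hinges as given and, as a simplification, drops only NA values.

```r
# Hypothetical helper: whisker ends and outliers for one sample,
# given its hinges and the range multiplier described above
whisker.ends <- function(x, lower.hinge, upper.hinge, range=1) {
    x <- x[!is.na(x)]
    if (range == 0) {
        # range=0: whiskers cover the full data range, so nothing is outlying
        return(list(whiskers=c(min(x), max(x)), out=numeric(0)))
    }
    # standard span 1.5*(upper hinge - lower hinge), scaled by range
    span <- range * 1.5 * (upper.hinge - lower.hinge)
    inside <- x >= lower.hinge - span & x <= upper.hinge + span
    list(whiskers=c(min(x[inside]), max(x[inside])), out=x[!inside])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
res <- whisker.ends(x, lower.hinge=3, upper.hinge=8)
res$whiskers   # 1 and 9
res$out        # 30
```

With hinges 3 and 8 the standard span is 7.5, so the value 30 lies beyond the upper limit of 15.5 and is reported as an outlier.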

If neither width nor varwidth is supplied and there are n vectors, then the x-coordinate of the center of the first box is 100(2n+1)/(3n(n+1)), the spacing between centers is 100(3n+2)/(3n(n+1)), and the width of a box is 100/(3n) * (boxwex/.5). The boxwex argument changes only the widths of the boxes, not their centers. Horizontal spacing and box widths are implementation details that are subject to change.
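
The layout formulas above can be evaluated directly. The function box.layout is a hypothetical helper for illustration only; since the layout is described as an implementation detail, the values should be treated as indicative.

```r
# Hypothetical helper: box centers and box width for n boxes,
# using the formulas given in the text
box.layout <- function(n, boxwex=0.5) {
    first <- 100 * (2 * n + 1) / (3 * n * (n + 1))
    spacing <- 100 * (3 * n + 2) / (3 * n * (n + 1))
    list(centers=first + spacing * (0:(n - 1)),
         width=100 / (3 * n) * (boxwex / 0.5))
}

lay <- box.layout(3)
lay$centers   # approximately 19.44, 50, 80.56
lay$width     # approximately 11.11
```

The centers are symmetric about 50, which is consistent with the boxes being laid out on a horizontal scale running from 0 to 100.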

Do not give argument names for ... arguments, e.g. do not use boxplot(a=a,b=b). Argument names are not used to label the plot. The function will fail if you give argument names and omit an x argument.

To specify plot labels, use the names argument or give a named list, e.g. boxplot(list(a=a,b=b)). However, the name for a list with one component is ignored (this issue is subject to change).
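
The two labeling approaches can be compared with a short sketch. The vectors a and b and the labels "first" and "second" are hypothetical.

```r
# Hypothetical samples
a <- c(2.1, 3.4, 1.9, 5.6, 4.2)
b <- c(7.3, 6.8, 9.1, 8.4)

# Labels taken from the component names of a list
groups <- list(first=a, second=b)
boxplot(groups)

# The same labels given explicitly through the names argument
boxplot(a, b, names=c("first", "second"))
```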

The boxplot function uses hinges, as originally defined by Tukey, for the lower and upper limits of the box. The hinges are the median value of each half of the data where the overall median defines the halves. Hinges are similar to quartiles. The main difference between the two is that the depth (distance from the lower and upper limits of the data) of the hinges is calculated from the depth of the median. Hinges often lie slightly closer to the median than do the quartiles. The difference between hinges and quartiles is usually quite small. If you are interested in quantiles, you should use the quantile or summary.default functions instead of the stats component returned by boxplot.
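
A minimal sketch of the hinge calculation follows. The function hinges is a hypothetical helper, and it assumes Tukey's usual depth rule (median depth (n+1)/2, hinge depth (floor(median depth)+1)/2); whether boxplot computes its hinges in exactly this way is an assumption.

```r
# Hypothetical helper: Tukey hinges computed from the depth of the median
hinges <- function(x) {
    x <- sort(x[!is.na(x)])
    n <- length(x)
    d.med <- (n + 1) / 2                   # depth of the median
    d.hinge <- (floor(d.med) + 1) / 2      # depth of the hinges
    # a fractional depth averages the two neighboring order statistics
    lo <- (x[floor(d.hinge)] + x[ceiling(d.hinge)]) / 2
    hi <- (x[n + 1 - floor(d.hinge)] + x[n + 1 - ceiling(d.hinge)]) / 2
    c(lower=lo, upper=hi)
}

x <- c(3, 7, 1, 9, 5, 2, 8, 4, 6, 10)
h <- hinges(x)
h                             # lower 3, upper 8
quantile(x, c(0.25, 0.75))    # quartiles, for comparison
```

Here the two halves of the sorted data are 1 to 5 and 6 to 10, and their medians, 3 and 8, are the hinges.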

BACKGROUND:

Boxplots have proven to be quite a good exploratory tool, especially when several boxplots are placed side by side for comparison. The most striking visual feature is the box, which shows the limits of the middle half of the data (the line inside the box represents the median). Extreme points are also highlighted. Boxplots show not only the location and spread of the data but also its skewness.

REFERENCES:

Hoaglin, D. C., Mosteller, F., and Tukey, J. W., editors (1983). Understanding Robust and Exploratory Data Analysis. New York: Wiley.

McGill, R., Tukey, J. W., and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32, 12-16.

Tukey, J. W. (1990). Data-based graphics: visual display in the decades to come. Statistical Science, 5, 327-339.

Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury.

EXAMPLES:

boxplot(lottery.payoff, lottery2.payoff, lottery3.payoff) 
boxplot(split(fuel.frame$Fuel, fuel.frame$Type), style.bxp="att") 
boxplot( 
      split(lottery.payoff, lottery.number%/%100), 
      main="NJ Pick-it Lottery (5/22/75-3/16/76)", 
      sub="Leading Digit of Winning Numbers", 
      ylab="Payoff")