See bwplot
for the Trellis graphics version of boxplots.
boxplot(x, ..., range=1.0, width=<<see below>>, varwidth=F, names=<<see below>>, plot=T, notch=F, style.bxp=list(), boxwex=.5, boxcol=3, medchar=F, medpch=NA, medline=T, medlwd=5, medcol=0, confint=F, confcol=2, confangle=45, confdensity=25, confnotch=F, whisklty=2, staplelty=1, staplewex=1, staplehex=1, outchar=F, outpch=NA, outline=T, outwex=1)
split
).
Missing values (NA) are allowed.
name=value
form, and the names cannot be abbreviated.
varwidth
argument. The default is that all widths are the same.
TRUE
, box widths are proportional to
the square root of the number of observations for the box.
This is ignored if
width
is specified.
names
attributes of the first list of data.
TRUE
, the boxplot is produced;
otherwise, the calculated summaries of the arguments are
invisibly returned.
TRUE
, notched boxes are drawn.
Notches show approximate 95% confidence limits for the median.
(NOTE: The
notch
parameter is provided
primarily for backward compatibility.
See the
confint
,
confnotch
,
confcol
,
confangle
and
confdensity
parameters below for more versatile control of the displaying of confidence
intervals.)
bxp.
"
to get the name of a dataset which is a list.
Component names of this list should match the names of the parameters below;
the component values serve as the defaults for the corresponding parameters
(i.e., other arguments supplied to the function
override the
style.bxp
component values).
Standard
style.bxp
option values
include
"splus"
(new S-PLUS style),
"att"
(new AT&T style)
and
"old"
.
0.5
,
but the
"att"
and
"old"
styles
set this to
1
.
col
is not specified,
boxcol
will be used (see below).
The
col
argument can be specified
as a single color value or a vector of color values.
For example, col="blue" fills the box with the specified color,
and col=1:3 can be used to fill multiple boxes using a vector of three colors.
0
can be used to designate
filling with the background color.
A specification of
boxcol=-1
is used to
designate "no fill" at all.
The default is to fill with color
3
,
but the
"att"
and
"old"
styles set this for no filling.
This argument supports using colors by name as well as number.
TRUE
if a
medpch
parameter is supplied.
The default is
FALSE
,
but the
"att"
style implicitly sets
the default to
TRUE
(by specifying
medpch
).
medchar
parameter to be
TRUE
.
The special value,
NA
,
can be used to indicate the current plotting character
(
par("pch")
).
The default is
NA
,
but the
"att"
style sets the default
to
16
(filled octagon).
TRUE
if the
medlwd
parameter is supplied.
The default is
TRUE
,
but the
"att"
style sets it
to
FALSE
.
medline
parameter to
TRUE
.
The special value,
NA
, is used to
indicate the current line width (
par("lwd")
).
The default is
5
,
and the
"old"
and
"att"
styles set it to
5
.
NA
,
indicates the current plotting color (
par("col")
).
The default is
0
(the background color),
but the
"old"
and
"att"
styles set the
default to
NA
.
This argument supports using colors by name as well as number.
TRUE
, confidence intervals are shown.
Notches show approximate 95% confidence limits for the median.
How the confidence intervals are displayed is determined by the
confnotch
,
confcol
,
confangle
and
confdensity
parameters.
The 95% level for the confidence interval cannot be changed.
TRUE
, confidence intervals are notched.
The default is
FALSE
,
but the
"old"
and
"att"
styles set this parameter
to
TRUE
.
2
,
but the
"old"
and
"att"
styles set it to -1 (no filling).
This argument supports using colors by name as well as number.
confdensity
is supplied
and
confangle
is not,
confangle
defaults
to
45
.
confangle
is
supplied and
confdensity
is not,
confdensity
defaults to
25
.
NA
,
indicates the current line type (
par("lty")
).
The default is
2
(dotted line),
but the
"old"
and
"att"
styles set it
to
4
(dashed line).
NA
,
indicates the current line type (
par("lty")
).
The default is
1
(solid line),
but the
"att"
style sets the default
to
4
(dashed line).
1
,
but the
"old"
style sets the default
to
0.125
.
1
but the
"old"
style sets the default
to
0
.
TRUE
if an
outpch
parameter is supplied.
The default is
FALSE
,
but the
"old"
style sets it
to
TRUE
,
and the
"att"
style implicitly sets it
to
TRUE
(by setting
outpch
).
outchar
parameter to be
TRUE
.
The special value,
NA
,
indicates the current plotting character
(
par("pch")
).
The default is
NA
,
but the
"att"
style sets the default
to
1
(an octagon).
TRUE
if the
outwex
parameter is supplied.
The default is
TRUE
,
but the
"old"
and
"att"
styles
set it to
FALSE
.
1
.
Graphical parameters may also be supplied as arguments to
this function (see
).
In addition, the high-level graphics arguments described under
and the arguments to
may be supplied to this function.
However,
boxplot
always uses linear axes:
the
log
and
[xy]axt
arguments are ignored.
You can apply any transformation to your data before
calling
boxplot
with
axes=F
and use the
axis
function to add an axis
labeled to reflect the transformation.
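As a minimal sketch of the approach just described, the following draws a boxplot of log-transformed data without axes and then labels the y-axis in the original units. The data and the names x, ticks and tick.labels are made up for illustration, and the tick positions are chosen by hand.

```r
# Made-up positive data spanning several orders of magnitude
x <- c(1, 3, 10, 32, 100, 320, 1000)

# Transform first, then plot with the default axes suppressed
boxplot(log10(x), axes = F)

# Label the ticks with the untransformed values they stand for
ticks <- 0:3
tick.labels <- as.character(10^ticks)
axis(2, at = ticks, labels = tick.labels)
box()
```

The axis positions stay on the transformed (log10) scale; only the labels are changed.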
plot
is
TRUE
,
the function
bxp
is invoked with these components,
plus optional
width
,
varwidth
,
notch
,
and
style
(and associated parameters),
to produce the plot.
Note that
bxp
returns a vector
of x-coordinates of box centers.
If
plot
is
FALSE
,
an invisible list with the components listed below.
These statistics are calculated excluding
NA
and
Inf
values.
5
by the number of boxes)
giving the upper extreme (excluding outliers),
upper hinge, median, lower hinge, and lower extreme (excluding outliers)
for each box.
By default, anything farther than 1.5 times the (upper hinge - lower hinge)
from the nearer hinge is considered an outlier.
See the Details section below
and the
range
argument above.
NA
or
Inf
.
2
by the number of boxes) giving
approximate 95% confidence limits for the median.
The limits are functions of the quartiles, so a few outliers have
little effect on them.
out
belongs.
names
above).
plot
is
TRUE
,
a plot is created on the current graphics device.
Outlier lines and points are always drawn in color 1 of the palette.
A boxplot displays the center half of the data (the box) with
the median marked.
The top and bottom of the box are defined by the hinges (see below).
By default, whiskers are drawn
to the nearest value not beyond a standard span from the hinges.
Points beyond the end of the whiskers (outliers) are drawn individually.
Giving
range=0
forces whiskers
to the full data range.
Any positive value of
range
multiplies
the standard span by this amount.
The standard span is 1.5*(upper hinge - lower hinge).
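The whisker rule above can be sketched by hand for a small vector. This is an illustration under stated assumptions, not the code used by boxplot: the data are made up, range.mult is a hypothetical name standing in for a positive value of the range argument, and the special case range=0 is not handled.

```r
# Made-up data with one extreme value
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)
range.mult <- 1   # hypothetical stand-in for a positive range= value

# Hinges as medians of each half of the sorted data
# (the middle value belongs to both halves when n is odd)
xs <- sort(x)
n <- length(xs)
half <- floor((n + 1) / 2)
lower.hinge <- median(xs[1:half])
upper.hinge <- median(xs[(n - half + 1):n])

# Standard span, scaled by the range multiplier
span <- 1.5 * (upper.hinge - lower.hinge) * range.mult

# Whiskers reach the most extreme values still within one span of the hinges
inside <- xs >= lower.hinge - span & xs <= upper.hinge + span
whisker.low <- min(xs[inside])
whisker.high <- max(xs[inside])
outliers <- xs[!inside]
```

Here the hinges are 4 and 8, so the span is 6, the whiskers end at 2 and 9, and the value 30 is drawn individually as an outlier.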
If neither
width
nor
varwidth
is supplied
and there are
n
vectors,
then the x-coordinate of the center of the first box is
100(2n+1)/(3n(n+1))
,
the spacing between centers is
100(3n+2)/(3n(n+1))
,
and the width of a box is
100/(3n) * (boxwex/.5)
.
boxwex
does not change the
centers of boxes, only their width.
Horizontal spacing and box widths
are an implementation detail that is subject to change.
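As a worked instance of the formulas above, the following evaluates them for three boxes with the default boxwex. The variable names are chosen for illustration, and since the layout is described as an implementation detail the numbers should not be relied on.

```r
n <- 3          # number of boxes
boxwex <- 0.5   # default box width expansion

first.center <- 100 * (2 * n + 1) / (3 * n * (n + 1))
spacing <- 100 * (3 * n + 2) / (3 * n * (n + 1))
box.width <- 100 / (3 * n) * (boxwex / 0.5)

# x-coordinates of the box centers
centers <- first.center + spacing * (0:(n - 1))
round(centers, 2)   # 19.44 50.00 80.56
```

The centers are symmetric about 50, and doubling boxwex to 1 would double box.width while leaving centers unchanged.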
Do not give argument names for
...
arguments,
e.g. do not use
boxplot(a=a,b=b)
.
Argument names are not used to label the plot.
The function will fail if you give argument names
and omit an
x
argument.
To specify plot labels, use the
names
argument or give a named list, e.g.
boxplot(list(a=a,b=b))
.
However, the name for a list with one component is ignored (this issue is
subject to change).
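A short sketch of the two labeling approaches described above, using made-up vectors a and b:

```r
# Made-up data for two groups
a <- c(5.1, 4.9, 6.2, 5.8, 5.5)
b <- c(7.0, 6.4, 6.9, 7.3, 6.1)

# Unnamed arguments, with labels given through names
box.labels <- c("a", "b")
boxplot(a, b, names = box.labels)

# A named list; the component names supply the labels
groups <- list(a = a, b = b)
boxplot(groups)
```

Both calls are intended to produce the same two labeled boxes.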
The
boxplot
function uses hinges,
as originally defined by Tukey,
for the lower and upper limits of the box.
The hinges are the median value of each half of the data where the
overall median defines the halves.
Hinges are similar to quartiles.
The main difference between the two is that the depth
(distance from the lower and upper limits of the data)
of the hinges is calculated from the depth of the median.
Hinges often lie slightly closer to the median than do the quartiles.
The difference between hinges and quartiles is usually quite small.
If you are interested in quantiles, you should use the
quantile
or
summary.default
functions instead of the
stats
component returned by
boxplot.
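The depth-based definition can be sketched as follows. This is an illustrative helper, not part of boxplot: the name hinges is hypothetical, and it uses the usual Tukey rule that the hinge depth is (floor(median depth) + 1)/2, which is assumed to match what boxplot computes.

```r
# Hypothetical helper: Tukey hinges computed from depths
hinges <- function(x) {
    xs <- sort(x[!is.na(x)])
    n <- length(xs)
    med.depth <- (n + 1) / 2                   # depth of the median
    hinge.depth <- (floor(med.depth) + 1) / 2  # depth of each hinge
    lo <- floor(hinge.depth)
    hi <- ceiling(hinge.depth)
    # Average the two neighboring order statistics when the depth is fractional
    lower <- (xs[lo] + xs[hi]) / 2
    upper <- (xs[n + 1 - lo] + xs[n + 1 - hi]) / 2
    c(lower = lower, upper = upper)
}

x <- 1:10
hinges(x)                     # hinges are 3 and 8 for this vector
quantile(x, c(0.25, 0.75))    # quartiles, for comparison
```

How far the two results differ depends on the sample size and on the interpolation rule that quantile uses.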
Boxplots have proven to be a good exploratory tool, especially when several boxplots are placed side by side for comparison. The most striking visual feature is the box, which shows the limits of the middle half of the data (the line inside the box represents the median). Extreme points are also highlighted. Boxplots show not only the location and spread of the data but also indicate skewness.
Hoaglin, D. C., Mosteller, F., and Tukey, J. W., editors (1983). Understanding Robust and Exploratory Data Analysis. New York: Wiley.
McGill, R., Tukey, J. W., and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32, 12-16.
Tukey, J. W. (1990). Data-based graphics: visual display in the decades to come. Statistical Science, 5, 327-339.
Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury.
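# Side-by-side boxplots of three vectors
boxplot(lottery.payoff, lottery2.payoff, lottery3.payoff)
# One box per car type, drawn in the "att" style
boxplot(split(fuel.frame$Fuel, fuel.frame$Type), style.bxp="att")
# Payoffs grouped by the leading digit of the winning number
boxplot(split(lottery.payoff, lottery.number%/%100),
    main="NJ Pick-it Lottery (5/22/75-3/16/76)",
    sub="Leading Digit of Winning Numbers", ylab="Payoff")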