bdVector of the desired quantiles of the data.
quantile(x, probs = 0:4/4, na.rm = F, ...)
quantile.default(x, probs = 0:4/4, na.rm = F, alpha = 1, rule = 1, weights = NULL, freq = NULL)
bdVector of data. Missing values are not allowed unless na.rm=TRUE.
bdVector of desired probability levels. Values must be between 0 and 1 inclusive. The default produces a "five number summary": the minimum, lower quartile, median, upper quartile, and maximum of x.
If rule is 1, NAs are supplied for any such points. If rule is 2, the extreme values of x are used. If rule is 3, linear extrapolation is used. This option is irrelevant if alpha=1.
x, or NULL if no weights. Quantiles are calculated for the weighted distribution with probabilities proportional to weights on the values of x.
x, giving frequencies. If supplied then results are equivalent to supplying rep(x, freq) instead of x. The effect is similar to the weights argument, except that values are actually repeated so that the quantiles returned may be exactly equal to a repeated value of x rather than interpolated between adjacent values.
bdVector of empirical quantiles corresponding to the probs levels in the sorted x data.
The algorithm linearly interpolates between order statistics of x, assuming that the ith order statistic is the (i-alpha)/(n+1-2*alpha) quantile if no weights are present, where n=length(x). The algorithm uses partial sorting, so it can quickly find a few quantiles even of large datasets.
approx((1:n - alpha) / (n + 1 - 2 * alpha), x, probs, rule=rule)
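The interpolation described above can be sketched in a few lines of plain S. This is a minimal illustration, not the library's implementation: the name quantile_sketch is hypothetical, the data are sorted fully rather than partially, weights and frequencies are ignored, and only rule=1 and rule=2 are exercised because support for rule=3 in approx() differs between S dialects.

```r
# Hypothetical helper (not part of the library): unweighted quantiles by
# linear interpolation between order statistics.
quantile_sketch <- function(x, probs, alpha = 1, rule = 1) {
  n <- length(x)
  # Position of the i-th order statistic on the probability scale.
  pos <- (1:n - alpha) / (n + 1 - 2 * alpha)
  # approx() interpolates linearly; rule = 1 gives NA outside range(pos),
  # rule = 2 gives the extreme values of x.
  approx(pos, sort(x), xout = probs, rule = rule)$y
}

x <- c(7, 1, 5, 3, 9)
quantile_sketch(x, c(0, 0.25, 0.5, 0.75, 1))          # alpha = 1: positions span 0 to 1
quantile_sketch(x, c(0.1, 0.5), alpha = 0)            # 0.1 lies below 1/(n+1), so NA
quantile_sketch(x, c(0.1, 0.5), alpha = 0, rule = 2)  # minimum of x instead of NA
```

With alpha=1 the positions run from 0 to 1, so every value of probs falls inside the interpolation range and rule has no effect.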
If x contains randomly-generated values from a distribution, then alpha=1 gives quantiles which are biased (they tend to be too narrow), alpha=1/3 gives approximately median-unbiased estimates of the quantiles of the distribution, and alpha=0 matches the correct probabilities for a new observation "X" from that distribution, i.e. prob(X < quantile(x, p, alpha=0)) = p (the relationship is exact if p=k/(n+1) for some integer k and the distribution is continuous, and approximate otherwise).
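The exactness claim for alpha=0 follows from a standard fact about order statistics. The short derivation below is a sketch that assumes a continuous distribution function F and independent observations; the symbols F, X_(k), and U_(k) are introduced here for illustration only.

```latex
% X_{(k)} is the k-th order statistic of x_1, \dots, x_n, and X is a new,
% independent observation from the same continuous distribution F.
\begin{align*}
U_{(k)} &= F\bigl(X_{(k)}\bigr) \sim \mathrm{Beta}(k,\; n + 1 - k), \\
\Pr\bigl(X < X_{(k)}\bigr) &= \mathrm{E}\bigl[F\bigl(X_{(k)}\bigr)\bigr]
  = \mathrm{E}\bigl[U_{(k)}\bigr] = \frac{k}{n + 1}.
\end{align*}
% With alpha = 0 the k-th order statistic is assigned probability
% (k - 0)/(n + 1 - 0) = k/(n + 1), so the p = k/(n+1) quantile is exactly
% X_{(k)} and \Pr(X < \text{that quantile}) = p.
```

For other values of p the quantile is interpolated between two order statistics, which is why the relationship holds only approximately.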
If weights are present, then alpha=.5 corresponds to interpolating between the midpoints of segments of the step function with step widths proportional to weights. For other values of alpha the horizontal positions of those midpoints are transformed linearly; for alpha=1 the horizontal positions of the two extreme midpoints are at 0 and 1. If weights are present and there are ties in x, then the corresponding weights are averaged, so that results are independent of the order of observations.
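The alpha=.5 case with weights can be illustrated directly from the description of the step-function midpoints. The following is a rough sketch under stated assumptions, not the library's code: weighted_quantile_sketch is a hypothetical name, x is assumed to have no ties (so no weight averaging is needed), and rule=2 is used so that probabilities outside the midpoint range return the extreme values.

```r
# Hypothetical helper: weighted quantiles for alpha = .5, assuming no ties in x.
weighted_quantile_sketch <- function(x, probs, weights, rule = 2) {
  ord <- order(x)
  # Normalized weights, arranged in the order of the sorted data.
  w <- weights[ord] / sum(weights)
  # Midpoints of the steps of the weighted step function.
  mid <- cumsum(w) - w / 2
  # Interpolate linearly between (midpoint, sorted value) pairs.
  approx(mid, x[ord], xout = probs, rule = rule)$y
}

x <- c(30, 10, 20)
# Sorted values 10, 20, 30 carry weights 1, 2, 1; midpoints are .125, .5, .875.
weighted_quantile_sketch(x, c(0.3125, 0.5), weights = c(1, 1, 2))  # 15 20
```

The midpoints here are the same quantities plotted by points(cumsum(w)-w/2, sort(x)) in the examples for this function.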
If both weights and frequencies are supplied, then x and weights are replicated using the frequencies. This may use a lot of memory.
Hyndman, R. J. and Fan, Y. (1996), "Sample Quantiles in Statistical Packages," The American Statistician, 50, 361-364.
quantile(car.miles)  # five number summary
quantile(testscores[,1], c(.33,.67))  # 33% and 67% quantiles of
                                      # data from testscores
diff(quantile(testscores[,1], c(.25, .75)))  # interquartile range
# create function iqr
iqr <- function (x) diff(quantile(x, c(.25, .75)))
iqr(car.miles)  # returns 23

set.seed(2); x <- runif(9)
probs <- seq(0, 1, length=101)
plot(probs, quantile(x, probs, alpha=1), type="l", ylim=c(-.14,1))
lines(probs, quantile(x, probs, alpha=.5), col=2)
lines(probs, quantile(x, probs, alpha=0), col=3)
lines(probs, quantile(x, probs, alpha=0, rule=3), col=3, lty=3)

# weighted distributions
plot(probs, quantile(sort(x), probs, weights=1:9, alpha=.5), type="l", ylim=0:1)
w <- 1:9 / sum(1:9)
points(cumsum(w)-w/2, sort(x))
lines(cumsum(w), sort(x), type="S", col=2)
lines(probs, quantile(sort(x), probs, weights=1:9, alpha=1), col=3)

# Frequencies
quantile(rep(x, 1:9))    # For reference
quantile(x, freq = 1:9)  # This should match the previous