Robust Estimates of Scale

DESCRIPTION:

Returns a robust scale estimate of the data. By default, the median is taken as the center of the data and the estimate is scaled to be consistent with the standard deviation of the Gaussian model.

USAGE:

mad(y, center=median(y), constant=1.4826, na.rm=F, low=F) 
scale.tau(y, center=median(y), weights=<<see below>>, 
  init.scale=<<see below>>, tuning=1.95, na.rm=F) 
scale.a(y, center=median(y), weights=<<see below>>, 
  init.scale=<<see below>>, tuning=3.85, na.rm=F) 

REQUIRED ARGUMENTS:

y
vector of numeric data, or bdNumeric. Missing values (NA) are allowed.

OPTIONAL ARGUMENTS:

center
location parameter to be subtracted from each element of y before computing the scale estimate.
weights
vector or bdVector the same length as y of observation weights. The default is to give equal weight to all observations.
init.scale
value used as the initial scale estimate. The default is the Gaussian-consistent MAD computed with low=TRUE.
constant
number that multiplies the median of the absolute values. The default value makes the estimate consistent with the standard deviation of the Gaussian model.
na.rm
logical flag; indicates whether missing values should be removed before computations.
low
logical flag; if TRUE, the low median is used; if FALSE, the central median is used. There is no difference for an odd number of data points.
tuning
tuning parameter for the scale.a and scale.tau estimates. Larger values make the estimates more efficient at the Gaussian distribution, but more susceptible to bias.
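
As an illustration of the constant and low arguments described above, the following Python sketch (standard library only; it is not the S-PLUS code) shows where the default constant of 1.4826 comes from and how the low median differs from the central median. The variable names are chosen for this example only.

```python
from statistics import NormalDist, median, median_low

# For Gaussian data, median(|y - center|) estimates sigma * Phi^{-1}(0.75),
# so dividing by that quantile rescales the MAD to estimate sigma.
gaussian_constant = 1.0 / NormalDist().inv_cdf(0.75)
print(round(gaussian_constant, 4))  # 1.4826

# Even sample size: the central median averages the two middle values,
# while the low median takes the smaller of the two.
deviations = [0.5, 1.0, 2.0, 4.0]
central = median(deviations)   # 1.5
low = median_low(deviations)   # 1.0

# Odd sample size: both definitions give the same value.
odd = [0.5, 1.0, 2.0]
same = median(odd) == median_low(odd)  # True
```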

VALUE:

number that is a robust estimate of scale. With the default settings, the estimate is consistent for the standard deviation of Gaussian data. The mad function returns constant * median(abs(y - center)); scale.tau returns a Huber tau-estimate of scale, and scale.a returns a bisquare A-estimate of scale. Both of the latter are 80% efficient at the Gaussian distribution with the default tuning parameters. The MAD is about 37% efficient.
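
The formula for mad given above can be sketched in a few lines of Python. This is a minimal illustration, not the S-PLUS source: the function name mad_sketch is hypothetical, and it assumes that low affects only the median of the absolute deviations, not the default center.

```python
from statistics import median, median_low

def mad_sketch(y, center=None, constant=1.4826, low=False):
    """Return constant * median(abs(y - center))."""
    values = list(y)
    if center is None:
        center = median(values)  # default center is the median of the data
    deviations = [abs(v - center) for v in values]
    # Assumption: `low` selects the low median of the deviations only.
    middle = median_low(deviations) if low else median(deviations)
    return constant * middle

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 30.0]
print(mad_sketch(data, constant=1))  # 2.0

even = [1.0, 2.0, 3.0, 10.0]
print(mad_sketch(even, constant=1))            # 1.0
print(mad_sketch(even, constant=1, low=True))  # 0.5
```

The single large value 30.0 in data has no effect on the result beyond its rank, which is the property that makes the MAD robust.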

DETAILS:

If na.rm is FALSE, then missing values in the data cause the final result to be NA. Missing values are removed before computations are performed when na.rm is TRUE.
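
The missing-value rule above can be illustrated with a small Python sketch. It assumes that NaN stands in for NA and that the default center is computed after missing values are dropped; the function name mad_with_na is hypothetical and the code is not taken from S-PLUS.

```python
import math
from statistics import median

def mad_with_na(y, constant=1.4826, na_rm=False):
    """MAD that returns NaN for incomplete data unless na_rm is True."""
    values = list(y)
    if any(math.isnan(v) for v in values):
        if not na_rm:
            return math.nan  # missing values propagate to the result
        values = [v for v in values if not math.isnan(v)]
    center = median(values)
    return constant * median([abs(v - center) for v in values])

data = [1.0, 2.0, math.nan, 4.0, 7.0]
kept = mad_with_na(data)                 # NaN: the missing value is kept
removed = mad_with_na(data, na_rm=True)  # 1.4826 * 1.5, from 1, 2, 4, 7
```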

The MAD scale estimate has a 50% breakdown point. With "contaminated" data, the MAD generally has small bias when compared to other scale estimators. Tau-estimates and A-estimates also have a 50% breakdown point, but are more efficient for Gaussian data. The A-estimate that scale.a computes is redescending, so it is inappropriate if you require a scale estimate that always increases as the size of a data point increases. However, the A-estimate is very good if all contamination is far from the "good" data.
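
The redescending behavior described above can be seen in a small Python sketch. It uses the textbook biweight A-estimate in the form given by Hoaglin, Mosteller and Tukey, with a cutoff c applied to the raw MAD. This is an assumption made for illustration: it is not claimed to match the parameterization of the tuning argument or the exact computation in scale.a, and the function names are hypothetical.

```python
import math
from statistics import median, stdev

def mad(y, constant=1.4826):
    center = median(y)
    return constant * median([abs(v - center) for v in y])

def biweight_a_estimate(y, c=9.0):
    """Textbook biweight A-estimate of scale (assumed form, see lead-in)."""
    n = len(y)
    center = median(y)
    raw_mad = mad(y, constant=1.0)
    numerator = 0.0
    denominator = 0.0
    for v in y:
        u = (v - center) / (c * raw_mad)
        if abs(u) < 1.0:  # points beyond c * MAD receive zero weight
            w = 1.0 - u * u
            numerator += (v - center) ** 2 * w ** 4
            denominator += w * (1.0 - 5.0 * u * u)
    return math.sqrt(n * numerator) / abs(denominator)

good = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.3]
near = good + [50.0]    # one outlier
far = good + [5000.0]   # the same outlier moved much farther away

# The standard deviation grows with the outlier; the robust estimates do not.
print(stdev(near) < stdev(far))                                # True
print(mad(near) == mad(far))                                   # True
print(biweight_a_estimate(near) == biweight_a_estimate(far))   # True
```

Because the outlier lies beyond the cutoff in both samples, it is ignored entirely, so the A-estimate does not increase as the point moves away. That is the behavior to keep in mind when a monotone scale estimate is required.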

Burns and Martin (1992) compare tau-estimates and A-estimates. A-estimates are also discussed in Hoaglin, Mosteller and Tukey (1983). Code for another class of scale estimate can be found in Croux and Rousseeuw (1992).

REFERENCES:

Burns, P.J. and Martin, R.D. (1992). One-sample robust scale estimation in contamination models (submitted).

Croux, C. and Rousseeuw, P.J. (1992). Time-efficient algorithms for two highly robust estimators of scale. COMPSTAT, Proceedings of the 10th Symposium on Computational Statistics. Vienna: Physica, pp. 411-428.

Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.

Hoaglin, D.C., Mosteller, F. and Tukey, J.W., editors (1983). Understanding Robust and Exploratory Data Analysis. New York: Wiley.

EXAMPLES:

mad(rnorm(200))     # approximately equal to 1
mad(corn.yield, constant = 1) 
scale.a(lottery.payoff)
# Tau-estimate of scale using a robust 
# M-estimate for the center of the data.
scale.tau(lottery.payoff, center = location.m(lottery.payoff))