RCp is a robust version of Mallows' Cp, based on weighted least squares,
that allows the selection of subset models that fit the bulk of the data
in the presence of outliers or departures from the assumption of normally
distributed errors.
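For comparison, the classical (non-robust) Mallows' Cp, which RCp robustifies, is Cp = RSS_p / sigma^2 - n + 2p. Here RSS_p is the residual sum of squares of a submodel with p parameters (including the constant term), n is the number of observations and sigma^2 is an estimate of the error variance, usually taken from the full model. The following is a minimal Python/NumPy sketch of that classical formula only. The function name mallows_cp is hypothetical and is not part of RCp.

```python
import numpy as np

def mallows_cp(X, y, cols, sigma2):
    """Classical Mallows' Cp for the submodel using columns `cols` of X.

    A constant term is always added, so p = len(cols) + 1.
    Cp = RSS_p / sigma2 - n + 2p.
    """
    n = len(y)
    # Design matrix: intercept column followed by the selected variables
    Xp = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta = np.linalg.lstsq(Xp, y, rcond=None)[0]
    rss = float(np.sum((y - Xp @ beta) ** 2))
    p = Xp.shape[1]
    return rss / sigma2 - n + 2 * p
```

A submodel with little bias has Cp close to p, which is the behaviour RCp carries over to the weighted, outlier-resistant setting.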
REQUIRED ARGUMENTS:
xfile
a matrix of explanatory variables. Each column of xfile is a variable and
each row is an observation.
yfile
a vector of the dependent variable with length equal to the number of rows
of xfile.
OPTIONAL ARGUMENTS:
k
the bending constant of the Huber psi function applied to the rescaled
residuals in the weight function. The default is 1.345; the other allowed
values are 1.5, 2 and 100. Using k = 100 sets the component of the weight
function involving the residuals equal to one.
b
the product of b and the mean or median of f(x) gives the bending constant
used for the weight component v1(x) = psi(f(x))/f(x). The default value of
b is 1.5.
MEDIAN
if TRUE, the median will be used to calculate the bending constant for v1,
and if FALSE the mean will be used. The default value is TRUE.
iter
the maximum number of weighted least squares iterations to perform. The
default value is 20.
nbest
the number of 'best' subsets to be found for each subset size.
conv
a numeric value. If the maximum difference between the weights on
successive iterations is less than this, the iterations stop. The default
value is 0.001.
tol
the tolerance allowed for the Cholesky decomposition used to solve for the
matrix B if v1 = "Bx" is chosen. The default is 1e-06.
iter1
the maximum number of iterations performed to find the matrix B if
v1 = "Bx" is chosen. The default is 25.
P2
if TRUE, pivoting is used when computing the Cholesky decomposition if
v1 = "Bx" is chosen.
s
a logical matrix with the same number of columns as xfile. Each row of s
represents one subset selection, where T indicates that the corresponding
variable is included and F that it is excluded. The default is the matrix
representing all possible selections.
v1
specifies the function f(x) of the explanatory variables used in the weight
function. The options are:
v1 = "huber" (the default): f(x) = 1, which gives Huber-type weights;
v1 = "h": f(x) = h;
v1 = "sqrth": f(x) = sqrt(h);
v1 = "h/(1-h)": f(x) = h/(1-h);
v1 = "Bx": f(x) = ||Bx|| (optimal B-robust estimators).
Here h is the leverage, that is, the i-th diagonal element of the hat
matrix X %*% solve(t(X) %*% X) %*% t(X).
DIGITS
the number of decimal places used in the output. The default value is 2.
rescale
if TRUE, the scale factor for the residuals is recalculated after each
iteration. The default is TRUE.
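The arguments k, b, MEDIAN, v1, iter, conv and rescale can be tied together in a Mallows-type iteratively reweighted least squares loop. The Python/NumPy sketch below is illustrative only and is not the RCp source. It assumes that the final weight is the product v1(x) * v2(r), that the residual scale is the normalized median absolute deviation (this page does not say which scale estimate RCp uses), and that X already contains the constant column; the "Bx" option is omitted. The function names f_of_x and robust_wls are hypothetical, and max_iter plays the role of iter.

```python
import numpy as np

def f_of_x(X, v1="huber"):
    """f(x) for the v1 options "huber", "h", "sqrth" and "h/(1-h)".

    h is the diagonal of the hat matrix X (X'X)^{-1} X'; "Bx" is not sketched.
    """
    # h_i = sum_j X[i, j] * [(X'X)^{-1} X']_{j, i}, without forming the inverse
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    if v1 == "huber":
        return np.ones_like(h)
    if v1 == "h":
        return h
    if v1 == "sqrth":
        return np.sqrt(h)
    if v1 == "h/(1-h)":
        return h / (1 - h)
    raise ValueError("unsupported v1 option: " + v1)

def robust_wls(X, y, k=1.345, b=1.5, median=True, v1="huber",
               max_iter=20, conv=0.001, rescale=True):
    """Mallows-type IRLS sketch. Returns (beta, v1 weights, v2 weights, w)."""
    f = f_of_x(X, v1)
    # Bending constant for the x component: b times the median (or mean) of f
    c = b * (np.median(f) if median else np.mean(f))
    v1w = c / np.maximum(f, c)       # psi_c(f)/f, since f >= 0 for these options
    w = v1w.copy()
    v2w = np.ones(len(y))
    scale = None
    for _iteration in range(max_iter):
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        r = y - X @ beta
        if scale is None or rescale:
            # Assumed scale estimate: normalized MAD, guarded against an exact fit
            mad = np.median(np.abs(r - np.median(r)))
            scale = max(1.4826 * mad, 1e-12)
        u = r / scale
        v2w = k / np.maximum(np.abs(u), k)   # psi_k(u)/u, equal to 1 when |u| <= k
        w_new = v1w * v2w
        converged = np.max(np.abs(w_new - w)) < conv
        w = w_new
        if converged:
            break
    return beta, v1w, v2w, w
```

With v1 = "huber" and b >= 1, c = b and every v1 weight equals one, so the loop reduces to plain Huber IRLS; this matches the note that the v1 component is then a matrix of 1's. Passing rescale=False keeps the scale fixed at its value from the first fit.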
VALUE:
a list with class "RCp" with the following components:
RCp
the robust Cp value for the nbest subset selections of each size.
size
the number of variables including the constant term in each selection.
label
a character vector, each element giving the names of the variables in
the subset.
Vp
the value of Vp for each selection.
Up
the value of Up for each selection.
res
a matrix containing the residuals from the final fit for each selection,
with each row representing a submodel and each column an observation.
coef
a matrix containing the coefficients from the final fit for each
selection, with each row representing a submodel and each column a variable.
w
a matrix containing the final weights for each selection, with each row
representing a submodel and each column an observation.
v1
a matrix containing the component of the weight function involving the
x's, with each row representing a submodel and each column an observation.
Note that when v1 = "huber" this is simply a matrix of 1's.
v2
a matrix containing the component of the weight function involving the
residuals, with each row representing a submodel and each column an
observation.
method
the type of weight function used; either "mallows" or "huber".
which
a logical matrix with each row representing a selected submodel and each
column an explanatory variable.
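The components which, size and label are all determined by the logical subset rows supplied through s. The small Python sketch below shows that relationship. It assumes that the default s covers every non-empty subset (whether the constant-only model is included is not stated on this page) and that a label joins the selected variable names with spaces; the function names all_subsets and size_and_label are hypothetical.

```python
from itertools import product

def all_subsets(names):
    """Logical rows (like s or which) for every non-empty subset of variables."""
    return [list(row) for row in product([True, False], repeat=len(names))
            if any(row)]

def size_and_label(row, names):
    """Size (selected variables plus the constant term) and a label for a row."""
    chosen = [name for name, keep in zip(names, row) if keep]
    return len(chosen) + 1, " ".join(chosen)
```

For three variables this gives 2^3 - 1 = 7 rows, and a row selecting x1 and x3 has size 3 because the constant term is counted.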
REFERENCES:
Ronchetti, E. and Staudte, R.G. (1994). "A robust version of Mallows' Cp,"
Journal of the American Statistical Association, 89, 550-559.
Sommer, S. and Staudte, R.G. (1995). "Robust variable selection in
regression in the presence of outliers and leverage points," Australian
Journal of Statistics, 37, 323-336.