Projection Pursuit Regression

DESCRIPTION:

Fits an exploratory nonlinear regression model that represents y as a sum of nonparametric functions of projections of the x variables.

USAGE:

ppreg(x, y, min.term, max.term=min.term, wt=rep(1, nrow(x)),  
      rwt=rep(1, ncol(y)), xpred=NULL, optlevel=2, bass=0, span="cv") 

REQUIRED ARGUMENTS:

x
matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted. The ppreg function is not very useful if x contains only one column.
y
vector or matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
min.term
minimum number of terms to include in the model; ppreg will return complete results only for this minimum number of terms.

OPTIONAL ARGUMENTS:

max.term
maximum number of terms to choose from in the model.
wt
vector of weights for the observations. The length must be the same as the number of rows in x. Missing values are not accepted.
rwt
vector of weights for the responses. The length must be the same as the number of columns in y. Missing values are not accepted.
xpred=
vector or matrix of explanatory variables for which responses are to be estimated. If xpred is omitted, the model is fit to the original x data and the residuals are returned in ypred. Missing values are not accepted.
optlevel=
integer from 0 to 3 that determines the thoroughness of the optimization routine in ppreg. A higher number means more optimization.
bass=
super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with increasing values resulting in increased smoothing.
span=
super smoother span control (see supsmu). The default is "cv", which results in automatic span selection by local cross validation. span can also be a number in the range 0 < span <= 1.

VALUE:

a list containing the following components:
ypred
matrix of predicted values for y given the matrix xpred. If xpred was not supplied, then ypred contains the residuals from the model fit.
fl2
the sum of squared residuals divided by the total corrected sums of squares.
alpha
a min.term by ncol(x) matrix of direction vectors; alpha[m,j] contains the j-th component of the direction in the m-th term.
beta
a min.term by ncol(y) matrix of term weights; beta[m,k] contains the term weight for the m-th term and the k-th response variable.
z
a matrix of values to be plotted against zhat. z[i,m] contains the z value of the i-th observation in the m-th model term, i.e., z equals x %*% t(alpha). The columns of z have been sorted.
zhat
a matrix of function values to be plotted. zhat[i,m] is the smoothed ordinate value (phi) of the i-th observation in the m-th model term evaluated at z[i,m].
allalpha
a three-dimensional array; the [m,j,M] element contains the j-th component of the direction in the m-th model term for the solution consisting of M terms. Values are zero for M less than min.term.
allbeta
a three-dimensional array; the [m,k,M] element contains the term weight for the m-th term and the k-th response variable for the solution consisting of M terms. Values are zero for M less than min.term.
esq
vector in which esq[M] contains the fraction of unexplained variance for the solution consisting of M terms. Values are zero for M less than min.term.
esqrsp
an ncol(y) by max.term matrix containing the fraction of unexplained variance for each response. esqrsp[k,M] is the value for the k-th response variable in the solution consisting of M terms, for M ranging from min.term to max.term. Other columns are zero.
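
The relation z = x %*% t(alpha) and the effect of sorting the columns of z can be sketched without calling ppreg at all. The data and the direction matrix below are made up for illustration, and the sketch assumes that each column of z is sorted independently; the page does not spell out the exact sorting rule.

```r
# Hypothetical inputs: 5 observations, 2 predictors, 2 model terms.
# (Illustrative values only; these do not come from a ppreg fit.)
x <- matrix(c(1, 2, 3, 4, 5,
              5, 3, 1, 2, 4), nrow = 5, ncol = 2)

# Each row of alpha plays the role of one direction vector.
alpha <- rbind(c(1.0, 0.0),
               c(0.6, 0.8))

# Projections: one column per term, rows still in observation order.
proj <- x %*% t(alpha)

# Sorting each column on its own (assumed behavior) gives increasing
# abscissas for plotting, but row i no longer refers to observation i.
z.sorted <- apply(proj, 2, sort)

# The permutation that links sorted values of term 2 back to the data.
ord2 <- order(proj[, 2])
```

Here x[ord2, ] lists the observations in the same order as the second column of z.sorted, which is the bookkeeping needed to relate a plotted point back to a row of the data.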

DETAILS:

The columns of the z component of the result are sorted, so the rows of z cannot be matched with the rows of the original data.
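
For orientation, the fitted model can be written in the usual projection pursuit form. The notation is an assumption layered on this page: alpha_m is the m-th row of alpha, beta_{mk} is beta[m,k], phi_m is the smooth function whose values are returned in column m of zhat, and M is the number of terms. Centering each response at its mean is the common convention and is not stated on this page.

```latex
y_{ik} \;\approx\; \bar{y}_k \;+\; \sum_{m=1}^{M} \beta_{mk}\, \phi_m\!\left(\alpha_m^{T} x_i\right),
\qquad i = 1, \dots, n, \quad k = 1, \dots, q
```

Here n is nrow(x), q is ncol(y), and the argument of phi_m is the projection that the z component stores for term m.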

REFERENCES:

Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823.

The chapter "Regression and Smoothing for Continuous Response Data" in the S-PLUS Guide to Statistical and Mathematical Analysis.

SEE ALSO:

supsmu.

EXAMPLES:

x1 <- rnorm(100) ; x2 <- rnorm(100) ; eps <- rnorm(100, 0, .1) 
x <- matrix(c(x1, x2), 100, 2) 
y <- x1*x2 + eps 
# Set up a matrix of predictor values. 
xpred <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), 4, 2, byrow=T) 
# Use ppreg with unit weights for both the observations and 
# the response, and a 2 term regression model (picked from 3 terms). 
a <- ppreg(x, y, 2, 3, xpred=xpred) 
# Plot the function values versus their abscissas, to look for structure. 
matplot(a$z, a$zhat)