Fit a Local Regression Model

DESCRIPTION:

Fits a local regression model: a smooth curve or surface estimated by fitting a low-degree polynomial within a neighborhood of each point.

USAGE:

loess(formula, data, weights, subset, na.action, span = 0.75,  
      enp.target, degree = 2, parametric = F, drop.square = F,  
      normalize = T, family = c("gaussian", "symmetric"),  
      model = F, control, ...) 

REQUIRED ARGUMENTS:

formula
a formula object with the response on the left of the ~ operator and the predictors, separated by "*" operators, on the right, for example NOx ~ C * E.

OPTIONAL ARGUMENTS:

data
an optional data.frame in which to interpret the variables named in the formula and in the subset and weights arguments.
weights
optional expression for weights to be given to individual observations in the sum of squared residuals that forms the local fitting criterion. By default, an unweighted fit is carried out. If supplied, weights is treated as an expression to be evaluated in the same data frame as the model formula. It should evaluate to a non-negative numeric vector. If the observations have unequal variances, weights should be inversely proportional to the variances.
subset
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
na.action
a function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default, na.fail, signals an error if any missing values are found. A possible alternative is na.exclude, which deletes observations that contain one or more missing values.
family
the assumed distribution of the errors. The values are "gaussian" or "symmetric". The first value is the default. If the second value is specified, a robust fitting procedure is used.
normalize
logical that determines if numeric predictors should be normalized. If TRUE, the standard normalization is used. If FALSE, no normalization is carried out.
span
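smoothing parameter. For values below 1, it is the proportion of observations in the neighborhood used for each local fit; larger values give a smoother fit. The default is 0.75.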
enp.target
another way to specify the amount of smoothing. An approximation is used to compute a value of span that will yield approximately enp.target equivalent number of parameters.
degree
overall degree of locally-fitted polynomial. 1 is locally-linear fitting and 2 is locally-quadratic fitting.
drop.square
for cases with degree equal to 2 and with two or more numeric predictors, this argument specifies those numeric predictors whose squares should be dropped from the set of fitting variables. The argument can be a character vector of the predictor names given in formula, or a numeric vector of indices that gives positions as determined by the order of specification of the predictor names in formula, or a logical vector of length equal to the number of predictor names in formula.
parametric
for two or more numeric predictors, this argument specifies those variables that should be conditionally-parametric. The method of specification is the same as for drop.square.
control
a list that controls the methods of computation in the loess fitting. The list can be created by the function loess.control, whose documentation describes the computational options.
...
arguments of the function loess.control can also be specified directly in the call to loess without using the argument control.
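
As an illustration of how several of these arguments combine, the following is a minimal sketch on simulated data. The data frame sim and its columns x, y, and w are invented for this example and do not come from any supplied dataset; the code is written in S and is assumed to behave the same way in R.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
sim <- data.frame(x = runif(200, 0, 10))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)
sim$w <- rep(c(1, 2), 100)   # arbitrary non-negative observation weights

# Locally-quadratic Gaussian fit with the default span of 0.75
fit.default <- loess(y ~ x, data = sim)

# Locally-linear robust fit: weighted observations, and half of the
# observations in each local neighborhood
fit.robust <- loess(y ~ x, data = sim, weights = w, span = 0.5,
                    degree = 1, family = "symmetric")
```

Here weights = w is evaluated in sim, as described for the weights argument, and family = "symmetric" selects the robust fitting procedure.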

VALUE:

an object of class "loess" representing the fitted model. See the documentation for loess.object for more information on the components.
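
A brief sketch of how the returned object might be used follows. It assumes that the generic functions residuals and predict have methods for "loess" objects, as they do in standard S environments; the data frames sim and new are simulated for this example only.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(2)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- sin(sim$x) + rnorm(100, sd = 0.3)

fit <- loess(y ~ x, data = sim)

fit                         # printing gives a short summary of the fit
res <- residuals(fit)       # one residual per observation
new <- data.frame(x = c(2.5, 5, 7.5))
pred <- predict(fit, new)   # fitted surface at new predictor values
```

The new predictor values lie inside the range of the observed x values, so the fitted surface is evaluated there rather than extrapolated.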

DETAILS:

If loess runs slowly on a particular machine, components of the control list can be changed to speed up the computations by using further approximations. For example, changing the component trace.hat to "approximate" can reduce the computation time substantially for large datasets.
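
The two ways of setting a control component can be sketched as follows, using the trace.hat option named above. The data frame big is simulated for this example; whether the change saves noticeable time depends on the machine and the size of the data.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(3)
big <- data.frame(x = runif(5000))
big$y <- big$x^2 + rnorm(5000, sd = 0.1)

# Pass the option through a control list built by loess.control ...
fit1 <- loess(y ~ x, data = big,
              control = loess.control(trace.hat = "approximate"))

# ... or give the same option directly in the call, via the ... argument
fit2 <- loess(y ~ x, data = big, trace.hat = "approximate")
```

Both calls request the same computational option, so the two fits are expected to agree.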

When data is a bdFrame, there can be up to 3 predictor variables (variables on the right-hand side of the formula). The data are aggregated before smoothing. The range of each predictor variable is divided into pieces, creating a grid of bins. With 1, 2, and 3 predictor variables, there are respectively 1000, 10000 (100*100), and 27000 (30*30*30) bins. The means of the predictors and of the response are computed in each bin. A weighted loess() is then applied to the bin means, weighted according to the bin counts. This gives values that differ somewhat from those obtained when loess() is applied to the unaggregated data. The values are generally close enough to produce similar results when used in a plot, but the difference could be important when loess() is used for prediction or optimization.
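
The aggregation idea can be imitated on an ordinary data set with one predictor. This is only a sketch of the general approach, not the bdFrame implementation: equal-width bins and weights equal to the raw bin counts are assumptions made here, and all object names are invented for the example.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(4)
n <- 20000
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)

# Assign each observation to one of 1000 equal-width bins over the range of x
nbins <- 1000
lo <- min(x)
width <- (max(x) - lo) / nbins
bin <- pmin(floor((x - lo) / width) + 1, nbins)

# Means of predictor and response, and the count, in each non-empty bin
agg <- data.frame(x = as.vector(tapply(x, bin, mean)),
                  y = as.vector(tapply(y, bin, mean)),
                  count = as.vector(tapply(x, bin, length)))

# Weighted fit to the bin means (assumed weights: the bin counts)
fit.binned <- loess(y ~ x, data = agg, weights = count)

# Fit to the unaggregated data, for comparison at a few predictor values
fit.full <- loess(y ~ x)
grid <- data.frame(x = 1:9)
max(abs(predict(fit.binned, grid) - predict(fit.full, grid)))
```

The last line reports the largest difference between the two fits on the grid, which gives a rough sense of how much the aggregation changes the result for a particular data set.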

LIMITATIONS:

Locally quadratic models may have at most 4 predictor variables; locally linear models may have at most 8. The memory needed by loess increases exponentially with the number of variables.

REFERENCES:

Chambers, J.M., and Hastie, T.J. (1991). Statistical Models in S, pp. 309-376.
Cleveland, W.S., and Devlin, S.J. (1988). Locally-weighted Regression: An Approach to Regression Analysis by Local Fitting. J. Am. Statist. Assoc., Vol. 83, pp. 596-610.
Cleveland, W.S., and Grosse, E. (1991). Computational Methods for Local Regression. Statistics and Computing, Vol. 1.

SEE ALSO:

loess.control, loess.object.

EXAMPLES:

loess(NOx ~ C * E, span = 1/2, degree = 2, 
      parametric = "C", drop.square = "C", data=ethanol) 
#  produces the following output: 
  Call: 
  loess(formula = NOx ~ C * E, span = 1/2, degree = 2, 
  parametric = "C", drop.square = "C")  
                                          
   Number of Observations:          88     
   Equivalent Number of Parameters: 9.2    
   Residual Standard Error:         0.1842 
   Multiple R-squared:              0.98   
   Residuals:  
       min   1st Q  median   3rd Q    max  
   -0.5236 -0.0973 0.01386 0.07345 0.5584