loess(formula, data, weights, subset, na.action, span = 0.75,
      enp.target, degree = 2, parametric = F, drop.square = F,
      normalize = T, family = c("gaussian", "symmetric"),
      model = F, control, ...)
formula: a formula object, with the response on the left of a ~ operator and the terms, separated by "*" operators, on the right.
data: a data.frame in which to interpret the variables named in the formula, the subset, and the weights arguments.
weights: treated as an expression to be evaluated in the same data frame as the model formula. It should evaluate to a non-negative numeric vector. If the different observations have unequal variances, weights should be inversely proportional to the variances (see the example following these argument descriptions).
na.action: a function to filter missing data, applied to the model.frame after any subset argument has been used. The default (na.fail) is to create an error if any missing values are found. A possible alternative is na.exclude, which deletes observations that contain one or more missing values.

family: either "gaussian" or "symmetric". The first value is the default. If the second value is specified, a robust fitting procedure is used.
normalize: if TRUE, the standard normalization is used. If FALSE, no normalization is carried out.
enp.target: an alternative to span; loess chooses a value of span that will yield approximately enp.target equivalent number of parameters.
degree: the degree of the locally-fitted polynomial; 1 is locally-linear fitting and 2 is locally-quadratic fitting.
drop.square: for fits with degree equal to 2 and with two or more numeric predictors, this argument specifies those numeric predictors whose squares should be dropped from the set of fitting variables. The argument can be a character vector of the predictor names given in formula, a numeric vector of indices that gives positions as determined by the order of specification of the predictor names in formula, or a logical vector of length equal to the number of predictor names in formula.
parametric: specifies any predictors that should be fit parametrically (globally) rather than locally; the argument can be given in any of the same forms as drop.square.
control: a list of parameters for the computational options, normally created by a call to loess.control, whose documentation describes those options. The arguments of loess.control can also be specified directly in the call to loess without using the argument control.
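The following sketch illustrates several of the arguments described above (weights, na.action, enp.target, degree, and family). The data frame d and its columns x, v, and y are hypothetical stand-ins invented for illustration, not objects supplied with loess.

  # Hypothetical data: predictor x, per-observation variance estimate v,
  # and response y with unequal error variances.
  d <- data.frame(x = runif(100), v = runif(100, 1, 4))
  d$y <- sin(2 * pi * d$x) + rnorm(100, sd = sqrt(d$v))

  # Weights inversely proportional to the variances, missing values excluded,
  # smoothing specified via enp.target instead of span, robust local-linear fit.
  fit <- loess(y ~ x, data = d, weights = 1/v, na.action = na.exclude,
               enp.target = 6, degree = 1, family = "symmetric")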
loess returns an object of class "loess" representing the fitted model. See the documentation for loess.object for more information on the components.
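As a brief illustration (continuing the hypothetical fit above), the usual extractor functions can be applied to the returned object; consult loess.object for the full list of components.

  summary(fit)      # call, equivalent number of parameters, residual standard error
  fitted(fit)       # fitted values at the observed design points
  residuals(fit)    # residuals of the local regression fit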
If loess runs slowly on a particular machine, components of the control list can be changed to speed up the computations by using further approximations. For example, changing the component trace.hat to "approximate" can reduce the computation time substantially for large datasets.
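As a sketch of the mechanics (reusing the ethanol fit from the example at the end of this page; the option name comes from the loess.control documentation), the approximate trace computation can be requested either through an explicit control list or by giving the option directly:

  # Request the faster approximate computation of the trace of the hat matrix,
  # either through an explicit control list ...
  fit1 <- loess(NOx ~ C * E, span = 1/2, data = ethanol,
                control = loess.control(trace.hat = "approximate"))
  # ... or by giving the loess.control argument directly in the call to loess.
  fit2 <- loess(NOx ~ C * E, span = 1/2, data = ethanol,
                trace.hat = "approximate")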
When the data consists of a bdFrame, there can be up to 3 predictor variables (variables on the right-hand side of the formula). The data is aggregated before smoothing: the ranges of the predictor variables are divided into pieces, creating a grid of bins. With 1, 2, and 3 predictor variables there are respectively 1000, 10000 (100*100), and 27000 (30*30*30) bins. The mean of the predictors and the response is computed in each bin, and a weighted loess() is then applied to the bin means, with weights given by the bin counts. This gives values that differ somewhat from those obtained when loess() is applied to the unaggregated data. The values are generally close enough to produce similar results when used in a plot, but the difference could be important when loess() is used for prediction or optimization.
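The sketch below imitates, for a single ordinary (non-bdFrame) predictor, the kind of bin-and-aggregate computation described above; the 1000 bins match the one-predictor case, and the data x and y are hypothetical.

  # Hypothetical large data set with one predictor.
  n <- 100000
  x <- runif(n)
  y <- sin(2 * pi * x) + rnorm(n)

  # Divide the range of x into 1000 bins, average x and y within each bin,
  # and record the bin counts.
  bins <- cut(x, breaks = 1000)
  agg <- data.frame(x = tapply(x, bins, mean),
                    y = tapply(y, bins, mean),
                    count = tapply(x, bins, length))
  agg <- agg[!is.na(agg$count), ]    # drop empty bins

  # Weighted loess applied to the bin means, weighted by the bin counts.
  fit.agg <- loess(y ~ x, data = agg, weights = count)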
Locally quadratic models may have at most 4 predictor variables; locally linear models may have at most 8. The memory needed by loess increases exponentially with the number of variables.
Chambers, J.M., and Hastie, T.J. (1991). Statistical Models in S, pp. 309-376.

Cleveland, W.S., and Devlin, S.J. (1988). Locally-weighted Regression: An Approach to Regression Analysis by Local Fitting. Journal of the American Statistical Association, Vol. 83, pp. 596-610.

Cleveland, W.S., and Grosse, E. (1991). Computational Methods for Local Regression. Statistics and Computing, Vol. 1.
loess(NOx ~ C * E, span = 1/2, degree = 2, parametric = "C",
      drop.square = "C", data = ethanol)
# produces the following output:

Call:
loess(formula = NOx ~ C * E, span = 1/2, degree = 2, parametric = "C",
      drop.square = "C")

Number of Observations:           88
Equivalent Number of Parameters:  9.2
Residual Standard Error:          0.1842
Multiple R-squared:               0.98
Residuals:
     min      1st Q    median     3rd Q      max
 -0.5236    -0.0973   0.01386   0.07345   0.5584
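As a follow-on sketch (not part of the original example), the fitted surface can be evaluated on a grid of new predictor values with predict; the grid below is chosen arbitrarily within the range of the ethanol data.

  fit <- loess(NOx ~ C * E, span = 1/2, degree = 2, parametric = "C",
               drop.square = "C", data = ethanol)
  # Evaluate the fitted surface on a small grid of C and E values.
  grid <- expand.grid(C = c(9, 12, 15, 18), E = seq(0.6, 1.2, by = 0.1))
  predict(fit, newdata = grid)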