smooth.spline(x, y, w = <<see below>>, df = <<see below>>, spar = 0, cv = F, all.knots = F, df.offset = 0, penalty = 1)
x: values of the predictor variable. x and y can be supplied in a variety of different forms, along the lines of the function plot; e.g., a list with components x and y, a two-column matrix, or simply a single vector, taken to be a time series if it is not complex.
y: values of the response variable, of the same length as x.
w: optional vector of weights, of the same length as x and y. If measurements at different values of x have different variances, w should be inversely proportional to the variances. The default is that all weights are equal.
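For instance, if each response were the average of a known number of replicate measurements, the replicate counts could serve directly as weights. A hypothetical sketch (n.rep is an assumed vector of replicate counts, not part of this help page):

```
# y[i] is the mean of n.rep[i] replicates, so its variance is
# proportional to 1/n.rep[i]; weights inversely proportional to
# the variances are then simply the replicate counts
fit <- smooth.spline(x, y, w = n.rep)
```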
df: the desired equivalent degrees of freedom of the fit (the trace of the smoother matrix). If both df and spar are supplied, spar is used unless it is 0, in which case df is used.
spar: the smoothing parameter. If spar is 0 or missing and df is missing, cross-validation is used to automatically select spar. If a value of spar greater than zero is supplied, it is used as the smoothing parameter.
cv: logical flag indicating whether the ordinary (TRUE) or generalized (FALSE) cross-validation score should be computed.
all.knots: if FALSE, a suitable fine grid of knots is chosen, usually fewer in number than the number of unique values of x. If TRUE, the unique values of x are used as knots.
df.offset: allows the df term used in the calculation of the GCV criterion to be offset: df = tr(S) + df.offset.
penalty: allows the df quantity used in the GCV criterion to be charged a cost = penalty per degree of freedom.
an object of class smooth.spline is returned, consisting of the fitted smoothing spline evaluated at the supplied data, some fitting criteria and constants, and a structure that contains the essential information for computing the spline and its derivatives for any values of x. The components of the returned list are:
x: the distinct, sorted values of the input x.
y: the fitted values corresponding to x.
w: the weights used in the fit at each unique value of x, and in the case of ties, will consist of the accumulated weights at each unique value of x.
yin: the y values at the unique x values (weighted averages of the input y).
lev: the leverage values, the diagonal elements of the smoother matrix.
df: the equivalent degrees of freedom used. If df was supplied as the smoothing parameter, then the prescribed and resultant values of df should match within 0.1 percent of the supplied df.
spar: the value of the smoothing parameter used for the fit (computed automatically if df was used to specify the amount of smoothing).
fit: a list of items needed by predict.smooth.spline.
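The fit component is what allows the spline to be evaluated away from the data. A typical use might look like the following sketch (xnew is an assumed vector of new predictor values):

```
fit  <- smooth.spline(ozone, temperature)
# evaluate the fitted spline at new x values;
# the result is a list with components x and y
pred <- predict(fit, xnew)
lines(pred$x, pred$y)
```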
The two arguments
df.offset
and
penalty
are experimental and typically will not be used.
If used, the GCV criterion is
RSS/(n - (penalty*(trace(S)-1) + df.offset +1))
.
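Written out as an S expression, the criterion above amounts to the following sketch (RSS, n, and tr.S are assumed to hold the residual sum of squares, the sample size, and trace(S), respectively):

```
# GCV criterion with the experimental penalty and df.offset arguments
gcv <- RSS / (n - (penalty * (tr.S - 1) + df.offset + 1))
```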
A cubic B-spline is fit, with care taken to ensure that the algorithm runs in time linear in the number of data points. For small data vectors (n < 50), a knot is placed at every distinct data point, and the regression is fit by penalized least squares. For larger data sets the number of knots is chosen judiciously in order to keep the computation time manageable (if all.knots=F).
The smoothing penalty spar can be chosen automatically by cross-validation (if spar=0), can be supplied explicitly, or can be supplied implicitly via the more intuitive df number.
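The three ways of controlling the smoothness can be illustrated as follows (a sketch using the air data from the example below; the spar value is arbitrary):

```
attach(air)
fit.cv   <- smooth.spline(ozone, temperature)              # spar selected by cross-validation
fit.spar <- smooth.spline(ozone, temperature, spar = 0.01) # explicit smoothing parameter
fit.df   <- smooth.spline(ozone, temperature, df = 5)      # implicit, via degrees of freedom
```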
When the data consist of bdVectors, the data are aggregated before smoothing: the range of the "x" variable is divided into 1000 bins, and the means of "x" and "y" are computed in each bin. A weighted smooth is then computed on the bin means, weighted by the bin counts. This gives values that differ somewhat from those obtained when the smoother is applied to the unaggregated data. The values are generally close enough to be indistinguishable in a plot, but the difference could be important when the smoother is used for prediction or optimization.
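The aggregation step can be sketched in ordinary S code roughly as follows (a simplification for illustration; the actual binning is internal to the function):

```
# divide range(x) into 1000 equal-width bins, then smooth the
# bin means weighted by the bin counts
bin  <- cut(x, 1000)
xbar <- tapply(x, bin, mean)
ybar <- tapply(y, bin, mean)
cnt  <- tapply(x, bin, length)
ok   <- !is.na(xbar)                                # drop empty bins
fit  <- smooth.spline(xbar[ok], ybar[ok], w = cnt[ok])
```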
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.
attach(air)
plot(ozone, temperature)
lines(smooth.spline(ozone, temperature))
lines(smooth.spline(ozone, temperature, df = 5), lty = 2)

# smoothing spline fit and approximate 95% "confidence" intervals
# need to create objects x and y
fit <- smooth.spline(x, y)                 # smooth.spline fit
res <- (fit$yin - fit$y)/(1 - fit$lev)     # jackknife residuals
sigma <- sqrt(var(res))                    # estimate sd
upper <- fit$y + 2.0*sigma*sqrt(fit$lev)   # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev)   # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type = "plp", pch = ".")