glm
function.
This function is typically not called directly by users; it is invoked through a call to glm when the data argument is of class "bdFrame".
This function requires the bigdata library section to be loaded.
bdGlm(formula, family=gaussian, data, weights, subset, na.action, control=glm.control(...), contrasts=NULL, correlation=TRUE)
data.
By default, all observations are weighted equally.
na.fail, which returns an error if any missing values are found. An alternative is na.exclude, which deletes observations that contain one or more missing values.
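The difference between the two na.action choices can be sketched on a small ordinary data frame. This is only an illustration written in S/R syntax and checked against R's glm; the toy data are hypothetical, and it is an assumption that the "bdFrame" case behaves analogously, as the text above describes.

```r
# Toy data with one missing predictor value (hypothetical, for illustration only).
toy <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                  x = c(1, 2, NA, 4, 5, 6))

# na.fail: the fit stops with an error because x contains a missing value.
failed <- try(glm(y ~ x, data = toy, na.action = na.fail), silent = TRUE)
inherits(failed, "try-error")

# na.exclude: the incomplete row is dropped before the model is fitted.
fit <- glm(y ~ x, data = toy, na.action = na.exclude)
nrow(model.frame(fit))   # number of rows actually used in the fit
```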
glm.control for their names and default values. These can also be given directly as arguments to bdGlm itself, instead of through control.
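The two ways of supplying control parameters might look as follows. This sketch uses an ordinary data frame and R's glm rather than a "bdFrame", with hypothetical toy data; maxit is used because it is a glm.control parameter in both S-PLUS and R.

```r
# Hypothetical toy data on an ordinary data frame.
toy <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8), x = 1:6)

# Control parameter passed through the control argument.
fitControl <- glm(y ~ x, data = toy, control = glm.control(maxit = 50))

# The same parameter given directly as an argument.
fitDirect <- glm(y ~ x, data = toy, maxit = 50)

# Both routes should lead to the same fit.
all.equal(coef(fitControl), coef(fitDirect))
```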
contr.helmert for examples.
p by p, where p is the number of predictors. This can get large when there are factors with many levels. To avoid extracting this matrix, specify correlation=F.
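To see how quickly p grows with factor levels, one can count the columns of the model matrix. The sketch below is written in S/R syntax and checked against R; the toy data are hypothetical, and it assumes that p corresponds to the number of model matrix columns, intercept included.

```r
# One numeric predictor plus one factor with 10 levels (hypothetical data).
toy <- data.frame(x = seq(0, 1, length.out = 40),
                  f = factor(rep(letters[1:10], 4)))

# Intercept + x + 9 contrast columns for the 10-level factor.
p <- ncol(model.matrix(~ x + f, data = toy))
p

# The correlation matrix would then have p * p entries.
p * p
```

A single factor with a few hundred levels would already give a correlation matrix with tens of thousands of entries, which is the situation where correlation=F is useful.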
glm object. Methods available include: print, summary, coef, plot, residuals, fitted, predict, anova and deviance.
While the methods behave the same, the actual structures of glm and bdGlm objects differ.
The bdGlm function is typically not called directly by a user. It is invoked through a call to glm when the data argument is a big data object (an object of class "bdFrame").
The formula is evaluated and the model matrix is created, with contrasts, weights, subset and na.action, in the same way as in an ordinary glm model.
Limitations of bdGlm relative to glm:
- the response variable cannot be a matrix
- the predictors cannot contain terms using offset, poly, bs or ns
- there are no ordered factors for bigdata objects, so the ordered-factor contrasts will never be used
- the glm arguments start, method, model, x and y do not work in bdGlm
- user-defined families are not supported
- the predict method, predict.bdGlm, does not support arguments other than object, newdata and type
- the predict method does not support type="terms"
- the plot method, plot.bdGlm, does not create a normal QQ residuals plot; the plots that are produced are hexbin scatterplots
- the anova method, anova.bdGlm, only produces a summary ANOVA when given a single model object
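The restriction on predictor terms can be checked before fitting. The helper below, findUnsupportedTerms, is hypothetical and not part of the bigdata library; it is a minimal sketch in S/R syntax, checked against R, that scans a formula for calls to offset, poly, bs or ns.

```r
# Function names that the list above says bdGlm does not accept in predictors.
unsupportedCalls <- c("offset", "poly", "bs", "ns")

# Hypothetical helper: return the unsupported functions called in a formula.
findUnsupportedTerms <- function(formula) {
  rhs <- formula[[length(formula)]]               # right-hand side of the formula
  calls <- setdiff(all.names(rhs), all.vars(rhs)) # function names, not variables
  intersect(unsupportedCalls, calls)
}

bad <- findUnsupportedTerms(y ~ x + poly(z, 2) + offset(w))
bad
ok <- findUnsupportedTerms(y ~ x + z)
length(ok) == 0
```

Separating function names from variable names avoids flagging a formula merely because a column happens to be called ns or bs.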
Chambers, J. M. and Hastie, T. J. (1993). Statistical Models in S. London: Chapman and Hall.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.
# Convert kyphosis to a bdFrame and use bdGlm:
bigkyphosis <- as.bdFrame(kyphosis)
bigGlm <- glm(Kyphosis ~ Age + Number, family=binomial, data=bigkyphosis)
# Check class of the model object:
class(bigGlm)
# Print the model object:
bigGlm