glm function.
This function is typically not called directly by users, but it is invoked through a call to glm when the data argument is of class "bdFrame".
This function requires the bigdata library section to be loaded.
bdGlm(formula, family=gaussian, data, weights, subset, na.action,
control=glm.control(...), contrasts=NULL, correlation=TRUE)
data.
By default, all observations are weighted equally.
na.fail, which returns an error if any missing values are found. An alternative is na.exclude, which deletes observations that contain one or more missing values.
glm.control for their names and default values. These can also be given directly as arguments to bdGlm itself, instead of through control.
contr.helmert for examples.
p by p, where p is the number of predictors. This can get large when there are factors with many levels. To avoid extracting this matrix, specify correlation=F.
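As a minimal sketch of this option, the call below fits the kyphosis model from the example on this page with correlation=F. It calls bdGlm directly so that the argument matches the usage shown above; this page does not say whether glm forwards correlation to bdGlm, so that route is not assumed here. The object name fitNoCor is illustrative, and the bigdata library section is assumed to be loaded already.

```s
# Assumes the bigdata library section is loaded and the kyphosis
# data set is available, as in the example on this page.
bigkyphosis <- as.bdFrame(kyphosis)

# correlation=F avoids extracting the p by p correlation matrix.
# fitNoCor is an illustrative name, not part of the bdGlm interface.
fitNoCor <- bdGlm(Kyphosis ~ Age + Number, family=binomial,
                  data=bigkyphosis, correlation=F)
```

This matters most when the formula includes factors with many levels, since each level adds to the number of predictors p.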
glm object. Methods available include: print, summary, coef, plot, residuals, fitted, predict, anova and deviance.
While the methods behave the same, the actual structures of the glm and bdGlm objects differ.
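For illustration, the sketch below applies a few of the listed methods to a model fitted on a bdFrame. It reuses the kyphosis model from the example on this page and assumes the bigdata library section is already loaded. Printed output is not shown.

```s
# Fit the model on a bdFrame; glm dispatches to bdGlm for this class.
bigkyphosis <- as.bdFrame(kyphosis)
bigGlm <- glm(Kyphosis ~ Age + Number, family=binomial, data=bigkyphosis)

# The methods are called as they would be for an ordinary glm fit.
summary(bigGlm)    # coefficient table and fit summary
coef(bigGlm)       # estimated coefficients
deviance(bigGlm)   # residual deviance
```

Because the internal structure differs from that of an ordinary glm object, it is safer to go through these methods than to extract components by name.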
The bdGlm function is typically not called directly by a user. It is invoked through a call to glm when the data argument is a big data object (an object of class "bdFrame").
The evaluation of the formula and the creation of the model matrix with contrasts, weights, subset and na.action are done the same way as in an ordinary glm model.
Limitations of bdGlm relative to glm:
- the response variable cannot be a matrix
- the predictors cannot contain terms using: offset, poly, bs, ns
- there are no ordered factors for bigdata objects, so the ordered factor contrasts will never be used
- the glm arguments start, method, model, x and y do not work in bdGlm
- user-defined families are not supported
- the predict method, predict.bdGlm, does not support arguments other than object, newdata and type
- the predict method does not support type="terms"
- the plot method, plot.bdGlm, does not create a normal QQ residuals plot. The plots that are produced are hexbin scatterplots.
- the anova method, anova.bdGlm, only produces a summary ANOVA when given a single model object
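The sketch below stays within these limits for the predict and anova methods. It reuses the kyphosis model from the example on this page, passes only object, newdata and type to predict, and gives anova a single model. Using the original bdFrame as newdata and type="response" are assumptions made for illustration; this page states only that type="terms" is not supported. The name bigPred is illustrative.

```s
# Assumes the bigdata library section is loaded.
bigkyphosis <- as.bdFrame(kyphosis)
bigGlm <- glm(Kyphosis ~ Age + Number, family=binomial, data=bigkyphosis)

# predict.bdGlm supports only object, newdata and type.
# type="response" is an assumption here; type="terms" is not supported.
bigPred <- predict(bigGlm, newdata=bigkyphosis, type="response")

# anova.bdGlm produces a summary ANOVA for a single model object.
anova(bigGlm)
```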
# Convert kyphosis to a bdFrame and use bdGlm:
bigkyphosis <- as.bdFrame(kyphosis)
bigGlm <- glm(Kyphosis ~ Age + Number, family=binomial, data=bigkyphosis)
# Check class of the model object:
class(bigGlm)
# Print the model object:
bigGlm