Fit a Generalized Linear Model

DESCRIPTION:

Produces an object of class "glm" or "bdGlm", which represents a generalized linear fit of the data.

USAGE:

glm(formula, family=gaussian, data=<<see below>>, weights, 
    subset, na.action=na.fail, start=<<see below>>, 
    control=glm.control, method="glm.fit", model=F, x=F, 
    y=T, contrasts=NULL, ...) 

REQUIRED ARGUMENTS:

formula
a formula expression as for other regression models, of the form response ~ predictors. For details, see the documentation for lm and formula. See the DETAILS section below for special forms the response variable can take in logistic regression.

OPTIONAL ARGUMENTS:

family
a family object. This is a list of expressions for defining the link, variance function, initialization values, and iterative weights for the generalized linear model. Supported families are: gaussian, binomial, poisson, Gamma, inverse.gaussian and quasi. Functions like binomial produce a family object and can be given without the parentheses. Family functions can take arguments, as in binomial(link=probit). For more details, see the help files for family and family.object.
data
a data frame or bdFrame in which to interpret the variables occurring in the formula. If data is a bdFrame, the function bdGlm is called. See the DETAILS section below for additional information and restrictions when using glm with a bdFrame.
weights
the weights for the fitting criterion. By default, all observations are weighted equally.
subset
an expression defining which subset of the rows in the data to use in the fit. This can be a logical vector, which is replicated to have length equal to the number of observations, a numeric vector indicating which observation numbers to include, or a character vector of the row names to include. By default, all observations are included.
na.action
a function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default value of na.action=na.fail creates an error if any missing values are found. A possible alternative is na.exclude, which deletes observations that contain one or more missing values.
start
a vector of initial values on the scale of the linear predictor. This argument is useful in the rare cases where the default starting values cause convergence problems for the underlying algorithm. For more information, see Chambers and Hastie (1993).
control
a list of iteration and algorithmic constants. See glm.control for their names and default values. These can also be given directly as arguments to glm itself, instead of through control.
method
the method to use in fitting the model. By default, the function glm.fit is used and the model is fit via iteratively reweighted least squares. However, other fitting methods can be defined by the user; see Chambers and Hastie (1993) pages 245 to 246 for more information.
model
if model=TRUE, the model.frame is returned. If model is itself a model.frame object, the formula and data arguments are ignored and model is used to define the model. By default, model=FALSE.
x
a logical flag; if x=TRUE, the model.matrix is returned. By default, x=FALSE.
y
a logical flag. The default value of y=TRUE causes the response variable to be returned.
contrasts
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables. The elements of the list should be either contrast-type matrices (matrices with as many rows as levels of the factor, and with columns linearly independent of each other and of a column of ones), or else they should be functions that compute such contrast matrices. See the help file for contr.helmert for examples.
...
additional arguments are passed to glm.fit. In particular, the qr argument to glm.fit, which determines whether the QR decomposition from the fitting algorithm is returned, can be given to glm directly. See glm.fit for details.
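
As a rough illustration of several optional arguments described above, the following sketch fits one logistic model with a few equivalent or alternative settings. The data frame dat and its columns x and y are simulated and purely hypothetical. The code uses only syntax common to S-PLUS and R; it is assumed, not verified, that every S dialect behaves identically here.

```r
# Simulated, hypothetical data: a binary response and one predictor.
set.seed(1)
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, 1, 1 / (1 + exp(-(0.5 + dat$x))))

# The family can be given as a function name, with or without parentheses.
fit1 <- glm(y ~ x, family = binomial, data = dat)
fit2 <- glm(y ~ x, family = binomial(), data = dat)

# Family functions accept arguments, for example an alternative link.
fit3 <- glm(y ~ x, family = binomial(link = probit), data = dat)

# Iteration constants can go through control or directly to glm.
fit4 <- glm(y ~ x, family = binomial, data = dat,
            control = glm.control(maxit = 50))
fit5 <- glm(y ~ x, family = binomial, data = dat, maxit = 50)

# subset restricts the rows used; x = TRUE keeps the model matrix.
fit6 <- glm(y ~ x, family = binomial, data = dat,
            subset = x > 0, x = TRUE)
```

Here fit1 and fit2 should be the same fit, as should fit4 and fit5, while fit6$x holds only the rows selected by subset.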

VALUE:

an object of class "glm" is returned, which inherits from lm. See glm.object for details. If data is of class "bdFrame" then an object of class "bdGlm" is returned.

The output object from glm has all the components of an lm object, with a few more. It can be examined with print, summary, plot, and anova. Components can be extracted using predict, fitted, residuals, coefficients, deviance, effects, formula, and family. A glm object can be modified using update. Other generic functions that have methods for glm objects are drop1, add1, step, and preplot.
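
The generic functions listed above might be used along the following lines. This is a minimal sketch on simulated, hypothetical data (the data frame d and its columns are not part of any shipped data set), intended only to show the calling pattern.

```r
# Simulated, hypothetical count data.
set.seed(2)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$count <- rpois(100, exp(0.3 + 0.5 * d$x1))

fit <- glm(count ~ x1, family = poisson, data = d)

print(fit)               # brief display of the fit
summary(fit)             # coefficient table and deviance summary
b   <- coefficients(fit) # estimated coefficients
mu  <- fitted(fit)       # fitted means on the response scale
r   <- residuals(fit)    # residuals, one per observation
dev <- deviance(fit)     # residual deviance

# update refits with a modified formula; anova compares nested fits.
fit2 <- update(fit, . ~ . + x2)
anova(fit, fit2)
```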

DETAILS:

If the data argument is a bdFrame then the function bdGlm is immediately called by glm. The bdGlm function does not support all the arguments that glm does. See the bdGlm help file for more information.

The required formula argument to glm is in the same format as most other formulas in S-PLUS, with the response on the left side of a tilde (~) and the predictor variables on the right. In logistic regression, however, the response can assume a few different forms:

1)
If the response is a logical vector or a two-level factor, it is treated as a 0/1 binary vector. The zero values correspond to failures and the ones correspond to successes.

2)
If the response is a multilevel factor, S-PLUS assumes the first level codes failures (0) and all of the remaining levels code successes (1).
3)
If the response is a two-column matrix, S-PLUS assumes the first column holds the number of successes for each trial and the second column holds the number of failures.

In addition, if the response is a general numeric vector, S-PLUS assumes that it holds the proportion of successes. That is, the ith value in the response vector is s[i]/n[i], where s[i] denotes the number of successes out of n[i] total trials. The n[i] should be supplied through the weights argument. Note that the weights indicate the relative importance of different cases and are not interpreted as counts. This does not affect the predictions or coefficients estimated by the model, but degrees of freedom and standard errors are calculated as if the number of observations were length(weights) rather than sum(weights).
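
To make the response forms above concrete, the sketch below fits the same grouped binomial data twice: once with a two-column matrix response and once with a proportion response plus weights. The data frame grp and its columns dose, n, s, and p are hypothetical values made up for illustration.

```r
# Hypothetical grouped data: s successes out of n trials at each dose.
grp <- data.frame(dose = c(1, 2, 3, 4, 5),
                  n    = c(20, 20, 20, 20, 20),
                  s    = c(2, 5, 9, 14, 18))

# Two-column matrix response: (successes, failures).
fit.mat <- glm(cbind(s, n - s) ~ dose, family = binomial, data = grp)

# Proportion response s/n, with the trial counts n given as weights.
grp$p <- grp$s / grp$n
fit.prop <- glm(p ~ dose, family = binomial, data = grp, weights = n)

# The two fits should agree on the estimated coefficients.
coef(fit.mat)
coef(fit.prop)
```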

The model is fit using Iteratively Reweighted Least Squares (IRLS). The working response and iterative weights are computed using the functions contained in the family object. The workhorse of glm is the function glm.fit, which expects x and y arguments rather than a formula.
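
The IRLS idea can be sketched by hand for one special case. The code below is a textbook-style illustration for a Poisson model with log link, not the code of glm.fit: the working response and weights are written out explicitly here, whereas glm.fit obtains them from the family object. The data, the starting values, and the stopping rule are all assumptions chosen for the example.

```r
# Simulated, hypothetical Poisson data with a log link.
set.seed(3)
x <- rnorm(150)
y <- rpois(150, exp(0.2 + 0.7 * x))
X <- cbind(1, x)                      # model matrix with intercept

# Starting values (an assumption): mu = y plus a small constant.
mu  <- y + 0.1
eta <- log(mu)

for (iter in 1:25) {
  z <- eta + (y - mu) / mu            # working response
  w <- mu                             # iterative weights for the log link
  # Weighted least squares of z on X with weights w.
  b.new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))
  eta <- as.vector(X %*% b.new)
  mu  <- exp(eta)
  # Stop once the coefficients no longer change appreciably.
  if (iter > 1 && max(abs(b.new - b)) < 1e-8) break
  b <- b.new
}
b <- as.vector(b.new)

# For comparison, the same model through glm.
fit <- glm(y ~ x, family = poisson)
```

The hand-computed vector b should be close to coef(fit), up to the convergence tolerances of the two procedures.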

Generalized linear models can also be fit with the gam function. See the help file for gam for further details.

REFERENCES:

Chambers, J. M. and Hastie, T. J. (1993). Statistical Models in S. London: Chapman and Hall.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.

EXAMPLES:

# Gaussian regression (the default family) on a cube-root response.
glm(ozone^(1/3) ~ bs(radiation, 5) + poly(wind, temperature,
    degree = 2), data = air)

# Poisson regression.
glm(skips ~ ., family = poisson, data = solder.balance) 

# Logistic regression using a binary response vector.
glm(Kyphosis ~ poly(Age, 2) + (Number > 5)*Start, 
    family = binomial, data = kyphosis) 

# Logistic regression using a matrix for the response.
# The absence of kyphosis is considered a success.
kyph.mat <- t(as.matrix(table(kyphosis$Kyphosis)))
glm(kyph.mat ~ 1, family = binomial)