Fit a Generalized Linear Model

DESCRIPTION:

Produces an object of class "glm", a generalized linear model fit to the data. The glm function is generic (see Methods); method functions can be written to handle specific classes of data. Classes that already have methods for this function include model.list.

USAGE:

glm(formula, family = gaussian, data=<<see below>>, 
    weights, subset=<<see below>>, na.action, 
    start, control, method = "glm.fit", model = F, x = F, y = T, 
    contrasts = NULL, ...) 

REQUIRED ARGUMENTS:

formula
a formula expression as for other regression models, of the form response ~ predictors. See the documentation of lm and formula for details.
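A minimal sketch of the formula interface. It is written in R, whose glm follows the S interface described here; S-PLUS details may differ. The simulated data and the names dat, counts, x1, and x2 are hypothetical. The sketch shows that "." on the right-hand side stands for all other columns of data, as in the first example under EXAMPLES.

```r
# Hypothetical simulated data: a count response and two numeric predictors
set.seed(1)
dat <- data.frame(x1 = runif(50), x2 = runif(50))
dat$counts <- rpois(50, lambda = exp(0.5 + dat$x1))

# response ~ predictors, with the predictors spelled out
fit_expl <- glm(counts ~ x1 + x2, family = poisson, data = dat)

# "." stands for every column of data other than the response
fit_dot <- glm(counts ~ ., family = poisson, data = dat)

all.equal(coef(fit_expl), coef(fit_dot))   # TRUE
```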

OPTIONAL ARGUMENTS:

family
a family object: a list of functions and expressions that define the link and variance functions, the initialization, and the iterative weights. Families supported are gaussian, binomial, poisson, Gamma, inverse.gaussian, and quasi. Family functions such as binomial produce a family object, but can be given without the parentheses. Family functions can take arguments, as in binomial(link=probit).
data
a data frame in which to interpret the variables named in the formula, the subset argument, and the weights argument. If this is missing, the variables in the formula should be on the search list. This may also be a single number to handle some special cases; see below for details.
weights
the optional weights for the fitting criterion.
subset
an expression specifying which rows of the data should be used in the fit. This can be a logical vector (replicated to have length equal to the number of observations), a numeric vector of the observation numbers to include, or a character vector of the row names to include. All observations are included by default.
na.action
a function to filter missing data. It is applied to the model frame after any subset argument has been used. The default (na.fail) signals an error if any missing values are found. A possible alternative is na.omit, which deletes observations that contain one or more missing values.
start
a vector of initial values on the scale of the linear predictor.
control
a list of iteration and algorithmic constants. See glm.control for their names and default values. These can also be given as arguments to glm itself.
method
character string. Usually names the fitting method; the default (and currently the only option) is "glm.fit". Alternatively, it may request that a data structure be returned instead of a fit: "model.frame" returns the model frame and "model.list" returns the model list. In either case no fitting is done. With method="model.list", a fitting method may be included as well, in case the model list is to be fit later: for example, c("model.list", "glm.fit") (the order is not important). This is the only case in which a vector of values is recognized.
model
if TRUE, the model frame is returned in component model. If this argument is itself a model frame, then the formula and data arguments are ignored, and model is used to define the model.
x
logical flag: if TRUE, the model matrix is returned in component x.
y
logical flag: if TRUE, the response variable is returned in component y (default is TRUE).
contrasts
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables. The elements should be either contrast-type matrices (matrices with as many rows as levels of the factor, and with columns linearly independent of each other and of a column of ones) or functions that compute such contrast matrices.
...
control arguments may be given directly; see the control argument. Additional arguments may also be passed to the fitting routine (see glm.fit). One possibility is qr=TRUE, in which case the QR decomposition of the model matrix is returned in component qr.
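A sketch of several optional arguments. It is written in R and checked against R's stats::glm rather than S-PLUS, and the data and all variable names are hypothetical. Two differences from the S-PLUS behavior are assumed. First, R's default na.action is usually na.omit rather than na.fail, so na.action is always passed explicitly. Second, R keeps the QR decomposition without qr=TRUE, so qr is not shown. Control constants such as maxit are passed straight to glm, and contr.sum(3) serves as an explicit contrast matrix for a three-level factor.

```r
# Hypothetical simulated data: binary response, numeric dose, 3-level factor
set.seed(2)
n <- 100
dat <- data.frame(dose = runif(n, 0, 4),
                  grp  = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
dat$resp <- rbinom(n, 1, pnorm(-1 + 0.6 * dat$dose))
dat$dose[c(3, 17)] <- NA                    # two missing predictor values

# family with an argument: binomial response, probit link
fit_probit <- glm(resp ~ dose, family = binomial(link = probit),
                  data = dat, na.action = na.omit)

# na.action = na.fail stops with an error because of the NAs
res <- tryCatch(glm(resp ~ dose, family = binomial, data = dat,
                    na.action = na.fail),
                error = function(e) "error")

# subset: a logical vector and the matching row numbers select the same rows
fit_lgl <- glm(resp ~ dose, family = binomial, data = dat,
               subset = grp != "c", na.action = na.omit)
fit_num <- glm(resp ~ dose, family = binomial, data = dat,
               subset = which(dat$grp != "c"), na.action = na.omit)

# a control constant passed directly to glm; x = TRUE keeps the model matrix
fit_x <- glm(resp ~ dose + grp, family = binomial, data = dat,
             na.action = na.omit, maxit = 50, x = TRUE)

# contrasts: an explicit sum-to-zero contrast matrix for grp
fit_sum <- glm(resp ~ dose + grp, family = binomial, data = dat,
               na.action = na.omit, x = TRUE,
               contrasts = list(grp = contr.sum(3)))
```

Sum-to-zero coding spans the same column space as the default coding. The last two fits should therefore have the same fitted values, but differently named and valued coefficients.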

VALUE:

an object of class "glm" representing the fit, or an object of class "model.frame" or "model.list" if requested by the method argument. See glm.object for details of the components.
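A sketch of the two kinds of return value. It is written in R, which is assumed to behave like S-PLUS here, and the data are simulated and hypothetical. The "model.list" method is not shown.

```r
# Hypothetical simulated Poisson data
set.seed(3)
dat <- data.frame(x = runif(30))
dat$y <- rpois(30, exp(1 + dat$x))

fit <- glm(y ~ x, family = poisson, data = dat)
inherits(fit, "glm")                          # TRUE

# method = "model.frame": the model frame is returned and nothing is fit
mf <- glm(y ~ x, family = poisson, data = dat, method = "model.frame")
is.data.frame(mf)                             # TRUE
dim(mf)                                       # 30 2
```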

DETAILS:

The output can be examined by print, summary, plot, and anova. Components can be extracted using predict, fitted, residuals, deviance, formula, and family. The fit can be modified using update. A glm object has all the components of an lm object, with a few more. Other generic functions that have methods for glm objects are drop1, add1, step, and preplot. See glm.object for further details.
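A minimal sketch, in R, of the extractor and modification functions listed above, applied to a hypothetical simulated Poisson fit. The two all.equal checks illustrate identities that hold for glm fits. Fitted values are predictions on the response scale, and the deviance is the sum of the squared deviance residuals.

```r
# Hypothetical simulated Poisson data
set.seed(4)
dat <- data.frame(x = runif(40))
dat$y <- rpois(40, exp(0.3 + 1.2 * dat$x))
fit <- glm(y ~ x, family = poisson, data = dat)

summary(fit)                                  # coefficient table, deviances

# fitted values are the predictions on the response scale
all.equal(fitted(fit), predict(fit, type = "response"))             # TRUE
# the deviance is the sum of the squared deviance residuals
all.equal(deviance(fit), sum(residuals(fit, type = "deviance")^2))  # TRUE
formula(fit)                                  # y ~ x
family(fit)$family                            # "poisson"

# update() refits with a modified formula; anova() compares the two fits
fit2 <- update(fit, . ~ . + I(x^2))
anova(fit, fit2, test = "Chisq")
```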

The response variable must conform to the definition of family: for example, it must be a factor or binary data if family=binomial is declared.
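For family=binomial, a 0/1 numeric response and a two-level factor should give the same fit. The sketch below is written in R with hypothetical simulated data. It assumes R's convention that the first factor level denotes failure; check the S-PLUS binomial documentation for its rule.

```r
set.seed(5)
x   <- runif(60)
y01 <- rbinom(60, 1, plogis(-0.5 + 2 * x))    # binary 0/1 response
yf  <- factor(y01, levels = c(0, 1), labels = c("no", "yes"))

fit01 <- glm(y01 ~ x, family = binomial)
fitf  <- glm(yf ~ x, family = binomial)       # first level "no" = failure
all.equal(coef(fit01), coef(fitf))            # TRUE
```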

The model is fit using iteratively reweighted least squares (IRLS). The working response and iterative weights are computed using the functions contained in the family object. GLMs can also be fit using the function gam. The workhorse of glm is the function glm.fit, which expects x and y arguments rather than a formula.
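The IRLS step can be sketched directly from the family object. The function irls below is a hypothetical illustration, not the code of glm.fit. It assumes a full-rank model matrix, unit prior weights, no offset, and the starting values mu = y + 0.1, which is the value R's poisson family uses. The sketch is written in R, where family objects carry linkfun, linkinv, mu.eta, and variance functions; S-PLUS family objects may be organized differently. Its estimates are compared with glm and with a direct glm.fit call on x and y.

```r
# Bare-bones IRLS built from the functions in an R family object.
# Hypothetical helper; assumes full-rank X, unit prior weights, no offset.
irls <- function(X, y, family, mu_start, tol = 1e-10, maxit = 50) {
  mu   <- mu_start
  eta  <- family$linkfun(mu)
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    dmu <- family$mu.eta(eta)                 # d mu / d eta
    z   <- eta + (y - mu) / dmu               # working response
    w   <- dmu^2 / family$variance(mu)        # iterative weights
    # weighted least squares of z on X with weights w
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    eta <- drop(X %*% beta_new)
    mu  <- family$linkinv(eta)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Hypothetical simulated Poisson data
set.seed(6)
dat <- data.frame(x = runif(80))
dat$y <- rpois(80, exp(0.5 + 0.8 * dat$x))
X <- model.matrix(~ x, data = dat)

b_irls <- irls(X, dat$y, poisson(), mu_start = dat$y + 0.1)
b_glm  <- coef(glm(y ~ x, family = poisson, data = dat))
b_fit  <- glm.fit(X, dat$y, family = poisson())$coefficients

all.equal(b_irls, b_glm, tolerance = 1e-6)    # TRUE
all.equal(b_glm,  b_fit, tolerance = 1e-6)    # TRUE
```

Each pass recomputes the working response and weights from the current mu and solves one weighted least-squares problem. For a canonical link such as the Poisson log link, this coincides with Newton's method, so only a few iterations are needed.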

NAMES. Variables occurring in a formula are evaluated differently from arguments to S-PLUS functions, because the formula is an object that is passed around unevaluated from one function to another. Functions such as glm, which ultimately arrange for the variables in the formula to be evaluated, try to establish a context based on the data argument. (More precisely, the function model.frame does the actual evaluation, assuming that its caller behaves in the way described here.) If the data argument to glm is missing or is an object (typically a data frame), the local context for variable names is the frame of the function that called glm, or the top-level expression frame if you called glm directly. Names in the formula can refer to variables in the local context, to global variables, or to variables in the data object.
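A sketch, in R, of the name lookup just described. Formula variables may come from the calling function's local variables, from the data frame, or from both. All names are hypothetical. R achieves this through the environment stored with the formula rather than through the S-PLUS frame mechanism, so the numeric data argument described in the next paragraph is not illustrated.

```r
# Case 1: no data argument; v and u are local variables of the caller
fit_local <- function(n) {
  u <- seq(0, 1, length.out = n)
  v <- rpois(n, exp(1 + u))
  glm(v ~ u, family = poisson)
}

# Case 2: z comes from the data frame, u from the caller's local context
fit_mixed <- function(d) {
  u <- seq_len(nrow(d)) / nrow(d)             # not a column of d
  glm(count ~ z + u, family = poisson, data = d)
}

set.seed(7)
f1 <- fit_local(25)
d  <- data.frame(z = runif(25), count = rpois(25, 3))
f2 <- fit_mixed(d)
names(coef(f2))                               # "(Intercept)" "z" "u"
```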

The data argument can also be a number, in which case that number defines the local context. This can arise, for example, if a function is written to call glm, perhaps in a loop, but the local context is definitely not that function. In this case, the function can set data to sys.parent(), and the local context will be the next function up the calling stack. A numeric value for data can also be supplied if a local context is being explicitly created by a call to new.frame. Notice that supplying data as a number implies that this is the only local context; local variables in any other function will not be available when the model frame is evaluated. This is potentially subtle. Fortunately, it is not something the ordinary user of glm needs to worry about. It is relevant for those writing functions that call glm or other such model-fitting functions.

REFERENCES:

McCullagh, P. and Nelder, J. A. (1983), Generalized Linear Models, Chapman and Hall, London.

EXAMPLES:

glm(skips ~ ., family = poisson, data = solder.balance) 
glm(Kyphosis ~ poly(Age, 2) + (Number > 5)*Start, 
    family = binomial, data = kyphosis) 
glm(ozone^(1/3) ~ bs(radiation, 5) + poly(wind, temperature, degree = 2), 
    data = air)