Nonlinear Regression Model

Nonlinear Regression is used when the dependent (response) variable is a nonlinear function of the independent (predictor) variables. Parameters are estimated by minimizing the sum of squared residuals.

Suppose n independent observations y_i can be modeled as a nonlinear parametric function f of a vector x_i of predictor variables and a vector of parameters b:

y_i = f(x_i, b) + e_i,    i = 1, ..., n

where the errors e_i are assumed to be normally distributed. The nonlinear least-squares problem finds the parameter estimates b that minimize:

sum over i = 1, ..., n of [ y_i - f(x_i, b) ]^2

Example: Puromycin

A biochemical experiment measured reaction velocity in cells with and without treatment by Puromycin. There are three variables in the Puromycin data frame:

 

Variable   Description
conc       the substrate concentration
vel        the reaction velocity
state      indicator of treated or untreated

Assume a Michaelis-Menten relationship between velocity and concentration:

V = Vmax * c / (K + c) + e

where V is the velocity, c is the substrate concentration, Vmax is a parameter representing the asymptotic velocity as c -> infinity, K is the Michaelis parameter, and e is the experimental error. Assuming that treatment with the drug changes Vmax but not K, the model becomes:

V = (Vmax + dVmax * I{treated}) * c / (K + c) + e

where I{treated} is the indicator function equal to one for cells treated with Puromycin and zero otherwise, and dVmax is the change in the asymptotic velocity due to treatment.

Formula

For nonlinear models a formula is an S-PLUS expression involving the data, the parameters in the model, and any other relevant quantities. The parameters must appear explicitly in the formula because, unlike in Linear Regression, there is no assumption about where they enter the model.

In the Puromycin example you would specify a formula for the simple model, given by the Michaelis-Menten equation above, as:

 

vel ~ Vm * conc / (K + conc)

 

The parameters Vm and K are specified along with the data vel and conc.
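
The model allowing treatment to shift Vmax can be written in the same notation. The following is a sketch; delV is a hypothetical parameter name for the treatment increment dVmax:

vel ~ (Vm + delV * (state == "treated")) * conc / (K + conc)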

For nonlinear least-squares formulas, the response on the left of ~ and the expression on the right must evaluate to numeric vectors of the same length. The fitting algorithm estimates the parameters so as to minimize the sum of squared differences between the response and the prediction. If the response is left out, the right-hand side is interpreted as a vector of residuals whose sum of squares is minimized.
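
For instance, the Michaelis-Menten fit above could equivalently be specified as a one-sided residual formula (a sketch):

~ vel - Vm * conc / (K + conc)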

Parameters

Before the formulas can be evaluated, the fitting functions must know which names in the formula are parameters to be estimated and must have starting values for these parameters. Specify the name of each parameter together with its starting value:

Vm = 200, K = 0.1
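
Putting the formula and starting values together, an equivalent command-line fit might look like the following sketch; it assumes the nls function and the built-in Puromycin data frame:

# Nonlinear least-squares fit of the Michaelis-Menten model
fit <- nls(vel ~ Vm * conc / (K + conc),
           data = Puromycin,
           start = list(Vm = 200, K = 0.1))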

Results

Printed Results

Short Output The short output for Nonlinear Regression returns information on the following components of the fitted model:

· The residual sum of squares.

· The estimated parameter values.

· The formula used to construct the fit.

· The number of observations.
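
At the command line, the short output corresponds to simply printing the fitted object (a sketch, assuming the fit saved above):

fit   # residual sum of squares, parameter estimates, formula, number of observations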

Long Output The long output for Nonlinear Regression provides a detailed summary of the fit, including information on:

· Coefficients, standard errors of the coefficients, and t-statistics for testing whether each coefficient is significantly different from zero.

· The residual standard error, i.e. the estimated standard deviation of the errors.

· Degrees of freedom for the model and the residuals.

· Correlations between coefficients.

· The formula used to construct the fit.
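
The long output corresponds to summarizing the fitted object (a sketch, assuming the usual summary method applies to the fit saved above):

summary(fit)   # coefficients, standard errors, t-values,
               # residual standard error, degrees of freedom,
               # and correlations between coefficients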

Saved Results

Fitted values and residuals may be saved as columns in a data set.
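
For example, the fitted values and residuals could be extracted with the fitted and residuals functions and stored as new columns (a sketch; the column names are arbitrary):

Puromycin$fitted.vel <- fitted(fit)      # fitted reaction velocities
Puromycin$resid.vel  <- residuals(fit)   # residuals from the fit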

Predict

Predictions may be obtained for either the data used to fit the model or a new data set. If new data are specified, the data set must contain columns with the same names as the independent (predictor) variables used in the model. If factors are used, the factors in the new data must have levels identical to those of the corresponding factors in the original data.
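
A prediction for new substrate concentrations might be obtained as in the following sketch; new.conc is a hypothetical data frame whose column name must match the predictor conc used in the formula:

new.conc <- data.frame(conc = c(0.05, 0.2, 0.6, 1.1))
predict(fit, newdata = new.conc)   # predicted velocities at the new concentrations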