Hat Diagonal Regression Diagnostic

DESCRIPTION:

Returns the diagonal of the hat matrix for a least squares regression.

USAGE:

hat(x, intercept=T) 

REQUIRED ARGUMENTS:

x
matrix of explanatory variables in the regression model y=xb+e, or the QR decomposition of such a matrix. Missing values are not accepted.

OPTIONAL ARGUMENTS:

intercept
logical flag: if TRUE, an intercept term is included in the regression model. This is ignored if x is a QR object.

VALUE:

vector with one value for each row of x. These values are the diagonal elements of the least-squares projection ("hat") matrix H. (Fitted values for a regression of y on x are H %*% y.) Large values of these diagonal elements correspond to points with high leverage.
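In matrix terms the projection matrix is H = x (x'x)^{-1} x', so the hat diagonals can be checked from first principles for a small model matrix. A sketch in S (the column of ones plays the role of the intercept that intercept=T supplies automatically; the data values are made up for illustration):

```
# sketch: hat diagonals computed directly from the projection matrix
x <- c(1, 2, 3, 4, 10)                  # hypothetical predictor values
X <- cbind(1, x)                        # model matrix with an intercept column
H <- X %*% solve(t(X) %*% X) %*% t(X)   # least-squares projection ("hat") matrix
diag(H)                                 # should agree with hat(matrix(x), intercept=T)
sum(diag(H))                            # trace of H equals p, here 2
```

Note that the trace of H equals p, the number of columns of the model matrix, which is why the hat diagonals average p/n.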

BACKGROUND:

The diagonals of the hat matrix indicate the amount of leverage (influence) that observations have in a least squares regression. Note that this is independent of the value of y. Observations that have large hat diagonals have more say about the location of the regression line; an observation with a hat diagonal close to 1 will have a residual close to 0 no matter what value the response for that observation takes.

The hat diagonals lie between 1/n and 1 and their average value is p/n where p is the number of variables, i.e., the number of columns of x (plus 1 if intercept=T), and n is the number of observations (the number of rows of x). Belsley, Kuh and Welsch (1980) suggest that points with a hat diagonal greater than 2p/n be considered high leverage points, though they state that too many points will be labeled leverage points by this rule when p is small. Another rule of thumb is to consider any point with a hat diagonal greater than .2 (or .5) as having high leverage. If p is large relative to n, then all points can be "high leverage" points.
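The 2p/n rule above is easy to apply directly to the output of hat. A sketch using the freeny data from the EXAMPLES section below:

```
# sketch: flag high-leverage points by the 2p/n rule of Belsley, Kuh and Welsch
h <- hat(freeny.x)              # hat diagonals (intercept=T by default)
p <- ncol(freeny.x) + 1         # +1 for the intercept term
n <- nrow(freeny.x)
mean(h)                         # average hat diagonal; equals p/n
which(h > 2 * p / n)            # indices of candidate high-leverage observations
```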

By the way, it is called the "hat" matrix because in statistical jargon multiplying the matrix by a vector y puts a "hat" on y, that is, the estimated fit is the result.

REFERENCES:

Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. Wiley, New York.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York.


EXAMPLES:

h <- hat(freeny.x) 
plot(h, xlab="index number", ylab="hat diagonal") 
abline(h=2*ncol(freeny.x)/nrow(freeny.x))