loglin(table, margin, start=<<see below>>, fit=F, eps=0.1, iter=20, param=F, print=T)
table: a contingency table of counts to be fit, typically the output of the table function. Neither negative nor missing values are allowed. The number of dimensions of table must be less than or equal to 15.
margin: a list of vectors specifying the margins to be fit. For example, list(1:2, 3:4) would indicate fitting the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin in a four-way table. The names of the factors (i.e., names(dimnames(table))) may be used rather than numeric indices.
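For instance, with a hypothetical three-way table survey.tab whose dimnames are named "gender", "party" and "region" (names chosen here only for illustration), the same model can be specified either by index or by name:

# same conditional-independence model, specified by index and by factor name
loglin(survey.tab, list(c(1, 2), c(1, 3)))
loglin(survey.tab, list(c("gender", "party"), c("gender", "region")))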
start: starting values for the fitted table. If start is omitted, a start is used that will assure convergence. If structural zeros appear in table, start should contain zeros in the corresponding entries and ones everywhere else; this assures that the fit will contain those zeros.
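As a small sketch (the 3 by 3 table below is made up for illustration), a structural zero in cell [1,1] can be preserved like this:

tab <- matrix(c(0, 12, 7, 15, 22, 9, 8, 14, 11), nrow = 3)  # structural zero in cell [1,1]
start <- array(1, dim(tab))
start[1, 1] <- 0                       # zero in the entry matching the structural zero
quasi <- loglin(tab, list(1, 2), start = start, fit = T)
quasi$fit                              # the fitted table keeps a zero in cell [1,1]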
fit, param: logical flags requesting the fitted values and the parameter estimates, respectively. The default, FALSE, saves computation as well as space.
print: if TRUE (the default), the final deviation and the number of iterations will be printed.
The returned value includes the likelihood ratio test statistic, 2 * sum(observed * log(observed/expected)), and the Pearson chi-square statistic, sum((observed - expected)^2/expected).
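These quantities can be reproduced by hand from the fitted values; a minimal sketch, assuming a table tab with no zero counts (otherwise the logarithm term is undefined):

res <- loglin(tab, list(1, 2), fit = T)      # independence model
observed <- tab
expected <- res$fit
2 * sum(observed * log(observed/expected))   # likelihood ratio statistic
sum((observed - expected)^2/expected)        # Pearson chi-square statistic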
The margins that were fit are also returned; they are the same as the argument margin, except that the names of the factors are used if present.
The fit component is an array like table, but containing fitted values. This is returned only when the argument fit is TRUE.
The param component holds the parameter estimates: the constant component describes the overall mean, each single-factor parameter sums to zero, each two-factor parameter sums to zero both by rows and columns, and so on. This is returned only when the argument param is TRUE.
The fit is produced by the Iterative Proportional Fitting algorithm as presented in Haberman (1972). Convergence is considered to be achieved if the maximum deviation between an observed and a fitted margin is less than eps. At most iter iterations will be performed. The fitting is currently done in single precision; other computations are in double precision.
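For intuition only, the following is a bare-bones two-way version of iterative proportional fitting; it is not the code loglin actually uses, and for the two-way independence model a single pass already reproduces both margins:

ipf2 <- function(tab, eps = 0.1, iter = 20) {
  fit <- array(1, dim(tab))
  for (i in 1:iter) {
    # scale rows to match the observed row margin
    fit <- sweep(fit, 1, apply(tab, 1, sum) / apply(fit, 1, sum), "*")
    # scale columns to match the observed column margin
    fit <- sweep(fit, 2, apply(tab, 2, sum) / apply(fit, 2, sum), "*")
    dev <- max(abs(apply(tab, 1, sum) - apply(fit, 1, sum)),
               abs(apply(tab, 2, sum) - apply(fit, 2, sum)))
    if (dev < eps) break               # same role as the eps argument
  }
  fit
}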
The margins to be fit describe the model, similar to describing an ANOVA model. A high-order term automatically includes all the lower-order terms within it; e.g., the term c(1,3) includes the one-factor terms 1 and 3. A factor that had constraints in the sampling plan should always be included. For example, if the sampling plan was such that there would be (precisely) x females and y males sampled, then gender should be in all models.
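Because lower-order terms are implied, adding them explicitly does not change the model; a sketch with a hypothetical three-way table tab:

# these two calls describe the same model: c(1, 3) already implies
# the one-factor terms 1 and 3
loglin(tab, list(c(1, 3), 2))
loglin(tab, list(1, 3, c(1, 3), 2))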
Both the LRT and the Pearson test statistics are asymptotically distributed chi-square with df degrees of freedom (assuming there are no zeros). A general rule of thumb is that the asymptotic distribution is trustworthy when the number of observations is 10 times the number of cells. If the two test statistics differ considerably, not much faith can be put in the test.
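An approximate p-value then comes from the chi-square distribution; the component names used below (lrt, pearson, df) are assumptions based on the description above:

bar.ci1 <- loglin(barley.exposed, list(1:2, c(1, 3)))
1 - pchisq(bar.ci1$lrt, bar.ci1$df)        # p-value for the likelihood ratio statistic
1 - pchisq(bar.ci1$pearson, bar.ci1$df)    # p-value for the Pearson statistic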
Using the test statistics to select a model is a rather backward use of hypothesis testing: a model can be "proved" wrong, but passing the test doesn't mean that the model is right. Bayesian techniques have been developed to select a good model (or models).
The start argument can be used to produce analyses in which the cells are assigned different weights; see Clogg and Eliason (1988). The starting values should be one over the weights.
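A sketch of such a weighted analysis, where wts is a hypothetical array of cell weights with the same shape as the table:

start <- 1 / wts                          # one over the weights
loglin(barley.exposed, list(1:2, c(1, 3)), start = start)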
A suggested analysis strategy is to use the default settings to narrow down the number of models, and then to set the fit and param options to TRUE in order to investigate the more promising models further.
Log-linear analysis studies the relationship between a number of categorical variables, extending the idea of simply testing for independence of the factors. Typically the number of observations falling into each combination of the levels of the variables (factors) is modeled. The model, as the name suggests, is that the logarithm of the counts follows a linear model depending on the levels of the factors.
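As an illustration of this last point (a sketch only; the table tab is hypothetical and as.data.frame.table is an R helper that may not exist in older S-PLUS), the independence model for a two-way table is the Poisson regression of the counts on the factor main effects, and its residual deviance equals the LRT statistic from loglin(tab, list(1, 2)):

d <- as.data.frame.table(tab)               # one row per cell, counts in d$Freq
glm(Freq ~ ., family = poisson, data = d)   # log expected count is linear in the factors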
Clogg, C. C. and Eliason, S. R. (1988). Some Common Problems in Log-Linear Analysis. In Common Problems/Proper Solutions (J. Scott Long, ed.). Newbury Park, Calif.: SAGE.
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data, 2nd edition. Cambridge, Mass.: MIT Press.
Haberman, S. J. (1972). Log-linear fit for contingency tables (Algorithm AS 51). Applied Statistics 21, 218-225.
Lunneborg, C. E. and Abbott, R. D. (1983). Elementary Multivariate Analysis for the Behavioral Sciences. New York: North-Holland.
loglin(barley.exposed, list("cultivar", "time", "cluster"))   # model of independence
loglin(barley.exposed, list(1:2, c(1, 3)))   # factors 2 and 3 are independent conditional on factor 1
bar.ci1 <- loglin(barley.exposed, list(1:2, c(1, 3)), param=T, fit=T)   # return parameter values and the fit
(barley.exposed - bar.ci1$fit)/sqrt(bar.ci1$fit)   # scaled residuals