Log-Linear Analysis

Overview

Also called multiway frequency analysis (MFA), log-linear analysis is a special case of the generalized linear model, an extension of the general linear model (GLM, which includes regression and ANOVA models), created to better treat dichotomous and categorical variables. It is a method of analyzing the distribution of cases in a table when all the variables of interest are categorical. Usually there is no "dependent variable" as in regression, though the special case of logit log-linear analysis, discussed below, can handle dependent variables. Ordinarily, however, what is predicted is not a variable but the distribution of values in the table formed by the categorical variables. The table is not limited to the usual two-way table but may be of any order (any number of categorical variables).

Thus log-linear analysis deals with association of categorical or grouped variables, looking at all levels of possible main and interaction effects, comparing this saturated model with reduced models. The primary purpose is to find the most parsimonious model which can account for cell frequencies in the table being analyzed. While log-linear analysis is a non-dependent procedure for accounting for the distribution of cases in a crosstabulation of categorical variables, it is closely related to such dependent procedures as logit and logistic, probit, and tobit regression.

Log-linear analysis is different from logistic regression in three ways:

1. The expected distribution of the categorical variables is Poisson, not binomial or multinomial.
2. The link function is the natural log of the dependent variable, not the logit of the dependent as in logistic regression. (A logit is the natural log of the odds, which is the probability the dependent equals a given value [usually 1, indicating an event has occurred or a trait is present] divided by the probability it does not).
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y. That is, the cell count is the dependent variable in log-linear analysis.
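As an illustration of the second point, the two link functions can be written side by side. The notation is assumed for this sketch and does not come from the text above: $\mu_{ij}$ is the expected count in cell $(i, j)$ of a two-way table of factors $A$ and $B$, the $\lambda$ terms are main and interaction effects, and $p$ is the probability that the dependent equals 1.

```latex
% Log-linear model (saturated, two-way table): log link on the expected cell count
\ln \mu_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}

% Logistic regression: logit link on the probability that y = 1
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```

Dropping the $\lambda_{ij}^{AB}$ term from the first equation gives the independence model for the same table.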

Log-linear methods also differ from multiple regression in estimation: where regression applies least squares to the raw dependent variable, log-linear methods apply maximum likelihood estimation to a link function of the dependent variable. The link function transforms the dependent variable, and it is this transform, not the raw variable, that is linearly related to the predictor side of the model.
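To make "linear on the link scale" concrete, the following minimal sketch splits the natural log of each cell count into additive terms. The 2x2 counts are hypothetical, the function names are invented for this illustration, and the sum-to-zero (effect) coding is one common convention, not necessarily the one a given package uses.

```python
import math

def independence_expected(table):
    """Expected counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def decompose_log_counts(counts):
    """Split ln(count) into grand mean, row, column, and interaction terms
    using sum-to-zero coding. Assumes every cell count is positive."""
    logs = [[math.log(c) for c in row] for row in counts]
    n_rows, n_cols = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in logs]
    col_eff = [sum(col) / n_rows - grand for col in zip(*logs)]
    interaction = [
        [logs[i][j] - grand - row_eff[i] - col_eff[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, row_eff, col_eff, interaction

# Hypothetical 2x2 table of observed counts (illustration only).
observed = [[30, 10], [20, 40]]

# Independence model: the log expected counts need no interaction term.
_, _, _, inter_indep = decompose_log_counts(independence_expected(observed))

# Saturated model: the log observed counts need a nonzero interaction term.
_, _, _, inter_sat = decompose_log_counts(observed)

print("interaction under independence:", inter_indep)
print("interaction in saturated model:", inter_sat)
```

In the 2x2 case the saturated interaction term equals one quarter of the log odds ratio, which is one way to see why log-linear parameters connect naturally to odds ratios.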

There are several possible purposes for undertaking log-linear modeling. The primary one is to determine the most parsimonious model that is not significantly different from the saturated model, a model that fully but trivially accounts for the cell frequencies of a table. Log-linear analysis is also used to determine whether variables are related, to predict the expected frequencies (table cell values) of a dependent variable, to understand the relative importance of different independent variables in predicting a dependent, and to confirm models using a goodness of fit test (the likelihood ratio). Residual analysis can also show where the model is working best and worst. Often researchers will use hierarchical log-linear analysis (in SPSS, the Model Selection option under Log-linear) for exploratory modeling, then use general log-linear analysis for confirmatory modeling.
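The likelihood ratio comparison of a reduced model with the saturated model can be sketched by hand for the simplest case, a two-way table tested against the independence model. The counts and function names below are hypothetical, and the p-value shortcut is valid only for one degree of freedom; this is an illustration, not the routine any particular package runs.

```python
import math

def independence_expected(table):
    """Expected counts under the independence model (no interaction term)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio chi-square: G2 = 2 * sum(O * ln(O / E)).
    Cells with an observed count of zero contribute nothing to the sum."""
    g2 = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                g2 += 2.0 * o * math.log(o / e)
    return g2

# Hypothetical 2x2 table of observed counts (illustration only).
observed = [[30, 10], [20, 40]]

expected = independence_expected(observed)
g2_independence = likelihood_ratio_g2(observed, expected)

# The saturated model reproduces the observed counts exactly, so its G2 is 0.
g2_saturated = likelihood_ratio_g2(observed, observed)

# Degrees of freedom for independence in an r x c table: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(g2_independence / 2.0))

print(f"G2 = {g2_independence:.2f}, df = {df}, p = {p_value:.5f}")
```

A small p-value here means the independence model differs significantly from the saturated model, so the interaction term cannot be dropped; a large one would favor the more parsimonious model.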

SPSS supports these related procedures, among others:

1. Generalized linear modeling. Generalized linear modeling (GZLM), discussed in a separate Statistical Associates "Blue Book" volume, represents a more recent approach for analyzing categorical dependents and independents, thus constituting a different method for implementing log-linear analysis, as well as models for logit, probit, Poisson regression on cell count data, and others.
2. Hierarchical log-linear analysis (HILOG). Select Analyze, Log-linear, Model Selection. HILOG is often used for automatic selection of the best hierarchical model.
3. General log-linear analysis (GENLOG). Select Analyze, Log-linear, General. GENLOG is often used to refine the best hierarchical model to be more parsimonious by dropping terms.
4. Logit log-linear analysis and logit regression. Used when there are one or more dependent variables.

In summary, traditional approaches to categorical data relied on chi-square and other measures of significance to establish if a relationship existed in a table, then employed any of a wide variety of measures of association to come up with a number, usually between 0 and 1, indicating how strong the relationship was. Log-linear methods are similar in function but have the advantage of making it far easier to analyze multi-way tables (more than two categorical variables) and to understand just which values of which variables and which interaction effects are contributing the most to the relationship. For simple two-variable tables, traditional approaches may still be preferred but for multivariate analysis of three or more categorical variables, log-linear analysis is preferred.
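For comparison, the traditional two-step approach for a two-way table can be sketched as a Pearson chi-square followed by a measure of association. Cramér's V is used here as one example of such a measure; the counts and function names are hypothetical.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square for a two-way table: sum((O - E)^2 / E),
    with E taken from the row and column totals (independence)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    return chi2

def cramers_v(table):
    """Cramer's V: chi-square rescaled to run from 0 to 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(pearson_chi_square(table) / (n * k))

# Hypothetical 2x2 table of observed counts (illustration only).
observed = [[30, 10], [20, 40]]

chi2 = pearson_chi_square(observed)
v = cramers_v(observed)
print(f"chi-square = {chi2:.2f}, Cramer's V = {v:.3f}")
```

This yields one significance test and one strength-of-association number for the whole table, which is the limitation the paragraph above describes: it does not show which cells or interaction effects drive the relationship.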

The full content is available from Statistical Associates Publishers.

```
Log-linear Analysis
Overview	8
Key Concepts and Terms	10
Types of log-linear analysis	10
General log-linear analysis	10
Hierarchical log-linear analysis	11
Types of variables	11
Factors	12
Covariates	12
Cell structure variables/cell weight variables	12
Contrast variables	12
Types of models	12
Saturated models and effects	12
Parsimonious models	14
The complete independence model	15
The one factor independence model	15
The conditional independence model	16
The homogenous association model	18
The symmetry model	19
The conditional symmetry model	19
General log-linear modeling: SPSS user interface	20
The "Model" button	21
The "Options" button	23
The "Save" button	24
General log-linear analysis compared to crosstabulation (SPSS)	24
Log-linear effects as categorical control variables in crosstabulation	24
General log-linear analysis of the crosstab example	26
Goodness of fit in log-linear analysis	28
Types of goodness of fit measures	28
Likelihood ratio	28
Pearson chi-square	29
Factor list warning	29
A simple goodness of fit example	29
General log-linear analysis using SPSS	30
Overview	30
Example	31
The saturated model	32
The independence model	34
Model dropping the highest level of interaction	36
The conditional independence model	37
General log-linear analysis using SAS	39
Example	39
SAS syntax	39
SAS output for the saturated model	41
SAS output for the independence model	41
SAS output for the homogenous association model	42
SAS output for the conditional independence model	43
Residual analysis	45
Overview	45
Residuals depend on the model	45
Residuals of the most parsimonious model	46
Normal probability (Q-Q) plots	48
Deviance residual plots	50
Normal probability (Q-Q) plots for deviance	51
Parameter estimates and odds ratios	51
Overview	51
Parameter estimates	52
Standardized parameter estimates (Z scores)	54
Model equations in log-linear analysis	54
Predicted frequencies	55
Odds ratios	57
Example	57
Hierarchical log-linear analysis	61
Overview	61
The SPSS user interface for hierarchical linear modeling	61
The initial "Model Selection Loglinear Analysis" dialog	61
The "Model" button dialog	62
The "Options" button dialog	63
Statistical output for hierarchical log-linear analysis in SPSS	64
The "Cell Counts and Residuals" table	64
The "Step Summary" table	65
The "Goodness of Fit Tests" table	67
The "Parameter Estimates" table	68
"Tests of K-Way and Higher-Order Effects" table	70
The "Partial Associations" table	71
Ordinal log-linear models	73
Overview	73
Linear-by-linear association models	73
Linear-by-linear modeling in SPSS	73
Example	73
Data setup	74
Statistical output for the linear-by-linear ordinal model	75
Row-effects models	76
Overview	76
Data setup	76
Statistical output for the row-effects ordinal model	77
Column-effects models	78
Logit log-linear models and logit regression	79
Overview	79
Example	79
The SPSS user interface for logit log-linear analysis	79
The main logit log-linear user interface	79
The "Model" button dialog	81
The "Options" button dialog	83
The "Save" button dialog	84
Logit log-linear statistical output in SPSS	84
Model	84
The "Goodness-of-fit Tests" table	84
The "Analysis of Dispersion" and "Measure of Association" tables	85
The "Parameter Estimates" table	86
The "Cell Counts and Residuals" table	88
Conditional logit regression models	89
Matched pairs or panel data	89
Conditional logit regression in SPSS	90
Choice models	90
Statistical output for conditional logit regression in SPSS	91
Assumptions of log-linear models	91
Not assumed	91
Well-populated tables	91
Small models with few variables	92
No zero cells	92
No important outliers	93
Normally distributed residuals	93
No binned interval-level data	93
Evenly distributed categories	93
Independence	93
Data distribution assumptions	94
Appropriate dispersion	95
Absence of endogenous regressors	95
Why not just use regression with dichotomous dependents?	95
Why not just use crosstabulation and ordinal measures of association rather than ordinal log-linear analysis?	96
What computer packages implement log-linear analysis?	96
What are second-order and partial odds ratios?	96
What are structural zeros and sampling zeros in the SPSS "Data Information" table?	97
Since logit and probit generally lead to the same statistical conclusions, when is one better than the other?	97
Do I really need to do multinomial logit (multinomial logistic regression) or multinomial probit? Could I just apply M different logit or probit models for a variable with M levels?	98
What if my variables are multiple-response type?	98
Explain "partial odds".	98
Explain coding in saturated vs. nonsaturated models.	98
What is log-linear analysis with latent variables?	99
Bibliography	99
Pagecount: 103

```