
LOGLINEAR ANALYSIS
Overview
Also called multiway frequency analysis (MFA), loglinear analysis is a special case of the generalized linear model (GLM, which includes regression and ANOVA models) created to better treat the case of dichotomous and categorical variables. It is a method of analyzing the distribution of cases in a table when all the variables of interest are categorical. Usually there is no "dependent variable" as in regression, though the special case of logit loglinear analysis, discussed below, can handle dependent variables. Ordinarily, however, what is predicted is not a variable but the distribution of values in the table formed by the categorical variables. The table is not limited to the usual two-way table but may be of any order (any number of categorical variables).
Thus loglinear analysis deals with association of categorical or grouped variables, looking at all levels of possible main and interaction effects, comparing this saturated model with reduced models. The primary purpose is to find the most parsimonious model which can account for cell frequencies in the table being analyzed. While loglinear analysis is a nondependent procedure for accounting for the distribution of cases in a crosstabulation of categorical variables, it is closely related to such dependent procedures as logit and logistic, probit, and tobit regression.
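As a sketch of what "saturated" and "reduced" mean here, consider a two-way table of factors A and B. The notation (m_ij for the expected count in cell (i, j), lambda terms for the effects) is a common textbook convention and is an assumption, not something taken from the text above.

```latex
\begin{align*}
\text{Saturated model:} \quad & \ln m_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB} \\
\text{Independence model:} \quad & \ln m_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B}
\end{align*}
```

The saturated model has as many free parameters as the table has cells, so it reproduces the observed counts exactly. The reduced model drops the interaction term and is preferred only if its fit is not significantly worse.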
Loglinear analysis is different from logistic regression in three ways:
1. The expected distribution of the categorical variables is Poisson, not binomial or multinomial.
2. The link function is the natural log of the dependent variable, not the logit of the dependent as in logistic regression. (A logit is the natural log of the odds, which is the probability the dependent equals a given value [usually 1, indicating an event has occurred or a trait is present] divided by the probability it does not).
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y. That is, the cell count is the dependent variable in loglinear analysis.
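The contrast between predicting cell counts and predicting a logit can be illustrated with a minimal Python sketch. The 2x2 table and the function names below are hypothetical, chosen only for illustration. The expected counts are those of the independence model (row total times column total divided by N), which is one simple reduced loglinear model.

```python
import math

# Hypothetical 2x2 table of observed counts (rows: factor A, columns: factor B).
observed = [[30, 20],
            [10, 40]]

def independence_expected(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def logit(p):
    """Natural log of the odds p / (1 - p), the quantity logistic regression models."""
    return math.log(p / (1 - p))

# In loglinear analysis the predicted quantity is the cell count itself.
expected = independence_expected(observed)

# On the log scale the independence model is additive:
# ln(m_ij) = ln(row_i) + ln(col_j) - ln(N)
log_expected = [[math.log(m) for m in row] for row in expected]

# Under independence the odds ratio of the expected table is 1, so its log is 0.
log_odds_ratio = (log_expected[0][0] + log_expected[1][1]
                  - log_expected[0][1] - log_expected[1][0])

print(expected)
print(log_odds_ratio)
```

The log of each expected count decomposes into a row term, a column term, and a constant, which is what makes the model "loglinear".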
Loglinear methods also differ from multiple regression by substituting maximum likelihood estimation of a link function of the dependent for regression's use of least squares estimation of the raw dependent variable itself. The link function transforms the dependent variable and it is this transform, not the raw variable, which is linearly related to the predictor side of the model.
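The role of the link function can be written out side by side. The symbols below (x for the predictor vector, beta for the coefficients, pi for the event probability) are assumed notation, not taken from the text.

```latex
\begin{align*}
\text{Least squares regression:} \quad & E[y] = \mathbf{x}^{\top}\boldsymbol{\beta} \\
\text{Loglinear (log link):} \quad & \ln E[y] = \mathbf{x}^{\top}\boldsymbol{\beta} \\
\text{Logistic (logit link):} \quad & \ln\frac{\pi}{1-\pi} = \mathbf{x}^{\top}\boldsymbol{\beta}, \quad \pi = P(y = 1)
\end{align*}
```

In each case the right-hand side is linear in the predictors. Only the transform applied to the expected value of the dependent changes.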
There are several possible purposes for undertaking loglinear modeling, the primary being to determine the most parsimonious model which is not significantly different from the saturated model, which is a model that fully but trivially accounts for the cell frequencies of a table. Loglinear analysis also is used to determine if variables are related, to predict the expected frequencies (table cell values) of a dependent variable, to understand the relative importance of different independent variables in predicting a dependent, and to confirm models using a goodness of fit test (the likelihood ratio). Residual analysis can also determine where the model is working best and worst. Often researchers will use hierarchical loglinear analysis (in SPSS, the Model Selection option under Loglinear) for exploratory modeling, then use general loglinear analysis for confirmatory modeling.
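The likelihood ratio goodness of fit test mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS procedure: the 2x2 counts are hypothetical, the reduced model is the independence model, and the p-value shortcut using `math.erfc` is valid only for 1 degree of freedom.

```python
import math

# Hypothetical 2x2 table of observed counts.
observed = [[30, 20],
            [10, 40]]

def expected_independence(table):
    """Fitted counts for the independence model: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_g2(obs, exp):
    """Likelihood ratio chi-square: G2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(obs, exp)
                   for o, e in zip(o_row, e_row) if o > 0)

def pearson_x2(obs, exp):
    """Pearson chi-square: sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(obs, exp)
               for o, e in zip(o_row, e_row))

expected = expected_independence(observed)
g2 = likelihood_ratio_g2(observed, expected)
x2 = pearson_x2(observed, expected)

# Degrees of freedom for independence in an r x c table: (r - 1) * (c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail chi-square probability; this closed form holds only when df == 1.
p_value = math.erfc(math.sqrt(g2 / 2))

print(g2, x2, df, p_value)
```

A small p-value indicates that the reduced model fits significantly worse than the saturated model, so the dropped interaction term is needed. The saturated model itself reproduces the observed counts, so its G2 is zero.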
SPSS supports these related procedures, among others:
The full content is now available from Statistical Associates Publishers.
Below is the table of contents.
Loglinear Analysis
Table of Contents
Overview
Key Concepts and Terms
Types of loglinear analysis
General loglinear analysis
Hierarchical loglinear analysis
Types of variables
Factors
Covariates
Cell structure variables/cell weight variables
Contrast variables
Types of models
Saturated models and effects
Parsimonious models
The complete independence model
The one factor independence model
The conditional independence model
The homogenous association model
The symmetry model
The conditional symmetry model
General loglinear modeling: SPSS user interface
The "Model" button
The "Options" button
The "Save" button
General loglinear analysis compared to crosstabulation (SPSS)
Loglinear effects as categorical control variables in crosstabulation
General loglinear analysis of the crosstab example
Goodness of fit in loglinear analysis
Types of goodness of fit measures
Likelihood ratio
Pearson chi-square
Factor list warning
A simple goodness of fit example
General loglinear analysis using SPSS
Overview
Example
The saturated model
The independence model
Model dropping the highest level of interaction
The conditional independence model
General loglinear analysis using SAS
Example
SAS syntax
SAS output for the saturated model
SAS output for the independence model
SAS output for the homogenous association model
SAS output for the conditional independence model
Residual analysis
Overview
Residuals depend on the model
Residuals of the most parsimonious model
Adjusted residuals plots
Normal probability (Q-Q) plots
Deviance residual plots
Normal probability (Q-Q) plots for deviance
Parameter estimates and odds ratios
Overview
Parameter estimates
Standardized parameter estimates (Z scores)
Model equations in loglinear analysis
Predicted frequencies
Odds ratios
Example
Hierarchical loglinear analysis
Overview
The SPSS user interface for hierarchical linear modeling
The initial "Model Selection Loglinear Analysis" dialog
The "Model" button dialog
The "Options" button dialog
Statistical output for hierarchical loglinear analysis in SPSS
The "Cell Counts and Residuals" table
The "Step Summary" table
The "Goodness of Fit Tests" table
The "Parameter Estimates" table
"Tests of K-Way and Higher-Order Effects" table
The "Partial Associations" table
Ordinal loglinear models
Overview
Linear-by-linear association models
Linear-by-linear modeling in SPSS
Example
Data setup
Statistical output for the linear-by-linear ordinal model
Row-effects models
Overview
Data setup
Statistical output for the row-effects ordinal model
Column-effects models
Logit loglinear models and logit regression
Overview
Example
The SPSS user interface for logit loglinear analysis
The main logit loglinear user interface
The "Model" button dialog
The "Options" button dialog
The "Save" button dialog
Logit loglinear statistical output in SPSS
Model
The "Goodness-of-fit Tests" table
The "Analysis of Dispersion" and "Measure of Association" tables
The "Parameter Estimates" table
The "Cell Counts and Residuals" table
Conditional logit regression models
Matched pairs or panel data
Conditional logit regression in SPSS
Choice models
Statistical output for conditional logit regression in SPSS
Assumptions of loglinear models
Not assumed
Well-populated tables
Small models with few variables
Adequate sample size
No zero cells
No important outliers
Normally distributed residuals
No binned interval-level data
Evenly distributed categories
Independence
Data distribution assumptions
Appropriate dispersion
Absence of endogenous regressors
Frequently Asked Questions
Why not just use regression with dichotomous dependents?
Why not just use crosstabulation and ordinal measures of association rather than ordinal loglinear analysis?
What computer packages implement loglinear analysis?
What are second-order and partial odds ratios?
What are structural zeros and sampling zeros in the SPSS "Data Information" table?
Since logit and probit generally lead to the same statistical conclusions, when is one better than the other?
Do I really need to do multinomial logit (multinomial logistic regression) or multinomial probit? Could I just apply M different logit or probit models for a variable with M levels?
What if my variables are multiple-response type?
Explain "partial odds".
Explain coding in saturated vs. nonsaturated models.
What is loglinear analysis with latent variables?
Bibliography
Page count: 103