A comparison of loglinear modeling and logistic regression in management research


Journal of Management, 1996, Vol. 22, No. 2, 339-358

Richard Tansey, University of Alaska at Fairbanks

Michael White, Louisiana Tech University

Rebecca G. Long, Louisiana Tech University

Mark Smith, University of Southwestern Louisiana

There is little formal guidance in the applied statistical literature concerning the relationship between loglinear modeling and logistic regression. In order to more clearly delineate this relationship, this manuscript compares and contrasts loglinear modeling and logistic regression analysis and demonstrates the advantages and disadvantages of each technique. In addition, a formal comparison of the statistical assumptions and numerical calculation problems for both of these techniques is provided.

Since the mid-1970s, categorical data techniques such as probit, logit, loglinear modeling and logistic regression have increasingly been used in economics, medical research, sociology, marketing and international business, but have remained underutilized in applied behavioral research. In their analysis of prominent management journals between 1982 and 1991, Drazin and Kazanjian (1993) found 34 articles in the Academy of Management Journal, Administrative Science Quarterly, and Strategic Management Journal that reported a contingency table in their results section. Of these 34 studies, five used descriptive statistics (percentages), 17 used descriptive statistics and chi-square tests, and only 12 combined descriptive statistics, chi-square tests, and multivariate techniques such as loglinear modeling and logistic regression. Drazin and Kazanjian (1993) contend that because of researchers' limited knowledge of advanced multivariate techniques, such as loglinear modeling, they generally avoid making any specific hypotheses when analyzing contingency tables.

Direct all correspondence to: Michael White, Louisiana Tech University, Department of Management and Marketing, P.O. Box 10318, Ruston, LA 71272.

Copyright © 1996 by JAI Press Inc.


A review of the Journal of Management for the years 1991 to 1993 revealed three major types of hypothesized statistical models used to present the expected relationships among a set of research variables. Contingency tables were used especially in the strategy and human resource management areas to analyze expected patterns produced by various combinations of certain variable levels. Only one research study in the Journal of Management between 1991 and 1993 used a contingency table format to present a literature review of previous work on a given topic along several dimensions. Given the categorical nature of much of the data in management research, it is unfortunate that these multivariate statistical techniques are underutilized. The lack of diffusion has created at least three general problems, each stemming from either a lack of awareness of the techniques or misunderstandings as to their interrelationships and appropriate uses. The first problem is a growing concern that researchers are increasingly using regression statistics and ordinary least squares (OLS) with binary dependent variables even though they are violating important assumptions about the use of non-interval measured variables (Agresti, 1990; Barry, 1993; Hardy, 1993; Huselid & Day, 1991; Santner & Duffy, 1989). It is further argued that these problems could be easily handled with general linear models or logistic regression. Specifically, OLS is a linear technique utilizing a least squares estimation procedure that provides unbiased parameter estimates when the dependent variable is continuous and the estimated errors are normally distributed. Neither of these assumptions, however, is satisfied when a binary dependent measure is employed.
For example, a recent critique of a major research stream involving organizational commitment, job involvement, and turnover argues that the results obtained in many of the studies were probably in error because the researchers used OLS regression and related models when the dependent variable was binary (Huselid & Day, 1991). The primary focus of Huselid and Day's (1991, p. 381) criticism rests on the contention that OLS violates three important assumptions when used to estimate a binary dependent variable:

1. Predicted values can fall outside the logical 0-1 boundaries, yielding meaningless results (Amemiya, 1981, p. 1486; Maddala, 1983, p. 16).
2. Heteroscedasticity and nonnormality of the errors invalidate the coefficient t-tests for the independent variables (Doran, 1989, p. 315; Maddala, 1988, p. 269).
3. Estimates of the marginal effects of an independent variable are biased because they depend on the mean value of the dependent variable (Doran, 1989, p. 316; Maddala, 1983, p. 24).
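The first violation is easy to demonstrate numerically. The following sketch, with invented data rather than data from any study cited here, fits a simple closed-form OLS line to a binary outcome and shows fitted values escaping the 0-1 interval:

```python
# Illustrative sketch (invented data): OLS fitted to a binary outcome
# can produce predicted "probabilities" below 0 and above 1.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # binary dependent variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
# Closed-form simple OLS slope and intercept
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

preds = [a + b * xi for xi in x]
print(min(preds), max(preds))  # smallest prediction < 0, largest > 1
```

The logistic transformation avoids this by construction, since the fitted probabilities are bounded between 0 and 1.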

A second problem is the belief that logistic regression is identical to regression analysis (Cox & Wermuth, 1992; Roncek, 1993; Demaris, 1993). Such a belief is incorrect and can lead to erroneous applications of the statistic. For example, regression methods for binary data are frequently based on the logistic model. Researchers need to be aware that R2 is not suitable for judging the effectiveness of linear regressions with binary responses even if an important relation is present (Freeman, 1987). A third problem is the implicit assumption that logistic regression is the preferred choice over loglinear modeling (Agresti, 1994). Unfortunately, we are unaware of any presentation in the statistical theory literature which explicitly compares and contrasts loglinear modeling and logistic regression as tools in applied research. Therefore, the primary purpose of this manuscript is to compare the relative statistical advantages and problems that researchers confront when evaluating the pros and cons of loglinear modeling and logistic regression as the principal statistical tools for applied research. This manuscript will present (1) applications of both loglinear modeling and logistic regression; (2) a discussion of the three major advantages logistic regression enjoys over loglinear modeling; (3) an analysis of the four major advantages loglinear modeling enjoys over logistic regression; and (4) a demonstration of the technique for human resource management data.

Applications of Loglinear Modeling and Logistic Regression

Applications of Loglinear Modeling

Loglinear modeling is essentially a discrete multivariate statistical technique designed specifically for analyzing data when both the independent and dependent variables are categorical or nominal. The statistical theory literature recommends two broad applications of this technique for empirical research (Agresti, 1990; Upton, 1978):

1. measuring the strength of interactions (associations) among a set of variables without conceptually distinguishing between a response (dependent) variable and a set of explanatory (independent) variables;
2. measuring the strength of the association between each designated independent variable and each designated response variable, controlling for the influence of all the remaining independent and dependent variables.

Traditionally, most loglinear applications have restricted their focus to the first application, measuring the partial interactions among a set of variables, as reflected in lambda weights (estimates of the magnitude) and their respective Wald Z-tests. The Wald test is obtained by comparing the maximum likelihood estimate (MLE) of the slope parameter to an estimate of its standard error. MLEs are used for estimating the population parameters most likely to have resulted in the observed sample data. Usually loglinear modeling has been performed in an exploratory research setting. Upton (1978, pp. 74-75) summarizes the overall goal of this use of loglinear modeling by observing: "Essentially the situation is one of locating [partial] correlations rather than of regression." Upton (1978) described the second broad application of loglinear modeling as measuring the strength of the association between each designated outcome variable and each designated explanatory variable, controlling for the influence of all the remaining variables. This second research strategy for categorical data is typical of a regression or ANOVA/MANOVA framework, since researchers must designate each variable as a dependent or independent variable. The choice of either research strategy will influence how expected cell frequencies are calculated with MLEs under hierarchical loglinear modeling
(Agresti, 1990). Under the first research strategy, emphasizing a partial correlation analysis, the expected cell frequencies of a contingency table are a function of the row and column effects (for a two-dimensional table) of each variable and the hypothesized interactions (associations) among these variables (White, Tansey, Smith & Barnett, 1993; White, Tansey & Smith, 1994). MLEs of each table cell are based on minimal sufficient statistics derived from the various marginal subtable totals specified for a hypothesized theoretical model that researchers believe will fit the observed cell counts (Agresti, 1990). The minimal sufficient statistics are the marginal distributions corresponding to the terms in the model symbol. For the model (XY, XZ, YZ), having all two-factor dependencies, for example, the minimal sufficient statistics are the two-way marginal tables. MLEs for each cell can be directly calculated from statistical formulas for only certain loglinear models; such models are called direct or closed-form models. For indirect models, which are common in contingency tables with four or more variables, MLEs can only be estimated by iterative computer algorithms. Standard errors for the lambda weights, a measure of the association between two variables, are only available for direct models (Lee, 1977); thus, most loglinear experts focus on a variable's incremental contribution to G2 (the likelihood ratio chi-square test statistic). In essence, the G2 test statistic reflects how well a variable improves the model's goodness-of-fit beyond the fit produced by a baseline model without this variable. Under the mixed response/explanatory research strategy, researchers only focus on a limited subset of possible hypothesized models by automatically accounting for all the possible interactions among the explanatory factors.
Specifically, researchers include the highest-order interaction term among the explanatory factors, and then test for associations between each dependent (response) variable and each independent (explanatory) variable as well as for associations between each pair of response variables. Under the second research strategy, expected cell frequencies are based on three inputs:

1. estimating the simple row and column effects for each variable (such marginal distributions are often a result of a researcher's sampling design);
2. specifying the highest-order interaction term among all the explanatory variables, thus controlling for the maximum joint influence of the explanatory variables before proceeding to the third input;
3. the interactions, or partial associations, between each response variable and each independent variable, and the interactions between multiple response variables.
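To make the G2 statistic concrete, the sketch below (with invented counts, not data from the article) computes the likelihood-ratio chi-square for the simplest direct model, two-variable independence, whose expected cell counts have the closed form (row total × column total) / n:

```python
# Illustrative sketch (invented counts): G2 for the two-way
# independence model on a 2x2 contingency table.
import math

observed = [[30, 10],
            [20, 40]]
n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]
col_tot = [sum(observed[i][j] for i in range(2)) for j in range(2)]

# Expected counts under independence: row total * column total / n
expected = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# G2 = 2 * sum obs * ln(obs / exp); compared to chi-square with 1 df here
g2 = 2 * sum(observed[i][j] * math.log(observed[i][j] / expected[i][j])
             for i in range(2) for j in range(2))
print(round(g2, 2))  # 17.26 -- far beyond the .05 critical value of 3.84
```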

Applications of Logistic Regression

Logistic regression is a special case or subset of loglinear modeling and is particularly appropriate for estimating a binary dependent variable using a maximum likelihood estimation procedure. However, for the analysis of a set of nominal and ordinal variables, researchers may select either logistic regression or loglinear modeling (Agresti, 1990; Hosmer & Lemeshow, 1989). In our discussion of logistic regression we will adhere to Hosmer and Lemeshow's (1989) use of the term "covariate" to refer to a right-hand side independent variable and "outcome variable" to refer to the left-hand side dependent variable. Since 1970, statisticians have recommended two broad applications of logistic regression for applied research:

1. predicting group membership for the dependent variable;
2. measuring the "instantaneous rate of change in the probability of occurrence of an event with change in a given predictor" (Demaris, 1990, p. 273).
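The second application can be illustrated with a small sketch. The intercept and slope below are hypothetical values, not estimates from any data set; the point is that the rate of change, b·p·(1 − p), peaks where p = .5 and shrinks in both tails:

```python
# Illustrative sketch (hypothetical coefficients): the marginal effect of
# a covariate in logistic regression is b * p * (1 - p), so it varies
# with the covariate pattern instead of being constant as in OLS.
import math

b0, b1 = -2.0, 0.8   # hypothetical intercept and slope

def prob(x):
    """Logistic response probability at covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

for x in (0.0, 2.5, 5.0):
    p = prob(x)
    print(x, round(b1 * p * (1 - p), 4))  # instantaneous rate of change
```

At x = 2.5 the linear predictor is zero, p = .5, and the effect reaches its maximum of b/4; at either tail the same one-unit change in x moves the probability far less.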

Similar to discriminant analysis, logistic regression has been used to predict group membership based on information contained in a hypothesized model of covariates (Hosmer & Lemeshow, 1989). Specifically, logistic regression users can obtain the probability of accurately classifying the presence of an event, the absence of an event, and the overall pooled rate across all the sample cases representing either the presence or absence of a certain type of event. However, Hosmer and Lemeshow (1989) offer several caveats for using logistic regression as a classification tool:

1. a classification table created from the accurate predictions of a logistic regression model is most appropriate when classification is the applied researcher's stated goal;
2. the use of such classification tables is only a secondary (supplemental) way of testing the goodness-of-fit for a given logistic regression model; testing a model's fit should be based primarily on more rigorous tests, for example, the Hosmer-Lemeshow test or the score test (Brown, 1982);
3. the notion that if the logistic regression model "predicts group membership accurately according to some criterion, then this is thought to provide evidence that the model fits" (Hosmer & Lemeshow, 1989, p. 146) is questionable, since there are situations in which the model is correct but does a poor job of correctly classifying cases;
4. logistic regression tends to classify cases according to the relative size of the two dependent-variable groups, with the result that more cases are classified into the larger group based on its size, not on the nature of the model; moreover, the expected error rate for the classification table created by a logistic regression model is a function of the "magnitude of the slope, not necessarily of the fit of the model" (Hosmer & Lemeshow, 1989, p. 147).
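As a hedged illustration of the classification-table idea (all fitted probabilities and outcomes below are invented), cases are assigned to the predicted group with a .5 cutoff and cross-classified against their actual group:

```python
# Illustrative sketch (invented data): building a classification table
# from fitted logistic probabilities using a 0.5 cutoff.
probs = [0.12, 0.35, 0.46, 0.58, 0.71, 0.83, 0.91, 0.22, 0.64, 0.55]
actual = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]

predicted = [1 if p >= 0.5 else 0 for p in probs]
# Cross-classify actual versus predicted outcomes
table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
for a, p in zip(actual, predicted):
    table[(a, p)] += 1

correct = table[(0, 0)] + table[(1, 1)]
rate = correct / len(actual)
print(rate)  # overall pooled classification rate: 0.7
```

Consistent with caveat 4, shifting the cutoff or enlarging one outcome group changes this rate even when the fitted model itself is unchanged.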

The second broad use of logistic regression is to assess the impact, i.e., the sign and magnitude, of a covariate on a change in the dependent variable, which can be estimated in several different ways (Morgan & Teachman, 1989). If all the variables in the model are categorical (nominal or ordinal), applied researchers can use weighted least squares or maximum likelihood procedures. However, if all the variables are interval-level, or a mixed set containing both interval and categorical/ordinal data, only maximum likelihood procedures should be used. For logistic regression, MLEs are consistent and asymptotically efficient (Hirji, Tsiatis & Mehta, 1989). However, a partial slope coefficient (i.e., an estimate of a covariate's impact on a dependent variable) can be nonfinite for sparse data sets which, when cross-classified as a table, contain multiple zero cells. Furthermore, the optimality properties of MLEs only hold for large samples; for small and medium-size samples, the validity of these MLE properties is in doubt. The interpretation of the partial slope coefficients in logistic regression differs significantly from that in classical OLS regression (Demaris, 1990; Morgan & Teachman, 1989). In OLS regression the partial regression coefficient indicates the change in the expected value of the dependent variable for a one-unit change in a given predictor, while the rest of the predictor variables are held constant. This partial slope is constant across the range of values for a particular independent variable. In contrast, the partial slope for a covariate in logistic regression is not a constant, but represents the instantaneous rate of change in the log odds of the dependent variable for each covariate pattern; a covariate pattern represents only one of usually many different combinations of the independent variables. In other words, the partial slope "varies for every different possible combination of values of all of the independent variables" (Demaris, 1990, p. 273). Furthermore, logistic regression is particularly appropriate in situations in which there are declining marginal effects in both tails of a covariate's distribution (Morgan & Teachman, 1989).

Comparative Methodological Accessibility: Advantages of Logistic Regression

Logistic Regression's Similarity to OLS Regression

Because logistic regression designates a left-hand outcome variable and a set of right-hand covariates in the same manner as OLS regression, new users of this technique will intuitively understand its overall goal (i.e., for each covariate pattern, how much variation in the outcome variable can be attributed to each covariate, controlling for the remaining covariates). Logistic regression provides Wald Z and likelihood ratio test statistics to estimate the magnitude, standard error, and sign of the impact of each covariate on the outcome variable (see Tables 1 and 2). In loglinear modeling, estimates of the magnitude (lambdas) of the association between two or more variables can be obtained from the inverse information matrix; standard errors, based on the delta method, are only available in this case (Agresti, 1990, p. 179). If the expected cell frequencies for a hypothesized model cannot be derived from mathematical formulas, but must be iteratively estimated via a computer algorithm, then the loglinear model only provides the lambda weights without any estimates of their standard errors. Thus, without closed-form equations researchers cannot divide the lambda weights by the standard errors to test the null hypothesis that the lambda weight equals zero (Lee, 1977). Often, in models with more than three variables, various hypothesized models can only be estimated iteratively.
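For readers unfamiliar with such algorithms, here is a minimal sketch of Iterative Proportional Fitting for the simplest case, the two-way independence model, using invented counts. For this direct model IPF simply reproduces the closed-form MLEs (row total × column total) / n; its real value lies in indirect models where no such formula exists:

```python
# Illustrative sketch (invented counts): Iterative Proportional Fitting
# for the two-way independence model. Starting from a table of ones,
# cell estimates are rescaled to match the observed margins in turn.
observed = [[30, 10],
            [20, 40]]
rows = [sum(r) for r in observed]                      # observed row totals
cols = [sum(observed[i][j] for i in range(2)) for j in range(2)]

fitted = [[1.0, 1.0], [1.0, 1.0]]
for _ in range(10):  # a few cycles are ample for this direct model
    # Scale each row to its observed total
    for i in range(2):
        s = sum(fitted[i])
        fitted[i] = [f * rows[i] / s for f in fitted[i]]
    # Scale each column to its observed total
    for j in range(2):
        s = fitted[0][j] + fitted[1][j]
        for i in range(2):
            fitted[i][j] *= cols[j] / s

print(fitted)  # matches (row total * column total) / n, the closed-form MLEs
```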


Table 1. A Comparison of the Statistical Capabilities of Loglinear Modeling and Logistic Regression

Measuring Independent Variable Associations
  Loglinear Modeling: Partial and marginal associations.
  Logistic Regression: Pairwise correlation matrix.

Measuring Relationships Between Multiple Dependent Variables and Multiple Independent Variables
  Loglinear Modeling: Simultaneous test controlling for the association between each dependent variable and a set of independent variables while using multiple dependent variables.
  Logistic Regression: Sequential series of equations modeling each outcome variable for a covariate set.

Formal Zero Cell Definitions and Solutions
  Loglinear Modeling: (A) Conceptual distinction between observed and structural zero cell types. (B) Formal models incorporating structural zeros into quasi-independence and quasi-symmetry models. (C) For sampling zeros, MLEs can produce positive cell counts.
  Logistic Regression: (A) No formal distinction between various types of zero cells. (B) Reliance on ad hoc rules for handling zero cell problems.

Formal Collapsibility
  Loglinear Modeling: (A) Sufficient conditions provided for collapsing the levels of a third variable to calculate the marginal association between two other variables. (B) Use of conditional G2 tests of independence for collapsing a third variable's levels; conditional independence means that X and Y are independent given Z, but not independent if we just take X and Y into consideration.
  Logistic Regression: (A) No formal rules; ad hoc procedures to eliminate one or more levels containing zero cells for a nominal/ordinal variable.

Assessing the Impact of an Explanatory Variable on an Outcome Variable
  Loglinear Modeling: (A) Lambda weights to measure the association between two variables; no measure of how much variation in the outcome variable is produced by an explanatory variable controlling for the influence of the rest of the explanatory variables. (B) Asymptotic standard errors provided for lambda weights only for models that have closed-form solutions.
  Logistic Regression: (A) Wald and likelihood ratio tests to estimate the sign, magnitude, and standard errors of the impact of each covariate on the outcome variable.

Estimating the Conditional Odds Ratio for the Outcome Variable
  Loglinear Modeling: Not directly available from loglinear models; must convert the loglinear model into an equivalent logit model.
  Logistic Regression: Conditional odds for the outcome variable are estimated, as well as the standard errors for these odds.

Data Analysis of Mixed Data Sets Containing Interval and Non-Interval Data
  Loglinear Modeling: Most uses of loglinear analysis are confined to analyzing nominal and ordinal data.
  Logistic Regression: Statistically powerful tool for analyzing such mixed data sets.

Sources: (Agresti, 1984; Andersen, 1980; Everett, 1977; Fienberg, 1980; Hosmer & Lemeshow, 1989; Santner & Duffy, 1989; Upton, 1978)


Table 2. Comparing the Differences Between Logistic Regression and Loglinear Modeling for Testing the Association Between Independent and Dependent Variables

Tests for determining a significant DV-IV relationship
  Loglinear Modeling: Two-step process: (A) model selection strategy using a Pearson or likelihood ratio chi-square test statistic to determine whether the IV-DV association significantly improves the overall fit of a given model; (B) lambda Z tests to determine which categories of the IV are significantly related to one or more of the dependent variable categories.
  Logistic Regression: One-step process using either A or B: (A) Wald b statistic testing the null hypothesis that the slope b = 0; it is distributed either as a Z or chi-square statistic (Hosmer & Lemeshow, 1989; Hauck & Donner, 1977). (B) Likelihood ratio statistic that offers a comparative test of whether adding or subtracting a covariate improves a model's overall goodness-of-fit by significantly reducing G2. Twice the difference in log-likelihoods between a larger and smaller nested model is called the deviance.

Assumptions about the nature of the relationship between an IV and DV
  Loglinear Modeling: No assumption about linearity or monotonicity.
  Logistic Regression: Linear relationship between the IV and the natural log of the odds that an event occurs against the odds that it does not.

Testing a single or compound hypothesis
  Loglinear Modeling: Lambda Z tests represent a micro approach of simultaneously testing a set of multiple hypotheses between each category of an IV and each category of the DV.
  Logistic Regression: Wald's b statistic represents a macro approach of testing a single overall hypothesis about the linear relationship between the IV and DV.

Statistical problems of parameter estimates
  Loglinear Modeling: Lambda Z tests are asymptotically distributed and thus become unstable with sparse (less than 5) or zero sample cell counts.
  Logistic Regression: Wald's b statistic has the following deficiencies: (A) lack of statistical power to detect significant IV-DV relationships (Hauck & Donner, 1977); (B) multicollinearity, complete separation, and zero cells often create unusually large b weights and standard errors of the b weights for Wald's b statistic (Hosmer & Lemeshow, 1989).

Interpretability of parameter coefficients
  Loglinear Modeling: Limited interpretability, since lambda Z tests only test whether certain levels of both the IV and DV are significantly related; they do not indicate the magnitude, direction, or nature (linear or nonlinear) of the association between the two variables.
  Logistic Regression: Wald's b statistic provides the following types of information: (A) the magnitude of the impact of a one-unit change in the IV on the relative odds or probabilities of the DV; (B) the direction of the relationship, i.e., whether the IV is positively or negatively linearly related to the DV.

Supplemental test
  Loglinear Modeling: None.
  Logistic Regression: Use of the likelihood ratio test to overcome the lack of statistical power in the Wald b test. The likelihood ratio test essentially tests the difference in G2 (-2 log likelihood) between two nested models which are identical except that one model includes the tested parameter and the other excludes it. It is distributed as a chi-square statistic with degrees of freedom equal to the difference in the number of parameters between the larger and smaller models.

Key: IV = independent variable; DV = dependent variable.
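The one-step Wald test in Table 2 is simple to sketch. The slope and standard error below are hypothetical values, not estimates from any study cited here:

```python
# Illustrative sketch (hypothetical coefficient): the Wald statistic
# z = b / SE(b), the implied odds ratio exp(b), and its 95% interval.
import math

b, se = 0.9, 0.35           # hypothetical slope and its standard error
z = b / se                  # Wald statistic, compared to N(0, 1)
odds_ratio = math.exp(b)    # multiplicative change in the odds per unit IV

# 95% confidence interval for the odds ratio
ci = (math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))
print(round(z, 2), round(odds_ratio, 2))
```

Because z exceeds 1.96 here, the hypothetical covariate would be judged significant at the .05 level; the likelihood ratio test in the same row of Table 2 is preferred when the Wald statistic's power is in doubt.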

Such indirect models require computer algorithms such as Iterative Proportional Fitting (Agresti, 1990). Thus, logistic regression provides researchers a superior opportunity to estimate each covariate's quantitative impact on the outcome variable.

Estimating the Conditional Odds for the Dependent Variable

Logistic regression can provide point estimates and confidence intervals for the natural log of the ratio of the odds that an event occurs to the odds that it does not, given a certain set of conditions measured by the covariates. This is a highly desirable attribute in applied research fields such as epidemiology, where medical researchers need precise estimates of the various marginal odds of contracting a fatal disease based on individual characteristics such as age, sex, race, and marital status (Hosmer & Lemeshow, 1989). This type of information can only be extracted from loglinear modeling by experienced users capable of transforming a selected loglinear model into an equivalent logit model (Aldrich & Nelson, 1984; Cramer, 1991; Demaris, 1993).

Logistic Regression's Superior Flexibility for Analyzing Mixed Data Sets

Logistic regression is a more flexible instrument than loglinear modeling for analyzing a mixed set of nominal/ordinal and interval variables (Hosmer & Lemeshow, 1989). In logistic regression, interval variables are easily included in the model, while nominal variables are also easily accommodated as dummy or product variables. In contrast, most uses of loglinear modeling are currently restricted to categorical data relying on nominal and ordinal variables (Agresti, 1990).

Comparative Advantages of Loglinear Modeling

Loglinear experts argue that loglinear modeling enjoys four specific advantages over logistic regression (see Tables 2 and 3). Under the logistic regression framework, the researcher's primary goal is to measure the independent impact of each covariate (independent variable) on the log odds of a binary variable. Compared to loglinear modeling, the logistic regression framework does not provide researchers an opportunity to assess the dependence structure or associations among all covariates simultaneously (Agresti, 1990; Hosmer & Lemeshow, 1989). Loglinear modeling enjoys a second advantage over logistic regression because of its ability to simultaneously test relationships between multiple outcome and multiple explanatory variables (Fienberg, 1980; Upton, 1978; Agresti, 1990). The use of simultaneous tests under the loglinear approach permits researchers to test the association between each outcome variable and each explanatory variable, as well as the associations between outcome variables for tests involving multiple outcome variables. In contrast, logistic regression users must perform a series of sequential tests on models with only one specified outcome variable, under the assumption that the associations among the multiple outcome variables are not statistically significant (Hosmer & Lemeshow, 1989).

Formal Definitions and Solutions for Zero Cell Counts

A major weakness of logistic regression is its failure to distinguish between sampling and structural zero cells (Agresti, 1990).


Table 3. A Comparison of the Methodological Assumptions and Numerical Calculation Problems for Loglinear Modeling and Logistic Regression

Methodological Assumptions
  Loglinear Modeling: (A) Observed cell counts are the result of one of three sampling designs, i.e., Poisson, multinomial, or product multinomial.
  Logistic Regression: (A) Error terms are binomially distributed. (B) The link function is the logit, i.e., it forms the relationship between the probability distribution of the response variable and the linear function of the explanatory variables. (C) Linearity in the logit, i.e., the log odds for the presence of an outcome is a linear function of the variables included in the final model. (D) For each covariate, a non-constant slope parameter describes its relationship to the outcome variable. (E) No extrabinomial variation in the outcome variable. (F) The range of probabilities for a successful outcome in the dependent variable is between .20 and .80. (G) The period of observation is equal for all study subjects.

Numerical Calculation Problems
  Loglinear Modeling: Multiple sampling zeros lead to marginal subtable zeros, which make it impossible to calculate MLEs for each table cell.
  Logistic Regression: (A) Zero cell counts. (B) When a covariate perfectly predicts membership in each outcome group, there is no overlap in the distribution of the covariates between the two outcome groups; MLEs cannot be calculated when this occurs. (C) Multicollinearity problem for a covariate set.

Sources: (Nelder, 1974; Cox & Wermuth, 1992; Agresti, 1990; Everett, 1977; Fingleton, 1984; Hosmer & Lemeshow, 1989; Reynolds, 1977; Knoke & Burke, 1980; Knoke, 1975; Santner & Duffy, 1989; Upton, 1978)
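Problem (B) in the logistic regression column, complete separation, can be illustrated with a toy sketch (data invented): because the covariate splits the outcome groups perfectly, the log-likelihood keeps improving as the slope grows, so no finite MLE exists.

```python
# Illustrative sketch (invented data): complete separation. The covariate
# x predicts y perfectly (x >= 6 implies y = 1), so the log-likelihood
# approaches its upper bound of 0 as the slope grows without bound.
import math

x = [1, 2, 3, 6, 7, 8]
y = [0, 0, 0, 1, 1, 1]

def loglik(b0, b1):
    """Binomial log-likelihood of a logistic model at (b0, b1)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# Steepening the slope (with the cutpoint held at x = 4.5) always
# increases the likelihood -- iterative estimation never converges.
print([round(loglik(-4.5 * b, b), 4) for b in (1, 5, 10)])
```

In practice this failure shows up as the runaway coefficient estimates and inflated standard errors noted in Table 2.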

Furthermore, zero cell counts often lead to serious numerical calculation problems in logistic regression, especially the overestimation of parameter coefficients and their standard errors for both the dependent variable and individual covariates (Hosmer & Lemeshow, 1989). The occurrence of large standard errors should alert researchers to the possibility of serious numerical calculation problems created by zero cell counts. It should, however, be noted that Hosmer and Lemeshow (1989) recommend three ad hoc strategies for eliminating the problems caused by zero cells:

1. collapse the categories of a nominal variable by combining a zero cell with a non-zero cell, thus eliminating the zero cell by reducing the number of variable categories (levels) through pooling two or more cell counts;
2. simply eliminate the zero cell by discarding the variable category in which it appears;
3. treat the variable as intervally measured, if the variable with a zero cell in one of its categories is an ordinal measure.
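The first strategy amounts to pooling counts. A hypothetical sketch, with category names and counts invented for illustration:

```python
# Illustrative sketch (invented categories and counts): collapsing a
# zero cell of a nominal variable into an adjacent non-zero category.
counts = {"clerical": 25, "technical": 0, "managerial": 40}

# Pool the zero category with a neighboring category
collapsed = dict(counts)
collapsed["technical/managerial"] = (collapsed.pop("technical")
                                     + collapsed.pop("managerial"))
print(collapsed)  # {'clerical': 25, 'technical/managerial': 40}
```

The cost, of course, is that the pooled categories can no longer be distinguished in the fitted model, which is why the text below urges caution with these ad hoc fixes.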

To evaluate these strategies for handling zero cells in logistic regression, it is important to note that such guidelines are not the result of formal statistical theories or extensive Monte Carlo simulation tests, but are rules of thumb based on the personal experience of applied social science researchers. Thus, caution should be exercised in the widespread application of these ad hoc strategies in applied research using logistic regression, especially in social science domains outside of the health-related fields. In sum, the standard logistic regression textbook does not formally distinguish between different types of zero cell counts and does not currently offer well-established rules for overcoming the numerical calculation problems created by zero cell counts. Fortunately, loglinear modeling has two additional advantages over logistic regression in its ability to offer formal definitions and solutions to the problem of zero cell counts for contingency tables composed of nominal and ordinal variables. As Table 3 indicates, loglinear modeling presents fewer numerical calculation problems and statistical assumptions for deriving MLEs of parameter coefficients than logistic regression. Recent research (Agresti, 1990; Hosmer & Lemeshow, 1989) suggests that zero cell counts represent the most serious numerical calculation issue in using loglinear modeling or logistic regression for empirical data. While logistic regression does not formally distinguish between observed sampling zeros, which are random variables ranging between zero and positive infinity, and structural zero cells, which are constants because of certain definitions and logical constraints, loglinear modeling distinguishes between these two distinct and non-overlapping types of zero cell counts:

1. Sampling zeros occur in situations where one or more cases exist in the population of interest, but zero cell counts arise because of sampling variation, especially the use of small sample sizes for a contingency table composed of a large number of cells. Such zero cells will tend to disappear as sample size is increased (Knoke & Burke, 1980). The classic example used by Fienberg (1980) to illustrate sampling zeros is the observed zero cell count for Iowa Jewish farmers; such individuals exist, but small simple random samples of Iowa farmers will often not include them because of their small population size.
2. Structural zeros, also called logical zeros (Knoke & Burke, 1980), occur when it is logically impossible to observe positive cell counts for specific combinations of various categories. Again, a classic example used by Fienberg (1980) to illustrate structural zeros is the logical impossibility of ever observing male obstetrical patients.

For loglinear modeling, applied researchers should carefully distinguish between these two types of zero cell counts, because this statistical technique uses different solutions for them. One of the strengths of loglinear modeling is its ability to produce positive expected cell counts for contingency tables characterized by one or more sampling zeros (Agresti, 1990). Thus, even if no Iowa Jewish farmers occur in the sample data, a hypothesized loglinear model will estimate the presence of such individuals under the assumption that such persons actually exist in the population. In contrast to its solution for estimating positive cell counts for sampling zeros, loglinear modeling treats structural zeros as logical impossibilities and thus does not estimate expected cell frequencies for such events (Knoke & Burke, 1980). Instead, the loglinear model treats structural zeros as fixed constants and focuses on producing MLEs of the expected cell frequencies for the remaining cells in the table.
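The point about sampling zeros can be made concrete with the simplest loglinear model, the two-way independence model, for which the expected count in cell (i, j) is the product of the row and column totals divided by the grand total. In the hypothetical table below (the counts are ours, not drawn from any data set in this paper), the observed zero cell still receives a positive expected count:

```python
# Hypothetical 2x2 observed counts; the 0 is a sampling zero
obs = [[10, 0],
       [40, 50]]

row_totals = [sum(r) for r in obs]         # [10, 90]
col_totals = [sum(c) for c in zip(*obs)]   # [50, 50]
n = sum(row_totals)                        # 100

# Expected counts under the independence loglinear model
exp = [[row_totals[i] * col_totals[j] / n for j in range(2)]
       for i in range(2)]

print(exp[0][1])   # 10 * 50 / 100 = 5.0 -> positive despite the observed zero
```

A structural zero, by contrast, would simply be excluded from the fit rather than receiving an estimate.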

Example of Logistic Regression and Loglinear Modeling

One of the dominant themes in the management literature is employee turnover, and it will also be the focus of the example developed in this manuscript. Of particular note in this literature is the increasing use of nominally measured independent variables. For example, in the Journal of Management in 1993, nominal variables were used in five separate voluntary turnover studies (i.e., Dalton & Todor, 1993; Dougherty, Dreher & Whitely, 1993; Johnston, Griffeth, Burton & Carson, 1993; Ornstein & Isabella, 1993; Wright & Bonett, 1993). This growing use of nominal predictor variables suggests that we increase our use of loglinear and logit methods for analyzing such data sets.

The data used in this presentation were derived from a national sample of young adults (12,686 persons, ages 21 and older on January 1, 1979) representing the civilian population in the United States and those on active military duty overseas in 1982. Interviews were conducted by the National Opinion Research Center at the University of Chicago and the Center for Human Resources and Research at Ohio State University. The subsample used here consisted of 783 females and 800 males. Black respondents accounted for 27.6% of the subsample, while whites accounted for 72.4%. The variables to be analyzed are sex (1 = female, 2 = male), race (1 = white, 2 = black), intention to quit (1 = no, 2 = yes), locus of control (low values = internal locus of control), job satisfaction (low values = low satisfaction), whether the respondent was the primary wage earner in his or her household (1 = secondary/nonprimary, 2 = primary), and turnover (1 = stayed, 2 = quit).

For pedagogical purposes we will test for a specific set of relationships. First, previous research shows (e.g., Bluedorn, 1982; Mobley, Griffeth, Hand & Meglino, 1979; Porter & Steers, 1973; Weisberg & Kirschenbaum, 1993), and we expect to find, that turnover is related to intentions, job satisfaction, and the sex of the respondent.
Second, although women are historically more likely to quit their jobs than men, we expect to find that when women are the primary wage earners this observed difference will disappear. Finally, following Griffeth and Hom
(1988), we expect to find an interaction between job satisfaction and locus of control; that is, job satisfaction will be a stronger determinant of turnover for those individuals who also have an external locus of control.

Logistic Regression Analysis. For the logistic regression analysis, locus, sex, intent, race, satisfaction, insat (i.e., intent * satisfaction), primary, primsex (i.e., primary * sex), and locsat (i.e., locus * satisfaction) are the independent variables and turnover is the dependent variable. In addition, turnover, sex, race, intent, sat, primary, and primsex are treated as nominal variables, while locus and locsat are treated as intervally measured. Tables 4 and 5 present summaries of the important statistics reported by SPSS forward and backward logistic regression runs, respectively. Reading across the top of each table, all the models tested are shown. These models include main effects as well as the two-factor interactions mentioned above. Even with the relatively small number of variables used here, the number of potential models that can be created is staggering. For this reason it is considered good practice for model selection to always be theory driven. Theoretically driven selection is also helpful when different stepwise methods produce different models, as they did in the current analysis.

In the first row of each table are listed the variables that were either selected for inclusion (forward) or for removal (backward). The next two rows list the -2 log likelihood that the observed values come from the model's parameters, and its significance. The difference between the -2 log likelihoods of two models, in which the larger model includes a hypothesized independent variable and the smaller model excludes it, is distributed as a chi-square with one degree of freedom for the hypothesized main effect. This comparison between a larger and a smaller model is called a likelihood ratio test. Looking down this row and the next for its significance, we see that it is not likely that all the significant variables and interactions are in this model. We will investigate this further in the loglinear analysis of these data.

The next two rows give the goodness-of-fit statistics for the regression model. Insignificant values indicate that the model does not differ significantly from the optimal model. The -2 log likelihood and goodness-of-fit statistics usually confirm each other, but in this case they do not. So we are left with the conclusion that while it is unlikely that the parameters produced the data, the model does produce expected results similar to the observed ones.

The Wald statistic found in the next section is for analyzing individual variables and not entire equations. Specifically, the Wald statistic tests a single overall hypothesis about the relationship between an independent variable and the dependent variable. The particular variable being analyzed is shown in parentheses. For forward regression, the Wald statistic of the most significant variable not in the equation is shown; this is the variable that will be selected for inclusion in the next step. For backward regression, the Wald statistic of the least significant variable in the equation is shown; this is the variable that will be selected for removal in the next step. The last two rows are a classification table of the results of the equations being tested.

In Table 4, the first model in the forward selection process, with only a constant, predicts that no employees quit. This is because the sample contains more employees who stayed than quit. If one has to make a constant prediction, then no

Table 4. Forward Logistic Regression

Models               Constant   P        liS      m,s,     fxS,,I,
-2 Log Likelihood    2192       2148     2109     2082     2075
Significance         0.0000     0.0000   0.0000   0.0000   0.0000
Goodness-of-Fit      1583       1582     1591     1595     1593
df                   1582       1581     1578     1577     1576
Significance         0.4882     0.4811   0.4021   0.3677   0.3726
Wald                 43.08      35.77    26.31    6.62
Wald Significance    0.0000     0.0000   0.0000   0.0100
Obs. Pred. Stays     100%       62%      74%      67%      67%
Obs. Pred. Quits     0%         65%      55%      57%      62%

Notes: P = Primary Wage Earner; S = Job Satisfaction; Sx = Sex; Is = Intent * Satisfaction
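The likelihood ratio test described above can be checked directly from the -2 log likelihood values in Table 4. The short sketch below uses the closed-form chi-square p-value for one degree of freedom (for df = 1, the survival function is erfc of the square root of half the statistic); the comparison of the constant-only model (2192) with the first-step model (2148) is only an illustration of the arithmetic:

```python
import math

def lr_test_df1(neg2ll_smaller, neg2ll_larger):
    """Likelihood ratio test for nested models differing by one term.

    The difference in -2 log likelihood is chi-square distributed with
    one degree of freedom, whose survival function has a closed form.
    """
    g = neg2ll_smaller - neg2ll_larger      # chi-square statistic
    p = math.erfc(math.sqrt(g / 2.0))       # P(chi2 with 1 df > g)
    return g, p

# -2 log likelihoods from Table 4: constant-only model vs. first step
g, p = lr_test_df1(2192, 2148)
print(g)          # 44
print(p < 0.001)  # True: the first variable enters highly significantly
```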

turnover (i.e., "stayed") is the best prediction. In Table 5, the first model in the backward selection process contains all the variables in our analysis. As variables are added to or subtracted from these models, the overall classification pictures improve. Our final forward logistic regression model correctly classifies 67% and 62% of the stayers and quitters, respectively, and confirms our expectations with regard to job satisfaction and intentions. Our hypothesized relationship between primary wage earner status and sex is also confirmed. Indeed, when wage earner status was controlled for, being female was significantly and negatively related to turnover. The final backward logistic regression model correctly classifies 70% and 63% of stayers and quitters, respectively. This model also confirms the relationships between job satisfaction, sex, intent, wage earner status, and turnover, and in addition confirms our expectations regarding locus of control and job satisfaction.

Two questions remain: "What significant variables or interactions are missing from our analysis?" and "Which of the two logistic models is correct?" We have in these results a good example of one of the problems associated with stepwise techniques. In addition to different results from forward and backward processes, other problems include sensitivity to random fluctuations and problematic p-values. Thus, larger sample sizes like ours and confirmation by some other technique are usually wise when using stepwise techniques. The likelihood statistics also suggest something is amiss in both of these models, and two logistic models are not much better than none. For exploration and confirmation, loglinear models will be constructed to examine the data further.

Loglinear Analysis. Table 6 presents a summary of the important results from an SPSS loglinear analysis of our data set. If the model specifications look different, it is because of the different requirements and approach of a loglinear analysis.
First, there is a new variable specified in the model, turnover. Loglinear analysis does not differentiate between independent and dependent variables, and so all variables must be specified. A second difference is that the variable locus of control is no longer present as a continuous variable. Loglinear models cannot

Table 5. Backward Logistic Regression

Models               ~SSx,ls~LUU,   ~SSx,LUUs   m,sx,L,~,L,
-2 Log Likelihood    2053           2053        2052
Significance         0.0000         0.0000      0.0000
Goodness-of-Fit      1603           1604        1604
df                   1572           1573        1574
Significance         0.2862         0.2874      0.2924
Wald                 0.0647         0.8561
Wald Significance    0.7992         0.3548
Obs. Pred. Stays     65%            66%         70%
Obs. Pred. Quits     65%            65%         63%

Notes: P = Primary Wage Earner; S = Job Satisfaction; Sx = Sex; Is = Intent * Satisfaction; L = Locus of Control; I = Intention to Quit; R = Race; Ls = Locus * Satisfaction

handle continuous variables without some type of categorical grouping; thus locus of control has been dichotomized for this analysis. Third, this model starts not with main effects or hypothesized interactions, as the earlier logistic regression did, but with a saturated model. In this example, with seven variables, there is a seven-factor interaction. The program proceeds from here by eliminating interactions, not individual variables, in a stepwise manner. Only a variable with a random distribution across not only all other variables but also its own categories would ever be eliminated.

The goal in loglinear model selection is to find the simplest set of interactions and variables that still models the data well. The standard for modeling the data is an insignificant difference between the observed and expected frequencies. This is measured by the likelihood ratio chi-square, and its significance depends on the magnitude of this statistic and its degrees of freedom. Beginning at the top of Table 6 with the saturated model, which includes the seven-variable interaction and all the possible lower-order interactions, we can see that it perfectly models the data but has no degrees of freedom.

Table 6. Loglinear Analysis

Model                        DF    L.R. chi**2    P-value
Step 1   T*P*Sx*I*S*L*R       0      0.000         1.000
Step 10  T*P*Sx*I*S*R        24      1.0774        1.000
         T*P*I*S*L*R
         P*Sx*I*S*L*R
         T*P*Sx*L
         T*Sx*I*L*R
         T*Sx*S*L

Notes: T = Turnover (Quit/Stay); P = Primary Wage Earner; Sx = Sex; I = Intention to Quit; S = Job Satisfaction; L = Locus of Control; R = Race

Since this is a hierarchical analysis, it is important to understand that in specifying the seven-factor

interaction we automatically are including every other lower-order interaction and all the main effects. This is the most complex model possible, and as we can see by its complete lack of degrees of freedom, it is not a desirable solution but only a theoretical starting point. After ten stepwise deletions the final model, like the saturated model we began with, has far too few degrees of freedom and too complex an overall structure to be a desirable solution. It does, however, solidly confirm the results of the backward logistic regression shown in Table 5. The loglinear solution indicates a very definite relationship between turnover and the locus of control/job satisfaction interaction, as well as the relationships between wage earner status and sex and between intention and satisfaction.

The final model can also help us understand why the likelihood statistics of the logistic regressions looked so weak, by pointing out possible interactions that we have missed. For example, the model indicates that not only is turnover affected by locus of control, sex, intention, satisfaction, primary wage earner status and their interactions, but that the race variable eliminated in both logistic regressions also has some impact. It is likely that it is the race variable in interaction with one of the other variables that is so significant. And, when the data are examined, it appears that while locus of control has no impact on white employees' turnover, it does have an effect on black employees' actions. Indeed, a crosstabulation of the data shows that most black respondents who quit their jobs also had an external locus of control. The race variable might also give us some insight into the differing ways the two logistic regression analyses handled the intention to quit variable, since race, at least in this data set, is positively associated with intent but negatively associated with actual turnover. That is, it appears that black respondents more often said they would quit and then did not.

Conclusion

It is often the case that no clear guidelines exist to help researchers select one statistical tool over another as their primary data analysis method. This dilemma is further exacerbated by the tendency among advocates of various statistical techniques to overweigh the benefits of their favorite technique while minimizing its shortcomings. Researchers are confronted with both of these confusing conditions when deciding whether to use loglinear modeling or logistic regression. In the absence of uniform rules for selecting one technique over the other, this choice will ultimately rest on the researcher's goals, statistical criteria, level of statistical training, and willingness to learn new statistical techniques. Our concern is that new adopters of these techniques may not consider their potential shortcomings, or may not consider alternative methodological approaches that would be more statistically appropriate for solving a specific type of problem. We therefore recommend that applied researchers consider the following issues:

1. Are our data sets consistent with the methodological assumptions of logistic regression? Generally overlooked by advocates of logistic regression, this technique makes a large number of methodological assumptions (see Table 2). However, unlike the case of OLS regression, the statistical literature on logistic regression devotes scant attention to the assumptions required for the efficacious use of this technique.

2. What are the consequences of violating the assumptions underlying logistic regression? Little attention is addressed to this important issue. However, several prominent statisticians (Cox & Wermuth, 1992) have pointed out that logistic regression assumes that the range of positive outcomes for a dependent variable's odds ratio lies between .20 and .80. According to these statisticians, this assumption invalidates the use of R2, the coefficient of determination, as a summary goodness-of-fit measure, since it arbitrarily restricts the range interval for R2.

3. Can we analyze our data sets with both techniques? For mixed data sets containing both interval and categorical variables as covariates, logistic regression is both a powerful and flexible tool and certainly should be used over loglinear modeling, since the latter technique is confined to categorical variables. However, in situations in which all the response and independent variables are measured categorically, applied researchers have the choice of analyzing their data sets with either loglinear modeling or logistic regression. We recommend that researchers use both techniques to analyze such categorical data sets, to ensure that their statistical results are robust across both techniques rather than being a statistical artifact created by a violation of one of logistic regression's methodological assumptions.

Some applied researchers are initially wary of loglinear modeling because of its unfamiliar concepts and the need to find the best-fitting model for a data set. At the moment, there are at least four or five different approaches in loglinear modeling for determining the best-fitting model, or a small subset of rival models that fit the data equally well. However, finding the best-fitting model is a serious problem not restricted to loglinear modeling; it is also a problem for logistic regression, especially for large models with four or more covariates (Hosmer & Lemeshow, 1989). Figure 1 presents a flow chart for deciding when to select either loglinear modeling or logistic regression. Our experience suggests that researchers initially tend to focus on loglinear modeling's unfamiliar concepts (e.g., structural zeros) and the lack of a uniform approach for determining the best-fitting model without properly weighing the statistical advantages of loglinear modeling. For instance, statisticians (Whittemore, 1978; Giles & Lepage, 1986) suggest that loglinear modeling's collapsibility rules can be used to solve two major problems frequently encountered in analyzing contingency tables:

1. The large number of cells in tables of high dimension (contingency tables composed of many variables) makes it difficult to detect relationships among key variables.

2. In a table with a large number of cells and a small total sample size, some of the cells are likely to be empty or to have very low frequencies.
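Collapsing over an extraneous variable is mechanically just a marginal sum. The sketch below, with made-up variable names and counts, collapses a hypothetical three-way table over race to obtain a two-way table with larger cell counts:

```python
from collections import Counter

# Hypothetical (intent, turnover, race) counts, including a sampling zero
three_way = Counter({
    ("no", "stay", "white"): 60, ("no", "stay", "black"): 25,
    ("no", "quit", "white"): 5,  ("no", "quit", "black"): 0,
    ("yes", "stay", "white"): 10, ("yes", "stay", "black"): 4,
    ("yes", "quit", "white"): 12, ("yes", "quit", "black"): 6,
})

# Collapse over race: sum counts within each (intent, turnover) cell
two_way = Counter()
for (intent, turnover, race), n in three_way.items():
    two_way[(intent, turnover)] += n

print(two_way[("no", "quit")])   # 5 + 0 = 5: the sampling zero is absorbed
```

Whether such a collapse distorts the remaining associations is exactly what the formal collapsibility rules are meant to decide; the code shows only the arithmetic.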

Figure 1. Flow Chart for Selecting Loglinear Modeling Versus Logistic Regression

(Figure 1 leads the researcher through four decision levels: whether some of the variables are intervally measured; whether the research goals focus on a binary outcome; whether the research goal is discovering all significant higher-order interactions or, instead, main effects and first-order (two-variable) interactions; and, when the categorical variables are cross-classified, whether sampling or structural zeros are present. The terminal nodes of the chart recommend either loglinear modeling or logistic regression.)
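The decision logic of Figure 1 can be paraphrased as a small function. The predicate names are ours and the ordering of the checks is a simplification of the chart, not an exact transcription:

```python
def choose_technique(has_interval_covariates: bool,
                     needs_all_higher_order_interactions: bool,
                     predicts_binary_outcome: bool) -> str:
    """Rough paraphrase of Figure 1's decision levels (not the exact chart)."""
    if has_interval_covariates:
        # Loglinear modeling is confined to categorical variables
        return "logistic regression"
    if needs_all_higher_order_interactions:
        # Loglinear modeling is designed to uncover higher-order associations
        return "loglinear modeling"
    if predicts_binary_outcome:
        # Focused prediction of a binary response favors logistic regression
        return "logistic regression"
    return "loglinear modeling"

print(choose_technique(False, True, False))   # loglinear modeling
```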

Loglinear modelings’ formal collapsibility rules permit applied researchers an opportunity to eliminate extraneous variables in a high dimension table and focus on determining the dependence structure among theoretically significant variables without the concern that they confounded their study by not controlling for these extraneous variables. Furthermore, the collapsibility rules allow researchers to eliminate sampling zeros and increase their table’s cell sizes by collapsing cells across various extraneous variables, thus increasing the cell sizes in the reduced table and the asymptotic validity of the statistics derived from these cell frequencies. The distributions of loglinear modeling’s test statistics are derived under asymptotic sampling conditions; thus, the presence of small cell sizes raise serious questions about the appropriateness of such test statistics. Ultimately, however, a researcher’s substantive goals will have a major influence in determining which statistical tool is chosen. For instance, the use of loglinear modeling is more appropriate in situations where applied researchers are interested in the various pairwise and higher order associations among a set of independent variables. In contrast, logistic regression is a more powerful tool for JOURNAL

OF MANAGEMENT,

VOL. 22, NO. 2, 1996

LOGLINEAR

MODELING

AND LOGISTIC

REGRESSION

357

applied research aimed at producing precise predictions regarding the log odds ratio of a binary dependent variable. This latter task can be performed by loglinear analysis indirectly, namely by converting a loglinear model to an isomorphic logit model. However, this conversion presumes a level of technical sophistication that some researchers may not possess.

It has been the purpose of this manuscript to compare and contrast logistic regression and loglinear modeling, not to champion one technique over the other. As this area of categorical and mixed categorical/continuous data analysis grows and becomes ever more sophisticated, it would be prudent for researchers to periodically update their knowledge. Huselid and Day (1991) demonstrate quite convincingly that researchers often place too much faith in the robustness of OLS and MANOVA and have been guilty of violating critical assumptions even when there were alternative techniques available (Barry, 1993; Hardy, 1993). This, however, is not entirely the fault of the researchers. The textbooks and papers that are available tend to be oriented toward researchers with extensive mathematical training, thus pointing to a real need for "accessible" textbooks and "helpful" developmental manuscripts (Allison, 1984).

Acknowledgement: Our thanks to Alan Agresti, David Balkin, Joseph Balloun, Gary Blau, Michael Crino, Jack Feldman, Gary Klein, and Susan Taylor for their helpful comments and suggestions.

References

Agresti, A. (1984). Analysis of ordinal categorical data. New York: John Wiley.
Agresti, A. (1990). Categorical data analysis. New York: John Wiley.
Agresti, A. (1994). Personal communication.
Aldrich, J. & Nelson, F. (1984). Linear probability, logit, and probit models. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-045. Beverly Hills, CA: Sage.
Allison, P. (1984). Event history analysis. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-046. Beverly Hills, CA: Sage.
Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature, 19: 1483-1536.
Andersen, E. (1980). Discrete statistical models with social science applications. Amsterdam: North-Holland.
Barry, W. (1993). Understanding regression assumptions. Newbury Park, CA: Sage.
Bluedorn, A.C. (1982). The theories of turnover: Causes, effects and meaning. Pp. 75-128 in S.B. Bacharach (Ed.), Perspectives in organizational sociology: Theory and research, Vol. 1. Greenwich, CT: JAI Press.
Brown, C. (1982). On a goodness-of-fit test for the logistic model based on score statistics. Communications in Statistics, 11: 1087-1105.
Cox, D. & Wermuth, N. (1992). A comment on the coefficient of determination for binary responses. The American Statistician, 46: 1-4.
Cramer, J. (1991). The logit model for economists. New York: Routledge, Chapman and Hall.
Dalton, D.R. & Todor, W.D. (1993). Turnover, transfer, absenteeism: An interdependent perspective. Journal of Management, 19: 193-220.
Demaris, A. (1990). Interpreting logistic regression results: A critical commentary. Journal of Marriage and the Family, 52: 271-277.
Demaris, A. (1993). Logit modeling: Practical applications. Newbury Park: Sage.
Doran, H. (1989). Applied regression analysis in econometrics. New York: Marcel Dekker.
Dougherty, T.W., Dreher, G.F. & Whitely, W. (1993). The MBA as careerist: An analysis of early-career job changes. Journal of Management, 19: 535-548.
Drazin, R. & Kazanjian, R.K. (1993). Applying the Del Technique to the analysis of cross-classification data: A test of CEO succession and top management team development. Academy of Management Journal, 36(6): 1374-1399.
Everett, B. (1977). The analysis of contingency tables. London: Halsted Press.
Fienberg, S. (1980). The analysis of cross-classified categorical data, 2nd ed. Boston, MA: MIT Press.

Fingleton, B. (1984). Models ofccr&~or~ counts. New York: Cambridge University Press. Freeman, D. ( 1987). Applied cntegoriccd dutcr ctncrlysis. New York: Marcel Dekker. Griffeth, R. & Horn, P. (1988).Locus of c~on/rol rend d&y of ,qrcttifmtion us modercmm of emplqyee turnover. Journal of Applied Social Psycholog?: 18 (15): 1318- 1333. Hardy, M. (1993). Rqrrssion with ciunm~~wriahle.~. Newbury Park, CA: Sage. Hauck, W.W. & Dormer, A. (1977). Wald’s /ext as upplied /o hypothrses in lo,qic mcr1y.si.s. Journal of American Statistical Association, 72: 851.853. Hirji, K.. Tsiatis, A. & Mehta. C. (1989). Median unbiased estimation for binary data. The Amrricm Stcrtisticictn. 43: 7-l 1.

Hosmer, D. & Lemeshow S. ( 1989). Applied logistic. rqrrssiorz. New York: John Wiley. Huselid, M. & Day. N. (1991). Organizational commitment, job involvement, and turnover: A substantive and methodological analysis. Journtrl ofApplied P.v~cho/o,~y, 76(3): 380.391. Knoke. D. ( 1975). A comparison of log-linear and regression models for systems of dichotomous variables. Sociokqicctl Methods clnd Resrctrch, 3: 4 16-434. Knoke, D. & Burke, P. ( 1980). Log-linrcrr mod~1.r. Sage University Paper series on Quantitative Applications in the Social Sciences, 07.020. Beverly Hills: Sage. Johnston, M.W., Griffeth, R., Burton, S. &Carson, P.P. (1993). An exploratory investigation into the relationships between promotion and turnover: A quasi-experimental longitudinal study. Journcd of Mtrnctgernmt. 19: 33-50. Lee, S. (1977). On the asymptotic variance of ti terms in log-linear models of multidimensional contingency tables. Jour~~rrl cd Amrricm Stutisticcrl Axsoc~irrtion, 72: 4 12-4 19. Maddala, G. ( 1983). Lir,tited-~I~pcndmt md yunlitrrtivr ~~tricrhlrs in econometrics. Cambridge. U.K.: Cambridge University Press. Maddala, G. ( 1988). /ntrodrcctiorr /o rconoruetricc. New York: Macmillan. Mobley, W.H., Griffeth, R.W., Hand, H. & Megilino, M. (1979). Review and conceptual analysis of the employee turnover process. P.vycholo,+~tl Bulktin, 86: 493-522. Morgan, S. & Teachman, J. (1989). Logistic regression: Description. examples. and comparisons. Joumctl of Mtrrriqr md the Fmily. 50: 929.936. Nelder. J. (I 974). Loglinear models for contingency tables: A generalization of classical least squares. Applied Sto/i.r/ic~s, 23: 323-329. Ornstein. S. & Isabella. L. (1993). Making sense of careers: A review 1989.1992. Joumcr~ of’Mrmtr,qrmm/, 19: 243.268. Porter, L.W. & Steers. R.M. (1973). Organizational, work and personal factors in employee turnover and absenteeism. P.v~c,holo,~ic,ct/ Bulletin. KU: 15 I- 176. Reynolds, H. (1977). The crmrly.ris ofu-mc v-c~lr.s.sijic~tttions. 
New York: The Free Press. Roncek. D. (1993). When will they ever learn that first derivatives identify the effects of continuous independent variables or”Officer, you can’t give me a ticket. I wasn’t speeding for an entire hour.” Socicrl Forc,r.s, 71(4): 1067-1078. Santner. T. & Duffy, D. (1989). Thr .vfcrti.stic~cdcmcrly.sis ofdiscretr dmc. New York: Springer-Verlag. Upton, G. (1978). The tmcr1y.si.sof‘c,r-ass-rtrhulcrtrtl dtrtcr. New York: Wiley. Weisberg. J. & Kirschenbaum, A. (1993). Gender and turnover: A re-examination of the impact of sex on intent and actual job changes. Hlmzcrn Rr1rttion.v. 46(8): 987-1006. White, M., Tansey, R.. Smith, M. & Barnett, T. (1993). Log-linear modeling in personnel research. Prrsonnrl Psyhr~/o,qv 46(3): 667-686. Whhe, M., Tansey, R. & Smith. M. ( 1994). A causal modeling test of the relationship between Chief Executive Officer experience and corporate strategy. Jounz~l c!f’Occuptrtiotttrl md Orgctnixrtionrtl Psyholo~y. 67: 259.278.

Whittemore, A.S. (1978). Collapsibility of multidimensional contingency tables. Journal of the Royal Statistical Society, Series B, 40: 328-340.

Wright, T.A. & Bonett, D.G. (1993). The role of employee coping and performance in voluntary employee withdrawal: A research refinement and elaboration. Journal of Management, 19: 147-162.
