DEVELOPMENTAL REVIEW 7, 66-85 (1987)
Using Latent Class Analysis to Test Developmental Models
DAVID RINDSKOPF
City University of New York Graduate Center
While psychology in general has moved away from using typologies and toward using continua to conceptualize dimensions of behavior, certain aspects of learning and development are still fruitfully considered in typological terms. One reason for the abandonment of typologies was that a small number of types seemed unable to explain the enormous diversity of behavior. The development of statistical models such as latent class analysis has allowed theories involving types to be tested, and the allowance for errors of measurement explains how a small number of types can result in a large number of observed patterns of behavior. This paper demonstrates the application of latent class analysis in several areas of interest to developmental psychologists. © 1987 Academic Press, Inc.
Many years ago, psychological theories based on typologies began giving way to theories based on traits; discrete categories were considered crude approximations to continuous traits. If only measuring instruments were precise enough, it was thought, any characteristic could be measured quantitatively. But many theories, particularly in developmental psychology, have continued to be more closely aligned with type than with trait models. Children can either conserve volume or they cannot; Piagetian theory does not measure conservation on a continuous scale. Many other examples of a similar kind could be found; they have in common the characteristic that people are hypothesized to be in categories, instead of on a continuous scale.
This paper describes the theory and application to developmental psychology of a method for the statistical analysis of data based on the existence of classes or categories of people. The method, called latent class analysis (or latent structure analysis), was developed over 30 years ago, but has only become easy to implement recently with the development of high-speed computers. The simplest latent class models consider only two classes (types) of people, while more complicated models consider more than one trait; there are also models where traits form a hierarchy of development.
The first section of this paper describes the conceptual background necessary for understanding the basic model. Later sections show the application of successively more complicated models to data sets. The final section contains some guidelines for designing studies which can use the methods described.
The author thanks Geoffrey Saxe for providing one of the data sets discussed here. This article is based on a paper presented at a Social Science Research Council Workshop in New York City, in February 1985. Requests for reprints should be sent to David Rindskopf, Ph.D., City University of New York Graduate Center, 33 West 42nd Street, New York, NY 10036.
0273-2297/87 $3.00 Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.
CONCEPTUAL OVERVIEW OF LATENT CLASS ANALYSIS
To start with the simplest possible case, suppose that a theory hypothesizes that there are two kinds of people: those who have some skill, characteristic, or trait; and those who do not. This might be those who have and have not achieved formal operations, for example. If four items designed to measure this characteristic were presented to a group of people, and if each item were a perfect measure of the characteristic, then only two response patterns would be observed. Those who have acquired formal operations would get all of the items correct (which can be represented by the pattern 1111), and those who have not would get every item wrong (represented as 0000).
Unfortunately, items are not all perfect, nor are people's responses to items always perfect reflections of their skill or ability. We would expect, therefore, that some people who have acquired formal operations would miss one or more items due to carelessness, fatigue, misunderstanding the wording of the item, and so on. We also might expect that those who have not acquired formal operations might get one or more items right, perhaps by guessing or cheating, or because the items might allow a correct response to be deduced by means other than those the items are supposed to test. We would then see other patterns than the perfect patterns (0000 and 1111); with errors of measurement any of the 16 response patterns is possible. Historically, this discrepancy between theory and responding has been labeled the “competence-performance” distinction (Zimmerman & Whitehurst, 1979).
When patterns other than the perfect patterns are observed, a theorist has two options. One option is to abandon the theory that there are two kinds of people, and adopt a theory that people fall on a continuous scale. In this case, counting the number of correct items will place each person on the scale.
The other option is to retain the theory that there are two kinds of people, but presume that error processes such as those described above might account for the presence of the other observed response patterns. If there are errors in responses to items, then the actual class to which a person belongs is not certain; it is not directly observed, but is inferred through the pattern of right and wrong answers which contain errors. The “real” class to which people belong is called their latent
(or unobserved) class, from which the name latent class analysis for the statistical method of testing such models is derived.
Latent class analysis would not be very useful if it merely speculated that the occurrence of these “imperfect” response patterns was due to error. Its utility comes from the mathematical model which makes predictions about the way these responses will be distributed. If a particular model is correct, then the number of people displaying each possible response pattern will have a predictable distribution. If this distribution does not occur, then there is evidence that the model is wrong. In other words, latent class models are falsifiable.
PROCEDURAL OUTLINE OF LATENT CLASS ANALYSIS
In this section, the basic steps in latent class analysis are described conceptually. For details on the technical aspects of latent class analysis, see Goodman (1974a, 1974b) and Haberman (1974, 1977, 1979).
The first step in doing a latent class analysis is to specify the model. In the example above, there are two latent classes, and there are four observed variables, all of which are dichotomous. In general, both latent and observed variables must be categorical, but need not be dichotomous. In each sample of people, a certain proportion will be in each latent class. The probabilities of being in each of the classes are called the unconditional latent class probabilities.
In the example described above, for those in the latent class which has attained formal operations, we can consider the probability of answering a particular item correctly. Presumably this would be rather high, but the probabilities may be different for each item; that is, some items are harder than others. There are, therefore, four conditional probabilities of answering items correctly given membership in the latent class which has attained formal operations. Similarly, for those who have not attained formal operations, there are conditional probabilities of answering each of the four questions correctly. These are presumably much lower than the corresponding conditional probabilities for those who have attained formal operations. The unconditional and conditional probabilities considered together are the parameters (i.e., unknown constants which must be estimated) in the statistical model.
Once the model is specified, the parameters must be estimated. Usually this is done using the method of maximum likelihood. Two low-cost computer programs are available for doing the calculations required in latent class analysis. One, called MLLSA (Maximum Likelihood Latent Structure Analysis), was written by Clifford Clogg (1977).
The source code in FORTRAN for another program, called LAT, is listed in an appendix of Haberman (1979). (Although both of these programs are written
in FORTRAN, once they have been installed on a computer system, no knowledge of FORTRAN is necessary except for simple formatting statements in the LAT program. Both programs can be easily installed on either mainframes or microcomputers.) Since the details of the estimation process shed no conceptual light on the general analytic method, they are not discussed here; those interested in pursuing technical details should consult Goodman (1974a, 1974b) or Haberman (1979), and the references they contain.
One general point remains about the specification of the statistical model. While the observed variables are certainly related, the latent class model assumes that they are unrelated within latent classes. That is, in any latent class, errors on each item are independent of errors on other items. This is similar to independence assumptions in other statistical models, such as factor analysis, and merely asserts in a statistical way that the latent classes explain all of the relationships among the observed variables.
After the parameter estimates are obtained, the next important step is to test the fit of the model. That is, a statistical test is done to see whether the observed data are consistent with the statistical model which is being hypothesized. This is done using a χ² test. The expected frequencies for this test are obtained in a simple manner from the parameter estimates. These expected frequencies are compared with the observed frequencies using either a Pearson or likelihood ratio χ² statistic. Although the Pearson statistic is the more familiar one, there are reasons for preferring to use the likelihood ratio statistic in most situations, as will be seen when the comparison of different latent class models is discussed. If the model that is being tested is correct, then the expected frequencies should be close to the observed frequencies, and the χ² statistic will be small. If the model is false, then the expected frequencies will not be close to the observed frequencies, and the χ² value will be large. Large χ² values will lead to the rejection of models as being implausible, since they are unlikely to have generated the observed data, while small χ² values will not lead to rejection of the model.
In order to evaluate the χ² statistic, the critical value from the χ² distribution is needed. To do this, the degrees of freedom must be counted. The degrees of freedom for testing a latent class model are the number of independent observed cell proportions, minus the number of independent parameters estimated. The number of independent proportions observed is 1 less than the number of observed cells in the cross-tabulated data, since the proportions must sum to 1. In the examples above, there are four dichotomous variables, and 16 possible response patterns in the frequency distribution. The number of independent proportions is therefore 16 - 1 = 15.
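The comparison of expected and observed frequencies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter values and the observed counts are hypothetical (they are not taken from any data set in this paper), and the function names are invented for the example. The sketch builds the probability of each of the 16 response patterns from the unconditional and conditional probabilities, assuming independence within classes, and then computes the Pearson and likelihood ratio statistics.

```python
from itertools import product
from math import log


def pattern_probabilities(class_probs, item_probs):
    """Probability of every 0/1 response pattern under a latent class model.

    class_probs[c]   : unconditional probability of latent class c
    item_probs[c][j] : probability of a correct answer to item j in class c
    Responses are assumed independent within a class.
    """
    n_items = len(item_probs[0])
    probs = {}
    for pattern in product((0, 1), repeat=n_items):
        total = 0.0
        for class_prob, p_correct in zip(class_probs, item_probs):
            within = class_prob
            for answer, p in zip(pattern, p_correct):
                within *= p if answer == 1 else 1.0 - p
            total += within
        probs[pattern] = total
    return probs


def pearson_chi2(observed, expected):
    """Pearson statistic; assumes every expected frequency is positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_chi2(observed, expected):
    """Likelihood ratio statistic; cells with zero observed count add nothing."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical parameter values and counts, for illustration only.
class_probs = [0.4, 0.6]
item_probs = [[0.10, 0.10, 0.05, 0.05],   # class lacking the skill
              [0.80, 0.85, 0.95, 0.95]]   # class possessing the skill
# Counts for patterns 0000, 0001, ..., 1111 (N = 100).
observed = [30, 2, 2, 1, 3, 0, 1, 4, 3, 1, 0, 5, 1, 4, 6, 37]

probs = pattern_probabilities(class_probs, item_probs)
n_people = sum(observed)
expected = [n_people * p for p in probs.values()]

print("Pearson chi-square:", round(pearson_chi2(observed, expected), 3))
print("Likelihood ratio chi-square:", round(likelihood_ratio_chi2(observed, expected), 3))
print("Independent cell proportions:", len(probs) - 1)
```

In an actual analysis the parameter values would be the maximum likelihood estimates produced by the fitting program, not fixed values chosen in advance as they are here.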
There is 1 independent unconditional probability in the model, since once that value is known, the other is calculated by subtraction from 1. In general, if there are k classes, then there are k - 1 independent unconditional probabilities, because they must sum to 1. There are 4 independent conditional probabilities in each latent class, corresponding to the probability of getting each item correct given membership in a latent class. The conditional probabilities of missing items are merely 1 minus the probabilities of getting the corresponding items correct. The total number of independent parameters estimated in this model is therefore 1 + 4 + 4 = 9, and the degrees of freedom for testing this model are 15 - 9 = 6.
EXAMPLE: CONSERVATION OF WEIGHT
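The 6 degrees of freedom used for the two-class model in this example follow from the counting rule given in the previous section. A minimal sketch of that rule for an unrestricted model is shown here; the function name `latent_class_df` is an invention for this illustration, and the count ignores any adjustment a program may make for estimates on the boundary.

```python
def latent_class_df(levels_per_item, n_classes):
    """Degrees of freedom and parameter count for an unrestricted latent class model.

    levels_per_item : number of response categories of each observed variable
    n_classes       : number of latent classes
    """
    n_cells = 1
    for levels in levels_per_item:
        n_cells *= levels
    independent_proportions = n_cells - 1          # proportions sum to 1
    unconditional = n_classes - 1                  # class probabilities sum to 1
    conditional = n_classes * sum(levels - 1 for levels in levels_per_item)
    n_parameters = unconditional + conditional
    return independent_proportions - n_parameters, n_parameters


# Four dichotomous items, two latent classes: 15 - (1 + 4 + 4) = 6.
print(latent_class_df([2, 2, 2, 2], n_classes=2))  # (6, 9)
```

A positive count from this rule does not by itself guarantee that every parameter can be estimated uniquely.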
Macready and Macready (1974, reported in Dayton & Macready, 1983) tested the ability of 64 children to conserve weight.¹ There were 25 children whose responses on all four items indicated they conserved weight; 25 other children gave responses on all four items indicating they did not conserve; and the remaining 14 children gave mixed responses. The two-class model appears reasonable given the theory; an empirical test of this model results in a likelihood ratio χ² of 6.742 with 6 degrees of freedom, so the data fit the model reasonably well. The parameter estimates for this model (labeled DM-1) are given in Table 1, along with the estimates for several other models for this data set and the goodness of fit statistics for these models.
In order to tell which class represents conservers and which represents nonconservers, the conditional probabilities of answering each item correctly are examined. For Class 1, these probabilities are low, while for Class 2 they are high; this establishes Class 1 as the class which has not attained conservation of weight and Class 2 as the class which has. Items 3 and 4 are nearly perfect, in the sense that those who can conserve have a probability near 1 of answering them correctly, and those who cannot conserve have probabilities of 0 of answering them correctly. Item 2 is somewhat worse, and Item 1 worse yet, but still useful in spite of the fact that only three out of four conservers answer it correctly.
¹ Dayton and Macready (1983) describe some additional latent class models for this data set.
QUASI-INDEPENDENCE
Even though one model fits the data, there may be other models which fit equally well or better. In the case of the data set described above, there are at least three plausible alternative models to consider. The first such model is called a quasi-independence model. One typical feature of this data set is that a large proportion of subjects
TABLE 1
ANALYSES OF DAYTON AND MACREADY DATA ON CONSERVATION
Parameter estimates
                                        P(right|class)
Model   Class   P(class = i)   Item 1   Item 2   Item 3   Item 4
DM-1    1       .42            .00      .07      .00      .00
        2       .58            .76      .84      1.00     .97
DM-2    1       .39            0*       0*       0*       0*
        2       .35            1*       1*       1*       1*
        3       .26            .33      .63      .87      .81
DM-3    1       .41            .00      .05      .00      .00
        2       .58            .76      .84      1.00     .97
        3       .02            .00      .84      .00      .00
DM-4    1       .42            .00      .07      .00      .00
        2       .37            1.00     1.00     1.00     .96
        3       .21            .33      .56      1.00     1.00
DM-5    1       .42            .00      .07      .00      .00
        2       .06            .00      .07      1.00     .97
        3       .00            .85      .93      .00      .00
        4       .52            .85      .93      1.00     .97
DM-6    1       .42            .00      .07      .00      .00
        2       .06            .00      .07      1.00     .97
        3       .52            .85      .93      1.00     .97
Goodness-of-fit tests
Model   χ² (L-R)   df**
DM-1    6.742      10
DM-2    15.086     9
DM-3    6.742      1
DM-4    0.001      8
DM-5    0.718      8
DM-6    0.717      9
* Fixed parameter. ** Adjusted for parameter estimates on boundary.
gave responses which are considered the “pure” response types for this model. That is, 50 out of the 64 gave consistent conserving or nonconserving responses to all items. This leads to the consideration of a model where there are three classes: conservers, nonconservers, and transitional responders. If it is reasonable to assume that the transitional subjects give independent responses to each item (that is, their response to one item is unrelated to their response to another item), then the statistical model is one of independence among the transitional subjects. This is called quasi-independence, because it is a model of independence of responses among a part of the sample (those who respond consistently are omitted). According to this model, the conservers will consistently
give conserving responses, the nonconservers will consistently give nonconserving responses, and the transitional subjects will respond in an essentially random fashion.
The quasi-independence model just described was tested, and the results are presented in Table 1; the model is labeled DM-2. The parameter estimates show the statistical characteristics of this model: one class has the conditional probabilities of giving a conserving response fixed at zero; this is the class of nonconservers. Another class has the conditional probabilities fixed at one; this is the class of conservers. The remaining class has estimates of the probability that children in that class will get each item correct. Item 1 appears to be the most difficult item. The model has a very large χ² relative to its degrees of freedom, and is therefore implausible. Even though a large proportion of the children fit into the two response patterns indicative of the conserving-nonconserving dichotomy, this model, which allows for some errors in response, does not fit well.
UNRESTRICTED THREE-CLASS MODEL
Even though the quasi-independence model does not fit the data, there may be other three-class models which fit the data better. To start the investigation of other such models, a three-class unrestricted model was fit. By unrestricted, we mean that none of the parameter values is fixed, unlike the quasi-independence model, where conditional probabilities were fixed at 1 in the class which conserved, and at 0 in the class which did not conserve.
The first solution which was obtained is presented in Table 1, labeled as model DM-3. It appears that this model has the same goodness of fit as the two-class model, but this is incorrect as we will see below. This solution is presented to warn of some of the pitfalls in using the computer programs uncritically. It is possible for the computer program to find a solution which is not the optimal one. There are usually warnings given by each program that this has occurred. One is that the program does not converge; that is, it never reaches the optimal solution and it “knows” that this is the case. Another, more subtle problem, which has occurred here, is that some of the parameter estimates are on the border; that is, some estimated probabilities are either 0 or 1. This is not a certain sign that something is wrong, but that caution is needed. One precaution which is recommended in general, but which is particularly useful in cases where there are observed zeros (and therefore greater potential for estimation problems), is to use different sets of initial estimates of the parameter values to see if the same result is found for each. If so, the researcher has greater confidence that the estimates are correct (see Clogg, 1977; Goodman, 1974a).
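The precaution of trying several sets of starting values can be sketched as follows. This is a hedged illustration, not a description of MLLSA or LAT: it uses simple EM-type iterations (an assumption made for the sketch), the function names are invented, and the data are hypothetical. Only the split into 25 consistent conservers, 25 consistent nonconservers, and 14 mixed responders comes from the text; the allocation of the 14 mixed responses to particular patterns is made up for the example.

```python
import random
from math import log


def joint_by_class(pattern, class_probs, item_probs):
    """P(class and pattern) for each class, assuming independence within classes."""
    joint = []
    for class_prob, p_correct in zip(class_probs, item_probs):
        value = class_prob
        for answer, p in zip(pattern, p_correct):
            value *= p if answer == 1 else 1.0 - p
        joint.append(value)
    return joint


def fit_latent_classes(counts, n_classes, seed, n_iter=300):
    """EM-type iterations from one random start; returns (G2, class_probs, item_probs)."""
    rng = random.Random(seed)
    n_items = len(next(iter(counts)))
    n_people = sum(counts.values())
    class_probs = [1.0 / n_classes] * n_classes
    item_probs = [[rng.uniform(0.1, 0.9) for _ in range(n_items)]
                  for _ in range(n_classes)]
    for _ in range(n_iter):
        class_totals = [0.0] * n_classes
        correct_totals = [[0.0] * n_items for _ in range(n_classes)]
        for pattern, n in counts.items():
            if n == 0:
                continue
            joint = joint_by_class(pattern, class_probs, item_probs)
            marginal = sum(joint)
            for c in range(n_classes):
                weight = n * joint[c] / marginal   # expected count in class c
                class_totals[c] += weight
                for j, answer in enumerate(pattern):
                    correct_totals[c][j] += weight * answer
        class_probs = [total / n_people for total in class_totals]
        item_probs = [
            [correct_totals[c][j] / class_totals[c] if class_totals[c] > 0
             else item_probs[c][j]                 # keep old value if class is empty
             for j in range(n_items)]
            for c in range(n_classes)
        ]
    g2 = 2.0 * sum(
        n * log(n / (n_people * sum(joint_by_class(p, class_probs, item_probs))))
        for p, n in counts.items() if n > 0
    )
    return g2, class_probs, item_probs


# Hypothetical counts (N = 64); unlisted patterns have count zero.
counts = {
    (0, 0, 0, 0): 25, (1, 1, 1, 1): 25,
    (0, 0, 1, 1): 4, (0, 1, 1, 1): 5, (1, 0, 1, 1): 2,
    (0, 1, 0, 0): 2, (1, 1, 1, 0): 1,
}

results = [fit_latent_classes(counts, n_classes=3, seed=seed) for seed in range(5)]
for seed, (g2, class_probs, _) in enumerate(results):
    print(seed, round(g2, 3), [round(p, 2) for p in class_probs])
best = min(results, key=lambda result: result[0])
```

If the starts disagree in their likelihood ratio statistics, the larger values indicate solutions that are not optimal; if they agree in fit but differ in parameter estimates, that is a hint that the parameters may not all be identified.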
Using a different set of starting values for the Dayton and Macready data set gives a χ² value of zero, and the parameter estimates shown as Model DM-4 in Table 1.
One problem with an unrestricted three-class model for four dichotomous variables is that the parameters are not all identified. This means that under no circumstances can we get unique estimates of all of the conditional and unconditional probabilities. The best nontechnical explanation for what is happening is that the estimation process involves solving a system of equations, with the unknowns being the parameters (here the unconditional and conditional probabilities). Sometimes it is impossible to solve the equations for all of the unknowns; the most obvious example is when there are more unknowns than equations. That would occur in latent class analysis if the degrees of freedom are negative. But it may also be impossible to solve for all unknowns even when the degrees of freedom are positive, if some equations are redundant; this happens with the unrestricted three-class model for four dichotomous variables.
Since we know the model is not identified, why bother to fit it in the first place? One reason is to generate ideas for restricted three-class models which are identified and might fit the data. Secondarily, the goodness-of-fit statistic for the unrestricted three-class model will indicate whether it is possible for any such restricted model to fit. If the unrestricted three-class model fits very poorly, then no restricted three-class model would fit either.
The solution for the unrestricted three-class model, then, should be inspected with a view toward finding sensible restricted three-class models. The first important point is that all of the unconditional probabilities are nonzero. If any were zero, it would indicate a potential problem in finding the correct parameter estimates.
The conditional probabilities are suggestive of a model with one class of people who always get each item right, another class who always get each item wrong, and a third class who always get Items 3 and 4 correct, but are not consistent on Items 1 and 2. This is, of course, a quasi-independence model, which was already tested and rejected.
To develop further models, we combine the empirical evidence that Items 1 and 2 may be different from Items 3 and 4 (which, at the very least, are easier), with information from Dayton and Macready that Items 3 and 4 deal with animate objects, while Items 1 and 2 deal with inanimate objects. This leads to a model in which there are four classes. One class contains those who do well on all items; a second class contains those who do well only on items dealing with inanimate objects; a third class contains those who do well only on items dealing with animate objects; and a fourth class contains those who do poorly on all items. Such a model is specified by imposing constraints on the conditional probabilities; these constraints are described in detail in Goodman (1974a) and Rindskopf (1983).
The goodness of fit and parameter estimates for this model are listed in Table 1 using the label DM-5. This model fits extremely well; the χ² is near zero. There are 8 degrees of freedom listed for this model instead of 15 - 11 = 4, because the model has one unconditional probability and three independent conditional probabilities whose estimates are either 0 or 1. The computer program treats these as if they were fixed quantities. Even if the degrees of freedom were counted in the usual way, the fit of this model would be excellent.
As was noted above, there is a problem because one unconditional probability was estimated to be zero. That is, it is estimated that there are no people in the third class (people who show conservation when dealing with inanimate objects, but not when dealing with animate objects). This means that a three-class model with the same restrictions would fit just as well. These parameter estimates are shown in Table 1 as model DM-6. They are, indeed, the same as those for the corresponding classes in Model DM-5. Model DM-6 implies that there are those who can conserve, those who cannot, and those who can conserve on items which use animate objects.
HIERARCHICAL MODELS
Theory sometimes dictates that certain skills should be developed in a particular order, so that no one should acquire skill B until skill A has been acquired, and so on. Considering each skill as a dichotomous latent variable, with four such skills (which we will call A, B, C, and D) there are 16 combinations of presence and absence of the skills. If the hierarchical theory is correct, only 5 of these 16 types should exist. Using a 0 to represent that a skill has not been acquired, and a 1 to represent that a skill has been acquired, these five patterns can be represented as 0000, 1000, 1100, 1110, and 1111. Suppose that four items (also called A, B, C, and D) are developed, one of which measures each of the four skills. Because items are not perfect, the conditional probabilities will not all be 0 or 1. However, it should be true that no matter which class people are in, their probability of getting an item correct should be the same as that of people in other classes who are at the same level with respect to the skill needed for that item. For example, only people in the first latent class listed above lack the skill for item A; therefore, people in Classes 2, 3, 4, and 5 all have that skill and should have the same probability of getting item A correct. Even though those in the higher classes have more skills, those skills are irrelevant to answering item A correctly. These considerations result in
the imposition of equality constraints on the conditional probabilities; their nature will be evident from the parameter estimates.
To demonstrate the hierarchical model, and compare it with some other models, a data set kindly provided by Geoffrey Saxe is used. There are four dichotomous items in this data set; the observed frequencies are listed in Table 2. The items are hypothesized to form a hierarchy such as that described above. The first model fit to these data is the hierarchical model; the goodness-of-fit statistics and parameter estimates are in Table 3, where this model is labeled S-1. This model fits the data well. As was noted for some models for the first data set, there are more degrees of freedom than would be expected, because three of the parameter estimates are on the boundaries. There are four independent unconditional probabilities to estimate, and eight independent conditional probabilities, or 12 parameters in all. This would leave 3 degrees of freedom to test the model; the computer program indicates 6 because of the boundary values.
One problem with testing this hierarchical model is that, in general, it is not identified. Rindskopf (1983) discusses ways of making the model identified. One way is to eliminate the second and fourth latent classes, leaving a three-class model in which two classes respond alike to items A and B, and two classes respond alike to items C and D. The results of fitting this model are listed in Table 3 as Model S-2. As expected, the χ² value for testing Model S-2 is the same as that for testing Model S-1. In this model there are 2 more degrees of freedom, because there are two fewer unconditional probabilities to estimate.
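The structure of the hierarchical model can be made concrete with a short sketch. It lists the five admissible skill profiles and builds conditional probabilities that obey the equality constraints, so that the probability of passing an item depends only on whether the class has the corresponding skill. The function names and the numerical values are hypothetical, chosen only to show the pattern of constraints; they are not the estimates for the Saxe data.

```python
def hierarchical_profiles(n_skills):
    """Skill profiles allowed when skills must be acquired in a fixed order."""
    return [tuple(1 if j < k else 0 for j in range(n_skills))
            for k in range(n_skills + 1)]


def constrained_item_probs(profiles, p_without, p_with):
    """P(item j correct | class) depends only on whether the class has skill j."""
    return [[p_with[j] if has_skill else p_without[j]
             for j, has_skill in enumerate(profile)]
            for profile in profiles]


profiles = hierarchical_profiles(4)
print(["".join(map(str, profile)) for profile in profiles])
# ['0000', '1000', '1100', '1110', '1111']

# Hypothetical values: one probability per item for classes without the
# skill, and one per item for classes with it.
p_without = [0.35, 0.20, 0.20, 0.05]
p_with = [0.95, 0.95, 0.90, 0.95]
item_probs = constrained_item_probs(profiles, p_without, p_with)

n_unconditional = len(profiles) - 1              # class proportions sum to 1
n_conditional = len(p_without) + len(p_with)     # two free values per item
n_parameters = n_unconditional + n_conditional
df = (2 ** 4 - 1) - n_parameters
print(n_parameters, df)  # 12 3
```

The count of 12 parameters and 3 degrees of freedom agrees with the figures given in the text before any adjustment for estimates on the boundary.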
TABLE 2
SAXE DATA, AND RESULTS OF FITTING A TWO-CLASS MODEL
Item A: Fail Pass Fail Pass Fail Pass Fail Pass Fail Pass Fail Pass Fail Pass Fail Pass
Item B: Fail Fail Pass Pass Fail Fail Pass Pass Fail Fail Pass Pass Fail Fail Pass Pass
Item C: Fail Fail Fail Fail Pass Pass Pass Pass Fail Fail Fail Fail Pass Pass Pass Pass
Item D: Fail Fail Fail Fail Fail Fail Fail Fail Pass Pass Pass Pass Pass Pass Pass Pass
Observed: 16 2 13 0 6 0 1 0 0 0 2 0 0 0 25
Expected: 4.62 19.03 2.12 11.59 1.05 4.31 .62 7.07 0.0 0.0 0.0 2.05 0.0 0.0 0.0 24.95
Class: 2 2 2 2 2 2 2 1 1
P(class|response): 1.00 1.00 1.00 .97 1.00 1.00 1.00 .64 - 1.00
TABLE 3
ANALYSES OF SAXE DATA
Parameter estimates
                                        P(correct|class)
Model   Class   P(class = i)   Item A   Item B   Item C   Item D
S-1     1       .18            .36      .22      .19      .00
        2       .30            1.00     .22      .19      .00
        3       .12            1.00     1.00     .19      .00
        4       .05            1.00     1.00     .93      .00
        5       .35            1.00     1.00     .93      .98
S-2     1       .48            .76      .22      .19      .00
        2       .12            1.00     1.00     .19      .00
        3       .40            1.00     1.00     .93      .98
S-3     1       .08            0*       0*       0*       0*
        2       .32            1*       1*       1*       1*
        3       .61            .94      .51      .28      .05
S-4     1       .41            1.00     1.00     .92      .85
        2       .59            ?        ?        .19      .00
Goodness-of-fit tests
Model   χ² (L-R)   df
S-1     4.680      6
S-2     4.680      8
S-3     6.634      9
S-4     5.863      9
Note. Degrees of freedom are adjusted for parameters on boundaries. * Fixed parameter. ? Not legible in the source; the extracted fragments for these two cells are "31" and ".37".
If we were to accept this model, the interpretation would be that there are three kinds of people: those who mastered no skills, those who mastered skills A and B, and those who have mastered all skills. While this is apparently not consistent with the hierarchical model we started with, in fact the problem is just that there is not the right kind of data to test the full hierarchical theory. The theory may be right, but with only one item testing each skill, no definitive test of the hierarchical structure is possible.
Even though the model originally suggested has been tested, it is wise to test simpler models also to be sure that no sensible, more parsimonious model fits the data. Two such possibilities in this case are the quasi-independence model and the unrestricted two-class model. The results of fitting these models are presented as Models S-3 and S-4 in Table 3. Both models fit the data well, and on grounds of parsimony the two-class model would probably be preferred by most theorists. Table 2 shows, for the two-class model, the probability of a person with
each response pattern being in the most likely latent class for that response pattern. This assignment to latent classes is easily done using Bayes’ theorem, and is included in the output of the MLLSA program.
MODELS FOR PANEL DATA
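Before turning to panel data, the assignment of response patterns to latent classes by Bayes’ theorem, mentioned at the end of the preceding section, can be sketched briefly. The parameter values below are hypothetical (they are not the estimates for the Saxe data), and the function name is invented for the illustration; the sketch is not a description of the MLLSA output.

```python
def posterior_class_probs(pattern, class_probs, item_probs):
    """P(class | response pattern) by Bayes' theorem, assuming local independence."""
    joint = []
    for class_prob, p_correct in zip(class_probs, item_probs):
        likelihood = class_prob
        for answer, p in zip(pattern, p_correct):
            likelihood *= p if answer == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)
    if total == 0.0:
        raise ValueError("pattern has probability zero under the model")
    return [value / total for value in joint]


# Hypothetical two-class model for four items (A, B, C, D).
class_probs = [0.4, 0.6]
item_probs = [[0.95, 0.95, 0.90, 0.85],   # class that has mastered the skills
              [0.80, 0.30, 0.20, 0.02]]   # class that has not

for pattern in [(1, 1, 1, 1), (1, 1, 1, 0), (1, 0, 0, 0), (0, 0, 0, 0)]:
    posterior = posterior_class_probs(pattern, class_probs, item_probs)
    most_likely = posterior.index(max(posterior)) + 1
    print(pattern, "class", most_likely, round(max(posterior), 2))
```

Each response pattern is assigned to the class with the largest posterior probability, and that probability indicates how confident the assignment is.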
Models discussed in previous sections were all cross-sectional; each person was measured only once. When such data are used to test developmental models, it is assumed that people develop skills or abilities in a certain order, and that all people go through each step in the sequence. Measuring people more than one time allows some of these assumptions to be tested. One common, simple longitudinal design is called the cross-panel design. People are asked a series of questions at one time, and the same questions again at a second time. In this section, we investigate some models for such a study, in which two dichotomous items are asked (measured) at two times.
As an example, consider a data set reported in Landis and Koch (1979). A sample of 354 children were measured at two time points (T1 and T2) on two developmental attributes (A1 and A2), each of which was either present or not at each of the two times. We consider three types of models for these data, along with other related models suggested by the results of the three basic analyses. The simplest model is that there are two latent classes, that both items measure the trait represented by the two classes, and that children remain in the same class at both times of measurement. A second, more complicated, model is that the items both measure the same characteristic (which is either present or not), but that children’s status on this characteristic may change from Time 1 to Time 2. There would then be four latent classes, with the latent characteristic being either present or absent at Time 1, and again at Time 2. A third model is that the two items measure different (but possibly related) traits, each of which is either present or not, but that there is stability over time in each trait. That is, if a child has the trait measured by Item A1 at Time 1, then he or she also has the trait at Time 2 (and similarly for the absence of the trait, and for the trait measured by A2).
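The latent classes implied by the second and third models can be enumerated in a short sketch, which also shows how each model restricts the conditional probabilities. The function names and the probability values are hypothetical, used only to display the pattern of restrictions; they are not estimates from the Landis and Koch data.

```python
from itertools import product

# The four observed variables: (attribute, time of measurement).
ITEMS = [("A1", "T1"), ("A2", "T1"), ("A1", "T2"), ("A2", "T2")]


def change_model(p_absent, p_present):
    """One trait whose status may differ at T1 and T2.

    Classes are (status at T1, status at T2); each item depends only on the
    status at the time the item was measured.
    """
    table = {}
    for t1, t2 in product((0, 1), repeat=2):
        status = {"T1": t1, "T2": t2}
        table[(t1, t2)] = [
            p_present[item] if status[item[1]] else p_absent[item] for item in ITEMS
        ]
    return table


def stable_two_trait_model(p_absent, p_present):
    """Two stable traits, one per attribute.

    Classes are (has trait for A1, has trait for A2); each item depends only
    on the trait belonging to its own attribute.
    """
    table = {}
    for a1, a2 in product((0, 1), repeat=2):
        status = {"A1": a1, "A2": a2}
        table[(a1, a2)] = [
            p_present[item] if status[item[0]] else p_absent[item] for item in ITEMS
        ]
    return table


# Hypothetical probabilities of a positive response.
p_absent = {item: 0.1 for item in ITEMS}
p_present = {item: 0.9 for item in ITEMS}

for label, model in (("change", change_model), ("stable", stable_two_trait_model)):
    print(label)
    for latent_class, probs in model(p_absent, p_present).items():
        print("  ", latent_class, probs)
```

In both models there are four latent classes, but the equality restrictions fall on different pairs of classes: by time of measurement in the first function, and by attribute in the second.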
As for the second model, this model assumes four latent classes: people either do or do not have the trait measured by Item A1, and either do or do not have the trait measured by Item A2, giving 2 × 2 = 4 combinations. The second and third models both are specified by making restrictions on the conditional probabilities.
The results of fitting these models to the Landis and Koch data are reported in Table 4. The two-class model (labeled LK-1 in the table) has a large χ² value and is rejected as being improbable for these data.
TABLE 4
ANALYSES OF LANDIS AND KOCH DATA
Parameter estimates
                                     P(+|class)
Model   Class   P(class = i)   A1T1   A2T1   A1T2   A2T2
LK-1    1       .50            .78    1.00   .89    1.00
        2       .50            .02    .46    .00    .58
                                                            A1   A2
LK-2    1       .00            .78    .46    .89    .58     +    -
        2       .50            .02    .46    .00    .58     -    -
        3       .50            .78    1.00   .89    1.00    +    +
        4       .00            .02    1.00   .00    1.00    -    +
                                                            T1   T2
LK-3    1       .36            .03    .00    .24    .41     -    -
        2       .00            .79    .90    .24    .41     +    -
        3       .15            .03    .00    1.00   1.00    -    +
        4       .49            .79    .90    1.00   1.00    +    +
Goodness-of-fit tests
Model   χ² (L-R)   df
LK-1    33.908     9
LK-2    33.914     6
LK-3    6.635      5
Note. Parameter estimates on boundaries were ignored in computations of degrees of freedom.
Model LK-2 specifies that there are two traits, one measured by Item A1, the other by A2, and that these traits are either present or absent. The combinations of presence and absence of the two traits are indicated by + and - signs in the right margin of the table for this model. The model does not fit the data well, but the two conditional probabilities of zero indicate that caution is needed. Several different sets of starting values were tried to see if the χ² could be decreased, all without success; in each case, the solution reported in the table was obtained. This model is evidently not supported by the data.
The third model, labeled LK-3 in Table 4, is that there is one dichotomous trait measured by both variables, and that people may change status on that trait between Time 1 and Time 2. The meaning of each of the four latent classes is indicated by the columns in the right margin labeled T1 and T2, in which + and - signs are used to represent presence and absence of the trait at each time. This model fits the data very well, as indicated by the small χ² value. Examination of the parameter estimates reveals that one of the latent classes has no people; that is, the unconditional probability is zero. This is the class which has the trait at Time 1, but not at Time 2. We conclude that “regression” (from possession of the trait to lack of it) does not
occur. To properly finish off this analysis, we would fit the three-class model, which omits the second class from this model. These results are not reported, as the goodness-of-fit and parameter estimates are the same as those reported for this model, except, of course, that Class 2 is not included.

MULTIPLE GROUP MODELS
All models discussed so far involved one group of people, whether measured at one time (cross-sectional data) or at more than one time (longitudinal data). In this section we consider the case where more than one group of subjects is measured. The groups might consist of subjects of different ages, racial or ethnic groups, geographic areas, sexes, and so on.

Models for multiple groups are generally very similar to models for a single group, except that the latent classes are presumed to exist in each group to possibly different extents. That is, one group may have a larger proportion of its members in a latent class than does another group. In the extreme case, some groups may not have any of their members in certain latent classes, thus further restricting the model.

Other types of restrictions may also be tested for multiple group models: it could be reasonable to assume that the conditional probabilities are equal across groups, even though the proportions of people in each class vary across groups. In other cases, it might be expected that the proportions in each class do not vary across groups (i.e., that the distribution of a trait is invariant across populations). For a test of this, we would have to obtain a random sample from each population, since the unconditional probabilities are affected by the sampling scheme.

As an example of a multi-group data set, we present some analyses of the responses of a group of high school students to the same question asked at two different times (the 9th and 12th grades). These data were analyzed using different procedures by Marascuilo and Serlin (1979). The 1652 students were asked whether the following statement was true or false: “The most important qualities of a husband are determination and ambition.” Five racial/ethnic groups were represented in the study (Asian, black, Chicano, (American) Indian, and white).
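The multiple-group structure described above can be sketched in a few lines of Python. This is only an illustration, not the estimation procedure used in this paper: the group names, class proportions, and conditional probabilities are invented for the example, and the function name `pattern_probabilities` is hypothetical. The sketch assumes two latent classes, two dichotomous items, conditional probabilities that are equal across groups, and class proportions that differ by group.

```python
from itertools import product

# Hypothetical values chosen only for illustration (not estimates from any data set).
# P(class | group): the class proportions are allowed to differ across groups.
class_props = {
    "group_a": [0.5, 0.5],
    "group_b": [0.2, 0.8],
}
# P(positive response | class) for each item, assumed equal in every group.
# Rows are latent classes, columns are items.
cond_probs = [
    [0.3, 0.1],  # class 1
    [0.7, 0.9],  # class 2
]


def pattern_probabilities(props, cond):
    """Return P(response pattern | group) for every pattern of 0/1 responses.

    Responses are assumed independent within a latent class, so the
    probability of a pattern is a mixture over classes of products of
    item probabilities.
    """
    n_items = len(cond[0])
    result = {}
    for pattern in product([1, 0], repeat=n_items):
        total = 0.0
        for class_prop, item_probs in zip(props, cond):
            within = 1.0
            for response, p in zip(pattern, item_probs):
                within *= p if response == 1 else 1.0 - p
            total += class_prop * within
        result[pattern] = total
    return result


tables = {g: pattern_probabilities(p, cond_probs) for g, p in class_props.items()}
for group, table in tables.items():
    print(group, {k: round(v, 3) for k, v in table.items()})
```

Because only the class proportions differ between the two hypothetical groups, any difference between their response-pattern distributions is attributable to the latent class distribution, which is the kind of restriction discussed above.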
While there are a number of plausible latent class models for measurements made at two times (as described in the previous section), the analysis of this data set is severely limited because only one question was asked at each time point. Specifically, we cannot test whether there was a change in attitude over time without making some strong assumptions about restrictions in certain parts of the model. We will consider, out of necessity rather than choice, models which assume stability of the trait
TABLE 5
ANALYSES OF MARASCUILO AND SERLIN DATA

Parameter estimates
                      P(class = i | group)                  P(true | class)
Model  Class   Asian  Black  Chicano  Indian  White     Time 1  Time 2
MS-1   1       .49    .28    .24      .20     .55       .31     .01
       2       .51    .72    .76      .80     .45       .70     .82
MS-2   1       .32    .32    .32      .32     .32       .38     .20
       2       .68    .68    .68      .68     .68       .83     1.00
MS-3   1       .54    .24    .24      .24     .54       .31     .01
       2       .46    .76    .76      .76     .46       .70     .82
MS-4   1       .50    .28    .24      .19     .56       .30     .04
       2       .11    .22    .14      .35     .14       .30     .80
       3       .39    .49    .61      .46     .30       .89     .80

Goodness-of-fit test
Model   χ² (LR)   df
MS-1     8.560     6
MS-2    98.732    10
MS-3    11.840     9
MS-4     2.518     1
from one time to the next, so that the items represent independent assessments of a stable attitude.

The first model tested* assumes that there are two latent classes. Because of the nature of the item, these classes might be labeled traditional (or conservative) and nontraditional attitudes toward sex roles. The results of fitting the unrestricted two-class model are presented in Table 5, where the model is labeled MS-1. This model fits the data well, as indicated by the small χ² value relative to the degrees of freedom. The conditional probabilities of responding “true” to the question at each time are in the last two columns of the table. Latent Class 1 has a lower probability of answering “true” than does Latent Class 2, which means that Class 1 is the nontraditional class, and Class 2 is the traditional class. There is a tendency toward polarization over time: the traditional class is more likely to answer in a traditional way at Time 2, while the nontraditional class is more likely to answer in a nontraditional way at Time 2. These tendencies could be tested by fitting a model with restrictions on the conditional probabilities.

* The estimation of all previous models was done using either the MLLSA program of Clogg (1977), or the LAT program of Haberman (1979), both of which were described above. The MLLSA program is able to do analyses of multiple group models, but cannot make some of the restrictions necessary to fit some of the models reported in this section. The LAT program should be able to fit all of these models, but it would not converge to a proper solution in most instances for this data set. Therefore, to fit some of the models reported in this section, the likelihood function for each model was written in a FORTRAN subroutine, and the IMSL procedure ZXMIN was used to maximize the likelihood function.

The unconditional probabilities in this table are presented in a different fashion from previous models, because of the presence of several groups. Here, the unconditional probabilities are presented separately for each group; that is, they are really conditional on group membership. This is much more informative in the multiple group case than simple unconditional probabilities. In examining the unconditional probabilities, it is obvious that there are differences among the groups in the proportion of people in each group who have traditional rather than nontraditional values. The Asians and whites are split approximately in half, while the blacks, Chicanos, and Indians are heavily traditional.

To further refine the model, we test whether it is plausible to assume that all of the groups have the same distribution in the latent classes. From our examination of the unconditional probabilities, this would not seem to be so, and this is confirmed in the analysis of Model MS-2 reported in Table 5. Next, the model which states that Asians and whites are distributed similarly, and blacks, Chicanos, and Indians are distributed similarly, was tested. This model, labeled MS-3 in Table 5, fits the data; additionally, the fit is not significantly worse than that of the unrestricted model, MS-1.

Another model was fit to attempt to investigate whether there was change from Time 1 to Time 2. As indicated above, the most general of such models cannot be tested. A three-class model was tested in which one class was traditional at both times, another class was nontraditional at both times, and a third class was different at the two times. (Whether it changed from traditional to nontraditional or the reverse is difficult to specify in advance in these models; one must examine the results to find out which occurred.)
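The comparisons of Models MS-2 and MS-3 with the unrestricted Model MS-1 can be checked from the likelihood-ratio statistics in Table 5. The sketch below subtracts the χ² values and degrees of freedom of two nested models and computes an upper-tail probability for the difference. The function names `chi2_sf` and `compare_nested` are hypothetical, and the tail probability uses a standard closed-form series for integer degrees of freedom; it is not taken from any of the programs used in this paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided normal tail plus a finite series in odd powers of sqrt(x).
    root = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term = root
    total = root if df >= 3 else 0.0
    for r in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


def compare_nested(restricted, general):
    """Difference test for two nested models, each given as (LR chi-square, df)."""
    diff = restricted[0] - general[0]
    diff_df = restricted[1] - general[1]
    return diff, diff_df, chi2_sf(diff, diff_df)


# Likelihood-ratio chi-square and df as reported in Table 5.
ms1 = (8.560, 6)
ms2 = (98.732, 10)
ms3 = (11.840, 9)

for label, model in (("MS-2", ms2), ("MS-3", ms3)):
    diff, diff_df, p_value = compare_nested(restricted=model, general=ms1)
    print(f"{label} vs MS-1: difference = {diff:.3f}, df = {diff_df}, p = {p_value:.3g}")
```

A large tail probability indicates that the added restrictions do not significantly worsen the fit; a very small one indicates that they do.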
The results show that this model, labeled MS-4 in Table 5, fits well. This is not surprising, since the unrestricted two-class model also fits well; the issue is whether the fit of the three-class model is significantly better than that of the two-class model. Because the two-class model is a special case of the three-class model, we can directly compare their fit. By “special case,” we mean that by putting restrictions on the three-class model (in this case, setting the parameters for the third class equal to zero) we get the two-class model. When this happens, the fit of the models can be compared by subtracting the likelihood ratio χ² values; the difference has a χ² distribution with degrees of freedom equal to the difference in degrees of freedom for the models. (This is the main reason for preferring the likelihood ratio χ², since such comparisons cannot be done using the Pearson χ².) The difference in χ² values between Models MS-1 and MS-4 is 6.04 with 5 degrees of freedom, so the three-class model does not fit significantly better than the two-class model. Additionally, the three-class model is not identified, while the other models are all identified. One could try to make further restrictions on this model, such as was done for Model MS-3, to increase degrees of freedom and possibly make the model identified.

RELATIONSHIP OF LATENT CLASS ANALYSIS TO OTHER METHODS OF ANALYSIS
Latent class analysis is related to other multivariate analysis methods which are more familiar to most researchers. It is related to log-linear models, and can be thought of as a log-linear model where one variable, the latent class to which people belong, is unobserved. If we let X represent the latent class variable, and A, B, and C represent observed variables, then the log-linear representation of the latent class model is [XA] [XB] [XC]. That is, X is related individually to A, B, and C; but A, B, and C are conditionally independent, given X. This is what is meant by the statement that X “explains the relationship among A, B, and C.” Some of the more complicated models discussed earlier are not easily expressed in these terms.

Latent class analysis is also related to other latent variable methods. While latent class analysis deals with categorical latent and observed variables, factor analysis involves continuous latent and observed variables. Latent trait models have continuous latent variables, but categorical (usually dichotomous) observed variables. In many cases, cluster analysis can be conceptualized as involving a categorical latent variable, but continuous observed variables.

DESIGN CONSIDERATIONS WHEN USING LATENT CLASS ANALYSIS
As with any type of study, many design issues must be faced by a researcher who intends to use latent class analysis to investigate developmental models. All of these issues would occur in any study using latent class analysis, but some are more likely to occur in developmental studies than in other types of studies. This section describes these issues, including problems and limitations of latent class analysis. Sample size is an important consideration in many studies, and more so with the use of latent class analysis to test developmental models, because many studies in this area have used sample sizes which might be considered too small for the valid use of latent class analysis, but are considered relatively large in some areas of developmental research. Two of the studies reported in this paper had sample sizes under 100. These are as small as any analyst could feel comfortable with, and smaller than some would recommend. With such a small sample, one problem which pervades all studies is lack of power for statistical tests. In this case, the
power of the goodness-of-fit test is lower for small sample sizes, making it less likely that an incorrect model will be rejected. In the examples presented here, one check on this is that there were plausible models which were rejected; this is direct evidence that the power was at least minimally acceptable.

The small sample size also causes problems which are not shared with more traditional models. One is that the goodness-of-fit statistics are only good approximations to χ² when the sample size is large, and when the expected frequencies are not too small. While the traditional criterion that most or all expected frequencies be greater than 5 is probably too strict, if several expected values less than 2 are found, the accuracy of the approximation of the goodness-of-fit statistic is questionable. (Expected cell frequencies of 0 are not a problem, because they are not used at all in calculating the χ² statistic.) There are instances in the data sets analyzed here where several of the expected values were less than 3, and in these cases one might question the accuracy of the fit statistics.

A problem which is related to small expected frequencies is that of the number of cells in the table analyzed, which is a function of the number of variables and the number of categories which each variable has. When the number of variables and/or the number of categories of variables is large, then there will be a large number of cells in the frequency table, which will usually result in at least some cells with few or no observations. Any model which fits the data well will, of course, have small expected values for these cells. To get an idea of how many variables and categories can be handled, consider first a study with dichotomous items. If there are 4 items, there are 16 frequencies; with 6 items, 64 frequencies; and with 8 items, 256 frequencies. Even with a respectable total sample size, there will be many small cells with 8 items.
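The arithmetic behind these counts can be made concrete. The sketch below computes the number of cells in the cross-tabulation and the average number of observations per cell; the function name `table_cells` and the sample size of 100 are illustrative assumptions, not values tied to a particular data set in this paper.

```python
from math import prod


def table_cells(categories_per_variable):
    """Number of cells in the full cross-tabulation of the given variables."""
    return prod(categories_per_variable)


sample_size = 100  # illustrative sample size, assumed for this example only

for n_items in (4, 6, 8):
    cells = table_cells([2] * n_items)  # dichotomous items
    print(f"{n_items} items: {cells} cells, "
          f"{sample_size / cells:.2f} observations per cell on average")
```

The average is only a rough guide, since expected frequencies under a fitted model are usually far from uniform, but it shows how quickly the table outgrows a sample of this size.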
As indicated above, if expected values are zero there is no problem, but if they are nonzero and small, there may be a problem. Next consider variables which have four response categories. With four such variables, there are 256 frequencies in the cross-tabulated data. Not only is this a problem because of small frequency counts, but also because variables with many categories may be hard to structure. Some guidance on possible restrictions on conditional probabilities with variables which have ordered categories is provided by Clogg (1979). An issue which is primarily statistical in nature, but which has important design implications, is the identification status of parameters of a model, and of the model as a whole. If a model is not identified, then the theory which has generated the model is not testable in the confines of the design. Sometimes this can be prevented by including the right number or kind of variables in the study; experimenters must consider such issues if they are collecting data specifically for the purpose of
testing particular models. The necessary conditions for identification are usually a matter of simple counting, but the sufficient conditions are often much more complicated. When the identification status of a model is not known, the simplest procedure is probably to invent an artificial data set which is consistent with the model, and run it through a program (such as MLLSA) which will provide information on the identification status of the model.

Many design considerations are common to the testing of all developmental models and are not unique to testing latent class models. For example, one must choose between cross-sectional and longitudinal studies. Cross-sectional studies are usually much easier to implement, but require stronger assumptions than longitudinal studies. One must assume that the classes which are observed mirror stages of development and that the order of progression through the hierarchy is known. For example, one must know or assume that children will not regress from a stage of formal operations to a stage where they no longer have formal operational thought capability. The selection of the right age ranges to include in a study is important to be sure that the full range of development is included, so no stages will be missed. Selection of extremes could lead to accepting a latent class model, when a continuum model may be more appropriate.

Even with the limitations of latent class analysis, the examples given show that it is a technique which has great promise for investigating one type of theory which is commonly found in developmental models. A wide variety of sensible models exists, and computer programs are widely available for testing these models. Latent class analysis allows the formal testing of what might otherwise remain vague theories. With adequate planning, studies can shed light on the reasonableness of typological theories of development.

REFERENCES

Clogg, C. C. (1977). Unrestricted and restricted maximum likelihood latent class analysis: A manual for users. University Park, PA: Population Issues Research Office.
Clogg, C. C. (1979). Some latent structure models for the analysis of Likert-type data. Social Science Research, 9, 287-301.
Clogg, C. C. (1981). New developments in latent structure analysis. In D. J. Jackson & E. F. Borgatta (Eds.), Factor analysis and measurement in sociological research: A multi-dimensional perspective. Beverly Hills, CA: Sage.
Dayton, C. M., & Macready, G. B. (1983). Latent structure analysis of repeated classifications with dichotomous data. British Journal of Mathematical and Statistical Psychology, 36, 189-201.
Goodman, L. A. (1974a). The analysis of qualitative variables when some of the variables are unobservable. Pt. I. A modified latent structure approach. American Journal of Sociology, 79, 1179-1259.
Goodman, L. A. (1974b). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.
Haberman, S. J. (1974). Log-linear models for frequency tables derived by indirect observation. Maximum-likelihood equations. Annals of Statistics, 2, 911-924.
Haberman, S. J. (1977). Product models for frequency tables involving indirect observation. Annals of Statistics, 5, 1124-1147.
Haberman, S. J. (1979). Analysis of qualitative data, Vol. 2: New developments. New York: Academic Press.
Landis, J. R., & Koch, G. G. (1979). The analysis of categorical data in longitudinal studies of behavioral development. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal research in the study of behavior and development. New York: Academic Press.
Macready, C., & Macready, G. B. (1974). Conservation of weight in self, others, and objects. Journal of Experimental Psychology, 103, 372-374.
Marascuilo, L. A., & Serlin, R. C. (1979). Tests and contrasts for comparing change parameters for a multiple sample McNemar data model. British Journal of Mathematical and Statistical Psychology, 32, 105-112.
Rindskopf, D. (1983). A general framework for using latent class analysis to test hierarchical and nonhierarchical learning models. Psychometrika, 48, 85-97.
Zimmerman, B. J., & Whitehurst, G. J. (1979). Structure and function: A comparison of two views of the development of language and cognition. In G. J. Whitehurst & B. J. Zimmerman (Eds.), Functions of language and cognition. New York: Academic Press.

RECEIVED: February 19, 1986; REVISED: August 11, 1986.