2. The Statistical Models
In this chapter, the two statistical models used in the remaining chapters of the monograph will be developed in some detail. Readers interested more in the actual results of the study of Bombay education than in the technical aspects of the methods of analysis of the data are advised to skip this chapter. All of the following chapters can be read without knowledge of the contents of the present one. Two models will be presented, the extended logistic model based on the multinomial distribution and the Poisson distribution model.
Few of the variables to be analysed in this study are measures on a continuous or even on an ordinal scale. Those that are will be found in fact to contain dichotomous breaks. Many variables are constructed as sets of categories which are strictly nominal, without even an ordering among them. Examples of such polytomous variables include social class, religion, caste, mother tongue, school sponsorship, language of instruction, and so on. For the study of such variables, and to allow us to discover possible discontinuities in any ordered or metric variables instead of building necessary continuity into the model, a model with the minimum of assumptions about the structure of the relationships among the categories of a variable is required. For example, social class measured on a socio-economic status scale pre-ordains that all the categories (say on a scale from 1 to 9) have a fixed order, and, in addition, are an equal distance apart. Here, we refuse these assumptions about the social structure. Although they are methodologically convenient in that they allow application of such techniques as multiple regression, they automatically exclude most of the interesting questions about a society by a priori fiat. On the other hand, the model which we shall use has multiple regression as a special case; we could always turn to that model if it were found to be appropriate.
Hence, we assume only that the categories of a variable have names applied to them; no mathematical relationships can be constructed among the categories. The only type of mathematical statement which can be made about the categories is one about the relative probability of an individual falling into each of them. This is most conveniently studied by choosing any pair from the set of mutually exclusive and all-inclusive categories forming a variable and taking the ratio of probabilities, or the odds, of the individual falling in the first versus the second. This will be the basis of our statistical model, the extended logistic model. The only problem arises when a variable has more than two categories: a considerable number of different pairs can be chosen for comparisons. Since this is the case, it is more convenient to compare all of the probabilities to some overall mean; since ratios are being used, the geometric mean is appropriate. If k is an index labelling the categories, of which there are K, and p(k) is the probability of an individual falling into category k, then our new set of odds ratios is:
Evaluation
in Education
$$\frac{p(k)}{\bar{p}} \tag{1}$$
where $\bar{p}$ is the geometric mean of the K p(k)'s. If this odds ratio is greater than unity for a given k, the probability of falling into the category k is greater than average and vice versa for less than unity. Unfortunately, this is asymmetric, since the first case has a much greater range of possible results (one to infinity) than the second (zero to one). Thus, for further development of the model, it is more useful to take the natural logarithm:
$$\log\frac{p(k)}{\bar{p}} \tag{2}$$
Now, if this expression is positive, the probability is greater than average; if negative, less than average. If we wish to compare two categories, k and l, we simply subtract the corresponding values of expression (2):
$$\log\frac{p(k)}{\bar{p}} - \log\frac{p(l)}{\bar{p}} = \log\frac{p(k)}{p(l)} \tag{3}$$
which is also an easier operation for the general reader than dividing values from expression (1) and which provides less misleading results because of the symmetry mentioned above. The most important result of comparing all probabilities to the mean is that it forces one to think in terms of relationships among the categories. No value has meaning except in relation to the others.
The preceding development applies to one variable, which we shall now take as the dependent variable in our model. We shall assume that the independent variable likewise has been constructed as a set of categories with only names applied to them. Within each category of this independent variable, the distribution of individuals among the categories of the dependent variable may be assumed to be different. With I categories, the distribution in category i will be described by the set of probabilities p(i,k) with geometric mean $\bar{p}(i)$, so that expression (2) becomes:
$$\log\frac{p(i,k)}{\bar{p}(i)} \tag{4}$$
We wish to describe how this distribution for categories of the dependent variable changes with the different values of the independent variable. Consider again expression (1), but now for category i of the independent variable. Suppose that the relative probability p(i,k) of being in category k, as compared to the average, $\bar{p}(i)$, is decomposed into an average relative probability, A(k), of being in category k, which is the same for all values of the independent variable, and a factor which is specific to the category i, B(i,k):
$$\frac{p(i,k)}{\bar{p}(i)} = A(k) \times B(i,k) \tag{5}$$
In terms of expression (2), this becomes
$$\log\frac{p(i,k)}{\bar{p}(i)} = a(k) + b(i,k) \tag{6}$$
where a(k) = log A(k) and b(i,k) = log B(i,k).
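The decomposition in equations (5) and (6) can be illustrated numerically. The following is only a minimal sketch, not the estimation procedure used in the study: it assumes that a(k) is the unweighted average of the centred log-ratios over the categories i, that every cell of the table is non-zero, and it uses a small hypothetical frequency table. The function names are illustrative.

```python
import numpy as np

def centred_log_ratios(counts):
    """Return log(p(i,k) / p_bar(i)) for every cell, as in expression (4)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)       # p(i,k) within category i
    log_p = np.log(p)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return log_p - log_p.mean(axis=1, keepdims=True)

def decompose(counts):
    """Split the centred log-ratios into a(k) + b(i,k), as in equation (6)."""
    z = centred_log_ratios(counts)
    a = z.mean(axis=0)    # assumption: unweighted average over the categories i
    b = z - a             # part specific to category i
    return a, b

# Hypothetical frequencies: rows are categories i of the independent variable,
# columns are categories k of the dependent variable. All cells must be non-zero.
table = np.array([[30, 10, 40, 20],
                  [12, 48, 24, 16],
                  [25, 25, 5, 45]])
a, b = decompose(table)

# Odds of category k=0 versus l=1 within row i=2, rebuilt from the parameters.
i, k, l = 2, 0, 1
odds = np.exp(a[k] - a[l]) * np.exp(b[i, k] - b[i, l])
print(round(float(odds), 3), table[i, k] / table[i, l])
```

Under these assumptions, the rebuilt odds equal the odds read directly from the table, and the rows and columns of b each sum to zero.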
Primary Education in Bombay
In concrete terms, if we compare categories k and l of the dependent variable,
$$\frac{A(k)}{A(l)} = e^{a(k)-a(l)}$$
gives the average odds of being in category k versus l, where the average is over all categories of the independent variable. Then,
$$\frac{B(i,k)}{B(i,l)} = e^{b(i,k)-b(i,l)}$$
is the factor by which these average odds must be multiplied in order to obtain the corresponding odds for the category i of the independent variable. This matrix of values, b(i,k) or B(i,k), provides us with a means of studying the relationships among categories of the two variables simultaneously. This is the type of matrix, the b(i,k)'s, which is presented in the tables throughout Chapters 3 and 4. Since the values of B(i,k) are factors multiplying a geometric mean, their product, over k for fixed i and vice versa, will be unity. In the same way, the sum of the values of b(i,k) over k for fixed i, or vice versa, will be zero: as presented in the tables, the sum of any row or column will be zero.
In order to place ourselves on firmer ground, let us consider how the model works in a concrete case. Consider the two-way frequency table for social class as it depends on religion and varna given in Table 2.1. This is the data from which the results in Table 3.4, the first table in the next chapter to use the model, were calculated. Both variables have strictly nominal categories, which could be re-ordered in any way without changing the results shown by the model.
Table 2.1. Frequency table of social class by religion and varna (N = 4081)
Columns (religion and varna): Hindu (Brahman, Kshatriya, Sudra, Vaishya, Untouchable, Tribal); Moslem; Christian; Jain, Parsi, Sikh
Rows (social class): Lumpenproletariat; Wage Labourers; Artisans; Shopkeepers; White Collar Workers; Ideological Occupations
[Cell frequencies are illegible in the source text.]
In Table 2.1, the odds of a Moslem being a shopkeeper versus an artisan are 181/129 = 1.4 to one. Our model decomposes this into an average odds, for all religion/varna groups, of being a shopkeeper versus an artisan, and a factor specific to the Moslems. From the mean vector in Table 2.2, the average odds are exp (a(3)-a(4)) = exp (0.004+0.859) = 2.4 to one, as obtained from the table in Appendix 2. The factor specific to the Moslems is exp (b(7,3)-b(7,4)) = exp (0.503-1.027) = 0.6, again obtained from Appendix 2. Thus, our decomposition of the odds ratio is 1.4 = 2.4 x 0.6. For the study of relationships,
the important figure to retain is 0.6: the odds of a Moslem being a shopkeeper rather than an artisan are 0.6 times the average odds for all religion and varna groups. The original odds of 1.4 tell us nothing about how Moslems relate to the other groups with respect to these two social classes.
As a second example, let us compare the lumpenproletariat and the ideological occupations. For Brahmans, the odds of being in the ideological class as opposed to the lumpenproletariat are 88/6 = 14.7 to one. The average odds are exp (a(6)-a(1)) = exp (-0.392+0.488) = 1.1 to one; the factor specific to Brahmans is exp (b(1,6)-b(1,1)) = exp (1.240+1.349) = 13.4. Thus, Brahmans have 13.4 times the average odds of being in the ideological class instead of the lumpenproletariat. Compare this with the Vaishya, for which the specific factor is exp (b(3,6)-b(3,1)) = exp (0.249+0.481) = 2.1, so that they have only 2.1 times the average odds. Obviously, if these and other groups are above average, still other groups must be correspondingly below average.
For the study of relationships, only the matrix of values, b(i,k), is necessary; throughout the text, only this is given and not the vector of means. This is the translation of the principle of conditioning on the marginals of a frequency table into parametric terms, which is not done when a table of percentages is given. The presentation of the relationships in terms of the matrix, b(i,k), as in Table 3.4, allows the reader to study in concrete detail any of the possible pairs of relationships, as illustrated by the examples just given. However, once some familiarity with the extended logistic model has been acquired, direct interpretation from the table of values of b(i,k) is possible, and even an intuitive examination of the table without technical knowledge of the model will not be misleading.
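The two worked decompositions above can be checked with a few lines of arithmetic. This is a sketch that uses only the parameter values and frequencies quoted in the text; the variable names are illustrative.

```python
import math

# Parameter values as quoted in the text above (read there from Appendix 2).
# Moslems: shopkeeper versus artisan.
avg_moslem = math.exp(0.004 + 0.859)        # exp(a(3) - a(4))
factor_moslem = math.exp(0.503 - 1.027)     # exp(b(7,3) - b(7,4))
observed_moslem = 181 / 129                 # odds read from Table 2.1

# Brahmans: ideological occupations versus lumpenproletariat.
avg_brahman = math.exp(-0.392 + 0.488)      # exp(a(6) - a(1))
factor_brahman = math.exp(1.240 + 1.349)    # exp(b(1,6) - b(1,1))
observed_brahman = 88 / 6                   # odds read from Table 2.1

for label, avg, factor, observed in [
    ("Moslem", avg_moslem, factor_moslem, observed_moslem),
    ("Brahman", avg_brahman, factor_brahman, observed_brahman),
]:
    print(f"{label}: average {avg:.2f} x factor {factor:.2f} "
          f"= {avg * factor:.2f} (observed {observed:.2f})")
```

In both cases the product of the average odds and the specific factor agrees with the observed odds, up to the rounding of the quoted parameters.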
Such direct reading of the table is possible because the model has been set up so as to be symmetric about zero: for this, expression (2) instead of expression (1) was necessary. Thus, by a glance at a table, the reader may see the relationships among the categories of the two variables, as expressed by the pattern of signs and magnitudes of the values.
One important feature of this model is worth noting. Because of a unique property of the multinomial distribution and because of the symmetry resulting from the use of the geometric means, the same matrix, b(i,k), is obtained when the dependent and independent variables are interchanged. Only a new vector of means, for the other variable, must be calculated. Thus, it is also legitimate to consider the odds of an individual in a given social class being in one religious group versus another. The procedure is identical to that described above, but with the variables reversed.
Once the extended logistic model has been constructed in the form of equation (6), as based on expression (2), the introduction of further independent variables is straightforward. Equation (6) is the direct analogue of one-way ANOVA; addition of one other variable yields the analogue of two-way ANOVA, in which a second main effect and an interaction term appear. In the same way as the relative probability in equation (5) is decomposed into an average and a factor due to the specific category of the one independent variable, now it will be decomposed into an average, multiplied by three factors, one each for the two independent variables, and one due to interaction between the two independent variables:
$$\frac{p(i,j,k)}{\bar{p}(i,j)} = A(k) \times B(i,k) \times C(j,k) \times D(i,j,k) \tag{7}$$
where j labels the categories of the second independent variable. Then, the extended logistic model with two independent variables is:
$$\log\frac{p(i,j,k)}{\bar{p}(i,j)} = a(k) + b(i,k) + c(j,k) + d(i,j,k) \tag{8}$$
with the same correspondences between terms of these two equations as for equations (5) and (6). Then, b(i,k) and c(j,k) are the parameters for the main effects of the two independent variables and d(i,j,k) is the matrix of interaction parameters. If the latter matrix can be assumed to have only zero elements, the two independent variables are said not to interact with respect to the dependent variable, or they are said to have additive effects on the distribution of this variable, in the same way as for ANOVA. Examples of matrices of main effects are Tables 5.9 and 5.10; here, the interaction has been taken to be zero. Examples of matrices for the interaction are Tables 5.1, 5.2 and 5.3; in each case, only one half of the matrix is presented since the other half is identical except that all signs are reversed (since one variable has only two categories). Interpretations are the same as for ANOVA, although somewhat more complex since each parameter set has an extra dimension: main effects have two dimensional arrays instead of vectors and interactions have three instead of two dimensional arrays. Since the reader of this chapter is assumed to be familiar with these simple ANOVA models, no further details should be required. The extended logistic model with two independent variables is used throughout Chapter 5.
For more detailed discussion of the extended logistic model, the reader is referred to Lindsey (1973, 1974c, 1975a). For the case where the dependent variable has only two categories, see Cox (1970, 1972) and Lindsey (1975b). The literature on the analysis of frequency data in the form of contingency tables is vast; however, little of it treats a parametric analysis of the data in the form of the (extended) logistic model. For bibliographies of the classical approach, see Gart (1971) and Killion and Zahn (1976). Other important recent books on the subject are Bishop, Fienberg and Holland (1975), Haberman (1974), and Plackett (1974).
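The model with two independent variables, a(k) + b(i,k) + c(j,k) + d(i,j,k), can be sketched in the same ANOVA-like way. The example below is illustrative only: it assumes unweighted averages over the categories i and j, non-zero cells throughout, and a hypothetical three-way table of frequencies indexed as counts[i, j, k]. The function name is not taken from the text.

```python
import numpy as np

def decompose_two_way(counts):
    """Return a(k), b(i,k), c(j,k), d(i,j,k) for a table indexed [i, j, k]."""
    counts = np.asarray(counts, dtype=float)
    # p(i,j,k): distribution over k within each combination (i, j).
    log_p = np.log(counts / counts.sum(axis=2, keepdims=True))
    # Centre on the log geometric mean over k.
    z = log_p - log_p.mean(axis=2, keepdims=True)
    a = z.mean(axis=(0, 1))               # average over all (i, j)
    b = z.mean(axis=1) - a                # main effect of the first variable
    c = z.mean(axis=0) - a                # main effect of the second variable
    d = z - a - b[:, None, :] - c[None, :, :]   # interaction
    return a, b, c, d

# Hypothetical frequencies: 2 categories i, 3 categories j, 3 categories k.
counts = np.array([[[20, 30, 50], [10, 60, 30], [40, 40, 20]],
                   [[35, 25, 40], [15, 15, 70], [50, 30, 20]]])
a, b, c, d = decompose_two_way(counts)

# With only two categories i, the two halves of d differ only in sign.
print(np.allclose(d[0], -d[1]))
```

Because the first variable in this hypothetical table has only two categories, the second half of the interaction array is the first with all signs reversed, which is why only one half needs to be tabulated in such a case.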
The second basic statistical model, used in Chapter 6, is the Poisson distribution. In the previous model, with the multinomial distribution, the variables were assumed only to be constructed from nominal categories, so that no mathematical structure could be used to relate the categories directly to each other. Corresponding to each category, a probability, p(k), was defined, with the only a priori relationship among these parameters being that they sum to unity, in the same way as for any probability distribution. The Poisson distribution requires much stronger assumptions about the (dependent) variable: it must have non-negative integral values. These values are the number of times an event occurs to each individual; in Chapter 6, they are the number of years of school a child is behind. As with the multinomial distribution, each category has a probability, p(k), assigned to it such that the probabilities sum to unity. However, the relationship among these probabilities is much more stringently defined:
$$p(k) = \frac{e^{-\mu}\mu^{k}}{k!} \tag{9}$$
where k is the number of times the event occurs to an individual, and $\mu$ is the average number of times for all individuals. Once the average, $\mu$, is fixed, all of the p(k)'s are automatically given by equation (9). In Chapter 6, for example in Table 6.2, $\mu$ is estimated from the data for each social class; the p(k)'s are then calculated and multiplied by 100 to give the predicted percentages in Table 6.1.
In the original analysis of the data, four models were considered: the multinomial, normal, Poisson, and negative binomial distributions. As shown by Lindsey (1974a, 1974b), the multinomial distribution will always give the best fit, and the question is to see if any other model chosen for theoretical reasons fits almost as well. In Chapter 6, the Poisson distribution is found to fit virtually as well as the multinomial for three of the social classes, while the normal and negative binomial distributions always fit much worse. With the other four social classes, we are left only with the multinomial model at the present stage of the investigation.
Feller (1950, pp.146-154) provides a comprehensive description of the mechanism producing a Poisson distribution and includes a number of illuminating examples. If relatively rare events (i.e. the mean, $\mu$, is small) are occurring randomly to individuals, the resulting distribution will be Poisson. In biology, for example, this is widely used to test empirically if plants are growing in a random scatter over a region or if cells are scattered randomly over a petri dish. A fine grid is laid over the area and the number of plants or cells in each square counted. Here, the 'individual' is the square.
Deviations from the Poisson distribution and, hence, from randomness, indicate either 'clumping' of the events on certain 'individuals' as indicated by a distribution which is flatter than the Poisson or 'repulsion' of the events so that they are spread more uniformly over all individuals than would be expected by a strictly random process, as indicated by a distribution which is narrower and more pointed than the Poisson.
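The calculation of predicted percentages from equation (9) can be sketched as follows. This is a minimal illustration, not the computation behind Table 6.1: the frequencies are hypothetical, and the sample mean is assumed as the estimate of the average (it is the maximum likelihood estimate for a Poisson distribution). The function names are illustrative.

```python
import math

def poisson_percentages(mu, max_k):
    """Predicted percentages 100 * p(k) for k = 0..max_k under equation (9)."""
    return [100 * math.exp(-mu) * mu**k / math.factorial(k)
            for k in range(max_k + 1)]

def sample_mean(frequencies):
    """Mean number of events, where frequencies[k] individuals have k events."""
    total = sum(frequencies)
    return sum(k * n for k, n in enumerate(frequencies)) / total

# Hypothetical numbers of children who are 0, 1, 2, 3 and 4 years behind.
observed = [110, 65, 20, 4, 1]
mu_hat = sample_mean(observed)
predicted = poisson_percentages(mu_hat, len(observed) - 1)
observed_pct = [100 * n / sum(observed) for n in observed]

for k, (obs, pred) in enumerate(zip(observed_pct, predicted)):
    print(f"{k} years behind: observed {obs:.1f}%, predicted {pred:.1f}%")
```

Comparing the two columns for each value of k gives a first impression of whether the observed distribution is flatter or more pointed than the Poisson with the same mean.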
In our case, the random events are years fallen behind in school by the children. However, these events are only random within certain social classes for specific well-defined theoretical reasons. In the other social classes, 'repulsion' occurs in that years fallen behind are spread more uniformly among all children than would be expected if a random process were at work.
It is perhaps useful here to reproduce the comparison of the theory of the normal and Poisson distributions provided in Chapter 6. How can we interpret our model; what does it mean in strictly theoretical terms? Let us first look at the more familiar normal distribution, used, for example, in simple linear regression. As an independent variable changes, the mean of the dependent variable must change in a linear way with it, while the variability of the distribution, as measured by the variance, remains constant. Hence, in a graphic representation of values like those in Table 6.1, the same shape of normal distribution would appear on all graphs, but this would shift along the x-axis linearly with changes in the independent variable. A given value of the independent variable thus determines the location of the distribution on the axis, as measured by the mean. At this given value of the independent variable, say level of family income, the social mechanism acting to produce a normal distribution is one such that a very large number of unknown factors (biological, social, nutritional, etc.) are at work to place the individuals in the required proportions among the values of the dependent variable, say height.
Similarly, for the Poisson distribution, the value of the independent variable, the social class, determines the location of the distribution on the x-axis, as measured by the mean number of years behind. However, the shape of the distribution also changes with the mean, becoming more 'piled up' on the left as the mean decreases (since the distribution cannot go further to the left than zero years behind, the minimum). Once the location and shape of the Poisson distribution have been determined by the value of the independent variable, the social class, the social mechanism producing this distribution is one which distributes individual children within the social class at random among the various years behind, according to the proportions specified. In other words, within a social class where the Poisson distribution is applicable, children are 'assigned' at random to being different numbers of years behind. This also implies that a child who is already, say, three years behind will have no more probability of falling further behind, while still staying in school, than one who has the minimum age.
For both the normal and the Poisson distributions, these are the theoretical considerations which one must have implicitly or explicitly in mind when using either of them to describe social phenomena. Once suitable theories are constructed or adopted, we go to the empirical data to see if they are supported.