
CHAPTER 5

ON AGGREGATION BIAS

N. SELLIN

Introduction

Educational research has to deal with two basic types of variables: measures describing properties of individual students and measures related to groups of students, such as variables describing properties of the teachers, classes and schools to which the students belong. As is evident from the extensive research on so-called classroom and school effects (for overviews, see e.g., Averch et al., 1972; Spady, 1973; Barr & Dreeben, 1977), the focus of much research in education is on possible influences of group properties on student learning behaviours and student achievement. In this context, a salient methodological issue is the unit of analysis problem, which typically refers to the question of whether influences of group related variables should be examined using students, classes or schools as units of investigation (see Wittrock & Wiley, 1970; Herriot & Muse, 1973; Cronbach, 1976; Knapp, 1977; Bidwell & Kasarda, 1980; Burstein, 1980; Cooley, Bond, & Mao, 1981; Larkin & Keeves, 1984). The predominant statistical method in research on classroom and school effects is linear regression and, as pointed out by Cronbach (1976) and Burstein (1980) among others, an important aspect of the unit of analysis problem is aggregation bias. This is concerned with the effects that data aggregation exerts upon correlation and regression coefficients. School or class level regression analyses typically incorporate, in addition to group properties, student variables aggregated to school or class means, and the ensuing regression coefficients generally turn out to be numerically different from corresponding coefficients obtained on the basis of student level regressions. To the extent that statistical results differ according to the level of aggregation, the choice between different levels of analysis can imply different conclusions in terms of the effects of specific variables.
For example, it is possible that particular variables appear to have much stronger effects at the school level than at the class or student level. It is also possible that specific variables have negative effects at one level but positive effects at a different level (for empirical examples, see Robinson, 1950; Hanushek et al., 1974; Cronbach, 1976). Given divergent or even contradictory results at different levels of analysis, an important question is which factors determine differences between regression coefficients at different levels of aggregation. A sizeable body of writing exists on this topic (see, for example, Robinson, 1950; Goodman, 1959; Blalock, 1964; Alker, 1969; Hannan, 1971; Hammond, 1973; Hannan & Burstein, 1974; Burstein, 1978; Langbein & Lichtman, 1978; Smith, 1977). For three reasons, however, these publications appear to be of limited help. First, despite some exceptions (e.g., Smith, 1977), the focus in the cited writings is almost exclusively on bivariate regressions whereas in research practice multivariate models are usually employed. Second, regression models involving group properties and individual properties are generally not explicitly considered although, as will be seen later, these models involve some important special features. Third, and most important, the results as to which factors determine observed differences between regression coefficients at different levels of aggregation appear to be inconclusive. In general, two interpretations are discernible. In the field of educational research it is argued that differences between student level and group level analyses can be seen as reflecting distinct processes of substantive interest to the researcher. Differences between regression results at different levels of analysis are consequently interpreted as reflecting substantively different relationships at each level. It is not possible here to review the relevant theoretical arguments in greater detail. A comprehensive overview has been provided by Burstein (1980), who recommended between student, between class and between school regressions, in order to enable comparisons across different levels of analysis. A similar point had previously been made by Cronbach (1976). Another argument voiced in favour of conducting group level analyses is specifically concerned with examining influences of group related properties on student outcomes. Student level analyses typically employ group characteristics as so-called contextual properties.
This means formally that given group properties (e.g., class size, available instructional materials, school equipment, school location, etc.) are assigned as constants to each individual student. As pointed out by Bidwell and Kasarda (1980) among others, this involves the assumption that group resources are identically available to each individual student. Since this seems often to be unrealistic, it is suggested that group level analyses are generally more appropriate because such analyses focus on the ‘average student’ rather than on individual students. The arguments delineated above would seem to promote group level analyses in educational research. It is, in fact, common practice to conduct class or school level analyses as a supplement or an alternative to student level analyses (see Cronbach, 1976; Burstein, 1980). This is generally justified by recognizing aggregation bias as a problem that can be solved in terms of theoretical considerations. A fundamentally different interpretation of aggregation bias, however, has been given in sociology by Robinson (1950) and in econometric writing (Theil, 1954; Cramer, 1964; Kmenta, 1971, pp. 322-329). Focusing on statistical consequences of data aggregation, the authors claim that differences between regression results at different levels of aggregation are to be seen as mere artefacts due to the loss of information that results from the aggregation of data. This interpretation would preclude group level analyses in educational research. It also contradicts the conceptual and theoretical considerations on which aggregate level analyses and cross-level comparisons of regression results seem to be based. Which of the conflicting interpretations is to be seen as more appropriate can only be judged if the factors determining aggregation bias are clear.
By concentrating on regression models involving both group-related measures and student variables, this chapter considers this question through an examination of the analytical relation between regression coefficients estimated at different levels of analysis and by deriving algebraic expressions that enable the investigation of the determinants of aggregation bias.

The Basic Statistical Model

For convenience, the presentation will be restricted to situations in which there are two levels of analysis, the student and the class level. In its most simple form, the regression model examined in this chapter can be specified as:

y_{ij} = b_{yx.z} x_{ij} + b_{yz.x} z_j + e_{ij}    (5.1)

where i = 1, 2, ..., N indicates individual students, while j = 1, 2, ..., K indicates the classes to which the students belong. The variables x and y are assumed to constitute student related measures. For example, x may represent a pretest score and y may represent the corresponding posttest score. The variable z, on the other hand, is assumed to represent a group related measure, such as class size for instance. The regression coefficients b_{yx.z} and b_{yz.x} denote estimates obtained by ordinary least squares (OLS) regression. Throughout, it will be assumed that all variables are scaled to zero means so that location parameters can be discarded. Equation (5.1) constitutes a student level regression model in which the variable z is used as a contextual property; that is, within each class the respective z-value is assigned as a constant to each individual student. It will be noted that the variable z need not constitute a global group characteristic, such as class size, that could not be obtained at a lower level. As a matter of fact, a broad range of variables referred to here as ‘group properties’ could, in principle, be measured at the student level but are, nonetheless, often recorded for the class as a whole. For example, data on instructional time, use of materials, and amount of homework could be collected at the student level, but are, in practice, frequently obtained from the teacher. As a consequence, the resultant measures are generally constant within classes. Such measures could also be represented by the variable z. The key formal feature of the regression model given above is that one regressor varies only between predefined groups while the dependent variable and a second regressor vary within and between groups. The aggregate level analogue of Equation (5.1) is:

\bar{y}_j = \bar{b}_{yx.z} \bar{x}_j + \bar{b}_{yz.x} z_j + \bar{e}_j    (5.2)

with \bar{y}_j and \bar{x}_j denoting class means and with \bar{b}_{yx.z} and \bar{b}_{yz.x} symbolizing class level regression coefficients.
It will be noted that the shift in the level of analysis does not change the values of the variable z: since z is constant within classes, the z_{ij} values in Equation (5.1) are identical to the corresponding z_j values in Equation (5.2) for each class group. As indicated earlier, the unit of analysis problem primarily arises from the fact that student level and group level coefficients are generally found to be numerically different; i.e., in terms of the above example, it is usually observed that b_{yx.z} ≠ \bar{b}_{yx.z} and that b_{yz.x} ≠ \bar{b}_{yz.x}. The purpose of this chapter is to clarify the source of such differences. This problem has two aspects. The first aspect is the mathematical problem of analyzing regression coefficients in terms of components which account for numerical differences across levels of analysis. This chapter is written to suggest that, for OLS coefficient estimates, this problem can be resolved on the basis of standard product moment algebra. Being strictly algebraic, the derivations presented below apply to any given sets of OLS coefficient estimates. It is therefore not necessary to distinguish between population parameters and their associated estimates. For this reason, the above equations have been formulated using the notation for sample estimates. The second aspect is the conceptual problem of whether the mathematical determinants of aggregation bias reflect theoretically relevant processes. In other words, for justifying aggregate level analyses it is necessary to demonstrate that divergent cross-level results (i.e., the results at different levels of aggregation) can be given a theoretically meaningful interpretation. The first task, then, is to examine the analytical relationship between individual level and aggregate level regression coefficients. For this purpose it is convenient to start with bivariate regression models. This will help with an understanding of the somewhat more complex and perhaps less familiar algebra required for investigating multivariate regression models such as the three-variable model specified above. The following section is thus concerned with bivariate regression coefficients.
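To make the cross-level divergence concrete, the following sketch (our illustration, not part of the original chapter) simulates two-level data in which the within-class and between-class relations of x to y are deliberately different, and then fits Equations (5.1) and (5.2) by OLS. All variable names and the data-generating process are illustrative assumptions.

```python
# Illustrative simulation (our construction): students nested in classes, with
# a between-class x-y slope different from the within-class slope, so the two
# levels of analysis must give numerically different coefficients.
import numpy as np

rng = np.random.default_rng(0)
K, n = 40, 25                          # K classes, n students per class
j = np.repeat(np.arange(K), n)         # class membership of each student

z  = rng.normal(size=K)[j]             # group property z: constant within classes
xg = rng.normal(size=K)[j]             # between-class component of x
x  = xg + rng.normal(size=K * n)       # student measure x
# within-class slope 0.6, between-class slope 1.5, plus an effect of z
y  = 0.6 * (x - xg) + 1.5 * xg + 0.3 * z + rng.normal(size=K * n)

# scale all variables to zero means, as assumed throughout the chapter
x, y, z = (v - v.mean() for v in (x, y, z))

def ols(X, v):
    """OLS coefficients by least squares."""
    return np.linalg.lstsq(X, v, rcond=None)[0]

# student level model, Equation (5.1)
b_yx_z, b_yz_x = ols(np.column_stack([x, z]), y)

# class level model, Equation (5.2): weighted group means, i.e. each class
# mean is assigned back to every student belonging to the class
xbar = np.bincount(j, x)[j] / n
ybar = np.bincount(j, y)[j] / n
bb_yx_z, bb_yz_x = ols(np.column_stack([xbar, z]), ybar)

print("student level:", b_yx_z, b_yz_x)
print("class level:  ", bb_yx_z, bb_yz_x)
```

Because the within-class and between-class slopes differ by construction, the two fits disagree, which is exactly the situation the chapter sets out to analyze.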

Bivariate Regressions

As a point of departure, it is instructive to consider the bivariate regression of y on x. Dropping, for simplicity, the subscripts indicating students and classes, the student and aggregate level regressions of y on x are written as:

y = b_{yx} x + u    (5.3a)

\bar{y} = \bar{b}_{yx} \bar{x} + \bar{u}    (5.3b)

Similar to the first-order partial regression coefficients b_{yx.z} and \bar{b}_{yx.z} involved in Equations (5.1) and (5.2), the bivariate coefficients b_{yx} and \bar{b}_{yx} will usually differ. For analyzing the source of this difference, it is important to review some previously established results. In the following it will be necessary to refer to sums of squares and sums of cross-products. For convenience, the sum of squares of a given variable, say x, will be denoted as V(x) and the sum of cross-products of two variables, say x and y, will be denoted as C(yx). Since all variables are, without loss of generality, assumed to be scaled to zero means, the variance of x is equal to V(x)/N and the covariance of x and y is equal to C(yx)/N, with N denoting the number of individual cases. To simplify the terminology, however, V(x) and C(yx) will also be referred to as variances and covariances. This should cause no confusion because, throughout this chapter, the presentation will exclusively refer to sums of squares and sums of cross-products. As shown in standard statistical texts (e.g., Pedhazur, 1982, pp. 530-531), the individual level variance of x, V(x), and the individual level covariance of y and x, C(yx), can be decomposed into a between group and a within group component:

V(x) = V(\bar{x}) + V(x_w)    (5.4a)

C(yx) = C(\bar{y}\bar{x}) + C(y_w x_w)    (5.4b)

y_w and x_w denote deviations of individual scores from their associated group means; i.e., y_w is defined as y_w = y - \bar{y} and x_w is defined as x_w = x - \bar{x}. V(x_w) is the so-called pooled within group variance of x; C(y_w x_w) symbolizes the pooled within group covariance between y and x. V(\bar{x}) and C(\bar{y}\bar{x}) denote the variance and the covariance of group means. Using Equations (5.4a) and (5.4b), Duncan, Cuzzort, and Duncan (1961) derived the following identity:

b_{yx} = E_x^2 \bar{b}_{yx} + (1 - E_x^2) b_{yx(w)}    (5.5)

which expresses the individual level coefficient b_{yx} as a weighted composite of its group level counterpart \bar{b}_{yx} and the pooled within group coefficient b_{yx(w)}, defined as b_{yx(w)} = C(y_w x_w)/V(x_w). E_x^2 denotes the correlation ratio calculated as E_x^2 = V(\bar{x})/V(x). This ratio indicates the relative amount of between group variance of x. As can be seen from Equation (5.4a), E_x^2 varies between zero and one; it is equal to zero if V(\bar{x}) = 0 (no between group variance) and equal to one if V(\bar{x}) = V(x) or, equivalently, if V(x_w) = 0 (no within group variance). It should be noted that the above identities hold only if the group means are weighted by the number of individuals belonging to the respective groups. That is, for class level regressions it must be assumed that all class means are weighted by the number of students located in a given class. This assumption will be made from now on. It may be noted that the weighting is necessary not only for ‘algebraic’ but also for statistical reasons. As shown by Smith (1977) among others, regression models comprising unweighted group means generally imply heteroskedastic residuals. In this case, the weighted least squares (WLS) estimation procedure rather than OLS regression should be used, because the OLS procedure would give statistically inefficient coefficient estimates. That is, OLS estimates would be unbiased but would not have the minimum variance property required to constitute statistically efficient coefficient estimates; for a general treatment of heteroskedasticity in linear regression see, for example, Goldberger (1964, pp. 231-236). OLS regression on the basis of weighted group means, however, can be shown to be equivalent to the optimal WLS procedure. Equation (5.5) should first be examined for restrictions implying no difference between b_{yx} and its aggregate level counterpart \bar{b}_{yx}. It is readily seen that only two mutually exclusive conditions imply b_{yx} = \bar{b}_{yx}, namely, (1) E_x^2 = 1, and (2) b_{yx(w)} = \bar{b}_{yx}.
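Since the decompositions (5.4a) and (5.4b) and the identity (5.5) are purely algebraic, they hold exactly for any grouped data with weighted group means. The following numpy sketch (our illustration, with an arbitrary simulated data set) verifies this.

```python
# Numerical check (our illustration) of the decompositions (5.4a), (5.4b) and
# the Duncan, Cuzzort, and Duncan identity (5.5), using weighted group means.
import numpy as np

rng = np.random.default_rng(1)
K, n = 30, 20                              # K groups, n individuals per group
j = np.repeat(np.arange(K), n)
x = rng.normal(size=K)[j] + rng.normal(size=K * n)
y = rng.normal(size=K)[j] + 0.8 * x + rng.normal(size=K * n)
x, y = x - x.mean(), y - y.mean()          # scale to zero means

V = lambda a: a @ a                        # sum of squares, the chapter's V(.)
C = lambda a, b: a @ b                     # sum of cross-products C(..)

xbar = np.bincount(j, x)[j] / n            # weighted group means
ybar = np.bincount(j, y)[j] / n
xw, yw = x - xbar, y - ybar                # pooled within group deviations

# decompositions (5.4a) and (5.4b) hold exactly
assert np.isclose(V(x), V(xbar) + V(xw))
assert np.isclose(C(y, x), C(ybar, xbar) + C(yw, xw))

b_yx  = C(y, x) / V(x)                     # individual level coefficient
bb_yx = C(ybar, xbar) / V(xbar)            # group level coefficient
b_w   = C(yw, xw) / V(xw)                  # pooled within group coefficient
E2    = V(xbar) / V(x)                     # correlation ratio E_x^2

# identity (5.5): b_yx is a weighted composite of bb_yx and b_w
assert np.isclose(b_yx, E2 * bb_yx + (1 - E2) * b_w)
print(b_yx, bb_yx, b_w, E2)
```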
The first condition would be fulfilled if the within group variance of x equals zero. Since a zero within-class or within-school variance of student related measures is rarely, if ever, encountered in actual research, this condition can be excluded from further consideration. Less clear are the empirical implications of the second restriction, which requires that all coefficients involved in Equation (5.5) are numerically the same. It is, in fact, not possible to derive the relevant empirical conditions from Equation (5.5) directly. Appropriate algebraic expressions will be given below. Equation (5.5) is also not suitable for examining factors determining cross-level differences between bivariate regression coefficients. It can be used, however, to derive more convenient expressions. Defining T as T = C(\bar{y}\bar{x}) - b_{yx(w)} V(\bar{x}), the following identities are obtained:

b_{yx} = T/V(x) + b_{yx(w)}    (5.6a)

\bar{b}_{yx} = T/V(\bar{x}) + b_{yx(w)}    (5.6b)
Straightforward algebra will show that Equation (5.6a) is equivalent to Equation (5.5) above and that the right-hand part of Equation (5.6b) is equal to \bar{b}_{yx}. That is, the above equations constitute algebraic identities which hold for any given set of bivariate OLS coefficients, provided that the aggregate level model is appropriately weighted. It must, of course, also be assumed that V(x_w) ≠ 0 (or, equivalently, V(x) ≠ V(\bar{x})) and that V(\bar{x}) ≠ 0, in order to ensure that the coefficients b_{yx(w)} and \bar{b}_{yx} are defined. With respect to the question of what contributes to the difference between the individual level coefficient b_{yx} and its aggregate level counterpart \bar{b}_{yx}, the above identities provide a simple answer, namely: the between group variance of the predictor variable x, V(\bar{x}), is the only component of the group level coefficient \bar{b}_{yx} that is different from the components involved in the coefficient b_{yx}. In other words, aggregation bias in bivariate regression models would appear to be due to the fact that the between group variance of the predictor variable differs from the corresponding individual level variance. More precisely, Equation (5.4a) implies V(x) ≥ V(\bar{x}) and, provided that V(x) ≠ V(\bar{x}), the between group variance of x is necessarily smaller than the corresponding individual level variance. This variance reduction is clearly a consequence of the loss of information associated with the computation of group means. It occurs because the within group variation of individual scores is discarded. More will be said about this point later. Apart from V(x) > V(\bar{x}), another condition necessary for b_{yx} ≠ \bar{b}_{yx} is obviously T ≠ 0. Indeed, as can be seen from Equations (5.6a) and (5.6b), T = 0 is the restriction that implies b_{yx} = b_{yx(w)} = \bar{b}_{yx} if V(x) ≠ V(\bar{x}). For further reference it will be useful to examine the expression T, as defined above for bivariate regression models, in some detail.
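A quick numerical check (our sketch) confirms the identities (5.6a) and (5.6b), and makes explicit the implied form of the bias itself: subtracting (5.6a) from (5.6b) gives \bar{b}_{yx} - b_{yx} = T [1/V(\bar{x}) - 1/V(x)], so the bias is the product of T and the reciprocal-variance difference.

```python
# Numerical check (our illustration) of identities (5.6a) and (5.6b), and of
# the implied bias  bb_yx - b_yx = T * (1/V(xbar) - 1/V(x)).
import numpy as np

rng = np.random.default_rng(6)
K, n = 30, 20
j = np.repeat(np.arange(K), n)
x = rng.normal(size=K)[j] + rng.normal(size=K * n)
y = rng.normal(size=K)[j] + 0.8 * x + rng.normal(size=K * n)
x, y = x - x.mean(), y - y.mean()

xbar = np.bincount(j, x)[j] / n            # weighted group means
ybar = np.bincount(j, y)[j] / n
xw, yw = x - xbar, y - ybar

b_w = (yw @ xw) / (xw @ xw)                # pooled within group coefficient
T = (ybar @ xbar) - b_w * (xbar @ xbar)    # the expression T

b_yx  = (y @ x) / (x @ x)
bb_yx = (ybar @ xbar) / (xbar @ xbar)

assert np.isclose(b_yx,  T / (x @ x)       + b_w)    # (5.6a)
assert np.isclose(bb_yx, T / (xbar @ xbar) + b_w)    # (5.6b)
# aggregation bias = T times the difference of reciprocal variances
assert np.isclose(bb_yx - b_yx, T * (1 / (xbar @ xbar) - 1 / (x @ x)))
print(b_yx, bb_yx, T)
```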
The expression T, defined as T = C(\bar{y}\bar{x}) - b_{yx(w)} V(\bar{x}), can be conveniently interpreted in the framework of analysis of covariance (ANCOVA) models. A useful formulation of ANCOVA models with a single covariate is the following regression equation (see, for example, Pedhazur, 1982, pp. 504-509):

y = b_1 d_1 + b_2 d_2 + ... + b_{k-1} d_{k-1} + b_{yx.d} x + e    (5.7)

where d_1, d_2, ..., d_{k-1} denote dummy variables representing group membership (e.g., class membership); note that k - 1 dummy variables are sufficient to represent k groups. The terms b_1, b_2, ..., b_{k-1} symbolize the OLS coefficients pertaining to the dummy vectors, and b_{yx.d} symbolizes the OLS estimate of the effect of x with all dummy variables partialled out. It is known that b_{yx.d} is equal to b_{yx(w)}, the pooled within group coefficient introduced earlier (see Igra, 1980). It can also be demonstrated that the following identity holds:

T = b_1 C(d_1 x) + b_2 C(d_2 x) + ... + b_{k-1} C(d_{k-1} x)    (5.8)

This term allows for all variations within units taken across all units and contributes to the hierarchical character of the phenomena. T is thus equal to the sum of k - 1 multiplicative terms comprising the OLS coefficient pertaining to a given dummy variable and the covariance between x and the associated dummy variable. In other words, the expression T embodies two components: (1) the covariance of x with the grouping criterion (e.g., class membership), and (2) the ‘group effects’ reflected by the coefficients b_1, b_2, ..., b_{k-1}. It is important to note that the dummy variables cover all possible group effects. That is, the ANCOVA model in Equation (5.7) gives the maximum amount of explained variance of y that can be achieved by regression models involving x and any number of group related measures. This is because the dummy variables necessarily ‘explain’ the between group variance of y completely (i.e., the multiple correlation between \bar{y} and the set of dummy variables is always equal to 1.0), and because the dummy variables are perfectly correlated with any variable that is constant within groups. The above considerations have an interesting parallel in explanations of aggregation bias which refer to specification errors involved in the original individual level model (see Hannan & Burstein, 1974; Hanushek et al., 1974; Langbein & Lichtman, 1978). These authors argue that regression coefficients differ across levels only if the individual level model is misspecified by the omission of relevant explanatory variables. More precisely, the authors seem to refer primarily to the omission of group characteristics as explanatory factors. As can be seen from Equations (5.6a) and (5.6b), differences between b_{yx} and \bar{b}_{yx} will occur if T ≠ 0. This could, indeed, be interpreted as being a consequence of a specification error involved in the individual level equation. The misspecification is reflected by non-zero group effects (represented by T) occurring in the ANCOVA model in Equation (5.7). One obvious interpretational difficulty of this ANCOVA model, however, is that it gives no information on which specific group characteristics actually make a difference. That is, the ANCOVA model reflects unspecified group effects. This interpretational problem carries over to the difference between b_{yx} and \bar{b}_{yx}.
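The two ANCOVA facts used above, that the covariate coefficient b_{yx.d} equals the pooled within group coefficient and that T satisfies identity (5.8), can be checked numerically. The sketch below (our illustration) uses mean-centred dummy variables, consistent with the chapter's convention of scaling all variables to zero means.

```python
# Verifying (our illustration) that in the ANCOVA model (5.7) the covariate
# coefficient equals the pooled within group coefficient b_yx(w), and that
# T = C(ybar xbar) - b_yx(w) V(xbar) equals the dummy-variable sum in (5.8).
import numpy as np

rng = np.random.default_rng(3)
K, n = 25, 15
j = np.repeat(np.arange(K), n)
x = rng.normal(size=K)[j] + rng.normal(size=K * n)
y = rng.normal(size=K)[j] + 0.7 * x + rng.normal(size=K * n)
x, y = x - x.mean(), y - y.mean()

xbar = np.bincount(j, x)[j] / n
ybar = np.bincount(j, y)[j] / n
xw, yw = x - xbar, y - ybar
b_w = (yw @ xw) / (xw @ xw)                      # pooled within group coefficient
T = (ybar @ xbar) - b_w * (xbar @ xbar)

# k-1 mean-centred dummy variables plus the covariate x, as in (5.7)
D = (j[:, None] == np.arange(K - 1)).astype(float)
D -= D.mean(axis=0)
coef = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0]
b_dummies, b_yx_d = coef[:-1], coef[-1]

assert np.isclose(b_yx_d, b_w)                   # covariate effect = b_yx(w)
assert np.isclose(T, b_dummies @ (D.T @ x))      # identity (5.8)
print(T, b_yx_d)
```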
While it is, of course, legitimate to speculate about omitted variables, a non-zero difference between individual level and aggregate level coefficients provides no empirical evidence that any presumed effects of omitted explanatory factors really exist. This latter aspect is also relevant for another argument that can be found in the educational research literature. It seems to be common to suggest that aggregated student variables may have ‘different meanings’ than their disaggregated counterparts. For example, Burstein (1980) suggested that indicators of student socio-economic background aggregated to school means may be regarded as reflecting the socio-economic context of the school community. Interpreted in the framework of regression analysis, such arguments seem to correspond closely to the above considerations on possible specification errors. This is because they appear to assume that effects of unmeasured group characteristics would be reflected in aggregate level analyses and would imply divergent individual level and aggregate level regression coefficients. In other words, such assumptions refer to specification errors due to the omission of relevant predictor variables. Thus, the above remarks concerned with interpretational difficulties emanating from ANCOVA models also apply. While, for instance, Burstein’s interpretation mentioned above may have great intuitive appeal, cross-level differences between regression coefficients cannot be interpreted as providing empirical evidence that effects of unmeasured predictor variables do in fact exist. Burstein (1980) also pointed out that ANCOVA models are of limited value in educational research because they only indicate that some group effects may exist but do not provide any information on more specific effects of group characteristics. 
As can be seen from Equations (5.6a) and (5.6b), observed differences between bivariate regression coefficients would appear to have precisely the same interpretational limitations.


In conjunction with considerations on possible model misspecifications, it has also been suggested that comparisons between individual level and aggregate level regression coefficients may be used as a ‘check’ for specification errors (Langbein & Lichtman, 1978; Burstein, 1980). As can be seen from Equations (5.6a) and (5.6b), the expression T, which reflects the specification error emanating from the ANCOVA model, is involved in both the individual level and the group level coefficient. Thus, it is not necessary to undertake aggregate level analyses in order to identify specification errors, since an appropriate ANCOVA at the student level will furnish the same information. To summarize, Equations (5.6a) and (5.6b) imply that observed differences between bivariate regression coefficients convey two pieces of information: (1) that the ANCOVA model in Equation (5.7) would give non-zero ‘group effects’, and (2) that the group level variance of the predictor variable x is smaller than the corresponding individual level variance. Since the expression T and, hence, the ANCOVA group effects are involved in both the individual level and the group level coefficients, the variance reduction associated with the computation of group means is clearly the basic source of aggregation bias. As noted above, the basic information associated with such differences between bivariate regression coefficients is the fact that the between group variances of student related variables are smaller than the corresponding student level variances.

Three Variable Models

In this section we return to the three variable model specified earlier. Without subscripts, the corresponding regression equations are:

y = b_{yx.z} x + b_{yz.x} z + e    (5.9a)

\bar{y} = \bar{b}_{yx.z} \bar{x} + \bar{b}_{yz.x} z + \bar{e}    (5.9b)

Similar to the preceding section, it is assumed that all variables involved in the aggregate level Equation (5.9b) are weighted by the number of individuals belonging to a given group. It will be recalled that the variable z is assumed to constitute a measure which is constant within predefined groups. This feature makes it necessary to examine the difference between b_{yz.x} and \bar{b}_{yz.x} and the difference between b_{yx.z} and \bar{b}_{yx.z} separately. We shall, first, derive an expression for the difference between b_{yz.x} and \bar{b}_{yz.x}, the individual level and the group level effect of z. Using standard path analytical calculations the following normal equations are obtained from Equations (5.9a) and (5.9b):

C(zy) = b_{yx.z} C(zx) + b_{yz.x} V(z)    (5.10a)

C(z\bar{y}) = \bar{b}_{yx.z} C(z\bar{x}) + \bar{b}_{yz.x} V(z)    (5.10b)

Since z is constant within groups, the individual level covariances of z with x and y are equal to the corresponding group level covariances; i.e., C(zx) = C(z\bar{x}) and C(zy) = C(z\bar{y}). This follows from Equation (5.4b) given in the foregoing section (because V(z_w) = 0) and implies also b_{yz} = \bar{b}_{yz} and b_{xz} = \bar{b}_{xz}. Given these identities, the following relation can be obtained from Equations (5.10a) and (5.10b):

b_{yz.x} - \bar{b}_{yz.x} = b_{xz} (\bar{b}_{yx.z} - b_{yx.z})    (5.11)

Equation (5.11) shows that the cross-level difference between the estimated effects of z is a function of b_{xz} and the difference (\bar{b}_{yx.z} - b_{yx.z}). Many data constellations are possible, implying either that b_{yz.x}, the student level coefficient of z, is numerically smaller than its aggregate level counterpart \bar{b}_{yz.x} or that b_{yz.x} is numerically larger than \bar{b}_{yz.x}. Suppose, for example, that b_{xz} > 0 and that \bar{b}_{yx.z} > b_{yx.z}. This would imply b_{yz.x} > \bar{b}_{yz.x}; that is, the individual level effect of z would be numerically larger than its group level counterpart. Note that this includes the possibility that b_{yz.x} is positive while \bar{b}_{yz.x} is negative. It is to be emphasized, however, that this is just one out of numerous other data constellations which may imply quite different patterns of divergent individual level and aggregate level coefficients. While the above expression is not suitable for predicting empirical differences between b_{yz.x} and \bar{b}_{yz.x}, it clarifies the following very important point. Excluding the restriction b_{xz} = 0 (which gives b_{yz.x} = b_{yz} and \bar{b}_{yz.x} = \bar{b}_{yz} = b_{yz}), Equation (5.11) implies that any observed difference between b_{yz.x} and \bar{b}_{yz.x} can be traced back to the difference \bar{b}_{yx.z} - b_{yx.z}, that is, to the cross-level difference of the coefficients associated with x. In other words, provided that b_{xz} ≠ 0, the factors determining b_{yx.z} ≠ \bar{b}_{yx.z} also explain any difference between b_{yz.x} and \bar{b}_{yz.x}. It is, therefore, sufficient to examine the factors responsible for b_{yx.z} ≠ \bar{b}_{yx.z}. A decomposition of the coefficient b_{yx.z} has been derived by Somers (1978). The key feature of Somers' derivations is that first-order partial regression coefficients can be computed as bivariate regressions among ‘residualized’ variables. That is, the coefficient b_{yx.z} can be computed as b_{yx.z} = C(y_r x_r)/V(x_r), with y_r defined as y_r = y - b_{yz} z and with x_r defined as x_r = x - b_{xz} z.
This can be seen by noting that the above definition of ‘residualized’ variables implies V(x_r) = V(x) - b_{xz} C(zx) and C(y_r x_r) = C(yx) - b_{yz} C(xz). V(x_r) and C(y_r x_r) are ‘residualized’ variances and covariances, and it can readily be seen that b_{yx.z} = C(y_r x_r)/V(x_r) is equivalent to the familiar formula for computing first-order partial regression coefficients. The aggregate level coefficient \bar{b}_{yx.z} can analogously be computed as \bar{b}_{yx.z} = C(\bar{y}_r \bar{x}_r)/V(\bar{x}_r), with \bar{y}_r = \bar{y} - \bar{b}_{yz} z and \bar{x}_r = \bar{x} - \bar{b}_{xz} z defining ‘residualized’ group means. It has already been noted that b_{yz} = \bar{b}_{yz} and that b_{xz} = \bar{b}_{xz}; that is, since the variable z is constant within groups, the bivariate regressions of y on z and of x on z will give identical individual level and aggregate level coefficients. This, however, implies that the residualized scores defined above are obtained by subtracting the same factor from individual level scores and from the corresponding group means. For example, the residualization of x and \bar{x} is accomplished by subtracting the same factor, namely b_{xz} z = \bar{b}_{xz} z, from both scores. This feature can be used to decompose residualized variances and covariances as follows:

V(x_r) = V(\bar{x}_r) + V(x_w)    (5.12a)

C(y_r x_r) = C(\bar{y}_r \bar{x}_r) + C(y_w x_w)    (5.12b)

The analogy between Equations (5.12a) and (5.12b) and the decompositions in Equations (5.4a) and (5.4b) presented in the foregoing section should be obvious. This correspondence can be used to decompose the coefficient b_{yx.z} in essentially the same way as the bivariate coefficient b_{yx}. The ensuing decomposition is given by:


b_{yx.z} = E_{x.z}^2 \bar{b}_{yx.z} + (1 - E_{x.z}^2) b_{yx(w)}    (5.13)

with E_{x.z}^2 defined as E_{x.z}^2 = V(\bar{x}_r)/V(x_r).
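Since Equations (5.12a), (5.12b) and (5.13) are again algebraic identities, they can be confirmed exactly on simulated three variable data. The sketch below (our illustration) residualizes on z at both levels and checks the decomposition.

```python
# Numerical check (our illustration) of the residualized decompositions
# (5.12a), (5.12b) and the identity (5.13) for the three variable model.
import numpy as np

rng = np.random.default_rng(4)
K, n = 30, 20
j = np.repeat(np.arange(K), n)
z = rng.normal(size=K)[j]                       # constant within groups
x = 0.4 * z + rng.normal(size=K)[j] + rng.normal(size=K * n)
y = 0.6 * x + 0.5 * z + rng.normal(size=K)[j] + rng.normal(size=K * n)
x, y, z = (v - v.mean() for v in (x, y, z))

V = lambda a: a @ a
C = lambda a, b: a @ b

xbar = np.bincount(j, x)[j] / n                 # weighted group means
ybar = np.bincount(j, y)[j] / n
xw, yw = x - xbar, y - ybar
b_w = C(yw, xw) / V(xw)                         # pooled within group coefficient

b_xz, b_yz = C(x, z) / V(z), C(y, z) / V(z)     # bivariate slopes on z
xr, yr = x - b_xz * z, y - b_yz * z             # residualized scores
xbr, ybr = xbar - b_xz * z, ybar - b_yz * z     # residualized group means

# decompositions (5.12a) and (5.12b)
assert np.isclose(V(xr), V(xbr) + V(xw))
assert np.isclose(C(yr, xr), C(ybr, xbr) + C(yw, xw))

# partial coefficients at both levels, and identity (5.13)
b_yx_z  = C(yr, xr) / V(xr)
bb_yx_z = C(ybr, xbr) / V(xbr)
E2 = V(xbr) / V(xr)
assert np.isclose(b_yx_z, E2 * bb_yx_z + (1 - E2) * b_w)
print(b_yx_z, bb_yx_z, b_w)
```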

For a detailed algebraic derivation of Equation (5.13) see Somers (1978). Using the above equations, it is possible to express the coefficients b_{yx.z} and \bar{b}_{yx.z} as:

b_{yx.z} = T_r/V(x_r) + b_{yx(w)}    (5.14a)

\bar{b}_{yx.z} = T_r/V(\bar{x}_r) + b_{yx(w)}    (5.14b)

with T_r defined as T_r = C(\bar{y}_r \bar{x}_r) - b_{yx(w)} V(\bar{x}_r). The expression T_r can be interpreted in basically the same way as the expression T introduced earlier for bivariate regression models. That is, it is possible to formulate an ANCOVA model similar to the model specified in Equation (5.7) above, with the only difference that x_r is used as the covariate and that y_r is used as the dependent variable. Accordingly, T_r can be interpreted as involving two components: (1) the covariance of x_r with the grouping criterion, and (2) the ‘group effects’ reflected by the corresponding ANCOVA model. In terms of the sources of a given difference between b_{yx.z} and \bar{b}_{yx.z}, Equations (5.14a) and (5.14b) have essentially the same implications as Equations (5.6a) and (5.6b) derived earlier. There is, in fact, no fundamental difference except that residualized variables rather than the original scores are employed. The key point clarified by Equations (5.14a) and (5.14b) is that any difference between b_{yx.z} and \bar{b}_{yx.z} would appear to be due to the fact that V(x_r) differs from the corresponding between group variance V(\bar{x}_r). Moreover, since V(x_r) = V(x) - b_{xz} C(xz) and since V(\bar{x}_r) = V(\bar{x}) - b_{xz} C(xz), Equations (5.14a) and (5.14b) can be formulated as:

b_{yx.z} = T_r/[V(x) - b_{xz} C(xz)] + b_{yx(w)}    (5.15a)

\bar{b}_{yx.z} = T_r/[V(\bar{x}) - b_{xz} C(xz)] + b_{yx(w)}    (5.15b)

which show that it is not necessary to argue in terms of residualized variances. As can be seen from Equations (5.15a) and (5.15b), the difference between b_{yx.z} and \bar{b}_{yx.z} is clearly due to the fact that the between group variance V(\bar{x}) is smaller than the corresponding individual level variance. An important special feature of the three variable model examined here is that the determinants of b_{yx.z} ≠ \bar{b}_{yx.z} also determine differences between the estimated effects of the group characteristic z. This has already been noted in conjunction with Equation (5.11). An alternative approach to make this point explicit is to use the identities b_{xz} = \bar{b}_{xz} and b_{yz} = \bar{b}_{yz} and to reformulate Equations (5.10a) and (5.10b) as:

b_{yz.x} = b_{yz} - b_{xz} [T_r/V(x_r) + b_{yx(w)}]    (5.16a)

\bar{b}_{yz.x} = b_{yz} - b_{xz} [T_r/V(\bar{x}_r) + b_{yx(w)}]    (5.16b)

It can be seen that V(\bar{x}_r) is the only component of the aggregate level coefficient \bar{b}_{yz.x} that is different from the components involved in its individual level counterpart b_{yz.x}. As can be seen from the above definitions of V(\bar{x}_r) and V(x_r), it is precisely the fact that V(\bar{x}) < V(x) which determines observed cross-level differences between the effect estimates associated with the group characteristic z. In sum, then, the differences between the student level and the class level coefficients are due to the fact that the between class variance of the variable x is smaller than the corresponding between student variance. As has been said in the introductory section, a general argument favouring aggregate level analyses for models involving group related predictors concerns the assumption that group resources are identically available to each individual student, or that global group characteristics such as class size would influence student behaviours in exactly the same way. Such assumptions are highly questionable. Furthermore, it can be argued that aggregation introduces similar problems. Clearly, aggregation does not necessarily imply improved measurement of group characteristics, which may account for observed cross-level differences between effect estimates. Rather, as shown above, such differences are due to the variance reduction resulting from the aggregation of student related predictors to class or school means. It may be argued that the aggregation of student related measures reduces measurement error, but this would mean regarding within-class or within-school variances of student measures as error variances. In any case, the above expressions make clear that the investigator must justify the computation of group means explicitly.
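The chain of identities for the three variable model, (5.11), (5.14a), (5.14b), (5.16a) and (5.16b), can likewise be verified exactly against direct OLS fits of Equations (5.9a) and (5.9b). A compact check (our illustration):

```python
# Cross-checking (our illustration) the three variable identities against
# direct OLS fits of Equations (5.9a) and (5.9b).
import numpy as np

rng = np.random.default_rng(5)
K, n = 30, 20
j = np.repeat(np.arange(K), n)
z = rng.normal(size=K)[j]
x = 0.4 * z + rng.normal(size=K)[j] + rng.normal(size=K * n)
y = 0.6 * x + 0.5 * z + rng.normal(size=K)[j] + rng.normal(size=K * n)
x, y, z = (v - v.mean() for v in (x, y, z))

xbar = np.bincount(j, x)[j] / n
ybar = np.bincount(j, y)[j] / n
xw, yw = x - xbar, y - ybar
b_w = (yw @ xw) / (xw @ xw)
b_xz, b_yz = (x @ z) / (z @ z), (y @ z) / (z @ z)
xr, yr = x - b_xz * z, y - b_yz * z             # residualized scores
xbr, ybr = xbar - b_xz * z, ybar - b_yz * z     # residualized group means
Tr = (ybr @ xbr) - b_w * (xbr @ xbr)

# direct OLS fits of (5.9a) and (5.9b)
fit = lambda X, v: np.linalg.lstsq(X, v, rcond=None)[0]
b_yx_z, b_yz_x = fit(np.column_stack([x, z]), y)
bb_yx_z, bb_yz_x = fit(np.column_stack([xbar, z]), ybar)

assert np.isclose(b_yx_z,  Tr / (xr @ xr)   + b_w)                   # (5.14a)
assert np.isclose(bb_yx_z, Tr / (xbr @ xbr) + b_w)                   # (5.14b)
assert np.isclose(b_yz_x,  b_yz - b_xz * (Tr / (xr @ xr)   + b_w))   # (5.16a)
assert np.isclose(bb_yz_x, b_yz - b_xz * (Tr / (xbr @ xbr) + b_w))   # (5.16b)
assert np.isclose(b_yz_x - bb_yz_x, b_xz * (bb_yx_z - b_yx_z))       # (5.11)
print("all identities hold")
```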

Between Group Variance and Aggregation Bias

The relationship between the class or school variance, expressed as a fraction of the total variance, and the aggregation bias (namely \bar{b} - b, the difference between an aggregated class level regression coefficient and the corresponding student level regression coefficient) is more complex than the inverse relationship indicated earlier. This relationship is illustrated in Figure 5.1 in a sketched diagram that indicates how aggregation bias will generally tend to change as the proportion of between group variance to total variance changes. It will be noted that when the between class variance is small compared with the total between student variance, the aggregation bias is small. This corresponds to random grouping. However, aggregation bias increases as the proportion of variance associated with between class variance increases, quickly reaching a maximum value, but falls again to zero as the proportion of variance approaches unity. The extent of aggregation bias is seen to change quite rapidly in the regions occurring most commonly in practice, and it is clearly dubious to attach meaning to the size of this difference between the class level and the student level regression coefficients.

Figure 5.1 Relationship between aggregation bias and proportion of group variance to total variance. (Horizontal axis: between group variance as a proportion of total variance, from 0 to 1.)
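The shape sketched in Figure 5.1 can be traced numerically under one stylized data-generating process (our construction, with assumed parameters): a selectivity parameter t controls the between class share of V(x), and the group effect on y operates through the latent class level c, so that group effects vanish under effectively random grouping. Bias is then small near both ends of the variance proportion and peaks quickly in between.

```python
# Stylized sweep (our construction, not the chapter's data): selectivity t
# controls the between class share of V(x); the group effect on y works
# through the latent class level c, so it disappears under random grouping.
import numpy as np

rng = np.random.default_rng(2)
K, n = 200, 20
j = np.repeat(np.arange(K), n)

def bias_at(t):
    """Return (E2, aggregation bias) for selectivity t in (0, 1)."""
    c = rng.normal(size=K)                     # latent class level
    x = np.sqrt(t) * c[j] + np.sqrt(1 - t) * rng.normal(size=K * n)
    y = 0.5 * x + 1.0 * c[j] + rng.normal(size=K * n)
    x, y = x - x.mean(), y - y.mean()
    xbar = np.bincount(j, x)[j] / n            # weighted class means
    ybar = np.bincount(j, y)[j] / n
    b  = (y @ x) / (x @ x)                     # student level slope
    bb = (ybar @ xbar) / (xbar @ xbar)         # class level slope
    E2 = (xbar @ xbar) / (x @ x)               # between share of V(x)
    return E2, bb - b

for t in (0.001, 0.01, 0.05, 0.2, 0.5, 0.8, 0.95):
    E2, bias = bias_at(t)
    print(f"t={t:5.3f}  E2={E2:5.3f}  bias={bias:+.3f}")
```

Under this mechanism the bias rises steeply from near zero, peaks at a modest between group variance proportion, and decays toward zero as the proportion approaches unity, consistent with the qualitative shape of Figure 5.1; other mechanisms can, of course, yield different profiles.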