SOCIAL SCIENCE RESEARCH 4, 385-401 (1975)
Correlational Change Analysis: Reliability and Metric Issues¹
RONALD L. LITTLE
Utah State University
Unless certain metric conditions are met, and unless reliability estimates are available, analyzing change data via correlational techniques is often inappropriate. Although these techniques are useful for problems of practical prediction, the absence of equivalent interval scales and reliability estimates may influence the value of the observed coefficient so as to misrepresent the underlying causal model. The consequences of ignoring reliability estimates and metric equivalences are discussed for each of the three most commonly used correlational techniques: (1) The correlation of crude gain scores with initial scores, (2) The correlation of crude gain scores with third variables, and (3) The correlation of residualized gain scores with third variables.
INTRODUCTION
For hundreds of years, scientists have been grappling with the concept of change, unable to master it completely. Although the creation of integral and differential calculus inestimably advanced problem-solving in the physical sciences, it has not proved appropriate for the types of problems typically encountered in the social sciences (Coleman, 1968). Lacking the requisite techniques, social scientists all too frequently either avoid the issue of change altogether (Bereiter, 1967) or repeat the errors of previous research efforts. The intent of this article is to demonstrate the pitfalls waiting to snare the unwary when correlational techniques are applied to change data. The consequences of violating metric assumptions and of lacking reliability estimates are examined for the three most commonly used correlation methods: (1) the correlation of crude gain with initial scores, (2) the correlation of crude gain with third variables, and (3) the correlation of residualized gain with third variables.
¹ The author extends his thanks to Theodore R. Anderson, Richard J. Hill, and Howard Parker for their helpful comments on and criticisms of an earlier draft of this paper.
Copyright © 1975 by Academic Press, Inc. All rights of reproduction in any form reserved.
THE CORRELATION OF CRUDE GAIN SCORES WITH INITIAL SCORES
The most straightforward and frequently used means of determining the amount of change over time begins by defining gain or loss² as the simple difference between final and initial observations. Thus, $g = y - x$, where $x$ = observed initial score, $y$ = observed final score, and $g$ = gain of observed scores.³ Using this notation, the Pearson product-moment correlation coefficient for initial and gain scores is $r_{gx}$.
Reliability. Problems associated with the use of crude gain scores in correlational analysis have been documented for over half a century. Both Thorndike (1924) and Thomson (1924, 1925) noted that the correlation between observed gain scores and observed initial scores was spurious. That is, the correlation between observed scores, $r_{gx}$, was not equal to the correlation between true scores, $r_{GX}$. This difference between $r_{gx}$ and $r_{GX}$ occurs because the observed scores contain errors, $e_x$ and $e_y$. Thus,⁴
$$ r_{gx} = \frac{r_{GX} S_G S_X - S_{e_x}^2}{S_g S_x}. \tag{1} $$
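The dependence of the observed coefficient on the error variance of $x$ can be illustrated numerically. The following is only a minimal sketch: it assumes that the errors are uncorrelated with one another and with the true scores, so that the covariance of observed gain and initial scores equals the true-score covariance minus the error variance of $x$. The function name and the input values are hypothetical and chosen purely for illustration.

```python
import math

def observed_gain_initial_corr(r_GX, S_G, S_X, var_ex, var_ey):
    """Observed r_gx implied by true-score quantities.

    Assumes e_x and e_y are uncorrelated with each other and with the
    true scores, so that x = X + e_x and g = G + e_y - e_x.
    """
    # cov(g, x) = cov(G, X) - var(e_x): the shared error e_x enters g with a minus sign
    cov_gx = r_GX * S_G * S_X - var_ex
    sd_g = math.sqrt(S_G**2 + var_ex + var_ey)  # observed SD of the gain scores
    sd_x = math.sqrt(S_X**2 + var_ex)           # observed SD of the initial scores
    return cov_gx / (sd_g * sd_x)

# Made-up values: a true correlation of +0.30 with unit true-score SDs
for var_e in (0.0, 0.2, 0.5):
    r = observed_gain_initial_corr(0.30, 1.0, 1.0, var_e, var_e)
    print(f"error variance {var_e:.1f}: observed r_gx = {r:+.3f}")
```

Under these assumed values the observed coefficient equals the true coefficient when there is no error, shrinks as the error variance grows, and changes sign once the error variance of $x$ exceeds $r_{GX} S_G S_X = 0.30$.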
Equation (1) demonstrates that the error variance, $S_{e_x}^2$, exercises an influence over the value of $r_{gx}$. All other values being equal, the greater the error variance, the greater the difference between $r_{gx}$ and $r_{GX}$. As the value of $S_{e_x}^2$ increases, the value of $r_{gx}$ is reduced by a fraction of $S_{e_x}^2$. Additionally, the relationship between $r_{gx}$ and $r_{GX}$ is linear, and $r_{GX}$ may be more positive, more negative, or equal to $r_{gx}$. If the error variance is large enough, the sign of the correlation of the observed scores may be the opposite of the sign of the correlation of the true scores. $r_{gx}$ and $r_{GX}$ will share the same sign only when $S_{e_x}^2 < r_{GX} S_G S_X$ or when the sign of $r_{GX}$ is negative. Further, the sign of $r_{GX}$ can never be estimated from knowledge of observed scores alone unless $r_{gx}$ is positive, which necessarily means that $r_{GX}$ must be positive. Empirically, this is a somewhat unlikely occurrence because
$$ r_{gx} = \frac{r_{xy} S_y - S_x}{(S_x^2 + S_y^2 - 2 r_{xy} S_x S_y)^{1/2}}. $$
Unless S ...
² As used throughout this paper, gain and loss are synonymous with change, i.e. gain and loss are quantifiable change. For ease of presentation, both are referred to as positive or negative gain, or merely gain.
³ Lowercase subscripts denote observed scores, while uppercase subscripts denote true scores, i.e. scores without error.
⁴ Derivations of many of the equations have been omitted for publication purposes. However, proofs will be supplied by the author upon request.
⁵ Unless otherwise noted, it will be assumed throughout the paper that the error terms are neither correlated with one another nor with the true scores (cf. Lord, 1956, 1967).
equivalent for ease of presentation. However, when the relationship between scales becomes more complex, for example, when $x = by$ (S-equivalent) or when $x = cy^n$ (P-equivalent), the resulting correlation coefficient will differ from one obtained using R-equivalent scales. One exception to this is when partial correlation techniques are employed in conjunction with S-equivalent scales, as will be discussed later.
Many researchers are either unaware of or discount metric requirements when analyzing change. The reasons for ignoring so crucial an issue may be traced to two separate but interrelated causes: (1) An affine transformation, $y' = a + by$,⁹ on one or both interval scales in a correlation coefficient leaves the correlation coefficient unaffected; and (2) The consequences of failing to use R-equivalent metrics have not been demonstrated, nor are they readily apparent.
That an affine transformation on either or both scales does not change the correlation coefficient is known by every sociology graduate student. Because this proposition is so familiar, it is possible to confuse $r_{(bg)x}$, an affine transformation of the gain scores, with $r_{(by-x)x}$, an affine transformation of only one scale, which in effect creates scales which are not R-equivalent.¹⁰ It can be demonstrated that $r_{(bg)x}$ is equal to $r_{gx}$, but
$$ r_{(by-x)x} = \frac{b r_{xy} S_y - S_x}{(b^2 S_y^2 + S_x^2 - 2 b r_{xy} S_x S_y)^{1/2}}. \tag{3} $$
It is obvious from a comparison of Eqs. (2) and (3) that $r_{gx}$, i.e. $r_{(bg)x}$, and $r_{(by-x)x}$ are not identical. In fact, it is only when $b = 1$, or in the limiting instance when $r_{xy} = r_{(bg)x} = r_{(by-x)x} = 1$, that $r_{(bg)x}$ and $r_{(by-x)x}$ are necessarily equal. Multiplying all of the $y$ scores by a constant, $b$, creates S-equivalent metrics. The result is a correlation coefficient which differs in value from the correlation coefficient which would have been obtained if both scales had R-equivalent metrics. The difference between the correlations is a function of $b$.¹¹
The precise consequence of a scale transformation for the correlation between gain and initial scores becomes manifest from an examination of Fig. 1. It demonstrates that $r_{(by-x)x}$ becomes less negative as the value of $b$ deviates from zero. The higher the absolute value of $b$, the less negative the value of $r_{(by-x)x}$.
⁹ In sociological literature, transformations of this form are typically referred to as linear transformations because when they are graphed, the result is a straight line (Wilson, 1971, p. 433).
¹⁰ Since adding a constant in a transformation such as this is merely a change in location, the resultant correlation coefficient is unaffected and $a$ is assumed to be 0. Thus, each $y$ value is multiplied by the constant $b$, which should not be confused with a regression coefficient. The relationship between scales is thus a simple linear function: $x = by$. The scales are thus S-equivalent.
¹¹ Analogous arguments could be made for scales which are more complexly related, e.g. $x = y^p$ (P-equivalence).
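The distinction between rescaling the gain itself and rescaling only one of its components can be checked directly on data. The sketch below uses a small set of hypothetical scores (not taken from any study) and compares the correlation computed from the data with the closed form of Eq. (3). The helper names `pearson`, `sd`, and `r_rescaled_gain_initial` are illustrative assumptions.

```python
import math

def pearson(u, v):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)

def sd(u):
    """Standard deviation (divisor n; the choice of divisor cancels in Eq. 3)."""
    m = sum(u) / len(u)
    return math.sqrt(sum((a - m) ** 2 for a in u) / len(u))

def r_rescaled_gain_initial(b, r_xy, s_x, s_y):
    """Eq. (3): correlation of (b*y - x) with x."""
    num = b * r_xy * s_y - s_x
    den = math.sqrt(b**2 * s_y**2 + s_x**2 - 2 * b * r_xy * s_x * s_y)
    return num / den

# Hypothetical initial and final scores, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
b = 2.0

gain = [yi - xi for xi, yi in zip(x, y)]
r_g = pearson(gain, x)                                       # r_gx
r_bg = pearson([b * gi for gi in gain], x)                   # r_(bg)x: gain rescaled
r_byx = pearson([b * yi - xi for xi, yi in zip(x, y)], x)    # r_(by-x)x: only y rescaled
r_formula = r_rescaled_gain_initial(b, pearson(x, y), sd(x), sd(y))

print(f"r_gx = {r_g:+.4f}, r_(bg)x = {r_bg:+.4f}, r_(by-x)x = {r_byx:+.4f}")
```

With these made-up scores, rescaling the gain leaves the coefficient unchanged, whereas rescaling only $y$ changes both its magnitude and its sign, and the data-based value agrees with Eq. (3).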
Fig. 1. Distribution of $r_{(by-x)x}$ as a function of $b$, where $s_x = s_y$.
The implications for the correlation between initial and gain scores are clear. If the scales for $x$ and $y$ are not R-equivalent, the value of the correlation between the initial and gain scores is not the coefficient of interest. By increasing the value of $b$ sufficiently, it is possible to insure that $r_{(by-x)x}$ achieves a value which is only slightly less than the value of $r_{xy}$. Thus, $r_{(by-x)x}$ may be more positive, more negative, or equal to $r_{gx}$ and may have the same or a different sign. Simply stated, if the scales are not R-equivalent, the value of the correlation coefficient is a function of the difference between scales.
Similar problems are encountered if ordinal variables are used in the correlation of gain and initial scores. In order to justify the use of ordinal scales as if they were interval, it is necessary to assume that: (1) The ordinal scales are monotonic functions of underlying interval scales,¹² and (2) The ordinal scales are R-equivalent. When ordinal scales are not R-equivalent, problems arise which may be identical with those previously encountered [see Eq. (3)] or be more complex. The more complex issues require explication. First, let $x'$ and $y'$ represent the R-equivalent interval scales. Second, let $x' = b_1 x$ and $y' = b_2 y$, where $b_1 \neq b_2$ (S-equivalent ordinal scales). In this instance the observed correlation is not $r_{(y-x)x}$, but is in fact $r_{(b_2 y - b_1 x) b_1 x}$, and
$$ r_{(b_2 y - b_1 x) b_1 x} = \frac{b_2 r_{xy} S_y - b_1 S_x}{(b_2^2 S_y^2 + b_1^2 S_x^2 - 2 r_{xy} b_2 b_1 S_y S_x)^{1/2}}. \tag{4} $$
¹² The relationship between an ordinal scale and some "underlying" interval scale may not, in fact, be capable of representation by a simple monotonic function. However, without such a simplifying assumption, it becomes impossible to discuss the issue. In any event, the logic of the argument remains the same.
It is evident that Eqs. (2) and (4) are not equal except in the limiting instance. $r_{(b_2 y - b_1 x) b_1 x}$ does not equal the coefficient of interest, $r_{gx}$, and the value of the former is determined in part by $b_1$ and $b_2$ in much the same way that $r_{(by-x)x}$ is a function of $b$. In both instances, the correlations lose their zero-dimensional character, becoming dependent upon the scales involved.
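The dependence of Eq. (4) on the two scale constants can be sketched numerically. The function name and the input values below are hypothetical, and the constants are assumed to be positive. One point the sketch is meant to suggest, offered here as an observation rather than as a claim made in the text, is that the numerator and denominator of Eq. (4) scale together, so only the ratio $b_2 / b_1$ affects the result.

```python
import math

def r_two_scale_gain_initial(b1, b2, r_xy, s_x, s_y):
    """Eq. (4): correlation of (b2*y - b1*x) with b1*x, for positive b1 and b2."""
    num = b2 * r_xy * s_y - b1 * s_x
    den = math.sqrt(b2**2 * s_y**2 + b1**2 * s_x**2
                    - 2 * r_xy * b2 * b1 * s_y * s_x)
    return num / den

# Made-up observed quantities: r_xy = .50 and equal standard deviations
r_xy, s_x, s_y = 0.5, 1.0, 1.0

same_scale = r_two_scale_gain_initial(1.0, 1.0, r_xy, s_x, s_y)    # b1 = b2
both_doubled = r_two_scale_gain_initial(2.0, 2.0, r_xy, s_x, s_y)  # b1 = b2 again
unequal = r_two_scale_gain_initial(1.0, 2.0, r_xy, s_x, s_y)       # b2 / b1 = 2
unequal_again = r_two_scale_gain_initial(2.0, 4.0, r_xy, s_x, s_y) # b2 / b1 = 2

print(same_scale, both_doubled, unequal, unequal_again)
```

In this illustration the coefficient is unchanged when both scales are stretched by the same factor, but moves from a negative value to zero when the $y$ scale is stretched twice as much as the $x$ scale.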
THE CORRELATION OF CRUDE GAIN SCORES WITH THIRD VARIABLES
A slightly more sophisticated method of analysis is the correlation of gain scores with third variable scores. Unlike the relationship between gain and initial scores, change does not have to be viewed as the dependent variable; it may be conceptualized as either the cause or consequence of the third variable. For example, depending upon the theoretical model under consideration, a positive increase in school grades over time may be thought of as either the cause of participation in school social clubs or the result of such participation.
Reliability. Correlating crude gain scores with third variable scores does not create the spurious correlation problem encountered with the use of $r_{gx}$ (Lord, 1967, p. 34). Usually, the measurement of $z$ will be independent of $x$ and $y$ and will not share the errors of either. Thus, $r_{gz} = r_{(G + e_y - e_x)(Z + e_z)}$. It is clear then that no error term occurs more than once [cf. Eq. (1)], and
$$ r_{gz} = \frac{r_{GZ} S_G S_Z}{S_g S_z}. \tag{5} $$
A comparison of Eqs. (1) and (5) demonstrates why spurious correlation is not a problem when correlating gain scores with third variable scores: The error variance, $S_{e_x}^2$, is not included in Eq. (5). Because it is not, the relationship between $r_{GZ}$ and $r_{gz}$ is one of simple attenuation. $r_{GZ}$ shares the same sign with $r_{gz}$ but is larger in absolute value (Lord, 1967, p. 34). Even in the absence of reliability estimates, therefore, it is possible to estimate the value of $r_{GZ}$ from knowledge of $r_{gz}$.
Scale requirements. Although instrument reliability creates no insurmountable obstacles, the correlation of gain and third variables raises an issue which is not so easily resolved: the problem of metric equivalence. If $z$ and $g$ do not possess R-equivalent metrics, no difficulties occur because the transformation of one or both scales, assuming that they are both interval scales, leaves the correlation unaffected. If, however, $x$ and $y$ are not R-equivalent metrics, problems similar to those found in the correlation of gain and initial scores will present themselves. Assume as before that the scale of the $y$ values is a linear function of the $x$ values: e.g. $x = by$ (S-equivalent scales). Here, $r_{gz}$ is in reality $r_{(by-x)z}$, and it can be proved that
$$ r_{(by-x)z} = \frac{b r_{yz} S_y - r_{xz} S_x}{(b^2 S_y^2 + S_x^2 - 2 b r_{xy} S_x S_y)^{1/2}}, \tag{6} $$
while
$$ r_{gz} = \frac{r_{yz} S_y - r_{xz} S_x}{(S_y^2 + S_x^2 - 2 r_{xy} S_x S_y)^{1/2}}. \tag{7} $$
A comparison of Eqs. (6) and (7) demonstrates that $r_{gz}$ and $r_{(by-x)z}$ are not necessarily equal except in the limiting instance. When the $x$ and $y$ metrics are not R-equivalent, Eq. (6) indicates that $r_{(by-x)z}$¹³ values are a function of $b$, i.e. a function of scale differences.
Figure 2 illustrates the relationship between $b$ and $r_{(by-x)z}$. As $b$ decreases, the value of $r_{(by-x)z}$ becomes more negative, approaching asymptote for all values of $r_{xy}$ at approximately $b = -0.6$. Under the conditions specified in Fig. 2, $r_{gz}$ is zero. As $b$ decreases from +1.0, the value of $r_{(by-x)z}$ becomes progressively more negative than $r_{gz}$.
Fig. 2. Distribution of $r_{(by-x)z}$ as a function of $b$, where $s_x = s_y = s_z$ and $r_{xy} = r_{xz} = r_{yz}$. Curves are shown for $r_{xy}$ = .20, .50, and .80; the horizontal axis gives values of $b$.
¹³ $r_{(by-x)z}$ is the observed correlation, but $r_{gz}$ is the correlation of interest.
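The behavior described for Fig. 2 can be reproduced approximately from Eq. (6). The sketch below evaluates the closed form under the figure's stated conditions (equal standard deviations and $r_{xy} = r_{xz} = r_{yz}$). The function name and the particular values of $b$ are illustrative assumptions, not values read from the figure.

```python
import math

def r_rescaled_gain_third(b, r_xy, r_xz, r_yz, s_x, s_y):
    """Eq. (6): correlation of (b*y - x) with z; setting b = 1 gives Eq. (7)."""
    num = b * r_yz * s_y - r_xz * s_x
    den = math.sqrt(b**2 * s_y**2 + s_x**2 - 2 * b * r_xy * s_x * s_y)
    return num / den

# Conditions of Fig. 2: equal SDs and a common value r for all three correlations
for r in (0.20, 0.50, 0.80):
    row = []
    for b in (1.0, 0.5, 0.0, -0.5):
        value = r_rescaled_gain_third(b, r, r, r, 1.0, 1.0)
        row.append(f"b={b:+.1f}: {value:+.3f}")
    print(f"r_xy = {r:.2f} -> " + ", ".join(row))
```

At $b = 1$ the numerator vanishes under these conditions, which is why $r_{gz}$ is zero in Fig. 2, and the coefficient becomes negative as $b$ falls below 1.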
Fig. 3. Relationship of $r_{(by-x)z}$ and $r_{(y-x)z}$, where $r_{(by-x)z}$ is a function of $b$, and where $s_x = s_y = s_z$, $r_{zy} = .8$, $r_{zx} = .3$, and $r_{xy} = .5$. The horizontal axis gives values of $b$.
The relationship between $r_{gz}$ and $r_{(by-x)z}$ may be seen in Fig. 3. As it illustrates, depending upon the value of $b$, $r_{(by-x)z}$ may be more positive, more negative, or equal to $r_{gz}$, and it may have the same or a different sign. When $b$ is greater than 1, $r_{(by-x)z}$ is more positive than $r_{gz}$. When $b$ equals 1, $r_{(by-x)z}$ equals $r_{gz}$. When $b$ is less than 1, $r_{(by-x)z}$ is more negative than $r_{gz}$ and may have a different sign. As with $r_{(by-x)x}$, the value of $r_{(by-x)z}$ is a function of $b$.
Again, the issue of using ordinal data as if they were interval arises, and the same assumptions necessary for the correlation of gain and initial scores are required. Although it is not necessary for the $z$ scale to be R-equivalent to the $x$ and $y$ scales, the $x$ and $y$ scales must be R-equivalent. If they are not, the associated problems are analogous to those discussed in conjunction with $r_{(b_2 y - b_1 x) b_1 x}$. Assuming that $x'$, $y'$, and $z'$ represent the underlying interval scales; that $x' = b_1 x$, $y' = b_2 y$, and $z' = b_3 z$; and that $b_1 \neq b_2 \neq b_3$ (S-equivalent ordinal scales), it can be demonstrated that
$$ r_{(b_2 y - b_1 x) b_3 z} = \frac{b_2 r_{yz} S_y - b_1 r_{xz} S_x}{(b_2^2 S_y^2 + b_1^2 S_x^2 - 2 b_1 b_2 r_{xy} S_x S_y)^{1/2}}. \tag{8} $$
Equations (7) and (8) obviously are not equal, and the difference is the inclusion of $b_1$ and $b_2$ in Eq. (8). It should be noted that the term $b_3$ has disappeared, proving that the equivalence of the $z$ metric is unimportant. As before, the value of the observed correlation, $r_{(b_2 y - b_1 x) b_3 z}$, is a function of $b_1$ and $b_2$, the scale differences. As a result, not only does its value differ from the coefficient of interest, $r_{gz}$, but the zero-dimensional nature of the correlation coefficient is lost.
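That the scale of $z$ drops out while the scales of $x$ and $y$ do not can be checked on data. The following sketch uses hypothetical scores and positive scale constants; the helper names `pearson` and `observed_r` are assumptions introduced for illustration only.

```python
import math

def pearson(u, v):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical initial, final, and third-variable scores
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
z = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]

def observed_r(b1, b2, b3):
    """Correlation of (b2*y - b1*x) with b3*z, computed from the data."""
    gain = [b2 * yi - b1 * xi for xi, yi in zip(x, y)]
    third = [b3 * zi for zi in z]
    return pearson(gain, third)

baseline = observed_r(1.0, 1.0, 1.0)     # all scales on a common metric
z_rescaled = observed_r(1.0, 1.0, 10.0)  # only the z scale stretched
y_rescaled = observed_r(1.0, 2.0, 1.0)   # only the y scale stretched

print(f"baseline {baseline:+.4f}, z rescaled {z_rescaled:+.4f}, "
      f"y rescaled {y_rescaled:+.4f}")
```

With these made-up scores, stretching $z$ leaves the coefficient unchanged, while stretching $y$ alone moves it from a negative to a positive value.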
THE CORRELATION OF RESIDUALIZED GAIN SCORES WITH THIRD VARIABLES
A yet more sophisticated technique employed in analyzing gain scores is the use of residualized scores where the linear effects of the $x$ scores are removed.¹⁴ Even though no objective criterion for choosing residualized rather than crude gain scores exists, several justifications for such a choice have been given (Bohrnstedt, 1969, pp. 116-117; DuBois, 1970, p. 109). Among the most important are (1) residualized gain scores are more reliable than crude gain scores, and (2) residualized scores do not require equal metrics (Manning and DuBois, 1962, pp. 291-295).
Reliability. Crude gain scores are usually less reliable than either of the scores from which they are computed, which presents a major problem (see Lord, 1967; Bohrnstedt, 1969). A similar situation exists when residualized gain scores are used because they too are frequently less reliable than the $x$ and $y$ scores. Despite their lesser reliability, however, residualized gain scores do have the advantage of being generally more reliable than comparable crude gain scores (Manning and DuBois, 1962, p. 291). On the basis of this difference, a researcher might be tempted to conclude that residualized gain scores are always preferable to crude gain scores in analyzing change data. Such a conclusion is certainly not justified, especially if reliability estimates are not available. As is the case with the correlation of crude gain and initial scores, the correction for attenuation may produce values which have a different sign and are more positive or more negative than the values obtained from uncorrected scores. An examination of the equations for residualized true and observed scores may serve to clarify the situation:
(9)
$$ r_{gz \cdot x} = \frac{r_{gz} - r_{zx} r_{gx}}{(1 - r_{zx}^2)^{1/2} (1 - r_{gx}^2)^{1/2}}. \tag{10} $$
The relationship between $r_{GZ \cdot X}$ and $r_{gz \cdot x}$ can best be seen by comparing the numerators of Eqs. (9) and (10). They are identical except that $r_{gz}$ is multiplied by $r_{xx}$ in Eq. (9). When $r_{gz}$ is moderately high and not exceeded in value by $r_{zx}$ or $r_{gx}$, $r_{gz} > r_{zx} r_{gx}$. When all three correlations are positive, multiplying $r_{gz}$ by $r_{xx}$, the reliability of $x$, results in decreasing both the value of the numerator of Eq. (9) and the corrected correlation ($r_{GZ \cdot X} < r_{gz \cdot x}$). Depending upon the values and signs of $r_{xx}$, $r_{gz}$, $r_{zx}$, and $r_{gx}$, the result of multiplying the correlation $r_{gz}$ by $r_{xx}$ may be a negative value in the numerator, and thus a negative value of $r_{GZ \cdot X}$. Partial correlation techniques, then, suffer from the same shortcomings as the correlation between initial and gain scores. If reliability estimates are unavailable, the correlation of the true scores, $r_{GZ \cdot X}$, is unknown. $r_{GZ \cdot X}$ may be more positive or more negative than $r_{gz \cdot x}$ and may have an opposite sign (see Bohrnstedt, 1969; Lord, 1958).
Scale requirements. Perhaps the most important argument advanced for the use of partial correlations when investigating change is that unequal metrics do not affect the correlation (Manning and DuBois, 1962, p. 294). Unfortunately, that argument is correct only when the unequal scales are linearly related.¹⁵ If, for example, the scales are R-equivalent or S-equivalent, the value of $r_{g_b z \cdot x}$, the correlation of linearly related unequal metrics, is identical to the value of $r_{gz \cdot x}$.¹⁶ This result follows from the logic of partial correlation, which removes the linear effects of a given variable, in this instance $x$. Metric differences assume critical importance when the relationship between scales is a higher order, nonlinear function.
The correlation between nonidentical, but linearly related scales is
$$ r_{g_b z \cdot x} = r_{(by \cdot x)(z \cdot x)}, \tag{11} $$
which¹⁷ is identical to the partial correlation for identical scales where
$$ r_{gz \cdot x} = r_{(y \cdot x)(z \cdot x)}. \tag{12} $$
If, on the other hand, the difference between scales is a higher order, nonlinear function, e.g. $x = y^p$ (P-equivalent scales), it can be demonstrated that
$$ r_{g_p z \cdot x} = r_{(y^p \cdot x)(z \cdot x)}. \tag{13} $$
A comparison of Eq. (13) with Eqs. (11) and (12) shows clearly that metric issues must not be ignored when using partial correlations. As long as the scales are linearly related there is no problem. If the relationship between scales is nonlinear, however, the value of the observed partial correlation, $r_{g_p z \cdot x}$, is a function of the difference between scales. In this instance $r_{g_p z \cdot x} > r_{gz \cdot x}$.
As for using ordinal scales with partial correlations, no difficulties arise so long as: (1) Arguments such as Labovitz's (1967, 1970, 1971) are accepted, and (2) The relationship between the ordinal scales and the underlying interval scales is linear. If the relationships between the ordinal and underlying interval scales are nonlinear, however, the result is very much like that illustrated in Eq. (13); the value of the resultant correlation is, in part, determined by the nature of the relationship between the ordinal and interval scales.
¹⁴ This article will treat only partial correlation, but the logic could be extended to part correlation.
¹⁵ This situation is likely to be fairly common whenever reliability is low and when $x$ and $y$ variables share a common metric (cf. Cronbach and Furby, 1970).
¹⁶ $r_{gz \cdot x}$ also equals $r_{yz \cdot x}$, making unnecessary the calculation of residualized gain scores (cf. Cronbach and Furby, 1970).
¹⁷ Partial correlations are essentially the correlation between sets of residualized scores ($y \cdot x = y - b_{yx} x$).
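The contrast between linearly and nonlinearly related scales under partial correlation can be illustrated on data. The sketch below uses the standard first-order partial correlation formula and hypothetical scores. The choice of $b = 2$ for the linear case and of squaring $y$ as a stand-in for a power-type distortion are assumptions made only for this example, as are the helper names.

```python
import math

def pearson(u, v):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)

def partial_corr(a, c, ctrl):
    """First-order partial correlation of a and c, controlling for ctrl."""
    r_ac, r_a, r_c = pearson(a, c), pearson(a, ctrl), pearson(c, ctrl)
    return (r_ac - r_a * r_c) / math.sqrt((1 - r_a**2) * (1 - r_c**2))

# Hypothetical initial, final, and third-variable scores
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
z = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]

gain = [yi - xi for xi, yi in zip(x, y)]            # common metric
gain_linear = [2.0 * yi - xi for xi, yi in zip(x, y)]  # y linearly rescaled (b = 2)
gain_power = [yi**2 - xi for xi, yi in zip(x, y)]   # y nonlinearly distorted (p = 2)

pc_gain = partial_corr(gain, z, x)
pc_linear = partial_corr(gain_linear, z, x)
pc_power = partial_corr(gain_power, z, x)
pc_y = partial_corr(y, z, x)   # partial correlation of y itself with z

print(f"gain {pc_gain:+.4f}, linear {pc_linear:+.4f}, "
      f"power {pc_power:+.4f}, y alone {pc_y:+.4f}")
```

In this illustration the partial correlation is unaffected by the linear rescaling and coincides with the partial correlation computed from $y$ directly, whereas the power transformation yields a different value.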
CONCLUSIONS
The basic contention of this paper has been that the use of correlational techniques in the analysis of change is totally inappropriate for much sociological data. The reasons are various.
Reliability. As discussed repeatedly above, part of the problem arises because of the absence of reliability estimates, estimates which sociologists seldom have access to for their instruments. To criticize sociological research on the grounds that these estimates are lacking is hardly new, but such a criticism is especially damning whenever change scores are involved. For example, without the benefit of reliability estimates, how does one interpret the meaning of $r_{gx}$ or $r_{gz \cdot x}$? If practical prediction is of primary interest, there is no difficulty; simply provide one of the possible interpretations of $r_{gx}$ or $r_{gz \cdot x}$ (e.g. McNemar, 1958; Costner, 1965) and be done with it. If, however, the goal is to develop some causal model to explain the relationship between the variables involved, a simple and direct interpretation of the causal structure is impossible because the correction for attenuation for $r_{gx}$ and $r_{gz \cdot x}$ does not function in the same manner as the simple correction for attenuation. A researcher knows that $r_{xy}$ is always closer to 0 than $r_{XY}$, sharing the same sign, and is seldom reticent about using an uncorrected correlation to make inferences about his causal model. $r_{gx}$ and $r_{gz \cdot x}$ allow no simple or direct interpretation in terms of the underlying causal model. Without reliability estimates there is no way of knowing the approximate value or the sign of $r_{GX}$ or $r_{GZ \cdot X}$, the coefficients of interest. Thus, the use of $r_{gx}$ or $r_{gz \cdot x}$ when building or testing causal models is hazardous at best.
Reliability estimates are not nearly so important when correlating crude gain with third variables, the reason being that the relationship between $r_{GZ}$ and $r_{gz}$ is one of simple attenuation. $r_{GZ}$ shares the same sign with $r_{gz}$ but is larger in absolute value (Lord, 1967, p. 34).
Even in the absence of reliability estimates, it is possible to estimate the value of $r_{GZ}$ from knowledge of $r_{gz}$. If a liberal perspective is acceptable, then important theoretical values, i.e. true values, can be estimated, and theoretical growth can continue despite fallible measuring instruments. The precise values of parameters and causal linkages will be lacking, but approximations of the causal structure will be available.
One thing is certain: if the discipline is to advance, sociologists must come to grips with the problem of obtaining reliability estimates when investigating change over time. Unfortunately, these estimates are not readily available for much sociological data. How, for example, would one go about obtaining reliability estimates for panel data? The most obvious technique would be to obtain test-retest reliability coefficients. With change scores, however, these coefficients beg the issue and answer the wrong question. Similar problems are encountered when attempting to obtain reliability estimates for change-over-time measures when using material collected previously for other purposes. Classroom grades are a good example. Again, test-retest reliability coefficients answer the wrong question. Here the problem is to determine the degree to which each teacher assigns the same grade for the same work. Ideally a simultaneous independent test is in order. Of course, such a procedure would be impossible, but it would be relatively simple to design a means to test, over time, each teacher's ability to assign grades to the same range of scores in an identical fashion.
Whenever change, rather than stability, is the researcher's focus, simultaneously administered equivalent tests will provide the correct coefficient (Cronbach, 1947). Two problems are encountered with internal consistency techniques, however. First, they require multiple items which measure the same thing, whereas many sociological variables can be measured effectively only as single items, e.g. marital status. Redundant questions may serve no purpose other than to diminish the cooperation of the respondents. Second, in studies involving change, it would be necessary to run reliability estimates at both Time One and Time Two. Although research costs would greatly increase, it would help avoid faulty causal inferences and unnecessary conflicts due to inconsistent replications.
Sociologists can no longer afford to ignore problems associated with obtaining reliability estimates, nor, when they attack such problems, can they continue aping their cousins in psychology.
Sociological problems frequently require unique sociological solutions, including the development of reliability estimates which are appropriate to sociology. The number and variety of reliability coefficients available is limited only by the researcher's imagination and the type of problem under consideration (see American Psychological Association, 1966). An example of this type of work comes from Heise (1969), who developed a novel technique for obtaining test-retest reliability estimates for single item measures using path analysis. His is an interesting beginning, but it barely scratches the surface of possibilities.
Scale requirements. Problems associated with metric inequality are perhaps more taxing than those associated with the absence of reliability estimates. Sociologists are frequently forced to use scales for which it is impossible to assume either interval scaling, identical metrics (Lord, 1956, 1958, 1967; Manning and DuBois, 1962), or R-equivalent metrics. Too often sociological research has little or no control over the measurement instrument, the respondent, or the means of data collection.
Under such conditions, assuming R-equivalent metrics is dubious, as is any assumption of interval scaling. Even if the same measuring instrument were employed on more than one occasion, an assumption of R-equivalent scales cannot always be justified (see Lord, 1958; Manning and DuBois, 1962).¹⁸ For example, delinquency studies frequently compare rates of delinquency at two points in time. Because arrests and convictions are related to ethnicity, area of residence, historical period, and other factors, it is doubtful if interval scaling should be assumed, let alone R-equivalent interval scaling. Similar problems arise in educational research which uses classroom grades. Quite obviously, different teachers have different definitions of each grade category. Highly probably, a given teacher assigns the same grade to different students for entirely different reasons. Almost certainly, teachers change their grading practices over time.
Under conditions in which the scales are not R-equivalent, the observed correlation is frequently not the coefficient of interest. When correlating gain scores with either initial scores ($r_{gx}$) or third variable scores ($r_{gz}$), the observed correlation is a function of the metric difference. If partial correlations are used ($r_{gz \cdot x}$), metric differences create no problems unless the scales are not linearly related.
In instances such as those above, the observed correlation is not identical to the desired correlation. The relative magnitude of the desired correlation can merely be estimated, utilizing knowledge of the observed correlations in conjunction with estimates of the scale difference. There is little likelihood that reliable estimates of these differences are possible. How does one evaluate the relative sizes of two observed scores which on the surface are identical, especially if they are obtained from the same measuring instrument?
Common sense suggests that the two scores may be unequal on some underlying dimension, but common sense does not suggest a means for determining the precise nature of this difference. Even knowledge of the observed correlation offers no help in estimating the difference.
Were it possible to obtain reliable estimates of scale differences, it would still be difficult to ascertain the correct sign of the desired correlation. Under optimum conditions, a researcher might be able to determine that an observed correlation is more positive or more negative than the coefficient of interest. Even so, without some certainty as to the sign of the estimated correlation, articulation of a causal structure is difficult. Necessarily, the validity of interpreting a theoretical structure depends upon the accuracy of the estimate of the scale differences.
The use of ordinal data as if they were interval creates problems similar to, but more complex than, those just discussed. In addition to the issue of metric inequality, there is the issue of the equality of the relationship between the ordinal scales and the underlying interval scales. In any event, the result is the same; the observed correlation is not the coefficient of interest, making interpretation of any causal structure difficult at best.
The solution to problems created by metric inequalities is analogous to solutions for problems of unreliability. Ideally, it is necessary, through curve fitting techniques, to demonstrate the precise mathematical relationship between the two scales with the effects of change removed. For example, teachers could be given a hypothetical set of data and be asked to assign grades to it at two distinct points in time. This is very much like obtaining test-retest reliability coefficients. However, because the effects of change must be eliminated from the reliability estimates, secondary and panel data are precluded from use. Once the precise mathematical relationship between the scales is known, it is possible to correct change score correlation coefficients.
Unfortunately, curve fitting techniques are time consuming and not always satisfactory, but a more convenient alternative is available. If the test-retest reliability coefficient, $r_{xy}$, approximates unity, the regression coefficients, $b_{yx}$ or $b_{xy}$, can serve as rough approximations of scale inequality. When $b_{yx}$ and $b_{xy}$ are approximately equal to 1.0, it may be assumed that the scales are R-equivalent.¹⁹ All other values indicate that the scales are not R-equivalent, and that correlation techniques, other than partial correlations, require correction factors. If $r_{xy}$ approximates 1.0, and if $b_{yx}$ and $b_{xy}$ do not, the regression coefficients can be used to obtain corrected values for the observed correlations. The easiest procedure is to replace the $b$'s in Eqs. (3), (4), (6), or (8) with $b_{yx}$, and calculate the corrected values.²⁰ Such corrected correlations, while only approximations, will nonetheless allow continued theoretical development.
¹⁸ This is considered by Cronbach and Furby (1970, p. 69) to be a philosophic issue so long as the scales share a common metric.
Obviously, it is not always possible to obtain measuresof reliability. When this information cannot be obtained, it is incumbent upon any researcherto evaluate to the best of his ability, both the degreeof reliability and the nature of the relationships between scales.In someinstancesthis may be impossible, in others, not. For example, in recent years the meansand motivations for more accurately reporting crime statistics have occurred. It is therefore possible to estimate the nature of the relationship between scales. When using crime statistics, it would be safe to infer that the scale for later statistics is greater than the scale for previous statistics (b > 1). Clearly such estimates of b are far from precise, but nonethelessthe advantage of using 19This technique is accurate only when the reliability coefficient equals unity (YXJJ = 1.0). 20The correction procedure for partials, when necessary. is slightly more complex. Corrected values of rgx and rgz must be determined and then used in Eqs. (9) or (10) to obtain a corrected partial.
an imprecise estimate far outweighs the disadvantages of attempting to articulate without any estimate at all.
Unquestionably, the use of correlational techniques may cause difficulties for sociologists whenever change is being analyzed in terms of theoretical or causal networks. They often provide estimates of the true relationships which are at best imprecise and at worst incorrect. In either instance, theory building is restricted. If sociology is to maintain continued growth, substantive theories must be developed, and if adequate theories are to be developed, new measurement tools and techniques must be built and old ones refined. Sociologists of all persuasions must therefore turn their attention to measurement. The time is long past when they can safely leave measurement problems to the “methodologists.”
REFERENCES
American Psychological Association (1966), Standards for Educational and Psychological Tests and Manuals, American Psychological Association, Washington, D.C.
Anderson, W. T. (1954), “Probability models for analyzing time changes in attitudes,” in Mathematical Thinking in the Social Sciences (P. F. Lazarsfeld, Ed.), pp. 17-66, The Free Press, Glencoe, Illinois.
Bereiter, Carl (1967), “Some persisting dilemmas in the measurement of change,” in Problems in Measuring Change (C. W. Harris, Ed.), pp. 3-20, University of Wisconsin Press, Madison.
Bohrnstedt, George W. (1969), “Observations on the measurement of change,” in Sociological Methodology 1969 (E. F. Borgatta and G. W. Bohrnstedt, Eds.), pp. 113-133, Jossey-Bass, San Francisco.
Boudon, Raymond (1968), “A new look at correlational analysis,” in Methodology in Social Research (H. M. Blalock, Jr. and Ann B. Blalock, Eds.), pp. 199-235, McGraw-Hill, New York.
Campbell, Donald T. (1967), “From description to experimentation: Interpreting trends as quasiexperiments,” in Problems in Measuring Change (C. W. Harris, Ed.), pp. 212-242, University of Wisconsin Press, Madison.
Carver, Ronald P. (1970), “The relation between two measures of learning: Residual gain and common points of mastery,” in Research Strategies for Evaluating Training (P. H. DuBois and G. D. Mayo, Eds.), pp. 100-108, Rand McNally, Chicago.
Coleman, James S. (1968), “The mathematical study of change,” in Methodology in Social Research (H. M. Blalock, Jr. and Ann B. Blalock, Eds.), pp. 428-478, McGraw-Hill, New York.
Costner, Herbert L. (1965), “Criteria for measures of association,” American Sociological Review 30, 341-353.
Cronbach, Lee J. (1947), “Test ‘reliability’: Its meaning and determination,” Psychometrika 12, 1-16.
Cronbach, Lee J. (1960), Essentials of Psychological Testing, 2nd ed., Harper, New York.
Cronbach, Lee J., and Furby, Lita (1970), “How we should measure ‘change’-or should we?” Psychological Bulletin 74, 68-80.
DuBois, Phillip H. (1957), Multivariate Correlational Analysis, Harper, New York.
DuBois, Phillip H. (1965), An Introduction to Psychological Statistics, Harper and Row, New York.
DuBois, Phillip H. (1970), “Correlational analysis in training research,” in Research Strategies for Evaluating Training (P. H. DuBois and G. D. Mayo, Eds.), pp. 109-116, Rand McNally, Chicago.
DuBois, Phillip H., and G. Douglas Mayo, Eds. (1970), Research Strategies for Evaluating Training, Rand McNally, Chicago.
Garside, R. F. (1956), “The regression of gains upon initial scores,” Psychometrika 21, 67-77.
Glock, Charles Y. (1955), “Some applications of the panel method to the study of change,” in The Language of Social Research (P. F. Lazarsfeld and M. Rosenberg, Eds.), pp. 242-250, The Free Press, New York.
Goodman, Leo A. (1962), “Statistical methods for analyzing processes of change,” American Journal of Sociology 68, 57-78.
Harris, Chester W., Ed. (1967), Problems in Measuring Change, University of Wisconsin Press, Madison.
Heise, David R. (1969), “Separating reliability and stability in test-retest correlations,” American Sociological Review 34, 93-101.
Heise, David R. (1970), “Causal inference from panel data,” in Sociological Methodology 1970 (E. F. Borgatta and G. W. Bohrnstedt, Eds.), pp. 3-27, Jossey-Bass, San Francisco.
Labovitz, Sanford (1971), “In defense of assigning numbers to ranks,” American Sociological Review 36, 521-522.
Labovitz, Sanford and Lubeck, S. G. (1969), “Issues of social measurement,” Et al. 2, 111.
Linn, Robert L. and Werts, Charles E. (1969), “Assumptions in making causal inferences from part correlations, partial correlations, and partial regression coefficients,” Psychological Bulletin 72, 307-310.
Lord, Fredric M. (1956), “The measurement of growth,” Educational and Psychological Measurement 18, 421-437.
Lord, Fredric M. (1958), “Further problems in the measurement of growth,” Educational and Psychological Measurement 16, 437-451.
Lord, Fredric M. (1966), “The relation of test scores to the trait underlying the test,” in Readings in Mathematical Social Science (P. F. Lazarsfeld, Ed.), pp. 21-53, M.I.T. Press, Cambridge, Massachusetts.
Lord, Fredric M. (1967), “Elementary models for measuring change,” in Problems in Measuring Change (C. W. Harris, Ed.), pp. 21-38, University of Wisconsin Press, Madison.
Lord, Fredric M. (1970), “Problems arising from the unreliability of the measuring instrument,” in Research Strategies for Evaluating Training (P. H. DuBois and G. D. Mayo, Eds.), pp. 79-93, Rand McNally, Chicago.
McNemar, Quinn (1940), “A critical examination of the University of Iowa studies of environmental influences upon the IQ,” Psychological Bulletin 37, 63-92.
McNemar, Quinn (1958), “On growth measurement,” Educational and Psychological Measurement 18, 47-55.
McNemar, Quinn (1962), Psychological Statistics, 3rd ed., John Wiley, New York.
Manning, Winton H., and DuBois, Philip H. (1962), “Correlational methods in research on human learning,” Perceptual and Motor Skills 15 (Monograph Supplement 3-V15), 287-320.
Mayer, Lawrence S. (1971), “A note on treating ordinal data as interval data,” American Sociological Review 36, 519-520.
Rose, Arnold M. (1949), “A weakness of partial correlation in sociological studies,” American Sociological Review 14, 536-539.
Ruth, Floyd L. (1970), “Measuring gain from a common point of mastery,” in Research Strategies for Evaluating Training (P. H. DuBois and G. D. Mayo, Eds.), pp. 94-99, Rand McNally, Chicago.
Schweitzer, Sybil and Schweitzer, Donald G. (1971), “Comment on the Pearson r in random number and precise functional scale transformations,” American Sociological Review 36, 518-519.
Thomson, Godfrey H. (1924), “A formula to correct for the effects of errors of measurement on the correlation of initial values with gains,” Journal of Experimental Psychology 7, 321-324.
Thomson, Godfrey H. (1925), “An alternative formula for the true correlation of initial values with gains,” Journal of Experimental Psychology 8, 323-324.
Thorndike, Edward L. (1924), “The influence of the chance imperfections of measures upon the relation of initial score to gain or loss,” Journal of Experimental Psychology 7, 225-232.
Thorndike, Robert L. (1942), “Regression fallacies in the matched groups experiment,” Psychometrika 7, 35-102.
Tucker, Ledyard R., Damarin, Fred, and Messick, Samuel (1966), “A base-free measure of change,” Psychometrika 31, 457-473.
Vargo, Louis G. (1971), “Comment on ‘The assignment of numbers to rank order categories,’” American Sociological Review 36, 517-518.
Webster, Harold and Bereiter, Carl (1967), “The reliability of changes measured by mental test scores,” in Problems in Measuring Change (C. W. Harris, Ed.), pp. 39-59, University of Wisconsin Press, Madison.
Wilson, Thomas P. (1971), “Critique of ordinal variables,” Social Forces 49, 432-444.
Zieve, Leslie (1940), “Note on the correlation of initial scores with gains,” Journal of Educational Psychology 31, 391-394.