DEVELOPMENTAL REVIEW 1, 314-321 (1981)
The Home Observation for Measurement of the Environment: A Comment on Zimmerman's Critique

RICHARD ELARDO
The University of Iowa

AND

ROBERT H. BRADLEY
University of Arkansas at Little Rock
This paper is a reply to an article in this journal by M. Zimmerman titled "The Home Observation for Measurement of the Environment: A Comment on Elardo and Bradley's Review." We found Zimmerman's critique of our original review article (Developmental Review, 1981, 1, 113-145) to contain points which were well taken, and several which seemed pedantic. Specifically, we address his concerns about our discussion of the HOME scale's interrater reliability, test-retest stability, concurrent validity, and predictive validity; and we reply to his comments about the uses of the HOME scale for purposes of screening and matching environments.
In our recent article (Elardo & Bradley, 1981), we presented an overview of studies which have employed Caldwell's Home Observation for Measurement of the Environment (HOME) scale. This scale is probably the most widely used instrument to assess the quality of the learning environment provided for a young child in the home. In our article, we first described various alternative approaches to the assessment of home learning environments, and then gave a brief history of the development of the HOME scale by Dr. Bettye Caldwell and her colleagues. The major sections of our paper, titled "Psychometric Properties" and "Applications of the Scale," evoked a variety of critical comments from Zimmerman (1981), some of which we feel are well taken, but many of which appear naive and/or pedantic. What follows is our reply to many of his points.

INTERRATER RELIABILITY
Requests for reprints should be sent to Dr. Richard Elardo, Department of Early Childhood Education, University of Iowa, Iowa City, IA 52242.

Copyright © 1981 by Academic Press, Inc. All rights of reproduction in any form reserved.

It was correctly observed that we cited six studies which reported on the interrater agreement obtained using the HOME scale, and that we failed to mention Stevenson and Lamb's (1979) data. However, the figure (66%) reported by Stevenson and Lamb is so much lower than the rest of
the reports and so far out of keeping with the results of our training sessions throughout the country that we still question whether it should be included. Perhaps it would have been better to report the average of 89.6% as we did, with a comment that the Stevenson and Lamb results were so out of line that one suspects a lack of care in their research. A careful examination of the "Reliability" section in the Stevenson and Lamb article indicates that they did not follow a standard procedure for estimating interrater reliability. The interrater reliability they reported for their own five-item observation scale was only .52. Even Zimmerman (1981) stated that the Stevenson and Lamb results are "far lower than that reported by other investigators." It seems to us that this recognition should make it clear why we "inexplicably did not include" the Stevenson and Lamb data in our calculations.

We could have factored the Stevenson and Lamb data in, thereby lowering the estimate of interrater reliability. Such a procedure would have resulted in a more conservative, but perhaps less accurate, estimate of the typical average interrater agreement that can be obtained with the HOME. If we had used the median or modal coefficient, around 90% would have been reported. As mentioned above, we know from our training experiences that a 90% level of agreement is not that difficult to attain.

In other comments about interrater reliability, Zimmerman noted that future researchers should use the Kappa coefficient (Cohen, 1960), since it takes into account the level of agreement expected by chance. We thought this was a good point. He also mentioned that the problem of consensual drift may distort reliability data and must be considered, but we cannot see why interrater agreement upon a "modification of the original criterion" would be expected to be higher than the interrater agreement on the original criterion.
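Zimmerman's point about chance-corrected agreement is easy to make concrete. The sketch below uses hypothetical binary codings from two raters (not data from any actual HOME administration) to contrast raw percent agreement with Cohen's (1960) kappa:

```python
# Hypothetical binary codings (1 = credit, 0 = no credit) from two raters
# observing the same visit; invented for illustration only.
rater_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_b = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]

def percent_agreement(a, b):
    """Proportion of items on which the two raters agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's (1960) kappa: agreement corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected if the raters coded independently.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    categories = set(a) | set(b)
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

print(percent_agreement(rater_a, rater_b))          # 0.8
print(round(cohens_kappa(rater_a, rater_b), 3))     # 0.583
```

In this toy case, 80% raw agreement shrinks to a kappa of about .58 once chance agreement is removed, which is why kappa yields a more conservative picture than the percent-agreement figures discussed above.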
Consensual drift is a problem, but one which is not likely to have any appreciable impact on interrater reliability. Only when the original criterion is unclear (i.e., hard to measure) would such inflation tend to occur as a result of a consensually developed easier-to-measure criterion. Moreover, the consensual-drift phenomenon might tend to deflate the internal-consistency and long-term stability coefficients in a test with a factor-analytic base such as the HOME scale.

TEST-RETEST STABILITY
Zimmerman suggests that in examining the stability of the HOME, attention be paid to the stability of the mean scores as well as to test-retest correlations. He cites Cone (1977) as a reference for this suggestion. The suggestion is interesting, and perhaps some comment should have been made about the slight increase in HOME scores from 6 to 24 months. Zimmerman does not suggest, however, how we should deal with the mean scores. This raises several general concerns about his comment,
the first being his failure to elaborate on how we should deal with the stability of mean scores. The second concerns the article cited (Cone, 1977) and its suitability for use in research with the HOME scale. Cone's article is focused on behavioral assessment. Strictly speaking, the HOME falls a bit outside that domain: it assesses behavior, objects, and events, and it utilizes both interview and observation methodology. Cone (1977) notes that the stability of means has "typically been of greater concern to behavioral than nonbehavioral assessors" (p. 418). Cone argues that behavioral assessors are concerned with consistency in behavior over time because of "its necessity in evaluating the effects of a particular intervention" (p. 418).

As mentioned in our earlier paper (Elardo & Bradley, 1981), the HOME has been used to evaluate program effectiveness. However, its tie to the intervention program is usually less direct than is the case for the behavioral observations used to evaluate behavioral interventions such as were alluded to by Cone (1977). With respect to the HOME, the type of consistency or temporal generalizability of greatest concern is more like that which Cone discusses in reference to personality measures. It is a more "general" trait of the family environment. For example, the range of toys required to support the development of a 6-month-old is a bit different from the range needed at age 2. This is also true for certain outside-of-home experiences. From the standpoint of theory and empirical research, the objects, events, and to some extent parental behavior (e.g., punishment) which serve as indicators of environmental quality shift across the 3-year period covered by the HOME. Thus, a shift in mean score would not necessarily indicate poorer reliability and validity for the HOME.
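The distinction between mean stability and correlational (rank-order) stability can be shown with a toy example. In the sketch below, the scores are hypothetical and chosen only for illustration: every family's score rises by four points between waves, so the mean shifts, yet the test-retest correlation is perfect.

```python
# Hypothetical HOME-like total scores for five families at two ages;
# invented for illustration, not actual HOME data.
wave_1 = [20, 25, 30, 35, 40]
wave_2 = [24, 29, 34, 39, 44]  # every score is 4 points higher

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(mean(wave_2) - mean(wave_1))          # 4.0
print(round(pearson_r(wave_1, wave_2), 6))  # 1.0
```

A mean shift accompanied by a test-retest correlation of 1.0 is precisely the pattern that reflects developmental change in the indicators rather than unreliability of the instrument.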
Ironically, Nelson, Hay, and Hay (1977), in their critique of Cone's (1977) paper, argue that he has overstated the need to establish generalizability across all six of the universes to which generalizability theory may be applied. They state that "lack of consistency in assessment scores cannot always be interpreted as a lack of precision on the part of the assessment tool; the assessment tool may be precise, but the behavior being measured may have changed" (p. 428). As a matter of fact, most of the changes in mean scores for the HOME occur in subscales dealing with objects and events rather than parental behavior. However, these mean differences should be noted when using the HOME to evaluate programs, so as not to ascribe purely developmental effects to the efficacy of the program.

CONCURRENT VALIDITY
Zimmerman believes that data from the work of Wulbert, Inglis, Kriegsmann, and Mills (1975) and the comments of Stevenson and Lamb (1979) indicate that the HOME scale is not sensitive to social-class differences in the middle and upper classes.
Apparently, Zimmerman has misstated the interpretation offered by Stevenson and Lamb (1979) regarding their findings with the HOME scale. They did not conclude that the HOME was insensitive to social-class differences in the middle and upper classes. They concluded that the HOME "may be too gross and insensitive to reveal individual differences among middle class homes and mothers" (p. 347). It would have made no sense for Stevenson and Lamb to correlate HOME with SES in their homogeneous sample, since there was even less variability in SES than in HOME scores.

Nor would the Wulbert et al. (1975) study be a good study to use to examine the HOME/SES relationship. First, half the sample consists of language-delayed children, matched with a normal sample. Second, the children range from 2.5 to 6 years (mostly beyond the 0-to-3 period covered by the HOME). It seems that Zimmerman has misunderstood the Wulbert et al. (1975) study. Any correlation from the study has to be considered against the backdrop of the sample used. Half of the data used in the HOME/SES correlation was from a language-delayed group. The other half was from a normal sample matched on SES. The HOME scores from the language-delayed group were significantly lower than those from the normals. Scores on the HOME subscale "Maternal Involvement" were very low in the language-delayed sample. Wulbert concluded that "the mothers were generally conscientious in meeting the physical needs of the child, but aside from this there was little interaction" (p. 67). In essence, half of the mothers used in the correlation were aberrant.

It is indeed ironic that Zimmerman would question the applicability of the HOME to the middle class on the basis of the Wulbert study. If any study verifies its applicability to the middle class, it is the Wulbert study. Thirteen of the twenty language-delayed children were middle class or higher. Yet it is precisely these middle-class homes which yielded substantially lower HOME scores than matched normals.
It is, therefore, these middle-class families who need intervention. Inadequate parenting in middle-class families is identified just as inadequate parenting in lower-class families is. A low HOME score is just as indicative of problems in a middle-class home as in a lower-class one, the point made by Bloom 20 years ago.

Nonetheless, Zimmerman is generally right in saying that there is a kind of ceiling effect on the HOME scale for middle-class families. He suggests that this "is a problem (which) . . . may limit its applicability to middle and upper class populations." In fact, the scale was not primarily intended to discriminate between a generally adequate level of parenting and "super" parenting. There are many situations in which an instrument need not discriminate throughout the full range of a variable in order to accomplish its purpose. For making many of the selection and
placement-type decisions discussed by Cronbach (1969), it is not necessary to have an instrument which discriminates throughout the full range of competence or skill. Once a certain level of skill or of a characteristic is established, the decision can be made. In other words, in order to place you on an assembly line, it may be sufficient to establish that you have adequate mechanical skills. We need not try to identify you as a mechanical genius; we need only show that you are not a klutz.

PREDICTIVE VALIDITY
In this section of his critique, Zimmerman stated that there are two major problems which should prevent the uncritical acceptance of the findings of high predictive validity of the HOME scale: (1) direction of effect, and (2) the confounding of genetic and environmental effects. Actually, we do not see why it was necessary to raise this issue in his critique, since we feel we adequately did so in our original paper. In fact, a significant portion of his discussion of Willerman (1979) was a direct quote (which was not indicated by Zimmerman) or a close paraphrase of our discussion. The research design suggested by Zimmerman would, indeed, be a useful way for some researchers to examine environment/development relations. In fact, Jordan (1980) has already employed a similar design and consistently finds the HOME scale to be one of the most significant predictors of later cognitive scores. A much more dynamic model has been suggested by Walberg and Marjoribanks (1976) using regression models. Furthermore, a more truly developmental analysis examining changes in environment and development across time would be even better (McCall, Appelbaum, & Hogarty, 1973).

SCREENING
Zimmerman was quite critical of our section on screening. He mentioned that the studies we reviewed primarily provided only circumstantial evidence attesting to the HOME scale’s effectiveness as a screening instrument, and we do not disagree with this point. However, in the one study we did cite (Bradley & Caldwell, 1977) in which information was presented on the HOME scale’s sensitivity (the percentage of index cases correctly identified) and specificity (the percentage of nonindex cases correctly identified), Zimmerman accuses us of either reporting an error in calculation or using an unconventional definition of specificity. In fact, no error was made in calculating the specificity of the HOME (43%) in the Bradley and Caldwell (1977) article. The definition of specificity used was stated in the article: “the percentage of those classified as having the condition on the basis of their test scores who actually have the condition” (p. 418). In this case, 12 of the 28 who were classified as IQ < 70 actually had IQ < 70. The source of the definition, Frankenburg (1973),
was also given. Admittedly, the definitions are different, and Zimmerman does a service in clarifying the difference.

Zimmerman's comments that the HOME scale has too low a level of sensitivity and specificity appear to us to be naive, and they also seem to betray a value bias. In the first place, Zimmerman criticizes both the 71% sensitivity rate and the 75% specificity rate. As Zimmerman must know, each of these could be changed by adjusting cutoff scores so as to increase either sensitivity or specificity (Stangler, Huber, & Routh, 1980; Gallagher & Bradley, 1972). However, increasing one is generally done at the expense of decreasing the other. The question is, which is more important? Zimmerman, in the last sentence of the paragraph, suggests that specificity is the more important. Certainly, others before him have discussed the dilution of limited resources which results from overreferral (see Stangler et al., 1980). However, most, including Stangler et al. (1980) and Gallagher and Bradley (1972), make the case that test sensitivity may be the more important. Admittedly, an instrument which gave a higher rate of both sensitivity and specificity would be preferable. The question is, where are such instruments? Gallagher and Bradley (1972) have discussed the limitations of existing instruments. Others, before and since, have described the inaccuracy of identification tools. For this reason, many have argued the need for combining information from a variety of instruments to increase predictive accuracy (Bradley & Caldwell, 1978). We have never recommended that the HOME scale be used in isolation as a screening device, nor do we know anyone who would. Furthermore, screening should be followed by a more thorough diagnosis to determine if intervention is really needed (Meier, 1976).
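The two definitions of "specificity" at issue can be made concrete. The sketch below computes the conventional definition alongside the Frankenburg-style figure quoted above. The 12-of-28 example comes from the Bradley and Caldwell (1977) figures cited earlier; the true-negative and false-negative counts are hypothetical, supplied only so the code runs.

```python
def sensitivity(tp, fn):
    """Percentage of index cases (condition actually present) correctly identified."""
    return tp / (tp + fn)

def specificity_conventional(tn, fp):
    """Conventional definition: percentage of nonindex cases correctly identified."""
    return tn / (tn + fp)

def specificity_frankenburg(tp, fp):
    """Definition quoted in Bradley and Caldwell (1977): percentage of those
    classified as having the condition who actually have it (what is
    conventionally called positive predictive value)."""
    return tp / (tp + fp)

# From the article: 12 of the 28 children classified as IQ < 70 actually had IQ < 70.
tp, fp = 12, 16
print(round(specificity_frankenburg(tp, fp), 2))   # 0.43

# Hypothetical remaining cell counts, invented only to complete the 2 x 2 table.
tn, fn = 60, 5
print(round(sensitivity(tp, fn), 2))               # 0.71
print(round(specificity_conventional(tn, fp), 2))  # 0.79
```

Adjusting the cutoff score simply moves cases between these cells, so sensitivity can generally be bought only at the price of specificity, as noted above.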
Since the cost of diagnosis is generally not that high, especially as compared to the cost of lifelong support for a person who could have benefited from early intervention, the sensitivity of a screening device is probably more important than its specificity. The real question is what is the best course from among the real options available, not whether the HOME or any other instrument is perfect.

Let us reply to one other issue: Zimmerman's concern that what screening studies should have reported is how much more efficient the HOME scale is than measures of SES. We feel this point is well taken, albeit the comparison is not technically necessary. There are no good direct comparisons, but there are some provocative findings in other studies. For example, most of the families of language-delayed children in the Wulbert et al. (1975) study had low HOME scores despite the fact that the majority were middle and upper class. Bradley and Caldwell (1977) found that the correlation between HOME and IQ was nearly two times as high as the correlation of SES and IQ among blacks. Finally, Johnson, Kahn, Hines, Leler, and Torres (Note 1) found a somewhat different
pattern of HOME/IQ relations among low-income Mexican-Americans than others have found with low-income blacks and whites.

MATCHING ENVIRONMENTS
We believe that Zimmerman's concern that the usefulness of the HOME scale for matching purposes may be limited to the lower and lower-middle classes, due to a ceiling effect on scores in the middle and upper classes, is overstated. Granting the existence of a ceiling effect for the middle and upper classes, using the HOME at least ensures that none of the children are experiencing a poor family environment, something a social-status measure does not. Interestingly enough, as we study environmental/developmental relations more, we may find that further discriminations at the top end of the family environment scale are more important for lower-class than for middle-class samples. The macroenvironment (outside the home) of a middle-class child may be such that it tends to support development (even going so far as to somewhat protect a child against a slightly deficient family learning environment). Having a parent who is a "super teacher-super responsive type" may not give such a child a great advantage. As Stevenson and Lamb (1979) suggest, other variables may loom larger once the threshold is crossed for middle-class children. However, in a poor family, where the outside (or macro-) environment does not generally support development, having a super parent may be some advantage over having a good parent.

CONCLUSION
We have appreciated this opportunity to reply to some of Zimmerman's criticisms of our earlier article. If this dialogue serves to help identify or clarify areas of needed investigation relative to the HOME scale, it has served a good purpose.

REFERENCES

Bradley, R. H., & Caldwell, B. M. Home Observation for Measurement of the Environment: A validation study of screening efficiency. American Journal of Mental Deficiency, 1977, 81, 417-420.
Bradley, R. H., & Caldwell, B. M. Screening the environment. American Journal of Orthopsychiatry, 1978, 48, 114-130.
Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, 20, 37-46.
Cone, J. D. The relevance of reliability and validity for behavioral assessment. Behavior Therapy, 1977, 8, 411-426.
Cronbach, L. Essentials of psychological testing. New York: Harper & Row, 1969.
Elardo, R., & Bradley, R. H. The Home Observation for Measurement of the Environment (HOME) scale: A review of research. Developmental Review, 1981, 1, 113-145.
Frankenburg, W. Increasing the lead time for the preschool handicapped child. In M. Karnes (Ed.), Not all little wagons are red. Arlington, Va.: Council for Exceptional Children, 1973.
Gallagher, J., & Bradley, R. H. Early identification of developmental difficulties. In I. Gordon (Ed.), Early childhood education. 71st yearbook of the National Society for the Study of Education. Chicago: Univ. of Chicago Press, 1972.
Jordan, T. Old man river's children. New York: Academic Press, 1980.
McCall, R. B., Appelbaum, M. I., & Hogarty, P. S. Developmental changes in mental test performance. Monographs of the Society for Research in Child Development, 1973, 38(3).
Meier, J. Developmental and learning disabilities. Baltimore: University Park Press, 1976.
Nelson, R., Hay, L., & Hay, W. Comments on Cone's "The relevance of reliability and validity for behavioral assessment." Behavior Therapy, 1977, 8, 427-430.
Stangler, S. R., Huber, C. J., & Routh, D. K. Screening growth and development of preschool children: A guide for test selection. New York: McGraw-Hill, 1980.
Stevenson, M. B., & Lamb, M. E. Effects of infant sociability and the caretaking environment on infant cognitive performance. Child Development, 1979, 50, 340-349.
Walberg, H., & Marjoribanks, K. Family environment and cognitive development: Twelve analytic models. Review of Educational Research, 1976, 45, 527-552.
Willerman, L. Effects of families on intellectual development. American Psychologist, 1979, 34, 923-929.
Wulbert, M., Inglis, S., Kriegsmann, E., & Mills, B. Language delay and associated mother-child interactions. Developmental Psychology, 1975, 11, 61-70.
Zimmerman, M. The Home Observation for Measurement of the Environment: A comment on Elardo and Bradley's review. Developmental Review, 1981, 1, 301-313.
REFERENCE NOTE

1. Johnson, D., Kahn, A., Hines, A., Leler, H., & Torres, E. Measuring the learning environment of Mexican-American families in a parent education program. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, Calif., 1976.

RECEIVED: July 27, 1981