JOURNAL
OF RESEARCH
IN PERSONALITY
23, 450-468 (1989)
Assessing Parental Childrearing Behaviors: A Comparison of Parent, Child, and Aggregate Ratings from Two Instruments

J. CONRAD SCHWARZ AND JACK MEARNS
The University of Connecticut

This report examined the relative validity of scores of childrearing behavior based upon ratings by four rater types (mother, father, college-age child, and a sibling) from 186 families. Using the Worell and Worell Parent Behavior Form (PBF; Worell & Worell, 1974), subscale scores based on each rater type were of roughly equal internal consistency, had parallel factor structures, and yielded modest agreement with the three other rater types. These results are parallel to earlier findings for Schaefer’s Child’s Report of Parent Behavior Inventory (CRPBI; Schaefer, 1965; Schwarz, Barton-Henry, & Pruzinsky, 1985) with a similar sample. There was little or no evidence that ratings by the child subject were superior to those of any other family member. Aggregating ratings of multiple family members greatly increased the generalizability of factor scores: The correlations between the three congruent factors of the PBF and the CRPBI averaged .79 when 4-rater aggregate scores from each instrument were employed. Sources of validity and error in assessing childrearing behavior were discussed. © 1989 Academic Press, Inc.
This study was sponsored in part by PHS Grant R01 MH31750-01-6, by PHS Grant 5R01 AA06754-01-02, and by funds from the University of Connecticut Research Foundation and Computer Center. A summary of the results of the study was presented at the Conference on Human Development, Nashville, Tennessee, April 3-5, 1986. The authors thank Ann Merritt, Martin Krugman, Bronwen Williams, Carl Hindy, Tom Pruzinsky, and Marianne Barton-Henry for their assistance in data collection and coding; George Goldsmith, Sterling Green, and Bob Upson for their assistance with statistical analyses; David Kenny for his statistical advice; and David Wheeler for his comments on the manuscript. Requests for reprints should be sent to J. Conrad Schwarz, Ph.D., University of Connecticut, Department of Psychology, U-20, 406 Babbidge Road, Storrs, CT 06269-1020.

0092-6566/89 $3.00 Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

The current study follows from research Schwarz, Barton-Henry, and Pruzinsky (1985) conducted on Schaefer’s (1965) Child’s Report of Parent Behavior Inventory (CRPBI), the most widely used measure of parental childrearing behavior. Schwarz et al. demonstrated that CRPBI scores have very low generalizability when a mature child is the sole informant. For the three childrearing factors assessed by the CRPBI, the child’s ratings correlated .38 with ratings by other single family members, on
the average; and for individual subscales, the mean pairwise agreement between family members was .30. Since, in the majority of studies using this instrument, it is the child’s report alone that serves as a basis for measuring parental behavior, the null hypothesis on many occasions may be erroneously accepted simply because of the low generalizability of measurements of parental behavior. Even if parental behavior had had important effects on the development of the child’s personality, such effects would be extremely difficult to detect with such unreliable measures of parent behavior; and much larger sample sizes would be required than are customarily employed in psychological research. Following the recommendations of Epstein (1980, 1983) for improving personality trait measurement, Schwarz et al. (1985) demonstrated that the reliability and validity of scores on the CRPBI could be substantially increased by aggregating the ratings of multiple informants across scale dimensions.

The Parent Behavior Form (PBF; Worell & Worell, 1974) is a lesser-known retrospective report of parenting behavior. It has been the subject of several theses and dissertations and has been used in a few published studies (Kelly & Worell, 1976, 1977, 1978; McCraine & Bass, 1984). The PBF uses a format similar to that of the CRPBI, asking respondents to give retrospective ratings of the parents’ behavior toward the child, and the PBF, like the CRPBI, yields three relatively independent factors (Kelly & Worell, 1976). The three factors obtained for the PBF were parental warmth versus rejection, parental control, and parental cognitive involvement. The first two factors appear identical to factors reported in CRPBI research; however, the third, the cognitive involvement dimension, appears to be unique to the PBF. If the cognitive involvement dimension of the PBF proves to be relatively independent of the three factors measured by the CRPBI,
four factors of childrearing behavior must exist rather than the three that have heretofore been presumed to cover the entire domain (see review by Golden, 1969). In addition to allowing investigators to assess parents’ cognitive involvement with their children, the PBF also includes two validity scales which are designed to identify respondents whose reports are not valid. These two scales are a social desirability scale, whose items consist of desirable but unlikely behaviors on the part of parents, and an irrationality scale, which consists of nonsensical items whose endorsement indicates either lack of attention or a frivolous test-taking style. Despite the lack of research on the PBF, the unique aspects of its cognitive factor and its validity scales make it worthy of study. The present study inquired whether the reports of mature children from the PBF, which uses methods similar to the CRPBI, would have comparably poor generalizability, and if so, whether the aggregation of
ratings from multiple family members would yield scores of adequate generalizability, as was the case with the CRPBI (Schwarz et al., 1985). In the present study, each of four family members (subject child, sibling, mother, and father) rated both mother and father on the PBF. Because the present and the earlier report used overlapping samples, it was possible to examine correlations between the CRPBI and the PBF, and to use the CRPBI as a criterion against which to assess the validity of aggregate scores as compared with scores based on different single rater types. Thus there were several purposes of the present study. First, we sought to learn more about the attributes of mothers, fathers, mature children, and their siblings as raters of childrearing behaviors directed toward the subject child. Would the agreement between pairs of rater types be as low as with the CRPBI? Would the scores of some rater types (e.g., the child) have greater generalizability than those of others? Although the PBF has been employed in the personality literature, particularly in relation to sex role development, very little information has been published about its psychometric properties and no information on its convergence with the CRPBI. Second, we sought to examine the internal consistency and factor structure of the PBF subscales. Would the items and scales of the PBF have similar meaning for each rater type? This condition, which existed for the CRPBI, is a prerequisite to aggregation across multiple raters. Third, we wanted to compare the factor structures of the PBF and the CRPBI. Does the PBF assess the same or different factors of childrearing as the CRPBI? And finally, we sought to examine the convergence and discriminant relationships of single- and multiple-rater PBF factor scores with multiple-rater factor scores from the CRPBI, the more extensively validated measure of parental childrearing behavior. 
Such an analysis would suggest which rater types are likely to provide the most generalizable and valid ratings and would show the extent to which the aggregation of multiple ratings can improve the validity of measures of parental childrearing behavior.

METHOD

Subjects

The participants were 4 members from each of 296 families: (a) a freshman college student subject (131 were male and 165 were female), (b) the mother, (c) the father, and (d) one sibling who was within 3 years (plus or minus) of the student’s age. Mean subject age was 18.4 years (SD = .9) and mean sibling age was 18.1 years (SD = 2.2). About 50% of the families in this total sample had scores above the 75th percentile of the Schwarz Interparental Conflict Scale (IPC; Schwarz, 1988) because of intentional and systematic
overselection of families high in parental conflict.¹ However, for this study, portions of those families high on IPC scores and of those containing a female student were randomly deleted in order to create a sex-balanced sample that was representative of the university population with respect to its distribution of scores on the IPC. The analyses of the PBF alone were conducted on this sex-balanced, representative subsample of 186 families. This sample overlapped by about 60% with that used in the Schwarz et al. (1985) study since additional families were recruited to the Family Dynamics Study subsequent to the analysis of the CRPBI. For the analyses in which ratings of PBF factors are correlated with CRPBI factors (Table 3), the sample size fell to 171 (86 female students and 85 male) because some members of some families who completed the PBF had not fully completed the CRPBI.

Recruitment and demographics. Freshman students were recruited from Introductory Psychology classes and by direct mail for participation in a study of the relationship between family interaction patterns and later adolescent personality. In keeping with the objectives of the larger study, criteria for a subject’s participation were (a) freshman class standing; (b) a sibling within 3 years of the student’s age, who was willing to complete questionnaires by mail; (c) a mother and a father with whom the subject had lived until age 16 who were both willing to complete questionnaires by mail; and (d) a roommate or friend who was likewise willing to participate in two testing sessions. Approximately 70% of the families invited to participate in the study agreed to do so; and, for those families who agreed, the return rates for members were 92% for mothers, 90% for fathers, and 89% for siblings. The families were generally of middle-class standing with a mean annual income of $33,000 and an interquartile range of $13,000.
The median occupational level as coded by the Hollingshead Index of Social Position (Hollingshead, 1957) was 2 (business managers and lesser professionals) with an interquartile range of 1.
Measures

Parent Behavior Form (PBF). The PBF was devised by Worell and Worell in 1974 and consists of two identical sets of 135 items, one pertaining to the mother and the other to the father. These items constitute 15 subscales: 13 scales assessing childrearing behavior (Hostile Control, Rejection, Achievement Control, Strict Control, Punitive Control, Lax Control, Warmth, Active Involvement, Egalitarianism, Cognitive Independence, Cognitive Curiosity, Cognitive Competency, and Conformity) and 2 validity scales (Social Desirability and Irrationality). Each item describes a particular behavior on the part of the parent, and the respondent rates each statement as “like (3),” “somewhat like (2),” or “not like (1)” that parent.²

Child’s Report of Parent Behavior Inventory (CRPBI). The CRPBI (Schaefer, 1965) has a structure very similar to the PBF. It contains a mother and a father set of 108 items, each of which comprises 18 subscales of 5 or 8 items.³ The pronoun structure of items in both the PBF and the CRPBI was modified to create different forms that could be used to obtain ratings of parental behavior toward the subject by different family members.
¹ These data were collected as part of a larger study of family dynamics and their effects on late adolescent personality by the first author. It was for the purposes of the larger study that there was an oversampling of high conflict families, which was corrected for in the current study.
² The PBF and the PBF Manual may be obtained from Dr. Judith P. Worell, Department of Psychology, University of Kentucky, Lexington, KY 40506.
³ The 18 scales of the CRPBI comprise three relatively independent factors. Assigned to the Acceptance factor are the following scales: Positive Involvement, Acceptance, Child-Centered, Acceptance of Individuation, Rejection (reversed), and Hostile Detachment (reversed). The Psychological Control factor consists of Intrusiveness, Control via Guilt, Hostile Control, Possessiveness, Instilling Persistent Anxiety, and Withdrawal of Relations. The Firm Control factor includes Enforcement, Control, Inconsistent Discipline (reversed), Nonenforcement (reversed), and Lax Discipline (reversed). (See Schwarz et al., 1985, for a fuller description.)

Procedure

Students completed the PBF, the CRPBI, and several other questionnaires in groups of 15 to 20. Students were given the PBF and the CRPBI in counterbalanced order in two sessions 1 month apart. Questionnaires were mailed separately to the mother, father, and sibling, with each family member being given a separate envelope in which to return the materials. The first packet contained the CRPBI and other questionnaires. Upon its return, a second packet was mailed containing the PBF and other measures. All of the recipients were encouraged to work independently so that the data set consisted of 16 sets of unique ratings: namely, ratings of the mother’s childrearing behavior with regard to the subject and the father’s childrearing behavior with regard to the subject as reported by the mother, father, subject, and sibling on two instruments, the PBF and the CRPBI.

RESULTS

Data analyses addressed the following topics: (1) the internal consistency of each PBF subscale for the respective raters and targets; (2) the agreement, or convergence, between PBF scale ratings made by the four types of raters; (3) the similarity of PBF factor structures obtained from ratings by each of the four rater types; (4) the comparability of means and standard deviations of estimated PBF factor scores across rater types; (5) the utility of the PBF validity indices; (6) the extent to which the generalizability of scores could be improved by aggregating data over subscales and raters; and (7) convergence and discriminant validity coefficients between ratings of PBF factors and CRPBI factors with respect to both single-rater and 4-rater aggregate PBF scores. The rationale for and details of the statistical procedures employed have been described in Schwarz et al. (1985), and therefore will be mentioned only briefly in the following section.

Subscale Reliability and Factor Structure

On the whole, the 15 subscales of the PBF showed good reliability. Approximately 70% of the alpha coefficients fell within a range from .70 to .90, although the internal consistencies for parents’ self-ratings tended to be lower, especially on scales having low overall mean endorsement strength. The median alpha was .67 for mothers’ self-ratings and .69 for fathers’, versus .74 for ratings of mother and .83 for ratings of father based on student, sibling, and spouse ratings. In general, then, the data show that the four rater types responded to PBF subscale items with adequate and roughly similar consistency.
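The alpha coefficients reported for the subscales are internal consistency estimates of the Cronbach type. As a minimal sketch of how such a coefficient is obtained for one subscale, the Python function below assumes a complete respondents-by-items matrix of ratings on the 1 to 3 PBF response scale. The function name, the data layout, and the example ratings are hypothetical and are not taken from the study’s data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for one subscale.

    ratings: list of respondents, each a list of k item ratings
    (hypothetical layout; no missing values are assumed).
    """
    k = len(ratings[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Illustrative ratings only (5 respondents x 4 items, response scale 1-3).
example = [
    [3, 3, 2, 3],
    [2, 2, 2, 3],
    [1, 2, 1, 1],
    [3, 2, 3, 3],
    [1, 1, 1, 2],
]
print(round(cronbach_alpha(example), 2))
```

Applied separately to each rater-target combination, a function of this kind would yield one coefficient per subscale per combination, which is the form in which the medians above are summarized.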
The second set of analyses was concerned with whether the same dimensions of description seemed to underlie ratings by the four members of the family. A principal components factor analysis of the 13 scale scores was done for each rater’s ratings of each parent. (The 2 validity scales were deleted.) Each of these analyses yielded three factors with eigenvalues of 1.0 or greater. On ratings of both mother and father, a Warm Involvement factor was the first extracted for all four raters, followed by factors labeled Harsh Control and Lax Control. Loading high on the Warm Involvement factor were the Cognitive Independence, Cognitive Curiosity, Active Involvement, Cognitive Competency, Warmth, and Egalitarianism subscales. The Harsh Control factor subsumed the Hostile Control, Rejection, Achievement Control, Strict Control, Punitive Control, and Conformity subscales. The Lax Control factor had a single high loading, that of the Lax Control subscale. The pattern of subscale clusters was roughly equivalent across the four raters, as were the factor weights.⁴

Factor Scores

Subsequent analyses were conducted using estimated factor scores made up of the unweighted mean of the scale scores of subscales loading highest on each factor. For example, a rater’s Harsh Control factor score was the mean of his or her ratings on the Hostile Control, Rejection, Achievement Control, Strict Control, Punitive Control, and Conformity subscales. These three estimated factor scores were calculated for each of the 8 rater/ratee pairs, creating a set of 24 factor scores. Internal consistency coefficients were computed for scores on the two factors comprising multiple subscales. For the Warm Involvement factor, the alpha coefficients for scores based on the various rater/ratee combinations ranged from .82 to .89 (mean alpha = .87), and for the Harsh Control factor, alphas ranged from .74 to .88 (mean alpha = .80).
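The scoring rule for estimated factor scores can be sketched as follows. The subscale-to-factor assignment is the one given in the text; the function name, the dictionary layout, and the example subscale scores are hypothetical, and the sketch assumes that each subscale score is already expressed on the 1 to 3 item metric.

```python
from statistics import mean

# Subscales loading highest on each factor, as listed in the text.
FACTOR_SUBSCALES = {
    "Warm Involvement": [
        "Cognitive Independence", "Cognitive Curiosity", "Active Involvement",
        "Cognitive Competency", "Warmth", "Egalitarianism",
    ],
    "Harsh Control": [
        "Hostile Control", "Rejection", "Achievement Control",
        "Strict Control", "Punitive Control", "Conformity",
    ],
    "Lax Control": ["Lax Control"],
}


def estimated_factor_scores(subscale_scores):
    """Unweighted mean of the subscale scores assigned to each factor."""
    return {
        factor: mean([subscale_scores[name] for name in names])
        for factor, names in FACTOR_SUBSCALES.items()
    }


# Hypothetical subscale scores from one rater describing one parent.
example = {
    "Cognitive Independence": 2.4, "Cognitive Curiosity": 2.2,
    "Active Involvement": 2.5, "Cognitive Competency": 2.3,
    "Warmth": 2.7, "Egalitarianism": 2.3,
    "Hostile Control": 1.4, "Rejection": 1.2, "Achievement Control": 1.8,
    "Strict Control": 1.6, "Punitive Control": 1.3, "Conformity": 1.7,
    "Lax Control": 1.6,
}
print(estimated_factor_scores(example))
```

Running the function once for each of the 8 rater/ratee pairs gives the 24 factor scores per family described above.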
Analyses of variance were then performed on ratings of each parent on each factor to test for systematic differences among the four rater types. The design included one repeated measures factor (rater, with four levels: mother, father, sibling, and student) and two between factors: sex of student and sex of sibling. These data are presented in Table 1. The results showed that ratings of parents’ behavior with respect to the dimensions of Warm Involvement, Harsh Control, or Lax Control were little affected by the subject’s gender or by the gender of the sibling rater; however, all three factor scores for both mother and father were significantly affected by who was doing the rating (ps < .001). Both parents tended to rate themselves higher on Warm Involvement than did other raters, and both parents also reported themselves and their spouses to be less lax in their control than did the subject or sibling (ps < .05). Mothers tended to rate themselves and their husbands as less harsh than did other raters (ps < .05). In general, mothers and fathers tended to be less variable (p < .05) in their self-ratings of Warm Involvement or Harsh Control (Guilford, 1956, p. 192). This suggested that they avoided extremes of both favorability and unfavorability in presenting themselves. (See Table 1.)

⁴ Tables of factor loadings from the varimax rotation of scale scores for each rater-target combination as well as an extended report on the PBF are available from the first author upon request.

TABLE 1
MEANS AND STANDARD DEVIATIONS OF APPROXIMATE FACTOR SCORES FOR EACH OF FOUR RATERS: F TESTS OF THE DIFFERENCES AMONG RATERS AND THE PROPORTION OF TOTAL VARIANCE DUE TO RATER (N = 186)

                                    Approximate factor score (M)
                       Warm Involvement      Harsh Control        Lax Control
Rater                  Mother    Father      Mother    Father     Mother    Father
Mother                 2.52      2.36        1.46      1.46       1.51      1.52
Father                 2.48      2.42        1.50      1.50       1.48      1.47
Sibling                2.29      2.22        1.56      1.54       1.74      1.73
Student                2.37      2.29        1.54      1.53       1.76      1.76
Rater effect F         41.47*    22.17*      10.30*    8.60*      40.12*    37.15*
Rater σ²/total σ²      .13       .04         .03       .01        .15       .12

Note. In the original table, means within a column that share a subscripted letter did not differ at the .01 level of significance with the Tukey HSD test, and standard deviations are printed beside the means. The subscripts are not legible in this copy. Standard deviations could be matched to raters only for Harsh Control with mother as target: mother .17, father .22, sibling .31, student .24. Values printed for the other columns, in an order that could not be verified: Warm Involvement, mother .35, .33, .27, .34 and father .31, .32, .30, .22; Harsh Control, father .18, .28, .23, .25; Lax Control, mother .33, .34, .32, .33 and father .36, .37, .32, .34.
* p < .001.

Convergence among Individual Rater Types
The validity of estimated factor scores based on single raters was examined first by intercorrelating the 12 factor scores (3 each from 4 raters) separately for each target. The 3 factors may be regarded as traits of parental childrearing behavior and, because different raters observe slightly different samples of the ratee’s behavior, their separate ratings may be viewed as four different methods of assessing each trait. From this perspective, the correlations between trait scores obtained from different raters represent convergent validity (monotrait-heteromethod) coefficients, whereas those between different traits and different raters represent discriminant validity (heterotrait-heteromethod) coefficients (see Campbell & Fiske, 1959). Convergent and discriminant validity data are presented in Table 2.

TABLE 2
AVERAGES OF VALIDITY COEFFICIENTS FOR FACTOR SCORES BASED ON SINGLE RATERS VERSUS TWO-RATER AGGREGATES

                               Single-rater                      Two-rater aggregate
Factors paired              Mean r                             Mean r
in correlation          Mother   Father   Range             Mother   Father   Range
Convergent validity
  WI & WI                .33      .32     ( .18 to .48)      .50      .50     ( .46 to .55)
  HC & HC                .40      .41     ( .33 to .50)      .58      .58     ( .55 to .61)
  LC & LC                .16      .32     ( .01 to .38)      .29      .49     ( .24 to .53)
Discriminant validity
  WI & HC               -.14     -.15     (-.31 to .00)     -.20     -.21     (-.30 to -.11)
  WI & LC               -.02     -.03     (-.16 to .10)     -.03     -.03     (-.13 to .03)
  HC & LC               -.14     -.14     (-.26 to .06)     -.19     -.19     (-.29 to -.09)

Note. Convergent validity coefficients are correlations between scores on the same factor by different raters, whereas discriminant validity coefficients are correlations between scores by different raters on different factors. Correlations are significant if r(184) > .15 (p < .05). WI, Warm Involvement factor; HC, Harsh Control factor; LC, Lax Control factor.
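The sorting of between-rater correlations into convergent and discriminant coefficients can be sketched in a few lines. This is only an illustration of the Campbell and Fiske logic described above, under the assumption that each rater’s three factor scores for one target are stored as lists with one entry per family; the function names and the synthetic scores are hypothetical and do not reproduce the study’s data or results.

```python
from itertools import combinations
from math import sqrt
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def validity_coefficients(scores):
    """Split between-rater correlations into convergent and discriminant sets.

    scores[rater][factor] is a list of factor scores, one per family,
    all describing the same target parent (hypothetical data layout).
    """
    convergent, discriminant = [], []
    for rater_a, rater_b in combinations(scores, 2):
        for factor_a, x in scores[rater_a].items():
            for factor_b, y in scores[rater_b].items():
                r = pearson(x, y)
                if factor_a == factor_b:
                    convergent.append(r)    # monotrait-heteromethod
                else:
                    discriminant.append(r)  # heterotrait-heteromethod
    return convergent, discriminant


# Synthetic scores for illustration only: 4 raters x 3 factors x 50 families.
rng = random.Random(0)
raters = ["mother", "father", "sibling", "student"]
factors = ["WI", "HC", "LC"]
scores = {
    r: {f: [rng.uniform(1, 3) for _ in range(50)] for f in factors}
    for r in raters
}

conv, disc = validity_coefficients(scores)
print(len(conv), len(disc))  # prints "18 36" (coefficients for one target)
```

With four raters and three factors, one target yields 18 convergent and 36 discriminant coefficients, so the two targets together give 36 and 72.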
The convergent validity coefficients were, for the most part, higher than the discriminant validity coefficients. Of the 36 convergent validity coefficients, 35 were significant at the .05 level, and 32 (89%) were significant at the .01 level. For the 72 discriminant validity coefficients, on the other hand, only 13 (18%) were significant at the .01 level. Means of the pairwise convergent validity coefficients for ratings of mother and of father were .33 and .32, respectively, for Warm Involvement, .40 and .41 for Harsh Control, and .16 and .32 for Lax Control (see Table 2). In contrast, the mean for discriminant validity coefficients was -.10 for mothers and -.11 for fathers. Higher convergent than discriminant validity coefficients support the trait validity of the PBF factor scores.

As with the CRPBI, the student, sibling, mother, and father did not appear to differ significantly in relative validity as raters. Averaging across factors and targets, the means of pairwise convergent validity coefficients involving each of these rater types were .34, .34, .33, and .30, respectively. Although the four rater types were roughly equal in the average magnitude of their convergent validity coefficients, when the father was the target, his self-ratings tended to converge less with each other rater (with means averaged over factors, .31, father and mother; .30, father and student; .30, father and sibling) than their ratings converged with one another’s ratings (.38, mother and student; .40, mother and sibling; .38, student and sibling).

The raters’ ability to make meaningful distinctions between mother and father when rating their childrearing behaviors was also examined. As in the Schwarz et al. (1985) analysis of the CRPBI, these analyses revealed that individual raters appeared to be biased toward perceiving mother and father as very similar in their childrearing behavior. When the same single rater provided data on both mother and father, the mean correlation between parents’ scores, averaged over raters and traits, was .59. When one rater provided the data on mother and a different rater provided the data on father, however, the parents’ scores correlated only .17 on the average. Such bias is reduced by aggregating the ratings of multiple judges.

Convergence among Two-Rater Aggregate Scores
To assess the effects of aggregation on convergent and discriminant validity, two-rater aggregate scores on each factor were computed for each parent by averaging the ratings of two different rater types, yielding 6 unique two-rater aggregate scores (mother-sibling, mother-subject, mother-father, sibling-subject, sibling-father, and subject-father).
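The construction of the two-rater aggregates can be enumerated directly. The sketch below lists the rater pairs, counts how the resulting aggregates can themselves be paired, and identifies the pairings with no rater in common. It also includes a Spearman-Brown style projection of the correlation expected between two such disjoint aggregates; that projection is an added illustration, not a computation reported by the authors, and it assumes equal variances and equal correlations among all raters. All function names are hypothetical.

```python
from itertools import combinations

RATERS = ("mother", "father", "sibling", "subject")


def two_rater_aggregates(raters=RATERS):
    """All unordered pairs of rater types whose ratings are averaged."""
    return list(combinations(raters, 2))


def independent_pairings(aggregates):
    """Pairs of aggregates that have no rater in common."""
    return [
        (a, b) for a, b in combinations(aggregates, 2)
        if not set(a) & set(b)
    ]


def expected_aggregate_r(single_r, k):
    """Projected correlation between two disjoint k-rater aggregates.

    Assumption (not from the source): every pair of raters correlates
    single_r and all raters have equal variance.
    """
    return k * single_r / (1 + (k - 1) * single_r)


aggregates = two_rater_aggregates()
print(len(aggregates))                         # 6 two-rater aggregates
print(len(list(combinations(aggregates, 2))))  # 15 possible pairings
print(len(independent_pairings(aggregates)))   # 3 pairings share no rater
print(round(expected_aggregate_r(0.33, 2), 2))
```

Under these simplifying assumptions, a mean single-rater convergence of .33 (Warm Involvement, mother as target) projects to a value close to the .50 shown for two-rater aggregates in Table 2.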
These 6 two-rater aggregates, in turn, can be paired in 15 possible combinations; however, only 3 of the 15 combinations are truly independent pairings, that is, pairings in which the two-rater aggregate scores have no raters in common (father-mother with subject-sibling, father-subject with mother-sibling, and father-sibling with mother-subject). Table 2 provides a comparison of the average convergent and discriminant validity coefficients based on independent two-rater aggregate scores with those based on independent single-rater scores. Averaged across traits, convergent validity rose from .33 to .49 with the addition of a second rater. Thus, adding a second rating resulted in a substantial increase in the convergent validity of the score employed to measure each trait with little or no loss in discriminant validity (see Table 2).

Validity of the Validity Scales

In addition to the 13 childrearing scales, we also examined the two validity indices (the Social Desirability scale and the Irrationality scale). It is worth noting that, for each rater type when rating either target, the mean of Social Desirability items approached the maximum of 3.0, which means that all four raters were willing to describe parents in quite favorable terms. Likewise, the means of Irrationality items for all rater-target combinations were close to the minimum value of 1.0, indicating a general tendency to avoid attributing irrational behavior to the mother and to the father. We investigated the validity of the validity indices by dividing respondents into groups based, first, on their Social Desirability scores, and, second, on their Irrationality scores. Then for each subgroup we correlated each rater’s scores with an aggregate score based on the three remaining raters. Thus, the three-rater aggregate score served as our validity criterion.
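The three-rater criterion can be sketched as a leave-one-out computation: each rater’s score is correlated with the mean of the other three raters’ scores for the same family. The sketch covers a single factor, a single target, and a single group of families; in the procedure described above it would be repeated within each Social Desirability or Irrationality subgroup. The function names and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def leave_one_out_validity(scores):
    """Correlate each rater's scores with the aggregate of the other raters.

    scores[rater] is a list of scores on one factor for one target,
    one entry per family (hypothetical data layout).
    """
    coefficients = {}
    for rater, own_scores in scores.items():
        other_lists = [vals for other, vals in scores.items() if other != rater]
        # Criterion: mean of the remaining raters' scores within each family.
        criterion = [mean(family) for family in zip(*other_lists)]
        coefficients[rater] = pearson(own_scores, criterion)
    return coefficients


# Hypothetical Warm Involvement scores for six families.
example = {
    "mother":  [2.5, 2.1, 2.8, 1.9, 2.4, 2.6],
    "father":  [2.4, 2.0, 2.6, 2.1, 2.2, 2.7],
    "sibling": [2.2, 1.9, 2.7, 1.8, 2.3, 2.4],
    "student": [2.3, 2.2, 2.5, 1.7, 2.1, 2.6],
}
for rater, r in leave_one_out_validity(example).items():
    print(rater, round(r, 2))
```

Because the rater being evaluated is excluded from the criterion, the coefficient is not inflated by that rater’s own contribution to the aggregate.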
Averaged across traits, rater types, and targets, the average validity coefficients for subjects scoring low, medium, and high on the Social Desirability scale were .40, .42, and .44, respectively. For subjects scoring low, medium, and high on the Irrationality scale, these coefficients were, respectively, .43, .41, and .46. Thus, there were no systematic relationships between the levels of the validity indices (high, medium, or low subgroups) and the magnitudes of the validity coefficients for any trait, rater type, or target. In other words, the reports of raters with high scores on the Social Desirability and Irrationality subscales appear to be no less valid than do the reports of individuals who scored low on these two indices. It appears that both the Social Desirability scale and the Irrationality scale fail to accurately identify raters whose scores are less valid than those of other raters. In light of this, for populations similar to ours, we recommend against deleting raters based on high
scores on the validity indices. However, even though the Irrationality scale was not useful as a validity index in this sample of well-motivated and well-educated subjects, it may prove useful in other samples where, because of careless response or deficient reading skills, greater frequencies of high Irrationality scores may occur.

Validity of 4-Rater Aggregate Scores
Four-rater aggregate scores were computed for each factor with respect to both targets. An analysis of the generalizability of these scores to the universe of raters (Wiggins, 1973, Chap. 7) yielded alpha coefficients of .66 and .67 for Warm Involvement for mother and father, respectively, of .72 and .73 for Harsh Control, and of .44 and .66 for Lax Control. In contrast, the generalizability of factor scores based on single raters ranged from .16 to .41. Aggregation thus clearly increased the generalizability of estimated factor scores, whereas scores based on single raters were only modestly generalizable. As anticipated from classical test theory (Lord & Novick, 1968), the greatest increase in generalizability occurred when shifting from single raters to two-rater aggregates, whereas the addition of more raters to aggregates continued to enhance generalizability, but with a diminishing gain.

Since 4-rater aggregate factor scores had the highest generalizability, the relationships among the factors of the PBF would be most accurately revealed by their intercorrelations. With 4-rater aggregate scores, Warm Involvement was independent of Lax Control (r = -.01 mother, r = -.04 father), whereas Harsh Control was slightly correlated with both Warm Involvement (r = -.31 mother and r = -.30 father) and Lax Control (r = -.28 mother and r = -.31 father).

Convergence of PBF with CRPBI
Further evidence for the validity of PBF factor scores was obtained by correlating them with CRPBI factor scores; these results are presented in Table 3. The scores for the three CRPBI factors were based on an aggregate of ratings provided by the same four raters who provided ratings on the PBF. The three PBF factors were assessed in five ways, once by each of the four rater types taken singly, and again by an aggregate of these four raters’ scores. We looked first at the 4-rater PBF scores: as anticipated, PBF Warm Involvement converged strongly with CRPBI Acceptance, PBF Harsh Control converged strongly with CRPBI Psychological Control, and PBF Lax Control converged strongly in the inverse direction with CRPBI Firm Control. When the single-rater PBF factor scores were used, a similar but less strong pattern of convergence with CRPBI factor scores was observed. The average of the validity coefficients for convergence between parallel factors of the PBF and CRPBI was .79 for 4-rater PBF factor scores, versus .54 for 1-rater PBF factor scores (with Lax Control inverted). The average of the discriminant validity coefficients was -.04 with 4-rater PBF factor scores and -.09 with 1-rater PBF factor scores. These results provide strong evidence for the superiority of the 4-rater aggregate scores over scores based on single raters.

TABLE 3
CORRELATIONS OF 1-RATER AND 4-RATER PBF FACTOR SCORES WITH 4-RATER CRPBI FACTOR SCORES

                            CRPBI 4-rater aggregate factor scoresᵃ
                       Acceptance        Psychological Control     Firm Control
PBF factor scores    Mother   Father      Mother   Father         Mother   Father
Warm Involve.
  Mother              .61      .66        -.19     -.24            .02      .02
  Father              .52      .48        -.25     -.03           -.07     -.07
  Sibling             .64      .61        -.46     -.20           -.01     -.02
  Student             .48      .56        -.29     -.31           -.07     -.07
  4-rater agg.        .73      .78        -.34     -.26            .01      .02
Harsh Control
  Mother             -.37     -.56         .47      .50            .06      .07
  Father             -.50     -.45         .47      .58            .03      .13
  Sibling            -.53     -.43         .62      .64            .05      .20
  Student            -.32     -.41         .51      .40           -.04      .12
  4-rater agg.       -.48     -.44         .81      .82            .26      .41
Lax Control
  Mother             -.04     -.15        -.05     -.11           -.51     -.55
  Father             -.12     -.07        -.01     -.18           -.40     -.58
  Sibling             .03      .00        -.05     -.19           -.45     -.63
  Student             .18      .03        -.15     -.17           -.43     -.61
  4-rater agg.        .01     -.07        -.11     -.23           -.73     -.86

ᵃ The ns for individual coefficients range from 164 to 171. When df = 162, an r of .16 is significant beyond the .05 level and an r of .21, beyond the .01 level.

Evidence bearing on the comparative validity of ratings by the four rater types may be found in Table 3 in the form of correlations of 1-rater PBF factor scores, each based on a different rater type, with 4-rater CRPBI factor scores for parallel factors. Averaged across factors and targets (disregarding the inverse relationship of PBF Lax Control and CRPBI Firm Control), the average convergent validity coefficients ranged from a high of .60 for siblings to a low of .50 for students. Thus, student-subjects (.50), the traditional raters of parents’ childrearing behaviors, on average were no more accurate than either mother (.55),
father (.51), or sibling (.60) and possibly less accurate than siblings. Siblings may have been accurate because they were less ego involved: After all, they were reporting on someone else’s behavior (the parents’) toward someone other than themselves (the student-subject). Awaiting further study is whether the students’ perceptions of parents’ childrearing behavior would be more predictive than those of other single raters when predicting external criteria such as student adjustment. However, it is not likely that students’ ratings alone would be superior to a 4-rater aggregate, when one considers that the former has an average generalizability of .65.
DISCUSSION
The PBF has good psychometric properties, fully equal to those of the CRPBI. Its subscales are internally consistent, with alpha coefficients predominantly in the .70s and low .80s for all rater types. Convergence of PBF subscale and factor scores between rater types was of about the same magnitude as found for the CRPBI (Schwarz et al., 1985). The PBF’s factor structure was parallel across the four rater types, which therefore poses no obstacle to the aggregation of scores across different rater types. Strong evidence for the concurrent validity of the PBF was provided by the high convergent and low discriminant validity coefficients that emerged when the 4-rater aggregate factor scores of the PBF were correlated with the 4-rater aggregate factor scores of the CRPBI. These data also suggest that the PBF may be as valid as the CRPBI and that the PBF measures highly similar dimensions.
There were two minor disappointments with the PBF: (a) that the validity scales were not effective, and (b) that no new independent dimensions of childrearing behavior were assessed. Although the Irrationality and Social Desirability subscales did not function to identify less valid respondents in this sample, it is possible that they may be useful with subjects who are less well motivated and/or have less adequate reading skills. The Warmth factor of the PBF, which included the cognitive stimulation subscales, Cognitive Curiosity, Cognitive Independence, and Cognitive Competency, proved to be substantially correlated with the Acceptance factor of the CRPBI. This indicates that the cognitive stimulation subscales, which themselves were highly intercorrelated, are not measuring a dimension that is orthogonal to parental warmth and acceptance. Although it is possible that an oblique factor of Cognitive Stimulation could emerge from a joint factor analysis of the PBF and the CRPBI, such a factor would inevitably be correlated with a factor of Warmth and Acceptance.
In many respects, the results of this study of child, parent, and aggregate ratings on the PBF demonstrate that the earlier findings by
Schwarz et al. (1985) regarding the CRPBI were not specific to that instrument, but instead may be general to all similar parental-rating methods. Despite evidence that PBF scales have adequate internal consistency (mean alpha = .73), agreement between pairs of raters on each PBF subscale was very modest: .33 on the average, similar to the .30 for the CRPBI. Also as with the CRPBI, aggregating ratings across subscales loaded on the same factor did not appreciably increase the pairwise agreement between raters. For PBF factor scores based on single raters, the average pairwise agreement among single raters was .36, versus .35 for CRPBI factor scores.
Four pieces of evidence support the view that each rater’s judgments contain a small but useful portion of valid variance: (a) the low but significant agreement noted above between rater pairs when rating the same parent on the same factor; (b) the increase in average convergence from .36 to .49 when factor scores are based on independent 2-rater aggregates; (c) the moderate agreement (mean = .42) of each rater with the aggregated judgment of the three remaining raters on PBF factors; and (d) the moderate agreement (mean = .54) of each rater’s PBF factor scores with 4-rater aggregated scores for parallel factors from the CRPBI. The evidence also suggested little if any difference in validity among the four rater types.
Alpha coefficients reflect the internal consistency of raters’ judgments. The average alphas of .73 for items within PBF subscales and of .84 for subscales within PBF factor scores indicate that each rater’s judgments contain a fairly high proportion of reliable variance; raters are consistent across items within subscales and across subscales within factors. However, the average correlation of .42 for single raters with the remaining 3-rater aggregate indicated that only a small proportion (18%) of the total variance for a given rater type overlaps with other raters.
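The 18% figure follows from squaring the correlation of .42, since the squared correlation gives the proportion of variance two measures share. A minimal Python sketch of that arithmetic is given here; the function name `shared_variance` is ours and is used only for illustration.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


# Mean correlation of a single rater with the aggregate of the three
# remaining raters, as reported in the text.
r_single_vs_rest = 0.42

# Prints the shared variance as a whole-number percentage.
print(f"{shared_variance(r_single_vs_rest):.0%}")
```

The remaining variance of a single rater's scores is therefore either reliable but unshared or unreliable, which is the distinction the following discussion takes up.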
We regard the overlapping portion of a given rater’s judgment as valid variance, and the reliable but nonoverlapping portion as “systematic error.” The latter is not error in the sense of being unreliable, but it is error in the sense that it is not valid variance, not valid because it is unique to that rater type and does not overlap with other raters’ judgments. Systematic error is consistent across a rater’s judgments within a given factor dimension. The systematic error of a given rater may transcend targets, since all raters were biased to perceive both parents as similar on a given factor. From this reasoning, it follows that each rater has a fair proportion of unique variance, unique in the sense that it is reliable but it does not overlap with other raters’ scores. Even when one considers those pairs of rater types that exhibit the greatest similarity (mother and father, or student and sibling), there still remains a substantial portion
of variance for a given rater type that, although reliably measured, does not covary with that of the most similar other rater type.
Our model of a dimension of childrearing behavior is analogous to a personality trait. When considering traits, one is faced with the issue of the situational specificity of behavior versus the generality of behavior across situations (cf. Kenrick & Funder, 1988). The childrearing behavior score that predicts best in all parenting situations, or in any randomly selected situation, is the score that reflects the portion of variance that is generalizable from situation to situation. The wider the variety of situations in which the subject’s behavior is sampled, the more accurate the predictions from the aggregate score to a variety of situations, on the average. This variety seems to be increased substantially by increasing the number of raters included in aggregated scores.
The reliable but unique variance (systematic error) associated with the judgments of each rater arises from two sources: (a) variations in the sample of the parent’s behaviors seen by each observer, and (b) variation in perceptual and reporting defenses among people who occupy different roles within the family. Parents may be more strongly motivated than children to put their own and their spouse’s behavior in a favorable light, and they seem to avoid extremes of both favorable and unfavorable self-ratings.⁵ Some children may be motivated to see themselves as more favorably regarded by parents than are their siblings (cf. Schwarz, Wheeler, & Rausch, 1989). The subject may experience his or her strongest expressions of a parent’s affection or criticism in private, not in the presence of the sibling or the other parent. Thus, the subject could have a different observational base for judgments about the parents’ trait-like qualities.
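The model described above, in which each rater's score combines a generalizable component with variance unique to that rater, can be illustrated with a small simulation. The sketch below is ours and is not an analysis from the study: it assumes normally distributed scores, treats each rater's unique variance as independent of every other rater's, and uses an arbitrary error standard deviation of 1.3, chosen only so that simulated pairwise agreement falls in the general range of the values reported here. All function names and parameters are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_families=2000, n_raters=4, error_sd=1.3, seed=0):
    """Compare one rater with an n_raters average against simulated true scores.

    Assumption (ours): rating = generalizable score + rater-unique error,
    with errors independent across raters.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_families)]
    ratings = [
        [t + rng.gauss(0, error_sd) for t in true_scores]
        for _ in range(n_raters)
    ]
    # Average the raters family by family to form the aggregate score.
    aggregate = [sum(family) / n_raters for family in zip(*ratings)]
    single_r = pearson(ratings[0], true_scores)
    aggregate_r = pearson(aggregate, true_scores)
    return single_r, aggregate_r


single_r, aggregate_r = simulate()
print(f"single rater vs. true score:      {single_r:.2f}")
print(f"4-rater aggregate vs. true score: {aggregate_r:.2f}")
```

Under these assumptions the aggregate tracks the simulated true score more closely than any single rater does, because the unique components partly cancel when averaged. Error that all raters shared would not cancel in this way.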
While specific motivational sets may be shared in common by raters who occupy a given role within the family, biases and experiences unique to each family member also contribute to the unique systematic error associated with a given rater’s scores. Since each rater’s error is independent of each other rater’s error, averaging the ratings of multiple raters will result in more valid scores. If one assumes that positive and negative errors are equally prevalent, then the process of summing will aggregate the correlated (and probably valid) portion of each rater’s scores while canceling out the unique (and probably erroneous) portion of the ratings (see Lord & Novick, 1968). Of course, the larger the pool of competent raters, the more balanced and thorough the process of error cancellation. There is a caveat, however: Bias shared in common by all raters cannot be eliminated by aggregation, for example, when all members of a family share a common myth about the behavior of one of the parents.
⁵ These biases are indicated by significant differences in means and standard deviations of ratings between rater types. Such parameters of rating distributions do not directly inform one about individual differences in the strength of those biases within rater types. If self-rating bias were substantial but equal across parents, correlations with other raters would be unaffected.
One may choose to make the focus of his or her investigations those aspects of experience and perception that are unique to each member of the family. That has not been our focus here: Rather we have advocated a strategy that will yield trait-like scores of childrearing behavior that, in predicting a variety of situations, will have greater generality than those based on single raters’ perspectives. For this purpose, the “confluent” variance, that which is overlapping across the ratings of independent observers, is likely to yield the most valid scores. As long as the same set of rater types is employed in computing each aggregate score, some of the bias common to raters of a given type can be eliminated by transforming the ratings of each rater type to a common scale (e.g., standard score) before aggregation. The present approach treats as error to be eliminated the differences in perspective among family members occupying different roles, the very same effects that other researchers wish to assess and understand. There is value in both perspectives and room for both approaches in the field.
Improving the quality of measures. Investigators who employ a single rater with either the PBF or the CRPBI run a grave risk of committing a Type II error, the error of failing to reject the null hypothesis when false, and hence failing to detect true relationships. However, practical exigencies, such as cost, may prevent some investigators from obtaining data from four informants. What are the best compromises? Present evidence suggests that the best three-rater combination may be sibling, mother, and father. The best two-rater aggregate seems to be mother and sibling.
No one of these four rater types is so weak that the generalizability of an aggregate score would be augmented by excluding his or her ratings, when available. Critics of the rating method for assessing childrearing behavior have tended to focus on the retrospective nature of the reports (cf. Yarrow, Campbell, & Burton, 1970). Two concerns are often expressed: (a) that retrospective reports are distorted unwittingly in an ego-enhancing direction, and (b) that raters of past parental behavior are unduly influenced by current parental behavior. The present report and prior empirical studies confirm such ego-enhancing distortion. It is, however, exactly this type of personal bias on the part of individual raters that is eliminated by aggregation. As long as each rater’s biases are idiosyncratic, the process of aggregation will cancel the error of individual raters and increase the reliability and validity of the resulting aggregate score. The concern about the recency bias in recollection, although valid, may be
largely moot, since empirical evidence from longitudinal studies indicates that dimensions of parental behavior such as warmth and control are fairly stable over long periods (Hanson, 1975; Roberts, Block, & Block, 1984; Schaefer & Bayley, 1960).
The long history of failure to find replicable relationships between family dynamics and psychopathology (for reviews see Fontana, 1966; Frank, 1965; Hetherington & Martin, 1979; Jacob, 1975) may be due in part to poor measurement of parents’ behavior. Investigators have tended to rely either upon behavioral counts with high interrater agreement but low relevance and cross-situation generality, or upon single-rater reports with high relevance but low interrater agreement and hence low generalizability. Wachs (1987) found that single, direct 45-min observations of parents yielded scores of low stability. The mean correlation between scores for single observation sessions was .35. Aggregates of two observation sessions yielded mean correlations of .52, and aggregates of seven observation sessions were required to achieve a stability coefficient of .61. In fact, aggregates of raters’ subjective impressions have been found repeatedly to equal or exceed the predictive power of directly recorded behavioral observations (Eaton & Enns, 1986; Moskowitz & Schwarz, 1982; Weinrott, Reid, Bauske, & Brummett, 1981). Epstein (1983) suggests that knowledgeable informants engage in a kind of intuitive averaging when they observe people on many occasions, roughly equivalent to the averaging that is more formally carried out when ratings are aggregated over occasions by an experimenter, and that informants’ ratings have the important advantage that they are often based on a greater amount of observation than behavioral ratings.
However, the generalizability of single knowledgeable informants’ ratings often leaves something to be desired, and relationships must be extremely robust to be detected when the predictor variable is the rating of a single informant. Aggregation of multiple informants’ judgments is a method that permits researchers to obtain at low cost highly generalizable scores, even for covert or low base-rate behaviors such as inconsistent love or physical abuse, behaviors which, if adequately assessed, could predict personality development.
In summary, the results of this study indicate (a) that the Parent Behavior Form is a reliable and valid measure of childrearing behavior; (b) that, despite the measure’s validity, ratings by individual informants exhibit only a modest degree of generalizability; and (c) that aggregating the reports of multiple raters can substantially increase the generalizability of scores to a level equal to or greater than that of scales derived from direct observation of parenting behavior. These data lead us to conclude that aggregation of multiple informants’ ratings on retrospective measures of childrearing behavior holds great promise as a method for assessing parental childrearing determinants of offspring’s personality.
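As a rough illustration of why adding raters raises generalizability, the Spearman-Brown formula from classical test theory projects the reliability of an average of k parallel raters from the agreement between single raters. The sketch below is ours: the text does not state that this formula was used, real raters need not be strictly parallel, and the projected values are not results from the study. The input of .36 is the average pairwise agreement among single raters on PBF factor scores reported above; the function name is hypothetical.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k parallel raters.

    r_single is the correlation between two single raters.
    """
    return k * r_single / (1 + (k - 1) * r_single)


# Average pairwise agreement among single raters on PBF factor scores.
pairwise_agreement = 0.36

for k in (1, 2, 3, 4):
    projected = spearman_brown(pairwise_agreement, k)
    print(f"{k} rater(s): projected reliability {projected:.2f}")
```

The projection rises with each added rater but with diminishing returns, which is consistent with the practical question raised earlier about which two- or three-rater combinations are acceptable compromises.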
REFERENCES
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Eaton, W. O., & Enns, L. R. (1986). Sex differences in human activity level. Psychological Bulletin, 100, 19-28.
Epstein, S. (1980). The stability of behavior: II. Implications for psychological research. American Psychologist, 35, 790-806.
Epstein, S. (1983). Aggregation and beyond: Some basic issues in the prediction of behavior. Journal of Personality, 51, 360-391.
Fontana, S. F. (1966). Familial etiology of schizophrenia. Psychological Bulletin, 66, 217-227.
Frank, G. H. (1965). The role of the family in the development of psychopathology. Psychological Bulletin, 64, 191-205.
Golden, P. G. (1969). A review of children’s reports of parent behaviors. Psychological Bulletin, 71, 222-235.
Guilford, J. P. (1957). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Hanson, R. A. (1975). Consistency and stability of home environmental measures related to IQ. Child Development, 46, 470-480.
Hetherington, E. M., & Martin, B. (1979). Family interaction. In H. C. Quay & J. S. Werry (Eds.), Psychopathological disorders of childhood (2nd ed., pp. 247-302). New York: Wiley.
Hollingshead, A. B. (1957). Two-factor index of social position. Unpublished manuscript, Yale University, New Haven, CT.
Jacob, T. (1975). Family interaction in disturbed and normal families: A methodological and substantive review. Psychological Bulletin, 82, 33-65.
Kelly, J. A., & Worell, L. (1976). Parent behaviors related to masculine, feminine, and androgynous sex-role orientation. Journal of Consulting and Clinical Psychology, 44, 843-851.
Kelly, J. A., & Worell, L. (1977). The joint and differential perceived contribution of parents to adolescents’ cognitive functioning. Developmental Psychology, 13, 282-283.
Kelly, J. A., & Worell, L. (1978). Parental cognitive orientation and the development of children’s intellectual behavior. Journal of Research in Personality, 12, 179-188.
Kenrick, D. T., & Funder, D. C. (1988). Profiting from controversy: Lessons from the person-situation debate. American Psychologist, 43, 23-34.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores (Chaps. 16-20). Reading, MA: Addison-Wesley.
McCraine, E. W., & Bass, J. D. (1984). Childhood family antecedents of dependency and self-criticism: Implications for depression. Journal of Abnormal Psychology, 93, 3-8.
Moskowitz, D. S., & Schwarz, J. C. (1982). Validity comparison of behavior counts and ratings by knowledgeable informants. Journal of Personality and Social Psychology, 42, 518-528.
Roberts, G. C., Block, J. H., & Block, J. (1984). Continuity and change in parents’ childrearing practices. Child Development, 55, 586-597.
Schaefer, E. S. (1965). Children’s reports of parental behavior: An inventory. Child Development, 36, 417-424.
Schaefer, E. S., & Bayley, N. (1960). Consistency of maternal behavior from infancy to preadolescence. Journal of Abnormal and Social Psychology, 61, 1-6.
Schwarz, J. C. (1988). Development and validation of the Inter-Parental Conflict Scale. Unpublished manuscript, University of Connecticut, Storrs, CT 06269-1020. Abstracted in J. Touliatos, B. F. Perlmutter, & M. A. Straus (Eds.), in press, Handbook of family measurement techniques. Newbury Park, CA: Sage Publications. Scale forms deposited with the National Auxiliary Publication Service (NAPS, c/o Microfiche Publications, 248 Hempstead Turnpike, West Hempstead, NY 11552).
Schwarz, J. C., Barton-Henry, M. L., & Pruzinsky, T. (1985). Assessing childrearing behaviors: A comparison of ratings made by mother, father, and sibling on the CRPBI. Child Development, 56, 462-479.
Schwarz, J. C., Wheeler, D. S., & Rausch, S. P. (1989). The validity of personality factor scores based on self, other, and aggregated ratings from the Adjective Check List (ACL). Manuscript submitted for publication.
Wachs, T. D. (1987). Short-term stability of aggregated and nonaggregated measures of parental behavior. Child Development, 58, 796-797.
Weinrott, M. R., Reid, J. B., Bauske, B. W., & Brummett, B. (1981). Supplementing naturalistic observations with observer impressions. Behavioral Assessment, 3, 151-159.
Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment (Chap. 7). Reading, MA: Addison-Wesley.
Worell, L., & Worell, J. (1974). The parent behavior form. Manual in preparation. (Available from Leonard Worell, Department of Psychology, University of Kentucky, Lexington, KY 40506.)
Yarrow, M. R., Campbell, J. D., & Burton, R. V. (1970). Recollections of childhood: A study of the retrospective method. Monographs of the Society for Research in Child Development, 35 (Whole No. 138).