Body Image 1 (2004) 199–205
Brief research report
Test–retest reliability and construct validity of Contour Drawing Rating Scale scores in a sample of early adolescent girls
Eleanor H. Wertheim∗, Susan J. Paxton, Linda Tilgner
School of Psychological Science, La Trobe University, Bundoora, Melbourne, Vic. 3086, Australia
Received 14 August 2003; received in revised form 19 November 2003; accepted 2 December 2003
Abstract
Test–retest reliability and construct validity of Contour Drawing Rating Scale (CDRS) scores were examined in 1056 girls in Grades 7 and 8. Questionnaires were completed four times, including retests at 2, 6 and 14 weeks. Test–retest reliability for current size, ideal size and current–ideal discrepancy mostly exceeded 0.70 (0.65–0.87 for the full sample, with higher rs for shorter retest periods). Ideal and current size ratings increased slightly over time. High correlations between perceived current figure and measured body mass index, moderate rs between current–ideal discrepancy and both body dissatisfaction and restrained eating, and very low or non-significant correlations with social desirability supported construct validity in this group. The study supports the use of the CDRS with early adolescent girls.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Body image; Figure rating; Adolescent; Assessment; Body dissatisfaction; Reliability
∗ Corresponding author. Tel.: +61-3-9479-2478; fax: +61-3-9479-1956. E-mail address: [email protected] (E.H. Wertheim).
1740-1445/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S1740-1445(03)00024-X
Introduction
A common form of body image assessment uses figural stimuli (often called ‘silhouettes’): participants indicate which figure drawing, in a series ranging from very thin to very large, best represents their current body size and which represents their ideal body size. Body dissatisfaction is represented as the difference between current and ideal sizes. Figure rating scales exist that depict adults as well as children (Gardner, 2001; Thompson, 1996; Thompson & Gray, 1995; Thompson, Heinberg, Altabe, & Tantleff-Dunn, 1999; Truby & Paxton, 2002; Williams, Gleaves, Cepeda-Benito, Erath, & Cororve, 2001; Williamson, Davis, Bennett, Goreczny, & Gleaves, 1989). Reliability and validity estimates exist for data based on some of these figure rating scales in some subgroups. For example, in an undergraduate sample, 2-week test–retest reliabilities for scores based on the group form of the Body Image Assessment were r > 0.95 for current figure, r > 0.76 for ideal figure and r > 0.75 for current–ideal discrepancy (Williams et al., 2001). In adolescents, Banasiak, Wertheim, Koerner, and Voudouris (2001) examined the 4–5-week test–retest reliability of scores on Stunkard, Sorenson, and Schulsinger’s (1983) figure scale in Grade 9 girls and found very high test–retest reliability for that subgroup (current figure r = 0.87, ideal figure r = 0.83). However, in most samples test–retest reliability estimates are not so high. In a review of existing figure scales for children and adolescents, Gardner (2001) found that most figural stimuli had not been examined for test–retest reliability, and in studies that did assess it, correlations over retest intervals longer than 1 week often did not meet Nunnally’s (1970) criterion of 0.70. For example, in one study, 3-day reliabilities for Collins’s (1991) children’s figure scores were 0.71 for self but 0.59 for ideal. Sands, Tricker, Sherman, Armatas, and Maschette (1997) found a 3-month test–retest correlation of 0.56 for Body Image Scale current figure ratings in 10–12-year-olds.
Gardner (2001) has suggested that figure rating scales often suffer from several problems. The first is a coarse scale, in which respondents select from only a small number of figures that are supposed to represent a near-continuous dimension. Scales generally include between six and nine separate figures representing body sizes. While this number of figures may be appropriate if sufficient graduations (rating points) are offered between the figures, rating scales often do not include such graduations, resulting in coarse measurement. A second problem arises when only a few of the figures are selected by the vast majority of the sample (e.g. Gardner, Friedman, & Jackson, 1998, reported 85% of adolescents selecting three of eight possible figures), which reduces the variance needed to assess individual differences properly. In addition, because researchers generally treat figure scales as continuous, the figures should increase in size by a standard amount from one to the next, with sufficient graduations between them. Some existing figure scales include figures that become larger at a non-standard (varying) rate (Gardner, 2001).
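Gardner’s second concern, that a few figures absorb most responses, can be quantified from the response distribution alone. The minimal sketch below is illustrative only: the function name `share_in_top_k` and the response counts are hypothetical, chosen so that three of eight figures take 85% of responses, as in the Gardner, Friedman and Jackson (1998) example.

```python
from collections import Counter
from statistics import pvariance

def share_in_top_k(ratings, k=3):
    """Proportion of responses falling on the k most frequently chosen ratings."""
    counts = Counter(ratings)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(ratings)

# Hypothetical responses on an eight-figure scale (1 = thinnest, 8 = largest):
# 100 respondents clustered on figures 3, 4 and 5.
coarse = [1] * 2 + [2] * 5 + [3] * 30 + [4] * 35 + [5] * 20 + [6] * 6 + [7] * 2

print(share_in_top_k(coarse, k=3))   # 0.85: three figures hold 85% of responses
print(round(pvariance(coarse), 2))   # spread of the ratings around their mean
```

A high top-k share together with a small variance is the pattern that limits how well individual differences can be separated.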
For these reasons, the current study used the Thompson and Gray (1995) Contour Drawing Rating Scale (CDRS), which includes nine figures that change by relatively small steps from one to the next, were designed to increase in size at a standard rate, and allow ratings between figures. In the original development study of the CDRS in male and female undergraduates, 1-week test–retest reliability (n = 32) was 0.78, and current figure ratings correlated 0.59 with self-reported body mass index and 0.71 with reported weight, suggesting good test–retest reliability and construct validity for data obtained in a small young adult sample.
It is currently not clear whether existing figure scales, or the CDRS in particular, are suitable for use with early adolescent girls (Grades 7 and 8). The aim of the current study was, therefore, to examine the reliability and construct validity of CDRS scores for the female figures in a large sample of early adolescent girls. The form of the scale used here included the standard figures, with ratings possible under the figures as well as halfway between them. In addition, clarifying instructions specific to this format were included. Test–retest reliability was examined over 2-, 6- and 14-week time spans by examining correlations between Time 1 and later time points and differences between the means at those time points. Ratings of both perceived current size and ideal size were examined at the four time points. Construct validity was assessed in several ways. It was expected that, at Time 1, self-rated current size would correlate highly with measured body mass index. Time 1 discrepancies between current and ideal figure ratings were treated as a measure of body dissatisfaction and were correlated with measures of body dissatisfaction and restrained eating. Finally, discriminant validity was assessed by correlating figure ratings with a measure of social desirability response set.
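The two test–retest checks described here, a correlation between Time 1 and a later time point and a comparison of the two means, can be sketched as below for the current–ideal discrepancy. This is a minimal illustration on made-up ratings, not the authors’ analysis code: the helper names are hypothetical, and the paired t-test is an assumption, since the report does not state which t-test variant compared time points. The mean difference is taken here as Time 1 minus the later time, so a negative value means ratings rose.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., Time 1 and Time 2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Paired t statistic for mean(x - y); returns (t, df, mean difference)."""
    diffs = [a - b for a, b in zip(x, y)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return d_bar / se, len(diffs) - 1, d_bar

# Hypothetical ratings on the 1-17 response scale for six girls.
current_t1 = [9, 7, 11, 8, 13, 6]
ideal_t1 = [7, 6, 7, 7, 9, 6]
current_t2 = [9, 8, 11, 9, 13, 7]
ideal_t2 = [7, 7, 8, 7, 9, 6]

# Current-ideal discrepancy: positive values mean the ideal is thinner than current.
disc_t1 = [c - i for c, i in zip(current_t1, ideal_t1)]
disc_t2 = [c - i for c, i in zip(current_t2, ideal_t2)]

print(round(pearson_r(disc_t1, disc_t2), 3))  # test-retest r for the discrepancy
print(paired_t(current_t1, current_t2))       # (t, df, T1 minus T2 mean difference)
```

The same two functions apply unchanged to current or ideal ratings on their own.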
Method
Participants
Participants were 1056 females in Grade 7 (n = 562) and Grade 8 (n = 494) from 12 schools representing a variety of socio-economic and geographic areas in the Melbourne area, Australia. Co-educational, girls-only, private, state and Catholic schools were included. Ages ranged from 11.8 to 14.7 years (mean = 13.21 years, SD = 0.61). Ninety-one percent of girls were born in Australia. Mothers’ countries of birth were: Australia 67.4%, United Kingdom 12.6%, Europe 7.6%, Asia 5.4%, Africa 1.7%, other 5.3%. Fathers’ countries of birth were: Australia 66.0%, United Kingdom 12.3%, Europe 10.2%, Asia 4.0%, Africa 1.5%, other 6.0%. Mean body mass index (BMI) was 20.40 (SD = 3.56) for Grade 7 and 21.59 (SD = 3.85) for Grade 8. Of the full sample, 9.9% had a BMI below 17; 23.8% had a BMI between 17 and 18.9; 34.1% between 19 and 21.9; 17.8% between 22 and 24.9; 10.5% between 25 and 28.9; and 3.9% over 29.
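Body mass index is weight in kilograms divided by the square of height in metres. The minimal sketch below computes it and assigns the bands reported above. The function names are hypothetical, and treating the bands as half-open intervals (so 18.95 falls in 17–18.9 and exactly 29.0 counts as “over 29”) is an assumption, because the report gives band limits to one decimal place.

```python
import bisect

# Lower edges of the reported BMI bands after the first; intervals are treated
# as half-open [lower, upper), which is an assumption about the band limits.
BAND_EDGES = [17.0, 19.0, 22.0, 25.0, 29.0]
BAND_LABELS = ["<17", "17-18.9", "19-21.9", "22-24.9", "25-28.9", ">=29"]

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_band(value):
    """Return the label of the band containing a BMI value."""
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, value)]

# Hypothetical girl: 50 kg, 1.58 m.
b = bmi(50.0, 1.58)
print(round(b, 2), bmi_band(b))
```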
Materials
Participants rated current and ideal figure sizes on the CDRS (Thompson & Gray, 1995), which includes nine figures, rated 1–9 in previous studies. Some researchers (e.g. Cusumano & Thompson, 1997) have included ratings halfway between figures (e.g. 4.5 halfway between 4 and 5). For younger adolescents it was considered clearer to use only whole numbers; therefore the figures were rated from 1 (smallest) to 17, with odd numbers under the figures and even numbers halfway between figures. Instructions were expanded from those of the original scale, which asked participants to rate their ideal figure and their current size. In this study, the instruction for current figure was to circle the number on the line “closest to your present size. That is, the size you are at the moment”. The phrasing for ideal size was “closest to the size you would like to be”. The following additional note also appeared: “The numbers on the line DO NOT refer to dress sizes. They are just numbers we have assigned to each figure. The numbers which DO NOT have a drawing on top represent ‘in between’ sized figures”. Current versus ideal size discrepancy (current–ideal) was also calculated.
The Eating Disorders Inventory Body Dissatisfaction (EDI-BD) and Drive for Thinness (EDI-DT) subscales (Garner, Olmsted, & Polivy, 1983) have produced scores with satisfactory reliability and validity in a sample of adolescent girls when untransformed item responses (1–6) were used (Schoemaker, van Strien, & van der Staak, 1994). Definitions of the words ‘preoccupied’ and ‘magnify’ were added as per Martin et al. (2000). The Dutch Eating Behaviour Questionnaire Restrained Eating scale (DEBQ-R; Van Strien, Frijters, Bergers, & Defares, 1986) has shown high construct validity, test–retest reliability and internal consistency in samples of adolescent girls (Banasiak et al., 2001; Laessle, Tuschl, Kotthaus, & Pirke, 1989).
At Time 1, participants completed a short form of the Children’s Social Desirability Scale (CSD; Crandall et al., 1965, 1991; Tilgner, Wertheim, & Paxton, in press). The CSD’s content is based on the Marlowe–Crowne Social Desirability Scale but modified for children. The short form has correlated 0.94 with the full form in early adolescent girls (Tilgner et al., in press). Body mass index was based on measured height and weight.
Procedure
All students and parents signed informed consent forms. The data reported here were collected as part of a larger project described to participants as a study of adolescent girls’ eating behaviours and body concerns. Code-numbered questionnaires were administered in class at four time points: Time 1, Time 2 (2 weeks later), Time 3 (6 weeks after Time 1) and Time 4 (14 weeks after Time 1). The CSD was administered at Time 1 only, to a subset of 802 participants. In each class, at least one researcher led class proceedings while another measured height and weight (weight on a floor scale) in a private area at the back of the room.
Analyses
Participants present at all four time points (n = 633) were slightly younger than those who missed one or more testing sessions (13.17 versus 13.28 years), t(792.0) = 2.85, p = 0.005, but did not differ significantly on any of the other variables. Therefore, analyses included all available data at each time point.
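The non-integer degrees of freedom in the attrition comparison, t(792.0), are characteristic of Welch’s unequal-variance t-test, whose degrees of freedom come from the Welch–Satterthwaite approximation. The report does not name the test, so this is an inference. The minimal sketch below computes the statistic from summary data; the group means are taken from the text, but the standard deviations and the non-completer n of 423 are hypothetical, so the output will not reproduce the reported values.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Non-completers versus completers; SDs and the first n are invented for illustration.
t, df = welch_t(13.28, 0.65, 423, 13.17, 0.58, 633)
print(f"t({df:.1f}) = {t:.2f}")
```

On raw data, SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns a p-value.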
Results
Distribution of scores
Current, ideal and current–ideal discrepancy ratings were spread well over a range of rating points. For example, for the Time 1 (T1) current figure rating, the percentages of participants selecting each rating from 5 to 13 were 7.0, 8.7, 11.7, 10.7, 14.3, 6.0, 10.2, 6.4 and 6.3, respectively (19% selected the remaining ratings). For ideal figure, each rating from 3 to 9 was selected by at least 6% of the sample (13.4% selected the remaining ratings). At T1, mean ratings for current size were 8.45 (SD = 3.33) for Grade 7 and 8.89 (SD = 3.24) for Grade 8. T1 ideal size means were 6.46 (SD = 2.50) for Grade 7 and 6.44 (SD = 2.33) for Grade 8. T1 current–ideal discrepancy means were 1.97 (SD = 2.60) for Grade 7 and 2.42 (SD = 2.62) for Grade 8. At T1, for the full sample, current and ideal figure ratings were relatively normally distributed: current figure skewness = 0.165, S.E. = 0.076, 0.01 < p < 0.05; ideal figure skewness = 0.075, S.E. = 0.076, p > 0.05 (p levels up to < 0.001 can be considered acceptable; Tabachnick & Fidell, 1996). Current–ideal ratings were positively skewed (skewness = 0.808, S.E. = 0.076, p < 0.001; kurtosis = 1.43, S.E. = 0.152); 8.6% of the sample chose a larger ideal than current figure, 20.7% chose equal ideal and current figures and 71.7% chose a smaller ideal.
Table 1
Contour Drawing Rating Scale test–retest reliability reported for Grades 7 and 8 girls separately, including t-tests comparing means at time points 1 versus 2 (2 weeks apart), 1 versus 3 (6 weeks) and 1 versus 4 (14 weeks) and correlations between those time points
Time point comparison | Grade 7 n | Grade 7 r | Grade 7 t | Grade 7 mean difference | Grade 8 n | Grade 8 r | Grade 8 t | Grade 8 mean difference
Current size T1–T2 | 496 | 0.90 | 0.82 | 0.01a | 452 | 0.77 | −3.83∗∗ | −0.39a
Current size T1–T3 | 510 | 0.87 | 0.42 | 0.03a | 442 | 0.79 | −3.86∗∗ | −0.40a
Current size T1–T4 | 373 | 0.81 | −2.86∗ | −0.31 | 332 | 0.72 | −3.60∗∗ | −0.50
Ideal size T1–T2 | 492 | 0.83 | −4.54∗∗ | −0.29 | 451 | 0.73 | −4.36∗∗ | −0.37
Ideal size T1–T3 | 508 | 0.78 | −3.52∗∗ | −0.25 | 441 | 0.63 | −3.78∗∗ | −0.38
Ideal size T1–T4 | 372 | 0.71 | −3.88∗∗ | −0.40 | 329 | 0.58 | −3.13∗ | −0.38
Current–ideal T1–T2 | 490 | 0.86 | 3.86∗∗ | 0.27 | 449 | 0.77 | 0.72 | 0.05
Current–ideal T1–T3 | 508 | 0.82 | 4.06∗∗ | 0.26b | 440 | 0.74 | 0.39 | 0.04b
Current–ideal T1–T4 | 373 | 0.70 | 1.11 | 0.12 | 328 | 0.66 | −0.76 | −0.10
BMI T1–T4 | 347 | 0.96 | −1.73 | −0.10 | 321 | 0.94 | −2.03∗ | −0.15
T: time point; t: t-test comparing T1 and later T means; mean difference: mean difference between T1 and later T. All rs significant at p < 0.0005. a Increase in mean score between T1 and the later time point was significantly greater for Grade 8 than Grade 7; t > 0.27, p < 0.01. b Decrease in mean score between T1 and T3 was significantly greater for Grade 7 than Grade 8; t = 2.04, p < 0.05. ∗ p < 0.05. ∗∗ p < 0.001.
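Table 1 gives separate test–retest rs for each grade, and the grade comparison reported under Test–retest reliability is attributed to “the method of Bailey (1995)”. A common textbook procedure for two correlations from independent samples is Fisher’s r-to-z test, sketched below with values taken from Table 1. We assume, but have not confirmed, that this matches the Bailey (1995) procedure; the function name is hypothetical.

```python
import math

def compare_independent_rs(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns (z, two-tailed p) using the standard normal distribution.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-tailed normal p-value
    return z, p

# Grade 7 versus Grade 8 test-retest rs (r, n) from Table 1.
for label, g7, g8 in [
    ("Current size T1-T2", (0.90, 496), (0.77, 452)),
    ("Current-ideal T1-T4", (0.70, 373), (0.66, 328)),
]:
    z, p = compare_independent_rs(*g7, *g8)
    print(f"{label}: z = {z:.2f}, p = {p:.3g}")
```

With these inputs the current size T1–T2 difference is clearly significant, while the current–ideal T1–T4 difference is not (z ≈ 0.98, p ≈ 0.33), which is consistent with the single exception noted in the text.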
Test–retest reliability
Table 1 displays test–retest reliability information for Grades 7 and 8. For the two grades combined, test–retest rs for current figure were: T1–T2 r = 0.84, T1–T3 r = 0.83 and T1–T4 r = 0.77. Test–retest rs for ideal figure were 0.78, 0.71 and 0.65, respectively, and for current–ideal discrepancy they were 0.82, 0.78 and 0.68. Grade 7 rs were significantly higher than Grade 8 rs for all current, ideal and current–ideal retest correlations except the T1–T4 current–ideal correlation (all other differences p < 0.05, using the method of Bailey, 1995). t-Tests indicated that current and ideal ratings increased slightly over time at some time points for either the Grade 7 or Grade 8 sample or both (see Table 1). BMI also showed a small increase for the Grade 8 sample (Table 1) and a similar tendency (p < 0.10) in Grade 7. It should be noted, however, that the large sample size increased the likelihood of significant differences between grades, and some of these may not be meaningful.
Construct validity
Time 1 scores were intercorrelated to assess convergent and discriminant validity. Full sample rs are reported here unless there was a significant difference (p < 0.05) between grades. Very low or non-significant correlations were found between social desirability and figure ratings: full sample current figure r = −0.07, n = 802, p = 0.05; ideal figure r = 0.08, n = 801, p = 0.034; current–ideal r = −0.15, n = 801, p < 0.001. While some of the social desirability rs were significant owing to the large sample sizes, they accounted for very little variance (all < 2.3%). In support of construct validity, current–ideal discrepancy was moderately to strongly related to EDI-BD (full sample r = 0.40), EDI-DT (full = 0.62; Grade 7 = 0.66; Grade 8 = 0.57) and DEBQ-R (full = 0.57; Grade 7 = 0.62; Grade 8 = 0.51). Partial correlations confirmed that all significant rs remained significant, at similar levels, after controlling for social desirability. At T1, current figure ratings correlated 0.69 with measured BMI in the full sample (r = 0.64 with weight).
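Two computations underlie the construct validity statements above: the share of variance implied by a correlation (r², so r = −0.15 corresponds to 2.25%, under the 2.3% ceiling reported), and a first-order partial correlation that removes a third variable. The sketch below is illustrative: the report does not give the correlation between EDI-DT and social desirability, so the value −0.20 used here is hypothetical and the printed partial r is not a study result.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Social desirability correlations with the figure ratings, from the text.
sd_rs = {"current": -0.07, "ideal": 0.08, "current-ideal": -0.15}
for name, r in sd_rs.items():
    print(f"{name}: {100 * r ** 2:.2f}% of variance shared with social desirability")

# x = current-ideal discrepancy, y = EDI-DT, z = social desirability.
# r_xy = 0.62 and r_xz = -0.15 come from the text; r_yz = -0.20 is hypothetical.
print(round(partial_r(0.62, -0.15, -0.20), 3))
```

When the control variable correlates only weakly with both measures, as here, the partial r stays close to the zero-order r, which is the pattern the report describes.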
Discussion
Test–retest reliabilities for CDRS-based ratings were generally strong. The most direct measure of body dissatisfaction, current–ideal discrepancy, had a strong test–retest r at 2 weeks (r = 0.82) and a moderate test–retest r at 14 weeks (r = 0.68). For current figure ratings, the full-sample test–retest r was 0.84 at 2 weeks and 0.77 at 14 weeks. Ideal figure ratings were not quite as stable but were still adequate (r = 0.78 at 2 weeks and 0.65 at 14 weeks).
Given the pubertal changes that occur in the age group examined here, one possible outcome might have been that CDRS scores, which are based on late adolescent/adult-looking figures, would be more reliable for the older girls. Comparisons between Grade 7 and Grade 8 reliability coefficients showed no tendency toward higher reliabilities in the older grade; instead, correlations between time points were higher for Grade 7 girls, suggesting somewhat more reliable scores in that subsample.
Current figure ratings increased for Grade 8 girls at all time points and for Grade 7 girls from Time 1 to Time 4. Although body mass index also increased between the only two time points at which it was measured (Time 1 and Time 4), this increase was not quite as strong as the increase in figure ratings. Ideal figure ratings increased in both grades at all time points. Current–ideal discrepancy ratings decreased for Grade 7 girls from Time 1 to Times 2 and 3, but not to Time 4, and remained stable for Grade 8 girls. In general, these patterns suggest that researchers conducting intervention studies should be alert to a possible retesting artefact and should compare changes over time with those in a non-intervention control sample. Increases over time in current figure ratings and some types of ideal figure ratings have previously been found in a Grade 9 female sample using Stunkard and colleagues’ figures (Banasiak et al., 2001), so the results appear to apply to figure rating scales in general rather than to the Thompson and Gray figures in particular.
Gardner’s (2001) criteria for a good figure rating scale relating to sufficient variance were met. When participants were given the opportunity to select ratings under the figures and halfway between them, they selected a wide range of ratings from the 17 possible rating points. Similarly, the variance of each of the scales was large enough to allow individual differences to emerge. Convergent validity was demonstrated through moderate to strong correlations between current–ideal figure discrepancy and measures of body dissatisfaction and dietary restraint, as well as a strong correlation (r = 0.69) between measured body mass index and self-reported current figure size. Discriminant validity was supported by very low (|r| ≤ 0.15) or non-significant correlations with social desirability response set. In addition, the significant correlations between figure ratings and body dissatisfaction and dietary restraint remained significant, at similar levels, after controlling for social desirability.
It should be noted that other forms of body rating assessment (Gardner, 2001; Thompson, 1996; Thompson et al., 1999) could be superior to the one examined here. For example, a variety of alternative body size estimation techniques are available based on distorting mirrors, Polaroid photos and videotaped images of the participant, which are then systematically altered in size (e.g. Brodie, Slade, & Rose, 1989; Gardner, Stark, Jackson, & Friedman, 1999), and movable light beams have been used to estimate body size (e.g. Fabian & Thompson, 1989). In addition, visual stimuli can depict full figures, as in the CDRS, or specific body sites (e.g. Thompson & Tantleff, 1992); the figural or body stimuli can be rough figure outlines or more accurate, photograph-like representations of a person (Truby & Paxton, 2002); the size of the figures varies across scales; the order of presentation can range from the typical row of figures presented sequentially to figures presented in random order, which are then sorted or rated (Gardner et al., 1999; Williamson et al., 1989); and the rating format can differ, for example by using visual analogue scales (Gardner et al., 1999). Many of these methods need to be administered individually and are thus not suitable for mass testing. There is therefore a need for easily administered figure drawing rating scales such as the CDRS. Nonetheless, further research is needed to examine the most useful forms of figure ratings in specific contexts.
Overall, the findings of the present study support the use of the CDRS female figures with early adolescent girls, using the instructions described in this study. Strengths of the current study included a large sample size and a variety of test–retest periods. As with any measure, the reliability of a data set needs to be demonstrated in each specific context in which the measure is used; therefore, while this study supports the use of the CDRS in early adolescent females, reliability should continue to be established in specific research contexts in future studies.
Acknowledgements
The authors sincerely thank the Australian Rotary Health Research Fund and the Australian Research Council for the funding that enabled this research to be conducted; the schools and individual students that participated in this research; and Tracey Holt, Eliza Sims, Giselle Withers, and Priscilla Yardley.
References
Bailey, N. T. J. (1995). Statistical methods in biology (3rd ed.). Cambridge, UK: Cambridge University Press.
Banasiak, S. J., Wertheim, E. H., Koerner, J., & Voudouris, N. J. (2001). Test–retest reliability and internal consistency of a variety of measures of dietary restraint and body concerns in a sample of adolescent girls. International Journal of Eating Disorders, 29, 85–89.
Brodie, D., Slade, P., & Rose, H. (1989). Reliability measures in disturbing body image. Perceptual and Motor Skills, 69, 723–732.
Collins, M. (1991). Body figure perceptions and preferences among preadolescent children. International Journal of Eating Disorders, 10, 199–208.
Crandall, V. C., Crandall, V. J., & Katkovsky, W. (1965). A children’s social desirability questionnaire. Journal of Consulting Psychology, 29(1), 27–36.
Crandall, V. C., Crandall, V. J., & Katkovsky, W. (1991). Children’s Social Desirability Scale (CSD). In J. Robinson, P. Shaver, & L. Wrightsman (Eds.), Measures of personality and social psychological attitudes (Vol. 1, pp. 43–46). London: Academic Press.
Cusumano, D. L., & Thompson, J. K. (1997). Body image and body shape ideals in magazines: Exposure, awareness and internalization. Sex Roles, 37, 701–721.
Fabian, L., & Thompson, J. K. (1989). Body image and eating disturbance in young females. International Journal of Eating Disorders, 8, 63–74.
Gardner, R. M. (2001). Assessment of body image disturbance in children and adolescents. In J. K. Thompson & L. Smolak (Eds.), Body image, eating disorders, and obesity in youth: Assessment, prevention, and treatment (pp. 193–214). Washington, DC: American Psychological Association.
Gardner, R. M., Friedman, B. N., & Jackson, N. (1998). Methodological concerns when using silhouettes to measure body image. Perceptual and Motor Skills, 86, 387–395.
Gardner, R. M., Stark, K., Jackson, N. A., & Friedman, B. N. (1999). Development and validation of two new scales for assessment of body image. Perceptual and Motor Skills, 89, 981–993.
Garner, D. M., Olmsted, M. P., & Polivy, J. (1983). Development and validation of a multidimensional eating disorder inventory for anorexia nervosa and bulimia. International Journal of Eating Disorders, 2, 15–34.
Laessle, R. G., Tuschl, R. J., Kotthaus, B. C., & Pirke, K. M. (1989). A comparison of the validity of three scales for the assessment of dietary restraint. Journal of Abnormal Psychology, 98, 504–507.
Martin, G. C., Wertheim, E. H., Prior, M., Smart, D., Sanson, A., & Oberklaid, F. (2000). A longitudinal study of the role of childhood temperament in the later development of eating concerns. International Journal of Eating Disorders, 27, 150–162.
Nunnally, J. C. (1970). Psychometric theory. New York: McGraw-Hill.
Sands, R., Tricker, J., Sherman, C., Armatas, C., & Maschette, W. (1997). Disordered eating patterns, body image, self-esteem, and physical activity in preadolescent school children. International Journal of Eating Disorders, 21, 159–166.
Schoemaker, C., van Strien, T., & van der Staak, C. (1994). Validation of the eating disorders inventory in a nonclinical population using transformed and untransformed responses. International Journal of Eating Disorders, 15, 387–393.
Stunkard, A. J., Sorenson, T. I., & Schulsinger, F. (1983). Use of the Danish Adoption Register for the study of obesity and thinness. In S. Kety, L. P. Rowland, R. L. Sidman, & S. W. Matthysse (Eds.), The genetics of neurological and psychiatric disorders (pp. 115–120). New York: Raven Press.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins.
Thompson, J. K. (1996). Assessing body image disturbance: Measures, methodology, and implementation. In J. K. Thompson (Ed.), Body image, eating disorders, and obesity (pp. 49–81). Washington, DC: American Psychological Association.
Thompson, J. K., Heinberg, L. J., Altabe, M., & Tantleff-Dunn, S. (1999). Exacting beauty: Theory, assessment, and treatment of body image disturbance. Washington, DC: American Psychological Association.
Thompson, J. K., & Tantleff, S. T. (1992). Female and male ratings of upper torso: Actual, ideal and stereotypical conceptions. Journal of Behavior and Personality, 7, 349–350.
Thompson, M. A., & Gray, J. J. (1995). Development and validation of a new body image assessment tool. Journal of Personality Assessment, 64, 258–269.
Tilgner, L., Wertheim, E. H., & Paxton, S. J. (in press). The effect of social desirability on adolescent girls’ responses to an eating disorders prevention program. International Journal of Eating Disorders.
Truby, H., & Paxton, S. J. (2002). Development of the Children’s Body Image Scale. British Journal of Clinical Psychology, 41, 185–203.
Van Strien, T., Frijters, J. E. R., Bergers, G. P. A., & Defares, P. B. (1986). The Dutch Eating Behaviour Questionnaire (DEBQ) for assessment of restrained. International Journal of Eating Disorders, 5, 295–315.
Williams, T. L., Gleaves, D. H., Cepeda-Benito, A., Erath, S. A., & Cororve, M. B. (2001). The reliability and validity of a group-administered version of the Body Image Assessment. Assessment, 8, 37–46.
Williamson, D. A., Davis, C. J., Bennett, S. M., Goreczny, A. J., & Gleaves, D. H. (1989). Development of a simple procedure for assessing body image disturbances. Behavioral Assessment, 11, 433–446.