Gender Bias in the Measurement Properties of the Center for Epidemiologic Studies Depression Scale (CES-D)


Psychiatry Research, 49:239-250 Elsevier

Manfred Stommel, Barbara A. Given, Charles W. Given, Hripsime A. Kalaian, Richard Schulz, and Ruth McCorkle

Received November 23, 1992; revised version received March 18, 1993; accepted May 18, 1993.

Abstract. Confirmatory factor-analytic models are used to examine gender biases of individual items of the Center for Epidemiologic Studies Depression (CES-D) Scale. In samples containing 708 cancer patients and 504 caregivers of the chronically ill elderly, two CES-D items are identified as producing biased responses in comparisons of male and female respondents. Three additional CES-D items are excluded on the basis of other psychometric problems, yielding a subset of 15 CES-D items that capture almost all the information of the original 20-item CES-D scale but are free of any gender bias. Gender differences in mean levels of depressive symptomatology are significantly reduced, but not eliminated, when the 15-item scale is used.

Key Words. Gender, depression, measurement equivalence.

Manfred Stommel, Ph.D., is Associate Professor of Psychometrics, College of Nursing, Michigan State University (MSU), East Lansing, MI. Barbara A. Given, Ph.D., R.N., F.A.A.N., is Professor and Director, Center for Nursing Research, College of Nursing, and Associate Director, Cancer Center, MSU, East Lansing, MI. Charles W. Given, Ph.D., is Professor and Associate Chair of Research, Department of Family Practice, College of Human Medicine, MSU, East Lansing, MI. Hripsime A. Kalaian participated in the study as a Research Aide, Department of Family Practice, MSU, East Lansing, MI. Richard Schulz, Ph.D., is Professor, Department of Psychiatry, School of Medicine; Associate Director, University Center for Social and Urban Research; and Director, Gerontology Program, University of Pittsburgh, Pittsburgh, PA. Ruth McCorkle, Ph.D., F.A.A.N., is Professor, School of Nursing, University of Pennsylvania, Philadelphia, PA. (Reprint requests to Dr. M. Stommel, Michigan State University, College of Nursing, A230 Life Sciences Bldg., East Lansing, MI 48824-1317, USA.)

0165-1781/93/$06.00 © 1993 Elsevier Scientific Publishers Ireland Ltd.

The Center for Epidemiologic Studies Depression (CES-D) Scale is one of the most widely used self-report instruments to measure current depressive symptomatology in nonpsychiatric populations (Radloff, 1977; Devins and Orme, 1986; Radloff and Locke, 1986). The CES-D scale has often been used to compare prevalence of depressive symptomatology in different racial/ethnic groups (Roberts, 1980; Aneshensel et al., 1984; Vera et al., 1991), in different age groups (Liang et al., 1989; Gatz and Hurwicz, 1990; Hertzog et al., 1990; Kessler et al., 1992), among people of varying levels of physical functioning (Aneshensel et al., 1984; Berkman et al., 1986; Turner and Noh, 1988), and between men and women (Clark et al., 1981; Murrell et al., 1983; Ensel, 1986b; Krause, 1986). For the CES-D Scale to serve as a measure of depressive symptomatology in diverse groups, it must be free of measurement biases across the comparison groups of interest; that is, it should exhibit substantial levels of factorial invariance (Alwin and Jackson, 1981). While the factorial invariance of the scale across age groups has been questioned and is currently debated (Liang et al., 1989; Hertzog, 1990), its measurement equivalence across gender groups has not been sufficiently examined.

The CES-D instrument contains 20 items addressing depressive symptoms. Respondents indicate how often (within the last week) they experienced those symptoms: “rarely or none of the time” (0), “some or a little of the time” (1), “occasionally or a moderate amount of time” (2), and “most or all of the time” (3). In most studies, researchers use a total scale score obtained by summing the responses to all 20 four-point items (e.g., Moritz et al., 1989; Ensel and Lin, 1991; Lewinsohn et al., 1991). The resulting scores have a potential range of 0 to 60, but they tend to be skewed positively in nonpsychiatric populations, with most respondents scoring in the lower ranges and mean scale scores not exceeding 10 in the general population (Berkman et al., 1986; Devins and Orme, 1986; Radloff and Locke, 1986).

Analyses of the internal structure of the CES-D scale usually yield a four-factor model (see Table 2), which includes a seven-item “depressive affect” or “mood” subscale, a four-item “positive affect” or “well-being” subscale, a seven-item “somatic and retarded activity” subscale, and a two-item “interpersonal” subscale (Clark et al., 1981; Berkman et al., 1986; Ensel, 1986a; Hertzog et al., 1990). However, not all items seem to fit well into this four-factor model. For various reasons, researchers have sometimes excluded a few items from the scale (Radloff, 1977; Ensel, 1986a; Liang et al., 1989).
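The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring code: the function name is hypothetical, and the reverse-scoring of the four positively worded items (4, 8, 12, and 16) is an assumption based on standard CES-D practice that the text itself does not spell out.

```python
# Item numbers (1-20) of the four positively worded "well-being" items.
# Assumption: these are reverse-scored before summing (standard CES-D
# practice, not stated explicitly in the text).
POSITIVE_ITEMS = {4, 8, 12, 16}


def cesd_total(responses):
    """Sum 20 item responses coded 0-3 into a total score between 0 and 60."""
    if len(responses) != 20:
        raise ValueError("expected responses to all 20 CES-D items")
    total = 0
    for item, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: response must be 0, 1, 2, or 3")
        # Reverse positively worded items so that higher always means more symptoms.
        total += (3 - value) if item in POSITIVE_ITEMS else value
    return total
```

A respondent who answers “rarely or none of the time” to every item would score 12 under this rule, because the four positive items are reversed.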
While some researchers have found the subscale dimensions to be sufficiently independent to investigate their relations to predictor variables separately (Krause, 1986; Gatz and Hurwicz, 1990), others have argued that there is not enough empirical differentiation to warrant partitioning the CES-D scale into multiple subscales (Hertzog et al., 1990). Most investigations of measurement bias in the CES-D scale have focused on subscale instead of individual item analysis (Gatz and Hurwicz, 1990; Hertzog et al., 1990) to trace differential responses across comparison groups. This approach is useful only if the internal consistency and validity of each of the subscales can be taken for granted. However, there are still many unresolved questions about the validity and psychometric properties of individual scale items (Liang et al., 1989). For instance, although it has often been observed that distributions of responses to CES-D items tend to be skewed (Devins and Orme, 1986), few researchers have pointed out that the interpersonal items show extreme skewness even in depressed subpopulations whose total CES-D scores far exceed those of the general population.

Similarly, a focus on individual items could also reveal systematic differences in the response patterns of men and women. One of the most consistent findings in epidemiological studies of depression is the greater prevalence rate of depressive symptoms among women compared with men (Murrell et al., 1983; Devins and Orme, 1986; Ensel, 1986b). It is difficult to say, however, whether these disparities actually reflect higher rates of depressive disorders among women (George, 1990), a greater tendency of women to express their feelings (Verbrugge, 1985), or simply an artifact of measurement procedures. In fact, a close look at the individual CES-D items suggests that at least two items may lead to gender-specific response patterns. Given prevailing socialization patterns (Barnett and Baruch, 1987), men should be far less likely than women to express their depression in crying spells. Depressed men are also more likely to turn inward and be less inclined to talk. Thus, it can be anticipated that the two items “I had crying spells” and “I talked less than usual” will generate different response patterns among men and women who otherwise exhibit the same general levels of depressive symptomatology. If so, comparing total CES-D scores of men with those of women is subject to biases because the relationship between some of the indicator items and the underlying depression differs across gender groups. There may be other items that exhibit gender bias for less obvious reasons. Whatever those reasons, gender-biased scale items should be removed before total scale scores of men and women are compared.
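The argument above can be restated in factor-analytic terms. The following is only a sketch in notation that does not appear in the original: x is the response to item i in gender group g (m for men, f for women), ξ is the latent level of depressive symptomatology, λ is the item's loading, and δ is its error term.

```latex
x_i^{(g)} = \lambda_i^{(g)} \, \xi^{(g)} + \delta_i^{(g)}, \qquad g \in \{m, f\}

\text{item } i \text{ is free of gender bias if} \quad
\lambda_i^{(m)} = \lambda_i^{(f)}
\quad \text{and} \quad
\mathrm{Var}\bigl(\delta_i^{(m)}\bigr) = \mathrm{Var}\bigl(\delta_i^{(f)}\bigr)
```

Under this reading, the expectation for the “crying spells” and “talked less” items is that at least one of these equalities fails, so that men and women at the same latent level give systematically different responses.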

Methods

Analytic Procedures. The main emphasis in the following is on testing alternative factor models (Long, 1983; Schaie and Hertzog, 1985; Alwin, 1988; Hertzog, 1990). Specifically, we examine the degree to which the CES-D scale is “factorially invariant” (Jöreskog, 1971; Alwin and Jackson, 1981) across groups of male and female cancer patients. This examination involves the imposition of several nested factor models on both groups of respondents. These nested models are compared to a baseline model requiring only that the same indicator items load on the same subscale factors for male and female patients. The baseline model is derived from the usual four-factor model with CES-D items grouped into subscales as indicated in Table 2. On the opposite end of the continuum of nested models, we propose a highly restrictive model that incorporates the hypotheses (a) that all unstandardized factor loadings are equal for male and female respondents, (b) that all error variances are equal across the comparison groups, and (c) that all covariances among the subscale factors are equal across the groups. If consistent with the data, this model would entail the absence of any gender bias, since all free parameters are constrained to be equal in both gender groups, with the implication that the underlying factor model is identical in both subgroups. To test for possible deviations from this strict model, a series of Lagrange Multiplier tests is implemented (Bentler, 1990). These tests help evaluate which constraint(s) should be relaxed to yield improvements in the overall goodness-of-fit of the model. Since the Lagrange Multiplier test is used in an exploratory manner to discover which items produce gender-biased responses, we retest the model on a second, independent sample to avoid capitalizing on chance.

Subjects. The following analysis is based primarily on a sample of 708 cancer patients. This sample combines cases from three different home-care studies based on different populations: cancer patients from lower Michigan primarily residing in small towns and rural areas (n = 240) (Given and Given, 1987a), cancer patients from suburban Pittsburgh (n = 258) (Schulz, 1990), and cancer patients from urban Philadelphia (n = 210) (McCorkle, 1990). As the data in Table 1 indicate, subjects in the three subsamples differ significantly with respect to age, marital status, race/ethnicity, as well as their cancer diagnosis and functional limitations in activities of daily living. Only the ratio of men to women shows substantial similarities across studies. While heterogeneity in the data tends to reduce the fit of scaling models because of additional sources of score variance, models that do fit under these conditions have a greater claim to generalizability and, thus, are likely to prove more robust.

As mentioned, to confirm findings on data from a noncancer population, a second sample of 504 caregivers selected from three Michigan home-care studies was used. The proportion of male caregivers in these studies ranged from 16.2% to 27.6%. To avoid gender-study interactions, all male caregivers with complete CES-D responses were selected and supplemented with an equal number of randomly selected female caregivers from each study.
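The caregiver selection rule described above (keep every male caregiver, then draw an equal number of female caregivers at random within each study) can be sketched as follows. The record layout, with hypothetical keys "study" and "gender", and the function name are illustrative assumptions; the input is assumed to be already restricted to caregivers with complete CES-D responses.

```python
import random


def balanced_gender_sample(caregivers, seed=0):
    """Keep all men and draw an equal number of women at random within each study.

    `caregivers` is a list of dicts with the hypothetical keys "study" and
    "gender" ("male" or "female").
    """
    rng = random.Random(seed)

    # Group the records by study, then by gender.
    by_study = {}
    for person in caregivers:
        groups = by_study.setdefault(person["study"], {"male": [], "female": []})
        groups[person["gender"]].append(person)

    selected = []
    for groups in by_study.values():
        men, women = groups["male"], groups["female"]
        # Assumes each study has at least as many women as men, which holds
        # when men make up 16.2% to 27.6% of caregivers as reported.
        selected.extend(men)
        selected.extend(rng.sample(women, len(men)))
    return selected
```

Sampling within each study, rather than from the pooled caregivers, is what keeps the gender split even in every study and so avoids confounding gender with study.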

Table 1. Characteristics of cancer patients in three subsamples (n = 708)

                              Michigan        Pittsburgh      Philadelphia
                              (n = 240)       (n = 258)       (n = 210)
Patient characteristics         n     %         n     %         n     %      P value
Gender                                                                        0.817
  Female                      120    50       129    50       112    53
  Male                        120    50       129    50        98    47
Marital status                                                                0.000
  Married                     202    84       169    65       113    54
  Not married                  38    16        89    35        97    46
Race                                                                          0.000
  White                       230    96       236    91       148    70
  Other                        10     4        22     9        62    30
Religion                                                                      0.000
  Protestant                  145    61       121    47       103    49
  Catholic                     61    26       110    43        70    33
  Jewish                        0     0         3     1        21    10
  Other                        14     6         6     2         8     4
  None                         19     8        18     7         8     4
Primary cancer diagnosis                                                      0.000
  Breast                       64    27        57    22        53    25
  Colorectal                   31    13        13     5        43    21
  Lung                         36    15        51    20        33    16
  Head/neck                     5     2        37    14        23    11
  Gastrointestinal             18     8        18     7        33    16
  Gynecological                12     5        25    10         0     0
  Prostate/bladder             18     8        34    13        25    12
  Other                        56    23        23     9         0     0
Functional limitations                                                        0.000
  Yes                          71    30        29    11       174    83
  No                          169    70       229    89        36    17

                             Mean  Range     Mean  Range     Mean  Range
Patient age                    59  22-83       63  30-83       62  19-89      0.002

The final sample includes caregivers of physically impaired elderly (n = 100) (Given and Given, 1986), Alzheimer’s patients (n = 170) (Given and Given, 1987b), and caregivers of patients with diverse chronic problems (n = 234) (Given and Given, 1989), all evenly split by gender. An examination of sociodemographic characteristics exhibited no significant differences across the three subgroups. Most of the 504 caregivers were married (78.2%), white (92.5%), Protestant (73.2%), middle class (mean household income: $32,500), and between 18 and 88 years of age (mean = 63.4). Comparing the combined caregiver sample with the combined cancer patient sample yielded significant differences across all mentioned sociodemographic variables.
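The nested-model comparisons described under Analytic Procedures rest on a χ² difference test: the restricted model's χ² minus the baseline model's χ², referred to a χ² distribution with the difference in degrees of freedom. The sketch below is an illustration only, not the EQS procedure the authors used. The function names are hypothetical, the tail probability uses the closed-form series for integer degrees of freedom, and the inputs are the fit statistics reported in Table 4 (final model: 360.54 on 206 df; null model: 315.61 on 174 df).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * chi
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_difference(chi2_restricted, df_restricted, chi2_baseline, df_baseline):
    """Return (delta chi-square, delta df, p-value) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_baseline
    delta_df = df_restricted - df_baseline
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Fit statistics as reported in Table 4 (final model vs. null model).
delta_chi2, delta_df, p = chi2_difference(360.54, 206, 315.61, 174)
print(f"delta chi2 = {delta_chi2:.2f}, delta df = {delta_df}, p = {p:.3f}")
```

A nonsignificant result means the equality constraints do not make the model fit detectably worse than the unconstrained null model, which is the sense in which a constrained model is said to fit "as well as" the null model.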

Results

Several CES-D researchers have observed that the distribution of responses to individual items is often highly skewed (Devins and Orme, 1986; Liang et al., 1989). Such patterns may merely “reflect an underlying nonnormal distribution of depressive affect in the general population” (Hertzog et al., 1990). However, the problem of skewed responses is not shared by all items to the same degree. As the item statistics in Table 2 make clear, skewness in the distribution of responses is particularly severe for the two “interpersonal” items. This pattern is consistent across the three cancer study samples combined here, occurs in our sample of 504 caregivers, and has also been reported in other studies (e.g., Clark et al., 1981). Skewness remains a severe problem even at the subscale level and is the likely reason for the lower interscale correlations involving the “interpersonal” subscale (see Table 2). Again, our findings are consistent with those in other studies, whether the correlations are based on summated subscale scores or weighted factor scores (e.g., Clark et al., 1981; Hertzog et al., 1990).

The skewed response pattern of the interpersonal items resulted from the fact that more than 90% of both male and female respondents in the cancer patient sample indicated that they “rarely or none of the time” thought people unfriendly or felt disliked. This is remarkable in view of the overall elevated CES-D scores in this sample, with a mean of 13.2 for the 20-item summated scale score. (Among the 504 caregivers with an even higher mean CES-D score of 14.8, there are still more than 82% who responded “rarely or none of the time” to the two interpersonal items.) Given this difference in response patterns compared with the other CES-D items, there is little reason to think that the interpersonal items are valid indicators of depressive symptomatology. Their lack of face validity (Liang et al., 1989), lack of contribution to scale variance, and lack of desirable psychometric attributes (skewness) all favor exclusion of the interpersonal items from the scale. This leaves 18 CES-D items for an examination of possible gender bias.
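The item statistics discussed above (mean, SD, skewness, and the percentage answering “rarely or none of the time”) can be computed as in the sketch below. The function name is hypothetical, and the skewness estimator is an assumption: the text does not say which one was used, so the plain moment coefficient is shown here.

```python
import math


def item_summary(responses):
    """Return (mean, SD, skewness, % responding 0) for one item coded 0-3."""
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses")
    mean = sum(responses) / n
    m2 = sum((x - mean) ** 2 for x in responses) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in responses) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all responses are identical")
    sd = math.sqrt(m2 * n / (n - 1))  # sample SD with n - 1 in the denominator
    # Moment coefficient of skewness; assumed, since the paper does not name
    # its estimator.
    skew = m3 / m2 ** 1.5
    pct_none = 100.0 * responses.count(0) / n
    return mean, sd, skew, pct_none
```

An item on which nine of ten respondents answer 0 and one answers 3 already has a skewness near 2.7 under this estimator, which shows how a very high share of “none” responses drives the large values seen for the interpersonal items.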
Table 2. Descriptive statistics for Center for Epidemiologic Studies Depression Scale (CES-D) items and four subscales commonly encountered in the literature (female: n = 361, male: n = 347)

Items                                                          Mean     SD    Skew   % None
Depressive affect, Mood subscale                        (F)    3.99   3.77    1.24
                                                        (M)    3.10   3.35    1.60
  CES-D 3   could not shake off the blues               (F)    0.56   0.79    1.46    58.4%
                                                        (M)    0.50   0.78    1.60    63.4%
  CES-D 6   felt depressed                              (F)    0.77   0.81    1.01    41.3%
                                                        (M)    0.69   0.74    1.09    44.1%
  CES-D 9   thought life a failure                      (F)    0.23   0.54    2.65    82.3%
                                                        (M)    0.22   0.58    3.14    84.7%
  CES-D 10  felt fearful                                (F)    0.64   0.75    1.37    48.2%
                                                        (M)    0.46   0.67    1.61    61.7%
  CES-D 14  felt lonely                                 (F)    0.57   0.75    1.44    54.6%
                                                        (M)    0.48   0.76    1.72    64.6%
  CES-D 17  had crying spells                           (F)    0.50   0.71    1.57    59.0%
                                                        (M)    0.21   0.46    2.50    81.0%
  CES-D 18  felt sad                                    (F)    0.71   0.76    1.11    43.5%
                                                        (M)    0.55   0.66    1.12    53.3%
Positive affect, Well-being subscale                    (F)    3.39   2.80    0.39
                                                        (M)    3.29   2.76    0.62
  CES-D 4   felt as good as other people                (F)    0.54   0.85    1.52    64.5%
                                                        (M)    0.61   0.91    1.37    62.8%
  CES-D 8   felt hopeful about the future               (F)    0.94   1.01    0.61    45.7%
                                                        (M)    0.96   1.03    0.66    44.7%
  CES-D 12  was happy                                   (F)    1.07   0.93    0.33    33.8%
                                                        (M)    1.03   0.89    0.39    33.4%
  CES-D 16  enjoyed life                                (F)    0.84   0.97    0.74    49.0%
                                                        (M)    0.69   0.86    0.94    54.2%
Somatic and retarded activity,                          (F)    6.19   4.33    0.66
Psychomotoric aspects subscale                          (M)    5.98   4.18    0.66
  CES-D 1   was bothered by things                      (F)    0.66   0.81    1.23    50.7%
                                                        (M)    0.66   0.84    1.33    52.2%
  CES-D 2   had a poor appetite                         (F)    1.05   1.12    0.73    41.3%
                                                        (M)    1.01   1.13    0.75    44.7%
  CES-D 5   had trouble keeping my mind on              (F)    0.73   0.90    1.13    50.4%
            what I was doing                            (M)    0.66   0.84    1.26    52.7%
  CES-D 7   everything was an effort                    (F)    1.10   1.00    0.65    31.3%
                                                        (M)    1.02   1.01    0.69    37.2%
  CES-D 11  sleep was restless                          (F)    1.01   1.04    0.76    38.5%
                                                        (M)    1.03   1.01    0.68    36.3%
  CES-D 13  talked less than usual                      (F)    0.69   0.86    1.13    52.1%
                                                        (M)    0.80   0.97    1.06    49.6%
  CES-D 20  could not get going                         (F)    0.94   0.94    0.90    36.0%
                                                        (M)    0.80   0.87    1.07    42.1%
Interpersonal relations                                 (F)    0.25   0.79    3.83
                                                        (M)    0.18   0.64    4.35
  CES-D 15  people were unfriendly                      (F)    0.15   0.55    4.11    90.9%
                                                        (M)    0.09   0.36    5.14    93.4%
  CES-D 19  felt people disliked me                     (F)    0.10   0.36    4.48    91.1%
                                                        (M)    0.10   0.39    5.10    92.6%
CES-D total scale score                                 (F)   13.83   9.37    0.66
                                                        (M)   12.55   9.19    1.04

Subscale correlations for combined sample (n = 706) (based on summated subscale scores)

                      Depressive mood    Well-being    Somatic symptoms
Well-being                 0.77
Somatic symptoms           0.87             0.71
Interpersonal              0.57             0.31             0.44

Without the interpersonal factor, the remaining items are grouped into a three-factor model, including (1) depressive mood, (2) well-being, and (3) somatic symptoms factors as indicated in Table 2. When these three factors are used, a gender-bias free model can be constructed that requires the same factor loadings, the same error variances, and the same interfactor covariances for both male and female respondents. To the extent that these cross-group constraints are inconsistent with the data, they can be relaxed until a model is found that fits the data as well as the null model, which puts no cross-group equality constraints on any structural parameters.

The attempt to fit the bias-free model to the data resulted in acceptable values for the overall goodness-of-fit indices: NFI = 0.976 and CFI = 0.989 (Bentler, 1990). However, this model clearly does not fit as well as the constraint-free null model: the χ² difference test yields a highly significant χ² value of 105.34 (df = 39, p < 0.001). After relaxation of only five equality constraints identified through the Lagrange Multiplier tests, however, an alternative model was found that fit the data as well as the null model (χ² = 38.58, df = 34, p = 0.270). This alternative model no longer required equality across gender for two factor loadings, two error variances, and one correlation among the latent subscales. Confirming the fit of this same model on the caregiver sample also resulted in a nonsignificant χ² difference of 29.41 (df = 34, p = 0.517).

In the context of our discussion of gender bias, it is important to examine which population parameters differ between male and female respondents in the new well-fitting factor model. The results provide a clear picture of the differences in the factorial structure of the CES-D scale between gender groups. The five constraints that had to be relaxed involve factor loadings and error variances of the items: (1) “thought life a failure,” (2) “talked less than usual,” and (3) “had crying spells.” In addition, the strength of the correlation between the somatic symptoms factor and the depressive mood factor differed between male and female respondents. The reasons for the between-group differences of the “failure” item are not quite clear, except that this item has the third highest skewness among the original 20 CES-D items in both the cancer patient sample (see Table 2) and the caregiver sample (where its sample mean is 0.29 with a skewness value of 2.34), and it has also shown poor reliabilities in previous studies (Liang et al., 1989).

The reasons for the gender difference in the response patterns of the “talked less” and “crying” items appear to be substantive in nature. To aid the interpretation of this finding of nonequivalence across gender groups, an additional test has been performed (see Table 3). This test involves a multivariate regression with the “talked less” and “crying” items regressed on a dummy variable for gender (1 = female, 0 = male) and the remaining 15 CES-D items. It is worth remembering that the 15 remaining CES-D items have been shown to produce bias-free responses with respect to gender, since none of their equivalence constraints caused any problems. Thus, they are equally good measures of depressive symptomatology among men and women. As the results in Table 3 make clear, the same cannot be said about the “talked less” and “crying” items. Responses to both of these items depend on the gender of the respondent even after controlling for respondents’ general levels of depressive symptomatology as represented by the 15 unbiased CES-D items. In sum,

Table 3. Regressions of Center for Epidemiologic Studies Depression Scale (CES-D) items 13 (“talked less than usual”) and 17 (“had crying spells”) on gender dummy and 15 unbiased CES-D items (n = 708)

(a) Multivariate effect of gender: Wilks' Λ = 0.941, significance = 0.000

(b) Univariate effects (unstandardized regression coefficients)

                                                          CES-D 13           CES-D 17
Independent variables                                     b     T-Sig.       b     T-Sig.
Gender (1 = female)                                    -0.17    0.000      0.21    0.000
CES-D 3   could not shake off the blues                 0.19    0.000      0.05    0.103
CES-D 6   felt depressed                               -0.12    0.043      0.08    0.040
CES-D 10  felt fearful                                 -0.04    0.477      0.15    0.000
CES-D 14  felt lonely                                   0.24    0.000      0.01    0.798
CES-D 18  felt sad                                      0.10    0.084      0.35    0.000
CES-D 4   felt as good as other people                  0.06    0.134      0.01    0.663
CES-D 8   felt hopeful about the future                 0.02    0.536     -0.03    0.174
CES-D 12  was happy                                     0.09    0.033     -0.03    0.219
CES-D 16  enjoyed life                                 -0.04    0.359     -0.04    0.090
CES-D 1   was bothered by things                        0.02    0.716      0.00    0.967
CES-D 2   had a poor appetite                           0.08    0.006     -0.01    0.448
CES-D 5   had trouble keeping my mind on
          what I was doing                              0.06    0.136      0.03    0.166
CES-D 7   everything was an effort                      0.10    0.010      0.00    0.938
CES-D 11  sleep was restless                            0.00    0.951      0.00    0.802
CES-D 20  could not get going                           0.09    0.034     -0.03    0.246

                                                        R² = 0.39          R² = 0.28

men who otherwise have the same level of depressive symptoms as women are less likely to have “crying spells,” a fact that marks this item as a gender-biased indicator of depression. Interestingly, the gender bias in the response to the “talked less” item is in the opposite direction: depressed men are more likely to reduce their verbal communication compared with equally depressed women. This, too, is a gender-specific response pattern that biases total CES-D scale scores in comparisons between men and women. (In the caregiver sample, the “crying” and “talked less” items produced similar gender-specific responses, with statistically significant biases of -0.13 and +0.23, respectively.)

Table 4 presents the final three-factor model of the CES-D with all gender-biased items removed. While the remaining 15 factor loadings and error variances do not differ between male and female respondents, among cancer patients the correlation between the somatic symptoms and depressive affect/mood factors is markedly higher for men (0.90) than for women (0.75). Imposing an equality constraint leads to a significant worsening of the goodness-of-fit of the model, with an incremental χ² value of 10.44 (df = 1, p = 0.001). This finding appears to be peculiar to the population of cancer patients. Among the caregivers, the equality constraint on this factor covariance could not be rejected; the incremental χ² value was 1.82 (df = 1, p = 0.177). One can only speculate whether the symptoms resulting from cancer or its treatment (like nausea and vomiting) have different effects on men compared with

Table 4. Standardized maximum likelihood estimates for 15 items that were free of gender bias from the Center for Epidemiologic Studies Depression Scale (CES-D) (n = 708)

Items                                                     Factor loadings    Error variances
Depressive affect, Mood subscale
  CES-D 3   could not shake off the blues                      0.764              0.646
  CES-D 6   felt depressed                                     0.842              0.540
  CES-D 10  felt fearful                                       0.649              0.761
  CES-D 14  felt lonely                                        0.631              0.775
  CES-D 18  felt sad                                           0.803              0.596
Positive affect, Well-being subscale
  CES-D 4   felt as good as other people                       0.498              0.867
  CES-D 8   felt hopeful about the future                      0.628              0.778
  CES-D 12  was happy                                          0.752              0.659
  CES-D 16  enjoyed life                                       0.744              0.668
Somatic and retarded activity, Psychomotoric aspects subscale
  CES-D 1   was bothered by things                             0.651              0.759
  CES-D 2   had a poor appetite                                0.428              0.904
  CES-D 5   had trouble keeping my mind on what I was doing    0.569              0.822
  CES-D 7   everything was an effort                           0.662              0.749
  CES-D 11  sleep was restless                                 0.435              0.901
  CES-D 20  could not get going                                0.643              0.766

Interfactor correlations for male and female subgroups

                      Women (n = 361)                   Men (n = 347)
                      Depressive mood   Well-being      Depressive mood   Well-being
Well-being                 0.78                              0.78
Somatic symptoms           0.75            0.71              0.90            0.71

                                                             Lagrange multiplier test
Models               χ²       df    χ²/df    NFI     CFI       χ²      df      P
Ma: Final model    360.54    206    1.75    0.981   0.992     44.93    32    0.064
M0: Null model     315.61    174    1.81    0.980   0.991

those on women resulting in different correlations between the mood and somatic symptom subscales.

A final question concerns the effect of removing five CES-D items from the scale on gender comparisons. Employing the original 20-item scale in a two-way analysis of variance with the combined 1212 cases stratified by gender and subject group (cancer patients vs. caregivers) yields the following results: among cancer patients, women average 13.8 and men 12.6 on the total CES-D scale; among caregivers, the means are 15.8 (women) and 13.8 (men). These values represent significant differences by gender (p < 0.004) and subject group (p < 0.004), but there is no interaction (p = 0.514). After removal of the two gender-biased items as well as the “failure” and the two interpersonal items, the following scale means obtain for the reduced 15-item CES-D scale: among cancer patients, 12.1 (women) and 11.2 (men); among caregivers, 13.9 (women) and 12.2 (men). As in the case of the total CES-D scores based on all 20 items, the gender (p < 0.005) and subject group (p < 0.004) effects remain significant, but the reduction in the gender difference in CES-D scores from 1.6 to 1.3 (for the combined sample of 1212) is itself statistically significant. Despite the narrowing of the gender difference, the reduced 15-item scale correlates very highly with the original 20-item scale (0.98). In addition, shortening the CES-D scale by the five selected items barely affects its overall reliability: the Cronbach’s α of 0.89 for the 20-item scale changes to 0.88 for the 15-item scale.

Discussion

Recent discussion of the measurement biases of the CES-D scale has focused on possible age biases (Liang et al., 1989; Gatz and Hurwicz, 1990). While this discussion has so far proved inconclusive (Hertzog et al., 1990; Kessler et al., 1992), there is also a concern about the gender bias of the CES-D, since it has often been used for comparisons between men and women. Building on recent advances in structural equation models, we were able to define and test the properties of factorial models free of any gender bias. In fitting such models to a sample of 708 cancer patients and reconfirming them on a sample of 504 caregivers of chronically ill elderly, we could examine each individual scale item for its bias (or lack thereof) with respect to gender. This procedure resulted in the identification of two CES-D items that clearly show different response patterns among men and women. Furthermore, three additional CES-D items were excluded because of their poor psychometric qualities, leaving a subset of 15 gender-bias free scale items.

Our findings are partially consistent with those of Roberts et al. (1990), who, with data on an adolescent population, also found a gender bias in the responses to the “crying” item. However, while Roberts et al. did not detect differences with respect to the “talked less” item, the boys and girls showed different response patterns to the appetite item, a finding that may be peculiar to an adolescent population. Our findings are also consistent with those of Shrout and Yager (1989), who showed that the sensitivity and specificity of the CES-D scale are not compromised even when as many as 10 items are eliminated. Moreover, our analyses provide guidance as to which specific items should be candidates for elimination. While the reduced 15-item CES-D scale no longer exaggerates gender differences in depressive symptomatology, it retains almost all the information of the original 20-item scale, as demonstrated by the very high correlation between the original 20-item and the shortened 15-item version of the CES-D. Being shorter and unbiased, the 15-item version is preferable to the original scale.

Acknowledgments. This research is supported by grants: “Family Homecare for Cancer - A Community-Based Model” (#1 R01 NR01915) funded by the National Center for Nursing Research, Barbara A. Given and Charles W. Given, Principal Investigators; “Living With Homecare: Cancer Patients and Caregivers” (#R01 CA48635) funded by the National Cancer Institute, Richard Schulz, Principal Investigator; “Evaluation of Home Care for Cancer Patients” (#R01 NR01914) funded by the National Center for Nursing Research, Ruth McCorkle, Principal Investigator; “Caregiver Responses to Managing Elderly Patients at Home” (#1 R01 AG06584) funded by the National Institute on Aging, Charles W. Given and Barbara A. Given, Principal Investigators; “Impact of Alzheimer’s Disease on Family Caregivers” (#1 R01 MH41766) funded by the National Institute of Mental Health, Charles W. Given and Barbara A. Given, Principal Investigators; and “Caregiver Responses to Managing Elderly Patients at Home” (#2 R01 AG06584) funded by the National Institute on Aging, Charles W. Given and Barbara A. Given, Principal Investigators.

References

Alwin, D.F. Structural equation models in research on human development and aging. In: Schaie, K.W., ed. Methodological Issues in Aging Research. New York: Springer, 1988. pp. 71-171.

Alwin, D.F., and Jackson, D.J. Application of simultaneous factor analysis to issues of factorial invariance. In: Jackson, D.J., and Borgatta, E.F., eds. Factor Analysis and Measurement in Sociological Research: A Multi-Dimensional Perspective. Beverly Hills, CA: Sage, 1981. pp. 78-178.

Aneshensel, C.S.; Frerichs, R.R.; and Huba, G.J. Depression and physical illness. Journal of Health and Social Behavior, 25:350-371, 1984.

Barnett, R., and Baruch, G. Social roles, gender, and psychological distress. In: Barnett, R.; Biener, L.; and Baruch, J., eds. Gender and Stress. New York: Free Press, 1987. pp. 57-73.

Bentler, P.M. EQS Structural Equations Program Manual. Los Angeles: BMDP Statistical Software, Inc., 1990.

Berkman, L.F.; Berkman, C.S.; Kasl, S.; Freeman, D.H.; Leo, L.; Ostfeld, A.M.; Cornoni-Huntley, J.; and Brody, J.A. Depressive symptoms in relation to physical health and functioning in the elderly. American Journal of Epidemiology, 124:372-388, 1986.

Clark, V.A.; Aneshensel, C.S.; Frerichs, R.R.; and Morgan, T.M. Analysis of the effects of sex and age in response to items on the CES-D scale. Psychiatry Research, 5:171-181, 1981.

Devins, G.M., and Orme, C.M. Center for Epidemiologic Studies Depression Scale. In: Kayser, D.J., and Sweetland, R.C., eds. Test Critiques. Vol. 2. Kansas City: Test Corporation of America, Inc., 1986. pp. 144-160.

Ensel, W.M. Measuring depression: The CES-D scale. In: Lin, N.; Dean, A.; and Ensel, W.M., eds. Social Support, Life Events, and Depression. New York: Academic Press, 1986a. pp. 51-70.

Ensel, W.M. Sex, marital status, and depression: The role of life events and social support. In: Lin, N.; Dean, A.; and Ensel, W.M., eds. Social Support, Life Events, and Depression. New York: Academic Press, 1986b. pp. 231-247.

Ensel, W.M., and Lin, N. The life stress paradigm and psychological distress. Journal of Health and Social Behavior, 32:321-341, 1991.

Gatz, M., and Hurwicz, M. Are old people more depressed? Cross-sectional data on Center for Epidemiologic Studies Depression Scale factors. Psychology and Aging, 5:284-290, 1990.

George, L.K. Gender, age, and psychiatric disorders. In: Glass, L., and Hendricks, J., eds. Gender and Aging. Amityville, NY: Baywood Publishing Company, 1990. pp. 33-43.

Given, B., and Given, C.W. Family homecare for cancer: A community-based model (Grant #1 R01 NR01915). Funded by the National Center for Nursing Research, 1987a.

Given, C.W., and Given, B. Caregiver responses to managing elderly patients at home (Grant #1 R01 AG06584). Funded by the National Institute on Aging, 1986.

Given, C.W., and Given, B. Impact of Alzheimer’s disease on family caregivers (Grant #1 R01 MH41766). Funded by the National Institute of Mental Health, 1987b.

Given, C.W., and Given, B. Caregiver responses to managing elderly patients at home (Grant #2 R01 AG06584). Funded by the National Institute on Aging, 1989.

Hertzog, C. Using confirmatory factor analysis for scale development and validation. In: Lawton, M.P., and Hertzog, A.R., eds. Special Research Methods for Gerontology. Amityville, NY: Baywood Publishing Company, 1990. pp. 281-306.

Hertzog, C.; Van Alstine, J.; Usala, P.D.; Hultsch, D.F.; and Dixon, R. Measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D) in older populations. Psychological Assessment, 2:64-72, 1990.

Jöreskog, K.G. Simultaneous factor analysis in several populations. Psychometrika, 36:409-426, 1971.

Kessler, R.C.; Foster, C.; Webster, P.S.; and House, J.S. The relationship between age and depressive symptoms in two national surveys. Psychology and Aging, 7:119-126, 1992.

Krause, N. Stress and sex differences in depressive symptoms among older adults. Journal of Gerontology, 41:727-731, 1986.

Lewinsohn, P.M.; Rohde, P.; Seeley, J.R.; and Fischer, S.A. Age and depression: Unique and shared effects. Psychology and Aging, 6:247-260, 1991.

Liang, J.; Van Tran, T.; Krause, N.; and Markides, K.S. Generational differences in the structure of the CES-D scale in Mexican Americans. Journal of Gerontology, 44:S120-130, 1989.

Long, J.S. Confirmatory Factor Analysis: A Preface to LISREL. Beverly Hills, CA: Sage, 1983.

McCorkle, R. Evaluation of home care for cancer patients (Grant #R01 NR01914). Funded by the National Center for Nursing Research, 1990.

Moritz, D.J.; Kasl, S.V.; and Berkman, L.F. The health impact of living with a cognitively impaired elderly spouse: Depressive symptoms and social functioning. Journal of Gerontology, 44:S17-27, 1989.

Murrell, S.A.; Himmelfarb, S.; and Wright, K. Prevalence of depression and its correlates in older adults. American Journal of Epidemiology, 117:173-185, 1983.

Radloff, L.S. The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1:385-401, 1977.

Radloff, L.S., and Locke, B.Z. The community mental health assessment survey and the CES-D scale. In: Weissman, M.M.; Myers, J.K.; and Ross, C.E., eds. Community Surveys of Psychiatric Disorders. New Brunswick, NJ: Rutgers University Press, 1986. pp. 177-189.

Roberts, R.E. Reliability of the CES-D scale in different ethnic contexts. Psychiatry Research, 2:125-134, 1980.

Roberts, R.E.; Andrews, J.A.; Lewinsohn, P.M.; and Hops, H. Assessment of depression in adolescents using the Center for Epidemiologic Studies Depression Scale. Psychological Assessment, 2:122-128, 1990.

Schaie, K.W., and Hertzog, C. Measurement in the psychology of adulthood and aging. In: Birren, J.E., and Schaie, K.W., eds. Handbook of the Psychology of Aging. New York: Van Nostrand Reinhold, 1985. pp. 61-92.

Schulz, R. Living with homecare: Cancer patients and caregivers (Grant #R01 CA48635). Funded by the National Cancer Institute, 1990.

Shrout, P.E., and Yager, T.J. Reliability and validity of screening scales: Effect of reducing scale length. Journal of Clinical Epidemiology, 42:69-78, 1989.

Turner, R.J., and Noh, S. Physical disability and depression: A longitudinal analysis. Journal of Health and Social Behavior, 29:23-37, 1988.

Vera, M.; Alegria, M.; Freeman, D.; Robles, R.R.; and Rios, C.F. Depressive symptoms among Puerto Ricans: Island poor compared with residents of the New York City area. American Journal of Epidemiology, 134:502-510, 1991.

Verbrugge, L.M. Gender and health: An update on hypotheses and evidence. Journal of Health and Social Behavior, 26:156-182, 1985.