Journal of Affective Disorders 100 (2007) 265 – 269 www.elsevier.com/locate/jad
Brief report
Reliability and validity of the Hospital Anxiety and Depression Scale and the Beck Depression Inventory (Full and FastScreen scales) in detecting depression in persons with hepatitis C
Jeannette Golden a, Ronán M. Conroy b,⁎, Anne Marie O'Dwyer a
a Psychological Medicine Service, St James's Hospital, Dublin 8, Ireland
b Epidemiology Department, Royal College of Surgeons in Ireland, 120 St Stephen's Green, Dublin 2, Ireland
Received 16 June 2006; received in revised form 21 October 2006; accepted 23 October 2006 Available online 6 December 2006
Abstract

Background: We examined the performance of the Beck Depression Inventory (BDI) and its short form (BDI-FS) and the Hospital Anxiety and Depression Scale depression (HADS-D) and anxiety (HADS-A) subscales in detecting depression in a group of patients with hepatitis C.
Methods: SCID-CV was used to establish DSM-IV diagnosis. Sensitivity, specificity, and positive and negative predictive values were used to assess test performance, and Cohen's Kappa to measure agreement with DSM diagnosis.
Results: Twenty-five of 88 participants had a DSM-IV depressive diagnosis. There was considerable non-overlap between ‘caseness’ on the BDI and HADS (Kappa = 0.44). The HADS depression subscale had poor sensitivity (52%) and poor agreement with clinical diagnosis (Kappa = 0.35). The full BDI had a sensitivity of 88% and a Kappa of 0.54 against a sensitivity of 84% and Kappa of 0.42 for the short form. The HADS anxiety subscale predicted depression as well as the depression subscale (sensitivity 88%, Kappa 0.47).
Conclusions: Neither the BDI nor the HADS agrees well with the clinical diagnosis of depressive disorder, nor do they agree well with one another. The anxiety subscale of the HADS appears to measure depression at least as well as the depressive subscale.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Depression; Screening; Self-completion scales; HADS; BDI
⁎ Corresponding author. E-mail address: [email protected] (R.M. Conroy).
0165-0327/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jad.2006.10.020

1. Introduction

Depression is still under-recognised and undertreated in medical patients (Gelenberg, 1999), both in primary care (Williams et al., 1999) and in medical inpatients (Bowler et al., 1994; Koenig et al., 1988; Penn et al., 1997). Screening for depression in the medically ill is made more difficult because the somatic symptoms of depression are also common in physical illness (Brown-DeGagne et al., 1998; Ross et al., 2003). The Beck Depression Inventory FastScreen (BDI-FS) was developed as a 7-item subset of the Beck Depression Inventory (BDI), aimed at rapid screening in medical patients (Beck et al., 1997). It is based on the cognitive symptoms of the BDI, and mirrors the diagnostic criteria for Major Depressive Disorder in DSM-IV. The Hospital Anxiety and Depression Scale (Zigmond and Snaith, 1983) comprises two 7-item scales designed to rate depression (HADS-D) and anxiety (HADS-A). It was developed to be brief and non-threatening, and to exclude
J. Golden et al. / Journal of Affective Disorders 100 (2007) 265–269
items which might reflect somatic complaints. It has been widely used in research (Bjelland et al., 2002). In this study, we assess the reliability and validity of the HADS, BDI and BDI-FS in a group of patients with hepatitis C.

2. Methods

2.1. Participants and measures

Participants were recruited as part of a study of mood and wellbeing among outpatients at the hepatitis C services of St James's Hospital, Dublin. Ethical approval for the study was obtained from the hospital ethics committee. After giving written informed consent, participants were interviewed using the SCID-CV, a structured diagnostic interview based on DSM-IV criteria (First et al., 1996). They also completed the HADS and BDI scales. We have described the study in more detail elsewhere (Golden et al., 2005). The BDI-FS score was calculated from the relevant items on the BDI. A caseness threshold of 7/8 was used for the HADS subscales, and 3/4 for the BDI-FS. A threshold of 18/19 was used for the BDI.

2.2. Statistical analysis

Data were analysed using Stata Release 9. Cronbach's alpha was used to measure reliability. Validity was assessed in two ways. The first was the area under the ROC curve, which is a generalised measure of the ability of a scale to distinguish between two groups: it measures the probability that a depressed person will score higher than a non-depressed person. An ROC curve area of 1 indicates perfect separation of the two groups, while an area of 0.5 indicates no better separation than expected by chance. The second was the positive and negative predictive value of each test at its published caseness threshold. Agreement between scale caseness and DSM-IV diagnosis was measured using the Kappa statistic, with confidence intervals calculated using the method of Donner and Eliasziw (Reichenheim, 2004).

3. Results

3.1. Diagnoses

Of 97 potential participants, five refused and two could not be interviewed because of security concerns.
Of the remaining 90 participants, two failed to complete both self-assessment measures, leaving 88 participants to form the study group. Of these, 23 were women (26%). Nearly half of the participants (47%) had been infected through injecting drug use (IDU) and a further 32 had iatrogenic disease, through transfusion (8), treatment for haemophilia (14) or contaminated anti-D products (10). The remaining participants had disease of unknown aetiology.

DSM-IV depression was diagnosed in 25 participants (28%), of whom seven had major depressive disorder, ten adjustment disorder with depressive features and eight dysthymic disorder or depressive disorder not otherwise specified.

3.2. Reliability

All the scales had high reliabilities: 0.83 for the HADS depression subscale, 0.85 for the HADS anxiety subscale, 0.85 for the BDI-FS and 0.93 for the full BDI scale. Examination of the performance of individual scale items showed that the BDI suicidality item had a very limited range, with 88% of participants scoring 0 and the remainder 1. It also had low correlation with the other items of the BDI-FS (0.4) and could be removed without altering the reliability. We therefore calculated a second BDI-FS score based on the remaining six items, which we shall refer to as the BDI-FS-6, using the same cutpoint (3/4) to determine caseness.

3.2.1. Validity: detection of depression by the depression scales

The area under the ROC curve was 0.87 for the BDI (95% CI 0.80 to 0.95). In comparison, it was 0.85 for the BDI-FS (95% CI 0.77 to 0.93), which was not significantly lower (P = 0.227). The BDI-FS-6 had an ROC curve area of 0.84 (95% CI 0.75 to 0.92), which was not significantly lower than that for the BDI-FS (P = 0.092). The HADS depression subscale, on the other hand, had an ROC curve area of only 0.78 (95% CI 0.68 to 0.88), significantly lower than the BDI (P = 0.025). Table 1 shows the predictive value of the scales at their caseness thresholds. It should be noted that there is considerable non-overlap between caseness on the HADS and the BDI (Kappa = 0.56) or BDI-FS (Kappa = 0.44).
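The reliability and ROC figures above rest on two simple statistics, which can be illustrated with a minimal Python sketch. This is not the Stata analysis used in the study: the function names `cronbach_alpha` and `roc_area` are hypothetical, all scores below are invented for illustration, the sample variance is assumed for alpha, and ties in the ROC area are counted as one half, which is a common convention but is not stated in the paper.

```python
from itertools import product
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha. `items` holds one list per scale item, each
    containing one score per respondent, in the same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))


def roc_area(case_scores, noncase_scores):
    """Probability that a randomly chosen case scores higher than a
    randomly chosen non-case, with ties counted as one half."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case, noncase in product(case_scores, noncase_scores):
        if case > noncase:
            wins += 1.0
        elif case == noncase:
            wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Invented item scores for five respondents on a three-item scale
# (NOT data from the study).
example_items = [
    [0, 1, 2, 3, 1],
    [0, 1, 3, 3, 2],
    [1, 1, 2, 3, 1],
]

# Invented total scores for depressed and non-depressed groups
# (NOT data from the study).
depressed = [22, 19, 30, 14, 25]
not_depressed = [5, 12, 19, 8, 3, 10]

print("alpha:", round(cronbach_alpha(example_items), 2))
print("ROC area:", round(roc_area(depressed, not_depressed), 2))
```

Counting pairs directly makes the interpretation explicit: an area of 1 means every depressed score exceeds every non-depressed score, and an area of 0.5 means the scale orders the two groups no better than chance.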
Agreement with DSM-IV diagnosis was highest for the BDI (Kappa = 0.54) and somewhat lower for the BDI-FS (Kappa = 0.42) and BDI-FS-6 (Kappa = 0.44). (Kappas in the range 0.4 to 0.6 are taken as indicators of poor agreement in clinical medicine.) For the HADS depression scale, the Kappa was worse, at only 0.35. The BDI had acceptable sensitivity (88%) and negative predictive value (94%) and these values were similar for the BDI-FS and BDI-FS-6. Positive predictive value indices were lower for all three BDI-derived measures. The HADS depression scale had a poor sensitivity (52%), missing roughly half of all depressed participants.

Table 1
Diagnostic and screening performance of the HADS depression (HADS-D) and anxiety (HADS-A) subscales, the BDI-FS in its original (7-item) and modified (6-item) versions and the Beck Depression Inventory (BDI)

Screening scale                     Non-cases (SCID-CV)   Cases (SCID-CV)   Total
                                    N = 63                N = 25            N = 88

HADS-D, cutoff 7/8
  Non-case                          52                    12                64
  Case                              11                    13                24
  +ve predictive value (95% CI)     54% (33%–74%)
  −ve predictive value (95% CI)     81% (70%–90%)
  Sensitivity (95% CI)              52% (31%–72%)
  Specificity (95% CI)              83% (71%–91%)
  Kappa (95% CI)                    0.35 (0.13–0.55)

HADS-A, cutoff 7/8
  Non-case                          43                    3                 46
  Case                              20                    22                42
  +ve predictive value (95% CI)     52% (36%–68%)
  −ve predictive value (95% CI)     93% (82%–99%)
  Sensitivity (95% CI)              88% (69%–97%)
  Specificity (95% CI)              68% (55%–79%)
  Kappa (95% CI)                    0.47 (0.30–0.64)

BDI-FS, cutoff 3/4
  Non-case                          42                    4                 46
  Case                              21                    21                42
  +ve predictive value (95% CI)     50% (34%–66%)
  −ve predictive value (95% CI)     91% (34%–66%)
  Sensitivity (95% CI)              84% (64%–95%)
  Specificity (95% CI)              67% (54%–78%)
  Kappa (95% CI)                    0.42 (0.24–0.60)

BDI-FS-6, cutoff 3/4
  Non-case                          43                    4                 47
  Case                              20                    21                41
  +ve predictive value (95% CI)     51% (35%–67%)
  −ve predictive value (95% CI)     91% (80%–98%)
  Sensitivity (95% CI)              84% (64%–95%)
  Specificity (95% CI)              68% (55%–79%)
  Kappa (95% CI)                    0.44 (0.26–0.62)

BDI
  Non-case                          47                    3                 50
  Case                              16                    22                38
  +ve predictive value (95% CI)     58% (41%–74%)
  −ve predictive value (95% CI)     94% (83%–99%)
  Sensitivity (95% CI)              88% (69%–97%)
  Specificity (95% CI)              75% (62%–85%)
  Kappa (95% CI)                    0.54 (0.37–0.71)

Whole numbers are frequencies. Numbers in parentheses are 95% confidence intervals.

3.2.2. Detection of depression by the HADS anxiety subscale

Table 1 also shows the performance of the anxiety subscale of the HADS, which had a higher ROC curve area (0.84, 95% CI 0.74 to 0.93) than the depression subscale. It performed better as a screening tool for depression, with higher negative predictive value and sensitivity, detecting 88% of depressed participants as against 52% for the depression subscale.

3.2.3. Optimisation of cutoff points for depression

We examined each scale to see if changing the caseness threshold would significantly improve its discriminative ability. No significantly better caseness threshold was found for the HADS, but for the BDI-FS, with the use of a threshold of 5/6, Kappa increased to 0.53, specificity to 76% and positive predictive value to 58%, with sensitivity and negative predictive value essentially unaltered. This increase in prediction was statistically significant (P = 0.021).
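The point estimates in Table 1 follow directly from its cell counts, and the arithmetic can be sketched in a few lines of Python. This is an illustration rather than the authors' Stata code: the function name `screening_metrics` and the dictionary `TABLE_1_COUNTS` are hypothetical, and the counts are transcribed from Table 1 on the assumption that the first count in each row refers to SCID-CV non-cases and the second to SCID-CV cases.

```python
def screening_metrics(tn, fn, fp, tp):
    """Summarise a 2x2 table of scale caseness against reference diagnosis.

    tn: scale non-case, diagnosis non-case
    fn: scale non-case, diagnosis case
    fp: scale case, diagnosis non-case
    tp: scale case, diagnosis case
    """
    n = tn + fn + fp + tp
    # Observed agreement: proportion classified the same way by both.
    observed = (tp + tn) / n
    # Chance agreement: sum over categories of the product of the marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Cell counts as printed in Table 1, in the order (tn, fn, fp, tp).
TABLE_1_COUNTS = {
    "HADS-D": (52, 12, 11, 13),
    "HADS-A": (43, 3, 20, 22),
    "BDI-FS": (42, 4, 21, 21),
    "BDI-FS-6": (43, 4, 20, 21),
    "BDI": (47, 3, 16, 22),
}

for scale, counts in TABLE_1_COUNTS.items():
    m = screening_metrics(*counts)
    print(
        f"{scale:9s} sens {m['sensitivity']:.0%}  spec {m['specificity']:.0%}  "
        f"PPV {m['ppv']:.0%}  NPV {m['npv']:.0%}  kappa {m['kappa']:.2f}"
    )
```

The sketch covers point estimates only. The confidence intervals in Table 1 are not recomputed, since the Donner and Eliasziw method cited in the statistical analysis section is not implemented here.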
4. Discussion

There are two areas of concern about current screening tools for depression. The first is their ability to identify those with depressive disorder: there is a perceived need for means of rapid assessment of depression in medical patients (Williams et al., 2002) without the need for formal psychiatric assessment. The second is that in many research studies these instruments are used in the absence of formal psychiatric assessment. This applies particularly to the HADS, which, because it is simple and non-threatening, is widely used in research (Herrmann, 1997). However, comparatively few studies have validated the HADS against clinical diagnosis, and these have used a variety of caseness thresholds, making it difficult to synthesise the findings (Bjelland et al., 2002).

4.1. Case detection

The most significant finding of our study is the poor agreement between all self-completion measures and DSM-IV diagnosis of depression. This is of particular concern in relation to the HADS, which missed just under 50% of all those with depression. By contrast, though the BDI-FS had a high rate of false positive findings, it had a lower rate of false negatives. Its high negative predictive value suggests that it may be useful in ruling out depression. The superiority of the BDI-FS confirms the findings of Parker et al. (2001) and Beck et al. (1997). Though most case-finding studies with the HADS have reported sensitivities and specificities of 80% or more (Herrmann, 1997), several other authors have reported performance as poor as that documented here (Hall et al., 1999; Silverstone, 1994). One explanation may be the difference in reading age between the BDI and HADS. The BDI has a Fog readability index that places it in the ‘easy’ category, while the HADS has one of the highest required literacy levels of any of the self-completion instruments assessed by Williams et al. (2002).
Our patient group would have included a significant number of participants with low literacy which, taken with Williams et al.'s findings, suggests that the HADS may perform poorly in populations with low literacy levels.

4.2. What does the HADS anxiety scale measure?

The HADS anxiety scale performed marginally better than the depression scale as a screen for depressive disorder. Costantini et al. (1999) have reported a similar finding in a study of women with breast cancer. This strongly suggests that the two subscales of the HADS do not measure distinct clinical entities, and that the large literature based on the premise that they do is signally flawed.

4.3. Can the BDI-FS be shortened?

Our findings suggest that the omission of the suicidality item from the BDI-FS does not compromise its performance, and may increase its acceptability to patients. Further work, however, is needed to validate this finding.

4.4. Conclusion

It is clear that none of the instruments is a substitute for clinical observation. We can only repeat the warning sounded by Beck that “researchers and clinicians need to be aware of the differential sensitivity of depression instruments which, while supposedly measuring the same construct, are focussed on different components of this mood disorder” (Beck and Gable, 2001).

References

Beck, C.T., Gable, R.K., 2001. Comparative analysis of the performance of the Postpartum Depression Screening Scale with two other depression instruments. Nurs. Res. 50 (4), 242–250.
Beck, A.T., Guth, D., Steer, R.A., Ball, R., 1997. Screening for major depression disorders in medical inpatients with the Beck Depression Inventory for Primary Care. Behav. Res. Ther. 35 (8), 785–791.
Bjelland, I., Dahl, A.A., Haug, T.T., Neckelmann, D., 2002. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J. Psychosom. Res. 52 (2), 69–77.
Bowler, C., Boyle, A., Branford, M., Cooper, S.A., Harper, R., Lindesay, J., 1994. Detection of psychiatric disorders in elderly medical inpatients. Age Ageing 23 (4), 307–311.
Brown-DeGagne, A.M., McGlone, J., Santor, D.A., 1998. Somatic complaints disproportionately contribute to Beck Depression Inventory estimates of depression severity in individuals with multiple chemical sensitivity. J. Occup. Environ. Med. 40 (10), 862–869.
Costantini, M., Musso, M., Viterbori, P., Bonci, F., Del Mastro, L., Garrone, O., et al., 1999. Detecting psychological distress in cancer patients: validity of the Italian version of the Hospital Anxiety and Depression Scale. Support. Care Cancer 7 (3), 121–127.
First, M.B., Gibbon, M., Spitzer, R.L., Williams, J.B.W., 1996. Structured Clinical Interview for DSM-IV Axis I Disorders: Clinician Version (SCID-CV): User's Guide. American Psychiatric Publishing Inc., Arlington, VA.
Gelenberg, A., 1999. Depression is still underrecognized and undertreated. Arch. Intern. Med. 159 (15), 1657–1658.
Golden, J., O'Dwyer, A.M., Conroy, R.M., 2005. Depression and anxiety in patients with hepatitis C: prevalence, detection rates and risk factors. Gen. Hosp. Psych. 27 (6), 431–438.
Hall, A., A'Hern, R., Fallowfield, L., 1999. Are we using appropriate self-report questionnaires for detecting anxiety and depression in women with early breast cancer? Eur. J. Cancer 35 (1), 79–85.
Herrmann, C., 1997. International experiences with the Hospital Anxiety and Depression Scale - a review of validation data and clinical results. J. Psychosom. Res. 42 (1), 17–41.
Koenig, H.G., Meador, K.G., Cohen, H.J., Blazer, D.G., 1988. Detection and treatment of major depression in older medically ill hospitalized patients. Int. J. Psychiatry Med. 18 (1), 17–31.
Parker, G., Hilton, T., Hadzi-Pavlovic, D., Bains, J., 2001. Screening for depression in the medically ill: the suggested utility of a cognitive-based approach. Aust. N. Z. J. Psychiatry 35 (4), 474–480.
Penn, J.V., Boland, R., McCartney, J.R., Kohn, R., Mulvey, T., 1997. Recognition and treatment of depressive disorders by internal medicine attendings and housestaff. Gen. Hosp. Psych. 19 (3), 179–184.
Reichenheim, M.E., 2004. Confidence intervals for the kappa statistic. Stata J. 4 (4), 421–428.
Ross, L.E., Gilbert Evans, S.E., Sellers, E.M., Romach, M.K., 2003. Measurement issues in postpartum depression part 2: assessment of somatic symptoms using the Hamilton Rating Scale for Depression. Arch. Women Ment. Health 6 (1), 59–64.
Silverstone, P.H., 1994. Poor efficacy of the Hospital Anxiety and Depression Scale in the diagnosis of major depressive disorder in both medical and psychiatric patients. J. Psychosom. Res. 38 (5), 441–450.
Williams Jr., J.W., Mulrow, C.D., Kroenke, K., Dhanda, R., Badgett, R.G., Omori, D., et al., 1999. Case-finding for depression in primary care: a randomized trial. Am. J. Med. 106 (1), 36–43.
Williams Jr., J.W., Pignone, M., Ramirez, G., Perez Stellato, C., 2002. Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen. Hosp. Psych. 24 (4), 225–237.
Zigmond, A.S., Snaith, R.P., 1983. The hospital anxiety and depression scale. Acta Psychiatr. Scand. 67 (6), 361–370.