J. psychiat. Res., Vol. 22, No. 1, pp. 13-19, 1988. Printed in Great Britain.
IMPROVING DEPRESSION SEVERITY ASSESSMENT--II. CONTENT, CONCURRENT AND EXTERNAL VALIDITY OF THREE OBSERVER DEPRESSION SCALES
0022-3956/88 $3.00 + .00 Pergamon Press plc
WOLFGANG MAIER, ISABELLA HEUSER, MICHAEL PHILIPP, ULRICH FROMMBERGER and WOLFGANG DEMUTH
Department of Psychiatry, University of Mainz, Untere Zahlbacher Str. 8, D-6500 Mainz, F.R.G.
(Received 15 January 1987; revised 15 July 1987; accepted 16 November 1987)
Summary--The Hamilton Depression Scale (HAMD), the Montgomery-Asberg Depression Rating Scale (MADRS) and the Bech-Rafaelsen Melancholia Scale (BRMS) were compared with respect to content, concurrent and external validity in a sample of 130 patients with a major depressive episode. The three scales did equally well in concurrent and external validity. The HAMD showed some deficiencies in content validity. The consequences for depression severity assessment are discussed.

INTRODUCTION
THE HAMILTON DEPRESSION SCALE (HAMD) (HAMILTON, 1960, 1967) is widely accepted as a valid indicator of the severity of depression and has been used by most studies in the literature monitoring the severity of depression. In spite of its high degree of acceptance, the HAMD has been criticized for lack of validity and rival scales have been proposed. In a previous paper (MAIER et al., 1988), we were able to demonstrate the superiority of another scale, the Bech-Rafaelsen Melancholia Scale (BRMS) (BECH et al., 1986), with respect to the criteria of reliability, internal construct validity and sensitivity to change; the BRMS was also superior to another rival depression scale, the Montgomery-Asberg Depression Rating Scale (MADRS) (MONTGOMERY and ASBERG, 1979). Before any final recommendation is possible, it must be demonstrated that the BRMS is not inferior to the HAMD and the MADRS with respect to more traditional aspects of validity (NUNNALLY, 1972) such as content validity, concurrent validity and external validity.
Content validity describes the adequacy with which the specified domain of content is sampled by the items of a scale. Depression scales should mainly refer to features of depression which contribute to its severity. On the other hand, they should refer to syndromes different from depression (e.g. anxiety) to a lesser degree. This condition is crucial if depression scales are to be used to evaluate the antidepressant efficacy of treatments and to discriminate antidepressant drug effects from anxiolytic drug effects.
Otherwise, anxiolytic drugs without a true antidepressant effect may be misidentified as antidepressants. Content validity can hardly be assessed by coefficients (NUNNALLY, 1972).
Indirect evidence for content validity comes from concurrent validity, the extent to which the items and the global score of a scale correlate with other accepted measurements of the severity of depression. The association between scores of a scale, on the one hand, and global expert ratings, on the other, is considered to be the crucial criterion for the validity of a depression scale (HAMILTON, 1986). The validity of the HAMD is supported in this respect by previous studies (HAMILTON, 1986). This association is the main argument for the validity and the clinical utility of the HAMD. Apart from the HAMD, no severity scale in depression is generally accepted. Thus, globally rated severity of depression is the only way to check the concurrent validity of depression scales.
Criteria for external validity are variables linked to the severity of depression by accepted clinical or theoretical concepts (NUNNALLY, 1972): (1) diagnostic criteria associated with the severity of depression, e.g. normal vs depressed probands, Major Depressive Episode (MDE) with melancholia vs MDE without melancholia (ZIMMERMAN et al., 1986); (2) psychosocial variables related to the severity of depression; (3) biological variables unambiguously related to the severity of depression according to previous empirical experience; they are distinguished by being clearly external to the psychometric construct of severity of depression. Unfortunately, no biological variable of this kind is available.
Investigating the validity of rating scales requires that the scales and their validity criteria be measured independently. Based on this principle, the HAMD, MADRS and BRMS have been compared with respect to content, concurrent and external validity.

PATIENTS AND METHODS
Patients and assessments
One hundred and thirty consecutive, acutely depressed inpatients of the Department of Psychiatry, University of Mainz, aged between 20 and 60 yr and meeting the criteria of MDE (DSM-III), were recruited. They were interviewed and assessed by two raters, both experienced psychiatrists, during the first week after admission in a joint rater interview including the SCID (SPITZER and WILLIAMS, 1984); details of the strategy of the interviews are reported in the accompanying paper (MAIER et al., 1988). The following scales were rated separately by both participating psychiatrists: HAMD, MADRS, BRMS, the Raskin Depression Scale (RDS) for a global rating of the severity of depression (3 items) (LIPMAN, 1982), the Covi Anxiety Scale (CAS) for a global rating of the severity of anxiety (3 items) (LIPMAN, 1982) and the Global Assessment Scale (GAS) (SPITZER et al., 1976) for assessing psychosocial impairment; non-standardized questions were asked if the structured clinical interview did not cover all symptoms and complaints necessary for the rating of these scales.
Data analysis
Concurrent validity. In samples 1 and 2, Pearson correlations were calculated to investigate the relationship between scores of the items and sum scores of the scales, on the one hand, and global assessment of severity of depression (RDS) and anxiety (CAS) rated by the co-rater, on the other. The use of Pearson correlations is justified since the HAMD, BRMS, MADRS, RDS and CAS fit the normal distribution in samples 1 and 2 (Kolmogoroff-Smirnow test, P < 0.05).
External validity. External construct validity was assessed (1) by correlating the total score of the severity scales with the GAS, measuring psychosocial impairment, and (2) by comparing the global scores of the three scales under study between MDE with and without melancholia (U-test); the diagnoses were extracted from the structured interview performed by the co-rater.
The equality of Pearson correlation coefficients between different pairs of scale scores in the same sample was tested by the t-test proposed by DUNN and CLARK (1971).

RESULTS
Means and standard deviations for the three scales under study are given in the accompanying paper (MAIER et al., 1988). All three scales are highly associated. The Pearson correlations for the global scores are: HAMD (21 items) x HAMD (17 items), 0.94; HAMD (21 items) x MADRS, 0.83; HAMD (21 items) x BRMS, 0.86; HAMD (17 items) x MADRS, 0.85; HAMD (17 items) x BRMS, 0.87; MADRS x BRMS, 0.89.
Concurrent validity
The Pearson correlations between globally assessed severity of depression (RDS) and total scores of the scales under study ranged from 0.62 to 0.71 (Tables 1-3). Although the concurrent validity of the BRMS and MADRS is superior to that of the HAMD (both 17- and 21-item versions), these differences are not significant (P > 0.55).

TABLE 1. EXTERNAL VALIDITY OF THE HAMILTON DEPRESSION SCALE: PEARSON CORRELATION BETWEEN HAMD SCORES ASSESSED BY RATER 1 AND THE GLOBAL SCORE OF RDS ASSESSED BY RATER 2

Items                                   Pearson r
 1. Depressed mood                      0.65
 2. Guilt                               0.38
 3. Suicide                             0.43
 4. Insomnia, initial                   0.20
 5. Insomnia, middle                    0.25
 6. Insomnia, early awakening           0.28
 7. Work and interests                  0.38
 8. Retardation                         0.50
 9. Agitation                           0.16 n.s.
10. Anxiety, psychic                    0.48
11. Anxiety, somatic                    0.38
12. Gastrointestinal symptoms           0.34
13. General somatic symptoms            0.40
14. Loss of libido                      0.45
15. Hypochondriasis                     0.17 n.s.
16. Loss of weight                      0.38
17. Loss of insight                     0.10 n.s.
18. Diurnal variations                  0.21
19. Paranoid symptoms                   0.21
20. Depersonalization/derealization     0.25
21. Obsessive symptoms                  0.06 n.s.
Total (items 1-17)                      0.65
Total (items 1-21)                      0.62
n.s. = not significantly different from zero (P > 0.05).

TABLE 2. EXTERNAL VALIDITY OF THE MONTGOMERY-ASBERG DEPRESSION SCALE: PEARSON CORRELATION BETWEEN THE MADRS SCORES ASSESSED BY RATER 1 AND THE GLOBAL SCORE OF RDS ASSESSED BY RATER 2

Items                                   Pearson r
 1. Apparent sadness                    0.62
 2. Reported sadness                    0.63
 3. Inner tension                       0.41
 4. Reduced sleep                       0.30
 5. Reduced appetite                    0.40
 6. Concentration difficulties          0.39
 7. Lassitude                           0.60
 8. Inability to feel                   0.59
 9. Pessimistic thoughts                0.50
10. Suicidal thoughts                   0.39
Total (items 1-10)                      0.71

TABLE 3. EXTERNAL VALIDITY OF THE BECH-RAFAELSEN MELANCHOLIA SCALE: PEARSON CORRELATION BETWEEN THE BRMS SCORES ASSESSED BY RATER 1 AND THE GLOBAL SCORE OF RDS ASSESSED BY RATER 2

Items                                   Pearson r
 1. Retardation (motor)                 0.50
 2. Retardation (verbal)                0.41
 3. Retardation (intellectual)          0.37
 4. Anxiety (psychic)                   0.46
 5. Suicidal impulses                   0.33
 6. Lowered mood                        0.53
 7. Self-depreciation                   0.47
 8. Retardation (emotional)             0.46
 9. Sleep disturbances                  0.29
10. Tiredness and pains                 0.33
11. Work and interests                  0.34
Total (items 1-11)                      0.70

The validity of a scale may be different in severely and mildly depressed patients. The sample was therefore subdivided into two subsamples by the median of the total score of the RDS (submedian, supramedian). The Pearson correlation coefficients between the total score of the RDS and the total scores of the three severity scales under study in the subsample with more severe depression are: HAMD (21 items) 0.46; HAMD (17 items) 0.50; BRMS 0.69; MADRS 0.68. In the subsample with less severe depression, the corresponding coefficients are: HAMD (21 items) 0.63; HAMD (17 items) 0.68; BRMS 0.71; MADRS 0.73.
The correlations of the following items of the HAMD with the globally assessed severity of depression are not significantly different from zero: item No. 21 (obsessive symptoms), item No. 17 (loss of insight), item No. 9 (agitation) and item No. 15 (hypochondriasis) (Table 1). All items in the MADRS and the BRMS were significantly correlated with globally assessed severity (Tables 2 and 3).
The total scores of the BRMS and MADRS fulfil the condition of correlating more highly with the global severity assessment of depression (RDS) than with that of anxiety (Covi Anxiety Scale) in both samples (P < 0.01) (Table 4). This difference is of limited significance (P = 0.05) for the HAMD.
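The median-split analysis reported above can be illustrated with a short Python sketch. It is not the authors' procedure: the function names, the simulated scores and the rule that ties at the median go to the lower subsample are assumptions made for the example, and only the standard library is used.

```python
import math
import random
from statistics import median


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def median_split_correlations(scale, criterion):
    """Correlate scale with criterion separately below and above the criterion median.

    Assumption for this sketch: cases equal to the median go to the lower subsample.
    """
    cut = median(criterion)
    low = [(s, c) for s, c in zip(scale, criterion) if c <= cut]
    high = [(s, c) for s, c in zip(scale, criterion) if c > cut]
    r_low = pearson_r(*zip(*low))    # submedian subsample
    r_high = pearson_r(*zip(*high))  # supramedian subsample
    return r_low, r_high


# Simulated data only (hypothetical global ratings and scale totals, not patient data).
random.seed(1)
global_rating = [random.gauss(9, 2) for _ in range(130)]
scale_total = [2 * g + random.gauss(0, 3) for g in global_rating]

r_all = pearson_r(scale_total, global_rating)
r_low, r_high = median_split_correlations(scale_total, global_rating)
print(f"full sample r = {r_all:.2f}; submedian r = {r_low:.2f}; supramedian r = {r_high:.2f}")
```

Splitting on the criterion restricts its range within each subsample, so subsample coefficients will generally tend to be lower than the full-sample coefficient; differences between scales within a subsample are therefore more informative than the absolute size of each coefficient.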

TABLE 4. VALIDITY OF HAMD, MADRS, BRMS: PEARSON CORRELATION WITH THE RAW SCORES OF CAS IN COMPARISON WITH PEARSON CORRELATION WITH THE RAW SCORES OF THE RDS

                     RDS      CAS
HAMD (21 items)      0.62*    0.43
HAMD (17 items)      0.65*    0.45
MADRS                0.71†    0.42
BRMS                 0.70†    0.42
*Significant differences between both correlation coefficients (P < 0.05).
†Significant differences between both correlation coefficients (P < 0.01).
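
Comparisons of this kind involve two correlations measured in the same sample that share one variable. The sketch below uses Williams' t statistic, one commonly used test for that situation; it is a stand-in, since the paper specifies its test only by citation, and the result is not expected to reproduce the P values reported here. The function names are hypothetical, and the P value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large degrees of freedom.

```python
import math
from statistics import NormalDist


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t for H0: rho_jk == rho_jh, where both correlations share variable j.

    r_jk, r_jh: the two correlations being compared
    r_kh:       correlation between the two non-shared variables
    n:          sample size; the statistic has n - 3 degrees of freedom
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of the three variables.
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3


def approx_two_sided_p(t):
    """Two-sided P value from a normal approximation (adequate only for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Coefficients taken from the Results: RDS x BRMS 0.70, RDS x HAMD (17 items) 0.65,
# BRMS x HAMD (17 items) 0.87, N = 130. The shared variable j is the RDS.
t_stat, df = williams_t(0.70, 0.65, 0.87, 130)
print(f"t = {t_stat:.2f}, df = {df}, approximate two-sided P = {approx_two_sided_p(t_stat):.2f}")
```

The RDS versus CAS contrasts in Table 4 have the same structure, with the depression scale as the shared variable, but the RDS x CAS correlation they would require is not reported in the text.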

TABLE 5. VALIDITY OF HAMD, BRMS AND MADRS ASSESSED BY DISCRIMINATING MELANCHOLIA/NON-MELANCHOLIA AND PRE/POST-TREATMENT STATUS AND BY CORRELATION WITH GLOBAL PSYCHOSOCIAL IMPAIRMENT (GAS)

                                         HAMD (21 items)   HAMD (17 items)   BRMS       MADRS
MDE with melancholia (N=49) (mean)       24.0              22.9              18.1       27.7
MDE without melancholia (N=81) (mean)    18.5              17.7              13.5       19.2
Significance (U-test, t-test)            P = 0.01          P < 0.01          P < 0.01   P < 0.01
Spearman correlation with GAS            -0.69             -0.68             -0.75      -0.71
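
The group comparison in Table 5 relies on the U-test. As a minimal sketch of how such a comparison can be computed, the following Python function implements the Mann-Whitney U test with midranks for ties and a normal approximation without continuity correction. These choices, the function name and the scores in the example are assumptions for illustration; the scores are invented and are not data from this study.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (u_x, u_y, z, p), where u_x counts pairs in which the x value
    exceeds the y value, with ties counted as one half.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda p: p[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        size = j - i + 1                   # size of this group of tied values
        tie_term += size**3 - size
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, u_y, z, p


# Invented total scores for two small hypothetical groups.
with_melancholia = [26, 22, 19, 24, 28, 21, 25]
without_melancholia = [15, 18, 12, 20, 14, 17, 13, 16]
u_x, u_y, z, p = mann_whitney_u(with_melancholia, without_melancholia)
print(f"U = {min(u_x, u_y):.1f}, z = {z:.2f}, approximate two-sided P = {p:.4f}")
```

For groups as small as those in the example an exact test would usually be preferred; the normal approximation is better suited to group sizes like the 49 and 81 patients compared in Table 5.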

External validity
All three scales are able to discriminate between MDE with and without melancholia (Table 5). All are strongly associated with degree of psychosocial impairment in all samples (Table 5).

DISCUSSION
Concurrent validity
All three scales under study are highly associated with the global clinical judgement of the severity of depression performed by an expert rater as well as with the sum scores of the concurrent scales (coefficients higher than 0.60). This result lends a sufficient degree of concurrent validity to the sum score of each scale. In addition, the BRMS has a slightly higher degree of concurrent validity than the other scales; this is especially valid for severe levels of depression. The global scores of the HAMD and of the MADRS are validated to a similar extent by this criterion, as previously observed by KEARNS et al. (1982).
Some items of the HAMD show correlations with the globally assessed severity of depression that do not differ significantly from zero (item Nos 9, 15, 17, 21). Hypochondriasis (item No. 15) may fail to be associated with the severity of depression because it is often associated with mild but not with severe depression. The invalidity of the item measuring obsessive-compulsive symptoms (item No. 21) has already been noted by HAMILTON (1986); it may partly be due to the relatively low frequency of the symptom measured by this item. The item measuring "agitation" (item No. 9) also turned out to be invalid; the overlap of this symptom with such features as irritability, panic or insecurity may explain this result. A significant loss of insight (item No. 17), however, is often found in severe, delusional depression; therefore, the low degree of validity for item No. 17 seems to be implausible. However, severe, non-delusional depression fails to be associated with loss of insight, whereas mild depression is often attributed to exhaustion and therefore counts for item No. 17; these divergent factors may induce the low correlation of item No. 17 with globally assessed severity. The low reliability of item Nos 15 and 17 (MAIER et al., 1988) may also contribute to their missing validity. Removing both from the HAMD can therefore be proposed as a way of enhancing the validity of this scale.
Content validity
The domain of variables contributing to the severity of depression is not well defined. Consequently, the three scales differ in the variables they include. A minimal requirement is that each scale should cover all core symptoms of depression (depressed mood, psychomotor symptoms, feelings of guilt, feelings of inadequacy, reduced energy, non-reactivity of mood, anhedonia, sleep disturbances, reduced concentration). The original version of the HAMD focuses on behavioural aspects of depression; no item referring to non-reactivity of mood is included, nor are special items available for reduction of concentration or anhedonia. Thus, the lack of items referring to these symptoms in the HAMD reduces the content validity without compensating for this disadvantage by any enhancement of reliability. The MADRS also fails to include all core symptoms: no item refers to retardation. The BRMS covers all core symptoms, demonstrating that it has the highest and the HAMD the lowest a priori content validity.
The association of the global score of each scale with severity of depression is higher than with severity of anxiety. This result lends discriminant validity to all three scales.
The difference between both associations is more prominent for the BRMS and MADRS than for the HAMD, which may be explained by the large number of items referring to anxiety or somatic complaints in the HAMD. It remains an open question whether the difference between both degrees of association for each scale is high enough to avoid a misidentification of anxiolytic response without real antidepressant efficacy as an antidepressant effect.
External validity
No biological variable clearly related to the severity of depression is available. For some biological variables, such a relationship has been postulated (e.g. the dexamethasone suppression test); but such claims have not been substantiated and biological variables cannot yet be used as criteria of validity. Judgements by the clinician of the global severity of the disorder or indicators of psychosocial dysfunction are the only available criteria for cross-sectional validity of the depression scales. Adopting the principle that validity criteria for assessment systems should be based on longitudinal observation and full information, the validity criteria were assessed by an expert psychiatrist blind to the parallel ratings. In this way, all three scales were found to show a strong relationship to the degree of psychosocial dysfunction. In addition, all scales were able to discriminate between MDE with and without melancholia. This demonstrates the clinical utility of all three scales.

CONCLUSION
The data relating to concurrent and external validity of the HAMD, MADRS and BRMS lend support to the validity of all three scales. The findings on content validity point to the superiority of the MADRS and the BRMS. Together with the demonstrated superiority of the BRMS with respect to internal validity and sensitivity to change reported in the accompanying paper (MAIER et al., 1988), we suggest that it may be unwise to rely on the HAMD alone for assessing the severity of depression. The BRMS might be considered in addition to the HAMD which, up to the present, has been the instrument most commonly used to judge severity of depression. Such a strategy can be expected to guarantee the comparability of results with previous studies using the HAMD exclusively and to bring about more valid results in all areas of research in depression.

REFERENCES
BECH, P., KASTRUP, M. and RAFAELSEN, O. J. (1986) Mini-compendium of rating scales for anxiety, depression, mania, schizophrenia with corresponding DSM-III syndromes. Acta psychiat. scand. 73, Suppl. 326, 1-39.
DUNN, O. J. and CLARK, V. (1971) Comparisons of tests of the equality of dependent correlation coefficients. J. Am. Statist. Ass. 66, 904-911.
HAMILTON, M. (1960) A rating scale for depression. J. Neurol. Neurosurg. Psychiat. 23, 56-62.
HAMILTON, M. (1967) Development of a rating scale for primary depressive illness. Br. J. soc. clin. Psychol. 6, 276-296.
HAMILTON, M. (1986) Hamilton Rating Scale for Depression. In Assessment of Depression (Edited by SARTORIUS, N. and BAN, T.), pp. 143-152. Springer, Berlin.
KEARNS, N. P., CRUICKSHANK, C. A., MCGUIGAN, K., RILEY, S. A., SHAW, S. P. and SNAITH, R. P. (1982) A comparison of depression rating scales. Br. J. Psychiat. 141, 45-49.
LIPMAN, R. S. (1982) Differentiating anxiety and depression in anxiety disorders: use of rating scales. Psychopharmac. Bull. 18, 69-82.
MAIER, W., PHILIPP, M., HEUSER, I., SCHLEGEL, S., BULLER, R. and WETZEL, H. (1988) Improving depression severity assessment. I. Reliability, internal validity and sensitivity to change of three observer depression scales. J. psychiat. Res. 22, 3-12.
MONTGOMERY, S. A. and ASBERG, M. (1979) A new depression scale designed to be sensitive to change. Br. J. Psychiat. 134, 382-389.
NUNNALLY, J. (1972) Psychometric Theory, 2nd Edn. McGraw-Hill, New York.
SPITZER, R. L. and WILLIAMS, J. B. W. (1984) Structured Clinical Interview for DSM-III (SCID 5/1/84). Biometric Research Department, New York State Psychiatric Institute, New York.
SPITZER, R. L., ENDICOTT, J. and FLEISS, L. (1976) The Global Assessment Scale. A procedure for measuring overall severity of psychiatric disturbances. Archs gen. Psychiat. 33, 766-771.
ZIMMERMAN, M., CORYELL, W., PFOHL, B. and STANGL, D. (1986) The validity of four definitions of endogenous depression. Archs gen. Psychiat. 43, 234-244.