Improving depression severity assessment—I. Reliability, internal validity and sensitivity to change of three observer depression scales


J. psychiat. Res., Vol. 22, No. 1, pp. 3-12, 1988. Printed in Great Britain.

0022-3956/88 $3.00 + .00
Pergamon Press plc

WOLFGANG MAIER, MICHAEL PHILIPP, ISABELLA HEUSER, SABINE SCHLEGEL, RAIMUND BULLER and HERMANN WETZEL

Department of Psychiatry, University of Mainz, Untere Zahlbacher Str. 8, D-6500 Mainz, F.R.G.

(Received 15 January 1987; revised 15 July 1987; accepted 16 November 1987)

Summary: The Hamilton Depression Scale (HAMD) is the most commonly used scale for assessing the severity of depression and for evaluating antidepressant treatment. Alternative scales have been proposed by Bech and Rafaelsen (BRMS) and by Montgomery and Asberg (MADRS) to overcome the shortcomings of the HAMD; they are based on different concepts of severity and different scaling procedures. The three scales were compared with respect to reliability, validity and ability to detect change in different samples. The BRMS proved superior. This result calls into question the usual practice of testing the efficacy of antidepressants by means of the HAMD alone. Problems in defining the severity of depression and in testing the validity of severity scales are discussed.

INTRODUCTION

Assessing the severity of depression is essential in many areas of psychiatric research, particularly when investigating the antidepressant efficacy of a treatment and when interpreting changes in biological variables. Scales with a high degree of reliability, validity and ability to detect change are essential for these purposes. Several scales are available, but no comprehensive evaluation of them has yet been published. The first observer scale specifically for measuring the severity of depressive states, the HAMD, was developed by HAMILTON (1960, 1967). It has, however, been criticized by MONTGOMERY and ASBERG (1979) for being insensitive to change during treatment; in consequence, they devised a new scale designed to be sensitive to change, the Montgomery-Asberg Depression Scale (MADRS). BECH and RAFAELSEN (1980) have also criticized the HAMD because of its low homogeneity and its low transferability between different groups of patients (BECH et al., 1981). Against this background, Bech and Rafaelsen developed their modification of the HAMD, the Bech-Rafaelsen Melancholia Scale (BRMS) (BECH et al., 1986). These three scales differ in the following respects. (1) The scope of symptoms covered by the items: apart from the core symptoms of depression, hypochondriasis, loss of insight and obsessive symptoms are taken into account by the HAMD but not by the BRMS and MADRS; features of anxiety and the degree of psychosocial impairment are covered only by the HAMD and BRMS. (2) The HAMD comprises 21 or 17 items depending on the version, the BRMS 11 items and the MADRS 10 items. (3) The number of scoring levels per item varies from 3 to 5 for the HAMD, and is 5 for the BRMS and 7 for the MADRS.


(4) The HAMD is mainly based on observed behaviour, whereas the BRMS and MADRS also include items explicitly based on self-report. (5) Precision in the description of item content and of item scores is highest in the BRMS and lowest in the HAMD. (6) The HAMD was developed only for assessing the severity of depressive disorders, whereas the BRMS can also be used for the diagnosis of depression.

A number of studies have demonstrated good or at least acceptable reliability and validity of the HAMD (HEDLUND and VIEWEG, 1979; CICCHETTI and PRUSOFF, 1983; REHM and O'HARA, 1985); the few studies that deal with the MADRS and the BRMS demonstrate excellent reliability and validity for both (BECH et al., 1983; KEARNS et al., 1982; MAIER and PHILIPP, 1985; DAVIDSON et al., 1986). However, these studies addressed only selected aspects of validity and were mainly oriented towards cross-sectional assessment. No comprehensive study comparing the reliability and validity of the three depression scales is available, so the practical relevance of the differences between them is unclear. The purpose of this paper is to evaluate and compare the three depression scales (HAMD, MADRS and BRMS) with respect to inter-observer reliability, homogeneity (i.e. whether the items of a scale are equally related to the same underlying clinical dimension of depression), transferability (i.e. whether the items maintain their homogeneity when the scale is applied to different groups of patients, such as males and females), and sensitivity to change during treatment.

PATIENTS AND METHODS

Patients and assessments

In two patient samples, the three scales under study and additional ratings of affective states were assessed in different ways.

Sample 1.
One hundred and thirty consecutive acutely depressed inpatients of the Department of Psychiatry, University of Mainz, aged between 20 and 60 yr and meeting the DSM-III criteria for Major Depressive Episode (MDE), were recruited. All were interviewed using a comprehensive structured clinical interview (PHILIPP and MAIER, 1986) comprising the Structured Clinical Interview for DSM-III (SCID; SPITZER and WILLIAMS, 1984) and elements of the Present State Examination (PSE; WING et al., 1974), supplemented by additional structured open-ended questions covering those scale items not addressed by the PSE or SCID. The patients were interviewed and assessed by two raters in a joint-rater interview during the first week after admission. The raters were experienced and trained psychiatrists (at least 15 training sessions); one of the two raters interviewed the patient while the other acted as observer. After the structured interview, both raters scored the HAMD, MADRS and BRMS separately.

Sample 2. Forty-eight consecutive acutely depressed inpatients of the same hospital, meeting the same inclusion criteria as in sample 1, received either imipramine (150 mg/day) or amitriptyline (150 mg/day) for 3 weeks after a medication-free period of 2 weeks; drug allocation was random. The following rating scales were applied independently at the beginning and end of antidepressant treatment by two experienced and trained psychiatrists at separate sessions: HAMD, MADRS, BRMS, and global rating scales for the severity of depression (Raskin Depression Scale, RDS) and for the severity of anxiety (Covi Anxiety Scale, CAS) (LIPMAN, 1982). The ratings were based on a free clinical interview that had to cover all symptoms included in the scales under study. The two interview sessions took place on the afternoon of the same day, with at most 1 hr between them.

As an index of validity of change in depressive state during the treatment period, a global assessment of severity based on the Raskin Depression Scale was used; change in response to treatment was measured as the difference between pretreatment and post-treatment RDS scores. In addition, the treating psychiatrist rated a 9-point clinical improvement scale after treatment. On this ordinal scale, a score of 4 means total improvement in several initially severe symptoms, bringing about an impressive amelioration in the overall severity of depression; 2 means total improvement in only one initially severe symptom or in several initially mild or moderate symptoms, producing a moderate overall improvement; 0 means no change in overall severity; -2 means mild or moderate worsening in several initially present symptoms, or the occurrence of one new severe symptom or of several new mild or moderate symptoms; -4 means the occurrence of several new severe symptoms producing substantial overall worsening. All other scores (3, 1, -1, -3) are intermediate categories of the ordinal scale.

Data analysis

Inter-rater reliability. Intraclass coefficients were used to measure inter-observer reliability; this measure was proposed for scale items by BARTKO and CARPENTER (1976) and is preferred to other correlation coefficients. Unbiased estimates of the intraclass coefficients were calculated by the jackknife method of KRAEMER (1980); statistical tests for the equality of intraclass coefficients were also carried out with the jackknife method (KRAEMER, 1980).

Internal construct validity. Although factor analysis is often used to evaluate construct validity, it is not appropriate here for many reasons (BECH, 1981). Instead, a latent trait analysis (Rasch analysis: RASCH, 1960; ANDERSEN, 1972) was performed to investigate simultaneously whether the items of each scale tap the same one-dimensional continuum (homogeneity) and whether the inter-item relations are independent of the particular features of the sample (transferability). For this procedure, sample 1 was subdivided into several pairs of subsamples; the criteria for subdivision were chance, sex, age, diagnosis and scale scores (continuous variables were dichotomized at the median). For each pair of subsamples, the model parameters were estimated separately in each subsample and then tested for consistency between the two. A significant empirical P-value (P < 0.01) for a pair of subsamples indicates insufficient model fit, i.e. a violation of the conditions of homogeneity and transferability.

Sensitivity to change of severity of depression. Items included in a scale intended for evaluating antidepressant treatment should reflect change in the severity of depression. In the absence of external criteria for change in severity, global ratings of change made independently of the depression scale ratings by fully informed experts (the treating psychiatrists) were used as criteria.
Sensitivity to change was assessed in sample 2 using analysis of covariance (ANCOVA) and multiple regression analysis. Post-treatment ratings of each item and each scale by one rater were adjusted for the corresponding pretreatment ratings by the same rater using ANCOVA; the residuals (observed minus expected post-treatment scores) are treated as change scores. These regressed scores were correlated with the treating psychiatrist's global assessment of change. The global assessment of change was obtained directly, by rating the 9-point Likert-type improvement scale, and indirectly, by adjusting the post-treatment RDS ratings for the pretreatment RDS scores.

RESULTS

The means and standard deviations of the global scores of each scale (Table 1) are comparable with those of other studies (BECH et al., 1986).

TABLE 1. MEAN, STANDARD DEVIATION AND MEDIAN FOR THE SCALES UNDER STUDY IN THE DIFFERENT SAMPLES

                               HAMD (21 items)   HAMD (17 items)   MADRS   BRMS
Sample 1
  Mean                         25.8              22.0              22.9    15.9
  Standard deviation            7.9               7.3              11.5     7.5
  Median                       24.9              21.3              21.9    15.1
Sample 2 (pretreatment)
  Mean                         27.1              24.5              24.7    19.6
  Standard deviation           10.9              10.5               9.9     9.0
  Median                       27.2              24.1              24.6    19.8
Sample 2 (post-treatment)
  Mean                         21.0              18.1              17.4    12.4
  Standard deviation           11.1              11.0              13.4     8.8
  Median                       20.6              17.8              17.1    12.2

Inter-rater reliability (Table 2)

Items whose reliability is not significantly higher than chance agreement are found only in the HAMD: the intraclass coefficients of item Nos 15 (hypochondriasis), 17 (loss of insight) and 21 (obsessive symptoms) are not significantly different from zero (P > 0.05) in one or more samples. Their intraclass coefficients in the three samples are (a) item No. 15: 0.40, 0.10, 0.09; (b) item No. 17: 0.50, 0.14, 0.40; (c) item No. 21: 0.19, 0.37, 0.19. Items with only fair or poor reliability (intraclass coefficient lower than 0.60) can be found in all scales, in all samples and under all conditions. These items are: (1) HAMD item Nos 5 (middle insomnia), 13 (general somatic symptoms), 15 (hypochondriasis), 17 (loss of insight) and 21 (obsessive symptoms); (2) MADRS item Nos 3 (inner tension), 7 (lassitude) and 10 (suicidal thoughts); (3) BRMS item No. 3 (retardation, intellectual).

TABLE 2. INTER-RATER RELIABILITY OF TOTAL SCORES OF DEPRESSION SCALES (INTRACLASS COEFFICIENTS)

                                  Sample 1   Sample 2 (pretreatment)   Sample 2 (post-treatment)
HAMD total score (items 1-17)     0.70       0.72                      0.70
HAMD total score (items 1-21)     0.66       0.70                      0.69
MADRS total score (items 1-10)    0.73       0.66                      0.82
BRMS total score (items 1-11)     0.71       0.79                      0.88

All total scores of the HAMD, MADRS and BRMS exceed 0.65 in all samples and under all conditions and are therefore good or excellent. Even so, there are important differences between the three scales under study. The sum score of the BRMS has, on average, the highest reliability. The only significant differences between intraclass coefficients in the test for tied samples proposed by KRAEMER (1980) (P < 0.05) are found in sample 2 (pretreatment), where the BRMS is superior to the MADRS, and in sample 2 (post-treatment), where the BRMS is superior to both versions of the HAMD. Taken together, these results show the BRMS to be the most reliable scale and the HAMD the least reliable. The reliability of the RDS (global score) is 0.60 and that of the CAS (global score) 0.57.
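To make the reliability statistics above concrete, the following Python sketch computes an intraclass coefficient for two raters and a leave-one-patient-out jackknife estimate with its standard error, in the spirit of the jackknife approach cited from KRAEMER (1980). It is a minimal sketch under stated assumptions: the one-way random-effects form of the ICC is assumed (the paper does not say which ICC model was used), and the names `icc_oneway`, `jackknife` and `icc_difference` are hypothetical. Applying the same jackknife to the difference of two ICCs computed on the same patients gives an approximate test for tied samples.

```python
import math
from statistics import mean, variance


def icc_oneway(rows):
    """One-way random-effects ICC (assumed model); rows = [(rater1, rater2, ...), ...], one tuple per patient."""
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    # Between-patient and within-patient (between-rater) sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(rows, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def jackknife(stat, data):
    """Leave-one-patient-out jackknife: bias-corrected estimate and its standard error."""
    n = len(data)
    full = stat(data)
    # Pseudo-values n*theta - (n-1)*theta_(-i), one per omitted patient.
    pseudo = [n * full - (n - 1) * stat(data[:i] + data[i + 1:]) for i in range(n)]
    return mean(pseudo), math.sqrt(variance(pseudo) / n)


def icc_difference(rows):
    """Hypothetical helper: rows = [(a1, a2, b1, b2), ...], two raters on scale A and on scale B, same patients."""
    icc_a = icc_oneway([(r[0], r[1]) for r in rows])
    icc_b = icc_oneway([(r[2], r[3]) for r in rows])
    return icc_a - icc_b
```

For example, `est, se = jackknife(icc_difference, rows)` with `rows` holding both raters' BRMS and MADRS totals for each patient gives an approximate z = est / se; referring z to the normal distribution is an assumption of this sketch, not a detail reported in the paper.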

Internal construct validity (Table 3)

Table 3 shows the fit to the Rasch model for each manner of partitioning. The BRMS shows the best overall fit: no significant deviation from the model is observed. The other scales do not fit the model well enough; the worst fit is observed for the HAMD (21 items) and the MADRS, both showing clear deviations from the model for at least two manners of partitioning. Items with significant misfit (P < 0.01) for at least two manners of partitioning are HAMD item Nos 3, 9, 11, 13, 15 and 17 (17-item version); MADRS item Nos 2, 3, 7 and 10; and BRMS item Nos 5 and 10.

TABLE 3. INTERNAL CONSTRUCT VALIDITY OF THE HAMD, MADRS AND BRMS (RATED BY THE INTERVIEWER IN SAMPLE 1) ASSESSED BY CORRESPONDENCE TO THE RASCH MODEL: EMPIRICAL P-VALUES FOR THE FIT TO THE MODEL

Manner of partitioning                                   HAMD (21 items)   HAMD (17 items)   MADRS   BRMS
Age (partitioned by median)                              0.17              0.19              0.09    0.25
Sex                                                      0.57              0.35              0.20    0.12
HAMD (17 items) score (partitioned by median)            0.00*             0.01*             0.00*   0.04
MADRS score (partitioned by median)                      0.01*             0.01*             0.00*   0.08
BRMS score (partitioned by median)                       0.01*             0.02              0.00*   0.56
RDS score (partitioned by median)                        0.00*             0.00*             0.00*   0.03
DSM-III (melancholia vs nonmelancholia)                  0.02              0.02              0.06    0.20
Newcastle Scale (endogenous vs nonendogenous depression) 0.05              0.03              0.01*   0.04
ICD-9 (affective psychosis vs other diagnoses)           0.01*             0.01*             0.18    0.17
Partition by chance                                      0.02              0.03              0.25    0.25

*P ≤ 0.01 for the Rasch model-fitting test; in this case the data do not fit the conditions of the Rasch model.

TABLE 4. SENSITIVITY TO CHANGE OF THE HAMD: ASSOCIATION OF POST-TREATMENT SCORES AND RESPONSE TO TREATMENT ADJUSTED FOR PRETREATMENT STATUS

                                         Direct assessment   Indirect assessment
Item                                     of response         of response
1. Depressed mood                        0.39*               0.31†
2. Guilt                                 0.19                0.15
3. Suicide                               0.10                0.03
4. Insomnia, initial                     0.05                0.15
5. Insomnia, middle                      0.50*               0.49*
6. Insomnia, early awakening             0.30†               0.39*
7. Work and interests                    0.47*               0.45*
8. Retardation                           0.51*               0.40*
9. Agitation                             0.24†               0.29†
10. Anxiety, psychic                     0.25†               0.20
11. Anxiety, somatic                     0.45*               0.35†
12. Gastrointestinal symptoms            0.29†               0.35†
13. General somatic symptoms             0.34*               0.31†
14. Loss of libido                       0.63*               0.50*
15. Hypochondriasis                      0.22†               0.07
16. Loss of weight                       0.01                -0.05
17. Loss of insight                      0.15                0.10
18. Diurnal variations                   0.40*               0.35*
19. Paranoid symptoms                    0.10                0.12
20. Depersonalization/derealization      0.30†               0.31†
21. Obsessive symptoms                   0.09                0.05
Total score (items 1-21)                 0.65*               0.67*
Total score (items 1-17)                 0.69*               0.71*

*P < 0.01. †P < 0.05 (testing equality to zero).

Sensitivity to change of severity of depression (Tables 4-6)

The change scores of the MADRS show the lowest correlation with the global assessment of change; this holds for the indirect as well as the direct global assessment of change. The BRMS is clearly superior to the MADRS and the HAMD (21 items) in this respect. The difference between the correlation of the regressed BRMS score with the direct global assessment of treatment response (Table 6) and the corresponding correlation for the regressed MADRS score (Table 5) is significant (P < 0.05) (DUNN and CLARK, 1971). The corresponding differences for the indirect assessment of change are in the same direction; however, none of the differences between correlation coefficients based on the indirect global assessment of change is significantly different from zero (P > 0.05). The individual items of each scale can also be evaluated with respect to their sensitivity to change in depression severity; again, regressed scores are used for the assessment of change.
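The BRMS-versus-MADRS comparison above involves two dependent correlations that share one variable (the global assessment of change) and are computed on the same patients. DUNN and CLARK (1971) compare several tests for this situation, and the paper does not state which one was applied. The sketch below implements one of them, Williams' t statistic with n - 3 degrees of freedom; the choice of this particular test and the function name `williams_t` are assumptions.

```python
import math


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t for H0: rho_jk == rho_jh, two correlations sharing variable j.

    Assumed roles: j = global assessment of change, k and h = change scores of
    two scales, r_kh = correlation between those change scores, n = patients.
    Returns (t, degrees_of_freedom).
    """
    # Determinant of the 3 x 3 correlation matrix of j, k and h.
    det_r = 1 - r_jk ** 2 - r_jh ** 2 - r_kh ** 2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar ** 2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3
```

In the setting of Tables 5 and 6, r_jk and r_jh would be the correlations of the regressed BRMS and MADRS total scores with the direct global assessment, but the correlation between the two scales' change scores (r_kh) is not reported, so the published test cannot be recomputed from the tables. A two-sided P-value can be taken from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`.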

TABLE 5. SENSITIVITY TO CHANGE OF THE MADRS: ASSOCIATION OF POST-TREATMENT SCORES AND RESPONSE TO TREATMENT ADJUSTED FOR PRETREATMENT STATUS

Multiple regression analysis: standardized coefficients

                                         Direct assessment   Indirect assessment
Item                                     of response         of response
1. Apparent sadness                      0.45*               0.38*
2. Reported sadness                      0.20                0.21†
3. Inner tension                         0.29                0.26†
4. Reduced sleep                         0.46*               0.35†
5. Reduced appetite                      0.50*               0.53*
6. Concentration difficulties            0.39*               0.36†
7. Lassitude                             0.35†               0.42*
8. Inability to feel                     0.19                0.16
9. Pessimistic thoughts                  0.30†               0.25†
10. Suicidal thoughts                    0.19                0.13
Total score                              0.63*               0.61*

*P < 0.01. †P < 0.05 (testing equality to zero).

TABLE 6. SENSITIVITY TO CHANGE OF THE BRMS: ASSOCIATION OF POST-TREATMENT SCORES AND RESPONSE TO TREATMENT ADJUSTED FOR PRETREATMENT STATUS

Multiple regression analysis: standardized coefficients

                                         Direct assessment   Indirect assessment
Item                                     of response         of response
1. Retardation (motor)                   0.39*               0.29†
2. Retardation (verbal)                  0.21                0.20
3. Retardation (intellectual)            0.45*               0.30†
4. Anxiety (psychic)                     0.38*               0.29†
5. Suicidal impulses                     0.18                0.10
6. Lowered mood                          0.30*               0.20
7. Self-depreciation                     0.43*               0.40*
8. Retardation (emotional)               0.20                0.24†
9. Sleep disturbances                    0.39*               0.34†
10. Tiredness and pains                  0.38†               0.25†
11. Work and interests                   0.40*               0.28†
Total score                              0.77*               0.70*

*P < 0.01. †P < 0.05 (testing equality to zero).

Insufficient sensitivity to change in depression severity is observed for the following items: (1) HAMD: loss of weight (No. 16), suicide (No. 3), initial insomnia (No. 4), loss of insight (No. 17), paranoid symptoms (No. 19), obsessive symptoms (No. 21) and guilt (No. 2); (2) MADRS: suicidal thoughts (No. 10) and inability to feel (No. 8); (3) BRMS: suicidal impulses (No. 5).
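The change-score procedure described under Patients and Methods (post-treatment score regressed on pretreatment score, residual taken as change score, residual correlated with a global criterion) can be sketched in plain Python as follows. This is a minimal illustration assuming a single covariate and ordinary least squares; the function names are hypothetical, and the paper's multiple regression step is not reproduced. Note the sign convention: a positive residual means a higher post-treatment score than expected, i.e. less improvement, so correlating residuals with an improvement rating such as the 9-point scale gives a negative coefficient unless one of the two is reversed. The paper does not state how the signs in Tables 4-6 were oriented.

```python
import math
from statistics import mean


def residual_change_scores(pre, post):
    """Observed minus expected post-treatment scores from an OLS fit of post on pre."""
    m_pre, m_post = mean(pre), mean(post)
    s_xx = sum((x - m_pre) ** 2 for x in pre)
    s_xy = sum((x - m_pre) * (y - m_post) for x, y in zip(pre, post))
    slope = s_xy / s_xx
    intercept = m_post - slope * m_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    m_a, m_b = mean(a), mean(b)
    cov = sum((x - m_a) * (y - m_b) for x, y in zip(a, b))
    var_a = sum((x - m_a) ** 2 for x in a)
    var_b = sum((y - m_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def indirect_sensitivity(scale_pre, scale_post, rds_pre, rds_post):
    """Correlate a scale's residual change scores with RDS residual change scores (indirect criterion)."""
    return pearson_r(
        residual_change_scores(scale_pre, scale_post),
        residual_change_scores(rds_pre, rds_post),
    )
```

By construction the residuals sum to zero and are uncorrelated with the pretreatment scores, which is what makes them usable as change scores adjusted for initial severity.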

DISCUSSION

Reliability

The global scores of the BRMS and the HAMD (17 items) show a reliability of at least 0.70 in all three samples, which is considered sufficient (NUNNALLY, 1972). The global scores of the HAMD (21 items) and the MADRS do not satisfy this requirement in at least one sample. The observed reliabilities of the HAMD are comparable to those of some other studies (CICCHETTI and PRUSOFF, 1983; REHM and O'HARA, 1985). The reliability coefficients for the sum scores of the BRMS and the MADRS are lower than in other studies (MONTGOMERY and ASBERG, 1979; KEARNS et al., 1982; BECH et al., 1983; DAVIDSON et al., 1986). The different reliabilities of the three scales may partly be due to differences in scale length and in the number of categories per item (5 for the BRMS, 7 for the MADRS, between 3 and 5 for the HAMD). The relatively low reliability of the MADRS may arise because the number of categories per item should not exceed 5, since a higher number cannot be handled by the perception and judgement of most human raters (MILLER, 1956). Supplementing a scale with new homogeneous items enhances its reliability (NUNNALLY, 1972); this psychometric argument would apparently favour the HAMD, the longest of the three scales. However, the homogeneity of the HAMD items is rather low, so it is unclear whether the argument applies. A further reason for the superiority of the BRMS lies outside these psychometric considerations: its items are defined with the highest degree of precision. Comparing the magnitude of the reliability coefficients across samples allows conclusions about how reliability depends on the rating setting. A priori, the setting in sample 1 should yield higher reliability, because reliability was evaluated in a joint-rater setting and the scale ratings were preceded by an extensive structured interview.
In spite of these differences in setting, the differences in the reliability of sum scores are small, which argues for the robustness of the scales' reliability across different assessment settings. Only the HAMD includes items that are unacceptable on grounds of reliability (i.e. intraclass coefficient not significantly different from zero): item No. 21 (obsessive-compulsive symptoms), item No. 15 (hypochondriacal complaints) and item No. 17 (loss of insight). The lack of reliability of item Nos 15 and 17 has also been observed in other studies (CICCHETTI and PRUSOFF, 1983; REHM and O'HARA, 1985) and argues in favour of omitting or redefining these items.

Internal construct validity

The three scales under study are usually evaluated by summing the item scores into a total score for each scale; such summation requires a certain degree of homogeneity. Otherwise, summarizing the information in more than one global score would be more appropriate. Latent trait analysis shows the BRMS to be superior with respect to internal construct validity, reflecting the high homogeneity of this scale, which focuses on the core symptoms of depression. The fit of the BRMS to the Rasch model also demonstrates the transferability of this scale across different samples of depressed patients. The HAMD and MADRS do not meet the requirements of internal validity. The observed lack of internal construct validity of the HAMD is in accordance with previous studies (BECH et al., 1981; MAIER and PHILIPP, 1985). All three scales could be improved by excluding some items.
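The homogeneity and transferability requirements discussed above can be made concrete with the dichotomous Rasch model. The items of the three scales have 3 to 7 scoring levels, so the analysis reported here relies on a polytomous extension; the dichotomous case below is a simplification chosen only to show the key property. The notation (severity \theta_v of patient v, difficulty \beta_i of item i, G subsamples, k items) is assumed for illustration and is not taken from the paper.

```latex
% Dichotomous Rasch model and its log-odds form
P(X_{vi} = 1 \mid \theta_v, \beta_i) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}
\quad\Longrightarrow\quad
\ln\frac{P(X_{vi}=1)}{P(X_{vi}=0)} = \theta_v - \beta_i

% The log-odds contrast between two items does not involve the patient parameter
\ln\frac{P(X_{vi}=1)}{P(X_{vi}=0)} - \ln\frac{P(X_{vj}=1)}{P(X_{vj}=0)} = \beta_j - \beta_i

% Andersen's conditional likelihood-ratio test over G subsamples
Z = 2\left(\sum_{g=1}^{G} \ln L_C^{(g)}\big(\hat\beta^{(g)}\big) - \ln L_C\big(\hat\beta\big)\right)
\;\sim\; \chi^2_{(G-1)(k-1)}
```

Because item contrasts are free of \theta_v, item parameters estimated separately in two subsamples (for example males and females) should agree up to sampling error if the scale is homogeneous and transferable; a large Z, i.e. a small empirical P-value for a partition in Table 3, signals that they do not.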


Sensitivity to change of the severity of depression

Evaluating the validity of the assessment of change is complicated by the lack of biological indicators of course or of treatment response. Non-biological course-related variables, such as psychosocial parameters or the global clinical judgement based on all available data and on longitudinal observation, can be used as criteria for this type of validity. The total score of the BRMS shows the best validity with regard to sensitivity to change, and this result does not depend on the method used to assess change in depression severity (direct or indirect). It has been argued that a scale used for assessing change should only include items that are sensitive to change in the same direction (MONTGOMERY and ASBERG, 1979): the change in each item must reflect the change in severity of depression. Insufficient sensitivity to this factor indicates that the change in an item's severity does not go together with the change in severity of depression; it does not, however, preclude improvement of that item during antidepressant treatment. Taking this criterion as a yardstick, some items must be considered invalid: 8 of 21 items in the HAMD, 2 of 10 items in the MADRS and 1 of 11 items in the BRMS are not sensitive to change (no significant positive relationship to the global assessment of change). Therefore, the BRMS has the highest sensitivity to change in the severity of depression. NELSON et al. (1984) evaluated the items of the HAMD in a similar manner, using data from a sample followed over a 3-week treatment period. Item Nos 17 (loss of insight), 16 (loss of weight) and 3 (suicidal ideation) proved to be insensitive to change both in the data of NELSON et al. (1984) and in the data presented here. At least 8 items differ in this respect between the two studies; this rather high degree of variation between studies can be explained by the rather low reliability of the HAMD items.
CONCLUSION

Thus, the main findings of this study are as follows. (1) The total scores of the three scales under scrutiny have a sufficient degree of reliability. (2) The requirements of internal construct validity are satisfied by the BRMS but not by the HAMD and the MADRS. (3) The BRMS is superior to the HAMD and the MADRS with respect to sensitivity to change. The assessment of the severity of depression, up to now almost exclusively based on the HAMD, can be improved by using the BRMS either instead of, or in addition to, the HAMD. This will be especially useful for the evaluation of antidepressant treatment, because real differences in efficacy between treatments will have a greater chance of being detected.

The superiority of the BRMS is probably due to several factors. (1) An adequate domain of symptoms in the item list: all core symptoms of depression are represented as items, although the list is supplemented by some symptoms that may reduce validity (psychic anxiety and somatic pain). (2) The high degree of precision in defining each item and each item score. All three scales might be improved by using shorter item lists and by revising the definitions of some items. It is beyond the scope of this study to present and evaluate new versions of the scales. The results imply, however, that the validity of all three scales might be enhanced by omitting the item referring to suicidal ideation. In addition, the HAMD might profit from omitting or reformulating item 15 (hypochondriasis) and item 17 (loss of insight), as other authors have also recommended; the short 17-item version is not inferior to the longer 21-item version in reliability and validity, demonstrating that item Nos 18-21 do not add relevant information to the 17-item version.

This comparative analysis of the three major scales for assessing the severity of depression has detected significant differences between them. The finding should stimulate further work in this field, especially on improving sensitivity to change, which is crucial for the evaluation of antidepressant treatment.

REFERENCES

ANDERSEN, E. B. (1972) Conditional Inferences and Models for Measuring. Mentalhygiejnisk, Copenhagen.
BARTKO, J. J. and CARPENTER, W. T. (1976) On the methods and theory of reliability. J. nerv. ment. Dis. 163, 307-317.
BECH, P. (1981) Rating scales for affective disorders: their validity and consistency. Acta psychiat. scand. 64, Suppl. 295, 1-101.
BECH, P., ALLERUP, P., GRAM, L., REISBY, N., ROSENBERG, R., JACOBSEN, D. and NAGY, A. (1981) The Hamilton Rating Scale: evaluation of objectivity using logistic models. Acta psychiat. scand. 63, 290-299.
BECH, P., GJERRIS, A., ANDERSEN, J., BOJHOLM, S., KRAMP, P., BOLWIG, T. G., KASTRUP, M., CLEMMESEN, L. and RAFAELSEN, O. J. (1983) The Melancholia and the Newcastle Scales. Br. J. Psychiat. 143, 58-63.
BECH, P., KASTRUP, M. and RAFAELSEN, O. J. (1986) Mini-compendium of rating scales for anxiety, depression, mania, schizophrenia with corresponding DSM-III syndromes. Acta psychiat. scand. 73, Suppl. 326, 1-39.
BECH, P. and RAFAELSEN, O. J. (1980) The use of rating scales exemplified by a comparison of the Hamilton and the Bech-Rafaelsen melancholia scale. Acta psychiat. scand. 62, Suppl. 285, 128-131.
CICCHETTI, D. V. and PRUSOFF, B. A. (1983) Reliability of depression and associated clinical symptoms. Archs gen. Psychiat. 40, 987-990.
DAVIDSON, J., TURNBULL, C. D., STRICKLAND, R., MILLER, R. and GRAVES, K. (1986) The Montgomery-Asberg Depression Scale: reliability and validity. Acta psychiat. scand. 73, 544-548.
DUNN, O. J. and CLARK, V. (1971) Comparisons of tests of the equality of dependent correlation coefficients. J. Am. Statist. Ass. 66, 904-911.
HAMILTON, M. (1960) A rating scale for depression. J. Neurol. Neurosurg. Psychiat. 23, 56-62.
HAMILTON, M. (1967) Development of a rating scale for primary depressive illness. Br. J. soc. clin. Psychol. 6, 276-296.
HEDLUND, J. L. and VIEWEG, B. W. (1979) The Hamilton rating scale for depression. A comprehensive review. J. operat. Psychiat. 10, 149-165.
KEARNS, N. P., CRUICKSHANK, C. A., McGUIGAN, K., RILEY, S. A., SHAW, S. P. and SNAITH, R. P. (1982) A comparison of depression rating scales. Br. J. Psychiat. 141, 45-49.
KRAEMER, H. C. (1980) Extensions of the kappa coefficient. Biometrics 36, 207-217.
LIPMAN, R. S. (1982) Differentiating anxiety and depression in anxiety disorders: use of rating scales. Psychopharmac. Bull. 18, 69-82.
MAIER, W. and PHILIPP, M. (1985) Comparative analysis of observer depression scales. Acta psychiat. scand. 72, 239-245.
MILLER, G. A. (1956) The magical number seven, plus or minus two. Psychol. Rev. 63, 81-97.
MONTGOMERY, S. A. and ASBERG, M. (1979) A new depression scale designed to be sensitive to change. Br. J. Psychiat. 134, 382-389.
NELSON, J. C., MAZURE, C., QUINLAN, D. M. and JATLOW, P. I. (1984) Drug-responsive symptoms in melancholia. Archs gen. Psychiat. 41, 663-668.
NUNNALLY, J. C. (1972) Psychometric Theory, 2nd Edn. McGraw-Hill, New York.
PHILIPP, M. and MAIER, W. (1986) The polydiagnostic interview. A structured interview for polydiagnostic classification of psychiatric patients. Psychopathology 19, 175-185.
RASCH, G. (1960) Probabilistic Models for some Intelligence and Attainment Tests. Institute of Mathematics and Statistics, University of Copenhagen.
REHM, L. and O'HARA, M. W. (1985) Item characteristics of the Hamilton rating scale for depression. J. psychiat. Res. 19, 31-42.
SPITZER, R. L. and WILLIAMS, J. B. W. (1984) Structured Clinical Interview for DSM-III (SCID 5/1/84). Biometric Research Department, New York State Psychiatric Institute.
WING, J. K., COOPER, J. E. and SARTORIUS, N. (1974) The Measurement and Classification of Psychiatric Symptoms. Cambridge University Press, London.