15 Reliability and Validity of the DSD for Adolescent Depression


Yuriko Doi, National Institute of Public Health, Japan
H. Morita, Tokyo Metropolitan Guidance Center
J. Noda, National Institute of Public Health, Japan
T. Tango, National Institute of Public Health, Japan
R. E. Roberts, University of Texas
K. Takeuchi, Gunma Prefectural College for Health Sciences

Yuriko Doi, J. Noda, and T. Tango, Department of Epidemiology, National Institute of Public Health, 6-1, Shirokanedai 4 chome, Minato-ku, Tokyo 108, Japan. H. Morita, Tokyo Metropolitan Guidance Center, Tokyo, Japan. R.E. Roberts, University of Texas Health Science Center, Houston, Texas, USA. K. Takeuchi, Gunma Prefectural College for Health Sciences, Japan.

International Perspectives on Child and Adolescent Mental Health. Volume I Proceedings of the First International Conference, edited by N. N. Singh, J. P. Leung, and A. N. Singh. © 2000 Elsevier Science Ltd. All rights reserved.

Adolescent mental health and behavioral problems in Japan are now receiving attention. School refusals and school violence reached record highs in 1996, and adolescent suicides caused by being bullied and bullying-related crimes are serious problems (e.g., Management and Coordination Agency, 1997). However, little is known about the prevalence of mental health problems in Japanese adolescents and how seriously the adolescents are impaired both psychologically and socially. Accurate estimation of the prevalence of adolescent mental disorders in the general population is essential for the development of adequate treatment plans and intervention programs. Community-based or school-based epidemiological research on adolescent mental health is, therefore, a pressing public health issue in Japan.

Previous studies suggest that adolescent depression leads to maladjustment in social life (e.g., Kandel & Davies, 1986; Kovacs & Goldston, 1991; Puig-Antich et al., 1985) and increases the risk of future episodes in later life (e.g., Harrington et al., 1990). Studies indicate that the preponderance of depression in women is established by the age of 15 and persists until 55 (e.g., Angold & Worthman, 1993; Burke et al., 1990; Kessler et al., 1994). Girls have a higher prevalence of depression based on the Diagnostic and Statistical Manual (DSM)-III-R criteria in middle school (Roberts et al., 1997) and high school (Lewinsohn et al., 1993). Morita et al. (1993) reported that girls in Japanese junior high school are more likely to have emotional disorders than boys. Takeuchi et al. (1994) also reported that Anglo, Hispanic and Japanese girls express depressive feelings more than boys. All of this evidence suggests that adolescent girls are at high risk of depression and should be epidemiologically studied for prevalence, incidence, risk factors and future consequences. A reliable and valid diagnostic instrument is needed for epidemiological research on adolescent depression in Japan.
The purpose of this study was to examine the reliability and validity of the Japanese version of the DSM Scale for Depression (DSD), a self-administered questionnaire for adolescent depression based on the DSM-III-R criteria, for its use in the general population.

METHOD

Measures

Two instruments were used: the DSD (Roberts et al., 1997) in stage 1 and the Structured Clinical Interview for DSM-III-R (SCID) for non-patients (Takahashi et al., 1992), modified for this study, in stage 2.

The DSD is a self-administered questionnaire developed from the National Institute of Mental Health Diagnostic Interview Schedule for Children version 2.3 (DISC-2.3). To prepare the Japanese version (Doi, 1995), the DSD was translated from English into Japanese, and then back-translated into English. We compared the original and back-translated versions, and then prepared a preliminary Japanese version. This was pre-tested for words, meanings, and content of each item on junior high school students who were not involved in this study. The original English version has 31 items, but the Japanese version has 27. Four items covering suicidal plans and attempts were excluded from the Japanese version, but 3 on thoughts of death and suicidal ideation remain.

Because no Japanese version of structured or semi-structured interview schedules for children or adolescents was available, we selected the SCID. We used the part on depression and modified the wording to be suitable for adolescents. We changed the item order to facilitate rapport between interviewee and interviewer: appetite and weight, sleep, and fatigue replaced depressed mood and anhedonia as the first items. We considered threshold values to be an episode lasting at least 2 weeks, a change in weight of at least 2 kg in a month, or an episode of self-injury at the beginning of the 2nd semester when the study was held. Both the self-administered questionnaire and the interview asked questions on daily impairment at school, at home or with peers.

Case Ascertainment

Case ascertainment for depression was based on the DSM-III-R criteria for major depressive episode (Takahashi & Hanada, 1992). Each item in the DSD corresponds to one of 9 symptoms: depressed mood, anhedonia, change in appetite or weight, hypersomnia or insomnia, motor agitation or retardation, loss of energy or fatigue, poor concentration or indecisiveness, feelings of guilt or worthlessness, and thoughts of death or suicidal ideation. If a subject marked "almost every day in the past 2 weeks" for at least 5 symptoms including depressed mood or anhedonia, or if an interviewee responded with at least 5 symptoms on the SCID above the threshold, she was defined as depressed. Cases were further categorized as with or without impairment. A computer diagnostic algorithm was used for case ascertainment. Interviewers could also ask additional questions, if needed, and then made a clinical assessment (CA) on depression as being not a case, being probably not a case, being probably a case or being a case.
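The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computer diagnostic algorithm actually used in the study: the symptom keys, the function name `is_case`, and the `require_impairment` flag are hypothetical, and each symptom is assumed to have already been reduced to a yes/no value (for example, "almost every day in the past 2 weeks" on the DSD).

```python
from typing import Dict

# Hypothetical keys for the 9 DSM-III-R symptoms listed in the text.
SYMPTOMS = (
    "depressed_mood",
    "anhedonia",
    "appetite_or_weight_change",
    "insomnia_or_hypersomnia",
    "motor_agitation_or_retardation",
    "fatigue",
    "poor_concentration_or_indecisiveness",
    "guilt_or_worthlessness",
    "thoughts_of_death_or_suicidal_ideation",
)
CORE_SYMPTOMS = ("depressed_mood", "anhedonia")


def is_case(symptoms: Dict[str, bool], impaired: bool = False,
            require_impairment: bool = False) -> bool:
    """Return True if at least 5 of the 9 symptoms are present,
    including depressed mood or anhedonia.

    When require_impairment is True (an assumed option), the subject
    must also report daily impairment to count as a case.
    """
    n_present = sum(1 for name in SYMPTOMS if symptoms.get(name, False))
    has_core = any(symptoms.get(name, False) for name in CORE_SYMPTOMS)
    meets_symptom_rule = n_present >= 5 and has_core
    if require_impairment:
        return meets_symptom_rule and impaired
    return meets_symptom_rule
```

Running the rule once with `require_impairment=False` and once with `require_impairment=True` corresponds to the two case definitions (without and with impairment) reported later in the chapter.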

Subjects and Data Collection

Subjects were 389 girls aged 12 to 15 years attending a private junior high school for girls in a residential area of the Tokyo Metropolitan Area. A few weeks after the beginning of the 2nd semester in 1996, a questionnaire on Life and Health for Adolescents including the DSD was administered to 382 girls in the classroom and monitored by classroom teachers. Seven girls were absent. About 2 weeks later, the same questionnaire was administered to them to assess test-retest reliability. Of 372 girls participating in the 1st stage of the survey, 31 were identified as depressed based on the DSM-III-R criteria using the DSD. Fifty-eight girls were randomly selected as non-depressed from 341 girls whose reports did not meet the DSM criteria for depression.

To test the validity in both groups, 1 male child psychiatrist and 1 female psychiatrist independently and blindly interviewed 45 and 44 girls, respectively, using the modified SCID. Both interviewers had about 15 years' clinical and research experience. After completing the structured interview, they asked further questions for clinical assessments if needed. They also interviewed the girls who were not selected for a validity study, asking about their general mental state and social functioning. All interviews were tape-recorded for assessing the inter-rater reliability of the modified SCID.

Statistical Analyses

The internal consistency reliability of the DSD was assessed by calculating Cronbach's alpha coefficient (Cronbach, 1951). An acceptable level of alpha coefficient is 0.70 (Nunnally, 1978). Kappa coefficients were used to determine the test-retest reliability, inter-rater reliability and validity (Goldstein & Simpson, 1995; Landis & Koch, 1977; Shrout, 1995; Streiner & Norman, 1995). With consideration of the proposal by Landis and Koch (1977), the following criteria for kappa values were applied: almost perfect, kappa=0.81-1.00; substantial, kappa=0.61-0.80; moderate, kappa=0.41-0.60; fair, kappa=0.21-0.40; poor, kappa=0.20 or lower. SPSS for Windows version 8.0 was used for the analyses.
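As a rough illustration of the two quantities named above, the following Python sketch computes Cronbach's alpha from an item-score matrix and maps a kappa value to the bands listed in the text. It is not the SPSS procedure used in the study. The function names are hypothetical, sample variances are assumed for alpha, and values falling between the printed bands (for example 0.805) are assigned by simple "greater than" cut-offs, which is an assumption rather than something the source specifies.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; scores[r][i] is respondent r's score on item i.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the bands quoted in the text."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Examples using kappa values reported in this chapter.
print(landis_koch_label(0.92))  # almost perfect
print(landis_koch_label(0.24))  # fair
print(landis_koch_label(0.07))  # poor
```

Negative kappa values fall into the "poor" band under this mapping, which matches the "0.20 or lower" wording.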

RESULTS

Of 89 subjects selected in the 1st stage, 85 responded to the DSD at the retest (95.5 percent) and all 89 were interviewed in the 2nd stage.

Cronbach's alphas were 0.95 for the DSD at the initial test, 0.97 for the DSD at the retest and 0.80 for the SCID at the interview, all of which were higher than acceptable. For the inter-rater reliability study, each psychiatrist independently evaluated the tape-recorded interviews conducted by the other. The kappa coefficient of inter-rater reliability was 0.92 for 78 interviews whose recording quality was sufficiently good, which was almost perfect. Table 1 shows the degree to which the DSD produced systematic or reproducible variation on 2 occasions for an overall case ascertainment of depression as well as each depressive symptom based on the DSM-III-R criteria. Except for anhedonia (kappa=0.07) and motor agitation or retardation (kappa=0.12), the kappa estimates varied from 0.24 to 0.37 (fair). Depressed mood (kappa=0.37), thoughts of death or suicidal ideation (kappa=0.36) and poor concentration or indecisiveness (kappa=0.35) had relatively higher kappa coefficients compared with those of the other symptoms.

Table 1. Reliability and Validity for Measurement Study.

Depression based on DSM criteria              Reliability (a)   Validity (b)
                                              (n=85)            (n=89)
Overall                                       0.24              0.15
Episode
(1) depressed mood                            0.37              0.25
(2) anhedonia                                 0.07              0.11
(3) feel guilty or worthless                  0.27              0.18
(4) thoughts of death or suicidal ideation    0.36              0.15
(5) insomnia or hypersomnia                   0.29              -0.02
(6) gain or loss of weight/appetite           0.24              0.12
(7) feel fatigue                              0.25              0.16
(8) poor concentration or indecisiveness      0.35              -0.01
(9) motor agitation or retardation            0.12              0.03

a Kappa coefficient for test-retest reliability of the DSD.
b Kappa coefficient for validity between the DSD and the modified SCID.
DSD: the Diagnostic and Statistical Manual Scale for Depression. SCID: the Structured Clinical Interview for DSM-III-R.

Table 1 also shows the degree to which the DSD correlated with the modified SCID as an external measurement of criterion validity. Considering caseness of depression by these 2 instruments, the kappa coefficient was 0.16. Kappa coefficients of core symptoms were 0.25 for depressed mood and 0.11 for anhedonia. Inappropriate feelings of guilt, thoughts of death or suicidal ideation, fatigue, and changes in appetite or body weight had higher kappa coefficients than motor agitation or retardation, inability to concentrate or indecisiveness, and insomnia or hypersomnia. Negative kappa values of the latter 2 episodes indicate that agreement between the 2 instruments was less than expected by chance.

Table 2. Diagnostic Agreement Between DSD and SCID, DSD and CA, and SCID and CA.

Number of cases (n=89)

Impairment   DSD+/SCID+   DSD+/SCID-   DSD-/SCID+   DSD-/SCID-   kappa
-            5            26           2            56           0.15
+            5            13           2            69           0.32

Impairment   DSD+/CA+     DSD+/CA-     DSD-/CA+     DSD-/CA-     kappa
-            11           20           11           47           0.18
+            9            9            13           58           0.29

Impairment   SCID+/CA+    SCID+/CA-    SCID-/CA+    SCID-/CA-    kappa
+            6            1            16           66           0.33

The modified SCID and CA were conducted on the same day; otherwise, the time intervals were about three to four weeks between the DSD and the modified SCID or the DSD and CA.
DSD: the Diagnostic and Statistical Manual Scale for Depression. SCID: the Structured Clinical Interview for DSM-III-R. CA: Clinical Assessment.
* A computer diagnostic algorithm is applied for case ascertainment using either the DSD or the modified SCID.
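The kappa values in Table 2 can be checked directly from the reported cell counts. The sketch below is a plain implementation of Cohen's kappa for a 2x2 table, not the SPSS routine used in the study. The function name, argument order and row labels are hypothetical, and reading the first row of each two-row block as "impairment not required" and the second as "impairment required" is an assumption based on the accompanying text.

```python
def cohen_kappa(both_pos: int, first_only: int,
                second_only: int, both_neg: int) -> float:
    """Cohen's kappa for two binary ratings summarized as a 2x2 table."""
    n = both_pos + first_only + second_only + both_neg
    p_observed = (both_pos + both_neg) / n
    first_pos = both_pos + first_only    # positives on the first instrument
    second_pos = both_pos + second_only  # positives on the second instrument
    p_expected = (first_pos * second_pos
                  + (n - first_pos) * (n - second_pos)) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Cell counts copied from Table 2, in the order (+/+, +/-, -/+, -/-).
TABLE_2 = {
    "DSD vs SCID, impairment not required": (5, 26, 2, 56),
    "DSD vs SCID, impairment required": (5, 13, 2, 69),
    "DSD vs CA, impairment not required": (11, 20, 11, 47),
    "DSD vs CA, impairment required": (9, 9, 13, 58),
    "SCID vs CA": (6, 1, 16, 66),
}

for label, cells in TABLE_2.items():
    print(f"{label}: kappa = {cohen_kappa(*cells):.2f}")
```

Rounded to two decimals, these counts give 0.15, 0.32, 0.18, 0.29 and 0.33, which agrees with the kappa column of Table 2.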

Table 2 shows kappa coefficients of diagnostic agreement between the DSD and the modified SCID, the DSD and CA, and the modified SCID and CA. The kappa coefficients were enhanced by a decrease in disagreement (DSD+/SCID- or DSD+/CA-) and an increase in agreement (DSD-/SCID- or DSD-/CA-) after consideration of daily impairment. The relatively high proportions of disagreement between algorithm-generated diagnosis and clinician-generated diagnosis (DSD+/CA- or SCID-/CA+) were characteristic, compared with the disagreement between algorithm-generated diagnoses (DSD-/SCID+).

DISCUSSION

The tentative prevalence rates of depression in our study were 8.6 percent without and 2.2 percent with impairment. These results are comparable to those in other studies of adolescent depression based on the DSM-III-R criteria using structured diagnostic interviews (Boyle et al., 1993; Roberts et al., 1995, 1997; Shaffer et al., 1996). The similarities in the findings of prevalence across the studies suggest that the Japanese version of DSD could be used as a diagnostic tool in a community-based epidemiological survey of adolescent depression in Japan. The appropriateness of this measurement instrument for epidemiological purposes requires us to examine its reliability and validity in the Japanese community sample.

Previous studies revealed that kappa coefficients for test-retest reliability for depression varied from 0.36 to 0.63 in psychiatric children and adolescents as determined by using the DISC and algorithm-generated diagnoses (Hodges, 1993). Schwab-Stone et al. (1993) showed a much higher kappa value (0.77) for test-retest reliability with a revised version of DISC (DISC-R) in a clinical sample of 74 psychiatric patients aged 11 to 17 years. However, few measurement studies with diagnostic instruments have been done with community samples of children and adolescents. In a recent study of test-retest reliability of the DISC-2.1, Jensen et al. (1995) investigated 375 subjects aged 9 to 17 years from both clinical and community samples at time intervals of 5.8 to 33.3 days. They found kappa coefficients of depression or dysthymia of 0.38 for clinical cases and 0.29 for community cases. The overall test-retest reliability of our study (kappa=0.24) approximates their finding in the community sample, although there are differences in age and gender in the samples used for these two studies.

Previous validity studies have shown that diagnostic agreement between the DISC and other diagnostic instruments was generally low. Weinstein et al. (1989) compared the DISC algorithm-generated diagnosis of major depression with clinicians' DSM-III-R diagnosis within 2 weeks in 162 psychiatric inpatients aged 12 to 16 years and reported a kappa value of 0.17. Cohen et al. (1987) examined 109 children aged 9 to 12 years from an epidemiological sample using DISC and K-SADS computer diagnoses at an interval of 3 to 4 months. They found very low kappa values: 0.08 for K-SADS possible with DISC possible, 0.10 for K-SADS possible with DISC probable and 0.00 for K-SADS definite with DISC definite. Piacentini et al. (1993) tested the concurrent criterion validity of the DISC-R with clinical assessment of major depression in 74 psychiatric inpatients aged 11 to 17 and indicated that the kappa value was 0.39. In the study by Schwab-Stone et al. (1996), 274 youths aged 9 to 18 years were interviewed by lay interviewers and clinicians using the DISC-2.3. Kappa values were 0.27 between lay DISC diagnosis and diagnosis from clinician symptom ratings, and 0.79 between clinician-administered DISC diagnosis and diagnosis from clinician symptom ratings.

In general, our findings of test-retest reliability and diagnostic validity are consistent with or slightly inferior to those reported in previous studies (Cohen et al., 1987; Jensen et al., 1995; Schwab-Stone et al., 1996), although internal consistency and inter-rater reliability are acceptable as prerequisites for further measurement studies.

Our current findings should be evaluated first in the context of the study design used for assessments. As Cohen et al. (1987) described, study design factors are likely to influence the magnitude of agreement. The first determinant of the expected size of agreement is the nature of the population being assessed. As has been pointed out (Robins, 1985), the identified cases in an epidemiological sample will tend to be at the lower end of symptom severity, and thus will be harder to distinguish from noncases. Because our main purpose was to examine the utility of the DSD in a community sample, concordance was naturally lower. The second factor is the interval between test and retest or test and interview.
The second measurement is often affected by the respondent's recall, the respondent's real biological changes, and changes in the respondent's perception of the symptoms or other problems (Cohen et al., 1987; Shrout, 1995; Simon & Vonkorff, 1995). The longer the interval, the more likely that the information provided will be discrepant. In our study, the intervals are about 2 weeks for test-retest reliability, about 4 weeks for diagnostic validity of the DSD with the SCID or CA, and 0 for concurrent validity of the SCID with CA. It is hard to judge whether a 2-week interval is appropriate for test-retest reliability in adolescent depression. As Rintelmann et al. (1996) show in their randomized clinical trial, depressive symptoms fluctuate a lot, even among child and adolescent outpatients. Jensen et al. (1995) also comment that internalizing disorders are more subjective, transient, and prone to recall difficulties. Therefore, the test-retest interval issue should be further examined in a future study, particularly for child and adolescent depression. Practical strategies for improving test-retest reliability that we should take in a school setting are to help teachers and students understand the meaning of taking the same tests at different times and to enhance their motivation for participating in the study. We should have focused more on this, but we were cautious not to influence the students' responses with special instructions. For the validity study, the kappa value for concurrent validity is about twice as high as those for diagnostic validity with a 2-week interval, even though the instruments are different: a self-administered questionnaire (DSD), a structured interview (SCID), and clinical assessments (CA). This evidence implies that the DSD would have been more valid if it were used with other diagnostic instruments on the same day.

The third factor is the instrument type and diagnostic method used for a validity study. Algorithm-generated diagnosis based on DSM-III-R criteria for depression is used for case ascertainment with either the DSD or the SCID. However, wording in some items is different between the DSD and the SCID; this may explain discrepancies between the measurements. For example, the DSD asks whether insomnia or hypersomnia lasted almost every day for the past 2 weeks, whereas the SCID asks about sleep as being good, fair, poor, or very poor (kappa=-0.02, although a kappa value for reliability is 0.29). As another example, the SCID asks about not only suicidal ideation or thoughts of death but also suicidal plans or self-injury, which are not asked in the DSD (kappa=0.15, although a kappa value for reliability is 0.36).
In addition to the study design factors mentioned above, Japanese adolescents' understanding of the meaning or content of some items in the DSD should be reexamined; the items on anhedonia and motor agitation or retardation have particularly low kappa values. The Japanese version of DSD could be revised after qualitative analysis of the wording of the instrument itself, considering adolescent cognitive development.

As has been previously pointed out (Jensen et al., 1995; Rice et al., 1992), we can also identify from our measurement study that the low reliability and validity of our findings in a school setting are partly due to the presence of subjects near the threshold of diagnosis. From the higher proportion of negative algorithm-generated diagnoses and positive clinical assessments (Table 2), we assume that clinicians are more likely to detect and diagnose subthreshold subjects as cases. Ten of the 11 girls classified as DSD-/CA+ did not completely meet the DSM-III-R criteria for depression but had some of the 9 criteria lasting at least 2 weeks. Through personal communication with clinicians, we know that they evaluated a subject as a case in interviewing if she had a serious depressive episode, even if she did not perfectly meet the DSM-III-R criteria; for example, self-injury but only a few symptoms lasting at least 2 weeks.

This threshold subject issue is critical in an epidemiological sample. First, even with improvement of study design, we will not be able to obtain satisfactory results in reliability and validity studies using conventional kappa statistics if this issue is not resolved. To solve this issue, Rice et al. (1992) propose a method for using the stability of diagnosis to model the relationship between clinical covariates (e.g., number of symptoms or episodes, presence of attempted suicide, hospitalization) and the probability of being a true case. Rintelmann et al. (1996) suggest that the best predictors for symptom severity over time are age, social functioning, family history of an affective disorder and a change in scores. A more reliable and valid multiple systematic assessment, such as a combination of the DSD with other predicting covariates, is probably needed for a community sample of adolescents. Second, knowledge of threshold subjects is important for prevention and early intervention as well as for studies on risk factors and etiology.
There is evidence presented in previous studies (Klerman & Weissman, 1992) that people who have depressive episodes but not a clinical diagnosis have a poor prognosis in many cases; that most of the recovery is in the first 6 months and is related to the severity of the initial symptoms; that the first onset of major depression is often in adolescence and young adulthood, and subclinical symptoms predict this first onset; and that depressive symptoms impair state and brief intervention may interrupt a future illness for the mildly ill. Therefore, more attention should be paid to adolescent threshold subjects for depression in a community who are less likely to be referred for care.

In summary, the test-retest reliability and diagnostic validity of the DSD from our school survey are comparable or slightly inferior to data reported from other studies. Considering the characteristics of study design in a community sample, these findings should not be dismissed as too unreliable and invalid for the technique's use in epidemiological studies, but rather should encourage us to elaborate and develop the instrument and assessment, and to conduct further measurement studies.

REFERENCES

Angold, A., & Worthman, C.W. (1993). Puberty onset of gender differences in rates of depression: a developmental, epidemiologic and neuroendocrine perspective. Journal of Affective Disorders, 29, 145-158.

Boyle, M.H., Offord, D.R., Racine, Y., Sanford, M., Szatmari, P., Flemming, J.E., & Price-Munn, N. (1993). Evaluation of the Diagnostic Interview for Children and Adolescents for use in general population samples. Journal of Abnormal Child Psychology, 21, 663-681.

Burke, K.C., Burke, J.D. Jr., & Regier, D.A. (1990). Age at onset of selected mental disorders in five community populations. Archives of General Psychiatry, 47, 511-518.

Cohen, P., O'Connor, P., Lewis, S., Velez, C.N., & Malachowski, B. (1987). Comparison of DISC and K-SADS-P interview of an epidemiological sample of children. The American Academy of Child and Adolescent Psychiatry, 26, 662-667.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Doi, Y. (1995). An examination of the reliability and validity of the DSM Scale for Depression among Japanese and Anglo adolescents. Thesis of the University of Texas School of Public Health at Houston.

Goldstein, J.M., & Simpson, J.C. (1995). Validity. In Tsuang, M.T., Tohen, M., & Zahner, G.E.P. (Eds.), Textbook in psychiatric epidemiology (pp. 213-228). New York: John Wiley & Sons, Inc.

Harrington, R., Fudge, H., Rutter, M., Pickles, A., & Hill, J. (1990). Adult outcomes of childhood and adolescent depression. Archives of General Psychiatry, 47, 465-473.

Hodges, K. (1993). Structured Interviews for Assessing Children. Journal of Child Psychology and Psychiatry, 34, 49-68.

Jensen, P., Roper, M., Fisher, P., & Piacentini, J. (1995). Test-retest reliability of the Diagnostic Interview Schedule for Children (DISC-2.1). Archives of General Psychiatry, 53, 61-71.

Kandel, D.B., & Davies, M. (1986). Adult sequelae of adolescent depressive symptoms. Archives of General Psychiatry, 43, 255-262.

Kessler, R.C., McGonagle, K.A., & Nelson, C.B. (1994). Sex and depression in the National Comorbidity Survey. II. Cohort effects. Journal of Affective Disorders, 30, 15-26.

Klerman, G.L., & Weissman, M.M. (1992). The course, morbidity, and costs of depression. Archives of General Psychiatry, 49, 831-834.

Kovacs, M., & Goldston, D. (1991). Cognitive and social cognitive development of depressed children and adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 388-392.

Landis, J.R., & Koch, G.G. (1977). The measure of observer agreement for categorical data. Biometrics, 33, 159-174.

Lewinsohn, P.M., Hops, H.H., Roberts, R.E., Seeley, J.R., & Andrews, A. (1993). Adolescent psychopathology: I. Prevalence and incidence of depression and other DSM-III-R disorders in high school students. Journal of Abnormal Psychology, 102, 133-144.

The Management and Coordination Agency (1997). Conduct and behavioral problems. Whitepaper on adolescents in 1997. Tokyo: (Author).

Morita, H., Suzuki, M., Suzuki, S., & Kamoshita, S. (1993). Psychiatric disorders in Japanese secondary school children. Journal of Child Psychology and Psychiatry, 34, 317-332.

Nunnally, C. (1967). Psychometric theory. New York: McGraw-Hill.

Piacentini, J., Shaffer, D., Fisher, P., Schwab-Stone, M., Davies, M., & Gioia, P. (1993). The Diagnostic Interview Schedule for Children-Revised Version (DISC-R): III. Concurrent criterion validity. The Journal of American Academy of Child and Adolescent Psychiatry, 32, 658-665.

Puig-Antich, J., Lukens, E., Davis, M., Goetz, D., Brennan-Quattrock, J., & Todak, G. (1985). Psychosocial functioning in prepubertal major depressive disorders. II: Interpersonal relationships after sustained recovery from affective episode. Archives of General Psychiatry, 42, 511-517.

Rice, J.P., Rochberg, N., Endicott, J., Lavori, P.W., & Miller, C. (1992). Stability of psychiatric diagnoses: an application to the affective disorders. Archives of General Psychiatry, 49, 824-830.

Rintelmann, J.W., Emslie, G.J., Rush, A.J., Varghese, T., Gullion, C.M., Kowatch, R.A., & Hughes, C.W. (1996). The effects of extended evaluation on depressive symptoms in children and adolescents. Journal of Affective Disorders, 41, 149-156.

Roberts, R.E., Roberts, C.R., & Chen, Y.R. (1997). Ethnocultural differences in prevalence of adolescent depression. American Journal of Community Psychology, 25, 95-110.

Roberts, R.E., Lewinsohn, P.M., & Seeley, J.R. (1995). Symptoms of DSM-III-R major depression in adolescence: evidence from an epidemiological survey. Journal of American Academy of Child and Adolescent Psychiatry, 34, 1608-1617.

Roberts, R.E., & Chen, Y.W. (1995). Depressive symptoms and suicidal ideation among Mexican-origin and Anglo adolescents. Journal of American Academy of Child and Adolescent Psychiatry, 34, 81-90.

Robins, L. (1985). Epidemiology: reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry, 42, 918-924.

Schwab-Stone, M., Fisher, P., & Piacentini, J. (1993). The Diagnostic Interview Schedule for Children-Revised Version (DISC-R): II. Test-retest reliability. The Journal of American Academy of Child and Adolescent Psychiatry, 32, 651-657.

Schwab-Stone, M., Shaffer, D., Dulcan, M., Jensen, P., Fisher, P., Bird, H.R., Goodman, S.H., Lahey, B.B., Lichtman, J., Canino, G., Rubio-Stipec, M., & Rae, D. (1996). Criterion validity of the NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3). The Journal of American Academy of Child and Adolescent Psychiatry, 35, 878-889.

Shaffer, D., Fisher, P., Dulcan, M.K., Davies, M., Piacentini, J., Schwab-Stone, M.E., Lahey, B.B., Bourdon, K., Jensen, P., Bird, H.R., Canino, G., & Regier, D.A. (1996). The NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3): Description, acceptability, prevalence rates, and performance in the MECA study. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 865-877.

Shrout, P.E. (1995). Reliability. In Tsuang, M.T., Tohen, M., & Zahner, G.E.P. (Eds.), Textbook in psychiatric epidemiology (pp. 213-228). New York: John Wiley & Sons, Inc.

Simon, G.E., & Vonkorff, M. (1995). Recall of psychiatric history in cross-sectional surveys: implications for epidemiologic research. Epidemiologic Reviews, 17, 221-227.

Streiner, D.L., & Norman, G.R. (1995). Health measurement scales. Oxford: Oxford University Press.

Takahashi, S., & Hanada, K. (1992). The Japanese edition of DSM-III-R training guide for diagnosis of childhood disorders. Rapoport, J.L. & Ismond, D.R. (Eds.). Tokyo: Igaku-Shoin Ltd.

Takeuchi, K., Roberts, R.E., & Suzuki, S. (1994). Depressive symptoms among Japanese and American adolescents. Psychiatry Research, 53, 259-274.

Weinstein, S.R., Stine, K., Noam, G.G., Grimes, K., & Schwab-Stone, M. (1989). Comparison of DISC with clinician's DSM-III-R diagnoses in psychiatric inpatients. The Journal of American Academy of Child and Adolescent Psychiatry, 28, 53-60.