J. psychiat. Res., Vol. 26, No. 1, pp. 59-67, 1992.
Printed in Great Britain.
0022-3956/92 $5.00 + .00
© 1992 Pergamon Press plc

ISSUES OF VALIDITY IN THE DIAGNOSTIC INTERVIEW SCHEDULE

ROBERT G. MALGADY,¹ LLOYD H. ROGLER² and WARREN W. TRYON³

¹Program in Quantitative Studies, New York University, and Hispanic Research Center, Fordham University; ²Albert Schweitzer Professor of Humanities and Hispanic Research Center, Fordham University; ³Department of Psychology, Fordham University, Fordham Road, Bronx, NY 10458, U.S.A.

(Received 20 December 1990; revised 9 July 1991)

Summary - The Diagnostic Interview Schedule, the chief instrument in contemporary studies in psychiatric epidemiology, enhances the reliability of psychiatric diagnosis and enables lay interviewers to closely reproduce psychiatric interviews. However, despite frequent references in the literature to the validity of the Diagnostic Interview Schedule, most studies fundamentally represent variations of reliability paradigms to the neglect of criterion-related validity. Mistaken assertions of validity persist in the psychometric language used to describe the Diagnostic Interview Schedule. This article examines the basis for claims and counterclaims of validity in accordance with standard psychometric definition, and identifies sources of erroneous reasoning in attempts to infer validity from reliability. The article presents a general framework organizing the process of diagnostic validation and discusses strategies for research seeking to validate psychiatric diagnoses achieved through the Diagnostic Interview Schedule.

Introduction

THERE CAN be little doubt that the Epidemiological Catchment Area (ECA) Program sponsored by the National Institute of Mental Health has become a major institutional force shaping the contemporary character of research in psychiatric epidemiology. Evidence of this abounds: as of December 1988, a bibliography (NIMH, 1988) of 17 pages had accumulated on publications resulting directly from the Program and from other studies stimulated by the Program. The Program's chief instrument is the Diagnostic Interview Schedule (DIS), which has enabled lay interviewers to reproduce the diagnoses of psychiatric interviewers within a clinically negligible margin of error (Robins, 1989). The economy provided by lay DIS interviewers, obviating the involvement of psychiatrists, has facilitated the collection of current and lifetime DSM-III (and DSM-III-R) prevalence data in large ECA probability samples. Moreover, the structured nature of the interview and the computer algorithms linking DIS responses to DSM-III criteria have increased the reliability of diagnostic judgments compared to unstructured diagnoses in routine psychiatric practice. Nonetheless, the DIS and its symbiotic link to the DSM-III have been questioned periodically for lack of empirical evidence of validity. At an operational level, uncertainty persists over whether or not the DIS provides valid DSM-III diagnosis in the hands of lay interviewers, an issue which is analytically separable from the memory errors of respondents in the DIS' attempt to retrieve, retrospectively, episodes of mental illness over a lifetime (Rogler, Malgady, & Tryon, in press). This paper critically examines issues concerning the validity of the DIS as an instrument that operationalizes DSM-III criteria.


It is reasonable to inquire whether or not the DIS should be subjected to formal psychometric scrutiny with regard to validity. In the absence of a validating criterion, or "gold standard," it could be argued that perhaps the most that can be hoped for, as with unstructured clinical assessment, is social consensus on diagnostic classification (e.g. concordance among DIS interviews or convergence of multiple methods of classification). For example, Dohrenwend (1989) argues that "The DIS is not a psychometric measure, nor is it a clinical examination" (p. 42); he recommends that validation should be based on majority rule in a multi-method approach to diagnosis. However, the problem of the lack of a true criterion is not unique to psychiatry; it is often circumvented by the attempt to establish evidence of an instrument's criterion-related validity. This would amount to a determination of whether or not a DIS/DSM-III diagnostic classification concords with observable indicators of the latent construct being measured, such as independent observations of symptomatology relevant to the DSM-III diagnostic criteria.

The most recent version of the Standards for Educational and Psychological Testing (1985) suggests that interview schedules used for diagnostic purposes - even computer-scored and computer-interpreted assessment techniques - are considered "tests" and that "all the standards apply with equal force to such tests" (p. 6). A diagnostic instrument such as the DIS is indeed an instance of a test that samples behavior in a structured manner. Although it is recognized that psychometric standards are not applicable with the same rigor to unstructured behavior samples, such as those gathered in unstructured clinical assessment (Standards for Educational and Psychological Testing, 1985), structured behavior samples should be held accountable to such standards. The terms "reliability" and "validity" understandably pervade the DIS literature, indicating that the instrument has been viewed with the same psychometric concern as any other test.

Thus, we begin with the claims and counterclaims about the validity of the diagnostic information obtained from the DIS to see whether they accord with the standards of psychometric theory. We then discuss the distinction between reliability and validity to show how the welding of the two concepts in psychiatric epidemiological practice has confounded their meaning. Finally, we present a general framework for conceptualizing research on the validation of psychiatric diagnosis. This framework is consistent with recent developments in DIS research and also provides for the consideration of standard psychometric definitions of content, criterion-related, and construct validity.

Distinguishing Reliability from Validity

Considerable evidence indicates that the use of structured interviews in conjunction with diagnostic criteria has substantially improved the interrater reliability of psychiatric diagnosis (Robins, 1989; Keller et al., 1981; Grove, Andreasen, & McDonald-Scott, 1981). Moreover, in estimating reliability a great deal of care has been exercised in controlling or attempting to disentangle a variety of sources of measurement error, including those associated with raters, differential diagnoses, patients versus non-patients, the time of day the interview is conducted, test-retest intervals, and geographic locations. The DIS procedure consists of asking the respondent a highly structured series of questions, the answers to which are linked algorithmically to standardized diagnostic criteria.


The higher interrater reliability of this procedure, compared to clinical judgments made from unstructured interviews, is hardly surprising. Indeed, the error of measurement attending diagnosis under such conditions is probably more a function of the respondents' own instability than of the interview schedule. The standardization of the diagnostic process is a useful way of raising low reliability or concordance coefficients to respectable levels. However, the higher interrater reliability gained by the increased structure of the DIS, relative to a routine clinical interview, does not mean that psychiatric epidemiology has come any closer than before to achieving valid psychiatric diagnoses. Although reliability is often called the sine qua non of validity, increasing reliability does not guarantee or even imply an attendant increase in validity. The real task is to determine whether there has been an improvement in the validity of DIS/DSM-III diagnosis accompanying the enhanced reliability.

The concept of validity refers to the veracity or accuracy of some measurement of a construct. The question of validity is whether or not the quantitative or qualitative values assigned to units under observation accurately depict the units' variations in the construct or entity that is the intention of measurement. Symbolically, the validity of some measure (X) is estimated by its correlation, or concordance, with another measure (Y) of the criterion or of a criterion-related indicator that is external to X. When two X measurements are rendered by different interviewers at the same time, by the same interviewer at different times, or by different interviewers at different times, the correlation between the measurements is an estimate of reliability. To qualify as a bona fide validity paradigm, the criterion-related indicator (Y) must be external to X, meaning that it was obtained by a different assessment technique, and must have relevance to the construct that is the target of measurement.

Grove et al. (1981) remarked that the distinction between reliability and validity is clear in the abstract, but that in practice the difference can easily become blurred. They identified several ways in which reliability and validity are confounded in the psychiatric and epidemiologic literature. Test-retest stability of diagnosis and assessment of outcome are "usually considered to be validators of diagnostic categories" (p. 410), when actually such stability refers to reliability. The so-called gold standard of criterion diagnosis pertains to validity, not reliability, and although this standard is inaccessible, "one is sometimes set simply for heuristic purposes" (p. 410). Grove et al. (1981) also remind us that "a reliable measure that has no validity is worthless" (p. 410).

In an early study of the validity of the DIS (Version II), Robins, Helzer, Ratcliff, and Seyfried (1982) noted that although reliability and validity are distinct in theory, "in practice it is often difficult to keep them apart" (p. 855). Similarly, others (Blashfield, 1989) have argued that reliability and validity are not always clearly separable, as traditional psychometric theory would suggest, and may vary according to the context of the measurement problem. However, we argue that validity is not in the eye of the beholder; rather, we believe that research on the DIS should conform to accepted psychometric standards, which meticulously enunciate the distinction between reliability and validity.
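The operational difference between the two designs can be made concrete with a small computation. The sketch below is purely illustrative: the diagnoses are invented, and the chance-corrected agreement statistic (Cohen's kappa) is only one of several concordance measures used in this literature. The point is structural: the first comparison varies the interviewer while holding the instrument constant (an estimate of reliability), whereas the second compares the instrument against an indicator obtained by a different assessment technique (an estimate of criterion-related validity).

    from collections import Counter

    def cohen_kappa(a, b):
        """Chance-corrected agreement between two categorical ratings."""
        n = len(a)
        p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
        pa, pb = Counter(a), Counter(b)
        p_exp = sum(pa[k] * pb[k] for k in pa) / n ** 2        # chance agreement
        return (p_obs - p_exp) / (1 - p_exp)

    # Hypothetical diagnoses (1 = case, 0 = non-case) for ten respondents.
    lay_dis   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # lay interviewer, DIS
    psych_dis = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]   # psychiatrist, DIS
    external  = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # non-DIS criterion indicator

    # Same instrument, different interviewers: reliability.
    print("interrater reliability:", cohen_kappa(lay_dis, psych_dis))
    # Agreement with an external, non-DIS indicator: criterion-related validity.
    print("criterion-related validity:", cohen_kappa(lay_dis, external))

Under this definition, the lay/psychiatrist comparison discussed in the next section falls on the reliability side of the ledger, however the literature labels it.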

Comparisons of Lay and Psychiatric Interviewers

The standard paradigm for estimating the validity of the DIS (Robins et al., 1982; Robins, Helzer, Croughan, & Ratcliff, 1981) involves a comparison of two interviews with the same patient, one by a lay person and one by a psychiatrist.


In a study of DIS lifetime diagnoses, Helzer et al. (1985) acknowledge that the goal of the DIS is valid detection of the occurrence of psychiatric symptomatology, but they argue that there is no absolute validity standard by which to determine whether or not this goal has been achieved. The comparison of diagnoses obtained by lay and psychiatric interviewers was suggested as an alternative, with the validity standard being the clinical skill of an experienced psychiatrist. The flaw in this approach, according to Helzer, Robins, McEvoy, Spitznagel, Stoltzman, Farmer, and Brockington (1985), stems from the fact that the validity standard (i.e. the psychiatrist's diagnosis) is itself a fallible measurement. So-called validity studies of the DIS based upon this paradigm have proliferated (Anthony et al., 1985; Folstein et al., 1985; Helzer, Spitznagel, & McEvoy, 1987). A recent example is provided in a cross-cultural study of DIS symptom scales (as opposed to diagnoses) by Rubio-Stipec, Shrout, Bird, Canino, and Bravo (1989), who note that the scales they derived had been validated against clinical DSM-III diagnoses. The promotion of this validity paradigm over the years since the inception of the DIS attests to the general acceptance of a methodology we believe is highly questionable.

The inability to establish the validity of the DIS because of the lack of a criterion diagnosis reflects a problem widely recognized in the behavioral sciences (Cronbach, 1970). The strategy of establishing the clinical skill of psychiatrists as the validity standard (Helzer et al., 1985) can be construed as an attempt to create a criterion-related standard. However, the flaw in this standard is not the lack of perfect concordance among skilled psychiatrists, as Helzer et al. (1985) and others have asserted. Such error is inherent in the measurement of all but the most mundane criterion-related indicators. The net effect of measurement error, expressed psychometrically as the departure from unity of the criterion-related measure's reliability coefficient, is to attenuate the estimate of the true validity coefficient. The flaw in setting psychiatrists' clinical skill as the standard of criterion-related validity is the fact that psychiatric diagnoses are not external to lay diagnoses, because both come from the DIS.

The confounding of reliability and validity in the lay person/psychiatrist validation paradigm derives from the treatment of psychiatric diagnosis as an external criterion-related validity standard (Y) with respect to lay diagnosis (X). This places the focus of the distinction between X and Y on the interviewer and not on the instrument. Following this logic, it is easy to see how reliability can be transformed into validity through sufficient perturbations of X induced by procedural variation. In the words of Grove et al. (1981), posing the psychiatrist's diagnosis as a heuristic standard is just that - an unjustified solution to the problem.

The standard validity paradigm has been described by Robins et al. (1982) as "straddling the reliability vs. validity issue" (p. 856). Since the lay interviewer and the psychiatrist interviewer are qualitatively different, Robins et al. argue that the paradigm does not qualify as an ordinary reliability study, and since the DIS is not compared to another validated instrument or diagnostic criterion, the paradigm is not the "preferred" way of estimating validity.
Nevertheless, the outcome of research based on this paradigm is still more often identified as validity than as reliability. The paradigm has been legitimated by reference to Spitzer and Williams' (1980) concept of "procedural validity," representing some sort of amalgamation of reliability and validity. In fact, it seems more appropriate to refer to the comparison of diagnoses obtained by lay and psychiatric interviewers as "procedural reliability."


Dohrenwend (1989) seems to concur in his observations about the paradigm: "... convergence for lifetime diagnoses in retests done with the DIS itself which, although the type of interviewer is varied, is more a test of reliability than validity" (p. 44). The demonstrated concordance between lay and psychiatric interviewers does not stem from two qualitatively different measurements (one being criterion-related). It is nothing more than evidence of interrater reliability for the two systematically different types of interviewers under comparison. The common practice of reporting concordance for different types of diagnostic categories (i.e. patient types), for example, is no different in principle from reporting concordance between lay persons and psychiatrists (i.e. interviewer types). Neither procedure implicates validity. Since the psychiatric diagnosis itself has not been validated, the fact that the lay diagnosis agrees with the psychiatric diagnosis does not support the validity of the lay diagnosis. The concordance obtained between lay and psychiatric interviews merely establishes the reproducibility of diagnosis under varied interviewer conditions.

In still another view, Brown (1970) discusses procedural variations of conventional types of reliability, such as alternate test forms administered at two different times. In this wedding of the conventional procedures for estimating either alternate-forms or test-retest reliability, the resultant reliability estimates are referred to as lower-bound, since error variance is induced from two systematic sources (forms and time). Likewise, if systematically different interviewers render a diagnosis at two different times, the resultant concordance is a lower-bound reliability estimate (due to error variance induced by interviewers and time).

Thus, we do not dispute the epidemiological utility of lay DIS diagnoses indicated by their fairly respectable concordance with psychiatric diagnoses. But in the absence of empirical research linking DIS diagnoses to external (non-DIS) criterion-related standards, there is simply little that can be said regarding the psychometric validity of the DIS at the present time. Lay-psychiatric concordance is praiseworthy because of the economic utility gained in epidemiological research. It is not praiseworthy when it is taken as a proxy for validity.
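The attenuation argument made above has a standard quantitative form that is easy to overlook. Under classical test theory, an observed validity coefficient equals the true validity discounted by the square root of the product of the two measures' reliabilities, so an imperfectly reliable criterion lowers, but does not invalidate, the observed coefficient. The sketch below illustrates this with invented coefficients; it applies a textbook formula and is not an analysis drawn from the DIS literature.

    import math

    def attenuated_validity(true_validity, rel_x, rel_y):
        """Classical attenuation: r_xy = rho * sqrt(r_xx * r_yy), where rho is
        the true validity and r_xx, r_yy are the reliabilities of the measure
        and the criterion."""
        return true_validity * math.sqrt(rel_x * rel_y)

    def disattenuated_validity(observed_r, rel_x, rel_y):
        """Correction for attenuation: an estimate of the true validity
        coefficient given an observed correlation and both reliabilities."""
        return observed_r / math.sqrt(rel_x * rel_y)

    # Invented values: true validity .70, DIS reliability .80, and a fallible
    # criterion indicator whose reliability departs from unity (.60).
    observed = attenuated_validity(0.70, 0.80, 0.60)
    print(f"observed validity:  {observed:.3f}")   # about 0.485
    print(f"corrected estimate: {disattenuated_validity(observed, 0.80, 0.60):.3f}")

The correction shows why imperfect concordance among psychiatrists is not the paradigm's fatal flaw; the fatal flaw, as argued above, is that the supposed criterion is not external to the DIS at all.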

Comparison of DIS Diagnoses and Medical Charts

Robins et al. (1982) purported to have an external validation criterion embodied in medical-chart diagnoses which were not based on the DSM-III and which were rendered prior to both lay and psychiatric DIS interviews. Their results indicated that 23 to 88 percent (mean = 55 percent) of the lay diagnoses and 44 to 83 percent (mean = 63 percent) of the psychiatric diagnoses confirmed the previous diagnoses in patients’ medical charts; but medical charts, in turn, fared much worse by confirming only 27 percent of lay and 28 percent of psychiatric diagnoses. These findings led Robins et al. (1982) to conclude that “clinical practice is not an adequate standard against which to measure the validity of a research instrument” (p. 868). Yet they proceeded to argue that a better “yardstick” of validity is to compare lay DIS diagnoses with psychiatric diagnoses that have been validated by medical charts, reasoning that agreement between two imperfect criteria yields a more accurate yardstick than either alone. The finding that lay-psychiatric concordance increased by a mean of 6 percent when the psychiatric standard was confirmed by a medical-chart standard was interpreted as evidence that the initial estimates were too conservative.


This approach to validation and the conclusions drawn from it are questionable for several reasons. First, the researchers ignored the finding that, in three of the seven diagnostic categories examined, there was either no difference between lay and psychiatric concordances with the medical chart or greater lay medical-chart concordance. Logically, this would seem to further impugn the respectability of the psychiatric diagnosis as a criterion-related standard. If, when judged against external medical-chart diagnoses, lay DIS diagnoses are as concordant as or more concordant than psychiatric diagnoses for nearly half of the diagnostic categories, it is difficult to imagine how the latter can be seriously entertained as a standard in their own right.

Second, the proposition that the convergence of two imperfect diagnostic indicators establishes greater confidence in accuracy than either imperfect indicator alone is unfounded. The fact that lay interview results are more consistent with the combination does not imply that accuracy has increased. Moreover, since certain lay diagnoses concord with one standard as well as or better than the two standards concord with each other, the combination lacks integrity.

Finally, we question the logic behind the attempt to use the medical chart as an external standard of validity and to pose it as an alternative to psychiatrist DIS diagnoses, a tactic which has been followed elsewhere with even less reservation (Hendricks et al., 1983). If we do not lose sight of the original intention of the DIS and DSM-III, it is apparent that medical-chart diagnoses are subject to all the sources of unreliability that structured interviews and diagnostic criteria were designed to eliminate in the first place. Using medical records of diagnoses obtained in an unstandardized and unstructured fashion as a validity standard constitutes circular reasoning when juxtaposed with the original intent of the DIS. Why evaluate a research instrument by its capacity to reproduce the very judgments it was designed to replace? If unstructured, unstandardized clinical judgment is suspect, and medical records are based on such judgments, the extent to which DIS diagnoses concord with medical records may be cause for suspicion but not for claims of validity.
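The second objection can be demonstrated by simulation. In the toy model below (all parameters invented), two fallible diagnostic procedures share a source of error, much as lay and psychiatric DIS interviews share the instrument itself. Their concordance with each other then comfortably exceeds their concordance with the true classification, so agreement between the two says little about accuracy.

    import random

    random.seed(7)

    N = 10_000
    SHARED_ERROR = 0.25   # presentations that mislead any interviewer the same way
    PRIVATE_ERROR = 0.05  # each interviewer's own idiosyncratic slips

    truth, rater_a, rater_b = [], [], []
    for _ in range(N):
        t = 1 if random.random() < 0.3 else 0           # true case status
        # A misleading presentation flips what BOTH interviewers observe.
        seen = 1 - t if random.random() < SHARED_ERROR else t
        # Each interviewer additionally makes independent private errors.
        a = 1 - seen if random.random() < PRIVATE_ERROR else seen
        b = 1 - seen if random.random() < PRIVATE_ERROR else seen
        truth.append(t)
        rater_a.append(a)
        rater_b.append(b)

    def agreement(x, y):
        return sum(u == v for u, v in zip(x, y)) / len(x)

    print("rater A vs. rater B:", agreement(rater_a, rater_b))  # about 0.90
    print("rater A vs. truth:  ", agreement(rater_a, truth))    # about 0.73

The point is not that DIS interviewers behave exactly this way, but that mutual concordance between procedures with correlated error sources is evidence of reliability, not of accuracy.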

A Framework for Validation of Psychiatric Diagnosis

We propose that validational research be framed according to a three-step hierarchical order resembling Cloninger's (1989) reduction of Robins and Guze's (1970) five-phase approach to the validation of diagnostic criteria to the psychometric concepts of content, criterion-related, and construct validity. In the process of making successive assessments derived from classifying a respondent according to a DSM-III diagnosis, the approach would begin by establishing content validity, proceed to research on criterion-related validity, and finally arrive at construct validity.

Content validity assesses the degree to which the totality of items and procedures in the DIS combine with the corresponding algorithms to approximate specified DSM-III disorders. In terms of its manifest content, the contributions of the DIS to epidemiological research have been outstanding. The DIS is the product of a highly disciplined and meticulous orientation toward the DSM-III. Robins' (1989) recent discussion of this topic foreshadows even greater sensitivity toward content validity in the future.

Criterion-related validity, the next step in the hierarchy, involves concurrent and predictive assessments of the original DIS/DSM-III diagnosis.


We have seen that this type of validity is often confused with reliability in research on the internal coherence of the DIS and on its test-retest consistency when administered by qualitatively different interviewers. Distinct and separate from issues of reliability, the validity problem at this step involves assessing the respondent's DSM-III-relevant conduct and experience in the present and in the future by means of instruments and observations which do not rely upon the DIS. Generally, do such assessments away from the interview situation show that what the DIS uncovers forms some part of the respondent's daily life? For instance, are the hallucinations, delusions, and disruptive social behavior of a respondent with a DIS/DSM-III diagnosis of schizophrenia independently corroborated by other persons proximate to the respondent? Are there accompanying perceptions of impaired functioning? Can the respondent perform customarily expected social roles? The answers to these questions invite the development of field procedures in which the respondent's significant others are sampled to serve as witnesses of the disorder. These questions need attention in epidemiological research because what one instrument uncovers in a particular data-collection setting may or may not be consequential to the respondent's present or future life. If the DIS/DSM-III diagnosis is not consequential, the classification is ephemeral and devoid of general significance.

To illustrate the implementation of studies within the framework, we assume the present DIS distinction between the respondent's current diagnosis (C-DIS) and lifetime retrospectively based diagnosis (L-DIS) and assume, too, the collection of data relevant to criterion-related validity standards (CRVS). We present below a schematic representation of a research strategy for assessing the criterion-related validity of current DIS diagnosis and the retrospective accuracy of lifetime diagnosis.

ASSESSMENT                           TIME INTERVALS
                                  1          2          3

Criterion-related Validity
Standards                       CRVS       CRVS       CRVS

Current DIS/DSM-III
Diagnosis                       C-DIS      C-DIS      C-DIS

Lifetime Retrospective
DIS/DSM-III Diagnosis                      L-DIS      L-DIS

(In the original schematic, vertical arrows connect each C-DIS to the CRVS of the same time interval, left-to-right oblique arrows connect each C-DIS to the CRVS of later intervals, and right-to-left oblique arrows connect each L-DIS back to prior C-DIS diagnoses.)

Estimates of criterion-related validity can be made in two ways: concurrently, by assessing the congruence between the C-DIS and CRVS at each time period (as shown by the vertical arrows), and, predictively, by assessing the congruence between the C-DIS in one time period and the CRVS in a subsequent time period (as shown by the left-to-right oblique arrows).
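As a concrete illustration of how such estimates might be computed, the sketch below sets up hypothetical longitudinal data in the layout of the schematic. All diagnoses are invented, simple percent agreement stands in for whatever concordance statistic a real study would use, and the wave structure (three intervals, with the L-DIS absent at time 1) merely mirrors the figure.

    def agreement(x, y):
        """Percent agreement; a chance-corrected statistic such as kappa
        would be preferred in practice."""
        return sum(u == v for u, v in zip(x, y)) / len(x)

    # Invented diagnoses (1 = case, 0 = non-case) for six respondents observed
    # at three time intervals; the L-DIS is first collected at time 2.
    waves = {
        1: {"c_dis": [1, 0, 1, 0, 0, 1], "crvs": [1, 0, 1, 0, 1, 1], "l_dis": None},
        2: {"c_dis": [1, 0, 1, 0, 0, 1], "crvs": [1, 0, 1, 0, 0, 1],
            "l_dis": [1, 0, 1, 0, 0, 1]},
        3: {"c_dis": [1, 0, 0, 0, 0, 1], "crvs": [1, 0, 0, 0, 0, 1],
            "l_dis": [1, 0, 1, 0, 0, 1]},
    }

    for t in sorted(waves):
        # Concurrent validity: C-DIS against the CRVS of the same interval.
        print(f"concurrent, time {t}:",
              agreement(waves[t]["c_dis"], waves[t]["crvs"]))
        # Predictive validity: C-DIS now against the CRVS one interval later.
        if t + 1 in waves:
            print(f"predictive, time {t} -> {t + 1}:",
                  agreement(waves[t]["c_dis"], waves[t + 1]["crvs"]))
        # Retrospective accuracy: L-DIS against a prior, validated C-DIS.
        if waves[t]["l_dis"] is not None:
            print(f"retrospective, time {t} -> 1:",
                  agreement(waves[t]["l_dis"], waves[1]["c_dis"]))

In a real study, the prior C-DIS would enter the retrospective comparison only after its own concurrent and predictive validity had been established, as the following paragraph requires.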


When such congruence is determined over successive time periods, our understanding of the stability of criterion-related validity estimates of current psychiatric diagnosis is enhanced. Once the concurrent and predictive validity of the C-DIS is established, the issue of the retrospective accuracy of the L-DIS can be meaningfully addressed. Retrospective accuracy would be estimated by determining whether or not the L-DIS uncovers cases previously identified as C-DIS diagnoses that in turn have been psychometrically validated. The concordance between the L-DIS and one or more C-DIS diagnoses that are both prior and validated would constitute evidence of retrospective accuracy (as shown by the right-to-left oblique arrows in the schematic representation).

Whereas criterion-related validity involves the external generalizability of diagnosis across space and time, construct validity - the validation hierarchy's third step - poses substantially more refined and abstract questions about the distinctiveness of patterns associated with specific psychiatric disorders. It is for this reason that Cloninger (1989) recognizes that construct validity explicitly involves the hypothetico-deductive method. It is hypothetical because theoretical premises are invoked which designate the proximate causes of the disorder, its correlates, and its consequences. It is deductive because predictions are advanced from these theoretical premises regarding expected research outcomes. Thus, construct validity surrounds the psychiatrically diagnosed disorder with undergirding theoretical premises and the predictions which derive from them. For example, in psychopharmacological research theoretical assumptions predict that differential reactions to the ingestion of specific drugs are associated with diverse disorders (Klein, 1989); in twin and family studies the prevailing assumption of genetic transmission of a disorder leads to the deduction that, given a positive case, the transitional probabilities of the disorder among genetically more similar family members progressively exceed independent probabilities. Construct validation efforts can be extended beyond organic variables to include cultural and psychosocial processes. However, this line of research remains woefully undeveloped.

Application of the three-step hierarchy in structuring DIS/DSM-III validation research presupposes research developments in each previous step, and each step carries with it an array of useful psychometric methodologies.

Acknowledgements - This research was supported in part by a grant (No. 2R01MH 30569) from the National Institute of Mental Health, Services Research Branch, to Lloyd H. Rogler. The authors thank Gerald Gurin, Charles E. Holzer, III, Stasia Madrigal, Janet T. Cohen, and Ivette Estrada for their comments on the manuscript.

References

Anthony, J. C., Folstein, M. F., Romanoski, A. J., Von Korff, M. R., Nestadt, G. R., Chahal, R., Merchant, A., Hendricks Brown, C., Shapiro, S., Kramer, M., & Gruenberg, E. M. (1985). Comparison of the lay Diagnostic Interview Schedule and a standardized psychiatric diagnosis: Experience in Eastern Baltimore. Archives of General Psychiatry, 42, 667-675.

Blashfield, R. K. (1989). Alternative taxonomic models of psychiatric classification. In Robins, L. N., & Barrett, J. E. (Eds.), The Validity of Psychiatric Diagnosis (pp. 19-34). New York: Raven Press.

Brown, F. (1970). Principles of Educational and Psychological Testing. Hinsdale, IL: Dryden Press.

Cloninger, C. R. (1989). Establishment of diagnostic validity in psychiatric illness: Robins and Guze's method revisited. In Robins, L. N., & Barrett, J. E. (Eds.), The Validity of Psychiatric Diagnosis (pp. 9-18). New York: Raven Press.


Cronbach, L. J. (1970). Essentials of Psychological Testing. New York: Harper and Row.

Dohrenwend, B. P. (1989). The problem of validity in field studies of psychological disorders revisited. In Robins, L. N., & Barrett, J. E. (Eds.), The Validity of Psychiatric Diagnosis (pp. 35-55). New York: Raven Press.

Folstein, M. F., Romanoski, A. J., Nestadt, G., Chahal, R., Merchant, A., Shapiro, S., Kramer, M., Anthony, J., Gruenberg, E. M., & McHugh, P. R. (1985). Brief report on the clinical reappraisal of the Diagnostic Interview Schedule carried out at the Johns Hopkins site of the Epidemiological Catchment Area Program of the NIMH. Psychological Medicine, 15, 809-814.

Grove, W. M., Andreasen, N. C., & McDonald-Scott, P. (1981). Reliability studies of psychiatric diagnosis. Archives of General Psychiatry, 38, 408-413.

Helzer, J. E., Robins, L. N., McEvoy, L. T., Spitznagel, E. L., Stoltzman, R. K., Farmer, A., & Brockington, I. F. (1985). A comparison of clinical and Diagnostic Interview Schedule diagnoses: Physician reexamination of lay-interviewed cases in the general population. Archives of General Psychiatry, 42, 657-666.

Helzer, J. E., Spitznagel, E. L., & McEvoy, L. T. (1987). The predictive validity of lay Diagnostic Interview Schedule diagnoses in the general population. Archives of General Psychiatry, 44, 1069-1077.

Hendricks, L. E., Bayton, J. A., Collins, J. L., Mathura, C. B., McMillan, S. R., & Montgomery, T. A. (1983). The NIMH Diagnostic Interview Schedule: A test of its validity in a population of Black adults. Journal of the National Medical Association, 75, 667-671.

Keller, M. B., Lavori, P. W., McDonald-Scott, P., Scheftner, W. A., Andreasen, N. C., Shapiro, R. W., & Croughan, J. (1981). Reliability of lifetime diagnoses and symptoms in patients with a current psychiatric disorder. Journal of Psychiatric Research, 16, 229-240.

Klein, D. F. (1989). The pharmacological validation of psychiatric diagnosis. In Robins, L. N., & Barrett, J. E. (Eds.), The Validity of Psychiatric Diagnosis (pp. 203-216). New York: Raven Press.

National Institute of Mental Health, Epidemiology and Psychopathology Branch, Division of Clinical Research (1988). Publications List Updating the ECA Bibliography. Rockville, MD: Author.

Robins, E., & Guze, S. B. (1970). Establishment of diagnostic validity in psychiatric illness: Its application to schizophrenia. American Journal of Psychiatry, 126, 983-987.

Robins, L. N., Helzer, J. E., Croughan, J., & Ratcliff, K. S. (1981). National Institute of Mental Health Diagnostic Interview Schedule. Archives of General Psychiatry, 38, 381-389.

Robins, L. N., Helzer, J. E., Ratcliff, K. S., & Seyfried, W. (1982). Validity of the Diagnostic Interview Schedule, Version II: DSM-III diagnoses. Psychological Medicine, 12, 855-870.

Robins, L. N. (1985). Epidemiology: Reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry, 42, 918-924.

Robins, L. N. (1989). Diagnostic grammar and assessment: Translating criteria into questions. In Robins, L. N., & Barrett, J. E. (Eds.), The Validity of Psychiatric Diagnosis (pp. 263-278). New York: Raven Press.

Rogler, L. H., Malgady, R. G., & Tryon, W. W. (in press). Evaluation of mental health: Issues of memory in the Diagnostic Interview Schedule. Journal of Nervous and Mental Disease.

Rubio-Stipec, M., Shrout, P. E., Bird, H., Canino, G., & Bravo, M. (1989). Symptom scales of the Diagnostic Interview Schedule: Factor results in Hispanic and Anglo samples. Psychological Assessment: Journal of Consulting and Clinical Psychology, 1, 30-34.

Spitzer, R. L., & Williams, J. B. W. (1980). Classification of mental disorders and DSM-III. In Kaplan, H., Freedman, A., & Sadock, B. (Eds.), Comprehensive Textbook of Psychiatry (3rd ed., pp. 1035-1072). Baltimore: Williams and Wilkins.

Standards for Educational and Psychological Testing (1985). Washington, DC: American Psychological Association.