Structured clinical interviews and the inferential process

Journal of School Psychology, Vol. 30, pp. 141-149. Pergamon Press Ltd. Printed in the USA.
0022-4405/92 $5.00 + .00 © 1992 The Journal of School Psychology, Inc.

Philip A. Saigh
Graduate School of the City University of New York

This article reviews the history of psychiatric nosology and the use of interview data as a vehicle for formulating clinical inferences. Attention is focused on the qualities of structured interviews as well as procedures for constructing these indices and methods for establishing their psychometric properties. Practical and theoretical limitations relating to formal systems of classification and structured clinical interviews are reviewed and future directions are considered.

A SHORT HISTORY OF PSYCHIATRIC NOSOLOGY

Strongly influenced by the comprehensive approach of Emil Kraepelin (1856-1926) to the description of the etiology, course, and symptomology of psychiatric disorders, the American Medico-Psychological Association (i.e., the forerunner of the American Psychiatric Association [APA]) developed a formal nosology in 1917 (Copp, 1917). The APA proceeded to publish the Standard Classified Nomenclature of Disease in 1934. Although the 1934 nosology reflected an attempt to expand on the earlier model, it was criticized for being too limited in scope: "Only 10% of the total cases seen fell into any of the categories originally seen in public mental health hospitals" (APA, 1952, p. 5). The nosology was also criticized for not providing operational descriptions. Prompted by these concerns, the APA published the Diagnostic and Statistical Manual of Mental Disorders (DSM-I) in 1952. Unlike its predecessors, the DSM-I provided a modicum of increased coverage by listing 108 classifications and general symptomological descriptions. Investigators were, however, critical of the nonoperational nature of these descriptors (Morey, Skinner, & Blashfield, 1986).

The APA's Committee on Nomenclature and Statistics subsequently reconvened and issued the DSM-II in 1968. Although this model provided broader coverage by listing 182 disorders, it too was criticized for presenting nonoperational descriptors. Moreover, researchers were quite critical about the low levels of reliability that were associated with DSM-II diagnoses (Feighner et al., 1972; Spitzer & Fleiss, 1974). Given these concerns, the APA appointed a task force to work on the development of the DSM-III in 1975.

Received November 27, 1990; final revision received September 3, 1991.
Address correspondence and reprint requests to Philip A. Saigh, Graduate School, City University of New York, 33 West 42nd Street, New York, NY 10036.

Under the direction of Robert Spitzer,

psychiatrists, psychologists, and social workers carried out a wide range of field trials involving approximately 12,000 cases. In each instance, practitioners prepared detailed symptomological profiles that were eventually used to generate operational descriptors for the various classifications. These efforts culminated in the publication of the DSM-III in 1980. Unlike its predecessors, the 1980 nosology provided greater coverage by listing 265 classifications and detailed diagnostic criteria. Following its publication, the DSM-III gained rapid acclaim and came to serve as a lingua franca for mental health practitioners in the United States (Williams, 1988).

Despite the apparent success of the DSM-III, revisionary efforts were initiated in 1983. This work was prompted by research suggesting that some of the diagnostic criteria needed to be modified (Williams, 1988). Field trials involving approximately 1,200 cases were subsequently conducted (APA, 1987). In the main, these trials served to synchronize current nosological research and diagnostic criteria for a number of classifications. The DSM-III-R was subsequently published in 1987. Although it is too soon to comment on its usefulness, a functional analysis indicates that more explicit diagnostic criteria were provided in a number of instances. It is also apparent that the amount of general information that is listed for the 334 classifications (e.g., course and associated features) was considerably expanded.

ADVANTAGES OF A DIAGNOSTIC MANUAL

A formal nosology offers a number of important advantages (Morey et al., 1986; Spitzer & Williams, 1980). Initially, it may serve as a basis for communication. Without this, practitioners would confuse their communications by using different terms to describe the same conditions and identical terms to describe different phenomena. In a similar vein, a well conceived nosology may facilitate the storage and retrieval of information. Finally, an empirically derived diagnosis coupled with information as to course, direction, and associated characteristics should lead to the provision of a treatment of choice.

INTERVIEWS

Interviews have traditionally been used to establish inferences in clinical settings. Kolb's preface to The Psychiatric Interview in Clinical Practice (1979) indicates that "the interview is the most important technical instrument in all of the professions concerned with man and social functioning" (p. vii). In effect, interviews involve a series of interactions, often in question-and-answer format, that are used as a basis for formulating inferences. Examined operationally, interviews may follow an unstructured or structured format. The former involves nonstandardized questions or probes that are based on the examiner's personal knowledge and ongoing clinical observations. The latter entails a standardized process wherein examiners present questions according to a specified protocol.

Although unstructured interviews may yield a wealth of information, they are more susceptible to information variance (i.e., differences in the quantity and quality of information that is obtained from the same examinee by different examiners) (Spitzer, Endicott, & Robins, 1978). Perloff, Craft, and Perloff (1984) also caution that unstructured interviewers formulate decisions "very early in the interview (primacy effect); negative information about the interviewee seems to be weighted more heavily than positive information; visual cues are more important than verbal cues" (p. 424). In a similar vein, Edelbrock and Costello (1990) observed that unstructured interviews are "one of the least trustworthy assessment procedures, subject to broad variations in content, style, detail, and coverage" (p. 308). Given this propensity for variation, it is not surprising that unstructured interviews generally yield low estimates of reliability.

STRUCTURED INTERVIEWS

According to the Edelbrock and Costello (1990) definition, a structured interview essentially consists of "a list of target behaviors, symptoms, and events to be covered, guidelines for conducting the interview and procedures for recording the data" (p. 311). With this in mind, Hedlund and Vieweg (1981) reported that "carefully structured, well defined rating scales help to make clinical observations more reliable across different settings" (p. 39). To date, a number of DSM-III- or DSM-III-R-based structured interviews have been developed to diagnose psychiatric morbidity among children and adolescents.

By way of example, Puig-Antich and Chambers (1978) developed the Schedule for Affective Disorders and Schizophrenia for School Age Children 6-18 Years (Kiddie-SADS, or K-SADS). This interview was designed to assess current and past (i.e., lifetime) psychopathology. Since its development, the K-SADS has been repeatedly revised, and two complementary versions (i.e., clinical and epidemiological) are currently available. The clinical version (K-SADS-P) (Puig-Antich & Ryan, 1986) is intended to diagnose current episodes of psychopathology, while the epidemiological version (K-SADS-E) (Orvaschel & Puig-Antich, 1987) is primarily intended to denote the lifetime prevalence of psychopathology. In a similar vein, the Diagnostic Interview for Children (DICA) was designed for clinical and epidemiological research involving children and adolescents (Welner, Reich, Herjanic, Jung, & Amado, 1987). Currently, two versions of the DICA are available. The DICA-A (a child and adolescent interview) and the DICA-P (a parent interview) provide DSM-III-R diagnoses for the majority of the psychiatric disorders that are applicable to children and adolescents (Reich, 1991a, 1991b). Given the structured nature of the K-SADS and DICA, it is not surprising that both measures offer adequate interrater reliability (Chambers et al., 1985; Welner et al., 1987). Moreover, both measures have effectively discriminated between pediatric psychiatric cases with divergent diagnoses (Chambers et al., 1985; Welner et al., 1987). For a fuller review of these and related measures, the reader is referred to Edelbrock and Costello (1990) and Spiker and Ehler (1984).

Development Despite the increased interest in structured interviews, the clinical literature does not provide guidelines on test development. On the other hand, guidelines for developing are apparent

tests in general

and structured

this context, completed, goes a long way toward generating observations

interviews

in particular

in the annals of measurement and survey research. Viewed in Cronbach (1971) o b served that “the universe of definition, once the items to be used or the

to be made” (p. 459). If one remains with the operational

tors used in the DSM-III-R,

sity involve the generation of content-valid must determine if the test items adequately (e.g., DSM-III-R diagnostic criteria) this principle, survey research (Borg

items. In effect, test developers sample the universe of behaviors

that they purport to measure. Given & Gall, 1983; Sudman & Bradburn,

1982) suggests that the following corollaries 1. Clarity is paramount.

descrip-

the initial issue for test developers will by neces-

are important:

If one seeks to make valid inferences,

items should

be interpreted in the same way by different examinees. 2. The proportions of items to criterion must be constant. 3. Short items are recommended,

as they are easier to understand.

4. Compound items that require a single answer for two or more questions (e.g., “Have you been having repeated and senseless thoughts, and have you acted on these thoughts?“) should be avoided. 5. Technical jargon should be avoided, as this may confuse examinees. 6. Negatively phrased items eliciting a positive response (e.g., “Don’t you feel anxious?“) should not be used, as examinees may respond in a manner that is inconsistent with their symptomology. 7. A brief, clear, and standardized examinees. 8. The examiners

introduction

should be provided for the

should be provided with clear instructions

as to adminis-

tration and scoring. 9. Peer evaluations of the experimental protocol should be obtained. 10. The examiners should be trained to administer and score the interview. 11. The interview should be field-tested on a modest sample (n = 25). 12. Following field testing, written evaluations should be obtained from the examiners and examinees. 13. Revisions should then be made as indicated. 14. If necessary, field testing should be repeated. By way of example, matic Stress Disorder

Table

1 presents a subtest of the Children’s

Inventory

(Saigh,

Posttrau-

1989a) and the section of DSM-III

145

Saigh Table Children’s

1

Posttraumatic Stress Disorder Inventory Corresponding DSM-III Criterion

Subtest and

Questions 1. SAY: “Are you having a lot of unwanted thoughts about the experience that you described?” RECORD: Yes _ No _ 2. SAY: “Are you having bad dreams about this?” RECORD: Yes _ No _ 3. SAY: “Do you sometimes feel as if this experience is about to happen again?” RECORD: Yes _ No _ 1. If the Examinee 2. If the Examinee

Scoring said “No” to Questions 1, 2, and 3, RECORD said “Yes” to Questions 1, 2, or 3, RECORD SCORE

DSM-III criterion Reexperiencing the trauma 1. Recurrent and intrusive 2. Recurrent dreams of the 3. Sudden acting or feeling

on which it is based.

a 0 in the scoring box. a 1 in the scoring box.

I

is evidenced by at least one of the following: recollections of the event. event. as if the traumatic event were reoccurring.

Examined in this context, a correspondence between the test items and diagnostic criteria is apparent. Moreover, standardized directions for administration and scoring are provided. Clear, short, and positive items are also evident.
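The Table 1 scoring rule reduces to a simple disjunction: the subtest scores a 1 if the examinee endorses any of the three questions, and a 0 otherwise. A minimal sketch in Python (the function name and data layout are illustrative, not code from the published inventory):

```python
# Sketch of the Table 1 scoring rule: RECORD a 1 if the examinee said "Yes"
# to Question 1, 2, or 3; RECORD a 0 if all three answers were "No".
# Hypothetical helper, not part of the Children's PTSD Inventory itself.

def score_reexperiencing_subtest(answers):
    """answers: 'Yes'/'No' responses to Questions 1-3, in order."""
    return 1 if any(answer == "Yes" for answer in answers) else 0

print(score_reexperiencing_subtest(["No", "No", "No"]))   # 0
print(score_reexperiencing_subtest(["No", "Yes", "No"]))  # 1
```

The same any-one-symptom pattern applies whenever a DSM criterion is phrased as "evidenced by at least one of the following."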

Reliability

Although structured interviews have enjoyed considerable use in recent years, developers frequently fail to establish the reliability of these measures (Fisher & Martin, 1981). This problem is particularly apparent when investigators develop an interview for a particular disorder (e.g., separation anxiety disorder) as part of a larger clinical or epidemiological trial. In view of this, and as the accuracy and consistency of measurement may affect the quality of subsequent inferences, the estimation of reliability can only be seen as an integral part of the test development process.

From the outset, it should be understood that structured interviews yield nominal data (i.e., positive or negative diagnoses). With this in mind, two procedures can be used to gauge reliability. The first involves a test-retest methodology by which the same examinees are independently tested by two examiners. After testing, the number of agreements between examiners is divided by the number of agreements and disagreements and multiplied by 100. Although percentage of agreement is easy to calculate, it should be interpreted with caution, as it may lead to unreliable estimates because it does not control for chance agreements. In order to avoid this form of error,


Cohen's (1960) coefficient, kappa, may be used. Given that the derivation and calculation of coefficient kappa is rather complex, readers may wish to review Cohen's 1960 article for a detailed description.

Validity

Once the role of content validity has been established, three additional procedures should be considered.

The first is referred to as sensitivity (i.e., the percentage of true positives). The second is specificity (i.e., the percentage of true negatives). The third is predictive power (i.e., the percentage of true positives and true negatives). Inasmuch as textbooks in the area of psychological measurement do not usually address these procedures, the following example may serve as an illustration. Assume that 100 students presented for evaluation and that two highly experienced (i.e., expert) psychologists conducted independent, unstructured interviews with each student. Also assume that the expert examiners agreed that 60 students met diagnostic criteria for a DSM-III-R Conduct Disorder diagnosis and that a novice examiner administered a structured interview to the cases in question. Table 2 presents the hypothetical expert-novice diagnoses and the corresponding estimates of sensitivity, specificity, and predictive power.
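The indices described in the Reliability and Validity sections can all be computed from a 2 x 2 expert-versus-novice table. The following sketch reproduces the Table 2 figures in Python; the function names are illustrative, and kappa is included to show the chance-corrected alternative to raw percentage of agreement:

```python
# Indices for a 2 x 2 diagnostic table. Cell counts follow Table 2:
# tp = expert and novice both diagnose the disorder, tn = both assign
# another diagnosis, fp/fn = the two kinds of disagreement.

def percent_agreement(tp, fn, fp, tn):
    """Agreements divided by agreements plus disagreements, times 100."""
    return 100.0 * (tp + tn) / (tp + fn + fp + tn)

def cohens_kappa(tp, fn, fp, tn):
    """Chance-corrected agreement, (p_o - p_e) / (1 - p_e), after Cohen (1960)."""
    n = tp + fn + fp + tn
    p_o = (tp + tn) / n                                             # observed agreement
    p_e = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)

def sensitivity(tp, fn):
    return tp / (tp + fn)        # proportion of true positives detected

def specificity(tn, fp):
    return tn / (tn + fp)        # proportion of true negatives detected

def predictive_power(tp, tn, n):
    return (tp + tn) / n         # true positives plus true negatives over all cases

tp, fn, fp, tn = 55, 5, 5, 35    # Table 2 cell counts (n = 100)
print(percent_agreement(tp, fn, fp, tn))        # 90.0
print(round(sensitivity(tp, fn), 2))            # 0.92
print(round(specificity(tn, fp), 2))            # 0.88
print(round(predictive_power(tp, tn, 100), 2))  # 0.9
print(round(cohens_kappa(tp, fn, fp, tn), 2))
```

With these counts kappa comes to roughly .79, noticeably below the raw 90% agreement; the gap is exactly the chance-agreement correction described in the Reliability section.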

Limitations

By their very nature, structured interviews are intended to limit information variance. Although standardization may augment reliability, the exclusive use of structured interviews may preclude the use of clinically valuable questions. This concern may be allayed, however, by conducting nonstructured interviews before or after the administration of a structured interview.

A second concern involves the fact that structured interviews entail self-reports, and this form of assessment "may serve instrumentally to achieve other anticipated consequences for the individual, leading to responses that

Table 2
Hypothetical Expert Versus Novice Diagnoses and Estimates of Sensitivity, Specificity, and Predictive Power

                      Expert diagnoses
Novice diagnoses      Conduct disorder (N = 60)   Other diagnoses (N = 40)   Total
Conduct Disorder      55                          5                          60
Other Diagnoses       5                           35                         40
Total                 60                          40                         100

Note. Sensitivity: 55/60 = .92; specificity: 35/40 = .88; predictive power: (55 + 35)/100 = .90.


are simply untrue and that depend on the individual's perception of the situation" (Borkovec, Weerts, & Bernstein, 1977, p. 405). In view of this, the position recommended here is to formulate inferences on the basis of a variety of measures (e.g., overt behavior, physiological ratings, reports by significant others, as well as structured interviews).

Of course, the validity of the criterion for which a test is developed is a paramount concern in any measurement situation. It should be recalled in this respect that the DSM-III-R presents 265 disorders and that the validity of a taxonomy of mental disorders varies across disorders. Viewed along these lines, empirical case-control studies have provided greater support for the syndromal validity of some disorders as compared to others (Blanchard, Kolb, Prins, Gates, & McCoy, 1991; Saigh, 1989b; Skodol, 1989). Consequently, the sensitivity and specificity of structured interviews may vary as a function of the validity of the disorder that they purport to measure.

Finally, psychiatric nosology and structured interviews have been challenged on the grounds that assigning an individual to a psychiatric classification will initiate a self-fulfilling prophecy (Morey et al., 1986; Rosenhan, 1973; Szasz, 1970). Labelling persons as "deviant," it is argued, may bring them into more frequent association with other "deviants" (i.e., psychiatric patients) or may lead to their social isolation. In either case, the individual can only become more disturbed. Although this argument provides a degree of intuitive appeal, Morey et al. (1986) counter by noting that little empirical support is available to buttress the self-fulfilling formulation. In any event, the virtues of the structured interview are not diminished should one abstain from formulating a diagnosis.

FUTURE DIRECTIONS

An overview of the clinical literature clearly indicates that the structured interview is assuming more importance as an integral part of the clinical and research process. Over time it is expected that professional educators will train their students in this device and that this in turn will lead to its increased use in clinical practice. It is also anticipated that ongoing nosological research will lead to additional refinements. At the moment, work is progressing on the DSM-IV, and the publication of this system will necessitate the revision of content-valid measures like the K-SADS and DICA.

It should also be understood that the nosologies that were considered here are the product of U.S. psychiatry and that other systems of classification (e.g., the World Health Organization (1979) International Classification of Diseases-9) present different classifications and diagnostic criteria. Like the American Psychiatric Association's models, however, these systems have been repeatedly revised, and it is anticipated that future revisions will mandate the modifications of content-valid instruments like the ICD-9's Present State Exam (Wing, 1977). Finally, more rigorous criteria for publication will, over time, compel test developers to determine and report the psychometric properties of these measures.

ACKNOWLEDGEMENTS

The author thanks Professors George W. Hynd and Florent Dumont as well as the anonymous reviewers for their detailed feedback on earlier drafts of this paper.

REFERENCES

American Psychiatric Association. (1934). Standard classified nomenclature of disease. Washington, DC: Author.
American Psychiatric Association. (1952). Diagnostic and statistical manual of mental disorders. Washington, DC: Author.
American Psychiatric Association. (1968). Diagnostic and statistical manual of mental disorders (2nd ed.). Washington, DC: Author.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
Blanchard, E. B., Kolb, L. C., Prins, A., Gates, S., & McCoy, G. C. (1991). Changes in plasma norepinephrine to combat-related stimuli among Vietnam veterans with posttraumatic stress disorder. Journal of Nervous and Mental Disease, 179, 371-373.
Borg, W. R., & Gall, M. D. (1983). Educational research: An introduction (4th ed.). New York: Longman.
Borkovec, T. D., Weerts, T. C., & Bernstein, D. A. (1977). Assessment of anxiety. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 367-428). New York: Wiley.
Chambers, W., Puig-Antich, J., Hirsch, M., Paez, P., Ambrosini, P., Tabrizi, M., & Davies, M. (1985). The assessment of affective disorders in children and adolescents by semi-structured interview: Test-retest reliability of the K-SADS. Archives of General Psychiatry, 42, 696-702.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Copp, O. (1917). Report of the Committee on Statistics of the American Medico-Psychological Association. American Journal of Insanity, 74, 255-270.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (pp. 443-507). Washington, DC: American Council on Education.
Edelbrock, C., & Costello, A. J. (1990). Structured interviews for children and adolescents. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (2nd ed., pp. 308-324). Elmsford, NY: Pergamon Press.
Feighner, J. P., Robins, E., Guze, S. B., Woodruff, R. A., Winokur, G. W., & Munoz, R. (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry, 26, 57-63.
Fisher, S. R., & Martin, C. J. (1981). Standardized interviews in psychiatry: Issues in reliability. British Journal of Psychiatry, 139, 138-143.
Hedlund, J. L., & Vieweg, B. W. (1981). Structured psychiatric interviews: A comparative review. Journal of Operational Psychiatry, 12, 39-67.
Kolb, L. C. (1979). Foreword. In R. A. MacKinnon & R. Michels (Eds.), The psychiatric interview in clinical practice (pp. i-vii). Philadelphia: W. B. Saunders.
Morey, L. C., Skinner, H. A., & Blashfield, R. K. (1986). Trends in the classification of abnormal behavior. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (2nd ed., pp. 47-78). New York: Wiley.
Orvaschel, H., & Puig-Antich, J. (1987). Schedule for Affective Disorders and Schizophrenia for School Age Children, Epidemiological Version: Kiddie-SADS-E. Unpublished interview schedule. Pittsburgh, PA: Western Psychiatric Institute and Clinic.
Perloff, R., Craft, J. A., & Perloff, E. (1984). Testing and industrial application. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (pp. 421-440). New York: Pergamon.
Puig-Antich, J., & Chambers, W. (1978). Schedule for Affective Disorders and Schizophrenia for School Age Children. Unpublished interview schedule. New York, NY.
Puig-Antich, J., & Ryan, N. (1986). Schedule for Affective Disorders and Schizophrenia for School Age Children 6-18 Years: Kiddie-SADS-Present Episode. Unpublished interview schedule. Pittsburgh, PA: Western Psychiatric Institute and Clinic.
Reich, W. (1991a). Diagnostic Interview for Children-Child Version. Unpublished interview schedule. Washington University School of Medicine, Department of Psychiatry.
Reich, W. (1991b). Diagnostic Interview for Children-Parent Version. Unpublished interview schedule. Washington University School of Medicine, Department of Psychiatry.
Rosenhan, D. L. (1973). On being sane in insane places. Science, 179, 250-258.
Saigh, P. A. (1989a). The development and validation of the Children's Posttraumatic Stress Disorder Inventory. International Journal of Special Education, 4, 75-84.
Saigh, P. A. (1989b). The validity of the DSM-III posttraumatic stress disorder classification as applied to children. Journal of Abnormal Psychology, 98, 189-192.
Skodol, A. (1989). Problems in differential diagnosis. Washington, DC: American Psychiatric Press.
Spiker, D. G., & Ehler, J. G. (1984). Structured psychiatric interviews for adults. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (pp. 291-306). New York: Pergamon.
Spitzer, R. L., Endicott, J., & Robins, E. (1978). Research diagnostic criteria: Rationale and reliability. Archives of General Psychiatry, 35, 773-782.
Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341-347.
Spitzer, R. L., & Williams, J. B. (1980). Classifications of mental disorders in the DSM-III. In H. Kaplan, A. Freedman, & B. Sadock (Eds.), Comprehensive textbook of psychiatry (3rd ed., pp. 1035-1072). Baltimore: Williams & Wilkins.
Sudman, S., & Bradburn, N. M. (1982). Asking questions. San Francisco: Jossey-Bass.
Szasz, T. S. (1970). The manufacture of madness. New York: Harper & Row.
Welner, F., Reich, W., Herjanic, B., Jung, K., & Amado, H. (1987). Reliability, validity, and parent-child agreement studies of the Diagnostic Interview for Children and Adolescents (DICA). Journal of the American Academy of Child and Adolescent Psychiatry, 25, 649-653.
Williams, J. B. (1988). Psychiatric classification. In J. A. Talbott, R. E. Hales, & S. C. Yudofsky (Eds.), Textbook of psychiatry (pp. 201-224). Washington, DC: American Psychiatric Press.
Wing, J. K. (1977). Reliability of the PSE (9th ed.). Psychological Medicine, 7, 505-516.
World Health Organization. (1979). The international classification of diseases, clinical modification (9th ed.). Ann Arbor, MI: Edwards.