Efficiency of Diagnostic Criteria for Attention Deficit Disorder: Toward an Empirical Approach to Designing and Validating Diagnostic Algorithms


STEPHEN V. FARAONE, PH.D., JOSEPH BIEDERMAN, M.D., SUSAN SPRICH-BUCKMINSTER, B.A., WEI CHEN, M.D., AND MING T. TSUANG, M.D., PH.D.

Abstract. Using structured psychiatric interviews, 73 attention deficit disorder (ADD) patients, 26 psychiatric control patients, 26 normal controls, and all available first-degree relatives of these index children were examined. ADD subgroups with and without comorbid psychiatric disorders did not differ on rates of specific ADD symptoms. The construct of ADD is internally consistent as measured by Cronbach's alpha. The diagnostic efficiency of individual items is presented. A receiver operating characteristic-based procedure is used to create an ADD diagnostic algorithm that is more efficient in discriminating ADD children from controls than the DSM-III-based clinical diagnosis. Cross-validation with family study data shows this procedure to be superior to the procedure used for the DSM-III-R diagnosis. The results show that proponents of conditional probability and receiver operating characteristic analyses are correct in asserting that the examination of symptom combinations may result in better diagnostic algorithms. J. Am. Acad. Child Adolesc. Psychiatry, 1993, 32, 1:166-174. Key Words: attention deficit disorder, diagnostic efficiency, comorbidity, family study.

Accepted April 14, 1992. Dr. Faraone is Director of Research, Pediatric Psychopharmacology Unit, Massachusetts General Hospital and Assistant Professor of Psychology, Harvard Medical School, Boston, Massachusetts. Dr. Biederman is Chief, Pediatric Psychopharmacology Unit, Child Psychiatry Service, Massachusetts General Hospital and Associate Professor of Psychiatry, Harvard Medical School. Ms. Sprich-Buckminster is Research Assistant, Pediatric Psychopharmacology Unit, Massachusetts General Hospital, Boston, Massachusetts. Dr. Chen is a doctoral student, Department of Epidemiology, Harvard School of Public Health, and Psychiatry Service, Brockton-West Roxbury VA Medical Center. Dr. Tsuang is Professor of Psychiatry, Department of Psychiatry, Harvard Medical School, and Director of the Joint Program in Psychiatric Epidemiology, Harvard Schools of Medicine and Public Health, and Chief of Psychiatry, Psychiatry Service, Brockton-West Roxbury VA Medical Center. This work was supported, in part, by grants from the Charlupski Foundation (J.B.), as well as USPHS (NIMH) grant R01 MH-41314-01A1 (J.B.). We thank Dr. Kerim Munir and Virginia Wright, B.A., for their help with this project, as well as Dr. Michael Jellinek for his encouragement. This manuscript was prepared while Dr. Tsuang was a Fellow at the Center for Advanced Study in the Behavioral Sciences. We are grateful for the financial support provided to the Center for this fellowship year by the John D. and Catherine T. MacArthur Foundation and the Foundations Fund for Research in Psychiatry Endowment. Reprint requests to Dr. Biederman, Pediatric Psychopharmacology Unit (ACC 725), Massachusetts General Hospital, Fruit Street, Boston, MA 02114. 0890-8567/93/3201-0166$03.00/0 ©1993 by the American Academy of Child and Adolescent Psychiatry.

The diagnostic criteria for attention deficit disorder (ADD) have undergone a considerable evolution from DSM-II's "hyperkinetic syndrome of childhood" to DSM-III-R's "attention-deficit hyperactivity disorder" (ADHD), and further changes are planned in DSM-IV. Although this process has made dramatic changes in symptom content, the choice of symptoms has been based mostly on a process of consensus among committee members rather than by examination of large databases. For example, Spitzer et al. (1990) indicate that the DSM-III-R Advisory Committee responsible for the development of diagnostic criteria for ADHD agreed to abandon the grouping of diagnostic items into the three categories of inattention, impulsivity, and hyperactivity. However, they disagreed about the relative utility of different items and on how many items should be required to establish the diagnosis. The analysis of the DSM field trials has used sophisticated methods to choose diagnostic cutpoints (Spitzer et al., 1990) or to suggest changes in algorithms (Siegel et al., 1990; Spitzer and Siegel, 1990). However, such methods might be more profitably applied to the selection of items by the committee.

Despite taxonomic difficulties, there is considerable evidence for the validity of a syndrome of inattentiveness, impulsivity, and hyperactivity affecting large numbers of children in geographically and ethnically varied groups. For simplicity of exposition the term ADD is used here to refer generically to this syndrome. Although ADD children differ from other child psychiatric patients on validating measures such as outcome, familial aggregation, and psychosocial factors (Rutter, 1981; Szatmari et al., 1989), ADD commonly presents with a variety of comorbid psychiatric disorders. In particular, high levels of comorbidity exist between ADHD and conduct, mood, and anxiety disorders (Biederman et al., 1991).

These nosological problems pertaining to the diagnosis of ADD may benefit from a more thorough understanding of the psychometric properties of its defining criteria. Although DSM-III and its revision provide explicit operational criteria for ADD, little is known about the internal consistency and diagnostic efficiency of these criteria (Hodges et al., 1990; Shaywitz and Shaywitz, 1987). In psychometric theory, the internal consistency of scales of measurement provides a basis for asserting the reliability of a measure. However, the assessment of diagnostic reliability for ADD has mostly focused on the demonstration of interrater agreement with statistics such as Cohen's Kappa.
Although such data play a vital role in assessing the adequacy of diagnostic methods, they provide little information about reliability as the term is used in psychometrics, i.e., the correlation of an observed score with a "true" unobservable score.

"Diagnostic efficiency" is the degree to which individual criteria correctly discriminate cases from noncases as evaluated from actuarial prediction and decision theory (Widiger et al., 1984). The focus is on a conditional probability analysis that computes the sensitivity, specificity, positive predictive power, and negative predictive power of symptoms and combinations of symptoms as predictors of diagnoses. These methods have been useful for the development of diagnostic algorithms and the demonstration of divergent validity for childhood psychiatric disorders (Milich et al., 1987; Spitzer and Siegel, 1990). Sophisticated conditional probability analyses can be performed using signal detection theory (Kraemer, 1987; Siegel et al., 1989, 1990).

TABLE 1. Definitions of Diagnostic Efficiency Statistics

                     Diagnosis
Symptom              Yes        No         Row Total
Yes                  a          b          a + b
No                   c          d          c + d
Column total         a + c      b + d

Statistic                    Formula                    Definition
Positive Predictive Power    a/(a + b)                  the probability of having the diagnosis given the symptom is present
Negative Predictive Power    d/(c + d)                  the probability of not having the diagnosis given the symptom is absent
Sensitivity                  a/(a + c)                  the probability the symptom is present among those given the diagnosis
Specificity                  d/(b + d)                  the probability the symptom is absent among those not given the diagnosis
Total Predictive Power       (a + d)/(a + b + c + d)    the total probability of correct classification

Method

SUBJECTS

Seventy-three ADD and 26 psychiatric control patients were consecutively recruited from the pool of existing and new referrals to the Pediatric Psychopharmacology Clinic of the Massachusetts General Hospital (MGH). The 26 normal control children came from the pediatric primary care service at the MGH. All ADD subjects had a diagnosis of DSM-III ADD based on interviews with the children and their parents by a clinic psychiatrist. Also, each ADD subject received a diagnosis of DSM-III ADD based on a structured psychiatric interview (described below). The clinic diagnoses of psychiatric controls were also confirmed with a structured psychiatric interview. They were: affective disorders (N = 10), anxiety disorders (N = 8), pervasive developmental disorders (N = 7), and Tourette's disorder (N = 1). Potential normal controls were excluded if they received a DSM-III diagnosis of ADD based on the structured psychiatric interview.

PROCEDURES

Psychiatric assessments of children were obtained by interviewing their mothers with the Diagnostic Interview for Children and Adolescents-Parent Version (DICA-P). Parents were interviewed about themselves with the National Institute of Mental Health Diagnostic Interview Schedule (NIMH-DIS) to cover adult disorders and an addendum based on the DICA-P to cover childhood disorders. Subjects were diagnosed as ill only if, according to the interview results, DSM-III criteria were unequivocally met. All the symptom data presented in this report were collected during the structured psychiatric interview.

Measures of Diagnostic Efficiency

As Table 1 indicates, the positive predictive power (PPP) for an individual item is $a/(a+b)$. In this formula, $a$ is the number of subjects with both the symptom and the diagnosis and $b$ is the number with the symptom but not the diagnosis. The negative predictive power (NPP) is $d/(c+d)$, in which $c$ is the number of subjects without the symptom who carry the diagnosis and $d$ is the number for whom both the symptom and the diagnosis are absent. The sensitivity is $a/(a+c)$, and the specificity is $d/(b+d)$. The total predictive value (TPV) is the total probability of correct classification. This is defined as $(a+d)/(a+b+c+d)$.
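The five statistics defined above follow directly from the four cell counts of Table 1. The sketch below is illustrative only: the class and function names are hypothetical, and the counts are invented for the example rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Symptom-by-diagnosis counts, lettered as in Table 1."""
    a: int  # symptom present, diagnosis present
    b: int  # symptom present, diagnosis absent
    c: int  # symptom absent, diagnosis present
    d: int  # symptom absent, diagnosis absent


def efficiency(t: TwoByTwo) -> dict:
    """Return the five conditional probability statistics of Table 1."""
    n = t.a + t.b + t.c + t.d
    return {
        "ppp": t.a / (t.a + t.b),          # positive predictive power
        "npp": t.d / (t.c + t.d),          # negative predictive power
        "sensitivity": t.a / (t.a + t.c),
        "specificity": t.d / (t.b + t.d),
        "tpv": (t.a + t.d) / n,            # total predictive value
    }


# Hypothetical counts for illustration only (not data from the study).
example = TwoByTwo(a=30, b=5, c=10, d=15)
stats = efficiency(example)
```

The function assumes every margin of the table is nonzero; a symptom that is never (or always) endorsed would need separate handling.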

The application of signal detection theory (SDT) requires only the sensitivity and specificity. The PPP, NPP, and TPV are not used in SDT because their values will vary with the prevalence of the disorder. In contrast, sensitivity and specificity are independent of prevalence. The receiver operating characteristic (ROC) graph summarizes these measures by plotting sensitivity against one minus specificity. In diagnostic applications it is convenient to plot sensitivity against specificity for ease of interpretation.

Kraemer (1988) has shown that the diagnostic efficiency of each symptom can be more easily grasped from a transformation of the ROC curve called the quality ROC curve (QROC). The QROC adjusts the sensitivity (SE) and specificity (SP) for the base rate of the symptom ($Q$, with $Q' = 1 - Q$) by plotting the kappa coefficient for specificity, $\kappa_{SP} = (SP - Q')/Q$, against the kappa coefficient for sensitivity, $\kappa_{SE} = (SE - Q)/Q'$. On the ROC plot, randomly predictive symptoms fall along the random ROC. This is the straight line connecting SE = 1.0 and SP = 1.0. On the QROC plot, all randomly predictive symptoms fall on the point at which $\kappa_{SE} = \kappa_{SP} = 0.0$. An ideal, or perfectly diagnostic, symptom falls on the point $\kappa_{SE} = \kappa_{SP} = 1.0$; symptoms falling on this ideal point produce no false positives and no false negatives. The distance of each symptom to the ideal point on the QROC, $d_Q$, is used as an overall index of the efficiency with which each symptom discriminates cases from noncases. As diagnostic efficiency increases, $d_Q$ decreases.

TABLE 2. Symptom Base Rates for Subgroups of ADD Probands

                                                   All ADD      ADD          ADD +        ADD +        ADD +        Psychiatric  Normal
Symptom                                            Cases        Only^a       CD^b         MDD^c        ANX^d        Controls     Controls
                                                   (N = 73)     (N = 30)     (N = 19)     (N = 24)     (N = 22)     (N = 26)     (N = 26)
Inattentiveness
A. Difficulty finishing games                      0.79         0.67         0.96++*      0.84         0.92+        0.58#        0.15***
B. Difficulty finishing work                       0.79         0.63         1.00+++**    0.95+++*     0.82++       0.42##       0.08***
C. Doesn't listen to parents or teachers           0.88         0.87         0.96++       0.84         0.86         0.62##       0.08***
D. Can't concentrate if there is noise or people   0.88         0.83+        0.92++       0.95++       0.95++       0.50###      0.08***
E. Has trouble concentrating on school work        0.88         0.83+        0.92++       0.89+        0.95++       0.58##       0.15***
F. Has trouble concentrating on fun things         0.27         0.23         0.33         0.32         0.36         0.19         0.00*
Mean                                               0.75         0.63         0.85         0.80         0.81         0.48         0.09
Impulsivity
G. Rushes into things and gets hurt or in trouble  0.70         0.60         0.88+++*     0.63+        0.82++       0.31##       0.00***
H. Often changes from one activity to another      0.74         0.56         0.92+++*     0.79++       0.86+++*     0.32##       0.00***
I. Often rushes through work                       0.74         0.63         0.83++       0.84++       0.82++       0.38##       0.19**
J. Has to be reminded to do chores                 0.78         0.70         0.79+        0.84++       0.82++       0.42##       0.23***
K. Usually falls behind in class                   0.77         0.67         0.83++       0.84+        0.91++       0.46##       0.08***
L. Speaks out in class                             0.74         0.53+        0.88+++**    0.84+++*     0.86+++**    0.23###      0.04***
M. Has trouble waiting for turn or in line         0.66         0.57+        0.67++       0.74++       0.82+++      0.23###      0.00***
Mean                                               0.73         0.61         0.83         0.79         0.84         0.34         0.08
Hyperactivity
N. Often runs around, even in the house            0.70         0.63+        0.71+        0.79++       0.73++       0.31##       0.15***
O. Climbs on things not meant for climbing         0.53         0.53         0.58         0.47         0.64+        0.31         0.12**
P. Has difficulty sitting still                    0.73         0.77++       0.67         0.74+        0.73+        0.38##       0.08***
Q. Often fidgets when seated                       0.68         0.73++       0.54         0.74+        0.77++       0.35##       0.04***
R. Leaves table before through eating              0.64         0.43         0.71++       0.79++*      0.86+++*     0.27##       0.04**
S. Leaves seat at school                           0.56         0.50+        0.63++       0.53+        0.59++       0.19##       0.04***
T. Restless sleeper                                0.55         0.57         0.58         0.74         0.50         0.46         0.08***
U. Is always on the go as if driven by a motor     0.70         0.67++       0.79+++      0.74++       0.68++       0.19###      0.04***
Mean                                               0.64         0.60         0.65         0.69         0.69         0.31         0.07

^a ADD with no CD, MDD, or ANX. ^b Conduct disorder.

Methods of Constructing Diagnostic Algorithms

Three methods of constructing diagnostic algorithms
were compared. Each method used data from patients only; information from relatives was not used to create the QROC diagnostic algorithm.

QROC Rule. The QROC diagnostic algorithm for ADD is constructed as follows. The QROC index of diagnostic efficiency, $d_Q$, is computed twice for each possible pair of symptoms. In the first pairing, a pair of symptoms is scored as positive if both are present (the AND rule); in the second pairing a pair of symptoms is scored as positive if at least one is present (the OR rule). For each "AND" and "OR" pairing, $d_Q$ is computed. The most diagnostically efficient pair is then selected (i.e., the one with the lowest $d_Q$) as the starting point of the diagnostic algorithm. The increase in efficiency afforded by adding other symptom pairs to the algorithm is evaluated. The symptoms in the most efficient of the remaining pairs are joined with the existing pair either as a whole pair or separately depending upon which approach leads to the lowest value for $d_Q$. This procedure continues with additional symptom pairs, in order of efficiency, until $d_Q$ cannot be improved. The process is repeated to create another subset of symptoms that, when combined with the other subset, results in the lowest value for $d_Q$. The procedure continues until $d_Q$ can no longer be reduced and there are at least 14 symptoms in the diagnostic algorithm. The symptom set was reduced to 14 because this is the number used in DSM-III-R.

Best 14 Rule. This rule is similar to the procedure used to select symptoms for the DSM-III-R diagnosis of ADHD. This rule is constructed by first selecting the 14 most diagnostically efficient symptoms based on the $d_Q$ criterion. Rules of the form "at least X out of 14 symptoms are required for a positive diagnosis" (where X varied from 1 to 14 in increments of one) are examined. The best of these is termed the "Best 14" rule.

All 21 Rule. A rule that used all the 21 symptoms available is also examined. The "All 21" rule has the form "at least X out of 21 symptoms are required for a positive diagnosis" where X varied from 1 to 21 in increments of one. This rule was used to determine if the addition of symptoms to the Best 14 diagnostic set could improve diagnostic efficiency.
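The building blocks of these three methods can be sketched in a few lines. The sketch is a hedged illustration, not the authors' implementation: it assumes $d_Q$ is the Euclidean distance from the point $(\kappa_{SP}, \kappa_{SE})$ to the ideal point $(1, 1)$, it treats a rule that is always positive or always negative as lying on the random point (an assumption, since the kappas are undefined there), and all function names and the toy data are hypothetical. It covers only the first step of the QROC rule (scoring every pair under AND and OR) and the "at least X of N" search; the stepwise joining of pairs into clusters is not reproduced.

```python
import math
from itertools import combinations


def d_q(pred, truth):
    """Distance from a binary predictor to the ideal QROC point (both kappas = 1)."""
    n = len(truth)
    a = sum(1 for p, t in zip(pred, truth) if p and t)      # symptom+, diagnosis+
    b = sum(1 for p, t in zip(pred, truth) if p and not t)  # symptom+, diagnosis-
    c = sum(1 for p, t in zip(pred, truth) if not p and t)  # symptom-, diagnosis+
    d = n - a - b - c                                       # symptom-, diagnosis-
    se, sp = a / (a + c), d / (b + d)  # assumes cases and noncases are both present
    q = (a + b) / n                    # base rate of a positive prediction
    if q in (0.0, 1.0):
        return math.sqrt(2.0)          # assumption: place degenerate rules on the random point
    k_se = (se - q) / (1.0 - q)
    k_sp = (sp - (1.0 - q)) / q
    return math.hypot(1.0 - k_se, 1.0 - k_sp)


def best_count_rule(items, truth):
    """Try 'at least X of the symptoms' for every X; return (X, d_Q) with the lowest d_Q."""
    counts = [sum(row) for row in items]
    n_items = len(items[0])
    scored = [(d_q([cnt >= x for cnt in counts], truth), x) for x in range(1, n_items + 1)]
    best_d, best_x = min(scored)
    return best_x, best_d


def best_pair(items, truth):
    """Score every symptom pair under the AND and OR rules; return (d_Q, i, j, rule)."""
    best = None
    for i, j in combinations(range(len(items[0])), 2):
        for name, combine in (("AND", lambda x, y: x and y), ("OR", lambda x, y: x or y)):
            pred = [bool(combine(row[i], row[j])) for row in items]
            cand = (d_q(pred, truth), i, j, name)
            if best is None or cand < best:  # ties keep the earlier pairing
                best = cand
    return best


# Toy data for illustration only: 8 subjects (4 cases, 4 noncases), 3 symptoms each.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
items = [
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],  # cases
    [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0],  # noncases
]
threshold, threshold_dq = best_count_rule(items, truth)
pair_dq, first, second, rule = best_pair(items, truth)
```

On the toy data, requiring at least two of the three symptoms separates cases from noncases perfectly, while the best single pairing is an OR rule, which mirrors the observation that counting rules and symptom combinations can outperform individual items.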


Results

Table 2 compares the base rates of ADD symptoms in ADD patients with and without comorbidity. Since the symptoms in Table 2 define the diagnosis of ADD, it is not surprising to find that most of these are significantly more prevalent among the ADD cases compared with the psychiatric and normal control groups. However, three symptoms do not significantly discriminate ADD from psychiatric control subjects; these are "has trouble concentrating on fun things" (Symptom F), "climbs on things not meant for climbing" (Symptom O), and "restless sleeper" (Symptom T).

ADD subjects with and without comorbidity are significantly more likely to have each ADD symptom when compared with normal controls. However, when the ADD subgroups with and without comorbid disorders are compared, the ADD symptoms are found to be more prevalent among the comorbid subgroups. Indeed, although the ADD-only children are more symptomatic than the psychiatric controls, the differences are statistically significant for only 10 of the symptoms (Table 2). In contrast, a higher number of individual symptoms significantly discriminate psychiatric controls from ADD+CD (N = 16 symptoms), ADD+MDD (N = 16 symptoms), and ADD+ANX (N = 18 symptoms) children.

INTERNAL CONSISTENCY

In most clinical and research settings, the question of differential diagnosis is paramount. The differentiation of ADD children from those with no psychiatric disorder is less important. Indeed, as Table 2 indicates, the base rate of all symptoms is very low in normal controls. Thus, to assess internal consistency and diagnostic efficiency, the analyses exclude the normal control cases.

Even with normal controls excluded, the internal consistency of the entire set of ADD items is very high. Cronbach's alpha is 0.89 for the 21 ADD items. This indicates that the construct of ADD as represented by the sum of the items is highly reliable. Since alpha is either 0.88 or 0.89 for all diagnostic item sets created by deleting items one at a time, we conclude that the reliability of the ADD construct cannot be meaningfully improved by deletion of individual items.
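Cronbach's alpha and the "alpha if item deleted" check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the standard formula $\alpha = \frac{k}{k-1}\left(1 - \sum_j s_j^2 / s_T^2\right)$ is used, and the 0/1 item scores are toy values rather than study data.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a subjects-by-items matrix of item scores."""
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    k = len(items[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in items])
        for drop in range(k)
    ]


# Toy data for illustration only: 6 subjects scored 0/1 on 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
alpha_dropped = alpha_if_deleted(scores)
```

In the toy data, deleting the third item raises alpha because the first two items agree perfectly; the study reports the opposite pattern, with alpha essentially unchanged whichever item is removed.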

DIAGNOSTIC EFFICIENCY

Table 3 presents the point biserial correlation for each item. As mentioned in Method, the point biserial correlation is the correlation between each item and the sum of all the other items; this assesses the degree to which each item measures the construct defined by the set of diagnostic items. In contrast to the individual item alphas, the point biserial correlations show more variability. They range from a low of 0.32 for "restless sleeper" (Symptom T) to a high of 0.71 for "has trouble waiting for turn or waiting in line" (Symptom M). The sensitivity, specificity, positive predictive power, negative predictive power, total predictive power, and $d_Q$ values in Table 3 are consistent with the point biserial correlations. For example, the symptom with the lowest point biserial correlation only classifies 55% of cases correctly and has the worst $d_Q$ value. The symptom with the highest point biserial correlation classifies 69% of cases correctly and has one of the best $d_Q$ values.

Figure 1 shows the receiver operating characteristic (ROC) plot for the 21 ADD symptoms. The letters on this plot correspond to the letters that label the symptoms in Tables 2 and 3. The ROC plots sensitivity (SE) against specificity (SP). On the ROC graph, points on the diagonal line connecting SE = 1 and SP = 1 cannot discriminate cases from noncases. The farther a point is from the diagonal, the better is the discrimination. The advantage of the ROC is that it clearly shows the sensitivities and specificities of each symptom and offers a convenient summary of the data. For example, it is easy to see that symptom T, "restless sleeper," by being close to the diagonal line has very low diagnostic efficiency. In contrast, symptom K, "falls behind in class," is preferable to T because it has better sensitivity (i.e., it is farther away from the diagonal line on the sensitivity axis); symptoms O, "climbs on things," and S, "leaves seat," are better than T due to increased specificity (i.e., they are farther away from the diagonal line on the specificity axis).

The QROC for the data in Figure 1 is plotted in Figure 2. The QROC plots the kappa coefficient for sensitivity ($\kappa_{SE}$) against the kappa coefficient for specificity ($\kappa_{SP}$).

FIG. 1. Receiver operating characteristic plot of sensitivity (SE) and specificity (SP) for 21 symptoms of attention-deficit disorder. The letters used as plotting symbols correspond to the symptoms listed in Tables 2 and 3. Points on the diagonal line cannot discriminate cases from noncases. Perfect discrimination occurs at the point SE = 1 and SP = 1.

TABLE 3. Diagnostic Efficiency Statistics for Detection of ADD versus Psychiatric Controls

                                                   Point        Sensitivity  Specificity  Positive     Negative     Total        d_Q^a
Symptom                                            Biserial                               Predictive   Predictive   Predictive
                                                   Correlation                            Power        Power        Value
Inattentiveness
A. Difficulty finishing games                      0.57         0.79         0.42         0.79         0.42         0.70         1.11
B. Difficulty finishing work                       0.34         0.79         0.58         0.84         0.50         0.74         0.91
C. Doesn't listen to parents or teachers           0.36         0.88         0.38         0.80         0.53         0.75         0.97
D. Can't concentrate if there is noise or people   0.46         0.88         0.50         0.83         0.59         0.78         0.85
E. Has trouble concentrating on school work        0.62         0.88         0.42         0.81         0.55         0.76         0.95
F. Has trouble concentrating on fun things         0.37         0.27         0.81         0.80         0.28         0.41         1.23
Mean                                               0.45         0.75         0.52         0.81         0.48         0.69         1.00
Impulsivity
G. Rushes into things and gets hurt or in trouble  0.57         0.70         0.69         0.86         0.45         0.70         0.91
H. Often changes from one activity to another      0.57         0.74         0.65         0.86         0.47         0.72         0.90
I. Often rushes through work                       0.44         0.74         0.62         0.84         0.46         0.71         0.95
J. Has to be reminded to do chores                 0.43         0.78         0.58         0.84         0.48         0.73         0.93
K. Usually falls behind in class                   0.37         0.77         0.54         0.82         0.45         0.71         1.00
L. Speaks out in class                             0.47         0.74         0.77         0.90         0.51         0.75         0.76
M. Has trouble waiting for turn or in line         0.71         0.66         0.77         0.89         0.44         0.69         0.86
Mean                                               0.51         0.73         0.66         0.86         0.47         0.72         0.90
Hyperactivity
N. Often runs around a lot even in the house       0.61         0.70         0.69         0.86         0.45         0.70         0.91
O. Climbs on things not meant for climbing         0.45         0.53         0.69         0.83         0.35         0.58         1.10
P. Has difficulty sitting still                    0.56         0.73         0.62         0.84         0.44         0.70         0.97
Q. Often fidgets when seated                       0.52         0.68         0.65         0.85         0.43         0.68         0.97
R. Leaves table before through eating              0.53         0.64         0.73         0.87         0.42         0.67         0.93
S. Leaves seat at school                           0.62         0.56         0.81         0.89         0.40         0.63         0.92
T. Restless sleeper                                0.32         0.55         0.54         0.77         0.30         0.55         1.30
U. Is always on the go as if driven by a motor     0.68         0.70         0.81         0.91         0.49         0.73         0.77
Mean                                               0.54         0.64         0.69         0.85         0.41         0.66         0.98

Note: The letters that label symptoms correspond to the plotting symbols in Figures 1, 2, 3, and 4. ^a d_Q is the distance between the symptom and the ideal point on the QROC graph.
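As a rough check on how the last column of Table 3 relates to the others, $d_Q$ for symptom L ("speaks out in class") can be recomputed from its rounded sensitivity and specificity. The working below rests on three assumptions: the prevalence $P$ is taken as 73 ADD cases out of the 99 patients compared in Table 3, the kappa coefficients follow Kraemer's definitions $\kappa_{SE} = (SE - Q)/Q'$ and $\kappa_{SP} = (SP - Q')/Q$ with $Q$ the base rate of the symptom, and $d_Q$ is the Euclidean distance to the ideal point. Because the inputs are rounded to two decimals, the result is approximate.

```latex
\begin{aligned}
P &= \tfrac{73}{73 + 26} \approx 0.737, \qquad SE = 0.74, \qquad SP = 0.77 \\
Q &= P \cdot SE + (1 - P)(1 - SP) \approx (0.737)(0.74) + (0.263)(0.23) \approx 0.606,
  \qquad Q' = 1 - Q \approx 0.394 \\
\kappa_{SE} &= \frac{SE - Q}{Q'} \approx \frac{0.74 - 0.606}{0.394} \approx 0.34 \\
\kappa_{SP} &= \frac{SP - Q'}{Q} \approx \frac{0.77 - 0.394}{0.606} \approx 0.62 \\
d_Q &= \sqrt{(1 - \kappa_{SE})^2 + (1 - \kappa_{SP})^2}
  \approx \sqrt{0.66^2 + 0.38^2} \approx 0.76
\end{aligned}
```

The value agrees with the tabled 0.76 for symptom L; the same steps applied to symptom T (SE = 0.55, SP = 0.54) give approximately 1.29, close to the tabled 1.30.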

An advantage of the QROC is that it provides an index of diagnostic efficiency, $d_Q$, which is the distance between the ideal point ($\kappa_{SE} = \kappa_{SP} = 1.0$) and the symptom. For example, on the ROC it cannot be easily judged whether D or L is the more efficient symptom. On the QROC, L can readily be identified as being closer to the ideal point. Another advantage of the QROC is that the "diagnosis line" always connects the random point ($\kappa_{SE} = \kappa_{SP} = 0.0$) with the ideal point ($\kappa_{SE} = \kappa_{SP} = 1.0$). Any symptom falling on the diagnosis line has a base rate equal to the base rate of the criterion diagnosis. Despite these differences in interpretability, it is reassuring that both methods produce similar results. Both curves identify symptoms T, F, A, and O as low in diagnostic efficiency; both identify D, L, U, and M as high in diagnostic efficiency.

FIG. 2. Quality receiver operating characteristic plot of the kappa coefficients for sensitivity ($\kappa_{SE}$) and specificity ($\kappa_{SP}$) for 21 symptoms of attention-deficit disorder. The letters used as plotting symbols correspond to the symptoms listed in Tables 2 and 3. Symptoms that cannot discriminate cases from noncases would fall on the point $\kappa_{SE} = 0$ and $\kappa_{SP} = 0$. Perfect discrimination occurs at the point $\kappa_{SE} = 1$ and $\kappa_{SP} = 1$. The diagonal line is the diagnosis line. Symptoms falling on the diagnosis line have a base rate equal to the base rate of the criterion diagnosis.

OPTIMIZING THE ADD DIAGNOSTIC ALGORITHM

Figure 3 plots the ROC for all possible pairings of the 21 symptoms. The QROC for these pairings is plotted in Figure 4. The "0" symbols indicate pairs scored positive only if both symptoms were positive (the AND rule); the "+" symbols indicate pairs scored positive if either symptom was positive (the OR rule). By comparing the ROCs in Figures 1 and 3, it is clear that the AND rule tends to increase specificity and decrease sensitivity; the OR rule increases sensitivity and decreases specificity. A comparison of the QROCs in Figures 2 and 4 shows that many of the paired symptoms are more diagnostically efficient than any single symptom.

To construct the QROC diagnosis, the procedure described in the Method section was followed. Table 4 presents the resulting diagnostic algorithm. As Table 4 indicates, our procedure found three clusters of symptoms that were most efficiently combined, within cluster, by using the OR rule. For example, the symptom pair (L OR R) was found to be the most diagnostically efficient pair. The efficiency of this pair increased by adding symptoms U, S, and M but could not be increased further by adding other symptoms. By repeating this process, clusters two and three were created. It was found that the symptoms from each cluster were best combined using the OR rule within each cluster and the AND rule between clusters. Thus, to maximize diagnostic efficiency only one symptom from each of the three clusters is needed to make the diagnosis.

The diagnostic efficiency statistics for the three diagnostic methods described in Method are presented in Table 5. Each method used data from patients only; information from relatives was not used to create the QROC diagnostic algorithm. The QROC has excellent diagnostic efficiency as measured by each of the conditional probability statistics. Its low $d_Q$ and high total predictive value also indicate excellent diagnostic efficiency. The "Best 14" rule simulates the procedure used to select symptoms for the DSM-III-R diagnosis
of ADHD. This rule was constructed by first selecting the 14 most diagnostically efficient symptoms from Table 3 based on the $d_Q$ criterion. The efficiency of the 14 rules of the form "at least X out of 14 symptoms are required for a positive diagnosis" (where X varied from 1 to 14 in increments of one) was then examined. The rule that had the best diagnostic efficiency (i.e., the lowest $d_Q$ [0.59]) also had the highest total predictive power (0.86); this rule required only four of 14 symptoms to be positive for diagnosis.

The "All 21" rule used all the 21 symptoms. These rules had the form "at least X out of 21 symptoms are required for a positive diagnosis" where X varied from 1 to 21 in increments of one. The rule that had the lowest $d_Q$ (0.58) also had the highest total predictive power (0.86); it required at least seven of 21 symptoms to be positive. The results for this rule are similar to those for the "Best 14" rule. Both require approximately 30% of the symptoms to be positive and both are less efficient than the QROC algorithm. As Table 5 indicates, although both rules were slightly more sensitive than the QROC algorithm, their specificities were much poorer.

For comparison purposes, Table 5 also presents the diagnostic efficiency statistics computed for the diagnosis of DSM-III-R ADHD from the DSM-III-R field trials (Spitzer et al., 1990). Based on the total predictive power (0.83) and $d_Q$ value (0.51), the efficiency of the DSM-III-R algorithm is similar to the Best 14 and All 21 rules. Overall, diagnostic efficiency was best for the QROC algorithm. This is apparent in its high total predictive power and its low $d_Q$ value.

To facilitate comparisons of the three methods and the field trials, the diagnostic efficiency statistics in Table 5 were compared with each other and with all possible symptom pairings in Figures 3 and 4. The points labeled 1, 2, 3, and 4 correspond to the QROC rule, Best 14 rule, All 21 rule, and DSM-III-R field trials, respectively. These figures indicate that each rule performs better than any single pairing and that the QROC rule is the most efficient.

FIG. 3. Receiver operating characteristic plot of sensitivity (SE) and specificity (SP) for all possible pairings of the 21 symptoms of attention deficit disorder. The "0" symbols indicate pairs scored positive only if both symptoms were positive (the AND rule); the "+" symbols indicate pairs scored positive if either symptom was positive (the OR rule). The numbers 1, 2, 3, and 4 refer to the diagnostic rules in Tables 5 and 6. Points on the diagonal line cannot discriminate cases from noncases. Perfect discrimination occurs at the point SE = 1 and SP = 1.

VALIDATION OF THE QROC RULE

Validation using external validators was used since the procedure to develop the QROC rule may have capitalized on chance variation in the data. Thus, validating the QROC rule using symptom data from the biological parents and siblings of the ADD, psychiatric control, and normal control children was attempted. Since it has been shown that ADD is a familial disorder (e.g., Biederman et al., 1990, 1992; Cantwell, 1971; Morrison and Stewart, 1971), it was reasoned that increasing the efficiency of the diagnosis of ADD should increase the degree of observed familial aggregation.
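Familial aggregation of this kind is commonly summarized with an odds ratio, the measure reported for this validation. As a minimal sketch, assuming simple counts of affected and unaffected relatives in two index-child groups, the computation looks like this. The function name and every number are hypothetical and are not the study's data.

```python
def odds_ratio(affected_a, total_a, affected_b, total_b):
    """Odds of the diagnosis among relatives in group A divided by the odds in group B."""
    odds_a = affected_a / (total_a - affected_a)
    odds_b = affected_b / (total_b - affected_b)
    return odds_a / odds_b


# Hypothetical counts for illustration only (not data from the study):
# 30 of 100 relatives of ADD index children versus 5 of 100 relatives of controls.
or_example = odds_ratio(affected_a=30, total_a=100, affected_b=5, total_b=100)
```

A ratio above 1 means relatives of the first group have higher odds of the diagnosis; the sketch assumes each group contains both affected and unaffected relatives, since a zero cell would make the ratio undefined.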

The validation procedure was as follows for each diagnostic rule. First, the ascertainment of our sample was simulated by rediagnosing each of the index children with the QROC rule. Any of the original ADD children who did not have ADD, as defined by this rule, and any of the psychiatric and normal control children who did meet the rule's criteria for ADD were excluded. Each relative was then rediagnosed according to the QROC rule, and the percentage of relatives in each index child group meeting criteria for ADD as defined by this rule was computed.

Table 6 uses the odds ratio to compare the degree of familial aggregation resulting from the original DSM-III-based diagnosis and each of the new rules in Table 5. The odds ratio expresses the degree to which a relative's odds of having ADD are increased by a biological relationship with an ADD index child compared with the odds of having ADD for relatives of psychiatric or normal controls. Compared with the other rules, the QROC rule diagnoses fewer cases of ADD for each group of relatives. However, as measured by the odds ratio, the QROC definition of ADD is more familial than the other definitions. For example, compared with relatives of psychiatric controls, relatives of QROC-defined ADD children are over seven times more likely to have QROC-defined ADD. For the other definitions there is only a three- to four-fold increase in risk. The effect is more pronounced in the comparison with normal controls. Compared with relatives of normal controls, relatives of QROC-defined ADD children are over 23 times more likely to have QROC-defined ADD. For the other definitions there is only an eight- to nine-fold increase in risk.

FIG. 4. Quality receiver operating characteristic plot of the kappa coefficients for all possible pairings of the 21 symptoms of attention-deficit disorder. The "0" symbols indicate pairs scored positive only if both symptoms were positive (the AND rule); the "+" symbols indicate pairs scored positive if either symptom was positive (the OR rule). The numbers 1, 2, 3, and 4 refer to the diagnostic rules in Tables 5 and 6. Symptoms that cannot discriminate cases from noncases would fall on the point $\kappa_{SE} = 0$ and $\kappa_{SP} = 0$. Perfect discrimination occurs at the point $\kappa_{SE} = 1$ and $\kappa_{SP} = 1$. The diagonal line is the diagnosis line. Symptoms falling on the diagnosis line have a base rate equal to the base rate of the criterion diagnosis.

Discussion

PSYCHIATRIC COMORBIDITY

172

Our analyses indicate that ADD children with comorbid conduct, major depressive, and anxiety disorders have more symptoms of ADD than ADD children who have none of these disorders. As a result, the ADD children with comorbid conditions were, on the basis of ADD symptoms, more discriminable from psychiatric and normal controls than were the ADD children without these conditions. The ability of ADD symptoms to discriminate ADD cases with major depression or anxiety from psychiatric controls is particularly notable given that 69% of the psychiatric controls had an affective or anxiety disorder. Examination of the individual symptoms in Table 2 suggests that the greater number of symptoms among ADD children with major depression and anxiety disorders, compared with those without these disorders, cannot be attributed to psychopathology shared by ADD and these comorbid disorders. Also, although major depressive and anxiety disorders are usually expressed with "internalizing" behaviors, ADD children with these disorders had more of the impulsive and hyperactive "externalizing" behaviors than did the ADD children with no comorbid disorder (ADD-only).

The effects of a comorbid conduct disorder on the diagnosis of ADD are less clear-cut. Many of the ADD symptoms that were assessed might be attributable to the antisocial nature of the conduct disorder child. On the other hand, compared with ADD children with no comorbid disorder, the conduct disordered children had higher rates of all symptoms. Thus, the relationship between conduct disorder and the symptoms of ADD is nonspecific.

Our results for the ADD children with no history of conduct, major depressive, or anxiety disorders provide further support for the validity of the ADD syndrome. Compared with normal controls, these ADD-only children have significantly and substantially higher rates of all the ADD symptoms. For example, 83% of the ADD-only children "can't concentrate if there is noise or people" compared with only 8% of normal controls. Compared with psychiatric controls, the ADD-only cases have significantly higher rates of two symptoms of inattentiveness, two symptoms of impulsivity, and five symptoms of hyperactivity. These differences are not only significant, they are large.

OPTIMIZING THE ADD DIAGNOSTIC ALGORITHM

TABLE 4. Optimal QROC Diagnostic Algorithm for ADD

At least one item from each of Clusters 1, 2, and 3 must be positive to make a diagnosis.

Cluster 1
L. Speaks out in class
R. Leaves table before through eating
U. Is always on the go as if driven by a motor
S. Leaves seat at school
M. Has trouble waiting for turn or in line

Cluster 2
G. Rushes into things and gets hurt or in trouble
B. Difficulty finishing work
D. Can't concentrate if there is noise or people
E. Has trouble concentrating on school work
N. Often runs around, even in the house
O. Climbs on things not meant for climbing

Cluster 3
P. Has difficulty sitting still
Q. Often fidgets when seated
K. Usually falls behind in class

Note: The letters that label symptoms correspond to the plotting symbols in Figures 1, 2, 3, and 4.

TABLE 5. Diagnostic Efficiency Statistics for Four Diagnostic Algorithms

                                                        Positive    Negative    Total
                                                        Predictive  Predictive  Predictive
Diagnostic Rule             Sensitivity  Specificity    Power       Power       Value       dQ(a)
Method 1: QROC              0.95         0.85           0.95        0.85        0.92        0.30
Method 2: Best 14           0.99         0.50           0.85        0.93        0.86        0.59
Method 3: All 21            0.97         0.54           0.86        0.88        0.86        0.58
DSM-III-R field trials(b)   0.85         0.80           0.85        0.79        0.83        0.51

Note: The QROC rule is defined in Table 3. The other rules are defined in the text.
(a) dQ is the distance between the rule and the ideal point on the QROC graph.
(b) Results from the DSM-III-R field trials (Spitzer et al., 1990).
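The cluster rule in Table 4 and the efficiency statistics reported in Table 5 can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the authors' procedure: the function names `qroc_positive` and `efficiency` are hypothetical, the statistics use the standard 2 x 2 definitions (the paper's exact computations are assumed to match), and the counts passed to `efficiency` are invented for the example, not taken from the study.

```python
# Symptom letters for each cluster, as listed in Table 4.
CLUSTERS = {
    1: {"L", "R", "U", "S", "M"},
    2: {"G", "B", "D", "E", "N", "O"},
    3: {"P", "Q", "K"},
}


def qroc_positive(symptoms):
    """Return True when at least one positive symptom falls in every cluster."""
    positive = set(symptoms)
    return all(len(positive & members) > 0 for members in CLUSTERS.values())


def efficiency(tp, fn, fp, tn):
    """Standard diagnostic efficiency statistics from 2 x 2 counts.

    tp/fn: criterion-positive cases the rule calls positive/negative.
    fp/tn: criterion-negative cases the rule calls positive/negative.
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
        "total_predictive_value": (tp + tn) / total,
    }


# One symptom from each cluster satisfies the rule; missing Cluster 3 does not.
meets_rule = qroc_positive({"L", "G", "P"})
misses_cluster_3 = qroc_positive({"L", "R", "G"})

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
stats = efficiency(tp=90, fn=10, fp=20, tn=80)
print(meets_rule, misses_cluster_3, stats)
```

Requiring a hit in every cluster is an AND across clusters combined with an OR within each cluster, which is why a child with many symptoms confined to one or two clusters would not meet the rule.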

It would be premature to conclude that the QROC-derived diagnosis should be the definitive diagnosis of ADD. Nevertheless, its favorable comparison with other diagnostic rules (Table 5) is encouraging. It seems reasonable to conclude that the apparent superiority of the QROC rule arises from its use of information about the relationships among symptoms. As demonstrated, pairs of symptoms can be more efficient than the efficiency of their constituent items would suggest. However, the use of additional information from item combinations may take advantage of chance fluctuations. Indeed, whereas the "Best 14" rule chooses among 21 items, the QROC rule starts with 210 "AND" pairs and 210 "OR" pairs. In a replication study the efficiency of the QROC rule may deflate more than that of the other rules.

Although an independent cross-validation is needed to determine whether the efficiency of the QROC rule would deflate more than that of the other rules, the finding that the QROC diagnosis gives the best evidence for the familial aggregation of ADD provides external evidence for its validity. Since none of the methods of constructing diagnostic algorithms used information from relatives, the family aggregation results provide some validation of the QROC algorithm. Clearly, further cross-validation from studies of treatment response, cognitive functioning, and other variables known to vary among ADD children would be useful.

LIMITATIONS

The strength of the presented conclusions is mitigated by some methodological limitations. Since most of the ADD children (90%) had hyperactivity, the results may not generalize to patients without hyperactivity. Unfortunately, the sample of ADD children without hyperactivity was too small (N = 7) to support meaningful comparisons with the hyperactive ADD subjects. Since the assessments of psychopathology in children were based on interviews with their mothers, they may be limited in several ways. For example, a halo effect might have biased mothers to report more symptoms in children with two disorders. Also, the raters interviewing about the children were not blind to the symptoms of the mother. Although this may have biased the ratings of symptoms in the children, this effect is believed to have been minimal because raters were trained to record symptoms as positive only if they had been unequivocally endorsed by the interviewee. Although mothers were asked to include knowledge of school behavior in their responses, we did not directly collect such information from teachers.

TABLE 6. Rates of ADD among Relatives of Index Children Using Four Diagnostic Algorithms

                            Percent of Relatives with ADD
                            by Diagnosis of Index Child             Odds Ratios
                                      Psychiatric  Normal      ADD vs. Psychiatric  ADD vs. Normal
Diagnostic Rule             ADD(a)    Controls     Controls    Controls             Controls
Method 1: QROC              20.7      3.5          1.1         7.3                  23.7
Method 2: Best 14           29.7      11.5         4.7         3.2                  8.6
Method 3: All 21            23.4      7.1          3.4         4.0                  8.7
DSM-III-R field trials(b)   26.7      7.1          4.4         4.7                  8.0

Note: The data in each row are based on the diagnoses of index children and relatives given by the rules in column one. The QROC rule is defined in Table 3. Rules 2, 3, and 4 are defined in the text.
(a) ADD: Attention-deficit disorder.
(b) Results from the DSM-III-R field trials (Spitzer et al., 1990).

References

Biederman, J., Faraone, S. V., Keenan, K., et al. (1992), Further evidence for family-genetic risk factors in attention-deficit hyperactivity disorder (ADHD): patterns of comorbidity in probands and relatives in psychiatrically and pediatrically referred samples. Arch. Gen. Psychiatry, 49:728-738.
Biederman, J., Faraone, S. V., Keenan, K., Knee, D. & Tsuang, M. T. (1990), Family-genetic and psychosocial risk factors in DSM-III attention-deficit disorder. J. Am. Acad. Child Adolesc. Psychiatry, 29:526-533.
Biederman, J., Newcorn, J. & Sprich, S. E. (1991), Comorbidity of attention-deficit hyperactivity disorder (ADHD). Am. J. Psychiatry, 148:564-577.
Cantwell, D. P. (1972), Psychiatric illness in the families of hyperactive children. Arch. Gen. Psychiatry, 27:414-417.
Hodges, K., Saunders, W. B., Kashani, J., Hamlett, K. & Thompson, R. J. (1990), Internal consistency of DSM-III diagnoses using the symptom scales of the child assessment schedule. J. Am. Acad. Child Adolesc. Psychiatry, 29:635-641.
Kraemer, H. C. (1987), The methodological and statistical evaluation of medical tests: the dexamethasone suppression test in psychiatry. Psychoneuroendocrinology, 12:411-427.
Kraemer, H. C. (1988), Assessment of 2 X 2 associations: generalization of signal-detection methodology. The American Statistician, 42:37-49.
Milich, R., Widiger, T. A. & Landau, S. (1987), Differential diagnosis of attention deficit and conduct disorders using conditional probabilities. J. Consult. Clin. Psychol., 55:762-767.
Morrison, J. R. & Stewart, M. A. (1971), A family study of the hyperactive child syndrome. Biol. Psychiatry, 3:189-195.
Rutter, M. (1981), Longitudinal studies: a psychiatric perspective. In: Prospective Longitudinal Research, eds. S. Mednick, A. Beart, & B. Bachman. Oxford: Oxford University Press, pp. 326-336.
Shaywitz, S. E. & Shaywitz, B. A. (1987), Attention-deficit disorder: current perspectives. Pediatr. Neurol., 3:129-135.
Siegel, B., Vukicevic, J., Elliott, G. R. & Kraemer, H. C. (1989), The use of signal detection theory to assess DSM-III-R criteria for autistic disorder. J. Am. Acad. Child Adolesc. Psychiatry, 28:542-548.
Siegel, B., Vukicevic, J. & Spitzer, R. L. (1990), Using signal detection methodology to revise DSM-III-R: reanalysis of the DSM-III-R national field trials for autistic disorder. J. Psychiatric Res., 24:293-311.
Spitzer, R. L., Davies, M. & Barkley, R. A. (1990), The DSM-III-R field trial of disruptive behavior disorders. J. Am. Acad. Child Adolesc. Psychiatry, 29:690-697.
Spitzer, R. L. & Siegel, B. (1990), The DSM-III-R field trial of pervasive developmental disorders. J. Am. Acad. Child Adolesc. Psychiatry, 29:855-862.
Szatmari, P., Boyle, M. & Offord, D. R. (1989), ADDH and conduct disorder: degree of diagnostic overlap and differences among correlates. J. Am. Acad. Child Adolesc. Psychiatry, 28:865-872.
Widiger, T. A., Hurt, S. W., Frances, A., Clarkin, J. F. & Gilmore, M. (1984), Diagnostic efficiency and DSM-III. Arch. Gen. Psychiatry, 41:1005-1012.

J. Am. Acad. Child Adolesc. Psychiatry, 32:1, January 1993