Screening for anxiety in an epidemiological sample: predictive accuracy of questionnaires


Anxiety Disorders 16 (2002) 113–134

Jürgen Hoyer a,*, Eni S. Becker a, Simon Neumer a, Ulrich Soeder a, Jürgen Margraf b

a Clinical Psychology and Psychotherapy, Technical University of Dresden, D-01062 Dresden, Germany
b University of Basel, Basel, Switzerland

Received 25 January 2000; received in revised form 15 June 2000; accepted 15 July 2000

Abstract

The study examined the predictive accuracy of selected questionnaires when screening for anxiety in a representative epidemiological sample of young female adults (N = 1877). All participants were diagnosed using a structured diagnostic interview. The anxiety questionnaires included global as well as specific measures: the Beck Anxiety Inventory (BAI), the Symptom Checklist, the Anxiety Sensitivity Index (ASI), the Fear Questionnaire (FQ), and the Mobility Inventory (MI). Sensitivity, specificity, and positive and negative predictive power were computed for two screening decisions, (1) identifying any anxiety disorder and (2) identifying a specific anxiety disorder (agoraphobia), both within the total sample and within the clinical subsample. Because base rates in epidemiological samples are naturalistic (i.e., low), diagnostic indices were lower than those previously reported. However, questionnaire data proved useful when a specific disorder (agoraphobia) was targeted and specific symptoms were operationalized.

© 2002 Elsevier Science Inc. All rights reserved.

Keywords: Screening; Sensitivity; Specificity; Anxiety disorders; Agoraphobia

* Corresponding author. E-mail address: [email protected] (J. Hoyer).

0887-6185/02/$ – see front matter © 2002 Elsevier Science Inc. All rights reserved. PII: S0887-6185(01)00094-9

Anxiety disorders are among the most prevalent psychological disorders, especially in young adults. A relevant proportion of patients who present to a general practitioner actually have an anxiety (or other psychological) disorder that is neither adequately diagnosed nor therapeutically addressed (Wittchen & Boyer, 1998). Moreover, anxiety accompanies, precedes, or causes other psychosocial problems such as marital or other interpersonal conflicts, problems at the work site, or stress (Gotlib, Lewinsohn, & Seeley, 1998; Wittchen, Nelson, & Lachner, 1998). For practical reasons, however, diagnostic or treatment decisions in these areas cannot rely solely on relatively complex and time-consuming diagnostic interviews. More efficient and economical screening procedures for anxiety are therefore needed. Such procedures could help identify anxiety, including newly emerging disorders, in diverse fields of research, prevention, and therapy (Wittchen & Boyer, 1998).

Anxiety questionnaires are often used for this purpose, and the present article evaluates this practice. Numerous studies have reported data on the psychometric properties of different anxiety questionnaires. Most published anxiety questionnaires have demonstrated their reliability and their concurrent validity (correlations with other anxiety measures). They have also shown that they can differentiate between groups of anxiety-disordered and healthy individuals on the basis of test score distributions. However, these data do not adequately describe the screening ability of an instrument. Screening ability means separating patients who have a target disorder from a reference group of those who do not, while minimizing false categorizations. The diagnostic utility of a test in such clinical discriminations is more precisely defined by its sensitivity, specificity, and positive and negative predictive power (NPP) (Baldessarini, Finklestein, & Arana, 1983; Elwood, 1993; Meehl & Rosen, 1955). Unlike, for example, effect sizes, these parameters refer directly to the probability of true and false decisions. Sensitivity reflects the true positive rate, i.e., the proportion of subjects with a target disorder who are identified by a positive test finding (above a given cut-off score).
Specificity reflects the true negative rate, i.e., the proportion of subjects without the target disorder who are identified by a negative test result (below a given cut-off score). Sometimes, however, it is more important to know the converse probabilities. Positive predictive power (PPP) is the probability that a patient has the disorder given a positive test result. Conversely, NPP is the probability that the patient does not have the disorder given a negative test result. Sensitivity and PPP, or specificity and NPP, are usually not equivalent. The reason is that PPP and NPP depend on the prevalence or base rate of the disorder in a population, whereas sensitivity and specificity depend (among other factors) on the discriminant validity of a test. (See, e.g., Baldessarini et al., 1983; Elwood, 1993; or Glaros & Kline, 1988, for an elaborated overview.)

The relative importance of a specific index of diagnostic utility depends on the decision to be made and its consequences (Zarin & Earls, 1993). A positive test result might lead to harmful consequences (risk-laden treatment) or high costs. In that case, false positives should be minimized and a test with a potentially high PPP should be used (although one would hope that more than a single screening measure is used in such cases). If missing true cases had harmful consequences (further development of a disorder, or even lethality as in some psychological disorders), sensitivity would be particularly valuable. For research and practice, each of the above-mentioned indices, or specific combinations of them, can be of special interest, depending on the consequences of a diagnostic decision. Accordingly, the potential diagnostic utility of a given instrument in a given setting (e.g., practice or research) can be fully evaluated only when all of these indices are reported.

As Elwood (1993) and Meehl and Rosen (1955) have shown, low prevalence rates, which are usual in mental disorders, lead to low PPP and, conversely, high NPP. Suppose that only 10 individuals in a sample of 1000 had a rare disease, that no positive diagnostic decision was made, and that all individuals were therefore classified as healthy. A "hit rate" (proportion of correct classifications) of 99% would result, although there is no positive prediction at all (PPP = 0%). Conversely, NPP would be very high in this case (99%). Any test applied to disorders with a low base rate can therefore be expected to fail to improve on the number of correct decisions obtained simply by classifying everyone as healthy. However high the prevalence of anxiety disorders is estimated to be, it appears questionable whether existing anxiety tests could reach a satisfactory PPP in natural clinical settings. For example, Kabacoff, Segal, Hersen, and Van Hasselt (1997) examined the predictive accuracy of the Beck Anxiety Inventory (BAI; Beck & Steer, 1990) in older psychiatric outpatients. Although the prevalence of any anxiety disorder was close to 30% in this sample, a PPP of .50 was reached only above a cut-off score of 34, which is extremely high for the BAI. At this high cut-off, the sensitivity of the BAI in this sample was unsatisfactorily low (.17).
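All four indices and the hit rate can be read off a 2 × 2 table that cross-tabulates test decisions against interview diagnoses. The Python sketch below computes them and reproduces the Meehl and Rosen style example above: 10 cases among 1000 people, with everyone classified as healthy. The names `ScreeningCounts` and `predictive_indices` are illustrative only and are not taken from the study. The text reports PPP = 0% for this case. The sketch returns `None` instead, because with no positive decisions the ratio has a zero denominator; that handling is a design choice of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningCounts:
    """2 x 2 table of screening decisions against interview diagnoses."""
    tp: int  # disorder present, positive test (score at or above the cut-off)
    fn: int  # disorder present, negative test
    fp: int  # disorder absent, positive test
    tn: int  # disorder absent, negative test


def ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero (index undefined)."""
    return num / den if den else None


def predictive_indices(c: ScreeningCounts) -> dict:
    """Sensitivity, specificity, PPP, NPP, and hit rate from raw counts."""
    n = c.tp + c.fn + c.fp + c.tn
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppp": ratio(c.tp, c.tp + c.fp),          # P(disorder | positive test)
        "npp": ratio(c.tn, c.tn + c.fn),          # P(no disorder | negative test)
        "hit_rate": ratio(c.tp + c.tn, n),        # proportion of correct decisions
    }


# 10 cases among 1000 people, and everyone is classified as healthy.
everyone_negative = ScreeningCounts(tp=0, fn=10, fp=0, tn=990)
indices = predictive_indices(everyone_negative)
print(indices)
# {'sensitivity': 0.0, 'specificity': 1.0, 'ppp': None, 'npp': 0.99, 'hit_rate': 0.99}
```

The 99% hit rate here comes entirely from the 990 true negatives. This is why the hit rate alone says little about screening quality when base rates are low.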
This illustrates the need to address the "base rate problem" (Elwood, 1993) and to report not only the sensitivity and specificity but also the PPP and NPP of a test. Still, there are far fewer studies of predictive accuracy than reports on the standards of test validation mentioned at the outset. Furthermore, most of these relatively few studies have methodological limitations. In some studies (e.g., Kabacoff et al., 1997), only one instrument is investigated. This precludes an empirical comparison of instruments that could lead to further improvement: even when results are positive, we do not learn whether competing instruments would have fared better. Other studies used preselected samples of patients with the criterion disorder and a reference group of healthy individuals (e.g., Beck, Stanley, & Zebb, 1996). Such results can only be interpreted cautiously, because this design does not reflect clinical reality. Still other studies used reference groups of other patients or healthy individuals with an (at least approximately) matched sample size (e.g., Stangier, Heidenreich, Berardi, Golbs, & Hoyer, 1999). This strategy overestimates how well a given test discriminates patients in typical clinical situations, because the base rate of the disorder is much higher in such designs than in natural clinical settings.

Given these considerations, evaluating clinical tests in an epidemiological sample seems especially promising, because the base rate in such samples equals the (low) prevalence of anxiety disorders in the population. Knowing the predictive accuracy of otherwise valid anxiety questionnaires under such naturalistic conditions therefore seems to be of special interest for their further development. The present study tested the predictive accuracy of selected anxiety questionnaires, including sensitivity, specificity, PPP, NPP, and further indices (see below). The sample was a representative epidemiological sample of young female adults (N = 1877) who were diagnosed for anxiety and other psychological disorders. To allow comparisons between instruments, some of the standard instruments for measuring global aspects of anxiety were examined with regard to their utility for identifying any anxiety disorder within the total sample:

1. the anxiety subscale of the Symptom Checklist Revised (SCL-90-R, German version: Derogatis, 1986; Franke, 1995),
2. the BAI (Beck & Steer, 1990), and
3. the Anxiety Sensitivity Index (ASI; Reiss, Peterson, Gursky, & McNally, 1986).

When screening for any anxiety disorder (global screening intention, as with the instruments listed above), base rates are higher than when a specific disorder is screened for (specific screening intention). On the other hand, a specific disorder may be more easily detected, because its symptoms (items) are more circumscribed. To test whether a specific screening intention is preferable to a global one, data on screening for a specific anxiety disorder (agoraphobia) are also presented.
For this purpose, two specific measures of agoraphobia were also included in the study: the agoraphobia subscale of the Fear Questionnaire (FQ; Marks & Mathews, 1979) and the Mobility Inventory (MI; Chambless, Caputo, Jason, Gracely, & Williams, 1985).

1. Methods

1.1. Overview

The data presented here were derived from the Dresden predictor study (DPS). The DPS was a prospective epidemiological study designed to collect data on the prevalence, incidence, course, and risk factors of mental disorders. A representative sample of young women in Dresden took part in a baseline survey and one follow-up survey. The baseline survey was conducted from July 1996 to September 1997 and the follow-up from December 1997 to February 1999. The present study refers to baseline data only.

1.2. Sample

The sample was drawn from the Dresden government registry of residents. To be eligible, participants had to be German females between 18 and 24 years of age at the time of sampling. Of the 9000 addresses received from the registry office, 5204 women were located and eligible for the study. Of these, 2068 took part in the interview, and 998 only filled out questionnaires. Of the 2068 women who underwent the interview, 1877 also filled out the questionnaires. The majority of the women were not married (94.9%) but did have a partner (66.5%). About half of the women were living with their parents, about a third with a romantic partner, and about 14% alone. Very few had left school without a degree (3.3%), consistent with compulsory schooling laws. A minority attended a "Hauptschule" (6.5%), the lowest level of school education. About one third attended the medium level of schooling ("Realschule" and "Polytechnische Schule"). About half completed school with a degree that qualifies them to enroll at a university ("Abitur"). Almost half of the young women were working: 31.5% of the whole sample full-time and 15.3% part-time. A few women were still at school (4.3%), and about 40% were university students. About 5% were currently unemployed (and not in school).

1.3. Diagnostic assessment

The diagnostic assessment was based on the "Diagnostisches Interview für psychische Störungen–Forschungsversion" (Diagnostic Interview for Mental Disorders, Research Version; F-DIPS; Margraf, Schneider, Soeder, Neumer, & Becker, 1996). The F-DIPS is a structured interview for diagnosing axis I disorders according to DSM-IV, for both lifetime and point prevalence. It is based on the Anxiety Disorders Interview Schedule (ADIS-IV-L; DiNardo, Brown, & Barlow, 1995). It also continues earlier work based on DSM-III-R (DIPS; Margraf, Schneider, & Ehlers, 1991).
Unlike its previous versions, it also contains sections for substance abuse and dependence and for children's disorders. The following disorders can be diagnosed: all anxiety disorders, all affective disorders, the research diagnosis mixed anxiety-depression, hypochondriasis, somatization disorder, conversion disorder and pain disorder, substance abuse and dependence, bulimia, anorexia, and some children's disorders (separation anxiety, attention-deficit/hyperactivity and disruptive behavior disorders, elimination disorders). The interview also includes a sociodemographic section, a screening for psychosis, a screening for general medical conditions and medication, a short section on family history of psychological disorders, and a section on treatment for psychological disorders. Axis IV (psychosocial and environmental problems) and axis V (global assessment of functioning) are also recorded or rated. On average, the baseline interviews took 111 min (S.D. = 40 min).


1.4. Reliability and validity

The retest and inter-rater reliability of the DIPS was tested in an unselected sample of 201 patients, mostly from an internal medicine/psychosomatic clinic (Schneider, Margraf, Spörkel, & Franzen, 1992). Retest reliabilities across the groups of disorders were between .68 and .79 (κ coefficient) and between .67 and 1.0 (Yule's Y coefficient). With a few exceptions, the single diagnoses also reached satisfactory values (κ between .68 and .73; Yule's Y between .71 and 1.0). The study established the DIPS as a valid instrument for the diagnosis of psychiatric disorders (Margraf et al., 1991). The reliability and validity of the F-DIPS have not yet been conclusively tested.

1.5. Interviewers, training procedure, and supervision

Interviewers were either psychology students in the final years of their training or medical doctors. All underwent extensive training lasting about 1 week. During the training, the different disorders were explained, with emphasis on DSM-IV criteria and differential diagnosis. All F-DIPS sections were practiced, and a pre-field training was conducted. This was followed by rating four practice interviews. Two practice interviews were carried out before field work started. All interviewers were supervised bi-weekly. Furthermore, all interviews were taped, and some tapes were randomly sampled and checked by supervisors. Most importantly, every single interview was proofread by specially trained supervisors. Unclear cases were discussed and a consensus diagnosis was given.

1.6. Diagnoses and prevalences

Table 1 lists frequencies and prevalences of selected disorders in our sample (N = 1877). The most prevalent disorders in our sample of young female adults were anxiety disorders (18.3%), primarily social phobia (7%) and specific phobia (9.8%). Other anxiety disorders were less prevalent. A full description of the diagnostic and epidemiological findings of the study is given elsewhere (Becker, Türke, Neumer, Soeder, & Margraf, 2000).

Table 1
Number of anxiety and other disorders in the epidemiological sample (N = 1877)

Disorder (DSM-IV code)                        Point prevalence   Lifetime prevalence
                                              n (%)              n (%)
Psychological disorders (total)               381 (20.3%)        758 (40.4%)
Anxiety disorders (a)                         334 (17.8%)        517 (27.5%)
  Panic disorder (300.01)                     5 (0.3%)           39 (2.1%)
  Panic disorder with agoraphobia (300.21)    8 (0.4%)           16 (0.9%)
  Agoraphobia (without panic) (300.22)        28 (1.5%)          41 (2.2%)
  Social phobia (300.23)                      130 (6.9%)         227 (12.1%)
  Specific phobia (300.29)                    178 (9.5%)         233 (12.4%)
  Generalized anxiety disorder (300.02)       35 (1.9%)          53 (2.8%)
  Obsessive-compulsive disorder (300.3)       15 (0.8%)          24 (1.3%)
  Posttraumatic stress disorder (309.81)      10 (0.5%)          66 (3.5%)
  Acute stress disorder (308.3)               0                  2 (0.1%)
Affective disorders                           31 (1.9%)          283 (13.7%)
Other disorders (b)                           63 (3.3%)          412 (26.9%)

(a) Including specific phobia; point prevalence of anxiety disorders without patients having specific phobia alone: N = 198 (10.5%).
(b) Including somatoform disorders, substance disorders, eating disorders, and children's disorders.

1.7. Questionnaires

Two sets of questionnaires were administered, the first during a break in the interview and the second after the interview. Only the questionnaires relevant to the present study are described here.

1.8. BAI

The primary aim of the BAI (Beck, Epstein, Brown, & Steer, 1988; Beck & Steer, 1990) was to provide a simple measure that would discriminate anxiety from depression. It is a 21-item Likert-scale self-report questionnaire measuring common symptoms of clinical anxiety, such as nervousness and fear of losing control. Respondents indicate how much they are bothered by each symptom. Each symptom is rated on a four-point scale ranging from 0 (not at all) to 3 (severely), which gives a score range of 0–63. Thirteen items assess physiological symptoms, five assess cognitive aspects, and three assess both somatic and cognitive symptoms. The BAI is internally consistent in psychiatric outpatients (α = .92, Beck et al., 1988; α = .94, Fydrich, Dowdall, & Chambless, 1992). Its concurrent validity is high with the SCL-90-R (Derogatis, 1977) anxiety subscale (r = .81; Steer, Ranieri, Beck, & Clark, 1993). It is moderate with the Hamilton Anxiety Rating Scale (Hamilton, 1959) in 367 outpatients with anxiety disorders (r = .56; Beck & Steer, 1991). As intended, the BAI is superior to the STAI in discriminant validity (Fydrich et al., 1992; Creamer, Foran, & Bell, 1995). Common cutting scores are 10, suggesting mild anxiety, and 19, reflecting moderate anxiety. Factor analyses revealed two factors, one reflecting somatic items and one reflecting subjective aspects of anxiety (Beck et al., 1988; Creamer et al., 1995; Kabacoff et al., 1997). Some studies, however, failed to replicate the original factor solution (e.g., Borden, Peterson, & Jackson, 1991). Only the total score of the BAI is presented here. The BAI has recently been criticized for possibly overrepresenting panic attack symptoms. This may raise its discriminant validity with respect to depression but may make it less representative of anxiety in general (Cox, Cohen, Direnfeld, & Swinson, 1996; Creamer et al., 1995). Our data contribute further empirical information to this discussion (see also Steer & Beck, 1996).
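The following is a minimal Python sketch of BAI total scoring as described above. It assumes complete data for all 21 items; the helper names and band labels are hypothetical. A total at or above a cut-off is treated as a positive screen. This reading fits Table 3: at a cut-off of 1, sensitivity is .93, so only the 14 anxiety patients who endorsed no item at all are missed (183 of 197).

```python
from typing import Sequence

BAI_ITEM_COUNT = 21
MILD_CUTOFF = 10      # common cutting score suggesting mild anxiety
MODERATE_CUTOFF = 19  # common cutting score reflecting moderate anxiety


def bai_total(ratings: Sequence[int]) -> int:
    """Sum 21 symptom ratings, each 0 (not at all) to 3 (severely); range 0-63."""
    if len(ratings) != BAI_ITEM_COUNT:
        raise ValueError(f"expected {BAI_ITEM_COUNT} ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(ratings)


def bai_band(total: int) -> str:
    """Label a total with the two common cutting scores (labels are illustrative)."""
    if total >= MODERATE_CUTOFF:
        return "at or above moderate cut-off"
    if total >= MILD_CUTOFF:
        return "mild"
    return "below mild cut-off"


def screen_positive(total: int, cutoff: int) -> bool:
    """Screening decision: positive when the total reaches the chosen cut-off."""
    return total >= cutoff


ratings = [1] * 10 + [0] * 11  # hypothetical respondent mildly bothered by ten symptoms
score = bai_total(ratings)
print(score, bai_band(score), screen_positive(score, MILD_CUTOFF))
```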


1.9. ASI

The ASI (Peterson & Reiss, 1987) is based on the expectancy theory of Reiss and McNally (1985) and Reiss (1991). Anxiety sensitivity (see Reiss & McNally, 1985) is the fear of anxiety-related symptoms, arising from the belief that these symptoms have harmful consequences. Anxiety sensitivity is seen as important for the development of most anxiety disorders (see below), particularly for the avoidance tendencies inherent in them, because it amplifies anxiety and enhances conditionality. The ASI includes 16 items reflecting this concept (e.g., "It scares me when I feel faint"). Each item is rated on a five-point Likert scale ranging from "very little" to "very much". The total score is obtained by summing the items, giving a score range of 0–64. In a study by Taylor, Koch, and McNally (1992), the ASI total score was elevated in five of six anxiety disorders compared with normal controls, simple phobia being the exception. The authors argue that anxiety sensitivity beliefs are unlikely to develop in simple phobia, because anxious feelings in this disorder are highly situation-bound and therefore predictable. Further support for the reliability and construct validity of the instrument was provided by Ehlers and Margraf (1993, German version), Peterson and Reiss (1987), and Reiss (1991).

1.10. Symptom checklist (SCL-90-R; Derogatis, 1977; German version by Franke, 1995)

The scale measures self-evaluated disturbing and impairing symptoms. It covers a wide range of psychological problems and contains nine subscales or dimensions: somatization (12 items), obsessions/compulsions (10 items), interpersonal sensitivity (9 items), depression (13 items), anxiety (10 items), aggression/hostility (6 items), phobic anxiety (7 items), paranoid thinking (6 items), and psychoticism (10 items).
The score of overall impairment by psychological symptoms (global severity index, GSI) is considered a highly reliable and valid measure of general psychopathology (Franke, 1995). There have also been efforts to use (or modify) SCL subscales for specific diagnostic purposes (Bech, 1993). Predictive accuracy is therefore assessed for the anxiety subscale, which covers heterogeneous anxiety symptoms ranging from nervousness to panic attacks.

1.11. FQ

The FQ (Marks & Mathews, 1979) is a widely used self-report instrument for measuring phobic avoidance. It includes 15 items rated on a nine-point Likert scale (0 = would not avoid, 8 = would always avoid). The sum of the 15 items yields a total phobia score with a range of 0–120. The FQ comprises three five-item subscales: agoraphobia (FQ-Ag), social phobia (FQ-Sp), and blood and injury phobia (FQ-B/I). The FQ has been shown to be psychometrically reliable and valid in healthy and clinical samples (Marks & Mathews, 1979; Oei, Moylan, & Evans, 1991; for an overview: Moylan & Oei, 1992). Oei et al. (1991) have argued that the subscales should be used in their own right. FQ-Ag and FQ-Sp proved discriminative for agoraphobic and social phobic patients, whereas the total score had lower discriminative validity. In line with the rationale of the present study, only diagnostic results based on at least two different scales are reported. Therefore, only the predictive accuracy of FQ-Ag is tested (not that of FQ-Sp or FQ-B/I).

1.12. MI

The MI (Chambless, Caputo, Jason, Gracely, & Williams, 1985) is a 27-item inventory measuring self-reported agoraphobic avoidance behavior and the frequency of panic attacks. Respondents rate their avoidance of 26 situations on a five-point scale, both when accompanied and when alone. An additional item asks about the frequency of panic attacks. Summing the ratings of the 26 situations yields an agoraphobic avoidance score (a) when accompanied and (b) when alone. Unlike the previously described instruments, the MI was developed to identify the avoidance behavior that accompanies a specific anxiety disorder (agoraphobia). It has proven discriminant validity (Craske, Rachman, & Tallman, 1986; Öst, 1990). The present study used the modified German version of the MI (Ehlers, Margraf, & Chambless, 1993). For economy, this version rates avoidance with a single mark, irrespective of whether the person is alone or accompanied, although some information is lost as a result.

1.13. Statistical indices

For some of the questionnaires under investigation, all standard indices of predictive accuracy at selected cut-offs are presented: sensitivity, specificity, PPP, and NPP. These are simple descriptive measures, as defined above (see also, e.g., Baldessarini et al., 1983; Elwood, 1993).
In addition, the overall hit rate (proportion of correct decisions) is computed. To reduce complexity and the number of tables, the Youden index (Abel, 1993) is presented as an overall measure of predictive accuracy: $Y = \text{sensitivity} + \text{specificity} - 1$. This measure takes sensitivity and specificity into account simultaneously and can vary between −1 and +1. High positive scores indicate good predictive accuracy. Normally, false positive and false negative decisions will not have equally weighted consequences, whereas this index weights them equally. Still, it provides a very simple overall measure of a test's maximal diagnostic accuracy. For simplicity, it was preferred over the receiver operating characteristic (ROC) statistic (Hsiao, Bartko, & Potter, 1989; Mossman & Somoza, 1989), which has also been applied in the field.
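As a worked example of the Youden index, the Python sketch below applies the formula to the BAI sensitivities and specificities that Table 3 reports for the total sample. To keep the example short, it covers only cut-offs 4 to 11. Values are rounded to two decimals, as in the table, so that floating-point noise cannot break ties. Several cut-offs tie at the maximum of .29. This matches the Results, where the index did not exceed .29 at any of several cut-offs.

```python
# BAI sensitivity and specificity in the total sample, cut-offs 4-11 (Table 3, part a).
TABLE_3A = {
    4: (0.72, 0.55), 5: (0.63, 0.64), 6: (0.57, 0.72), 7: (0.51, 0.78),
    8: (0.46, 0.83), 9: (0.42, 0.86), 10: (0.40, 0.89), 11: (0.37, 0.91),
}


def youden(sensitivity: float, specificity: float) -> float:
    """Youden index Y = sensitivity + specificity - 1, ranging from -1 to +1."""
    return sensitivity + specificity - 1


# Round to two decimals, as tabled, so floating-point noise cannot split ties.
y_by_cutoff = {cut: round(youden(se, sp), 2) for cut, (se, sp) in TABLE_3A.items()}
best = max(y_by_cutoff.values())
best_cutoffs = [cut for cut, y in y_by_cutoff.items() if y == best]
print(best, best_cutoffs)  # 0.29 [6, 7, 8, 10]
```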


1.14. Groups

All screening decisions pertain to comparisons between the following groups:
(a) anxiety patients: all participants with at least one anxiety disorder other than specific phobia (see below);
(b) agoraphobics: all participants with agoraphobia, with or without panic attacks;
(c) others: all participants (patients and non-patients) not subsumed under the respective clinical group; and
(d) other patients: all participants with at least one disorder other than the disorders defining the clinical comparison groups and other than specific phobia.

2. Results

2.1. Descriptive statistics

As Table 2 indicates, anxiety and symptom severity scores were relatively low compared with other samples, both in the total sample and in the healthy sample. For example, Gillis, Haaga, and Ford (1995) reported a BAI total score of M = 7.3 (S.D. = 8.4) in their younger subgroup (age 18–44; no n reported for the subgroup). The score reported by Creamer et al. (1995) appears extraordinarily high for a non-clinical group (M = 13.1, S.D. = 9.6; t(2196) = 22.53, P < .001 compared with our sample). Previously reported mean ASI scores were also higher (Peterson & Reiss, 1987: M = 17.8, S.D. = 8.8 in a non-clinical sample; t(2501) = 16.30, P < .001 compared with our sample). The particularly low symptom self-ascriptions on the questionnaires in our study may be partly due to the two-stage interview design. After being confronted with symptom descriptions during the interview, respondents may have reacted more "conservatively" to questionnaire items. This issue will have to be considered in the discussion. Anxiety scores are similarly low in the disordered subgroups, particularly in the combined group with any anxiety disorder. Most participants in our sample had only one anxiety disorder diagnosis and no comorbid anxiety disorder, which would not be characteristic of older samples.
In addition, the participants with anxiety disorders in our sample had not previously presented for medical consultation or psychotherapy, and their psychopathology is obviously less intense, widespread, and chronic. Nevertheless, these findings contrast with those of Beck et al. (1988) and Creamer et al. (1995), who reported higher BAI scores in younger than in older participants. The inclusion of participants with specific phobias in this sample also has to be taken into account: this subsample exhibits low total scores on the BAI, ASI, FQ, and MI (see Table 2). Another reason for these low scores is the high proportion of social phobics in this sample. The item content of all questionnaires focuses more on somatic than on cognitive symptoms, and the latter may be more relevant for social phobics. In fact, social phobics also exhibited low total scores on the BAI, ASI, FQ, and MI (see further analyses below), with the FQ social phobia subscale as the only exception. Conversely, anxiety as well as general symptoms (GSI) were highest among agoraphobics. Previous studies found higher BAI scores in agoraphobic patients. As Steer and Beck (1996) argued, these scores are not an artifact of overrepresented agoraphobia-related items, because these patients generally exhibit more severe symptomatology than other anxiety patients. Both results, higher BAI and higher SCL scores, were replicated in our sample. Taken together, these descriptive findings already suggest a weakness of the questionnaires with presumed global screening quality (SCL anxiety subscale, ASI, FQ total, and BAI). They may fail to identify anxiety disorders when truly all anxiety disorders, including social and specific phobias, have to be identified.

Table 2
Means and standard deviations for anxiety questionnaires and global symptom severity in selected samples

Group                           BAI            SCL (Anx)      ASI            MI             FQ (Tot)       FQ (Ag)        FQ (Sp)        GSI
Total sample          n         1872           1871           1867           1873           1872           1852           1853           1869
                      M (S.D.)  4.78 (5.33)    .29 (.37)      12.68 (7.66)   1.27 (.34)     16.70 (13.87)  3.06 (3.96)    7.10 (6.07)    .35 (.33)
Healthy subjects      n         1492           1491           1490           1493           1492           1477           1477           1489
                      M (S.D.)  4.08 (4.41)    .24 (.30)      12.09 (7.38)   1.23 (.29)     14.77 (12.57)  2.70 (3.51)    6.28 (5.49)    .29 (.27)
Any anxiety disorder  n         333            334            330            333            333            328            329            334
                      M (S.D.)  7.68 (7.65)    .49 (.55)      15.22 (7.23)   1.43 (.55)     24.42 (15.90)  4.42 (5.06)    10.56 (7.23)   .55 (.46)
Agoraphobia           n         36             36             36             36             36             36             36             36
                      M (S.D.)  10.25 (10.49)  .69 (.77)      19.31 (10.47)  1.97 (.60)     33.67 (16.84)  9.14 (6.40)    10.94 (6.82)   .64 (.55)
Social phobia         n         130            130            129            129            129            128            128            130
                      M (S.D.)  8.25 (8.31)    .53 (.54)      15.34 (7.92)   1.38 (.43)     26.14 (15.10)  4.21 (5.02)    13.55 (6.84)   .47 (.42)
Specific phobia       n         177            178            176            178            178            175            175            178
                      M (S.D.)  6.85 (6.70)    .41 (.49)      14.88 (8.57)   1.43 (.46)     24.31 (17.36)  4.26 (5.09)    9.42 (7.45)    .47 (.42)
Any other disorder    n         47             46             47             47             47             47             47             46
                      M (S.D.)  6.19 (4.60)    .42 (.39)      13.45 (6.82)   1.33 (.33)     23.19 (17.04)  4.86 (5.82)    8.79 (5.77)    .52 (.37)

Abbreviations. n: number of participants with data on the measure; M (S.D.): mean (standard deviation); SCL (Anx): anxiety subscale of the Symptom Checklist; BAI: Beck Anxiety Inventory; ASI: Anxiety Sensitivity Index; MI: Mobility Inventory; FQ: Fear Questionnaire; Tot: total score; Ag: agoraphobia subscale; Sp: social phobia subscale; GSI: global symptom severity (Symptom Checklist).

2.2. Screening for any anxiety disorder: measures of predictive accuracy

This section presents results, including indices of predictive accuracy, at various cut-off points for the questionnaires with presumed global anxiety screening quality. We examined how well the questionnaires could correctly identify anxiety-disordered subjects (1) among all subjects and (2) among the subjects with at least one clinical disorder. To approximate clinical reality, subjects with simple phobias without comorbidity were not regarded as anxiety disordered in the following analyses, because such subjects normally would not present for any consultation. The anxiety-disordered group to be identified comprised 198 subjects. Within the total sample (N = 1877) this reflects an "anxiety positive" base rate of 10.5%, and within the clinical sample (N = 251) a base rate of 78%. Results for the BAI are presented in Table 3.
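PPP and NPP depend on the base rate, so they can be derived from sensitivity, specificity, and prevalence with Bayes' theorem. The hit rate is the base-rate-weighted average of sensitivity and specificity. The Python sketch below (function name hypothetical) uses the BAI values that Table 3 reports for a cut-off of 10 in each sample. It reproduces the tabled PPP and NPP: .30 and .93 in the total sample, .85 and .26 in the clinical sample. The clinical hit rate comes out at about .48 versus the tabled .47, a difference plausibly due to rounding of the inputs. The third scenario, a matched design with a 50% base rate, is hypothetical and illustrates how such designs inflate PPP.

```python
def predictive_values(sens: float, spec: float, base_rate: float) -> dict:
    """PPP, NPP, and hit rate implied by sensitivity, specificity, and base rate."""
    tp = sens * base_rate              # expected proportion of true positives
    fn = (1 - sens) * base_rate        # expected proportion of false negatives
    tn = spec * (1 - base_rate)        # expected proportion of true negatives
    fp = (1 - spec) * (1 - base_rate)  # expected proportion of false positives
    return {
        "ppp": tp / (tp + fp),  # Bayes' theorem: P(disorder | positive test)
        "npp": tn / (tn + fn),  # Bayes' theorem: P(no disorder | negative test)
        "hit_rate": tp + tn,    # base-rate-weighted sensitivity and specificity
    }


# BAI at a cut-off of 10; sensitivity and specificity as tabled for each sample.
scenarios = {
    "total sample": (0.40, 0.89, 197 / 1872),
    "clinical sample": (0.40, 0.75, 197 / 251),
    "matched 50% design": (0.40, 0.89, 0.50),  # hypothetical comparison design
}
results = {name: predictive_values(*args) for name, args in scenarios.items()}
for name, v in results.items():
    print(f"{name:20s} PPP {v['ppp']:.2f}  NPP {v['npp']:.2f}  hit rate {v['hit_rate']:.2f}")
```

With identical sensitivity and specificity, moving from the 10.5% base rate to a 50% base rate raises PPP from about .30 to about .78. This is the inflation that matched-sample designs introduce.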
When screening in the total sample, no single cut-off score provided both high sensitivity and high specificity. At a cut-off score of 10 (mild anxiety), only 40% of the anxiety group were correctly identified (sensitivity), compared with 89% of the non-anxiety group (specificity) and 84% of the total sample (hit rate). At a cut-off of 19 (moderate anxiety), sensitivity was extremely low (12% identified) and specificity extremely high (99%). The Youden index did not exceed .29, a value reached at several cut-offs. Higher cut-offs produced a quite reasonable PPP of over .50, which is high given the base rate of 10%. This came at the expense of identifying only a very small percentage (less than 10%) of the disordered participants. Especially surprising was that a relevant percentage of the anxiety-disordered participants responded positively to none of the BAI items (14 Ss) or to only one or two (28 additional Ss). This is particularly unfavorable for the sensitivity of the instrument. Further analyses showed that the 14 "non-responders" on the BAI comprised 13 social phobics and one agoraphobic. This demonstrates that the BAI items do not adequately capture the subjective anxiety problems of social phobic participants in particular.

Table 3
Predictive accuracy of the BAI for any anxiety disorder

(a) Total sample (197 anxiety patients, 1675 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1        .93          .19          .12   .96   .27       .11
2        .87          .31          .13   .95   .37       .18
3        .79          .44          .14   .95   .48       .23
4        .72          .55          .16   .92   .57       .27
5        .63          .64          .17   .94   .64       .27
6        .57          .72          .19   .93   .71       .29
7        .51          .78          .21   .93   .75       .29
8        .46          .83          .24   .93   .79       .29
9        .42          .86          .26   .93   .82       .28
10       .40          .89          .30   .93   .84       .29
11       .37          .91          .32   .92   .85       .28
12       .32          .92          .33   .92   .86       .28
13       .28          .93          .33   .92   .86       .24
14       .20          .95          .31   .91   .87       .21
15       .20          .96          .34   .91   .88       .15
16       .19          .97          .35   .91   .88       .16
17       .14          .97          .36   .91   .88       .11
18       .13          .98          .43   .91   .89       .11
19       .12          .99          .49   .90   .89       .11
20       .10          .99          .49   .90   .89       .09
21       .09          .99          .51   .90   .90       .08
22       .07          .99          .54   .90   .90       .06
23       .07          1.00         .62   .90   .90       .06

(b) Clinical sample (197 anxiety patients, 54 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1        .93          .05          .78   .18   .74       .02
2        .87          .09          .77   .16   .70       .04
3        .79          .24          .79   .24   .67       .02
4        .72          .36          .80   .26   .64       .08
5        .63          .40          .79   .23   .58       .03
6        .57          .45          .79   .23   .54       .02
7        .51          .55          .80   .24   .52       .05
8        .46          .65          .83   .25   .50       .11
9        .42          .67          .82   .24   .47       .09
10       .40          .75          .85   .26   .47       .14
11       .37          .78          .86   .26   .46       .15
12       .32          .84          .88   .26   .44       .16
13       .28          .89          .90   .25   .41       .17
14       .20          .93          .91   .25   .36       .13
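PPP and NPP depend on the base rate as well as on sensitivity and specificity, which is why the same BAI cut-off looks so different in the two parts of Table 3. The minimal Python check below (not the authors' computation) applies Bayes' theorem to the rounded values at cut-off 10 (sensitivity .40 in both samples; specificity .89 in the total and .75 in the clinical sample) and to the base rates implied by the group sizes given in the table.

```python
def predictive_values(sensitivity: float, specificity: float,
                      base_rate: float) -> tuple[float, float, float]:
    """Return (PPP, NPP, hit rate) implied by Bayes' theorem.

    All four cells are expressed as proportions of the screened population.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    hit_rate = true_pos + true_neg
    return ppp, npp, hit_rate


# BAI cut-off 10, rounded values from Table 3.
total = predictive_values(0.40, 0.89, 197 / (197 + 1675))
clinical = predictive_values(0.40, 0.75, 197 / (197 + 54))

print("total:    PPP %.2f  NPP %.2f  hit rate %.2f" % total)
print("clinical: PPP %.2f  NPP %.2f  hit rate %.2f" % clinical)
```

This recovers the tabled PPP (.30 vs. .85) and NPP (.93 vs. .26); the recomputed clinical hit rate comes out at .48 against the tabled .47, a rounding effect of the two-decimal inputs.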

When screening in the clinical sample only, specificity, and thus the proportion of correctly identified non-anxiety cases, dropped. Although a seemingly very high PPP was reached, the NPP and the hit rate were comparatively low, which again may be due to the inability of the BAI to detect some of the anxiety patients in our sample. A high proportion of them were incorrectly classified as non-anxious at low cut-off points (e.g., 10), which decreased the hit rate to under 50%. Accordingly, the maximal Youden-index was low (.17 at cut-off 13).

For the ASI, results in both the total and the clinical sample fell short of those obtained with the BAI. A maximal Youden-index of .19 (total sample) indicated low diagnostic utility for identifying anxiety patients in our sample; therefore, more detailed results on the other measures of predictive accuracy are not presented here. Additionally, we tested whether ASI scores were higher at all in anxiety patients than in the rest of the sample, as found, e.g., by Taylor, Koch, and McNally (1992). Given the large sample size, this was the case (t(1865) = 6.15, p < .001). However, the effect size was small (ω² = .019). This result illustrates that a statistically significant group difference is only a weak indicator of discriminant validity and does not by itself demonstrate diagnostic utility.

With regard to the results for the SCL anxiety subscale (see Table 4), although its cut-offs are not directly comparable to those of the BAI, it is evident that the SCL anxiety subscale reached quite similar indices in all categories reported here. A Youden-index of .31 (at cut-off .40) in the total sample indicated that the (more economical) SCL fared even slightly better in providing an optimal trade-off between sensitivity and specificity.

2.3. Screening for agoraphobia: measures of predictive accuracy

The following section presents indices of predictive accuracy for identifying a specific anxiety disorder (agoraphobia) with specialized scales (1) among all subjects and (2) among the subjects with at least one clinical disorder. Thirty-six Ss were diagnosed as having agoraphobia alone (n = 28) or panic attacks and agoraphobia (n = 8). This reflects a base rate of 1.9% in the total sample (14.3% in the clinical sample).
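In the analyses that follow, the "best" cut-off is the one with the largest Youden index, J = sensitivity + specificity − 1. A small Python sketch of that selection rule is given below, using the total-sample MI sensitivities and specificities from Table 5(a) as input. Recomputing J from these rounded values, rather than reading the printed Youden column, is an assumption of the sketch; it selects the same cut-off, 1.5, with J ≈ .63.

```python
# Sensitivity and specificity of the MI in the total sample, from Table 5(a).
MI_TOTAL = {
    1.1: (0.94, 0.39), 1.2: (0.92, 0.61), 1.3: (0.86, 0.73),
    1.4: (0.83, 0.79), 1.5: (0.78, 0.85), 1.6: (0.64, 0.89),
    1.7: (0.64, 0.91), 1.8: (0.56, 0.93), 1.9: (0.42, 0.94),
    2.0: (0.36, 0.96), 2.1: (0.33, 0.97), 2.2: (0.33, 0.98),
    2.3: (0.28, 0.98), 2.4: (0.22, 0.99),
}


def youden(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def best_cutoff(table: dict[float, tuple[float, float]]) -> tuple[float, float]:
    """Return the cut-off with the largest J and that J (first one wins on ties)."""
    cutoff = max(table, key=lambda c: youden(*table[c]))
    return cutoff, youden(*table[cutoff])


cutoff, j = best_cutoff(MI_TOTAL)
print(f"best cut-off {cutoff}, J = {j:.2f}")  # best cut-off 1.5, J = 0.63
```

Because J weights sensitivity and specificity equally and ignores the base rate, it can favour a cut-off whose PPP is still very low, as the MI results show.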
Compared with the prevalence of any anxiety disorder, this base rate is approximately five and a half times lower (1.9% vs. 10.5%), which should a priori lead to a lower PPP and a higher NPP. First, the MI was examined (see Table 5). When screening for agoraphobia in the total sample, sensitivity and specificity were both at a satisfactory level at cut-offs 1.4 and 1.5. As indicated by the Youden-index of .63 (considerably higher than any of the results above), the best cut-off was 1.5. Here, 86% of the agoraphobics and 77% of the non-agoraphobics (including patients with other disorders) would be identified correctly. The PPP may seem extremely low at this cut-off (.07), but it has to be judged against the low base rate, which it clearly exceeds. The NPP would be very high, and therefore the hit rate would also be at an acceptable level. When screening for agoraphobia in the clinical sample, specificity dropped moderately, especially at the lower cut-offs between 1.1 and 2.0. The proportion of correctly identified non-agoraphobics was lower, which is not surprising in a clinical sample, but it remained reasonable. Again, 1.5 would be an appropriate cut-off (Youden-index = .54), at which 77% of the non-agoraphobics would be correctly identified (sensitivity as before). With regard to PPP, there was a dramatic rise between cut-offs 2.0 and 2.1, indicating that a positive decision towards a diagnosis of agoraphobia based on a score above this cut-off would be correct in 72% of the cases. This result should not be compared with the PPP for diagnosing any anxiety disorder (Tables 3 and 4), because the base rates there are higher. Rather, the relatively high Youden-indices should be highlighted, which indicate a better performance of the specific screening strategy in the total as well as in the clinical sample.

Table 4
Predictive accuracy of the SCL anxiety subscale for any anxiety disorder at selected cut-off points

(a) Total sample (197 anxiety patients, 1674 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
.1       .87          .29          .13   .95   .35       .16
.2       .74          .49          .15   .94   .52       .23
.3       .62          .64          .17   .94   .64       .26
.4       .57          .74          .20   .94   .72       .31
.5       .49          .81          .23   .93   .77       .30
.6       .42          .86          .26   .93   .81       .28
.7       .37          .90          .30   .92   .85       .27
.8       .30          .92          .31   .92   .86       .22
.9       .22          .95          .33   .91   .87       .17
1.0      .20          .96          .37   .91   .88       .16
1.1      .17          .97          .40   .91   .89       .14
1.2      .13          .98          .45   .91   .89       .11
1.3      .12          .99          .52   .91   .90       .11
1.4      .12          .99          .59   .91   .90       .11
1.5      .10          .99          .59   .90   .89       .09

(b) Clinical sample (197 anxiety patients, 54 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
.1       .87          .15          .79   .24   .71       .02
.2       .74          .22          .78   .19   .63       .04
.3       .62          .33          .77   .20   .56       .04
.4       .57          .43          .78   .21   .54       .01
.5       .49          .61          .82   .25   .52       .10
.6       .42          .65          .81   .23   .47       .07
.7       .37          .78          .86   .25   .45       .14
.8       .30          .81          .86   .24   .41       .11
.9       .22          .87          .86   .24   .36       .09
1.0      .20          .91          .89   .24   .35       .11
1.1      .17          .91          .87   .23   .33       .07
1.2      .13          .94          .90   .23   .31       .08
1.3      .12          .89          .89   .23   .30       .02
1.4      .12          .96          .92   .23   .30       .08
1.5      .10          .96          .91   .23   .29       .06

Table 5
Predictive accuracy of the MI for agoraphobia at selected cut-off points

(a) Total sample (36 agoraphobics, 1837 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1.1      .94          .39          .03   1.00  .40       .23
1.2      .92          .61          .04   1.00  .61       .43
1.3      .86          .73          .06   1.00  .74       .59
1.4      .83          .79          .07   1.00  .79       .62
1.5      .78          .85          .09   .99   .83       .63
1.6      .64          .89          .10   .99   .89       .53
1.7      .64          .91          .13   .99   .91       .55
1.8      .56          .93          .13   .99   .92       .49
1.9      .42          .94          .13   .99   .93       .36
2.0      .36          .96          .14   .99   .95       .32
2.1      .33          .97          .18   .99   .96       .30
2.2      .33          .98          .22   .99   .96       .31
2.3      .28          .98          .25   .99   .97       .26
2.4      .22          .99          .24   .98   .97       .21

(b) Clinical sample (36 agoraphobics, 215 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1.1      .94          .26          .18   .97   .36       .20
1.2      .92          .49          .23   .97   .55       .41
1.3      .92          .58          .27   .98   .63       .50
1.4      .83          .67          .29   .96   .69       .50
1.5      .78          .76          .35   .95   .76       .54
1.6      .67          .82          .38   .94   .80       .49
1.7      .64          .87          .44   .93   .83       .50
1.8      .56          .87          .42   .92   .82       .43
1.9      .42          .89          .38   .90   .82       .31
2.0      .36          .91          .41   .89   .83       .27
2.1      .36          .95          .57   .90   .87       .31
2.2      .33          .97          .63   .90   .88       .30
2.3      .31          .97          .61   .89   .87       .27
2.4      .22          .98          .67   .88   .87       .20

Results for the FQ-Ag (see Table 6) were fairly similar to those of the MI. When screening for agoraphobia in the total sample, the FQ-Ag reached a maximum Youden-index of .55 at cut-off 6. When screening in the clinical sample, a maximum Youden-index of .46 was found. In sum, the FQ-Ag fared slightly worse than the MI, which is not surprising considering that the scale contains only five items.

Table 6
Predictive accuracy of the FQ agoraphobia scale (FQ-Ag) for agoraphobia at selected cut-off points

(a) Total sample (35 agoraphobics, 1817 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1        .89          .37          .03   .99   .38       .26
2        .89          .49          .03   1.00  .49       .37
3        .83          .60          .04   .99   .61       .43
4        .80          .66          .04   .99   .67       .46
5        .74          .76          .06   .99   .76       .50
6        .74          .80          .07   .99   .80       .55
7        .69          .86          .08   .99   .85       .54
8        .66          .88          .10   .99   .88       .54
9        .46          .91          .09   .99   .91       .37
10       .34          .95          .12   .99   .94       .29
11       .29          .97          .12   .99   .95       .26
12       .26          .97          .14   .99   .96       .23
13       .23          .98          .16   .99   .96       .21
14       .20          .98          .18   .98   .97       .18
15       .20          .99          .21   .98   .97       .19

(b) Clinical sample (35 agoraphobics, 211 others)

Cut-off  Sensitivity  Specificity  PPP   NPP   Hit rate  Youden-index
1        .89          .30          .17   .94   .38       .18
2        .89          .39          .19   .95   .46       .27
3        .83          .52          .22   .95   .56       .35
4        .80          .58          .24   .95   .61       .38
5        .74          .69          .28   .94   .70       .43
6        .74          .72          .31   .94   .72       .46
7        .69          .76          .32   .94   .75       .45
8        .66          .81          .36   .93   .78       .46
9        .46          .83          .31   .90   .78       .29
10       .34          .90          .36   .89   .82       .24
11       .29          .91          .34   .88   .82       .20
12       .26          .92          .35   .88   .83       .18
13       .23          .95          .42   .88   .85       .18
14       .20          .95          .41   .88   .85       .15
15       .20          .96          .44   .88   .85       .16

2.4. Additional analyses: screening for agoraphobia with the BAI and SCL

To test the possibility that the BAI is actually better suited to specific anxiety symptoms such as those relevant in agoraphobia, its ability to screen for this subgroup was also examined. In neither the total nor the clinical sample was the BAI as indicative of agoraphobia as the MI or the FQ. The maximum Youden-index was .31 in the total sample and .30 in the clinical sample. However, these values were slightly better than the BAI's ability to screen for any anxiety disorder. When screening for agoraphobia with the SCL, the maximum Youden-index was .17 (total sample) and .14 (clinical sample).

2.5. Summary of results

Because of the multitude of results, a brief summary is given here. None of the questionnaires with a presumed global screening quality performed to a satisfactory degree: at any given cut-off, either sensitivity or specificity could be achieved, but not both. The more economical SCL anxiety subscale performed as well as the BAI. The ASI showed less screening ability than the scales mentioned above.
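The weak showing of the ASI echoes the group comparison in Section 2.2, where a highly significant t test (t(1865) = 6.15) corresponded to an effect size of only ω² = .019. The paper does not state which ω² formula was used; a common one for two independent groups is ω² = (t² − 1)/(t² + df + 1), and the Python sketch below, which assumes that formula, reproduces the reported value. η² = t²/(t² + df) is printed alongside for comparison.

```python
def omega_squared_from_t(t: float, df: int) -> float:
    """Omega squared for a two-group t test: (t^2 - 1) / (t^2 + df + 1)."""
    return (t * t - 1) / (t * t + df + 1)


def eta_squared_from_t(t: float, df: int) -> float:
    """Proportion of variance explained, t^2 / (t^2 + df), for comparison."""
    return t * t / (t * t + df)


t, df = 6.15, 1865  # ASI: anxiety patients vs. rest of the sample (Section 2.2)
print(f"omega^2 = {omega_squared_from_t(t, df):.3f}")  # omega^2 = 0.019
print(f"eta^2   = {eta_squared_from_t(t, df):.3f}")    # eta^2   = 0.020
```

With nearly 1900 participants, about 2% explained variance is enough for p < .001, yet far too little for individual case identification.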

When a specific screening intention was followed, results showed a tendency towards more precise identification of target subjects. Among the specific tests, the MI performed best. However, even for the best result (MI screening for agoraphobia), the PPP based on the test alone was far from sufficient for practical use.

3. Discussion

Questionnaires developed to quantify clinical symptoms by self-report are usually evaluated on the basis of the parameters of classical test theory, including objectivity, reliability, and validity. This strategy implies that these questionnaires do not differ, in their theoretical basis or their intended application, from questionnaires designed to be indicative of latent psychological constructs. The fact that clinical questionnaires are often interpreted and used as clinical tests, with the primary aim of identifying classes or taxa, is often not thoroughly acknowledged by test constructors when they validate their instruments. Beyond other aims, e.g., identifying clinical features relevant for therapy or indicating clinical change, questionnaires are also used to distinguish clinical cases from non-cases. Because of the different sources of error variance inherent both in self-report and in clinical judgement, neither of these strategies alone would be regarded as sufficient for case identification. For the same reason, the discriminant validity of a test can never be perfect when it is validated against clinical judgement, which itself is of restricted validity. Given these problems in clinical discriminant validity, the diagnostic accuracy of tests may often be judged skeptically on a priori grounds. However, perfection should not be the goal here. Only when the ability of tests to discriminate between patients and non-patients is examined can the discriminant qualities of a questionnaire be known and further developed.
The strategy of the present study (comparing global and specific screening, and comparing different questionnaires within each) has yielded considerable new information about the discriminant abilities of anxiety questionnaires in general. Before this is discussed further, possible limitations of the present study have to be considered. These mainly concern the specificity of our sample, which is representative of young female adults in East Germany. Findings may not be generalizable to men, to older adults, or to other countries. On the other hand, our prevalence findings are quite comparable to those of the large-sample studies by Essau, Karpinski, Petermann, and Conradt (1998) and Wittchen, Nelson, and Lachner (1998). Moreover, and more importantly in our view, the sample reflects the typical problems that arise when anxiety questionnaires are used in an epidemiological sample, that is, when participants have not already labeled themselves as patients. This leads to the discussion of our main findings.

First of all, at least with the given instruments, screening for anxiety disorders in general with a single questionnaire does not seem promising. Patients with different anxiety disorders respond heterogeneously to most of the "unspecific" anxiety items. As evident in our study, a relevant proportion of participants with an anxiety disorder (other than specific phobia) endorsed no more than one or two BAI items. It should be noted here that participants had already answered at least part of the diagnostic interview before they completed the questionnaires. This makes it implausible that denial or dissimulation of symptoms accounted for the results to a relevant degree; dissimulators would not have been identified as anxiety patients at all. However, having collaborated in the interview may have changed their reference point for symptom self-ascription. Furthermore, it seems probable that social phobics, the largest subgroup of our sample, did not experience many of the symptoms listed in the BAI, ASI, or SCL anxiety subscale and responded accordingly. Although social phobics suffer from and are limited by their symptoms, they often fear only specific situations that in some cases can be avoided. The anxiety problems of these patients are not accurately addressed by questions focusing, e.g., on bodily symptoms or on more general anxiety. Taking the heterogeneity of anxiety disorders into account, a questionnaire aiming to assess a broad range of clinical anxiety symptoms should itself be heterogeneous and multifaceted. Unfortunately, the FQ is only partly based on such a rationale. It contains multiple anxiety features relevant for heterogeneous anxiety patients, such as social phobics, agoraphobics, and blood and injury phobics, but inconsistently contains no items relevant for PTSD, panic, or generalized anxiety disorder. In light of our results, further development of the "FQ approach" seems promising when the aim is to obtain an overview of anxiety symptomatology by questionnaire.
It should be noted that such a clinical strategy (see also Wittchen & Boyer, 1998) would differ in its rationale from a trait or personality approach, which may focus more on a homogeneous construct of "anxiety." Clinical and personality approaches should not be mixed or confused, on either the theoretical or the practical level, as long as it remains an open question whether clinical disorders are better defined on the basis of a prototypical or a dimensional approach. A second main finding is that reasonable predictive accuracy can be achieved when a specific disorder is targeted, as our results on screening for agoraphobia with the MI and the FQ showed. Symptoms, and accordingly items, are less heterogeneous when a specific disorder is under investigation. Obviously, it is easier to identify target subjects correctly under these conditions. This appears to be true even when the instruments used were not specifically designed to screen for agoraphobia. The FQ targets agoraphobic symptoms more as an aspect of anxiety in general and contains too few items to be better suited for screening. The MI focuses solely on one feature of agoraphobia, namely the avoidance tendency. If other discriminative aspects of the syndrome were equally represented in the questionnaire, its screening ability would clearly be expected to rise. Another, more speculative explanation for the good results with the MI may be that the questionnaire addresses less how people would feel in a given situation and more how they would behave (i.e., whether they avoid situations). Perhaps such behavior-oriented items are less open to the respondents' subjective (mis)interpretation. The high diagnostic utility of questionnaires designed for specific anxiety disorders has recently been demonstrated for other anxiety disorders such as social phobia (Stangier et al., 1999) or generalized anxiety disorder (Beck, Stanley, & Zebb, 1996; Wittchen & Boyer, 1998). However, only the latter study was conducted in an epidemiological (community) sample.

In conclusion, our results suggest the following tentative or "heuristic" guidelines for clinical test construction:
(1) Do not confuse the aims of personality testing and clinical testing.
(2) Concentrate on specific disorders.
(3) When targeting anxiety in general (as a clinical feature), combine elements from all of its heterogeneous facets.
(4) Use items that operationalize behavior.
Clearly, these recommendations, though based on data from a large sample, remain open to further investigation in different settings.

References
Abel, U. (1993). Die Bewertung Diagnostischer Tests. Stuttgart: Hippokrates-Verlag.
Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569–573.
Bech, P. (1993). Rating scales for psychopathology, health status and quality of life. Berlin: Springer.
Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893–897.
Beck, J. G., Stanley, M. A., & Zebb, B. J. (1996). Characteristics of generalized anxiety disorder in older adults: a descriptive study. Behaviour Research and Therapy, 34, 225–234.
Beck, A. T., & Steer, R. A. (1990). Beck Anxiety Inventory manual. San Antonio: The Psychological Corporation, Harcourt Brace Jovanovich Inc.
Beck, A. T., & Steer, R. A. (1991). Relationship between the Beck Anxiety Inventory and the Hamilton Anxiety Rating Scale with anxious outpatients. Journal of Anxiety Disorders, 5, 213–223.
Becker, E. S., Türke, V., Neumer, S., Soeder, U., & Margraf, J. (2000). Incidence and prevalence rates of mental disorders in a community sample of young women: results of the Dresden study. In: R. Manz & W. Kirch (Eds.), Public health research and practice: report of the Public Health Research Association Saxony (pp. 259–291). Regensburg: Roderer.
Borden, J. W., Peterson, D. R., & Jackson, E. A. (1991). The Beck Anxiety Inventory in nonclinical samples: initial psychometric properties. Journal of Psychopathology and Behavioral Assessment, 13, 345–356.
Chambless, D. L., Caputo, G. C., Jasin, S. E., Gracely, E. J., & Williams, C. (1985). The Mobility Inventory for agoraphobia. Behaviour Research and Therapy, 23, 35–44.
Cox, B. J., Cohen, E., Direnfeld, D. M., & Swinson, R. P. (1996). Does the Beck Anxiety Inventory measure anything beyond panic attack symptoms? Behaviour Research and Therapy, 34, 949–954.
Craske, M. G., Rachman, S. J., & Tallman, K. (1986). Mobility, cognitions, and panic. Journal of Psychopathology and Behavioral Assessment, 8, 199–210.
Creamer, M., Foran, J., & Bell, R. (1995). The Beck Anxiety Inventory in a non-clinical sample. Behaviour Research and Therapy, 33, 477–485.
Derogatis, L. R. (1977). SCL-90-R: administration, scoring & procedures. Manual I for the R(evised) version and other instruments of the psychopathology rating scale series. Johns Hopkins University School of Medicine.

DiNardo, P. A., Brown, T. A., & Barlow, D. H. (1995). Anxiety Disorders Interview Schedule for DSM-IV: lifetime version (ADIS-IV-L). Albany, New York: Graywind Publications.
Ehlers, A., & Margraf, J. (1993). Angst vor der Angst: Ein neues Konzept in der Diagnostik der Angststörungen. Verhaltenstherapie, 3, 14–24.
Ehlers, A., Margraf, J., & Chambless, D. L. (1993). Fragebogen zu körperbezogenen Ängsten, Kognitionen und Vermeidung (AKV) mit den Skalen BSQ, ACQ und MI. Weinheim: Beltz.
Elwood, R. (1993). Psychological tests and clinical discriminations: beginning to address the base rate problem. Clinical Psychology Review, 13, 409–419.
Essau, C. A., Karpinski, N. A., Petermann, F., & Conradt, J. (1998). Häufigkeit und Komorbidität psychischer Störungen bei Jugendlichen: Ergebnisse der Bremer Jugendstudie. Zeitschrift für Klinische Psychologie und Psychotherapie, 46, 105–124.
Franke, G. (1995). Die Symptom-Checkliste von Derogatis – Deutsche Version. Göttingen: Hogrefe.
Fydrich, T., Dowdall, D., & Chambless, D. L. (1992). Reliability and validity of the Beck Anxiety Inventory. Journal of Anxiety Disorders, 6, 55–61.
Gillis, M. M., Haaga, D. A. F., & Ford, G. T. (1995). Normative values for the Beck Anxiety Inventory, Fear Questionnaire, Penn State Worry Questionnaire, and Social Phobia and Anxiety Inventory. Psychological Assessment, 7, 450–455.
Glaros, A. G., & Kline, R. B. (1988). Understanding the accuracy of tests with cutting scores: the sensitivity, specificity, and predictive value model. Journal of Clinical Psychology, 44, 1013–1023.
Gotlib, I. H., Lewinsohn, P. M., & Seeley, J. R. (1998). Consequences of depression during adolescence: marital status and marital functioning in early adulthood. Journal of Abnormal Psychology, 107, 686–690.
Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50–55.
Hsiao, J. K., Bartko, J. J., & Potter, W. Z. (1989). Diagnosing diagnoses. Archives of General Psychiatry, 46, 664–667.
Kabacoff, R. I., Segal, D. L., Hersen, M., & Van Hasselt, V. B. (1997). Psychometric properties and diagnostic utility of the Beck Anxiety Inventory and the State-Trait Anxiety Inventory with older adult psychiatric outpatients. Journal of Anxiety Disorders, 11, 33–47.
Margraf, J., Schneider, S., & Ehlers, A. (1991). DIPS: Diagnostisches Interview bei Psychischen Störungen. Berlin: Springer.
Margraf, J., Schneider, S., Soeder, U., Neumer, S., & Becker, E. S. (1996). F-DIPS: Diagnostisches Interview bei Psychischen Störungen (Forschungsversion), Interviewleitfaden, Version 1.1, 7/96. Technical University of Dresden.
Marks, I. M., & Mathews, A. M. (1979). Brief standard self-rating for phobic patients. Behaviour Research and Therapy, 17, 263–267.
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Mossman, D., & Somoza, E. (1989). Maximizing diagnostic information from the dexamethasone suppression test. Archives of General Psychiatry, 46, 653–660.
Moylan, A., & Oei, T. P. S. (1992). Is the Fear Questionnaire (FQ) a useful instrument for patients with anxiety disorders? Behaviour Change, 9, 38–49.
Oei, T. P. S., Moylan, A., & Evans, L. (1991). Validity and clinical utility of the Fear Questionnaire for anxiety-disorder patients. Psychological Assessment, 3, 391–397.
Öst, L.-G. (1990). The agoraphobia scale: an evaluation of its reliability and validity. Behaviour Research and Therapy, 28, 323–329.
Peterson, R. A., & Reiss, S. (1987). Anxiety Sensitivity Index manual. Palos Heights, IL: International Diagnostic Systems, Inc.
Reiss, S. (1991). Expectancy model of fear, anxiety, and panic. Special issue: applied learning theory: research issues for the 1990s. Clinical Psychology Review, 11, 141–153.
Reiss, S., & McNally, R. J. (1985). The expectancy model of fear. In: S. Reiss & R. R. Bootzin (Eds.), Theoretical issues in behavior therapy (pp. 107–121). New York: Academic Press.

Reiss, S., Peterson, R. A., Gursky, D. M., & McNally, R. J. (1986). Anxiety sensitivity, anxiety frequency and the prediction of fearfulness. Behaviour Research and Therapy, 24, 1–8.
Schneider, S., Margraf, J., Spörkel, H., & Franzen, U. (1992). Therapiebezogene Diagnostik: Reliabilität des Diagnostischen Interviews bei Psychischen Störungen (DIPS). Diagnostica, 38, 209–227.
Stangier, U., Heidenreich, T., Berardi, A., Golbs, U., & Hoyer, J. (1999). Die Erfassung sozialer Phobie durch die Social Interaction Anxiety Scale (SIAS) und die Social Phobia Scale (SPS). Zeitschrift für Klinische Psychologie, 28, 28–36.
Steer, R. A., & Beck, A. T. (1996). Generalized anxiety and panic disorders: response to Cox, Cohen, Direnfeld, and Swinson. Behaviour Research and Therapy, 34, 955–957.
Steer, R. A., Ranieri, W. F., Beck, A. T., & Clark, D. A. (1993). Further evidence for the validity of the Beck Anxiety Inventory with psychiatric outpatients. Journal of Anxiety Disorders, 7, 195–205.
Taylor, S., Koch, W. J., & McNally, R. J. (1992). How does anxiety sensitivity vary across the anxiety disorders? Journal of Anxiety Disorders, 6, 249–259.
Wittchen, H.-U., & Boyer, P. (1998). Screening for anxiety disorders: sensitivity and specificity of the Anxiety Screening Questionnaire (ASQ-15). British Journal of Psychiatry, 173(Suppl. 34), 10–17.
Wittchen, H.-U., Nelson, C. B., & Lachner, G. (1998). Prevalence of mental disorders and psychosocial impairments in adolescents and young adults. Psychological Medicine, 28, 109–126.
Zarin, D. A., & Earls, F. (1993). Diagnostic decision making in psychiatry. American Journal of Psychiatry, 150.