Journal of Clinical Epidemiology 53 (2000) 940–948
Comparison of ultrasound, radiography, and clinical examination in the diagnosis of acute maxillary sinusitis: a systematic review Helena Varonena,*, Marjukka Mäkelä a, Seppo Savolainenb, Esa Lääräc, Jörgen Hildend a
Department of Services and Quality, Stakes, National Research and Development Centre for Welfare and Health, P.O. Box 220, 00531, Helsinki, Finland b Department of Otolaryngology, Central Military Hospital, Helsinki, Finland c Department of Mathematical Sciences/Statistics, University of Oulu, Oulu, Finland d Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark Received 19 January 1999; received in revised form 22 September 1999; accepted 28 October 1999
Abstract The objective of this study was to assess the discriminative properties of the methods for diagnosing acute maxillary sinusitis (AMS) in unselected patients. The study design was a systematic review of evaluation studies identified by using Medline, by searching reference lists, by hand searches, and by contacting investigators. Evaluation studies were conducted anywhere in the world. Subjects were adults with suspected AMS. Main outcome measures were: sensitivity, specificity, positive and negative likelihood ratios of the primary studies, weighted means of these parameters in each comparison (clinical examination, radiography, and ultrasound compared to a reference standard in diagnosing AMS), and summary ROC curves and their Q* points where sensitivity equals specificity. For the years from 1962 to present, 49 study reports were found; 11 articles on studies that included a total of 1144 patients were eligible. Compared to sinus puncture, radiography was the most accurate method for diagnosing AMS: the Q* point on the summary ROC curve was 0.82 (95% confidence interval, CI, 0.78–0.85). Ultrasound was slightly less accurate than radiography compared to sinus puncture (Q* 0.80, 95% CI 0.76– 0.83). Only two articles reported clinical examination compared to sinus puncture and the Q* for them was 0.75 (95% 0.58–0.86). Clinical examination is a rather unreliable method for diagnosing AMS, even in the hands of experienced specialists. Using radiography or ultrasound improves the accuracy of diagnosis. The diagnosis of AMS is rarely studied in primary care settings. Future comparative trials should preferably combine diagnosis and treatment, evaluating the two aspects of clinical management as unit. © 2000 Elsevier Science Inc. All rights reserved. Keywords: Acute maxillary sinusitis; Diagnosis; Sensitivity; Specificity; ROC curves
1. Introduction Acute maxillary sinusitis (AMS) is one of the most frequent diagnoses in primary health care. Nasal complaints, allergy, and rhinitis are common, and distinguishing between rhinitis and sinusitis is not always simple. Most often the diagnosis of AMS is based on symptoms and clinical examination alone. In Northern Europe and in the Netherlands the sinus ultrasound performed by clinicians has gained wide acceptance in the diagnosis of AMS and in part displaced sinus radiographs. In the United States computed tomography has also been used in the diagnosis of AMS, while in Europe it is used only in secondary care. The diagnostic criteria for AMS are not well established. Most often it is defined as inflammation of the paranasal sinus * Corresponding author. Tel.: ⫹358-9-3967 2291; fax: ⫹358-9-3967 2278. E-mail address:
[email protected] (H.Varonen)
mucosa with retention in the sinuses [1]. Sinus puncture is considered as the gold standard in the diagnosis of sinusitis. However, it is invasive and not easily accepted by the patients. Radiography and computed tomography (CT) have also been used as reference standards. The problem with both of them is that they may produce a number of false-positive results (mucosal thickening or polyps, cysts, or anomalies). To our knowledge, there are no studies in which CT had been compared to sinus puncture. However, there is evidence that patients with the common cold and even nonsymptomatic patients may have pathological findings in the CT of sinuses [1–3]. The difficulties in indicating the retention in maxillary sinuses have led to the overdiagnosing of AMS and to overtreatment with antibiotics. In most countries the current practice is to treat suspected AMS with antibiotics [4]. In Finland, more than 90% of patients with symptoms of suspected sinusitis receive a course of antibiotics at their first visit to a general practitioner when the diagnosis is based on clinical examination only [5].
0895-4356/00/$ – see front matter © 2000 Elsevier Science Inc. All rights reserved. PII: S0895-4356(99)00 2 1 3 - 9
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
The aim of our study was to evaluate through a systematic review of the research literature the efficacy of clinical examination, ultrasound, and radiography as diagnostic measures for AMS in unselected populations. A second aim was to put some proposed meta-analytical approaches to test.
2. Material and methods 2.1. Search strategy We searched the Medline database from 1966 to April 1999 and the Finnish medical database, Medic, from 1977 onwards using a variety of both Medical Subject Heading (MeSH) terms and textword options. Two investigators (H.V. and M.M.) independently assessed the database searches to find potential study articles. Reference lists of pertinent articles and reviews were used to identify more studies. We hand-searched volumes dating from 1980 on of four important journals in the field (Journal of Allergy and Clinical Immunology, Acta Oto-Laryngologica, Annals of Otology, Rhinology and Laryngology, Journal of Ultrasound in Medicine). Finally, to ensure completeness, we personally contacted experts studying the diagnosis of sinusitis. 2.2. Inclusion criteria We included studies that compared a diagnostic method with a reference standard. As reference methods we accepted sinus puncture or computed tomography. Our target population was adults with a suspected AMS (symptoms for less than 3 months) in primary care or a comparable setting. If the setting was not clearly stated, we accepted the study if the population consisted of acute patients with no diagnostic procedures done before the methods under study. We accepted articles in English, German, French, Scandinavian languages, and Finnish. We excluded studies on chronic sinusitis or otherwise selected patient populations (e.g., consisting solely of allergic or surgical patients). 2.3. Methodological standards We used the methodological standards proposed by the Cochrane Collaboration Methods Working Group on Diagnosis and Screening [6]. The criteria for study validity were as follows: 1) Was the test compared with a valid reference standard? 2) Were the test and reference standards measured blind of each other? 3) Was the choice of patients assessed by the reference independent of the test’s result? 4) Was the test measured independently of all other clinical information? 5) Was the reference standard measured before any interventions were started with knowledge of the test results? 6) Were tests compared in a valid design? We did not accept into our meta-analysis studies that failed on standards 3, 5, and 6; as regards standards 1, 2 and 4, see results. Two authors (H.V. and M.M.) rated independently the methodological quality of the studies, and the disagreements were solved by consensus.
941
In most studies sinuses were counted rather than patients. As we had no firm information on the right–left crossdependence that thereby arises, we had to disregard it, as did the investigators concerned. 2.4. Data collection We collected data from the original studies in the form of a common 2 ⫻ 2 contingency table of observed frequencies (true positive, false positive, true negative, false negative). If the numbers of patients were not given in the study reports, we contacted the authors and asked for the original data. Ultrasound findings were considered positive according to the criteria established by Revonta: scans with the presence of back wall echo ⭓3.5 cm from the initial echo were considered positive, others negative [7]. In sinus radiographs the findings with air-fluid level, complete opacity, or mucosal thickening of ⭓6 mm were considered as positive. In studies of clinical examination, we only collected data on the physician’s overall clinical impression. 2.5. Statistical analysis The statistical methods are described in detail in the statistical appendix. From the original numbers we calculated the sensitivity, specificity, and their confidence intervals for the diagnostic method under study. We also calculated the prevalence of sinusitis in each study population and the positive and negative likelihood ratios for the diagnostic method in each study. We categorized the original studies by the type of reference standard used and performed summary analysis within each group. We calculated the summary sensitivities, specificities, and likelihood ratios and their confidence intervals while weighting the studies by inverse variance [8]. More appropriately, we formed summary receiver operating characteristic curves (SROC) by using the Meta-test software produced by Joseph Lau, and by modifying the method introduced by Moses, Shapiro, and Littenberg [9–11] using zero as the slope parameter. As a summary estimate we used the Q* value, the point where the SROC curve shoulders up to the desirable northwest corner [12,13]. Q* is the point of the SROC curve in which the sensitivity and the specificity are equal. Geometrically it implies intersecting the SROC with the NW-SE diagonal of the ROC square; the coordinates of this point are: (false positive, true positive fraction) ⫽ (1 ⫺ Q*, Q*). 3. Results 3.1. Search results The Medline searches yielded 1054 references and Medic search yielded 49. Altogether 168 abstracts were printed for further examination. One investigator (H.V.) critically rated the study articles and the questionable studies were discussed with the other reviewers. A total of 49 studies concerning the diagnosis of maxillary sinusitis in adults were retrieved. The results of the literature searches
942
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
and the points of the individual studies for radiography and ultrasound compared to sinus puncture are presented in Figs. 1 and 2. In the comparison to sinus puncture, radiography was the most accurate diagnostic method for AMS (weighted mean LR⫹ 3.36, LR⫺ 0.026). Radiography had both good sensitivity and specificity in the diagnosis of AMS. In the two primary-care studies (that of van Buchem et al. and that of Laine et al.) the sensitivity was lower than in the hospital-based studies. The overall results of ultrasound compared to puncture were not much weaker than those of radiography (LR⫹ 2.78, LR⫺ 0.30) but they were more heterogenous. Revonta [7] achieved better results than the other authors did. The summary Q* values of radiography and ultrasound were comparable, 0.82 (95% CI 0.78–0.85) for radiography and 0.80 (95% CI 0.76–0.83) for ultrasound (Table 5). We found only two studies in which clinical examination was compared to sinus puncture in all patients, both published by Berg and Carenfelt [13,19]. The sensitivity of clinical examination performed by experienced otorhinolaryngologists was 0.69, and the specificity 0.79 (Table 4).
Table 1 Results of the literature searches Source
No. of additional reports found
Medline Medic Reference searches Hand searches Personal contact Total
30 1 12 0 6 49
Reasons for exclusion Primary reasons for exclusion
No. of reports
Selected patient population Inadequate data Chronic sinusitis Unacceptable reference method Retrospective study Eligible reports Total
12 10 6 8 2 11 49
and the primary reasons for study exclusion are presented in Table 1. Eleven articles met our inclusion criteria, there being two pairs of duplicate publications [14,15] and [16,17] and two that were unpublished at the time of reviewing [12,18]. Table 2 gives the characteristics of the settings and the patient populations of the included studies.
3.4. Computed tomography as reference standard Only Hansen [16] had compared clinical examination to CT in unselected patients with suspected AMS. The nominal accuracy of clinical examination was poor in this general-practice study: the sensitivity of overall clinical impression was 0.85 (95% CI 0.82–0.89), specificity 0.23 (95% CI 0.18–0.27), LR⫹ 1.10 and LR⫺ 0.64.
3.2. Methodological quality of included trials The methodological characteristics of the included trials are collected in Table 3. The methodological aspects of the studies were often rather vaguely reported. The patients studied were seldom described in detail (e.g., age, severity of disease, duration of symptoms, dropouts). The presentation of results varied and several types of categories and tabulations were used. Sensitivity and specificity were reported in most cases but likelihood ratios in one case only. Sometimes it was difficult to find out the quality of blinding.
4. Discussion In our meta-analysis, radiography was the most accurate method to diagnose sinusitis in unselected patients. Ultrasound was almost as accurate but the results of ultrasound were heterogeneous. This may be a result of differences in who performs the scan, in the equipment used, or in the aims of use: ultrasound was not accurate in predicting cysts or mucosal thickening in sinuses. However, an ultrasound finding of back-wall echo indicating retention in the sinuses was a strong evidence to suspect sinusitis.
3.3. Sinus puncture as reference standard The estimates of sensitivity, specificity, positive and negative likelihood ratios, and prevalence of sinusitis in individual studies are presented in Table 4. The SROC curves
Table 2 Characteristics of patient populations of included studies Year
First author, (ref)
Setting
Patients, N
Age (percentage of males)
Duration of study
1963 1980 1982 1985 1988 1995 1995 1997 1998
McNeill [30] Revonta [7] Kuusela [14] Berg [19] Berg [13] Hansen [16] van Buchem [30] Savolainen [18] Laine [12]
ENT-clinic ENT-clinic Military clinic ENT-clinic Emergency ward General practice General practice Military clinic General practice Total
150 215 105 105 155 174 40 161 39 1144
10 to over 50, (52%) Adults 17–28, (100%) 10–75, mean 37, (33%) adults, mean 38, (38%) 19–62, (30%) over 18, (33%) 17–68, (81%) 16–68, median 37 (33%)
NA NA NA ⬍3 months ⬍3 months ⬍1 month NA ⬍1 month ⬍1 month
NA ⫽ data not available. ENT-clinic ⫽ ear, nose, and throat clinic.
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
943
Table 3 Methodological characteristics of included studies
Year
First author
Diagnostic method
Reference standard
1963 1980 1982 1985 1988 1995 1995 1997 1998
McNeill Revonta Kuusela Berg Berg Hansen van Buchem Savolainen Laine
Clin. exam, x-ray u-sound, x-ray u-sound, x-ray clin. exam, u-sound clin. exam clin. exam u-sound, x-ray u-sound, x-ray u-sound, x-ray
x-ray, SP SP, x-ray SP x-ray or SP SP CT (SP for a subgroup) SP SP SP
Test and ref. standard measured independently of each other?
Categorya
Test measured blind of all clinical information?
No Yes NA Yes NA Yes Yes Yes Yes
NA I NA II BA II I I I
NA No NA NA Infeasible Infeasible No No No
The three right-most columns refer to Cochrane working group standards 1, 2 and 4; violation of 3, 5, and 6 meant exclusion. a Categories: I: Test measured independently of reference standard and reference standard independently of test (MOST VALID). II: Test measured independently of reference standard but not vice versa. NA ⫽ data not available. SP ⫽ sinus puncture. CT ⫽ computed tomography.
The likelihood ratios of all methods studied to diagnose AMS were modest: LR⫹s from 2 to 5 affect pretest probabilities only slightly. With the current methods, the diagnosis of AMS remains challenging, especially in primary care settings, where the pretest probability of disease is low. The performance of clinical impression would presumably have been even poorer if the Berg studies had not used experienced otorhinolaryngologists. The methods for diagnosing AMS have not been extensively studied in unselected, primary care populations. We found 49 articles concerning the diagnosis of AMS and most of them were carried out in ear, nose, and throat clinics. Most often we excluded studies because of their selected patient population, because of serious methodological flaws, or because of missing data specifications. Although the methods for diagnostic meta-analyses have improved in the last years they still are less developed than the methods for meta-analyses of treatment effectiveness. Several methods for combining the data from diagnostic studies have been proposed [20,21] and guidelines for metaanalyses evaluating diagnostic tests have been recommended [10,22]. However, there is no consensus on a clinically informative point estimate with confidence limits for diagnostic meta-analyses. For this purpose we used the Q* point suggested by Moses et al. [9] and weighted means of sensitivity, specificity, and likelihood ratios. The lack of a good reference method is one of the major obstacles to studying fuzzy everyday illnesses such as sinusitis. Sinus puncture is the most reliable method of confirming retention in maxillary sinuses, but is unpleasant for the patient. Sinus puncture was applied as a gold standard method to all patients only in studies carried out in the Nordic countries and the Netherlands. CT and magnetic resonance imaging are sensitive but probably not very specific methods for diagnosing sinusitis [2,3]. We did not find studies in which CT and MRI had been compared to sinus puncture. Radiography is the most common reference method in studies concerning the diagnosis of sinusitis. The sensitivity of radiography compared to puncture varied in our review
from 61% to 93%. It was especially low in the recent primary health care studies (Laine et al. [12] 61%, and van Buchem et al. [23] 63%). The radiographs in these studies were interpreted by experience radiologists. The acceptability of a reference standard with so low a sensitivity may well be questioned and we decided to exclude these studies. A weak gold standard impairs the results of the method under study. It would be desirable to have a method by which a test evaluation against an erroneous gold standard test could be corrected by taking into account what has been shown about that test itself vis-à-vis a putatively more nearly perfect gold standard [24]. However, this cannot be done without making unverifiable assumptions about the simultaneous behavior of the three tests concerned. The importance of presenting the three-way tables seems to remain unappreciated, as few authors provide such data. There is an association between the prevalence of sinusitis and the sensitivity of the test in the material used in this review: the smaller the prevalence, the smaller the sensitivity. This may reflect a measurement bias caused by routine: When the prevalence is low the investigator becomes accustomed to negative findings and errs more easily toward a diagnosis of no disease, resulting in an increased number of false-negative diagnoses. Another explanation may be the predominance of less severe and early phase disease of AMS in primary care. Publication bias is more of a problem for any type of observational studies than for randomized therapeutic trials [25]. We made an effort to minimize the effect of publication bias on our results by contacting researchers to find unpublished studies, but we may have missed a number of studies. The methodological quality of the studies in this review was variable. The articles often gave insufficient information on substantial methodological issues such as blind assessment of data or selection of patients. In several studies the patients’ characteristics, duration of symptoms, and eligibility criteria were not satisfactorily described and therefore the study results must be treated cautiously. If the blind
944
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
Table 4 Sinus puncture as reference standard: results of the eligible studies Radiography compared to puncture (SROC Figure 1) Study
N*
Prev.
Sens.
(95% CI)
Spec.
(95% CI)
LR⫹
LR⫺
McNeill 1963 Revonta 1980 I Revonta 1980 II Kuusela 1982 van Buchem 1995 Savolainen 1997 Laine 1998 Total Weighted mean
242 170 60 156 62 234 72 996
0.61 0.46 0.58 0.53 0.26 0.80 0.32
0.76 0.87 0.80 0.83 0.63 0.93 0.61
(0.71–0.82) (0.82–0.92) (0.70–0.90) (0.77–0.89) (0.50–0.75) (0.90–0.96) (0.50–0.72)
0.76 0.91 0.80 0.72 0.91 0.62 0.98
(0.70–0.81) (0.87–0.95) (0.71–0.89) (0.65–0.79) (0.84–0.98) (0.55–0.68) (0.95–1.00)
3.12 9.94 2.92 2.92 7.19 2.43 29.83
0.31 0.14 0.52 0.24 0.41 0.11 0.40
0.87
(0.85–0.88)
0.89
(0.88–0.91)
3.36
0.26
Ultrasound compared to puncture (SROC Figure 2) LR⫹
LR⫺
Study
N*
Prev.
Sens.
(95% CI)
Spec.
(95% CI)
Revonta 1980 I Revonta 1980 II Revonta 1980 III Kuusela 1982 van Buchem 1995 Savolainen 1997 Laine 1998 Total Weighted mean
170 200 60 156 48 234 72 940
0.46 0.53 0.58 0.53 0.27 0.80 0.32
0.87 0.92 0.94 0.71 0.54 0.81 0.61
(0.82–0.92) (0.88–0.95) (0.88–1.00) (0.64–0.78) (0.40–0.68) (0.76–0.86) (0.50–0.72)
0.91 0.81 0.72 0.64 0.94 0.72 0.53
(0.87–0.95) (0.75–0.86) (0.61–0.83) (0.56–0.71) (0.88–1.00) (0.67–0.78) (0.42–0.65)
9.94 4.73 3.37 1.94 9.42 2.94 1.30
0.14 0.10 0.08 0.46 0.49 0.26 0.74
0.85
(0.84–0.87)
0.82
(0.80–0.83)
2.78
0.30
Clinical examination compared to puncture LR⫹
LR⫺
Study
N
Prev.
Sens.
(95% CI)
Spec.
(95% CI)
Berg 1985 Berg 1988 Total Weighted mean
90 155 245
0.48 0.44
0.75 0.66
(0.66–0.84) (0.58–0.73)
0.78 0.79
(0.70–0.87) (0.73–0.85)
3.45 3.13
0.32 0.43
0.69
(0.65–0.73)
0.79
(0.75–0.82)
3.25
0.40
N* ⫽ number of sinuses studied; N ⫽ Number of patients studied; 95% CI ⫽ confidence interval; Prev. ⫽ prevalence; Sens. ⫽ sensitivity; Spec. ⫽ specificity; LR⫹ ⫽ positive likelihood ratio; LR⫺ ⫽ negative likelihood ratio.
assessment of data and an adequately broad spectrum of patients are not guaranteed, the diagnostic test evaluations are liable to bias, which may falsely elevate the estimate of the test’s efficacy [26,27]. Methodological standards for diagnostic test evaluation have improved [28,29], facilitating better study designs. Indeed, we noted the methodological issues were better mastered in the more recent studies included in this review. The literature reviewed here clearly demonstrates that the statistical problems posed by interdependent observations are poorly understood. First, the right and left sinuses of a given patient with suspected bilateral AMS do not represent two independent clinical problems. In fact, the two sides are interdependent on three levels: nosologically because infection on one side makes infection on the other side more likely; management-wise because the two sides are examined by the same clinician and the same radiologists, who cannot wear “blinders”; and therapeutically because antibiotics, when prescribed, may benefit both sides even when one side is false negative. Second, a number of studies [7,13,16,17,19,23,30] investigate three or more diagnostic modalities but do not give the relevant three-way breakdown (i.e., 2⫻2⫻2 tables comparing the three modalities). This precludes the highly desirable
paired-data comparisons that would sharpen the conclusions of a review of the present kind (as it is comparative standard errors cannot be calculated). It also blocks the road to intelligent suggestions as to how one can sharpen the message of those studies that employ a suspect gold standard. An accurate diagnosis is worthless if it has no therapeutic consequences. The study from van Buchem et al. [31] showed that AMS diagnosed with radiography clears equally well with placebo and with antibiotics. The metaanalysis published by Stalman et al. [32] led to the same conclusion. The newest meta-analysis by de Ferranti et al. [33] showed that although antibiotics were significantly more efficacious than placebo in achieving cure of clinical symptoms in AMS, still over two thirds of patients receiving placebo recovered spontaneously. A conclusion of this evidence may be that most AMS patients recover spontaneously but certain subgroups of AMS patients may benefit from antimicrobial therapy. No studies are available that evaluate the diagnostic policies for AMS in conjunction with therapeutic options. In view of the findings of de Ferranti et al. [33] and van Buchem et al. [31], this kind of large-scale combined diagnostic–therapeutic policy trial is very much needed. Whatever kind of serious research into AMS diagnosis and treat-
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
945
Fig. 1. Radiography compared to puncture: SROC curve and the individual studies. Upper curve is the unweighted analysis in Meta-Test software, lower curve the weighted analysis.
ment is undertaken—and the suggestions made here are by no means exhaustive—it should be accompanied by costeffectiveness analyses.
Dr. Joseph Lau, who provided us with the Meta-Test software for diagnostic meta-analysis. Statistical appendix
5. Conclusions Clinical examination is not a reliable method for the diagnosis of AMS. If a correct diagnosis is considered important (i.e., if antibiotic treatment is effective despite claims to the contrary), ultrasound or radiography should be used as diagnostic aids. In the diagnostic test evaluations more attention should be paid to study methodology and to the interplay between diagnostic output and therapeutic options. The diagnosis of AMS is rarely studied in primary care settings, which is sad as the results from secondary care are not safely generalizable to primary care populations. Future comparative trials should preferably combine diagnosis and treatment, evaluating the two aspects of clinical management as unit. Acknowledgments We wish to thank Drs. Jens Georg Hansen and Karri Laine, who consented to sending us their original data, and
The data were analyzed and results presented by modifying the approach presented by Moses et al. [9] and supplementing it with some weighted summary measures. Assume that we have m studies, each evaluating the same diagnostic test procedure T against the same gold standard method G. In each study i let ri1, ri0, si1, and si0, respectively, denote the observed numbers in the typical fourfold table for test evaluation: ri1 ⫽ true positives, ri0 ⫽ false positives, si1 ⫽ false negatives, and si0 ⫽ true negatives. Let also ni1 ⫽ ri1 ⫹ si0 be the number of truly diseased and ni0 ⫽ ri0 ⫹ si0 the number of nondiseased. Then the estimates of the familiar test properties from this study are ri1/ni1 for the sensitivity or true positive rate Qi, and ri0/ni0 for the false positive rate Fi ⫽ 1 ⫺ specificity. The positive likelihood ratio (i.e., the likelihood ratio associated with a positive test result) is defined LRi⫹ ⫽ Qi/Fi, and estimated by the ratio of the estimates of Qi and Fi. The negative likelihood ratio is LRi⫺ ⫽ (1⫺Qi)/(1⫺Fi), estimated analogously. The odds ratio is
946
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
Fig. 2. Ultrasound compared to puncture: SROC curve and the individual studies. Upper curve is the unweighted analysis in Meta-Test software, lower curve the weighted analysis.
OR i = LR i + ⁄ LR i – = [ Q i ⁄ ( 1 – Q i ) ] ⁄ [ F i ⁄ ( 1 – F 1 ) ] ,(1) and it is estimated by (ri1/si1)/(ri0/si0). Let Li1 ⫽ ln(ri1/ni1) ⫺ ln(ri0/ni0) be the logarithm of the estimated positive likelihood ratio and Li⫺ ⫽ ln(si1/ni1) ⫺ ln(si0/ni0) that of the negative likelihood ratio. Let further Di ⫽ ln(ri1/si1) ⫺ ln(ri0/si0) be the estimated log-odds ratio, and Si ⫽ ln(ri1/si1) ⫹ ln(ri0/si0). The estimated variances of the sensitivity and specificity estimates are V(ri1/ni1) ⫽ ri1 (ni1 ⫺ ri1)/ni13 and V(ri0/ni0) ⫽ ri0(ni0 ⫺ ri0)/ni03. The variance of the estimated positive log-likelihood ratio is estimated as V(Li⫹) ⫽ 1/ri1 ⫹ 1/ri1 ⫺ (1/ni1 ⫺ 1/ni0) and that of the negative loglikelihood ratio by V(Li⫺) ⫽ 1/si1 ⫹ 1/si0 ⫺ (l/ni1 ⫹ 1/ni0).
The variance of Di is estimated by V(Di) ⫽ 1/ri1 ⫹ 1/ri0 ⫹ l/si1 ⫹ 1/si0, which is the estimated variance V(Si) of Si, too. Summary estimates of sensitivity and specificity, respectively, were obtained by calculating the weighted averages from the corresponding study-specific estimates, the weight wi for each i being the inverse of the estimated variance V(ri1/ni1) or V(ri0/ni0) as defined above. The standard error of each of these summary estimates is calculated as SE ⫽ (1/ oiwi)1/2. For each of the two likelihood ratios the analogous summary estimate is the antilogarithm of the weighted average of the study specific estimated log-likelihood ratios, the weights being V(Li⫹) for the positive and V(Li⫺) for the negative LR, respectively.
Table 5 Results obtained with summary statistics, the Q* values Comparison
Sinuses in meta-analysis, N
Q* point on summary ROC line (sensitivity ⫽ specificity) (95% CI)
Radiography-Puncture Ultrasound-Puncture Clinical examination-Puncture
996 940 245a
0.82 (0.78–0.85) 0.80 (0.76–0.83) 0.75 (0.58–0.86)
N ⫽ number of sinuses. Number of patients, not sinuses.
a
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
Moses et al. [9] based their estimation of the summary ROC curve for a given diagnostic test on a linear model relating Di and Si: D i = α + βS i + e i where ei is a random term and ␣ and  are parameters to be estimated. This equation implicitly relates Q and F to each other, and thus specifies a theoretical ROC curve for the test in question, describing the true positive rate Q as a smooth function of the false positive rate F. The relation takes the following form: Q = 1 ⁄ { 1 + exp [ – α ⁄ ( 1 – β ) ] [ ( 1 – F ) ⁄ F ]
(1 + β) ⁄ (1 – β)
} (2)
over all values of F between 0 and 1. If the slope parameter  can be assumed 0, the function above is simplified into Q = ( 1 ⁄ { 1 + exp ( – α ) [ ( 1 – F ) ⁄ F ] } ),
(3)
and it will then also have the property of being concave (having a monotonically decreasing first derivative) rather than being S-shaped. In this case the ROC curve reduces in fact to a simple hyperbola. Moses et al. [9] recommend to estimate the parameters ␣ and  either (i) using equally weighted least squares (EWLS), minimizing squared deviations in the D direction, each study having the same weight, or (ii) using a robustresistant alternative. The EWLS method leads to the formulae for estimates of ␣ and  that are well known from common linear regression analysis. We used this method throughout. It turned out in each analysis that no strong evidence surfaced against the simplifying assumption of zero slope, so we also analyzed the data making the assumption of  ⫽ 0. The EWLS estimation of ␣ is then reduced to a simple arithmetic average of the values of Di over the m studies. Putting the resulting estimate A of ␣ in the simple formula (2) above gives the estimated summary ROC curve. The variance of A is estimated as V(A) ⫽ (1/m2)[兺i V(Di) ⫹ 兺i(Di⫺A)⫺/(m⫺1)], which takes into account the possible heterogeneity in the test properties across the studies. The lower and upper limits of the 95% confidence interval for ␣ are calculated as ␣i ⫽ A ⫺ 1.96 V(A)1/2, ␣U ⫽ A⫹ 1.96 V(A)1/2. Another summary measure, defined by Moses et al. [9], refers to the point in the theoretical ROC curve in which the sensitivity Q and specificity 1⫺F are equal. This common value is called Q* and is obtained from the functional form above as Q* ⫽ 1/[1 ⫹ exp(⫺␣/2)]. The estimated value of Q* is derived accordingly from the estimate A of ␣, and the limits of the 95% confidence interval for Q* are 1/[1 ⫹ exp(⫺␣L/2)] and 1/[1 ⫹ exp(⫺␣U/2)].
References [1] Shapiro GG, Rachelefsky GS. Introduction and definition of sinusitis. J Allerg Clin Immunol 1992;90:417–8.
947
[2] Calhoun KH, Waggenspack GA, Simpson GA, Hokanson JA, Bailey BJ. CT evaluation of the paranasal sinuses in symptomatic and asymptomatic populations. Otolaryngol Head Neck Surg 1991;104:477–81. [3] Gwaltney JM, Jr., Phillips CD, Miller RD, Riker DK. Computed tomographic study of the common cold. N Engl J Med 1994;330 :25–30. [4] Lindbaek M, Hjortdahl P, Johnsen UL-H. Randomised, double blind, placebo controlled trial of penicillin V and amoxycillin in treatment of acute sinusitis infections in adults. BMJ 1996;313:325–8. [5] Mäkelä M, Leinonen K. Diagnosis of maxillary sinusitis in Finnish primary care. Scand J Primary Health Care 1996;14:29–35. [6] Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods. In: Available at http://som.flinders.edu.au/cochrane/; 1997. [7] Revonta M. Ultrasound in the diagnosis of maxillary and frontal sinusitis. Acta Otolaryngol (Stockh) Suppl 1980;370:1–55. [8] Petitti DB. Meta-Analysis, Decision Analysis, and Cost-Effectiveness Analysis. New York: Oxford University Press, 1994. [9] Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytical approaches and some additional considerations. Stat Med 1993;12:1293– 1316. [10] Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48(1):119–30. [11] Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making 1993;13:313–21. [12] Laine K, Määttä T, Varonen H, Mäkelä M. Diagnostic acute maxillary sinusitis in primary care: a comparison of ultrasound, clinical examination and radiography. Rhinology 1998;36:2–6. [13] Berg O, Carenfelt C. Analysis of symptoms and signs in the maxillary sinus empyema. Acta Otolaryngol (Stockh) Suppl 1988;105:343–9. [14] Kuusela T, Kurri J, Sirola R. Ultraääni varusmiesten sinuiittidiagnostiikassa-punktion, ultraääni—ja röntgentutkimuksen löydösvertailu (Ultrasound in diagnosing acute maxillary sinusitis in consricpt soldiers—a comparison of puncture, ultrasound and radiography. In Finnish.). Ann Med Milit Fenn 1982;57:138–41. [15] Kuusela T, Kurri J, Sirola R. Ultraschall in der Sinusitis-Diagnostik bei Rekruten—Vergleich der Befunde der Punktion, Ultraschall- und Röntgenuntersuchung. Wehrmed Mschr Heft 1983;11:461–4. [16] Hansen JG, Schmidt H, Rosborg J, Lund E. Predicting acute maxillary sinusitis in general practice population. BMJ 1995;311:233–6 [17] Hansen JG, Schmidt H, Rosborg J, Lund E. Klinisk kriterium for sinuitis maxillaris acuta i almen praxis. Ugeskr Laeger 1996;158:3156–9. [18] Savolainen S, Pietola M, Kiukaanniemi H. An ultrasound device in the diagnosis of acute maxillary sinusitis. Acta Otolaryngol (Stockh) Suppl 1997;529:148–52. [19] Berg O, Carenfelt C. Etiological diagnosis in sinusitis: ultrasonography as clinical complement. Laryngoscope 1985;95:851–3. [20] Kardaun JW, Kardaun OJ. Comparative diagnostic performance of three radiological procedures for the detection of lumbar disk herniation. Methods Inform Med 1990;29:12–22. [21] Campbell G. Advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Stat Med 1994;13:499–508. [22] Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994;120:667–76. [23] van Buchem L, Peeters M, Beaumont J, Knottnerus JA. Acute maxillary sinusitis in general practice: the relation between clinical picture and objective findings. Eur J Gen Pract 1995;1:155–60. [24] Walter S, Irwig L. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J Clin Epidemiol 1988;41(9):923–37. [25] Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867–72. [26] Begg CB. Biases in the assessment of diagnostic tests. Stat Med 1987;6:411–23.
948
H. Varonen et al. / Journal of Clinical Epidemiology 53 (2000) 940–948
[27] Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–30. [28] Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. JAMA 1995;274:645–51. [29] Mulrow CD, Linn WL, Gaul MK, Pugh JA. Assessing quality of a diagnostic test evaluation. J Gen Intern Med 1989;4:288–95. [30] McNeill RA. Comparison of the findings on transillumination, x-ray and lavage of the maxillary sinus. J Laryngol Otol 1963;77:1009–13. [31] van Buchem FL, Knottnerus JA, Schrijnemaekers VJJ, Peeters MF.
Primary-care-based randomised placebo-controlled trial of antibiotic treatment in acute maxillary sinusitis. Lancet 1997;349:683–7. [32] Stalman W, van Essen GA, van der Graaf Y, de Melker RA. Maxillary sinusitis in adults: an evaluation of placebo-controlled doubleblind trials. Fam Pract 1997;14(2):124–9. [33] de Ferranti SD, Ioannidis JP, Lau J, Anninger WV, Barza M. Are amoxycillin and folate inhibitors as effective as other antibiotics for acute sinusitis? A meta-analysis. BMJ 1998;317:632–7.