Assessing gains in diagnostic utility when human papillomavirus testing is used as an adjunct to Papanicolaou smear in the triage of women with cervical cytologic abnormalities
Eduardo L. Franco, PhD,a,b and Alex Ferenczy, MDc,d
Montreal, Quebec, Canada
OBJECTIVE: We aimed to provide simple methods for calculating expected sensitivity and specificity when an adjunctive test is added to a conventional test.
STUDY DESIGN: Use of adjunctive methods for the triage of women with cervical abnormalities produces an apparent gain in sensitivity over Papanicolaou cytologic testing alone. This increase in sensitivity can be misleading, even if deemed significant by results of a statistical test. Combined testing prevents a loss in specificity but sometimes offers no real gain in sensitivity. A nominal increase in sensitivity always occurs by chance whenever an adjunctive test is used in parallel with a conventional one, even if the new test is totally random with respect to the disease being evaluated.
RESULTS: Gains in sensitivity and losses in specificity have to be gauged against expected levels of these parameters when a random adjunctive test is coupled with Papanicolaou screening and not gauged against the performance of cytologic testing alone.
CONCLUSION: We provide simple formulas for calculating the expected sensitivity and specificity in conditions of combination testing to provide more realistic baselines for assessment of the screening efficacy contributed by the adjunctive test. (Am J Obstet Gynecol 1999;181:382-6.)
Key words: Cervical intraepithelial neoplasia, human papillomavirus, screening, triage of cervical lesions
In screening for a disease with a presumptive test that has known sensitivity and specificity, it is possible to find that addition of a second screening test for a separate trait of the same disease will result in increased sensitivity in detection of the disease. A case in point is the use of human papillomavirus (HPV) testing for the triage of women with equivocal or low-grade cervical lesions, a topic that has received considerable attention in obstetrics and gynecology journals.1-7 Much of the enthusiasm about management algorithms results from the perception of improved sensitivity when HPV testing is performed in parallel with repeated cytologic study. When
From the Departments of Oncology,a Epidemiology,b and Pathology,c McGill University, and the Sir Mortimer B. Davis Jewish General Hospital.d
Supported in part by a grant from the US National Institutes of Health (CA 70269) and by grants from the Medical Research Council of Canada (MA-13647 and MT-13649). E.L.F. was the recipient of a Senior Research Scholar Award from the Fonds de la Recherche en Santé du Québec. Received for publication December 21, 1998.
Copyright © 1999 by Mosby, Inc. 0002-9378/99 $8.00 + 0 6/1/98604
all women who have positive results of either a repeated Papanicolaou examination or an HPV test are referred for colposcopy, the diagnostic yield of clinically important lesions invariably increases. This gives the clear impression that combined testing has greater sensitivity than cytologic testing alone. This article presents the argument that the apparent gain in sensitivity with the addition of HPV testing to cytologic study can be misleading, even when documented with a test of statistical significance. Combined testing typically can prevent a loss in specificity but sometimes offers no real gain in sensitivity. An apparent gain in sensitivity always occurs by chance alone whenever an adjunctive test is used in parallel with a conventional one, even if the new test brings no new information and is random with respect to the disease being evaluated. Gains in sensitivity and losses in specificity have to be gauged against the expected levels of these parameters if Papanicolaou cytologic screening is augmented by a random adjunctive test, and not against the original sensitivity and specificity of cytologic testing alone.
A random adjunctive test
How can we calculate the gain in sensitivity and loss in specificity expected by chance alone for an adjunctive
Volume 181, Number 2, Am J Obstet Gynecol
Table I. Diagnostic utility of repeated Papanicolaou smear, HPV testing, and their combination in triage of 365 women with a referral Papanicolaou smear with cytologic abnormalities

"Present" and "Absent" give the number of women with and without histologically confirmed cervical intraepithelial neoplasia or cervical cancer.

| Triage test | Result | Present (No.) | Absent (No.) | Sensitivity (%) | Specificity (%) | Percentage of positive test results |
| --- | --- | --- | --- | --- | --- | --- |
| Papanicolaou smear | Abnormal | 145 | 47 | 78 | 74 | 53 |
| | Normal | 41 | 131 | | | |
| HPV testing | Positive | 124 | 40 | 66 | 78 | 45 |
| | Negative | 63 | 138 | | | |
| Combined | Abnormal or positive for HPV | 164 | 63 | 88 | 65 | Not applicable |
| | Normal and negative for HPV | 23 | 115 | | | |
| Papanicolaou and random test* | Abnormal or positive test result | 145 + 45% of 41 = 163.4 | 47 + 45% of 131 = 105.8 | 88 | 41 | Not applicable |
| | Normal and negative test result | 41 - 45% of 41 = 22.6 | 131 - 45% of 131 = 72.2 | | | |

Results from actual testing compared with those expected if a hypothetical random adjunctive test had been used in conjunction with the repeat Papanicolaou smear. Data from Ferenczy et al.6
*Assumed to have the same rate of positive results as HPV testing in the same patient population, that is, 45%.
test that produces results that are random with respect to cervical disease status and cytologic result? In the simplest form, one can conceive of flipping a coin in tandem with Papanicolaou cytologic testing. Choosing, for example, heads to indicate a positive result, combining this information with the cytologic result, and interpreting the combination as implying a positive result if either result is positive invariably increases sensitivity and decreases specificity. Taken on its own, an unbiased coin flipped many times has 50% sensitivity and 50% specificity in the long run. This happens because coin flipping as a screening test tends to produce a 50% "prevalence" of positive results irrespective of disease status. The 50% coin flip prevalence is not an ideal model for deriving the expected results from chance alone. The appropriate random adjunctive test must mimic the behavior of HPV testing by producing the same frequency of positive results, which varies with patient population, prevalence of cervical disease, and other factors. Using a specific example from published data, we can illustrate the effect of adding a second test on the sensitivity and specificity of Papanicolaou cytologic examination and compare it with the expected results from chance. Table I shows the effects of combined Papanicolaou smear and HPV testing in diagnosing histologically confirmed squamous intraepithelial lesions among 365 women with a referral abnormal Papanicolaou smear in a study by Ferenczy et al.6 A cytologic diagnosis of atypical squamous cells of undetermined significance or worse was used to define an abnormal Papanicolaou smear. Sensitivity and specificity were calculated for each test in isolation and for their combination. Also shown in Table I are the expected frequencies of combined test results if
the Papanicolaou smear study had been augmented by a hypothetical random test that produced the same overall rate of positive results (except as noted) for this patient population as did HPV testing. The expected frequencies are based on a simple calculation that shifts 45% (the rate of positive results of HPV tests for the same population) of the frequencies of false-negative and true-negative Papanicolaou results to those of true-positive and false-positive combinations, respectively. Sensitivity and specificity are then calculated in the usual manner with the expected frequencies. By computing 95% confidence intervals around the sensitivity and specificity estimates for the combined testing, we can better judge whether the new values provide a statistically significant (at the 5% level) gain with respect to the baseline levels of cytologic testing alone or cytologic testing plus a random test. Although it is clear that in the study summarized in Table I combined testing seemed to enhance sensitivity (88%; 95% confidence interval, 82-92) in a statistically significant manner (from 78% with Papanicolaou testing alone),6 the increase was nonexistent compared with the expected joint result from Papanicolaou testing and a random test (also 88%; Table I). The advantage of the actual test combination was prevention of the typical loss of specificity that occurs with parallel testing. The overall specificity (65%; 95% confidence interval, 57-71) was significantly greater than that with the combination of cytologic study and a random adjunctive test (41%) that had the same rate of positive results as the HPV test in this patient population. Depending on lesion prevalence, this may translate into substantially improved negative predictive values for the combination of cytologic testing and HPV testing over
August 1999, Am J Obstet Gynecol
Table II. Interpretation of gain in sensitivity and loss in specificity when HPV testing is added to Papanicolaou cytologic study in triage of women with cervical cytologic abnormalities

| Study | Lesion diagnosed | Positive result on smear | No. of patients | Rate of positive HPV result (%) | Diagnostic index | Papanicolaou test alone* (%) | Combined test* (%) | 95% confidence interval for combination of tests | Expected from Papanicolaou and random tests† (%) | Significantly different from baseline: Papanicolaou test alone | Significantly different from baseline: expected with Papanicolaou and random tests |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cox et al1 | CIN | At least LSIL | 482 | 29.3 | Sensitivity | 44 | 78 | 71-84 | 60 | Yes‡ | Yes‡ |
| | | | | | Specificity | 92 | 79 | 75-83 | 65 | Yes§ | Yes‡ |
| Cox et al2 | At least LSIL | At least ASCUS | 217 | 41.9 | Sensitivity | 60 | 90 | 79-96 | 77 | Yes‡ | Yes‡ |
| | | | | | Specificity | 77 | 58 | 51-65 | 45 | Yes§ | Yes‡ |
| Hatch et al3 | At least HSIL | At least LSIL | 311 | 53.4 | Sensitivity | 76 | 92 | 86-95 | 89 | Yes‡ | No |
| | | | | | Specificity | 57 | 43 | 36-51 | 27 | Yes§ | Yes‡ |
| Wright et al4 | CIN | At least ASCUS | 217 | 59.0 | Sensitivity | 73 | 92 | 87-97 | 89 | Yes‡ | No |
| | | | | | Specificity | 72 | 55 | 46-65 | 30 | Yes§ | Yes‡ |
| Ferenczy et al6 | At least HSIL | At least ASCUS | 365 | 44.9 | Sensitivity | 88 | 95 | 88-98 | 93 | No | No |
| | | | | | Specificity | 57 | 47 | 42-53 | 31 | Yes§ | Yes‡ |
| Kaufman et al7 | At least CIN2 | At least ASCUS | 462 | 35.5 | Sensitivity | 63 | 82 | 71-89 | 76 | Yes‡ | No |
| | | | | | Specificity | 62 | 49 | 44-54 | 40 | Yes§ | Yes‡ |

Results are from selected published studies. CIN, Cervical intraepithelial neoplasia; LSIL, low-grade squamous intraepithelial lesion; ASCUS, atypical squamous cells of undetermined significance; HSIL, high-grade squamous intraepithelial lesion.
*Computed from the raw data of the original study if not provided as a result.
†Expected values for a combination of the actual Papanicolaou cytologic study and a hypothetical random adjunctive test with the same rate of positive results as HPV testing for that population.
‡Value for the index in combined testing is significantly greater than the baseline level.
§Value for the index in combined testing is significantly lower than the baseline level.
that which can be obtained with cytologic testing or HPV testing alone.
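The frequency shift used for the random-test row of Table I can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software: the function name `expected_with_random_test` and its argument names are assumptions introduced here. The counts are the Papanicolaou smear frequencies from Table I, and the positivity rate is entered as 0.449 (rounded to 45% in the text).

```python
def expected_with_random_test(tp, fn, fp, tn, p):
    """Expected 2 x 2 frequencies when a random test with positivity rate p
    is read in parallel with cytology (hypothetical helper, for illustration).

    tp, fn: women with disease and abnormal / normal cytology
    fp, tn: women without disease and abnormal / normal cytology
    """
    exp_tp = tp + p * fn   # false negatives "rescued" by chance alone
    exp_fn = fn - p * fn
    exp_fp = fp + p * tn   # true negatives turned into false positives
    exp_tn = tn - p * tn
    return {
        "tp": exp_tp, "fn": exp_fn, "fp": exp_fp, "tn": exp_tn,
        "sensitivity": exp_tp / (exp_tp + exp_fn),
        "specificity": exp_tn / (exp_fp + exp_tn),
    }


# Papanicolaou smear counts from Table I; 0.449 is the HPV positivity rate.
expected = expected_with_random_test(tp=145, fn=41, fp=47, tn=131, p=0.449)
for key, value in expected.items():
    print(f"{key}: {value:.3f}")
```

With these inputs the expected sensitivity and specificity round to 88% and 41%, the values given for the Papanicolaou and random test combination in Table I.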
Chance-adjusted sensitivity and specificity
In practice, we can compute the expected null values for sensitivity and specificity without resorting to the actual diagnostic table frequencies for cytologic testing, as in Table I. This simplifies the task of assessing published results for joint diagnostic yield when only the actual parameters are given but not the table frequencies. The following 2 formulas allow generalization of the computations shown in Table I:

$$S_{\text{Expected}} = S_{\text{Cytologic testing}} + P\,(1 - S_{\text{Cytologic testing}})$$ for expected null sensitivity

$$W_{\text{Expected}} = W_{\text{Cytologic testing}} - P\,(W_{\text{Cytologic testing}})$$ for expected null specificity

where S is sensitivity, W is specificity, $S_{\text{Cytologic testing}}$ and $W_{\text{Cytologic testing}}$ represent, respectively, the sensitivity and specificity of the original Papanicolaou smear test, $S_{\text{Expected}}$ and $W_{\text{Expected}}$ denote the adjusted (for the addition of the new test) sensitivity and specificity, and P is the expected rate of positive results for HPV testing or
any other adjunctive test used in tandem in the same population. S and W values are expressed as probabilities, not as percentages. Using the formulas and computing sensitivity, specificity, and 95% confidence intervals for the actual combined testing data helps one judge whether the added sensitivity or loss in specificity is of statistical significance with respect to the baseline values expected for Papanicolaou cytologic examination aided by a chance test. A variety of simple statistical programs for computation of approximate or exact confidence intervals are available in the public domain, freely obtainable on the World Wide Web and available from E.L.F. on request.
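As a rough sketch of this procedure, the two formulas can be coded directly and the expected null values compared with a confidence interval for the combined test. The text does not name a specific interval method, so the Wilson score interval used here is an assumption, as are the function names `expected_null` and `wilson_interval`. The inputs are the Table I values.

```python
import math


def expected_null(sens_cyt, spec_cyt, p):
    """Expected sensitivity and specificity of cytology plus a random test.

    All arguments are probabilities (0 to 1), not percentages.
    """
    s_expected = sens_cyt + p * (1.0 - sens_cyt)
    w_expected = spec_cyt - p * spec_cyt
    return s_expected, w_expected


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval (assumed method, see lead-in)."""
    phat = successes / n
    denom = 1.0 + z * z / n
    centre = (phat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(phat * (1.0 - phat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Table I: cytology alone has sensitivity 0.78 and specificity 0.74; P = 0.45.
s_exp, w_exp = expected_null(0.78, 0.74, 0.45)

# Combined Papanicolaou and HPV testing in Table I: 164 of 187 lesions
# detected, 115 of 178 women without lesions correctly negative.
sens_lo, sens_hi = wilson_interval(164, 187)
spec_lo, spec_hi = wilson_interval(115, 178)

# A gain is credited only if the chance-expected value lies below the interval.
sens_gain_is_real = s_exp < sens_lo
spec_better_than_chance = w_exp < spec_lo
print(s_exp, (sens_lo, sens_hi), sens_gain_is_real)
print(w_exp, (spec_lo, spec_hi), spec_better_than_chance)
```

Under these assumptions the expected null sensitivity (about 0.88) falls inside the interval for combined testing, whereas the expected null specificity (about 0.41) falls well below it.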
Examples
Table II shows a reappraisal of some key published findings on the diagnostic utility of combined Papanicolaou smear and HPV testing for triage of women referred with cervical cytologic abnormalities. These studies vary by clinical outcome (low-grade squamous intraepithelial lesion or high-grade squamous intraepithelial lesion on the basis of the Bethesda classification8 and all-inclusive cervical intraepithelial neoplasia or cervical intraepithelial neoplasia grade 2 or 3) and according to the
definition of the minimal cytologic abnormality that constituted an abnormal Papanicolaou smear (atypical squamous cells of undetermined significance or low-grade squamous intraepithelial lesion). We calculated 95% confidence intervals for sensitivity and specificity on the basis of actual data from combined testing given in the articles, except for the studies of Wright et al4 and Ferenczy et al,6 which provided confidence intervals. Computation of the expected null values for sensitivity and specificity in each case was based on the rate of positive HPV test results found by the authors of the cited studies in the same patient sample. The statistical interpretation of gain in sensitivity or loss in specificity is given for each study with respect to two separate baseline estimates for these parameters: Papanicolaou result alone or Papanicolaou result with a result of a random adjunctive test (two right-most columns in Table II). All but one of the studies in Table II (Ferenczy et al6) indicated significantly increased sensitivity for the combination of tests compared with Papanicolaou test alone, meaning that the estimate for Papanicolaou test alone was not included within the 95% confidence interval for the sensitivity of combined testing or that a specific statistical test provided by the authors indicated significance. Except for the studies by Cox et al,1,2 in which combined testing clearly represented a significant gain in sensitivity irrespective of baseline, none of the studies showed a significant increase in sensitivity when gauged against a baseline of Papanicolaou with a random adjunctive test that yielded the same rate of positive results as HPV testing for that population. In all studies the loss in specificity with respect to the baseline of Papanicolaou testing alone could be construed as significant.
In every situation, however, combined Papanicolaou testing and HPV testing yielded specificity levels significantly greater than that of the combination of a Papanicolaou and a random adjunctive test. It is noteworthy that, evaluated in terms of the chance-corrected expected levels, the absolute increases in sensitivity were lower than those in specificity for most of the studies in Table II.
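The reappraisal can be reproduced approximately from the published summary values alone. The sketch below is illustrative only: the data structure, the function names `verdict` and `reappraise`, and the rule "significant when the chance-expected value lies outside the 95% confidence interval of the combined test" are assumptions made for this example, and the figures are the percentages reported in Table II.

```python
# (study, HPV positivity rate P, sensitivity row, specificity row); each row is
# (Papanicolaou alone, lower 95% limit, upper 95% limit) with the limits taken
# from the combined test, all in percent, as read from Table II.
TABLE_II = [
    ("Cox et al (ref 1)", 0.293, (44, 71, 84), (92, 75, 83)),
    ("Cox et al (ref 2)", 0.419, (60, 79, 96), (77, 51, 65)),
    ("Hatch et al", 0.534, (76, 86, 95), (57, 36, 51)),
    ("Wright et al", 0.590, (73, 87, 97), (72, 46, 65)),
    ("Ferenczy et al", 0.449, (88, 88, 98), (57, 42, 53)),
    ("Kaufman et al", 0.355, (63, 71, 89), (62, 44, 54)),
]


def verdict(expected, low, high):
    """Compare the chance-expected value with the combined-test interval."""
    if expected < low:
        return "greater"   # combined test significantly above chance baseline
    if expected > high:
        return "lower"     # combined test significantly below chance baseline
    return "no"


def reappraise(table):
    rows = []
    for study, p, (s_pap, s_lo, s_hi), (w_pap, w_lo, w_hi) in table:
        # The formulas work on probabilities, so convert from percentages.
        s_exp = 100 * (s_pap / 100 + p * (1 - s_pap / 100))
        w_exp = 100 * (w_pap / 100 - p * (w_pap / 100))
        rows.append((study, round(s_exp), verdict(s_exp, s_lo, s_hi),
                     round(w_exp), verdict(w_exp, w_lo, w_hi)))
    return rows


results = reappraise(TABLE_II)
for row in results:
    print(row)
```

Under this rule only the two studies by Cox et al show a sensitivity gain beyond the chance baseline, while every study shows specificity above it.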
Comment
Conventional data analysis methods for studies of diagnostic or screening performance of laboratory tests include computation of sensitivity, specificity, and predictive values based on 2 × 2 table frequencies in which are assembled the combination of true and false results, both positive and negative. The diagnostic or screening utility of a test can be assessed in isolation through measurement of the strength of the statistical association between the test results and disease status, for instance, with the Pearson χ2 or Fisher exact tests. As an alternative, calculation of confidence intervals around diagnostic performance indexes gives an idea of the precision of the estimates, which helps in deciding whether a particular sensitivity or specificity value is significantly greater than a particular threshold for acceptability. We argue that this paradigm for assessing the efficacy of screening tests can lead to misleading results when tests are used in parallel, as in the triage of women with cervical cytologic abnormalities for colposcopic examination, an important concern for gynecologists and family practitioners. Use of adjunctive tests that supplement diagnostic information from a Papanicolaou cytologic study usually enhances the sensitivity to detect clinically relevant cervical lesions. Such gains in sensitivity, especially if associated with low-cost adjunctive procedures, are viewed with keen interest because of the constant need to optimize the allocation of health care resources to prevent invasive cervical disease. Moreover, particularly in the United States, the ever-present climate of malpractice litigation leads to emphasis in research on the development of medical tests that improve sensitivity and minimize risk for false-negative screening results that could evolve into irreversible malignant disease. In most of the studies reviewed in this report it was concluded that enhanced sensitivity resulted from using HPV testing in combination with Papanicolaou cytologic study. In some cases3,4 the evidence was supported by P values from specific significance tests. Results of such tests are misleading because they do not refer to the appropriate null hypothesis. In combination testing, such as the situation for triage of women with cervical abnormalities, one must expect additional true-positive results by chance alone because a proportion of the patients with disease missed at cytologic testing are "rescreened" with the adjunctive test, even if the latter is truly random.
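The "rescreening by chance" effect can be checked with a small Monte Carlo sketch. Everything in it is an illustrative assumption: the function name `simulate_parallel_testing`, the sample sizes, the fixed seed, and the use of the Table I values (sensitivity 0.78, specificity 0.74, positivity rate 0.45) as inputs. The random test is drawn independently of both disease status and cytologic result.

```python
import random


def simulate_parallel_testing(sens, spec, p, n_diseased, n_healthy, seed=1):
    """Monte Carlo sketch: cytology read in parallel with a purely random test.

    The random test is positive with probability p regardless of disease.
    """
    rng = random.Random(seed)

    detected = 0
    for _ in range(n_diseased):
        cytology_positive = rng.random() < sens
        random_positive = rng.random() < p      # carries no disease information
        if cytology_positive or random_positive:
            detected += 1

    correctly_negative = 0
    for _ in range(n_healthy):
        cytology_negative = rng.random() < spec
        random_negative = rng.random() >= p
        if cytology_negative and random_negative:
            correctly_negative += 1

    return detected / n_diseased, correctly_negative / n_healthy


sim_sens, sim_spec = simulate_parallel_testing(
    sens=0.78, spec=0.74, p=0.45, n_diseased=100_000, n_healthy=100_000)
print(f"simulated sensitivity {sim_sens:.3f}, specificity {sim_spec:.3f}")
```

The simulated combined sensitivity settles near 0.88 even though the added test is uninformative, and the simulated specificity falls to about 0.41.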
Depending on sample size, the nominal increase in sensitivity (over Papanicolaou testing alone) provided by the combination of a Papanicolaou test and random adjunctive test can be deemed statistically significant, as shown in Table I. Such a conclusion underscores the fallacy of measuring gains in absolute terms against the unaided testing baseline. Receiver-operator characteristic analysis, also known as receiver-operator characteristic curve analysis, is a useful tool for assessing increases in sensitivity "penalized" by concomitant increases in false-positive rate for tests that yield results on a continuous scale or for evaluating combinations of multiple tests. In HPV testing, receiver-operator characteristic curve analysis is particularly useful for the evaluation of the most appropriate molecular sensitivity of the new Hybrid Capture II microtiter method (Digene Corporation).9 However, use of a simple, conventional receiver-operator characteristic curve analysis would not prevent the fallacious conclusions reached in a comparison of combination testing with Papanicolaou
testing alone, as discussed herein. An appropriate receiver-operator characteristic analysis should include for comparison a separate baseline curve that plots the sensitivity and false-positive rates for several combinations of cytologic tests and random tests to mimic the different "prevalences" along the linear range of HPV test results on the basis of distinct cut points. The formulas presented herein adjust the expectation for sensitivity to incorporate the proportion of false-negative results that become reclassified as true-positive results when an unrelated random test is applied to the pool of normal results at cytologic study. These formulas also adjust the expectation for specificity to exclude from the frequency of true-negative results the same proportion, that is, patients who would ultimately be classified as having false-positive results because of the same hypothetical random test. In consequence, the threshold for accepting a gain in sensitivity is increased and that for a loss in specificity is lowered, which provides a sounder basis for judging an improvement in diagnostic yield. Except for that of Ferenczy et al,6 none of the articles discussed in detail the decreased specificity consequent to adding HPV testing to triage on the basis of a repeated Papanicolaou smear beyond admitting that it represented a shortcoming inherent to a test combination. In all studies what was viewed as a loss in specificity with respect to Papanicolaou testing alone represented a significant gain in preventing the false-positive results that would be inherent to a testing condition with a higher prevalence of positive results (ie, combined testing). The loss in specificity prevented by a sensible combination of cytologic study and HPV testing can have practical benefits in the design of management algorithms that maximize the predictive value of positive and negative triage results.
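A baseline curve of the kind described above can be sketched by applying the two formulas over a range of positivity rates, one per candidate HPV cut point. The function name `baseline_roc_points` and the list of positivity rates are hypothetical and chosen only for illustration. The cytology values are those of Table I.

```python
def baseline_roc_points(sens_cyt, spec_cyt, positivity_rates):
    """Chance-expected (false-positive rate, sensitivity) pairs for cytology
    combined in parallel with a random test, one pair per positivity rate."""
    points = []
    for p in sorted(positivity_rates):
        s_expected = sens_cyt + p * (1.0 - sens_cyt)
        w_expected = spec_cyt - p * spec_cyt
        points.append((1.0 - w_expected, s_expected))
    return points


# Hypothetical positivity rates at successively lower HPV cut points.
cut_point_rates = [0.0, 0.15, 0.30, 0.45, 0.60, 1.0]
baseline = baseline_roc_points(0.78, 0.74, cut_point_rates)
for fpr, sens in baseline:
    print(f"false-positive rate {fpr:.3f}  sensitivity {sens:.3f}")
```

Because both formulas are linear in P, the baseline points lie on a straight line from the operating point of cytology alone to the corner where every result is positive. An actual combination adds information only where its curve lies above that line.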
The arguments and the technique we provide in this report should be viewed as a plea for more rational assessment of the merits of combined testing in triage for cervical cytologic abnormalities. We propose a more cogent epidemiologic approach for quantitative assessment of screening efficacy when an adjunctive method, such as HPV testing, is added to Papanicolaou cytologic study. Our literature review did not discriminate among different HPV test formats because our intent was to illustrate a trend in misinterpretation of clinical epidemiologic data. In practice, judicious assessment of published results of the diagnostic or screening efficacy of combined or individual tests should take into account the differences among populations and study settings (eg, controlled academic environment versus clinical practice), the variability of the laboratory data, specimen adequacy, and other key variables. Although our line of reasoning depicted a generally negative conclusion about joint sensitivity claims in the literature, we believe that HPV testing and other adjunctive screening procedures can have a useful role in improving the diagnostic yield of repeated Papanicolaou smear triage of women with cervical abnormalities. HPV testing is a promising tool for primary cervical cancer screening, particularly in settings where the quality of cytologic testing is below standard.10 We believe that the method proposed herein can help researchers working in the field to use more objective goals for gauging screening efficacy in dual-test combinations and place the focus only on testing formats or algorithms that truly supplement or correct the information provided by Papanicolaou cytologic examination alone.
REFERENCES
1. Cox JT, Schiffman MH, Winzelberg AJ, Patterson JM. An evaluation of human papillomavirus testing as part of referral to colposcopy clinics. Obstet Gynecol 1992;80:389-95.
2. Cox JT, Lorincz AT, Schiffman MH, Sherman ME, Cullen A, Kurman RJ. Human papillomavirus testing by Hybrid Capture appears to be useful in triaging women with a cytologic diagnosis of atypical squamous cells of undetermined significance. Am J Obstet Gynecol 1995;172:946-54.
3. Hatch KD, Schneider A, Abdel-Nour MW. An evaluation of human papillomavirus testing for intermediate- and high-risk types as triage before colposcopy. Am J Obstet Gynecol 1995;172:1150-5.
4. Wright TC, Sun XW, Koulos J. Comparison of management algorithms for the evaluation of women with low-grade cytologic abnormalities. Obstet Gynecol 1995;85:202-10.
5. Schneider A, Zahm DM, Kirchmayr R, Schneider VL. Screening for cervical intraepithelial neoplasia grade 2/3: validity of cytologic study, cervicography, and human papillomavirus detection. Am J Obstet Gynecol 1996;174:1534-41.
6. Ferenczy A, Franco E, Arseneau J, Wright TC, Richart RM. Diagnostic performance of Hybrid Capture human papillomavirus deoxyribonucleic acid assay combined with liquid-based cytologic study. Am J Obstet Gynecol 1996;175:651-6.
7. Kaufman RH, Adam E, Icenogle J, Reeves WC. Human papillomavirus testing as triage for atypical squamous cells of undetermined significance and low-grade squamous intraepithelial lesions: sensitivity, specificity, and cost-effectiveness. Am J Obstet Gynecol 1997;177:930-6.
8. Solomon D. The 1988 Bethesda system for reporting cervical/vaginal cytologic diagnoses; developed and approved at the National Cancer Institute Workshop, Bethesda, Maryland, December 12-13, 1988. J Clin Cytol Cytopathol 1989;33:567-74.
9. Wright TC, Lorincz A, Ferris DG, et al. Reflex human papillomavirus deoxyribonucleic acid testing in women with abnormal Papanicolaou smears. Am J Obstet Gynecol 1998;178:962-6.
10. Lorincz A. Hybrid Capture method for detection of human papillomavirus DNA in clinical specimens. Papillomavirus Rep 1996;7:1-5.