The long debate on the sixth guaiac test: time to move on to new grounds

Journal of Health Economics 9 (1990) 495-497. North-Holland

EDITORIAL

Constantine GATSONIS

Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, USA

Final version received October 1990

The vigorous discussion that followed the initial publication of Neuhauser and Lewicki’s (N-L) ‘What do we gain from the sixth stool guaiac?’ (1975) has subsided in recent years. In this issue of JHE, however, readers are treated to a new chapter of the saga. The article by Brown and Burrows (B-B, 1990, pp. 429-445, this issue) sets forth a comprehensive critique of the original analysis and concludes that the N-L findings are invalid. The crux of the matter seems to rest in the way N-L estimated various test accuracy measures using the data provided in the original article by Greegor (1969). B-B help clear several layers of confusion by discussing the assumptions needed for accuracy indices to be derived from the Greegor data. Unfortunately, they then proceed to add new layers of confusion of their own.

Before turning to specifics about the analysis, let us review the generally accepted version of the Greegor data. Among 278 asymptomatic patients who were given the six-test battery, 24 had positive results and 254 had negative results, using the ‘any-test-abnormal’ positivity criterion. Upon further assessment, only 2 of the 24 patients with positive guaiac tests were found to have cancer. (Greegor himself disagrees somewhat with this version [Greegor (1975)] but everyone else seems to be in agreement!) The Greegor article does not provide information about definitive (e.g., barium enema) follow-up assessment of the 254 patients with negative guaiac test results. For analysis purposes, these cases have been regarded as true negatives.

The complete account of the Greegor data includes information on the outcome of the individual guaiac tests. Greegor reports 11 positive outcomes among the 12 single tests performed on the two truly diseased cases and also 46 positive outcomes among 126 tests performed on the remaining 22 cases who tested positive but were found to have no cancer.
Assuming that the remaining 254 truly negative cases utilized the full six-test battery, there must have been 1524 (254 × 6) single tests with negative outcomes in this subgroup of the patient population. This brings the total number of single tests performed to 1662 (1524 + 126 + 12).

Let us assume that the usual accuracy indices (sensitivity and specificity) are well-defined concepts for a single guaiac test (and set aside complications such as accounting for colorectal polyps). In order to estimate these indices on the basis of the Greegor data, one would need to make the assumption that the results of single tests are independent, both between and within patients. The first level of the independence assumption (across patients) is clearly defensible, while the second level (within patients) may not be so and would have to be assessed empirically. Assuming that both levels are valid, one would estimate the sensitivity of a single test as 11/12 = 0.917 and the specificity of a single test as 1604/1650 = 0.972 (s.e. = 0.004). (Asymptotic standard errors are provided where appropriate.) For a battery of six tests, the independence assumption makes it possible to use formulas (A.1’) and (A.2’) provided in the appendix of B-B and thus derive an estimate of approximately 1 for (battery) sensitivity and 0.843 (s.e. = 0.021) for (battery) specificity.

It is possible, however, to estimate the accuracy indices for a six-test battery in an alternative way, which only requires the independence assumption to hold across patients. The results need not agree with those based on the full independence assumption, and in fact they do not (although the disagreement is mild) for the guaiac test data. Using the patient-level data, the sensitivity of a battery would be estimated to be close to 1 (the battery detected both of the truly diseased cases), but the specificity would be estimated at 254/276 = 0.92 (s.e. = 0.016).

Brown and Burrows provide a thorough account of the assumptions needed to derive accuracy estimates and of errors contained in the original N-L article.
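The two sets of estimates described above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the battery formulas are the standard ones for an ‘any-test-abnormal’ rule with independent tests (assumed here to correspond to B-B's (A.1’) and (A.2’), which are not reproduced in this editorial), and the standard error of the battery specificity is obtained by the delta method, which is likewise an assumption about how the quoted value was derived. All variable names are hypothetical.

```python
import math

# Counts taken from the summary of the Greegor data given in the text.
BATTERY_SIZE = 6
TRUE_NEG_PATIENTS = 254         # negative battery, regarded as true negatives
FALSE_POS_PATIENTS = 22         # positive battery, no cancer found
TESTS_ON_CANCER_CASES = 12      # 2 patients x 6 tests
POS_TESTS_ON_CANCER_CASES = 11
TESTS_ON_FALSE_POS = 126        # single tests reported for the 22 patients
POS_TESTS_ON_FALSE_POS = 46


def binomial_se(p, n):
    """Asymptotic standard error of a binomial proportion."""
    return math.sqrt(p * (1.0 - p) / n)


# Single-test estimates: need independence between AND within patients.
tests_nondiseased = TRUE_NEG_PATIENTS * BATTERY_SIZE + TESTS_ON_FALSE_POS  # 1650
neg_tests_nondiseased = tests_nondiseased - POS_TESTS_ON_FALSE_POS         # 1604
single_sens = POS_TESTS_ON_CANCER_CASES / TESTS_ON_CANCER_CASES
single_spec = neg_tests_nondiseased / tests_nondiseased
single_spec_se = binomial_se(single_spec, tests_nondiseased)

# Battery estimates under full independence ("any test abnormal" rule).
battery_sens = 1.0 - (1.0 - single_sens) ** BATTERY_SIZE
battery_spec = single_spec ** BATTERY_SIZE
# Delta method (assumed): se(p^k) is approximately k * p^(k-1) * se(p).
battery_spec_se = BATTERY_SIZE * single_spec ** (BATTERY_SIZE - 1) * single_spec_se

# Patient-level battery specificity: needs independence across patients only.
nondiseased_patients = TRUE_NEG_PATIENTS + FALSE_POS_PATIENTS  # 276
patient_spec = TRUE_NEG_PATIENTS / nondiseased_patients
patient_spec_se = binomial_se(patient_spec, nondiseased_patients)

print(f"single test : sens={single_sens:.3f}, "
      f"spec={single_spec:.3f} (s.e. {single_spec_se:.3f})")
print(f"battery, full independence: sens={battery_sens:.6f}, "
      f"spec={battery_spec:.3f} (s.e. {battery_spec_se:.3f})")
print(f"battery, patient level    : "
      f"spec={patient_spec:.3f} (s.e. {patient_spec_se:.3f})")
```

With the unrounded single-test specificity 1604/1650 the battery specificity comes out at about 0.844; the value 0.843 quoted in the text is what one obtains by raising the rounded figure 0.972 to the sixth power. The difference is itself a small reminder of how much the third decimal digit matters in this problem.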
As others have also suggested before them [Kelleher and Vautrain (1975), Prescott et al. (1980)], B-B use the full set of truly negative cases to derive an estimate of 0.972 for the specificity of a single test. Assuming independence holds, things would have been fine if they had stopped there. Unfortunately, B-B proceed to treat this estimate as an estimate of the combined specificity of a battery of six tests and derive yet another estimate of the specificity of a single test. To make things worse, they do the same thing with the estimate of the sensitivity of a single test (11/12) derived under the independence assumption. The rest of their cost calculations are based on these erroneously derived estimates of sensitivity and specificity for a single test. Thus, B-B end up confusing matters more than they helped clarify.

The importance of working with highly accurate estimates of the test parameters cannot be overemphasized in this context. One only needs to point to the disquieting fact that changes in the third or fourth decimal digit can produce sizable differences in the cost assessments (thus underlining the need to investigate and quantify how the error from the parameter estimation process propagates into the cost analysis).

Even if B-B had used methodologically sound estimates of specificity, there is very little they could have done to resolve the fundamental limitation of the Greegor data, namely the scarcity of information about the sensitivity of the test and the prevalence of the disease. The problem arises primarily from the fact that there were only two cases with proven cancer in the study sample and is compounded by the apparent lack of definitive information about the true disease status of the 254 patients who had negative test results. To the extent that no definitive assessment was made on these patients, the possibility that further cancers may have existed among them is not negligible (in statistical parlance, verification bias may be present in these data [Begg (1987)]). Presence of further cancers may have seriously altered the estimates of sensitivity and prevalence, with correspondingly large effects on the results of the cost analysis. In fact, given that the shaky ground of the sensitivity assessments was understood from the beginning of the debate, it is surprising that the argument about the substantive merits of the original N-L conclusions has lasted for so long. Of course, this does not take anything away from the tremendous contribution the N-L article has made to the growth of the disciplines of clinical decision making and cost-effectiveness analysis.

Where do we go from here? Neuhauser in his reply and others a decade earlier [Prescott et al. (1980)] point to the right answer. Instead of endlessly arguing about the analysis based on Greegor’s data, better data should be collected on the accuracy of the test and the prevalence of the disease, as well as on the clinical and economic outcomes resulting from its use.
To these recommendations one should add that more attention should be paid to the underlying statistical methodologic issues.
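One such issue, the verification bias mentioned earlier, is easy to illustrate numerically. The sketch below recomputes the patient-level battery estimates under the purely hypothetical supposition that some of the 254 unverified test-negative patients actually had cancer. The numbers of missed cancers tried here are illustrative assumptions, not data, and the function name and its parameters are invented for this example.

```python
def patient_level_estimates(missed_cancers, detected=2, false_pos=22, test_neg=254):
    """Battery-level sensitivity, specificity and prevalence if
    `missed_cancers` (hypothetical) of the unverified test-negative
    patients actually had cancer."""
    if not 0 <= missed_cancers <= test_neg:
        raise ValueError("missed_cancers must lie between 0 and test_neg")
    total = detected + false_pos + test_neg       # 278 patients in all
    diseased = detected + missed_cancers          # true cancers in the sample
    true_neg = test_neg - missed_cancers          # test-negative and disease-free
    sensitivity = detected / diseased
    specificity = true_neg / (true_neg + false_pos)
    prevalence = diseased / total
    return sensitivity, specificity, prevalence


# Hypothetical scenarios: 0 corresponds to the usual "all true negatives" reading.
for missed in (0, 1, 2, 4):
    sens, spec, prev = patient_level_estimates(missed)
    print(f"missed={missed}: sensitivity={sens:.3f}, "
          f"specificity={spec:.3f}, prevalence={prev:.4f}")
```

Even one or two undetected cancers would move the sensitivity estimate far from 1 and change the prevalence estimate by a large factor, while the specificity estimate hardly changes. This is one way to see why the sensitivity and prevalence figures rest on shaky ground.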

References

Begg, C.B., 1987, Biases in the assessment of diagnostic tests, Statistics in Medicine 6, 411-423.

Brown, K. and C. Burrows, 1990, The sixth stool guaiac test: $47 million that never was, Journal of Health Economics 9, 429-445 (this issue).

Greegor, D.H., 1969, Detection of silent colon cancer in routine examination, CA - A Cancer Journal for Clinicians 19, 330-337.

Greegor, D.H., 1975, Letter, New England Journal of Medicine 293, 994.

Kelleher, M. and R. Vautrain, 1975, Letter, New England Journal of Medicine 293, 995.

Neuhauser, D. and A.M. Lewicki, 1975, What do we gain from the sixth stool guaiac?, New England Journal of Medicine 293, 226-228.

Prescott, N., K. McPherson and J. Bell, 1980, Cost effectiveness of screening for occult blood in the stool: Another look, New England Journal of Medicine 303, 1306.