Cost effectiveness of multiphasic screening: Old controversies and a new rationale



COST EFFECTIVENESS OF MULTIPHASIC SCREENING: OLD CONTROVERSIES AND A NEW RATIONALE

Mario Werner, M.D.,* and Charles H. Altshuler, M.D.†

ABSTRACT

The cost effectiveness of multiphasic screening is evaluated from a conceptual as well as from a practical viewpoint. Conceptual analysis includes a consideration of the technical sensitivity and specificity of the tests used, of the prevalences of the screened diseases, and of the costs and values associated with different outcomes of screening. Practical considerations include the potential of multiphasic screening for increasing productivity, for reassuring patients, and for reducing morbidity and mortality. Although all these issues can be cogently formulated, at present none can be documented by a comprehensive set of data leading to irrefutable conclusions. Therefore, the issue of who should be screened continues to be obfuscated by controversy and prejudice. To resolve this dilemma a new rationale for the use of multiphasic screening is developed. This is based on a small number of uncontroversial facts and leads to practical proposals relating to how standards for useful test batteries can be constructed.

Insight into the cost effective use of laboratory tests has greatly increased both from a conceptual and from a practical viewpoint, but these two views seem as far as ever from merging into some unifying doctrine.1-3 The need for such an integration has been most noticeable in the use of standard test profiles and multiphasic screening, since strongly held tenets argue both for and against that approach to diagnosis.4 This discussion of the cost effectiveness of screening undertakes a conceptual analysis as well as a review of practical considerations in order to draw the conclusions appropriate to the aggregate of these reflections. Finally, a new rationale for the use of standard test profiles is presented.

Only untargeted screening is considered. However, even in this restricted field one must distinguish the screening of clinic patients, hospital patients, and subjects who consider themselves healthy in order to avoid unwarranted conclusions. Moreover, what may be true for a certain age and sex group may not apply to another population.

Accepted for publication February 15, 1980.
*Professor of Pathology (Laboratory Medicine), The George Washington University Medical Center, Washington, D.C.
†Clinical Professor of Pathology, Medical College of Wisconsin. Director, Department of Pathology, St. Joseph's Hospital, Milwaukee, Wisconsin.
HUMAN PATHOLOGY, VOLUME 12, NUMBER 2, February 1981

CONCEPTUAL ANALYSIS

A comprehensive evaluation of the cost effectiveness of any medical procedure requires at least three considerations. First, the technical properties of the procedure must be taken into account. Second, the prevalences of the conditions relating in a positive and in a negative way to the procedure must be considered. Third, the costs and values associated with favorable and unfavorable outcomes must be weighed. These three considerations can be treated as the layers of a conceptual model that condenses all considerations into a single number, the expected value of the procedure, EV.2, 3

$$EV = V_{DT}\,P(D)\,P(T \mid D) + V_{\bar{D}\bar{T}}\,P(\bar{D})\,P(\bar{T} \mid \bar{D}) - C_{\bar{D}T}\,P(\bar{D})\,P(T \mid \bar{D}) - C_{D\bar{T}}\,P(D)\,P(\bar{T} \mid D)$$

Applied to a diagnostic test, the four additive terms of this expression represent the probabilities, P, of all possible outcomes: true positive, $T \mid D$; false positive, $T \mid \bar{D}$; true negative, $\bar{T} \mid \bar{D}$; and false negative, $\bar{T} \mid D$. These probabilities are the first layer of the model. Each term is weighted by a multiplicative factor reflecting the probability of the given outcome according to the frequency of positives (diseased subjects), $P(D)$, and negatives (nondiseased subjects), $P(\bar{D})$, in the tested population. This application of Bayes' theorem represents the second layer of the model. Finally, each term is multiplied by the costs, C, and values, V, associated with the different outcomes, and this is the third layer of the model.
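As an illustration only, the expected value expression can be written as a short Python sketch. The function and argument names are assumptions introduced here and are not notation from the model itself; technical sensitivity stands for P(T|D) and technical specificity for the probability of a negative test in the nondiseased. Values and costs are taken to be expressed in the same unit.

```python
def outcome_probabilities(prevalence, sensitivity, specificity):
    """Joint probabilities of the four test outcomes in the tested population."""
    p_d = prevalence            # P(D), frequency of diseased subjects
    p_not_d = 1.0 - prevalence  # frequency of nondiseased subjects
    return {
        "true_positive": p_d * sensitivity,
        "false_negative": p_d * (1.0 - sensitivity),
        "true_negative": p_not_d * specificity,
        "false_positive": p_not_d * (1.0 - specificity),
    }


def expected_value(prevalence, sensitivity, specificity,
                   value_tp, value_tn, cost_fp, cost_fn):
    """Four-term expected value: values are added, costs are subtracted."""
    p = outcome_probabilities(prevalence, sensitivity, specificity)
    return (value_tp * p["true_positive"]
            + value_tn * p["true_negative"]
            - cost_fp * p["false_positive"]
            - cost_fn * p["false_negative"])
```

A call such as `expected_value(0.1, 0.9, 0.9, 100.0, 1.0, 10.0, 50.0)` evaluates the expression for arbitrary illustrative values and costs; the difficulty discussed in the text lies in obtaining defensible inputs, not in the arithmetic.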

The First Layer of the Model: Technical Properties

The intrinsic discriminatory capabilities of assays can be described by two properties: the ability to distinguish true positive from "noise," or sensitivity, and the ability to distinguish true negative from "noise," or specificity. These quantities are calculated by comparing test outcome with the presence or absence of the diagnostic conditions in a "decision matrix" or "truth table." Positives among the diseased indicate technical sensitivity and negatives among the nondiseased, technical specificity. (These measures have alternatively been called diagnostic sensitivity and specificity.) Neither quantity can be improved without damage to the other, but straightforward techniques are available to analyze this inevitable trade-off.2, 3, 5, 6

Simple as these fundamental concepts are, their application becomes laborious, because realistic estimates of technical sensitivity and specificity cannot be obtained from selected ideal populations. Rather these evaluations should be based on a patient sample that closely portrays the population in which the tests in question will be applied. When such a sample of random subjects is used, physicians must establish the diagnoses to which test outcome will be compared through extensive chart review, since the recorded diagnoses may not be assumed to be adequate and reliable and a re-evaluation of the raw data cannot be entrusted to a clerk. At present only a few published analyses of technical sensitivity and specificity have been based on such verified diagnostic information. Thus, the estimates of the technical reliability of the same test often vary widely. For instance, some studies of the lecithin-sphingomyelin ratio in amniotic fluid reported almost perfect distinction between mature and immature babies, but others obtained less favorable results.6

A second concern relates to the establishment of the decision threshold used to separate a positive from a negative test outcome, because technical sensitivity and specificity reflect the way in which the normal range has been defined. Thus, one study inappropriately considered it significant when a screening assay produced some 5 per cent abnormal findings, even though the normal limits had been defined as the range of two standard deviations about the mean of a reference population (i.e., about a 95 per cent reference interval), a definition that by construction classifies about 5 per cent of the reference subjects as abnormal. As it is, the decision threshold can be set to optimize different desired outcomes of testing, but this facility so far has been exploited only rarely.5, 6

Further confusion in the evaluation of technical sensitivity and specificity is caused by "regression toward the mean."7 The laws of random sampling dictate that the number of pathological values diminishes when abnormal findings are verified by a repeat assay. In an analysis of over 2000 repeats of abnormal findings, regression toward the mean produced normal findings with frequencies ranging from 8 per cent in the case of serum cholesterol to 47 per cent in the case of serum bilirubin.7, 8
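The decision matrix and the trade-off governed by the decision threshold can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names and the sample data are hypothetical, and a result at or above the threshold is arbitrarily treated as positive.

```python
# Hypothetical (value, diseased) pairs; not data from any cited study.
SAMPLE_RESULTS = [
    (1.0, False), (2.0, False), (2.5, False),
    (1.5, True), (3.0, True), (4.0, True),
]


def decision_matrix(results, threshold):
    """Count outcomes, calling a value at or above the threshold positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for value, diseased in results:
        positive = value >= threshold
        if positive and diseased:
            counts["tp"] += 1
        elif positive:
            counts["fp"] += 1
        elif diseased:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def sensitivity_and_specificity(counts):
    """Positives among the diseased and negatives among the nondiseased."""
    diseased = counts["tp"] + counts["fn"]
    nondiseased = counts["tn"] + counts["fp"]
    if diseased == 0 or nondiseased == 0:
        raise ValueError("need at least one diseased and one nondiseased subject")
    return counts["tp"] / diseased, counts["tn"] / nondiseased
```

Lowering the threshold on the sample data raises sensitivity at the expense of specificity, which is the trade-off described above. Realistic estimates would of course require verified diagnoses in a representative patient sample.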

The Second Layer of the Model: Disease Prevalence

Apart from the intrinsic discriminatory capabilities of a procedure, the probability of establishing a true positive or negative finding depends on the prevalences of the conditions associated with positive and negative results. As a consequence, the frequency of positives in a given population is expected to vary among tests. In one series of over 1000 hospital admissions subjected to 18 chemistry tests of blood, uric acid assay produced the highest number of positives, 15 per cent.9 Aspartate transaminase, glucose, and lactate dehydrogenase assays each yielded over 10 per cent abnormal values and urea nitrogen and bilirubin assays, each about 6 per cent. Other tests produced positives in smaller numbers roughly equal or even inferior to the frequency of occurrences outside the range of two standard deviations about the mean of a Gaussian distribution. Some parameters, such as plasma potassium concentration, yielded only rare positives. In general, the closer a measured quantity is related to survival, the less likely abnormal findings are to occur in an ambulatory population, since homeostatic mechanisms tend to regulate such parameters as plasma electrolyte and blood gas levels within tight limits. Therefore, these measurements yield little in screening.

Like technical sensitivity and specificity, prevalence must be established in the population in which a given test is to be used. Morbidity statistics for the general population only provide guideposts to the actual prevalences in specific populations and cannot deal at all with the fact that different conditions may cause a similar test outcome.

Reports on the yield of screening tests are sparse. Using a battery of 20 chemistry and hematology tests among 1000 hospital admissions, Korvin, Pearce, and Stanley10 found 2223 abnormal results (11 per cent of all results). Of these, 675 (3.4 per cent) were clinically predictable, and 1325 (6.6 per cent) did not lead to a new diagnosis, but 83 (0.4 per cent) led to a new diagnosis in 77 patients, or 7.7 per cent of the screened population. Another, similar study, in which about 2800 hospital admissions received an 11 test screen, detected some abnormality in over 40 per cent of the screened subjects (Table 1).11 Of four positive findings, three were considered consistent with either the present or a past diagnosis, while one remained unexplained or was considered an error. If one allows for different degrees of certainty, a new diagnosis was obtained in about 7.5 per cent of the screened population, a frequency roughly corresponding to that of the previously detailed study.

The most crucial feature of these statistics is the low rate of positive findings, and as a result of Bayes' theorem the dilution effect comes into play.1 To illustrate this effect, consider a test that properly classifies nine of 10 subjects with and without a certain disease (90 per cent technical sensitivity and specificity). Were the prevalence of the disease 10 per cent, only half the positive tests would in fact be associated with the disease: of 1000 subjects, 90 of the 100 diseased would test positive, but so would 90 of the 900 nondiseased. Thus, owing to dilution, low disease prevalence cripples diagnostic sensitivity or positive predictive value even though the intrinsic discriminatory capabilities of the test procedure are good.
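The dilution effect follows directly from Bayes' theorem and can be sketched in a few lines of Python. The function name is an assumption made for this illustration; the inputs in the usage note are the figures of the example in the text.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive tests that are associated with the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)
```

With 90 per cent technical sensitivity and specificity and a prevalence of 10 per cent, `positive_predictive_value(0.10, 0.90, 0.90)` evaluates to one half, as stated above. Entering a lower prevalence with the same test properties shows how quickly the predictive value of a positive result erodes further.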

TABLE 1. POSITIVE FINDINGS AMONG ABOUT 2800 HOSPITAL ADMISSIONS IN AN 11 TEST SCREEN*†

Finding                                   Percentage
Abnormal findings                            43.2
Consistent with present diagnosis            28.2
Consistent with past diagnosis                5.1
Unexpected and unexplained                    9.3
New diagnosis                                 4.0
Possibly new diagnosis                        3.4
Artefact or error                             0.4

*Consisting of glucose, urea, sodium, potassium, chloride, carbon dioxide, calcium, phosphorus, total protein, albumin, and uric acid assay. For the classification of abnormal results, "slightly expanded" 95 per cent ranges obtained in a normal population were used.
†After Bryan, D. J., Warne, J. L., Viau, A., Musser, A. W., Schoonmaker, F. W., and Thiers, R. E.: Clin. Chem., 12:137-143, 1966.

The Third Layer of the Model: Costs and Values

In the crudest approximation the cost of an assay can be equated with its price. In a more refined approach attention is also paid to the frequency of true positives, or disease prevalence, and the cost of a positive test is calculated.12 In this way the cost of a positive urine culture in men can be 10-fold that in women, although the analytical procedure is identical (Table 2). Weighting according to prevalence inflates the cost disparities between assays. Thus, in a study conducted in 1970 the cost of a positive mammography exceeded that of a positive audiometry over 200-fold, while the divergence in unit cost was less than 20-fold.

A truly comprehensive analysis of value has to consider all costs and values associated with ruling in and ruling out disease, as indicated by the calculation of the expected value.2, 3 Both values and costs associated with true positives are related to therapy. The value associated with true negatives is confirmation of health; there are only the costs of the analytical procedure. False negatives and false positives create costs but no values. The costs of a false negative comprise all the consequences of untreated disease, and the costs of false positives are unnecessary diagnostic follow-up and therapy. Such enhanced weighting of test outcome swells the dollar cost diversity between tests further and therefore should facilitate selection of the most cost effective procedures. However, including human suffering in objective measures poses obvious difficulties, and consequently the application of these calculations may remain controversial if not impracticable in many cases.

TABLE 2. UNIT COST AND COST PER POSITIVE TEST CALCULATED FROM 44,663 MULTIPHASIC EXAMINATIONS OF APPARENTLY HEALTHY SUBJECTS*

Type of Test               Unit Cost ($)    Cost per Positive Case ($)
Mammography                     4.90                 408.00
Urine culture (men)             0.20                  50.00
Calcium                         0.29                  22.30
Respirometry                    0.31                  14.10
Cholesterol                     0.29                  12.15
Blood pressure                  0.42                  10.20
Uric acid                       0.29                   6.40
Chest x-ray film                0.46                   6.20
Urine culture (women)           0.20                   6.05
Urine protein                   0.18                   2.80
Urine glucose                   0.18                   2.20
Audiometry                      0.25                   1.85

*After Collen, M. F., Feldman, R., Siegelaub, A. B., and Crawford, D.: New Engl. J. Med., 283:459-463, 1970.

PRACTICAL ISSUES

Health screening has three practical goals: a reduction of morbidity and mortality, an increase in physician productivity, and the reassurance of patients. Diagnostic data, on the other hand, can be used for three purposes: to discover disease, to confirm disease, and to exclude disease. What are the relationships between these goals and purposes? Discovery of disease may reduce morbidity and mortality; exclusion of disease may reassure patients; and the productivity of the physicians may be increased through the discovery, confirmation, and exclusion of disease by laboratory procedures.

Increase of Productivity

In assessing the value of health screening, little attention has been paid to the long sequence of events that separate analytical findings from the medical decisions, which are their ultimate purpose (Fig. 1). In such a chain all irrevocable errors occurring at a point where action is required are equivalent regardless of where they occur. Thus, from the point of view of outcome, a true positive finding that does not lead to the proper action is indistinguishable from a falsely negative test. In both cases the detection of true positives (technical sensitivity) is impaired in a formal sense, but in practical terms overall productivity is compromised.

Figure 1. Linkage between test outcome and relevant medical action. (Flow diagram of successive branch points: result normal or abnormal; abnormal result ignored or recognized; previously known or unknown; not retested or retested; retest normal or abnormal; not treated or treated.)

How tight is the linkage between a positive test outcome and medical action? Table 3 lists abnormal findings in alkaline phosphatase, bilirubin, aspartate transaminase, and urea nitrogen assays obtained during the screening of 547 patients admitted to a university hospital.13 Only about one in four of these findings caused repeat testing to be undertaken. The confirmed positives necessarily were even fewer. Ultimately a diagnosis was made in but few cases, because most abnormal results were considered not clinically significant or were altogether ignored.

TABLE 3. RESPONSE OF PHYSICIANS TO THE APPEARANCE OF AN ABNORMAL LABORATORY FINDING IN A SCREENING BATTERY OF TESTS APPLIED TO 547 PATIENTS*

Outcome                                   Findings in each of the four assays
                                          (column headings illegible)
Total abnormal                              53     20      7     51
Test repeated
  Becomes normal                             5      3      2      5
  Remains abnormal                           2      1      0      3
Diagnosis
  Made                                       7      3      3      6
  Not made                                  46     17      4     45
Interpretation when no diagnosis made
  Not clinically significant                22      7      0     16
  None                                      17      7      3     16
  No note                                    7      3      1     13

*After Schneiderman, L. J., DeSalvo, L., Baylor, S., and Wolf, P. L.: Arch. Intern. Med., 129:88-90, 1972.

Bates and Yellin14 have reported one of the best detailed analyses of the yield of multiphasic screening of apparently healthy subjects. Among a population of about 2400 cases glucose assay was the test most frequently found positive (17.9 per cent), followed by audiometry (11.1 per cent), cholesterol assay (9.9 per cent), total lipid assay (8.5 per cent), and the confirmation of bacteriuria (8.2 per cent). Tonometry, visual field examination, assay of forced expiratory volume, hemoglobin, protein bound iodine, calcium, potassium, uric acid, alkaline phosphatase, and aspartate transaminase assays each yielded less than 6.6 per cent positive tests, and in these cases the question of how the normal range had been defined would have merited closer examination.

Once an abnormal finding had been obtained, the frequency of confirming action varied greatly among assays (Table 4). Tests indicating hyperglycemia, bacteriuria, or anemia were repeated in about one of two cases. Abnormal results in tonometry and in uric acid, protein bound iodine, aspartate transaminase, and alkaline phosphatase determinations led to retesting, roughly speaking, only in about one of four cases. Abnormalities of visual field, forced expiratory volume, or audiometry led to retesting even more rarely.

TABLE 4. FREQUENCIES BY WHICH RETESTING WAS ORDERED TO CONFIRM ABNORMAL FINDINGS IN DIFFERENT TESTS OF A SCREEN*†

Test                                  Percentage
Glucose                                  54.3
Urinary bacteria                         49.1
Hemoglobin                               44.8
Tonometry                                29.9
Uric acid                                29.3
Protein bound iodine (or T4)             27.4
Aspartate transaminase                   24.1
Alkaline phosphatase                     21.6
Visual fields                             7.3
Forced expiratory volume                  4.2
Audiometry                                1.7

*See text for details of battery and population.
†After Bates, B., and Yellin, J. A.: J. Amer. Med. Assoc., 222:74-78, 1972.

The frequencies with which new management resulted from an abnormal finding again varied among tests (Table 5). About one in three subjects with elevated total lipid or glucose levels obtained new management. Audiometry yielded frequent positive tests, but these were rarely confirmed by a retest or led to new management.

TABLE 5. FREQUENCIES WITH WHICH NEW MANAGEMENT WAS INSTITUTED ON THE BASIS OF AN ABNORMAL FINDING IN DIFFERENT TESTS OF A SCREEN*†

Test                                  Percentage
Total lipids                             35.9
Glucose                                  34.9
Cholesterol                              28.8
Uric acid                                27.6
Urinary bacteria                         27.4
Hemoglobin (or hematocrit)               27.3
Forced expiratory volume                  6.7
Tonometry                                 6.3
Calcium                                   5.4
Visual fields                             3.7
Audiometry                                2.3

*See text for details of test battery and population.
†After Bates, B., and Yellin, J. A.: J. Amer. Med. Assoc., 222:74-78, 1972.

These studies, as others documenting deficiencies in the follow-up of abnormal laboratory findings,15-17 show that the most crucial issue affecting productivity is data utilization, not data acquisition.4 A fair assessment of the benefits to be obtained from screening cannot be based on the mere evaluation of how outcome actually was affected; rather one must examine how outcome should have been affected. Without this obvious realization many otherwise well conducted investigations, such as a frequently quoted Australian study,18 remain uninterpretable.
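The weakening of the chain between finding and action can be made concrete with a rough Python sketch. It rests on assumptions that go beyond the text: the function names are invented for this illustration, the percentages of Table 5 are read as fractions of the abnormal findings, and the successive steps are simply multiplied.

```python
def follow_up_per_thousand(positive_rate, action_rate, screened=1000):
    """Expected number of screened subjects whose abnormal finding leads to action."""
    return screened * positive_rate * action_rate


def effective_sensitivity(technical_sensitivity, action_rate):
    """Sensitivity remaining after true positives that prompt no action are discounted."""
    return technical_sensitivity * action_rate
```

With the figures reported by Bates and Yellin, `follow_up_per_thousand(0.179, 0.349)` estimates how many of 1000 screened subjects obtained new management because of an abnormal glucose result, and `follow_up_per_thousand(0.111, 0.023)` does the same for audiometry. The second function expresses the formal point made above: a test with a hypothetical technical sensitivity of 90 per cent that is acted upon in one of four cases behaves, from the point of view of outcome, like a far less sensitive test.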

Reassurance of Patients

A commonly embraced belief about multiphasic health screening holds that late health care costs are traded for early costs. According to this presumption, screening initially produces large costs when latent and incipient disease is uncovered and treated, whereas the offsetting savings resulting from improved health are only earned later. Therefore, it is argued that preventive health screening must lead to financial failure before the benefits of these efforts can be reaped.

To test this hypothesis, one of the authors conducted a study of multiphasic screening in a trade union that provides comprehensive health care benefits to its members.19 Among the screened subjects (426 cases), four roughly equal groups had all normal findings (109 cases), one isolated abnormal finding (88 cases), only insignificant abnormal findings (95 cases), or findings considered sufficiently important to require attention by a physician (134 cases). The latter subjects were advised in writing to obtain medical assistance and received two further written reminders if they did not do so. In the end, the screened subjects were matched by race, age, and sex with other union members. This comparison showed that the screened individuals presented fewer claims and that their average claim was markedly smaller than in individuals not screened (Table 6). As a result, total health care expenditures were markedly smaller in the group of screened subjects, and by extrapolating from the four month study period, annual savings in health care costs of $190.00 (year 1971) for every screened subject were calculated.

TABLE 6. COMPARISON OF CLAIMS AND EXPENDITURES FOR REIMBURSEMENT OF HEALTH CARE COSTS BY SUBJECTS RECEIVING A HEALTH CARE SCREEN AND RANDOMLY MATCHED CONTROLS*

                                              Not Screened    Screened
Individuals in group                               426           426
Individuals with claims (4 months)                  68            20
Number of claims (4 months)                         72            24
Average claim (dollars)                            498           289
Total expenditure (dollars/4 months)            35,856         6,936

*After Werner, M., and Brecher, G.: Clin. Res., 21:191, 1973.

Others have since reported similar findings.20 A retrospective study of older males receiving multiphasic health check-ups over a period of five to seven years concluded that there was a reduction in self-rated disability and time lost from work, a greater proportion of subjects working when compared to controls, and a lower self-reported utilization of medical services.21 Therefore, it is reasonable to conclude that multiphasic health screening can have a beneficial economic impact in its early stages, through psychological reassurance, which is independent of the strictly medical value of the screening program.

Reduction in Morbidity and Mortality

Although morbidity and mortality are clearly defined and conceptually simple benchmarks, the effect of screening upon them has hardly been documented. Large numbers of subjects are required to estimate morbidity, and the study of mortality in addition necessitates long time periods. Table 7 shows statistics for over 10,000 subjects, split into equal control groups and study groups receiving regular multiphasic health testing and followed over a seven year period.22 By comparing the death rates per 1000 persons from all causes, it was found that subjects participating in the screening program did somewhat better than controls, but the statistical significance of this difference is questionable. On the other hand, the difference in the death rates due to potentially postponable diseases is more marked and may be significant from not only a statistical but a practical point of view. Still, it should be recognized that postponable diseases are responsible only for a minor fraction of all deaths.
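How far the rates of Table 7 support a statistical difference can be gauged with a simple two-proportion comparison. The following Python sketch is not the analysis of the cited study; it assumes independent binomial samples, uses a pooled standard error and a normal approximation, and takes the rates and populations at risk as printed in Table 7. The function names are introduced here for illustration.

```python
import math


def two_proportion_z(rate_study, n_study, rate_control, n_control):
    """z statistic for control rate minus study rate, with a pooled standard error."""
    pooled = (rate_study * n_study + rate_control * n_control) / (n_study + n_control)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_study + 1.0 / n_control))
    return (rate_control - rate_study) / std_err


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Rates per person over seven years, converted from the per 1000 figures of Table 7.
z_all_causes = two_proportion_z(0.0356, 5146, 0.0392, 5540)
z_postponable = two_proportion_z(0.0037, 5146, 0.0074, 5540)
```

Under these assumptions the all-cause difference yields a z of roughly 1, which falls short of conventional significance, whereas the difference for potentially postponable causes yields a z of roughly 2.6. This agrees with the cautious wording of the text, but it is no substitute for an analysis of the original data.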

"CONCLUSIONS RATIONALE

AND

A NEW

Concepts for the rigorous analysis of cost effectiveness of multiphasic health screening and profile testing may have been developed, but the systematic and comprehensive data base required for their application is not available. As it is, pertinent information is scanty, perhaps outdated and unreliable, and little seems to be added to it through new research. Indeed, the very ground rules under which such investigationsshould be c o n d u c t e d n e e d to be clearly defined. Therefore, evaluation continues to rely largely on practical experience. Here available evidence appears to contradict some of the common preconceived notions both favoring and against the use of health screening. What conclusions result from this analysis? Since a formal justification of multiphasic health testing is unlikely to come forth speedily, prejudices supported by some partial truths will continue to confound the issue of who should be screened. T h e Utilization of test profiles can be improved by some straightforward measures detailed previously by us, 4 but if multiphasic screening is to remain in use, the ties imposed by this dilemma must be broken and a rationale for profile testing must be developed that circumvents the established battle positions. We propose that profile testing be justified by four uncontroversial facts. First, mechanization re-

116

February 1981 TABLE 7. C O M P A R I S O N OF MORTALITIES I N T W O P O P U L A T I O N S , O N E RECEIVING REPEATED M U L T I P H A S I C CHECK-UP E V A L U A T I O N S OVER A SEVEN YEAR PERIOD, T H E O T H E R SERVING AS C O N T R O L * Parameter

Study

Control

Population at risk Death rate per 1000 persons for a 7 year period For all causes Potentially postponable

5146

5540

35.6 3.7

39.2 7.4

*After Dales, L. G., Friedman, G. D., Ramcharan, S., Siegelaub, A. B., Campbell, B. A., Feldman, R., and Collen, M. F.: Prevent. Med., 2:221-235, 1973. duces the incremental cost per test when more tests are added to an established test or test profile. Indeed, as analytical costs continuously decrease, the stable or even increasing costs of sample collection become relatively more onerous, and there is an economical incentive to perform an optimal number of assays on each sample. Second, test requests came in a countless variety o f mixtures. For instance, one university hospital counted more than 1000 different test combinations ordered over a one month period. 2a T h e question arises whether this diversity has a medical justification, since catering to such heterogeneous wishes necessitates a complex and costly organization. The work force must be kept well trained; reliability requires intense supervision; most important, record keeping must resolve the monstrous task o f keeping track of samples collected by patient, analyses performed by test, and reports being again collected by patient. As a consequence of these complexities, many laboratories had to introduce electronic data processing. Third, there is a superficial belief that laboratory tests serve mainly as diagnostic aids. In reality, most testing is done to support the management of patients rather than for diagnosis. For instance, the tens of thousands of glucose assays performed annually in the laboratory of any middle sized hospital obviously are not ordered to discover new cases of diabetes mellitus but to manage known diabetics. Similarly, requests for potassium assays typically exceed those for the other plasma electrolytes in number not because hyperaldosteronism or related conditions frequently are a diagnostic consideration but because potassium metabolism becomes a concern in patients receiving a variety of treatments. 
Fourth, contrasting with the infinite variety of diagnostic possibilities, the management of patients has been reduced to a rather finite number of approaches. For instance, whereas renal insufficiency can be due to multiple congenital diseases, a variety of modes of infection, various nephrotoxic effects, or extrinsic causes such as inadequate perfusion, management of the resulting conditions relies on a core of standard measures. Thus, it is reasonable to match standardized management by test procedures standardized into fixed profiles.

Together these points lead to the conclusion that the use of standardized test profiles is medically justifiable, expedient, and cost effective, but implementation of test profiles requires consideration of three issues.

First, each institution should develop the battery of profiles best suited for it. Ordering patterns not only vary with the population served but reflect the prevalent types of medical practice as well. To document existing circumstances, a statistical data base must be established. This is most easily done by analyzing the orders for laboratory work submitted, say, over a one month period.

Second, the bulk of test requests should be accommodated by a limited number of test profiles. Analysis of current ordering patterns is likely to disclose that the greater part of laboratory work already falls into a restricted number of typical panels, whereas the profuse numbers of other test combinations are individually ordered only rarely.

Third, appropriate mathematical analysis of the statistical data base should specify optimal sets of profiles for various purposes. By using Boolean factor analysis, the different sets of profiles can be defined by which the frequency of orders not met by a profile is kept fixed, the number of profiles is kept fixed, the sizes of the profiles are kept fixed, or, alternatively, the frequency of performing tests that had not been specifically requested is kept fixed.
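The three steps above can be illustrated with a minimal sketch. It is not the Boolean factor analysis the text refers to; it substitutes a plain frequency tally of ordered test combinations and then reports two of the quantities the text proposes to control: the frequency of orders not met by a profile and the number of tests performed without having been requested. The order log, the test names, and the function names `tally_orders` and `evaluate_profiles` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def tally_orders(orders):
    """Count how often each distinct combination of tests was ordered."""
    return Counter(frozenset(order) for order in orders)


def evaluate_profiles(orders, profiles):
    """Return (fraction of orders not met by any profile,
    mean number of unrequested tests per met order).

    An order counts as met when some profile contains every requested
    test; the smallest such profile is assumed to be the one performed.
    """
    unmet = 0
    met = 0
    extra_tests = 0
    for order in orders:
        requested = frozenset(order)
        covering = [p for p in profiles if requested <= p]
        if not covering:
            unmet += 1
            continue
        smallest = min(covering, key=len)
        extra_tests += len(smallest - requested)
        met += 1
    unmet_fraction = unmet / len(orders)
    mean_extra = extra_tests / met if met else 0.0
    return unmet_fraction, mean_extra


# Hypothetical order log standing in for one month of laboratory requests.
orders = [
    {"Na", "K", "Cl", "CO2"},
    {"Na", "K", "Cl", "CO2"},
    {"Na", "K"},
    {"BUN", "creatinine"},
    {"BUN", "creatinine"},
    {"AST", "ALT", "bilirubin"},
    {"K", "bilirubin"},
]

counts = tally_orders(orders)
# Keep the number of profiles fixed at two: the most frequent combinations.
profiles = [combo for combo, _ in counts.most_common(2)]
unmet_fraction, mean_extra = evaluate_profiles(orders, profiles)
print(f"orders not met by a profile: {unmet_fraction:.2f}")
print(f"unrequested tests per met order: {mean_extra:.2f}")
```

Fixing the number of profiles, as done here, is only one of the alternatives listed in the text; the same two functions could be rerun with different candidate sets to see how the unmet fraction trades off against unrequested testing.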

REFERENCES

1. Werner, M., Brooks, S. H., and Wette, R.: Strategy for cost effective laboratory testing. Hum. Path., 4:17-30, 1973.
2. Werner, M.: Ein dreischichtiges Modell zur Bewertung der Wirksamkeit von Analysen und Befunden. Med. Welt, 28:1254-1257, 1977.
3. Werner, M.: Will abstract models change the practice of medicine? In Benson, E. S., and Rubin, M. (Editors): Logic and Economics of Clinical Laboratory Use. New York, Elsevier, 1978, pp. 41-46.
4. Werner, M., and Altshuler, C. H.: Utility of multiphasic biochemical screening and systematic laboratory investigations. Clin. Chem., 25:509-551, 1979.
5. Werner, M., and Mohrbacher, R. J.: Aids to evaluate diagnostic performance applied to immunological creatine-kinase MB assay in myocardial infarction. J. Clin. Chem. Clin. Biochem., 17:359-362, 1979.
6. Werner, M., Genta, V. M., and Williams, D.: Optimization of diagnostic discrimination applied to the amniotic fluid lecithin/sphingomyelin ratio. Clin. Chem. Acta, 101:155-162, 1980.
7. Sackett, D. L.: The usefulness of laboratory tests in health-screening programs. Clin. Chem., 19:366-372, 1973.
8. Bauer, F. W.: Biochemical profiles of suburban, inner city, and neuro-psychiatric patients. Md. State Med. J., 23:32-33, 1974.
9. Belliveau, R. E., Fitzgerald, J. E., and Nickerson, D. A.: Evaluation of routine profile chemistry screening of all patients admitted to a community hospital. Amer. J. Clin. Path., 53:447-451, 1970.
10. Korvin, C. C., Pearce, R. E., and Stanley, J.: Admissions screening: clinical benefits. Ann. Intern. Med., 83:197-203, 1975.
11. Bryan, D. J., Warne, J. L., Viau, A., Musser, A. W., Schoonmaker, F. W., and Thiers, R. E.: Profile of admission chemical data by multichannel automation: an evaluative experiment. Clin. Chem., 12:137-143, 1966.
12. Collen, M. F., Feldman, R., Siegelaub, A. B., and Crawford, D.: Dollar cost per positive test for automated multiphasic screening. N. Engl. J. Med., 283:459-463, 1970.
13. Schneiderman, L. J., DeSalvo, L., Baylor, S., and Wolf, P. L.: The "abnormal" screening laboratory result. Arch. Intern. Med., 129:88-90, 1972.
14. Bates, B., and Yellin, J. A.: The yield of multiphasic screening. J. Amer. Med. Assoc., 222:74-78, 1972.
15. Wheeler, L. A., Brecher, G., and Sheiner, L. B.: Clinical laboratory use in the evaluation of anemia. J. Amer. Med. Assoc., 238:2709-2714, 1977.
16. Brook, R. H., and Stevenson, R. L., Jr.: Effectiveness of patient care in an emergency room. N. Engl. J. Med., 283:904-907, 1970.
17. Kelly, C. R., and Martin, J. J.: Ambulatory medical care quality. Determination of diagnostic outcome. J. Amer. Med. Assoc., 227:1155-1157, 1974.
18. Durbridge, T. C., Edwards, F., Edwards, R. G., and Atkinson, M.: Evaluation of benefits of screening tests done immediately on admission to hospital. Clin. Chem., 22:968-971, 1976.
19. Werner, M., and Brecher, G.: Multiphasic screening. Early disease detection or reassurance? Clin. Res., 21:191, 1973.
20. Collen, M. F., et al.: Multiphasic checkup evaluation study. 4. Preliminary cost benefit analysis for middle-aged men. Prevent. Med., 2:236-246, 1973.
21. Ramcharan, S., et al.: Multiphasic checkup evaluation study. 2. Disability and chronic disease after seven years of multiphasic health checkups. Prevent. Med., 2:207-220, 1973.
22. Dales, L. J., et al.: Multiphasic checkup evaluation study. 3. Outpatient clinic utilization, hospitalization and mortality experiences after seven years. Prevent. Med., 2:221-235, 1973.
23. Cole, G. W.: Personal communication.

Division of Laboratory Medicine
The George Washington University Medical Center
901 23rd Street, N.W.
Washington, D.C. 20037 (Dr. Werner)
