ORIGINAL CONTRIBUTION criteria, high yield, methods of evaluation; methodology, research, high-yield criteria
A Simple Method for Evaluating the Safety of High-Yield Criteria
High-yield criteria for selecting patients for further investigation or treatment sometimes may be associated with an increase in the false-negative rate (decrease in sensitivity) when compared with conventional methods of selection. A simple method, suitable for use by nonstatisticians, is presented. It enables the determination of the sample size required to measure the false-negative rate to a given degree of precision, or the smallest increase in false-negative rate that a study of a given size is likely to detect. Examples of use are provided. [Wears RL, Kamens DR: A simple method for evaluating the safety of high-yield criteria. Ann Emerg Med April 1986;15:439-444.]
INTRODUCTION
An increased interest in controlling costs and in developing a logical basis for medical decision making has led to the development of so-called high-yield criteria (HYC) to be used in selecting patients who are most likely to benefit from subsequent testing or treatment. HYC are rules that may use any combination of history, physical findings, laboratory results, and the like which, when applied, will identify a population of patients who are much more likely to have the condition in question than is the larger population of patients currently identified as being at risk. Such criteria have been developed for skull,1,2 extremity,3 cervical spine,4 and plain abdominal5 radiographs, lumbar punctures,6 and coronary care unit admission,7,8 to name only a few. Essentially, any author who proposes restriction of diagnostic or therapeutic procedures to a limited class of patients is proposing HYC for that clinical situation, whether this is explicitly stated or not. For example, many patients present to emergency departments complaining of recent onset of chest pain.
This population of patients contains a subpopulation of patients with acute ischemic heart disease (AIHD) (eg, acute myocardial infarction, unstable angina) that may constitute as much as 30% of the total population at risk.7 Because definitive identification of this subgroup is so expensive and time consuming that it cannot be applied to the entire population at risk, the physician must select patients who presumably are at greater risk to undergo definitive identification and/or treatment by hospital admission and monitoring, serial enzymes, and so forth. Physicians currently select chest pain patients for admission by using a combination of ill-defined rules and subjective feelings best described by the term "clinical judgment." Clinical judgment, as currently used, however, leads to hospital admission for about 50% of the population at risk, of whom almost 50% eventually turn out not to have AIHD; conversely, about 2% to 3% of those patients not selected can be shown to have been suffering from AIHD at the time of the ED visit.7 Good HYC for admission for acute chest pain should increase the specificity (Table 1) of selection practices (decreasing false-positive admissions) without decreasing unacceptably their sensitivity (or increasing false-negative discharges). While this may be theoretically achievable by the development of new tests that have a higher information yield, it more often is the case that HYC attempt to adjust the cutoff point for patient selection using currently available information. Here sensitivity usually falls in attempts to increase specificity.9 Thus any evaluation of HYC must ask the following question: Is the decrease in sensitivity brought about by the HYC clinically acceptable?
15:4 April 1986
Annals of Emergency Medicine
Robert L Wears, MD, FACEP; Donald R Kamens, MD, FACEP; Jacksonville, Florida. From the Division of Medical Computer Applications, Department of Emergency Medicine, University of Jacksonville, Jacksonville, Florida. Received for publication May 24, 1985. Revision received October 30, 1985. Accepted for publication December 23, 1985. Presented at the University Association for Emergency Medicine Annual Meeting in Kansas City, Missouri, May 1985. Address for reprints: Robert L Wears, MD, FACEP, Synchrony Systems, 4191 San Juan Avenue, Jacksonville, Florida 32210.
HIGH-YIELD CRITERIA Wears & Kamens
Statement of the Problem
The acceptability of a given decrement in sensitivity will depend on the corresponding increase it produces in the false-negative rate and the clinical consequences of false-negative decisions. Clinicians might be willing to accept a 100% increase in falsely negative decisions to omit a foot radiograph, but would be unwilling to accept a 20% increase in false-negative decisions to send home an adult with acute chest pain. It generally is not appreciated that small decrements in sensitivity may produce manyfold increases in the false-negative rate. For example, a decrease in sensitivity from 0.95 to 0.90 doubles the false-negative rate (Table 2). Therefore, it becomes extremely important that the sensitivity of HYC be known with a high degree of precision prior to their general acceptance in practice. Because it is likely that third-party payers may begin to require large-scale introduction of HYC into clinical practice for reimbursement of particular diagnostic or procedure codes,10 we may not be able to rely on a period of gradually increasing use to identify problems with any particular implementation of HYC. We must be able to establish both safety and efficacy prior to their general application. Unfortunately, many of the current studies of HYC have concentrated on efficacy and have not rigorously addressed potential increases in false negatives. In most of these, changes in sensitivity or false-negative rate were not statistically significantly different from the controls used; however, many of these studies were not large enough to detect a doubling or even a tripling of the false-negative rate. For example, in children presenting with their first febrile convulsion, Joffe and colleagues6 were able to develop a set of HYC for selecting patients to undergo lumbar puncture; their sensitivity was 1.0 in that their HYC were positive in all 11 patients with purulent meningitis.
Their study was so small, however, that there was only a 35% chance that a true sensitivity of less than 0.85 could be detected; in fact, there is a 5% chance that the true sensitivity might be lower than 0.76. This corresponds to a false-negative rate of 0.24, which is more than twice the highest false-negative rate ever reported in this setting.11 Because even the most difficult
TABLE 1. Definitions

               Selection Test +    Selection Test -
Disease               A                   B
No Disease            C                   D

Sensitivity = True-positive rate = A / (A + B)
Specificity = True-negative rate = D / (C + D)
Prevalence = (A + B) / (A + B + C + D)
False-negative rate = 1 - true-positive rate; false-positive rate = 1 - true-negative rate.
TABLE 2. Small changes in sensitivity produce large changes in the false-negative rate

                                          Before    After    % Change
Sensitivity                                0.95      0.90     0.05/0.95 = 5.3% decrease
False-negative rate (= 1 - sensitivity)    0.05      0.10     0.05/0.05 = 100% increase

A 5% decrease in sensitivity has produced a 100% increase in the false-negative rate.
bureaucrat would be unlikely to require HYC that could cause 50% or more of the cases of purulent meningitis to be missed, it is obvious how important it is to evaluate the safety of HYC as well as their efficacy. Although the idea of the power of a study is well grounded in the biostatistical literature,12 it is only reported in 12% of the papers appearing in leading general medical journals.13 The purpose of our paper is to use the statistical concept of power to develop a simple tool that can be used by investigators, editors, health economists, teachers, and clinicians to evaluate the safety of HYC in terms of the maximum acceptable increase in false negatives. This will be expressed in two ways: one to determine the sample size required to measure sensitivity to a given degree of precision, and the other to determine the smallest increase in false negatives that a study of given size is capable of detecting. We are not aware of other studies that have addressed this question apart from a specific clinical scenario.
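Two of the figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is not part of the original analysis: the function names are illustrative, and the lower bound assumes the simplest exact binomial argument (if all n diseased patients test positive, the one-sided bound is the sensitivity s that solves s^n = 0.05), which may differ in detail from the calculation the authors performed.

```python
def relative_fn_increase(sens_before, sens_after):
    """Proportionate change in the false-negative rate (1 - sensitivity)."""
    fn_before = 1 - sens_before
    fn_after = 1 - sens_after
    return (fn_after - fn_before) / fn_before


def lower_bound_all_positive(n, alpha=0.05):
    """One-sided lower bound on sensitivity when all n diseased patients
    are test-positive: the value s for which s**n equals alpha."""
    return alpha ** (1 / n)


# Table 2: sensitivity falling from 0.95 to 0.90 doubles the false negatives.
print(round(relative_fn_increase(0.95, 0.90), 2))  # 1.0, ie, a 100% increase

# 11 of 11 patients positive: sensitivity could still be as low as about 0.76.
print(round(lower_bound_all_positive(11), 2))  # 0.76
```

Under this assumption the bound reproduces the 0.76 quoted for the 11-patient study, which illustrates how little a perfect observed sensitivity guarantees in a small sample.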
Formulae
The derivation of the following formulas, the simplifying assumptions involved, and their validation are presented in the Appendix. Studies evaluating HYC use either internal controls, in the form of random assignment of patients to experimental and control groups, or external controls, in which the investigator or reader must compare the results obtained to control values that are known from the literature or some other source. The first design obviously is superior, but it is considerably more expensive and more difficult to perform; we present formulas applicable to both (Appendix). Fortunately, the approximate forms presented here differ only by a factor of two.

FIGURE. Equations.
Equation 1: $K = n t^{2}$
Equation 2: $t = \sqrt{K / n}$
Equation 2A: $t = \sqrt{75 / 90} \approx 0.91$
Equation 3: $s = \dfrac{TP}{TP + FN}$
Equation 4: $s^{*} = \dfrac{TP^{*}}{TP^{*} + (1 + t)FN}$
Equation 5: $d = s - s^{*}$
Equation 6: $d = \dfrac{TP}{TP + FN} - \dfrac{1 - (1 + t)FN}{1 - (1 + t)FN + (1 + t)FN}$
Equation 7: $d = t(1 - s)$
Equation 8: $n = \dfrac{\left[ z_{\alpha}\sqrt{2pq} + z_{\beta}\sqrt{(p + d)(q - d) + (p - d)(q + d)} \right]^{2}}{d^{2}}$
Equation 9: $n = \dfrac{2s}{t^{2}(1 - s)} \left[ z_{\alpha} + z_{\beta}\sqrt{1 + \dfrac{(1 - s)t^{2} + (2s - 1)t}{2s}} \right]^{2}$
Equation 10: $n = \dfrac{\left[ z_{\alpha}\sqrt{pq} + z_{\beta}\sqrt{(p + d)(q - d)} \right]^{2}}{d^{2}}$
Equation 11: $n = \dfrac{\left[ z_{\alpha}\sqrt{pq} + z_{\beta}\sqrt{pq + (q - p)d - d^{2}} \right]^{2}}{d^{2}}$
Equation 12: $n = \dfrac{15.7\,s}{t^{2}(1 - s)}$
Equation 13: $n = \dfrac{7.84\,s}{t^{2}(1 - s)}$

The formula is most easily remembered in the form of Equation 1 (Figure), where n = the number of patients with the disease or condition in question in each group in the study; t = the maximum percentage increase in false negatives considered clinically acceptable, expressed as a decimal fraction (eg, 100% increase = 1.0); and K = a constant dependent on the cutoff for α and β errors, and on the sensitivity of current methods of selection (Appendix). For the generally accepted α and β error cutoffs, K is equal to 15.7(s)/(1 - s) for a two-group, comparative study, or 7.84(s)/(1 - s) for a single-group, external controls study. Using these values implies that we will decide that a difference in sensitivity is significant at the P < .05 level, and that the risk of failing to detect a difference of t or more is 20%. This is most easily understood by substituting a value for t into the following sentences: "There is a 20% chance of a (100t) percent increase in false negatives by using the HYC"; or "There is a 1 to 4 chance that there will be a (100t) percent increase in false negatives by using the HYC."
The value of K for various values of sensitivity is shown (Table 3). By remembering that K ≈ 150 for a benchmark sensitivity of 0.90, other values of K may be estimated quickly. Recall that the false-negative rate is equal to 1 minus the sensitivity (in this case, false-negative rate = 0.10). If we double the false-negative rate (sensitivity = 0.80), we halve K (to about 75); if we halve the false-negative rate (sensitivity = 0.95), we double K (to about 300 for a comparative study).
By rearranging the terms of Equation 1 to solve for n or t, we can determine either the sample size required to have an 80% chance of detecting a given increase in false negatives, or the smallest increase in false negatives that a study of given size is likely to detect. Another easy-to-remember benchmark may be identified by noting that a doubling of the false-negative rate would make t equal to one, yielding the relationship n = K (141 or 71), which simply states that a comparative study requires about 140 patients with disease in each group in order to have a 4 in 5 chance of detecting a doubling of the false-negative rate (from 0.10 to 0.20). If the current false-negative rate is estimated at 0.05, about 300 patients per group are
required, and vice versa. It is important to note that n in these expressions represents the number of patients with the disease or condition in question. The total number of patients required in the study may be estimated by dividing the number of diseased patients by the prevalence in the study population.
APPLICATION
The preceding discussion has been entirely general. To understand the utility of the formulas, consider the following two examples.
TABLE 3. Value of K

Sensitivity    K (two-group)    K (single-group)
0.99              1,552              776
0.975               611              306
0.95                298              149
0.90                141               71
0.85                 89               44
0.80                 63               31
0.75                 47               24

Sensitivity is the estimated sensitivity of current selection practices.
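The entries of Table 3 and the rearrangement of Equation 1 into Equation 2 can be sketched in a few lines. This is an illustrative calculation rather than the authors' own program: the function names are hypothetical, and the constants assume the z values quoted in the Appendix (1.96 and 0.84), so that the single-group numerator is 7.84 and the two-group numerator is twice that (15.68, quoted as 15.7).

```python
import math

Z_ALPHA = 1.96  # two-tailed deviate for alpha = 0.05 (from the Appendix)
Z_BETA = 0.84   # one-tailed deviate for beta = 0.20 (from the Appendix)


def k_constant(sensitivity, two_group=True):
    """K = numerator * s / (1 - s); the two-group numerator is doubled."""
    numerator = (Z_ALPHA + Z_BETA) ** 2  # 7.84 for a single group
    if two_group:
        numerator *= 2                   # 15.68, quoted as 15.7
    return numerator * sensitivity / (1 - sensitivity)


def detectable_increase(k, n_diseased):
    """Equation 2: smallest proportionate increase in false negatives
    that a study with n_diseased patients per group is likely to detect."""
    return math.sqrt(k / n_diseased)


for s in (0.99, 0.975, 0.95, 0.90, 0.85, 0.80, 0.75):
    print(s, round(k_constant(s)), round(k_constant(s, two_group=False)))

# Equation 2A: K of about 75 and 90 diseased patients.
print(round(detectable_increase(75, 90), 2))  # 0.91
```

The loop should agree with Table 3 to within rounding (an entry may differ by one unit depending on how the numerator is rounded), and the last line reproduces the 0.91 of Equation 2A.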
Example 1
HYC are desired to decrease the number of extremity radiographs ordered for evaluation of ambulatory patients with minor trauma. Preliminary investigation has revealed that the sensitivity of current selection practices is 0.95, that is, 95% of such patients who have fractures are judged "positive" by clinicians in that they have extremity radiographs ordered. It has been decided that a 25% increase in false negatives is acceptable in light of cost savings anticipated. How many patients with fractures are required to be 80% sure that an increase of 25% or greater is not missed in a two-group, comparative study? In this situation, t = 0.25 and s = 0.95; thus, using Equation 1, K is adjusted to 298 (or 300 may be used for mental calculation) to obtain n = 298 / [(0.25)(0.25)] = 4,768. If only one patient in 30 in the population studied has a fracture, we need to include about 286,000 (4,768 × 30 × 2 groups) total patients in the study to be 80% sure that the increase in false negatives is 25% or less. This is such an impractical number that the investigator would be likely to accept a larger increase in false negatives; for example, he might decide to accept a 100% increase in false negatives. This gives n ≈ 300 patients with fractures needed in each group, implying a total study size of (only) 18,000. Although this is many times larger than the usual study, it is not inconceivable. More important, it is now clear that implementation of HYC validated in a study of this size may be associated with a 20% risk of doubling the number of false negatives. Finally, the investigator might be forced to change the study design to a single group with external controls in
order to achieve tractability. By dividing K by 2, we determine that 150 patients with fractures (or about 4,500 total) are required to be 80% certain that the false-negative rate has not more than doubled.
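The arithmetic of Example 1 can be laid out as a short calculation. This is a sketch under the example's own assumptions (K of 298 or about 300, a prevalence of one fracture in 30 patients); the helper names are hypothetical. Note that the product 4,768 × 30 × 2 evaluates to roughly 286,000 patients in total.

```python
def diseased_per_group(k, t):
    """Equation 1 solved for n: diseased patients needed in each group."""
    return k / t ** 2


def total_patients(n_per_group, prevalence, groups):
    """Scale diseased patients up to total enrollment across all groups."""
    return n_per_group * groups / prevalence


prevalence = 1 / 30  # one fracture per 30 patients, as in Example 1

n_strict = diseased_per_group(298, 0.25)
print(round(n_strict))                                   # 4768
print(round(total_patients(n_strict, prevalence, 2)))    # 286080

# Accepting a doubling of false negatives (t = 1.0), with K rounded to 300.
print(round(total_patients(300, prevalence, 2)))         # 18000

# Single group with external controls: K halves to about 150.
print(round(total_patients(150, prevalence, 1)))         # 4500
```

Laying the steps out this way makes the leverage of t explicit: because n varies with 1/t^2, relaxing the tolerance from 25% to 100% cuts the required enrollment sixteenfold.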
Example 2
A study is published regarding HYC for selecting patients with acute head injuries for emergency computed tomography (CT). It is estimated from the literature and clinical experience that current techniques of selecting patients for CT of the head have a sensitivity of 0.90, that is, 90% of patients with acute intracranial injury are sent for emergency CT scans. The study attempted to validate the HYC on a group of 90 patients with known acute intracranial injury; there was no control group. The false-negative rate found in the study was 0.10, which is not significantly different from the value estimated for current selection practices. What is the largest increase in false negatives that might have gone undetected in this study? In this situation the proportionate increase in false negatives is given by rearranging Equation 1 to produce Equation 2 (Figure). Substituting n = 90 and K ≈ 75 produces the figure 0.91 in Equation 2A (Figure). Thus implementation of the HYC based on this study might be associated with about a 90% increase in false negatives, a near doubling. The odds are 4 to 1 against the false-negative rate increasing by this or more.
It should be kept in mind that these are approximate techniques designed to produce order-of-magnitude estimations of either study size or resolving power. Thus if Equation 1 suggests that 281 patients are required, a study with only 200 still may be valid; a
study based on only 25 patients, however, probably would be worthless. Also, the accuracy of these approximate relationships decreases as t becomes progressively greater than 1.0; however, they still are accurate within a range of about 20% (two groups) to 30% (single group) in estimating sample sizes from t = 1.0 to t = 2.0. Moreover, HYC that are expected to triple (t = 2.0) the false-negative rate seem unlikely to warrant long consideration.
DISCUSSION
The sample size requirements for many conditions for which HYC would be desirable are likely to be prohibitively large; this represents a sort of "medical uncertainty principle" in that although reliable knowledge about the precise value of the sensitivity of any set of HYC is in principle possible (given sufficient resources and time), it is, in practice, unobtainable. This leads to the paradox that, if the disease in question is sufficiently rare such that HYC are desirable, it will probably be impossible to validate them, while if it is common enough to enable easy validation, HYC may not be of much use! There are several alternative strategies that investigators, health planners, and other interested parties might use to further their goals of making medical practice more logical and more cost effective. The first option would be to abandon the ideal of doing a "definitive" study (ie, a two-group, prospective study done on an undifferentiated population) and instead do two smaller, retrospective studies, one on patients with the complaint in question who have been determined to be "normal" to establish efficacy (ie, a low
APPENDIX
We have used the normal approximation to the exact binomial distribution even when the total number of cases is small. Computer-based experimentation with sample values has shown that the error introduced by this approximation is small and, for the purposes of this paper, negligible. Also, we can by symmetry limit ourselves to consideration of sensitivities between 0.5 and 1.0; in practice, sensitivities will tend to be in the higher half of this range. Consider the situation in which any decrease in sensitivity is deemed clinically unacceptable because the cost of false negatives is too high. Because we know that, due to random variation, it is unlikely that measurement of the sensitivity of a selection rule will be exactly equal on two different determinations, we must determine a range of acceptability around our target sensitivity. This is done most meaningfully by considering the maximum proportionate increase we would tolerate in the number of false negatives. For example, a 50% increase might be tolerable; if we call this tolerance factor t and express it as a decimal (0.5 in this case), then the maximum false-negative rate (FN) considered acceptable will be FN(1 + t). (t is a proportionate increase, that is, an increase in FN from 0.05 to 0.075 is a 50%, not a 2.5%, increase, and t would equal 0.50.) Thus if the original sensitivity s is, by definition, as shown in Equation 3 (Figure), where TP is the true-positive rate and FN is the false-negative rate, the lowest acceptable sensitivity of the HYC, s*, will be as shown in Equation 4 (Figure). Because we know that the true-positive rate plus the false-negative rate must equal 1.0, TP* must be equal to 1 - (1 + t)FN.
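A numeric illustration of the tolerance factor may help here. The sketch below simply restates the definitions above (maximum acceptable false-negative rate FN(1 + t), and TP* = 1 - (1 + t)FN) using the 0.05 to 0.075 example from the text; the function name is hypothetical.

```python
def lowest_acceptable_sensitivity(s, t):
    """Return s*, the sensitivity left after false negatives rise by a
    proportionate factor t: s* = 1 - (1 + t) * (1 - s)."""
    fn = 1 - s               # current false-negative rate
    max_fn = fn * (1 + t)    # maximum acceptable false-negative rate
    return 1 - max_fn


s, t = 0.95, 0.50
s_star = lowest_acceptable_sensitivity(s, t)

print(round(1 - s_star, 3))  # 0.075, the tolerated false-negative rate
print(round(s_star, 3))      # 0.925, the lowest acceptable sensitivity
print(round(s - s_star, 3))  # 0.025, which equals t * (1 - s)
```

The last line shows the gap between the two sensitivities coming out as t(1 - s), the form given as Equation 7 in the Figure.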
(t must always be less than the ratio s/(1 - s), or TP/FN, because, by definition, s, TP, and FN cannot be less than zero.) Letting d be the difference between the original sensitivity and the minimum acceptable HYC sensitivity (s*), we derive Equations 5 and 6 (Figure). This can be simplified to Equation 7 (Figure). Because d represents the maximum tolerable difference in sensitivity, to ensure the safety of HYC we must be able to determine their sensitivity within a range of ±d. Now the problem reduces to one of either determining the sample size required to measure a proportion to a given degree of accuracy (±d) for a single-group, external controls design, or of determining the group size required to determine the difference between proportions to an accuracy of ±d for the two-group,
comparative design. Because sensitivity is defined only in groups of patients having the disease, the numbers we determine from the following formulas will refer to the numbers of patients having the disease or condition in question in each group. They can be converted to total number of patients by dividing by the estimated prevalence in the study population. In the two-group case, assuming the experimental and control groups to be roughly equal, the required group size n is approximated15,16 by Equation 8 (Figure), where p is the sensitivity, q equals 1 - p, d is calculated as above, and zα and zβ represent the deviates for the α and β errors, respectively. By substitution we can derive an expression for n solely in terms of the sensitivity and t (Equation 9) (Figure). This is a formidable expression for which no closed-form simplification could be found. Fortunately, for any realistic values of s and t, the quantity under the square root sign is close to 1.0. This was validated by a computer program that calculated the values of that quantity for 0.5 ≤ s < 1.0 and 0 < t ≤ 1.0. In this range, the value of the coefficient of zβ is between 1.0 and 1.2; it gradually increases as t increases past 1.0. Because in practice zα is usually much greater than zβ, the degree of error produced in n by this simplification will be significantly less than 20%; we therefore assume the coefficient of zβ to be 1.0. For the single-group design, the appropriate expression for sample size is as shown in Equation 10 (Figure),15,16 where p, q, zα, zβ, and n have the same meanings as before. By substitution we can obtain an expression for n in terms of s and t as we did previously (Equation 11) (Figure). As before, a rather intractable expression results.
In this case, computer-based investigation of a number of numerical simplifications showed that the quantity (q - p)d - d^2 is sufficiently close to zero in the ranges of clinical interest (same as above) that the resulting value for n would be in error by at most 15%. Although this is somewhat greater than the error expected in the two-group model, we believed that because the single-group, external controls model is most useful as a preliminary investigation, and because the simplified expression differed from the two-group model only by a factor of two, the trade-off was a good one. Finally, we used Cohen's19 arbitrary assertion that the α error should be 0.05 (two-tailed zα = 1.96) and the β error no more than four times the α error (β
= 0.20, one-tailed zβ = 0.84) to obtain Equation 12 (Figure) for two-group, comparative studies, and Equation 13 for single-group, external controls studies. These final estimates were validated by computer programs that compared the estimated to the exactly calculated sample sizes over the above ranges. In the areas of greatest clinical interest (s ≥ 0.70 and t ≤ 1.0) the average error was about 8% for the two-group model and around 12% for the single-group design. Furthermore, the estimated value for n always was less than the exact value. Thus these are conservative estimates in that if a study is too small as determined by the estimate, it will only look worse if n is calculated exactly. It should be noted that the choice of 20% for β error will influence the choice of t. This is most easily appreciated by substituting various values for t and β into the sentence, "There is a (100β) percent chance that there will be a (100t) percent increase in false negatives." Investigators wishing to use different values for α and β error may easily calculate a different constant for the numerator. For example, if one wishes only a 10% chance of missing a given increase in false negatives, then zβ for β = 0.10 is 1.28, and the numerator becomes 21.0 for two-group designs and 10.5 for single-group studies. Because the expression s/(1 - s) appears in both expressions and can be related to a benchmark value in a manner that is easy to remember, the numerator was combined with the benchmark s = 0.90 to produce the values of K (Table 3). Sample size requirements for LYC can be analyzed in a similar manner. There are two possible kinds of errors: false negatives (LYC are negative, but patient does not have the disease in question) and false positives (LYC are positive, yet patient has the disease).
It follows that the efficacy of LYC is related to having a low false-negative rate (which equals 1 - sensitivity), and the safety is related to a low false-positive rate (1 - specificity). If we define positive LYC as indicating absence of disease, then the specificity will be equal to the proportion of patients having positive LYC who have the disease or condition in question. This is algebraically identical to the situation with HYC in that we must seek to measure the proportion of patients with a given finding in the subpopulation of patients having the disease or condition in question. Thus the formulas developed for HYC apply to LYC also.
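Returning to the constants behind Equations 12 and 13, two of the numerical steps in this Appendix lend themselves to a quick check. The sketch below is an illustration, not the authors' program. It assumes that the numerator is (zα + zβ)^2 for a single group and twice that for two groups, and that the coefficient of zβ in Equation 9 is the square root of 1 + [(1 - s)t^2 + (2s - 1)t]/(2s); both readings are taken from the Figure and should be treated as assumptions. The deviates come from Python's standard `statistics.NormalDist`, so results differ slightly from those obtained with the rounded values 1.96, 0.84, and 1.28.

```python
import math
from statistics import NormalDist


def numerator(alpha, beta, two_group=True):
    """(z_alpha + z_beta)^2, doubled for the two-group design.
    alpha is two-tailed and beta is one-tailed, as in the Appendix."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    constant = (z_alpha + z_beta) ** 2
    return 2 * constant if two_group else constant


def z_beta_coefficient(s, t):
    """Assumed form of the square-root factor multiplying z_beta in
    Equation 9 (the approximation in the text sets it to 1.0)."""
    return math.sqrt(1 + ((1 - s) * t ** 2 + (2 * s - 1) * t) / (2 * s))


print(round(numerator(0.05, 0.20), 1))                   # 15.7
print(round(numerator(0.05, 0.10), 1))                   # 21.0
print(round(numerator(0.05, 0.10, two_group=False), 1))  # 10.5

# Scan 0.5 <= s < 1.0 and 0 < t <= 1.0 on a 0.01 grid.
values = [z_beta_coefficient(s / 100, t / 100)
          for s in range(50, 100) for t in range(1, 101)]
print(min(values) >= 1.0, max(values) < 1.23)            # True True
```

On this grid the assumed coefficient stays between 1.0 and roughly 1.22, in line with the range stated above, and the alternative β of 0.10 reproduces the numerators 21.0 and 10.5.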
false-positive rate), and the other on patients who are proven to have the disease in question to establish safety (or a low false-negative rate). Even this strategy, however, may require either multicenter studies or very long periods for patient collection if the disease in question is even moderately uncommon (eg, purulent meningitis presenting with convulsions). An alternative strategy would be to give up the idea of developing HYC to identify a group of patients at increased risk of a certain condition and instead develop "low-yield criteria" (LYC) that would identify a group of patients who are almost certainly normal. If the LYC were not met, then the decision to go ahead with further diagnostic or therapeutic modalities would not be forced but would be left to the judgment of the physician. This strategy seems inherently more appealing to both clinicians and "cost containers." The clinician's exercise of his best judgment on his patient's behalf is not impeded, and indeed may be assisted, because he now would have a rationale to defer further investigation in a patient who arouses minimal suspicion of disease. In addition, the health economist is able to eliminate any frivolous or exploitive use of medical resources without risking the denial of needed services to a patient he has never evaluated. At first glance, LYC might seem to avoid the problems that plague HYC because they are concerned with identifying the condition of nondisease, which is more prevalent than disease. Establishing the safety of LYC, however, still requires estimation of the probability that the LYC will falsely call a diseased patient normal. No matter how positive LYC are defined, we must be able to estimate this error rate with a certain degree of precision.
Because this error rate is defined only in the subgroup of patients who actually have the disease, the problem becomes algebraically identical to the HYC situation (Appendix). Although the LYC strategy fails to offer any mathematical or statistical advantage over HYC, there are compelling reasons to believe that it may be a more appropriate vehicle in which to pursue the goals of cost containment and rational, efficient patient care. First, it seems to fit the "two-stage" evaluation process outlined above more logically than do HYC. Second, we believe that LYC are
much more likely to be accepted by clinicians (and patients) than are HYC. Third, if selection criteria are being devised to contain costs, it seems ethically more appropriate to concentrate on identifying those patients who most assuredly do not need further testing or treatment than to attempt to supplant the clinician (who, after all, is the only one who has evaluated the individual patient in question) in deciding which do. Finally, it may be conceptually easier to devise and validate criteria to identify patients who certainly do not need further evaluation than vice versa.
CONCLUSION
The safety of HYC depends heavily on the morbidity and mortality associated with any increase in false negatives. Most current studies of HYC have not measured their sensitivity precisely enough to assure clinicians that an unacceptable increase in false negatives is unlikely. We have presented a simple method for evaluating such studies in terms of their impact on false-negative rates. The responsibility to ensure the safety of HYC rests on those who argue for their implementation. It is particularly alarming to consider the possibility that certain HYC might become established in clinical practice by the actions of third-party payers interested in cutting costs on the basis of inadequate evaluations appearing in the literature. New drugs require both safety and efficacy to be established before they can be marketed; we should use no less a standard in evaluating the introduction of other new modalities into clinical practice.
The standard application of this or other12,14-18 techniques to estimate the power of a study will benefit clinicians and patients by ensuring that the safety and efficacy of HYC are established prior to implementation, and by making it more difficult for dogmatic statements about the utility of HYC based on inadequate sample size to infiltrate the literature.
REFERENCES
1. Bell RS, Loop JW: The utility and futility of radiographic skull examinations for trauma. N Engl J Med 1971;284:236-239.
2. Phillips LA: A study of the effect of high yield criteria for emergency room skull radiography. US Dept of Health, Education, and Welfare publication (FDA) 78-8069, July 1978.
3. Brand DA, Frazier WH, Kohlhepp WC, et al: A protocol for selecting patients with injured extremities who need x-rays. N Engl J Med 1982;306:333-339.
4. Wales LR, Knopp RK, Morishima MS: Recommendations and evaluation of the acutely injured cervical spine: A clinical radiologic algorithm. Ann Emerg Med 1980;9:422-428.
5. Eisenberg RL: Evaluation of plain abdominal radiographs in the diagnosis of abdominal pain. Ann Surg 1983;197:464-469.
6. Joffe A, McCormick M, DeAngelis C: Which children with febrile seizures need lumbar puncture? Am J Dis Child 1977;52:188-191.
7. Pozen MW, D'Agostino RB, Selker HP, et al: Predictive instrument to improve coronary-care-unit admission practices in acute ischemic heart disease. N Engl J Med 1984;310:1273-1278.
8. Goldman L, Weinberg M, Weisberg M, et al: A computer-derived protocol to aid in the diagnosis of emergency room patients with acute chest pain. N Engl J Med 1984;307:588-596.
9. Swets JA, Pickett RM: Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. New York, Academic Press, 1982, pp 15-46.
10. Maloney TW, Rogers DE: Medical technology - A different view of the contentious debate over costs. N Engl J Med 1979;301:1413-1419.
11. Rutter N, Smales ORC: The role of routine investigations in children presenting with their first febrile convulsion. Arch Dis Child 1977;52:188-191.
12. Feinstein AR: Clinical Biostatistics. St Louis, CV Mosby, 1977.
13. DerSimonian R, Charette LJ, McPeek B, et al: Reporting on methods in clinical trials. N Engl J Med 1982;306:1332-1337.
14. Freiman JA, Chalmers TC, Smith H Jr, et al: The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. N Engl J Med 1978;299:690-694.
15. Colton T: Statistics in Medicine. Boston, Little, Brown and Company, 1974.
16. Fleiss JL: Statistical Methods for Rates and Proportions. New York, John Wiley & Sons, 1981.
17. Thompson MS: Decision-analytic determination of study size. Med Decis Making 1981;1:165-179.
18. Diamond GA, Forrester JJ: Clinical trials and statistical verdicts: Probable grounds for appeal. N Engl J Med 1983;98:385-394.
19. Cohen J: Statistical Power Analysis for the Behavioral Sciences. New York, Academic Press, 1977.