Evaluation and Program Planning, Vol. 18, No. 1, pp. 1-11, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in the USA. All rights reserved.
0149-7189/95 $9.50 + .00

0149-7189(94)00041-7
PROGRAM QUALITY AND PROGRAM EFFECTIVENESS: A Review of Evaluations of Programs to Reduce Excessive Medical Diagnostic Testing

EMIL J. POSAVAC
Loyola University of Chicago
ABSTRACT

A review of 61 interventions to reduce the rate of diagnostic testing revealed that many approaches to reducing test use yielded favorable results; the mean reduction observed was 22.0%. The heterogeneity of the interventions, target populations, and types of testing to be reduced did not permit carrying out a meta-analysis. However, this heterogeneity did permit a study of the relationship between indexes of program quality and effectiveness. Prior to implementation, few program developers measured the levels of excessive testing or developed conceptual foundations for the interventions. Interventions that were based on analyses of need and on concerns about negative side effects tended to be more effective. It is suggested that focusing interventions on processes thought to foster the development of expert decision-making skills would improve the likelihood that interventions would show long-term effectiveness.
Social scientists argue that organizational interventions based on well-articulated theoretical foundations are more likely to affect the target population than are interventions based on informal ideas about human behavior (Bickman, 1990). However, without a careful assessment of the unmet needs of the target population, it is unlikely that an intervention will be as effective as it could have been (Kettner, Moroney, & Martin, 1990; McKillip, 1987; Scriven & Roth, 1990). Although many methodologists share these views, many practitioners develop interventions with little attention to the mechanism that would have to occur for the intervention to be successful (Lipsey, 1993). For example, a meta-analysis of evaluations of programs to improve the rate at which patients adhere to treatment regimens revealed that the typical intervention was based on the presumption that additional information would be enough to change patient behavior; such interventions had little impact (Posavac, Sinacore, Brotherton, Helford, & Turpin, 1985). On the other hand, the program developers who had adapted the treatment adherence programs to the daily lives of patients and those who changed clinic practices were relatively more successful.

The present report summarizes a review of evaluations of interventions to motivate physicians to reduce the rate of diagnostic testing. This review permitted an examination of the degree to which care in intervention design was related to the effectiveness of interventions. Concerns about overuse of medical tests are widespread (Beck, 1993). Not only are unnecessary tests a waste of medical resources, they are also a threat to the well-being of patients because false-positive findings can lead to unnecessary treatments that may be accompanied by negative side-effects (Thompson, Kirz, & Gold, 1983). The importance of this topic would merit a thorough meta-analysis of interventions to reduce unnecessary testing. If several authors use similarly diagnosed patients, employ similar independent variables (such as a standard treatment protocol), and observe well-defined
Laura Hoffman and Dennis Dew assisted in the coding of the studies. Requests for reprints should be sent to Emil J. Posavac, Psychology Department, Loyola University of Chicago, 6525 North Sheridan Road, Chicago, IL 60626.
outcomes (such as recovery rate or length of inpatient stay), a meta-analysis may be able to isolate the best practices. However, these conditions were not met in the evaluations of interventions to reduce diagnostic testing: patient illnesses were uncontrolled, there are few standards of diagnostic test use, and little attention was paid to standard patient outcomes. Although it became clear that a traditional meta-analysis was not justified, this heterogeneity permitted an examination of the relationship between the characteristics of interventions and their effectiveness levels.

Carefully planned and implemented interventions were hypothesized to result in more effective programs. This general expectation was based on the idea that interventions are likely to be better prepared when the problems to be solved are carefully assessed than when needs are not measured. Furthermore, interventions are likely to be more effective when system-wide impacts are measured than when possible negative side-effects are ignored (Sieber, 1981). The importance of examining a variety of effects has recently been highlighted by Kaplan (1990), who noted that ambitious treatment programs to reduce heart attacks appeared to be successful until the death rate for all causes was examined. When this variable was used, it was learned that although the treated group experienced a lower death rate due to heart attacks, its members did not live longer than the controls because they were more likely to die from other causes. This finding suggests that innovators who showed an awareness of the need to examine a broad range of variables, some of which were unintended, would be likely to have produced more effective programs. This expectation goes beyond many discussions of the value of program theory, which often stress the inability to generalize beyond local conditions when program theory is ignored (Bickman, 1987). It is here suggested that program theory is important for achieving even local effects, not only for developing generalizations.

There were a number of ways in which this general hypothesis was tested. Rather than listing each hypothesis at this point, two examples might suffice: (a) It was expected that an empirical analysis of the extent and nature of overtesting would permit the development of more precisely focused interventions, which would, in turn, result in a greater reduction in testing. (b) Similarly, it was expected that those program developers who considered how best to change physician ordering practices would have designed and implemented the more effective interventions. All tests of this general hypothesis are included in the results section of this report.
METHODS

All published evaluations of programs to reduce excessive diagnostic testing were sought. To qualify for inclusion the
studies had to include: (a) quantitative measures of the rate of diagnostic testing with real, not simulated, patients; (b) an intervention group or time period; and (c) some sort of comparison, nonintervention group or time period. Only English language materials were used. A list of the 45 studies examined is provided in the Appendix. The publication dates range from 1973 through 1992; 59% were published in 1983 through 1987. No studies meeting these criteria were discarded except as described below.

The variables to be coded were defined in a coding guide. Coding was carried out semi-independently by the author and two graduate students. When disagreements were detected, the author reviewed the material searching for (a) errors and (b) different interpretations of the coding guide or the dependent variables. Since the codings could be defined objectively, reaching agreement was not difficult.

Study Characteristics Coded
The following variables were recorded: whether a reduction was sought for testing in general or for specific tests, identity of the target professional, number of medical care providers, type of medical care facility, year of publication, whether testing was measured during the intervention or during a follow-up period, type of research design, and type of intervention. The categories of the coded variables that are not simply dichotomies or quantities are given in Table 1. Coded variables thought to indicate the quality of the program design and implementation are given below.

Planning and Implementation
In order to examine the quality of intervention planning, the reports were examined for a number of indicators of careful planning and sensitivity to side effects on the part of the program developers. The following intervention characteristics were recorded: (a) whether reasons for excessive testing were at least discussed, (b) whether the degree of excessive testing was measured, (c) whether reasons for excessive testing were examined empirically, (d) whether a theory of behavioral change was used explicitly, (e) whether possible negative side effects of the intervention were mentioned, (f) whether data were gathered to rule out negative side effects, (g) whether any positive side effects of the intervention were considered, and (h) whether follow-up data were gathered to detect an effect after the intervention ended. It was expected that attention to points a through d would contribute to the effectiveness of intervention designs, that concerns about points e through g would suggest a more complete view of the possible effects of the interventions, and that the inclusion of follow-up observations would imply that the innovators recognized that practice patterns could regress toward initial levels once the novelty of the interventions wore off.
TABLE 1
CODED INTERVENTION CHARACTERISTICS THAT ARE NOT FULLY LISTED IN THE TEXT

Variable                      n    Categories
Medical care facility        19    Inpatients of medical school hospital
                             11    Outpatients of medical school clinic
                              2    Emergency patients at medical school hospital
                             12    Inpatients of community hospital^a
                              5    Outpatients of community hospital clinic^a
                              2    Emergency patients of community hospital
                              1    Outpatients of HMO
                              7    Other (viz., VA hospital)

Target population            27    Residents
                             12    Attending physicians
                              1    Medical school faculty
                             10    Residents and medical school faculty
                              1    Residents, faculty, and attending physicians
                              4    Residents and attending physicians
                              3    Others (viz., nurse practitioner)

Type of research design      22    Pretest-posttest
                             21    Comparison groups
                             13    Random assignment of physicians
                              1    Random assignment of patients
                              1    Time series

Types of interventions             Preventive strategies
                              3      Written clinic information distributed
                              2      Clinical information given in classes
                              2      Cost information given as tests are ordered
                              2      Clinical information given as tests are ordered
                              2      Standard protocols of care developed
                                   Review of practice
                              1      Practice reviewed in classes
                              1      Written review of practice
                             13      Feedback on actual costs of tests ordered
                                   Administrative change
                              9      Change of forms, rules, etc.
                              2      Announcement of monitoring of test practices
                             21    Multiple interventions

^a One intervention was implemented in both settings.
Index of Effect Size
The 45 reports contained 61 interventions. Since some reports included more than one intervention and since many reports included multiple dependent variables, analyses could have been based on effect sizes for reports (n = 45), interventions (n = 61), or dependent variables (n = 111). In a previous study, parallel analyses based on these three levels of analysis yielded identical conclusions (Posavac & Miller, 1990). Unless otherwise noted, the following analyses were based on the 61 interventions, using the mean effect size of the dependent variables when more than one was used.

Effect sizes, for example, d (Cohen, 1988) or r (Lipsey, 1990), could be calculated from the information supplied in some reports. When the dependent variable was the number of tests per patient, a standard deviation could have been reported; however, many authors omitted standard deviations as well as the values of inferential test statistics. In other reports, especially those that described interventions to reduce the use of a specific test, the proportions of patients evaluated with a specific diagnostic test in the intervention and control groups were reported. Standard errors in such studies could be estimated if the number of patients had been reported; however, many reports omitted the number of patients treated during the observation periods. Other authors reported only the total number of tests over a time period (e.g., one month). These omissions not only made it impossible to calculate d or r, they also made it impossible to carry out the psychometric meta-analysis advocated by Hunter and Schmidt (1990).

Since the dependent variables (number and cost of diagnostic tests ordered) were ratio scales, a more direct
index of effect was used: percentage reduction of diagnostic testing.¹ The percentage reduction was calculated as follows:

  Percentage reduction = 100 × ([Control or preintervention level] − [Intervention level]) / [Control or preintervention level]
In other words, the effect size index used was based on the reduction of testing found by subtracting the level of testing of the intervention group (or during the intervention time period) from the level of testing of the comparison or control group (or during the preintervention time period). This difference was divided by the level of testing for the nonintervention group (or time period) and multiplied by 100. If an intervention was followed by an increase in testing, the index of reduction was negative.

The most common measure of testing was the number of tests ordered per patient (n = 78). In addition, 33 dependent variables were in units of dollars. Both number of tests ordered and cost of tests ordered were reported for 21 interventions. Eighteen of the descriptions of these 21 interventions included sample sizes. Among these 18, the correlation of percentage reduction based on number of tests with percentage reduction based on cost of tests, weighted for sample size, was .90 (n = 18). When the mean percentage reductions were calculated for those reports describing multiple interventions, the correlation became .93 (n = 13). The mean effect sizes for number of tests (M = 17.1%) and cost of tests (M = 18.1%) were quite similar among these reports, matched groups t(20) = -.47, p = .64, two-tailed. On the basis of the large size of these correlations and the similar magnitudes of the effect sizes, reductions for number of tests and costs of tests were treated as equivalent; if both dependent variables were reported for one intervention, the mean of the two was used in the analysis.

All but one of the evaluations were carried out using the reduction achieved while the interventions were in place. Some reports also provided test levels during a follow-up time period after the intervention was no longer in effect. When calculating the effect size for the follow-up assessment, the formula for the index of reduction was modified by using the level of testing observed during the follow-up in place of the level of testing during the intervention. One report provided information on the reduction of testing only at a follow-up time period after the intervention had ended.
¹ The two variables, number of tests and cost of tests, satisfy the definition of ratio scales: a rational, nonarbitrary zero point and equal interval scaling (Pedhazur & Schmelkin, 1991). It is true that tests differ in how invasive they are, how much patient discomfort is involved, and how expensive they are. For most studies it was impossible to estimate the invasiveness and discomfort of omitted tests since the interventions were designed to reduce all testing.
Since it seemed misleading to include this report with the other studies, which examined the concurrent effect of the intervention, it was dropped. One outlier study was dropped because its effect was markedly different from those of the others; testing increased, more than doubling, during the intervention. The percentage reduction could not be calculated for one intervention. Dropping these three interventions reduced the number of interventions analyzed to 58. Unless otherwise mentioned, the analyses reported below refer to the effects of the 58 interventions while the programs were in effect.
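To make the index concrete, here is a minimal sketch in Python of the computation just described; the testing levels below are hypothetical and are not taken from any of the reviewed studies.

    # Percentage reduction index used as the effect size in this review.
    def percentage_reduction(baseline, intervention):
        """100 x (baseline - intervention) / baseline, where baseline is
        the control-group or preintervention level of testing; negative
        values indicate that testing increased."""
        return 100.0 * (baseline - intervention) / baseline

    control = 4.0    # hypothetical tests per patient, comparison group/period
    treated = 3.0    # hypothetical tests per patient during the intervention
    followup = 3.5   # hypothetical tests per patient after the program ended

    print(percentage_reduction(control, treated))    # 25.0 (concurrent effect)
    # For follow-up effects, the follow-up level replaces the
    # intervention-period level, as described above.
    print(percentage_reduction(control, followup))   # 12.5 (follow-up effect)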
RESULTS

A variety of research designs were used for the 58 interventions: 22 were based on pretest-posttest observations; 21, nonequivalent comparison groups; 13, randomly assigned groups; 1, randomly assigned patients; and 1, an interrupted time series. Among most of these evaluations, nontreatment groups were probably contaminated by knowledge of the treatment since their members knew studies were under way, and in many cases the members of the nontreatment groups had contact with members of the treatment groups. In one study physicians were members of both the treatment and the nontreatment groups because patients were randomly assigned to conditions, with the intervention being delivered through a computer-based test ordering system. Note that such contamination is most likely to reduce differences between groups, not inflate them.

All of the target professionals in 27 of the 58 interventions were medical residents (see Table 1). An additional 15 interventions involved residents and some other group. Only 16 interventions focused on changing the diagnostic practices of groups that contained no residents. The interventions occurred in a variety of settings: 32 in medical school settings (19, hospital inpatient units; 11, outpatient clinics; 2, emergency rooms), 18 in community hospitals (12, hospital inpatient units²; 5, outpatient clinics²; 2, emergency rooms), and 8 in a variety of other settings such as VA hospitals (4, inpatient units; 4, outpatient clinics). Overall, of the 58 interventions, 34 involved inpatients, 23 outpatients, and 1 both. The mean number of medical professionals in either intervention or control groups was 55.6; for 15 of the interventions, the number of professionals was not reported.

Reductions in Diagnostic Testing Associated With the Interventions
The focus of this report is on the relationship between (a) the level of effectiveness and (b) the quality of the intervention design and the quality of the evaluation.
² One intervention in a community hospital applied to the care of both inpatients and clinic outpatients.
Those analyses, however, are more understandable when placed in the context of the overall effectiveness of the studies. Overall, as shown in Table 2, the 58 interventions lowered the rate of testing by 22.0% (SD = 18.6%, SEM = 2.4%). The mean reduction is somewhat lower when weighted by sample size: 15.9% (SD = 15.6%, SEM = 2.4%). Over half of the difference between the unweighted and weighted means is accounted for by the higher reductions achieved by the 15 interventions for which the number of medical practitioners was not given (unweighted mean = 31.8%). Unless otherwise noted, all mean reductions reported in the balance of this paper are weighted by sample sizes.

Effectiveness by Type of Intervention
The mean percentage reductions associated with different types of single-strategy interventions (see Table 2) differed from each other when an unweighted one-way ANOVA was used. According to the Scheffe test, this effect was largely due to the greater reductions after administrative changes (35.3%) compared to reductions after reviews of past test ordering patterns (10.9%). This effect, however, only approached significance (p = .063) when a weighted ANOVA was used, even though the difference between the mean reductions for administrative change interventions and those based on reviews of past practice was nearly the same whether the weighted or unweighted analysis was used (22.7% and 24.4%, respectively). Thus, the drop in the F ratio appeared to be due to the lower statistical power related to the loss of the 7 of the 11 administrative change interventions that did not report sample sizes.³

³ The standard deviations in the weighted analysis were smaller than those in the unweighted analysis, further supporting the loss-of-power interpretation of the nonsignificant F.

Interventions were grouped on the basis of (a) whether they were directed at specific diagnostic tests or testing in general and (b) whether inpatients or outpatients formed the patient population. A 2 x 2 (type of testing targeted by patient type) weighted ANOVA revealed that reductions among outpatients (M = 23.5%) were larger than those among inpatients (M = 11.5%), F(1,38) = 6.55, p < .02, d = .80. Neither the type of testing nor the interaction of test type with patient type was related to effectiveness, Fs < 1.0.

Effectiveness by Type of Evaluation Design
An unweighted one-way ANOVA was used to compare the mean percentage reductions for pretest-posttest designs, comparison group designs, and designs in which practitioners were randomly assigned to conditions. This analysis revealed a significant difference among means, F(2,53) = 4.57, p < .02, with the 22 pretest-posttest
designs being more effective (M = 29.0%) than the 13 designs using randomly assigned practitioners (M = 10.3%). However, half of the reports of pretest-posttest designs failed to mention sample size; these 11 interventions produced large percentage reductions (M = 38.4%). When the ANOVA was carried out weighting by sample size, the F dropped below 1.0.

In order to examine the permanence of the reductions, some evaluators carried out follow-up observations after the intervention had ended. Of the 58 interventions, nine evaluations included sample sizes and follow-up observations to learn the degree to which the lower rates of testing were maintained after the intervention ended. The mean length of time between the intervention and the follow-up was 5.9 months. The effect during the interventions (M = 29.2%) exceeded that observed at the follow-up (M = 17.9%). Although the rate of reduction of testing at the follow-up was barely 60% of the initial reduction, the difference was not statistically significant (matched groups t(8) = 2.02, NS). However, this t-test was based on so few observations that its power to detect a difference was quite low. Note that at the time of the follow-up observations, residents would have had approximately a half year of additional training and experience, which could account for at least a portion of the reduction still observed at the follow-up.

Effectiveness of the Interventions Associated With Characteristics of the Design of the Interventions and the Quality of the Evaluation
The findings just presented serve as a context for the results presented below, which address the question of whether indexes of the quality of the design of the intervention and the quality of the evaluation are associated with greater effectiveness. The differences in mean effect sizes as related to indexes of careful planning and implementation are summarized in Table 3. All t-tests in the table are directional because good planning and implementation practices were hypothesized to be associated with greater impact.

Information on Planning the Interventions
Nearly three quarters (72%) of the descriptions of the interventions mentioned a reason for the overuse of diagnostic testing; however, these interventions were not more effective than the other interventions. The mean effect size of the 21 interventions for which the extent of overuse was measured exceeded that of the 22 interventions planned without regard to measuring the extent of need; this difference, however, only approached statistical significance. However, as shown in Table 3, when program planners used empirical observations to verify their beliefs about the reasons for overuse, markedly more effective interventions resulted; the effect size associated with this practice was 2.24.
TABLE 2
MEAN PERCENTAGE REDUCTION FOR DIFFERENT TYPES OF INTERVENTIONS FOR BOTH WEIGHTED AND UNWEIGHTED MEANS

Intervention Type           Unweighted Mean^a    n    Weighted Mean^b    n
Review of past practice          10.9%          15        11.0%        14
Prevention oriented              18.2%          11        20.6%         9
Administrative change            35.3%          11        33.7%         4
Multiple strategies              25.0%          21        13.9%        16
All                              22.0%          58        15.9%        42

Note: The strategies used in the multiple strategy intervention programs were examined; however, no conclusions could be drawn because there were 24 different pairings of strategies. Although the single most frequent pairing was clinical classroom discussion plus feedback on costs of tests ordered, this combination represented only 6 of the 46 pairings.
^a F(3,54) = 4.84, p < .005.
^b F(3,39) = 2.65, p = .062.
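The weighting that distinguishes the two columns of Table 2 is ordinary sample-size weighting. A minimal sketch follows; the numbers are hypothetical, chosen only to mimic the pattern of small studies with large effects.

    # Unweighted vs. sample-size-weighted mean percentage reduction.
    reductions = [35.0, 12.0, 8.0]   # hypothetical percentage reductions
    sizes = [10, 60, 45]             # hypothetical numbers of practitioners

    unweighted = sum(reductions) / len(reductions)
    weighted = sum(r * n for r, n in zip(reductions, sizes)) / sum(sizes)

    # The small study with the large effect inflates the unweighted mean,
    # mirroring the 22.0% (unweighted) vs. 15.9% (weighted) contrast above.
    print(round(unweighted, 1), round(weighted, 1))  # 18.3 12.4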
Conceptual Development of the Interventions
The descriptions of about a fourth of the interventions included theoretical reasons why the intervention should change the behavior of the practitioners ordering diagnostic tests; however, these interventions were not reliably more effective than those that did not include such reasons; in fact, they may have been less effective.

Side Effects
Reducing diagnostic tests would result in poorer medical care if too many or the wrong diagnostic tests were omitted. On the other hand, reducing unnecessary tests could have the effect of speeding up medical care or of improving it if false positive findings were reduced. Three side effect indexes were rated and are also included in Table 3. Care about possible negative side effects was taken as a sign of careful program implementation. Those evaluations that included the consideration of negative side effects were reliably more effective than those in which negative side effects were ignored. Seven interventions were accompanied by observations chosen to detect negative side effects; these interventions were reliably more effective than those not including such observation procedures. Some research teams showed interest in the positive side effects of a lowered testing rate;⁴ the interventions designed and evaluated by these writers were reliably more effective than those prepared and evaluated by writers who did not mention positive side effects.
⁴ Examples of positive side effects of interventions to reduce diagnostic testing include physicians thinking more carefully about ordering diagnostic tests that the intervention did not target for reduction (Tierney, McDonald, Hui, et al., 1988) and physicians feeling less threatened about potential challenges to their decisions when they are following a defined treatment protocol (Wachtel, Moulton, Pezzullo, & Hamolsky, 1986).
Continued Impact
The last entry in Table 3 contrasts the interventions whose lasting impact was examined using a follow-up assessment of reductions in diagnostic testing with those that included no follow-up. Note that the dependent variable is the same as for the other entries in Table 3: the reduction concurrent with the intervention. In other words, the question addressed is whether program innovators who sought to learn if their programs had long-term effects developed more effective programs. The mean reductions show that those including follow-up observations did produce reliably more effective programs.
DISCUSSION

Before drawing general conclusions from this review, three major problems must be mentioned. First, the variety of diagnostic procedures targeted for reduction created important limitations on the interpretations that may be made of the patterns found. As noted above, some researchers tried to reduce testing in general, but others focused on one or a small number of tests. It is not surprising that a narrower focus was associated with a greater percentage reduction. A second limitation concerns the practical question of how much of a reduction is needed to make the effort worthwhile. Although it is clear that interventions to reduce the rate of diagnostic testing do have an impact, few intervention developers suggested what level of reduction would be enough to make the intervention worth continuing. Some authors mentioned that sizable expenses were involved in the reviews of charts that were needed to implement the program.
TABLE 3
MEAN PERCENTAGE REDUCTION IN DIAGNOSTIC TESTING CONCURRENT WITH THE INTERVENTIONS, CLASSIFIED BY USE OR NONUSE OF AN ASPECT OF CAREFUL PLANNING

Variable                                          Mean % Reduction    t^a       p       d
Reasons for overuse discussed
  Yes (n = 31)                                         16.1%
  No (n = 12)                                          15.6%          .02       NS      -
Extent of overuse measured
  Yes (n = 21)                                         20.1%
  No (n = 22)                                          13.0%         1.49      <.08    .45
Reasons for overuse measured empirically
  Yes (n = 5)                                          48.7%
  No (n = 38)                                          13.4%         4.71      <.01   2.24
Theory used in designing change strategy
  Yes (n = 9)                                          11.8%
  No (n = 33)                                          17.7%        -1.22^b     NS      -
Negative side effects considered
  Yes (n = 21)                                         21.6%
  No (n = 20)                                          10.3%         2.50      <.01    .78
Data gathered to rule out negative side effects
  Yes (n = 7)                                          24.6%
  No (n = 25)                                          13.1%         2.12      <.05    .39
Positive side effects considered
  Yes (n = 6)                                          34.1%
  No (n = 37)                                          13.8%         2.77      <.01   1.22
Follow-up observations made
  Yes (n = 9)                                          29.2%
  No (n = 34)                                          14.0%         2.22      <.02    .83

Note: The percentages refer to reductions while the interventions were in effect, not at the follow-up. Numbers of observations vary because the descriptions of some interventions were not sufficiently clear about what was done, and in some cases the variable did not apply to the intervention used.
^a All t-tests were weighted by the sample sizes; directional probabilities are given.
^b The negative sign indicates an unexpected reversal.
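The d column in Table 3 is Cohen's (1988) standardized mean difference. As a minimal sketch of how such a value is computed (the standard deviation and the equal-SD assumption below are hypothetical; the article does not report the ingredients of its d values):

    import math

    # Cohen's d: difference between two group means in pooled-SD units.
    def cohens_d(m1, s1, n1, m2, s2, n2):
        pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                              / (n1 + n2 - 2))
        return (m1 - m2) / pooled_sd

    # With the Table 3 means for "reasons for overuse measured empirically"
    # (48.7% vs. 13.4%) and a hypothetical common SD of 16 percentage
    # points, d comes out near the 2.24 reported in the table.
    print(round(cohens_d(48.7, 16.0, 5, 13.4, 16.0, 38), 2))  # 2.21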
Third, the permanence of the changes in physicians' diagnostic practice has not been effectively tested; however, the evidence that is available shows that when the intervention is removed, the rate of testing does begin to increase toward the previous levels.

Amount and Type of Reduction Desired
In order to specify the amount of reduction that is desired, it is necessary to conduct a needs assessment. While many medical educators, journalists, and government program administrators believe that too many diagnostic tests are ordered, there seem to be few data on what proportion of tests is not needed. Only a few authors mentioned that they assessed the level of testing before designing an intervention. In only four studies did the authors clearly state the degree of reduction that they felt would be appropriate; three of these targeted the routine use of a test that the authors believed was useful only with patients with specific problems. Some authors mentioned that certain types of testing are undesirable (e.g., retesting before meaningful changes could have occurred, or testing routinely for conditions with very low base rates). However, unless the charts of patients were reviewed, it was not possible to
know whether useful tests were also omitted along with unnecessary tests. The success of many interventions was evaluated solely on the basis of the volume or costs of tests ordered, without any way of monitoring how the reductions were achieved; it would seem better to verify that the tests omitted were really unnecessary.

Reasons for Overuse of Diagnostic Testing
A number of reasons have been suggested for the overuse of diagnostic testing, including overcaution, inexperience, fear of mistakes, habit, and the desire for total diagnostic certainty (Griner, 1979). Most intervention developers referred to impressions of the reasons for overuse. Descriptions of only five of the interventions (8%) included an empirical analysis of the reasons for overuse of testing in the specific setting in which the intervention was applied. Many evaluation methodologists remind us that attempts to change behavior are more likely to be effective if the reasons underlying an undesirable behavior are known. For example, if a test is inappropriate because it is unlikely to yield information to rule out the
diagnoses being considered, an effective intervention to reduce its use would differ from one designed to reduce the use of a test that has been superseded by a more informative test. Or, an intervention targeted at residents will have minimal impact on the number of tests ordered for patients whose attending physicians have determined independently what type of diagnostic work-up they want; although the residents are ordering the tests, they are not making the decisions (Pugh, Frazier, DeLong, Wallace, Ellenbogen, & Linfors, 1989). In such a situation the intervention should be targeted at the attending physicians as well as the residents.

Typical Interventions and Test-Ordering Decisions
The prototypal intervention described in these reports is based on these change strategies: residents should learn more facts in order to make better choices in test ordering; residents are told that the supervisors of residency education will be watching their test ordering rates (which are too high); and the rate of test ordering of each resident will be compared to those of other residents. The implicit assumption seems to be that residents could do a better job if they would just put their minds to it. However, residents are already expected to absorb a great deal of information, are often fatigued, and are under considerable stress. Residents have such demanding schedules that one former resident said that she learned how to "nap" between heart beats while examining patients (Harrison, 1982). Taylor's work (1992), revealing that even second-year medical students report completing relatively little of their assigned reading, suggests that trying to get residents to do more studying about the use of diagnostic tests may not be effective because residents are already tired, stressed, and cognitively overloaded.

Kritchevsky and Simmons (1991) discuss medical care problems in terms of whether they are "systemic" or "extrasystemic," that is, whether a problem is caused by an environmental condition or policy experienced by all workers in a unit or is due to correctable mistakes of specific individuals, respectively. Most writers imply that the problem of overuse of diagnostic testing is solely the fault of residents; few writers consider that, given the demands on residents, it may be impossible for them to reflect on their test-ordering decisions.

In addition to these limits on time and energy, Schmidt, Norman, and Boshuizen (1990) write that new residents use a different approach to gathering information for diagnoses than experienced physicians do. Schmidt et al. imply that residents seek to follow the careful procedures described in books on medical decision making (for example, Ridderikhoff, 1989; Sox, Blatt, Higgins, & Marton, 1988), but experienced physicians do not. Residents depend on knowledge of pathophysiological causal models of disease entities in making decisions, while experienced physicians use "illness scripts" based
on their memory of illnesses, patient characteristics, and outcomes. "Most of the time most experts never do problem solving . . ." (Schmidt et al., 1990, p. 619); consequently, experts do less and arrive at decisions more quickly than novices do. Norman, Neufeld, Woodward, McConvey, and Walsh (1985) found that experienced physicians treating actual patients in their own practices used approximately half of the tests that they themselves had recommended for use with simulated patients presenting the same symptoms. When faced with an abstract task, the physicians used theoretical, book knowledge, but when faced with real patients, they depended on their extensive memory of patients and used "illness scripts" to make their decisions.

If Schmidt et al. are correct, focusing the resident on additional medical knowledge about tests or informing the resident about excessive testing are not interventions that are likely to help residents become more like experienced physicians, even if the immediate results of the interventions are moderate reductions in the use of tests. Such interventions will yield only marginal improvements in performance because they are not based on the actual process used by experienced physicians in selecting tests. Instead of asking residents to work even harder, it would be best to try to help them develop better ways of making diagnostic test choices. Some authors seem aware that even successful interventions that were not based on a conceptual foundation offer little guidance for others. For example, Marton, Tul, and Sox (1985) remarked that ". . . our study provides little insight into the mechanisms of change following our interventions" (p. 820). If more of the designers of the interventions had focused on why testing rates are higher than optimal, perhaps more interventions would have been more effective and the evaluations more informative.

Effective Interventions
The ideal intervention would be directed at improving the competence of physicians rather than just demonstrating that testing can be reduced. Such an intervention would (a) be an ongoing one rather than a short-term project, (b) facilitate learning without adding to already heavy workloads, (c) be inexpensive to implement, and (d) become part of the hospital's or clinic's regular record keeping system. It would seem best to focus on the process of making test-ordering decisions while they are being made, when the patient's characteristics are known to the physician, rather than to study past decisions after the patient is no longer being treated and details about the patient have been forgotten and need to be reviewed.

Recognizing the incredible extent of medical knowledge that physicians must use, Tierney and coworkers (1987, 1988, 1990) developed on-line systems to provide information to physicians as they were making test ordering decisions. When an order for a test was entered into a
computer system by a physician (usually a resident), information on the cost of the test or on the probability of the test achieving the purpose listed appeared in a box on the terminal screen. The physician could reconsider the order in the light of the additional information and either cancel or verify it. The intrusion was minimal. It would seem that providing information at the precise moment the physician is making an ordering decision is an ideal learning situation and will help the resident develop skills in diagnostic test usage in the most efficient manner. Such a reminder system would also be valuable to experienced physicians, since there is evidence that they too have difficulty managing the volume of medical knowledge (Clinton, 1992). The greater power of contemporaneous information relative to after-the-fact feedback was demonstrated when Tierney, Hui, and McDonald (1986) contrasted the effectiveness of on-line reminders of preventive-care protocols with monthly feedback reports on past practice.

In contrast to efforts to improve the test selection competence of individual physicians through feedback and education, Hillman (1991) argues that an effective way to influence physicians is through institutional rules. He suggests that any intervention that requires the physician to estimate a benefit-to-cost ratio each time a test ordering decision is made will be inefficient and ineffective. His suggestion received support from the studies reviewed here because making an administrative change was clearly the most effective single intervention strategy. It seems possible that rules could be combined with on-line clinical information systems such as those developed by Tierney and colleagues to produce an intervention likely to be quite effective with minimal negative side effects.
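As an illustration only, a minimal sketch of the decision flow of such an order-entry reminder follows; the test names, charges, and threshold are invented, and the actual systems ran inside the hospitals' order-entry software rather than as stand-alone code.

    # Sketch of a contemporaneous order-entry reminder: when a test is
    # ordered, display its charge and let the physician cancel or verify.
    TEST_CHARGES = {"CBC": 18.50, "chest x-ray": 74.00, "chem panel": 42.25}

    def order_test(test_name, physician_confirms):
        """Show the charge at the moment of ordering; physician_confirms
        is a callable standing in for the cancel/verify choice."""
        charge = TEST_CHARGES[test_name]
        print(f"{test_name}: estimated charge ${charge:.2f}")
        return "ordered" if physician_confirms(test_name, charge) else "cancelled"

    # A simulated physician who reconsiders any test charged over $50.
    print(order_test("chest x-ray", lambda test, charge: charge <= 50))  # cancelled
    print(order_test("CBC", lambda test, charge: charge <= 50))          # ordered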
Why Reduce Testing?
Medical educators want diagnosis to be effective and efficient, with a minimum of false positive results (Wagner & Moore, 1991); however, it is necessary to note that private health insurers in the United States have developed a compensation system that rewards "doing procedures" rather than efficiency. Several authors of the studies reviewed (e.g., Billi, Hejna, Wolf, Shapiro, & Stross, 1987) commented that reductions in diagnostic testing lower the charges for hospital care but do not lower its cost. Even when charges are lowered, the cost of hospital care is not reduced until laboratories are closed or technical staffs are reduced. Hospital managers are motivated to reduce testing for Medicare patients, whose charges are set by federally determined diagnosis-related group (DRG) payment schedules; for such patients, improving the efficiency of residents, and of physicians in general, may be cost-effective (Rich, Gifford, Luxenberg, & Dowd, 1990). However, medical care facilities are rewarded for excessive testing of privately insured patients and of patients without insurance. Until changes are made in reimbursement policies, one cannot be surprised if physicians continue to follow inefficient testing practices.
CONCLUSIONS

First, with regard to diagnostic testing, the literature shows conclusively that interventions can reduce testing in the short run. Although this is gratifying, few studies address the issue of why a medical care facility should support such interventions when the facility may lose money in the process. It has also not been shown that typical interventions markedly improve the efficiency of medical care in the long run.

Second, the present report demonstrates that innovators who carry out a needs assessment before implementation and carefully monitor possible side effects after implementation do indeed produce more effective interventions. While this is what we would have expected, it is reassuring to learn that there is empirical support for such beliefs. The bad news is that only a small proportion of innovators follow the recommended planning sequence that is more likely to be related to effective interventions: empirical assessment of the problem (or need), theory-based planning, carefully monitored implementation, measurement of hypothesized and unexpected impacts, and, last, program refinement. Such an approach to intervention development may lead to interventions that would be more likely to improve the diagnostic testing practices of residents and attending physicians even when their work is no longer being observed by researchers.
REFERENCES

BECK, J.B. (1993). Does feedback reduce inappropriate test ordering? Archives of Pathology and Laboratory Medicine, 117, 33-34.

BICKMAN, L. (Ed.). (1987). Using program theory in evaluation. New Directions in Program Evaluation, no. 33. San Francisco: Jossey-Bass.

BICKMAN, L. (Ed.). (1990). Advances in program theory. New Directions in Program Evaluation, no. 47. San Francisco: Jossey-Bass.

BILLI, J.E., HEJNA, G.F., WOLF, F.M., SHAPIRO, L.R., & STROSS, J.K. (1987). The effects of a cost-education program on hospital charges. Journal of General Internal Medicine, 2, 306-311.

CLINTON, J.J. (1992). Overview: Research activities. Agency for Health Care Policy and Research, 151, 4-L.

COHEN, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

GRINER, P.F. (1979). Use of laboratory tests in a teaching hospital: Long-term trends. Annals of Internal Medicine, 90, 243-248.

HARRISON, M. (1982). A woman in residence. New York: Random House.

HILLMAN, A.L. (1991). Managing the physician: Rules versus incentives. Health Affairs, 10, 138-146.

HUNTER, J.E., & SCHMIDT, F.L. (1990). Methods of meta-analysis. Newbury Park, CA: Sage.

KAPLAN, R.M. (1990). Behavior as the central outcome in health care. American Psychologist, 45, 1211-1220.

KETTNER, P.M., MORONEY, R.M., & MARTIN, L.L. (1990). Designing and managing programs. Newbury Park, CA: Sage.

KRITCHEVSKY, S.B., & SIMMONS, B.P. (1991). Continuous quality improvement: Concepts and applications for physician care. JAMA, 266, 1817-1823.

LIPSEY, M.W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.

LIPSEY, M.W. (1993). Theory as method: Small theories of treatments. In L.B. Sechrest & A.G. Scott (Eds.), Understanding causes and generalizing about them. New Directions in Program Evaluation, no. 57. San Francisco: Jossey-Bass.

MARTON, A.R., TUL, V., & SOX, H.C., JR. (1985). Modifying test-ordering behavior in the outpatient medical clinic. Archives of Internal Medicine, 145, 816-821.

McKILLIP, J. (1987). Need analysis. Newbury Park, CA: Sage.

NORMAN, G.R., NEUFELD, V.R., WOODWARD, C.A., McCONVEY, G.A., & WALSH, A. (1985). Measuring physicians' performance by standardized patients. Journal of Medical Education, 59, 925-934.

PEDHAZUR, E.J., & SCHMELKIN, L.P. (1991). Measurement, design, and analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.

POSAVAC, E.J., & MILLER, T. (1990). Some problems caused by not having a conceptual foundation for health research. Psychology and Health, 5, 13-23.

POSAVAC, E.J., SINACORE, J.M., BROTHERTON, S.E., HELFORD, M., & TURPIN, R.S. (1985). Increasing compliance to medical treatment regimens. Evaluation & the Health Professions, 8, 7-22.

PUGH, J.A., FRAZIER, L.M., DeLONG, E., WALLACE, A.G., ELLENBOGEN, P., & LINFORS, E. (1989). Effect of daily charge feedback on inpatient charges and physician knowledge and behavior. Archives of Internal Medicine, 149, 426-429.

RICH, E.C., GIFFORD, G., LUXENBERG, M., & DOWD, B. (1990). The relationship of house staff experience to the cost and quality of inpatient care. JAMA, 263, 953-957.

RIDDERIKHOFF, J. (1989). Methods in medicine. Boston: Kluwer Academic Publishers.

SCHMIDT, H.G., NORMAN, G.R., & BOSHUIZEN, H.P.A. (1990). A cognitive perspective on medical expertise: Theory and implications. Academic Medicine, 65, 611-621.

SCRIVEN, M., & ROTH, J. (1990). Special feature: Needs assessment. Evaluation Practice, 11, 135-140.

SIEBER, S.D. (1981). Fatal remedies: The ironies of social intervention. New York: Plenum.

SOX, H.C., JR., BLATT, M.A., HIGGINS, M.C., & MARTON, K.I. (1988). Medical decision making. Boston: Butterworth Publishing.

TAYLOR, C.R. (1992). Great expectations: The reading habits of year II medical students. The New England Journal of Medicine, 326, 1436-1440.

THOMPSON, R.S., KIRZ, H.L., & GOLD, R.A. (1983). Changes in physician behavior and cost savings associated with organizational recommendations on the use of "routine" chest x-rays and multichannel blood tests. Preventive Medicine, 12, 385-396.

TIERNEY, W.M., HUI, S.L., & McDONALD, C.J. (1986). Delayed feedback of physician performance versus immediate reminders to perform preventive care. Medical Care, 24, 659-666.

TIERNEY, W.M., McDONALD, C., MARTIN, D., HUI, S.L., & ROGERS, M.P. (1987). Computerized display of past test results: Effect on outpatient testing. Annals of Internal Medicine, 107, 569-574.

TIERNEY, W.M., McDONALD, C., HUI, S., & MARTIN, D. (1988). Computer predictions of abnormal test results: Effects on outpatient testing. JAMA, 259, 1194-1198.

TIERNEY, W.M., MILLER, M.E., & McDONALD, C. (1990). The effect on test ordering of informing physicians of the charges of outpatient diagnostic tests. The New England Journal of Medicine, 322, 1499-1504.

WACHTEL, T., MOULTON, A.W., PEZZULLO, J., & HAMOLSKY, M. (1986). Inpatient management protocols to reduce health care costs. Medical Decision Making, 6, 101-109.

WAGNER, J.D., & MOORE, D.L. (1991). Preoperative laboratory testing for the oral and maxillofacial surgery patient. Journal of Oral and Maxillofacial Surgery, 49, 177-182.
APPENDIX: STUDIES REVIEWED

Applegate, W.B., Bennett, M.D., Chilton, L., Skipper, B.J., & White, R.E. (1983). Impact of a cost-containment educational program on housestaff ambulatory clinic charges. Medical Care, 21, 486-496.

Bareford, D., & Hayling, A. (1990). Inappropriate use of laboratory services: Long term combined approach to modify request patterns. British Medical Journal, 301, 1305-1307.

Berwick, D.M., & Coltin, K.L. (1986). Feedback reduces test use in a health maintenance organization. JAMA, 255, 1450-1454.

Billi, J.E., Hejna, G.F., Wolf, F.M., Shapiro, L.R., & Stross, J.K. (1987). The effects of a cost-education program on hospital charges. Journal of General Internal Medicine, 2, 306-311.

Cohen, D.I., Jones, P., Littenburg, B., & Neuhauser, D. (1982). Does cost information availability reduce physician test usage? A randomized clinical trial with unexpected findings. Medical Care, 20, 286-292.

Davidoff, F., Goodspeed, R., & Clive, J. (1989). Changing test ordering behavior: A randomized controlled trial comparing probabilistic reasoning with cost-containment education. Medical Care, 27, 45-58.

Dixon, R.H., & Laszlo, J. (1974). Utilization of clinical chemistry services by medical house staff. Archives of Internal Medicine, 134, 1064-1067.

Dowling, P.T., Alfonsi, G., Brown, M.I., & Culpepper, L. (1989). An educational program to reduce unnecessary laboratory tests by residents. Academic Medicine, 64, 410-412.

Edelman, B.B., Groleau, G.A., & Barish, R.A. (1990). Use of a mildly restrictive administrative protocol to reduce orders for manual blood film examination from the emergency department. Journal of Emergency Medicine, 8, 1-13.
Eisenberg, J.M. (1977). An educational program to modify laboratory use by house staff. Journal of Medical Education, 52, 578-581.

Everett, G.D., deBlois, S., Chang, P., & Holets, T. (1983). Effect of cost education, cost audits, and faculty chart review on the use of laboratory services. Archives of Internal Medicine, 143, 942-944.

Fowkes, F.G.R., Hall, R., Jones, J.H., Scanlon, M.F., Elder, G.H., Hobbs, D.R., Jacobs, A., Cavell, I.A.J., & Kay, S. (1986). Trial of strategy for reducing the use of laboratory tests. British Medical Journal, 292, 883-885.

Gama, R., Nightingale, P.G., Broughton, P.M.G., Peters, M., Ratcliffe, J.G., Bradley, G.V.H., & Berg, J. (1991). Modifying the request patterns of clinicians. Journal of Clinical Pathology, 45, 248-249.

Golden, W.E., Pappas, A.A., & Lavender, R.C. (1987). Financial unbundling reduces outpatient laboratory use. Archives of Internal Medicine, 147, 1045-1048.

Gortmaker, S.L., Bickford, A.E., Mathewson, H.O., Dumbaugh, K., & Tirrell, P.C. (1988). A successful experiment to reduce unnecessary laboratory use in a community hospital. Medical Care, 26, 631-642.

Grivell, A.R., Forgie, H.J., Fraser, C.G., & Berry, M.N. (1981). Effect of feedback to clinical staff of information on clinical biochemistry requesting patterns. Clinical Chemistry, 27, 1717-1720.

Grivell, A.R., Forgie, H.J., Fraser, C.G., & Berry, M.N. (1982). League tables of biochemical laboratory costs: An attempt to modify requesting patterns. The Medical Journal of Australia, 2, 326-328.

Groopman, D.S., & Powers, R.D. (1992). Effect of "standard order" deletion on emergency department coagulation profile test. Annals of Emergency Medicine, 21, 524-527.

Karas, S., Jr. (1980). Cost containment in emergency medicine. JAMA, 243, 1356-1359.

Kroenke, K., Hartley, J.F., Copley, J.B., Matthews, J.I., Davis, C.E., Foulks, C.J., & Carpenter, J.L. (1987). Improving house staff ordering of three common laboratory tests. Medical Care, 25, 928-935.

Lyle, C.B., Bianchi, R.F., Harris, J.H., & Wood, Z.L. (1979). Teaching cost containment to house officers at Charlotte Memorial Hospital. Journal of Medical Education, 54, 856-862.

Martin, A.R., Wolf, M.A., Thibodeau, L.A., Dzau, V., & Braunwald, E. (1980). A trial of two strategies to modify the test-ordering behavior of medical residents. The New England Journal of Medicine, 303, 1330-1336.

Marton, A.R., Tul, V., & Sox, H.C., Jr. (1985). Modifying test-ordering behavior in the outpatient medical clinic. Archives of Internal Medicine, 145, 816-821.

Mazes, B., Lubin, D., Modan, B., Ben-Basset, I., Gitel, S.N., & Halkin, H. (1989). Evaluation of an intervention aimed at reducing inappropriate use of preoperative blood coagulation tests. Archives of Internal Medicine, 149, 1836-1839.

Novich, M., Gills, L., & Tauber, A.I. (1985). The laboratory test justified: An effective means to reduce routine laboratory testing. American Journal of Clinical Pathology, 84, 756-759.

Orient, J.M., Kettel, L.J., Sox, H.C., Sox, C.H., Berggren, H.J., Woods, A.H., Brown, B.W., & Lebowitz, M. (1983). The effect of algorithms on the cost and quality of patient care. Medical Care, 21, 157-167.

Pozen, M.W., & Gloger, H. (1976). The impact on house officers of educational and administrative interventions in an outpatient department.
Social Science & Medicine, 10, 491-495.

Pugh, J.A., Frazier, L.M., DeLong, E., Wallace, A.G., Ellenbogen, P., & Linfors, E. (1989). Effect of daily charge feedback on inpatient charges and physician knowledge and behavior. Archives of Internal Medicine, 149, 426-429.

Rhyne, R.L., & Gehlbach, S.H. (1979). Effects of an educational feedback strategy on physician utilization of thyroid function panels. The Journal of Family Practice, 8, 1003-1007.

Schroeder, S.A., Myers, L.P., McPhee, S.J., Showstack, J.A., Simborg, D.W., Chapman, S.A., & Leong, J.K. (1984). The failure of physician education as a cost containment strategy. JAMA, 252, 225-230.

Schroeder, S.A., Kenders, K., Cooper, J.K., & Piemme, T.E. (1973). Use of laboratory tests and pharmaceuticals: Variation among physicians and effect of cost audit on subsequent use. JAMA, 225, 969-973.

Sherman, H. (1984). Surveillance effects on community physician test ordering. Medical Care, 22, 80-83.

Spiegal, J.S., Shapiro, M.F., Berman, B., & Greenfield, S. (1989). Changing physician test ordering in a university hospital. Archives of Internal Medicine, 149, 549-553.

Sussman, E., Goodwin, P., & Rosen, H. (1984). Administrative change and diagnostic test use: The effect of eliminating standing orders. Medical Care, 22, 569-572.

Thompson, R.S., Kirz, H.L., & Gold, R.A. (1983). Changes in physician behavior and cost savings associated with organizational recommendations on the use of "routine" chest X rays and multichannel blood tests. Preventive Medicine, 12, 385-396.

Tierney, W.M., McDonald, C., Martin, D., Hui, S.L., & Rogers, M.P. (1987). Computerized display of past test results: Effect on outpatient testing. Annals of Internal Medicine, 107, 569-574.

Tierney, W.M., McDonald, C., Hui, S., & Martin, D. (1988). Computer predictions of abnormal test results: Effects on outpatient testing. Journal of the American Medical Association, 259, 1194-1198.

Tierney, W.M., Miller, M.E., & McDonald, C. (1990). The effect on test ordering of informing physicians of the charges of outpatient diagnostic tests. The New England Journal of Medicine, 322, 1499-1504.

Wachtel, T., Moulton, A.W., Pezzullo, J., & Hamolsky, M. (1986). Inpatient management protocols to reduce health care costs. Medical Decision Making, 6, 101-109.

Wachtel, T.J., & O'Sullivan, P. (1990). Practice guidelines to reduce testing in the hospital. Journal of General Internal Medicine, 5, 335-341.

Williams, S.V., & Eisenberg, J.M. (1986). A controlled test to decrease the unnecessary use of diagnostic tests. Journal of General Internal Medicine, 1, 8-13.

Winkens, R.A.G., Pop, P., Grol, R.P.T.M., Kester, A.D.M., & Knottnerus, J.A. (1992). Effect of feedback on test ordering behaviour of general practitioners. British Medical Journal, 304, 1093-1096.

Wones, R.G. Failure of low-cost audits with feedback to reduce laboratory test utilization. Medical Care, 15, 78-82.

Wong, E.T., McCarron, M.M., & Shaw, S.T. (1983). Ordering of laboratory tests in a teaching hospital: Can it be improved? JAMA, 249, 3076-3080.

Zaat, J.O.M., van Eijk, J.Th.M., & Bonte, H.A. (1992). Laboratory test form design influences test ordering by general practitioners in the Netherlands. Medical Care, 30, 189-198.