Evaluation: Review of the Past, Preview of the Future

M. F. SMITH
INTRODUCTION

The purpose of this paper is to summarize the ideas of the contributors to this volume about where we have been as a field of evaluation and where we are going in the future (see "From the Editor" for an explanation of how this special issue evolved). There is no attempt to link what is here with what any one of them has said in previous work; there is an attempt to link what one said on a particular theme or idea with what the others said. In the process, some of my own thoughts are revealed. Though the invitation to the authors was broad, a majority of the views speak more particularly to program evaluation than to evaluation more broadly defined - at least that is how I perceive them. There is a fairly good balance between assessments of where we are and predictions about the future, and many lean in the direction of optimism.

Nine topics are discussed here, but others are presented in some detail by individual authors. For example, Scriven discusses the methodology of "whether and how to" get to the final conclusion - the last synthesis step - in an evaluation; he gives leads on how to get rules and standards and how to assign weights. Lincoln discusses "several schools of thought and new streams of inquiry" that feed into criticism of the "classical scientific stance" in social inquiries.

The nine topics on which author statements are summarized are: the qualitative vs. quantitative debate, evaluation's purpose, professionalization of the field, program failure, evaluators as program developers, evaluators as advocates, evaluation knowledge, evaluation expansion, and methodological/design considerations.

I apologize in advance for any inaccuracies in my summary of the 16 authors' views. Readers should study all the papers, not only to avoid any misunderstandings I may have had of their ideas, but also to acquire the full depth of knowledge that they have provided on these topics.

M. F. Smith, Coordinator of CES Program Planning and Evaluation, and Professor, College of Agriculture, University of Maryland, College Park, MD 20742.

Evaluation Practice, Vol. 15, No. 3, 1994, pp. 215-227.
QUALITATIVE VS. QUANTITATIVE "DEBATE"
Eight of the 16 writers for this issue chose to address this topic in a specific manner. I, myself, must confess to a certain amount of naivete regarding the continuing and often heated discussion. It seems entirely reasonable (to me) to assume that some methods work better and that some types of data are more relevant in some situations than in others, as most of the authors here would seem to agree. So, if the issue is methods per se, I do not understand the debate, and I do not see why it would or should continue. On the other hand, if the issue is far deeper, that is, a difference in philosophy or epistemology or in "world view," then I believe the debate will continue unabated for a long time to come - at least until those of us in our 50s and older are dead and gone (me included). Those who believe in some external reality will probably not be convinced otherwise, and vice versa.

Scriven rates this debate "right up there with thinking that the great problem for the U.S. Public Service is whether it should hire men or women." The answer is "Yes!" Hire one or the other or both, depending on the person's particular skills and competencies for the job at hand.

Chen offers a solution for finally getting us out of the morass. He suggests that no debate take place without first specifying a certain program configuration for evaluation, that is, a specific evaluation problem. Then the debate would be on which method(s) can best serve in evaluating that particular configuration. In other words, the question is not which method/approach is best but which is best for a particular circumstance or situation. Scriven echoes this view when he says, "If there's anything left to the debate that's worth debating, it is specific case studies, to see if and how each approach can conflict or combine with the other in new ways." Chen characterizes both qualitative and quantitative methods as cockroaches, rather than the dinosaurs that some advocates have called each other: "Since program evaluation requires multiple functions...qualitative and quantitative methods are bound to survive..."

Boruch: Such arguments are old and they are moot; the best of contemporary well-designed field evaluations exploit multiple approaches to understanding. However, "whether the gratuitous quarrels in academia will disappear is unclear."

Bickman: The work of those who use only one of the two approaches "will have important limitations. . . . I suspect it will take the new generation of evaluators to accomplish true integration."

Yin: The conflict between qualitative and quantitative dimensions is not likely to disappear in the 21st century; however, integrated designs relying on both will be more commonplace. One of his three potential scenarios for case study research is that the method will be used as a major integrating force between quantitative and qualitative dimensions.

Light: More and more of our colleagues are abandoning a debate about the merits of quantitative versus qualitative evaluation methods, in favor of an exploration of how to most productively use both methods together.

House calls this debate "the most enduring schism in the field" and says it originated because of the excessive claims made for quantitative studies, that is, that only quantitative methods were sufficiently objective to meet social science research requirements. He points out that qualitative methods need more study and development; they have not gone through the decades of critical examination that quantitative procedures have.
For example, "bias controls have not been given enough attention and much work needs to be done on improving qualitative data collection methods. . . . many biases that creep into qualitative designs are different from those that affect quantitative studies."

Campbell: We should not confuse the qualitative/quantitative issue with the epistemological critique. Instead, we should increase the qualitative context to help with tasks such as evaluating the plausibility of rival hypotheses.

Lincoln states that this debate is "much more than a simple disagreement between evaluation models;" that it really is a difference in view of what the political world is like, that is, that it points to "a new political world," a restructuring of society along Paulo Freire's egalitarian ends: "overturning oppression and achieving social justice through empowerment of the marginalized, the poor, the nameless, the voiceless."

EVALUATION'S PURPOSE

It is not so easy here to stay out of the trap that Stufflebeam warns against of mixing up the role(s) of evaluation with its goal(s). Sometimes these are not easily discernible. For example, Lincoln says that evaluators "cannot fail again to be involved with the making of tough policy choices, with the just distribution of social goods and services." Chelimsky states that evaluation's "fundamental purpose" is "helping to make government more effective, more responsive, more accountable, and even, better managed." Campbell predicts that "The program evaluation profession will (and should) continue to be committed to (making causal inferences as to impact), focused both on the benefits hoped for by the program advocates and the undesirable side-effects feared by the program opponents." Otherwise, we run the risk of evaluation not being done or of the task being turned over to others who will seek such answers.

Scriven and Stufflebeam believe we undertake evaluations to determine an object's value - to invoke defensible criteria to establish its worth, merit, or quality. Reichardt and Shadish agree (and I find myself in this camp) that evaluation establishes merit or worth but that it does more; for example, both consider it important for evaluators to make recommendations for the improvement of programs. Shadish offers a new definition for evaluation: to "use feasible practices to construct knowledge of the value of the evaluand that can be used to ameliorate the problems to which the evaluand is relevant."

Chen and Sechrest agree that the focus of evaluation has changed from its original concern for the later stages of policy processes, such as evaluating outcomes or impact, to a focus on implementation processes. Chen says this change in focus came about after finding that the majority of programs have little or no impact, primarily because of faulty implementation. Sechrest suggests it came about for two reasons. One is that measuring program outcomes requires a great deal of rigor; it is demanding. When program evaluators discovered how difficult it was, many abandoned the attempt and decided to focus on process, which is generally more tractable. His second reason for the switch is that we did not have adequate research designs to measure program impact, and rather than assuming the difficult tasks of improving the designs and developing better measures (which some fields have done), program evaluators decided to focus on something else - process.
Jerry Pournelle (psychologist and writer of a column in BYTE computer magazine) calls this "dazzling them with footwork." He gave the example of an Ohio State professor who is no longer interested in the "usual definition of literacy," but rather in "how well kids do things differently." According to Pournelle: "I recognize what he's doing: if you can't solve the real problem, change the definitions and criteria so that you can work on something else. . . . The result isn't very useful to the customer, but it gets you off the hook." The Ohio professor was redefining literacy as reacting to TV, and of course students are "successful" in doing that. They will always be successful if you keep changing the definition of success. "Anyone can hit a target glued to the end of a rifle" (p. 217).

Reichardt and Sechrest believe that evaluation as a field is sacrificing its future if it does not deal directly and effectively with this question of measuring program value. Sechrest says that if we are not able to tell stakeholders whether their ameliorative efforts are in fact ameliorative, then we will have little influence and ultimately will be ignored altogether. Reichardt echoes the same sentiments when he states that evaluators must find credible ways to justify judgments about differences in quality. "(They) must be able to defend themselves from claims of complete relativism - that all research is equally valid and that any program is as good as any other." Otherwise, he says, we will abdicate our ability to evaluate.

House offers similar sentiments when he states that "evaluation's social utility and future depend significantly on its credibility," which is dependent on being perceived "as accurate and honest, an extension in some sense of science..." House indicates that the most important unfinished task for evaluation theory may be "to expand the logic of value judgments. . . . There is still an unease about conducting studies to determine value." Citing Scriven, he notes that the basic logic of evaluation is that "x is good, better than y." He distinguishes between making the value judgments and being prescriptive about them, that is, telling people what to do. (Of course the latter is reprehensible.)

THE PROFESSIONALIZATION (OR NOT) OF EVALUATION

Sechrest talks about his disappointments relative to evaluation. His first disappointment is that "program evaluation is not well established and readily recognized as an occupation or as an intellectual discipline." One reason is that "...most program evaluation is being done in an ad hoc way by persons with no particular training in, and perhaps not even much knowledge of, the field." A second reason is a "lack of consensus on training and failure to develop and maintain training programs." His most passionate reason for what he sees as evaluation's failure to be recognized as a discipline is that "we who are involved in it have never worked hard enough at making it into a discipline/field. Few . . . identify themselves first and foremost as program evaluators."

Bickman has an optimistic view about the professionalization of evaluation, which he defines as standards development, licensing, "and other mechanisms that both serve to set minimal quality standards and limit the practice to those who qualify." He sees it as "an inevitable trend as evaluation moves at a quicker pace into the real world marketplace," but not without some problems, for example, tension between standards and diversity of approaches and further polarization between practitioners and academics.

House indicates that professionalization is essential as a condition for evaluation to maintain its integrity and utility:
"Evaluators must disclose unfavorable results even at risk of offending powerful sponsors, and evaluators need strong professional support to do this." "Credibility depends on the backing of a strong professional community which defends professional standards and supports individual endeavors."

Shadish says that evaluation does "poorly the most essential things that any profession must do - clearly identify its tasks and functioning, and the knowledge needed to do them." Sechrest agrees: "program evaluation has not developed a comprehensive, unified view of itself and its mission and a set of agreed upon methods for accomplishing that mission. . . A discipline . . . must have a core or center to survive."

Stufflebeam: Evaluation "is rife with unresolved issues, including some seriously flawed proposals. On the other hand, the evaluation field has developed professional standards, even if it hasn't yet matured to the point of consistently applying and fulfilling them."

Scriven gives evaluation a D grade "for self concept for failing to confront the myths about the impropriety and subjectivity of evaluation. Even in the domain where theory has had some play, of the seven waves of theories of program evaluation that have led the discussion over the last 30 years, five reject the idea that the principal function of evaluation is to produce evaluative conclusions, surely a sign of profound confusion."

Bickman and Sechrest see evaluation practice focused at the master's degree level of preparation. Bickman predicts that more development will occur in education and training, and at the master's level, because of the demands of practice: "most evaluation, as practiced at the local level, does not require a doctoral degree," especially as evaluation moves into the private sector. Sechrest gave the example of a professional group whose members work primarily for state auditors general and who are responsible for a large portion of the evaluation of state-funded programs: "many . . . appear to have had no specialized training for program evaluation; probably most were not trained at the doctoral level."

PROGRAM FAILURE
There was no disagreement with the premise that social programs are not ameliorating the problems to which they are addressed and that they are poorly constructed. Chen says the latter probably caused the former, that is, the programs are not solving the important problems because they are poorly planned. Chelimsky places the blame on a lack of needed basic research: "it's less a case that prevention programs 'don't work' than that we are lacking the fundamental knowledge to construct them so that they will."

Reichardt places blame on the "puny" resources devoted to a program relative to its size and the "puny" time and effort spent on development. He notes that it is not unusual for more funds to be spent on the summative evaluation of a program than on its initial plan. Sechrest adds the view that part of the problem with programs has been the tendency for policy makers (and evaluators) to "write off" programs that showed no positive impact - which meant most programs - and start over looking for some new "bright idea." Chelimsky attributes some of the blame for the failure to show success of social programs to too few resources available for evaluation and to the insufficiency of the state of the art of evaluation to do such things as effectively count "hidden" populations like the homeless, understand and measure "quality" in areas like medical care, and deal with concepts like the risk of acquiring cancer or becoming infertile.
EVALUATORS AS PROGRAM DEVELOPERS
Many of the writers here believe that program failure, as described above, could be lessened or perhaps even eliminated if evaluators become more involved in program development. Bickman offered six predictions about the future of evaluation, and one of these was that there will be more involvement by evaluators in program development - and, if evaluators do not assume this role, then others will. Chelimsky says "evaluation seems destined" to play a major role in the formulation of new programs and policies.

Patton has taken the boldest moves in this direction with what he has defined as "developmental evaluation," where "The evaluator is part of a design team whose members collaborate to conceptualize, design, and test new approaches in a long-term, on-going process of continuous improvement, adaptation, and intentional change. The evaluator's primary function in the team is to elucidate team discussions with evaluative data and logic, and to facilitate data-based decision making in the developmental process."

Reichardt believes that program evaluators will be major players in the solution of social problems only to the extent that they more actively assist in the creation of social programs. This role will require that the field of program evaluation devote as much effort to developing substantive knowledge and theory as to developing methodological technique and practice . . . progress in ameliorating social problems lies less in restricting our activities to evaluation and more in gaining the additional skills needed to perform diagnosis and remediation. This may require more substantive expertise than many formative evaluators presently have.

Sechrest says that rather than writing off programs that show no effect, a more productive alternative was and is to try to find out what went wrong; that is, "we might do better to focus more of our efforts on program development than simply on evaluation."

If evaluators are serious about program improvement, Chen asserts, it seems reasonable to expect that they will play a greater role in the planning process in the future. And if this engagement is to be meaningful, then evaluators must go beyond such things as community needs assessments or ex ante cost-benefit analyses for choosing among different options, and work directly with program planners and policy makers in the basic design work. However, in spite of the apparent merits of evaluators being partners in the program development process, Chen says there are two problems that need to be solved before evaluators can comfortably make such a move. The first problem is one of knowing how to proceed, for which we need conceptual frameworks or models as guides. Second is the dilemma between advocacy and credibility: "Evaluations are considered credible when evaluators have no vested interest in the programs they evaluate. . . When evaluators are intensively involved in planning programs, they become advocates of the programs . . . and are no longer regarded as objective and independent evaluators." For this dilemma, Chen suggests requiring the evaluators to limit their evaluation role to that of internal evaluator and to make all data and all evaluation procedures available for periodic outside audits and inspections. "Resolving how to participate in planning processes without compromising evaluators' integrity will be an important and challenging task for us in the future."
EVALUATORS AS ADVOCATES
According to Sechrest, program evaluation has become split between an ideologically "liberal" group "who wish to use program evaluation to forward a political agenda they favor and an essentially non-ideological group who wish evaluation to be objective and scientific. . . . Many in the liberal ideological group insist that science cannot be objective; indeed, some insist that nothing is objective."

Lincoln's views are probably the most liberal of the writers in this volume. She says that we are living in a changeable sociopolitical context - "a world which is now postmodern, post-structural, and very nearly post-fourth . . . Our tracks as a profession suggest a model of research and evaluation which is more, rather than less, activist oriented . . . we need to speak out and make our commitments known."

House observes that some qualitative evaluators are misled by relying too much on postmodernist theory: ". . . the notion that truth is always and only a condition of power relationships can mislead. If there are no correct and incorrect expressions and no right or wrong but only power relationships, then there is nothing that argues against evaluation being used by the powerful for their own self-interests. Such a relativist position seems to be an intellectual and ethical dead-end."

Campbell cautions us to "not reject the new epistemologies out of hand." Any specific and plausible challenge to an unexamined presumption of ours can be of great value and should be taken seriously. ". . . Do be open to learning from and being persuaded by well-developed specific exemplars of the application of new epistemological/methodological orientations. . . . Don't be persuaded by new forms of a generalized skepticism which result in a blanket rejection of the possibility of usefully valid generalizable knowing by any science." "Making causal hypotheses about the effect of deliberate interventions in the world, and critically debating these effects with our social fellows, are essential parts of what it is to be human." "Any epistemological position which rules out dialectical disputations about causal relationships is unauthentic and irrelevant." However, the epistemology we adopt cannot claim to produce truth as it has traditionally been defined, that is, as fully justified and known to be true independently of the justification.

Stufflebeam: "Some recent popular conceptualizations of program evaluation are inadequate philosophically, theoretically, and practically, and are therefore potentially counterproductive." The example he uses is empowerment evaluation as described by David Fetterman in his 1993 Presidential Address to the American Evaluation Association meeting. He proposes the adoption of an objectivist approach to evaluation where evaluators "search for, validate, and invoke defensible criteria for determining merit and worth of given objects." When competently executed, objectivist evaluations will "lead to conclusions that are correct - not correct or incorrect relative to a person's position, standing, preference, or point of view."

House views evaluation as a discipline emerging from concepts and methods supplied by several disciplines. There is the beginning of a conceptual framework that provides a basis for internal intellectual development. There are correct and incorrect studies and proper and improper ways of doing things, even though debates about them may be furious at times.

A different kind of evaluator advocacy is warned against by Reichardt.
He points to the potential self-interest bias in tactical research (e.g., being susceptible to sponsors' pressures and unconsciously finagling the results to please the client, for example, to get a grant renewed) that, as these types of studies are practiced with increased frequency, "will alter the public's perception of the role to be played by policy-relevant research. . . . (It) will be perceived more and more as simply advocacy for preestablished positions - as just another way of presenting one's self-interested point of view." Evaluators must be able to defend themselves against the increasing public skepticism about research and science; they "must find credible ways to justify judgments about differences in quality" of the programs, products, and objects they evaluate. "Otherwise, we will abdicate our ability to evaluate."

Stufflebeam also warns us about advocacy on the part of clients who want evaluations that convey "the particular message, positive or negative, that the client/interest group hopes to present, irrespective of the data..." Many administrators would pay handsomely "for such nonthreatening, empowering evaluation service."

EVALUATION KNOWLEDGE NEEDED
Campbell speaks to the need for knowledge about the practice of evaluation as one of expanding and refining our causal inference tools. He suggests adding to evaluation reports a "redesign" appendix in which descriptions are included for how the program might have been implemented and the evaluation frame extended "so as to make possible a more definitive impact assessment in a future study."

Both Chen and Shadish point to the need for information about evaluation for the development of contingency frames. These devices are "theoretical concepts that identify a general choice point and the options at that point, and then show how and why evaluation practice should differ depending on which option is present or chosen" (Shadish). Chen notes that such devices should help stop the qualitative/quantitative debate by moving the discussion from which method(s) is (are) best to which method/procedure is called for under a particular circumstance or program configuration.

Chen states that the future advancement of the field depends upon the accumulation of a systematic body of both empirical and theoretical evaluation knowledge. The empirical knowledge will come from evaluators who are perceptive in examining why they use a particular method or methods for a particular kind of program configuration and who report what happens after the application. Theories will need to be clearly specified so that they can be empirically tested or verified. Because there are no institutional arrangements that support evaluators to do nothing but think, develop, and test their theories about evaluation, Chen suggests that both the empirical and theoretical knowledge will have to come from evaluation practitioners. (If Bickman and Sechrest are correct about practicing evaluators being educated at the master's level or not educated in evaluation at all, then this theoretical work that Chen talks about may not be forthcoming, or at least not very rapidly.)

EVALUATION EXPANSION
Berk: The variety of disciplines involved in evaluation research, for at least some questions, will include better representation from the natural sciences, especially research on environmental issues. Two reasons are that the empirical question is often significantly defined in natural science terms, and that a significant portion of the interested audience will be natural scientists and/or policy makers with natural sciences training.
Light: In the last 10 years, the focus on the substantive disciplines of education, psychology, and health, which played such an important role in the field's early development, has been joined by an increasing engagement with other social sciences, including criminal justice, economics, and family welfare.

Bickman: Evaluation will become more valued (and more used) in the policy world. The best use will come from studies of programs when the policy decision is not "hot," that is, when the program is not based on strong beliefs. Chelimsky says we have already greatly increased the policy relevance of evaluation: decision makers located in a variety of places and at a variety of levels in the U.S. can now expect useful information from evaluations. And Sechrest observes that "many individual program evaluators and many evaluations have had an impact on public policy."

Bickman and Light: There is increasing interest in and use of evaluation in the private sector. Bickman suggests that two movements that evaluators should not miss are quality improvement of services (which is primarily the application of evaluation principles) and the re-engineering of government and business. Chelimsky calls the latter "one of the most important developments of all." Because of Congress' Government Performance and Results Act of 1993, combined with Vice President Gore's National Performance Review, "every agency in the U.S. Government will have its own evaluation function" by 1996-1997.

Chelimsky, House, and Scriven describe how evaluation is spreading "at a fairly rapid pace" around the globe. Evaluation associations are forming in Europe, Central and South America, and Asia. In fact, this developmental dimension is one of only two dimensions of evaluation that Scriven is willing to give a grade of A. (The other dimension so deserving is the payoff from the practice of evaluation, which Scriven says is "enormous" because it "can and often does save lives, families, species, quality of life, and money - and can perhaps save the globe.")

Berk: There is an increasing use of applied research and evaluation in litigation. Reasons for this expansion are escalating expertise and escalating concern about quality control. Litigation issues include: (a) discrimination, that is, is the organization living up to some standard of fairness relative to personal attributes in its decision making? (b) environmental consequences of human actions, and (c) violations of statutes or regulations governing economic transactions (e.g., price fixing).

Boruch: The use of randomized experiments will continue to expand into new areas of application, including areas that have heretofore been regarded as inappropriate. Examples include volatile or controversial areas, like the rehabilitation of pregnant females in prison and medical experiments that more frequently include women as the primary target or as members of a stratum or block in general field tests, and nonvolatile expansions, like the IRS and the U.S. Postal Service. Expansion is not expected, though, in higher education; one reason is that the incentives for such institutions to do so are not clear.

Boruch: Public media will increase the frequency with which they cover the results of experiments, and the quality of that coverage will increase.
METHODOLOGICAL/DESIGN CONSIDERATIONS
Several authors described methodological trends already present in evaluation or expected in the future. Two reasons for the trends, provided by Bickman and Berk, respectively, are (a) evaluation itself is becoming (or will become) more methodologically complex, which will require more complex statistical approaches, for example, to handle non-randomized real-life situations and to examine more theoretical predictions; and (b) new techniques and procedures have, in fact, been developed.

Chelimsky: One of the most salient characteristics of today's studies is the prevalence of multimethod designs that use the strengths of one method to parry the weaknesses of another.

Light: A particular change that is expected to become overarching for the field is an increasingly urgent emphasis on the importance of creating good original designs for evaluation projects - designs that reflect the real-world complexity of any intervention. "I believe we have moved far beyond debating whether certain evaluation designs are 'good' while others are 'bad.'" Right now, Light says, there is consistent data to support the conclusion, "If you tell me how you will design your next evaluation, I think we can do a pretty good job predicting what results you will find." [Sechrest asked what would have a greater influence on the design of an evaluation of a social program - the nature of the problem or the particular identity of the program evaluator. He adds, "I think the answer is clear, and that is a good part of what is troublesome about program evaluation today."] (Thus, by deductive reasoning: (1) tell me who the evaluator is and I will tell you what the design will be; (2) tell me the design and I will predict the results; and therefore (3) tell me the evaluator and I will predict the results!)

Yin offers three potential scenarios for the future of the case study method. The scenario he expects to prevail is that the case study method will be used to produce evaluations that will be considered of great value to future evaluators. The design will be multiply nested to accommodate studies that focus on complex interventions in multiple settings with multiple initiatives and with no singular clientele target group; outcomes will be important, long term, and over-determined (influenced by conditions other than the interventions).

Boruch: Controlled field tests will more frequently involve the random allocation of institutions, geographic areas, or organizations, rather than individuals, to alternative regimens. Reasons for this trend are (1) the institutional level is the level at which interventions are deployed, (2) it ensures independence of units, and (3) context has great influence. Yin points out, though, that it is increasingly difficult to have control/comparison groups when the unit of analysis is an organization or program. Reasons are the unlikelihood of being able to manipulate another organization, costs that are too high, and the unlikelihood of a close fit for unique organizations. One solution offered by Yin is the substitution of explicit rival hypotheses for the use of control or comparison groups. The more these can be specified, the more they can be examined directly by tracking the existence of alternative sequences of events within the intervention group.

Bickman: Program theory approaches will be (and are being) developed to help overcome the problem that most programs are not well thought out theoretically. House recommends the program theory approach as a way to begin to address some of the issues related to the making of causal inferences.
To promote future progress in this area, Chen says there is a "great need for developing more dynamic and flexible approaches for constructing program theory" and to "develop prototypes for a variety of theory-driven evaluations."

Yin: Renewed attention will be given to the way that narratives are composed in science, how narratives embed scientific argument, and therefore how narratives implicitly embed research designs, especially cross-experiment or cross-case designs.

Boruch: The empirical basis for deciding whether and when alternatives to randomized experiments are acceptable will increase. He cited some examples and indicated that meta-analyses may offer benefits in this arena. Meta-analyses were also promoted by House, who saw the approach as useful in dealing with causal inference concerns; by Reichardt, as a means to publicize the ineffectiveness of social programs and the underfunding of their development; and by Stufflebeam, who recommended the approach as a way to ensure the integrity of the field, that is, to verify that standards are being followed in the practice of evaluation.

Boruch: The exploitation of underutilized but defensible alternatives to randomized tests will increase. Among the alternatives cited is the use of regression discontinuity designs.

Boruch: The governmental and intellectual separation of the sample survey enterprise from the field experiment enterprise will decline, albeit slowly.

Berk: There is a move away from conventional statistics toward data analysis, for example: (a) a shift away from formal statistical inference to data exploration - from deductive styles to "fishing" for interesting relationships; (b) an increased interest in statistical diagnostics, that is, assessing the credibility of statistical procedures; and (c) newly created statistical techniques that are "robust" to violations of statistical assumptions, that is, they give sensible results despite problems like outliers, leverage points, and too-small samples.

Berk: In the real world in which empirical research is undertaken, the convenient statistical assumptions justifying much past practice do not hold. House would agree. He calls for the revamping of experimental design because it was originally formulated on an "inadequate theory of causation" which may have worked in simpler cases with fixed treatments and conditions, but does not hold very well in social science investigations where the interactions are much more complex. Campbell tells us to recognize that most of our evaluation tasks will be in settings that preclude the use of good quasi-experimental or experimental designs, and not to attempt or pretend to achieve definitive evidence of causal impact in these instances. However, he says, the lack of opportunities to practice plausible causal inference should not deter us from continuing to develop our methodological competence for this task and from remaining committed to impact analyses: "there is still much of value that can be done to improve program effectiveness."

Berk: For many evaluation specialties, investigators will have to regularly retool. Graduate studies and experience will not suffice.

SO, WHERE ARE WE?

Most of us do not see any rationale for a continuation of the debate on qualitative vs. quantitative methods - some methods are better than others in some situations. However, those who see it as a difference in philosophy or epistemology believe that it will continue.
Some believe the purpose of evaluation is not so clear anymore for many practicing evaluators; that the original concern for evaluating outcomes or impact (product) has been changed to a focus on implementation (process); and that roles that evaluators (and other professionals) might perform for clients are being confused with the goal. This lack of clarity and confusion on the part of evaluators may also lead to confusion on the part of clients, and perhaps even to their use of this confusion as a way to get evaluations that report their preferred views.

The opinions were mixed about the professionalization of evaluation. Some authors believe there has been sufficient time for evaluation to be recognized as an intellectual discipline but that this has not occurred. Others have an optimistic view of evaluation moving at a quickening pace into the "real world marketplace," affecting policy and serving other needs of a broad variety of stakeholders. There is probably an association between place of occupational residence and opinion on this matter.

There was no disagreement with the premise that social programs are not ameliorating the problems to which they are addressed and that they are poorly constructed. Some expressed the view that evaluators may share some of the blame, that is, more progress might have ensued in some arenas of social problems if evaluation studies had focused on the multiple end goals of determining whether a program made a difference and determining what went wrong in those that did not. In response to so much program failure, several of the writers propose that evaluators become more involved in program development. If/when this happens, definite steps will need to be taken to ensure that any evaluations of these programs are, and are viewed by external stakeholders as, credible.

Some here are frankly worried that an ideological split has occurred in the field of evaluation between those who wish to use evaluation as a way to promote a favored political agenda and those who wish evaluation to be objective and scientific. Another concern is the potential self-interest bias in tactical research, for example, finagling results to please a client. These are critical concerns that the field must address, for if we cannot be trusted to be neutral on program topics and objective, to the best of our abilities, then what reason is there for our services?

If the field is to advance, knowledge is needed about the practice of evaluation. We need a systematic body of both empirical and theoretical evaluation knowledge. More work must be devoted to the development and testing of theories, and practice must be examined to determine which methods work in what contexts and which do not. Practicing evaluators are expected to lead in this work.

Evaluation is seen as expanding into new areas and around the globe. Evaluation associations are forming in Europe, Central and South America, and Asia. Use is increasing in disciplines like the natural sciences, criminal justice, and economics; and it is increasing in the private sector, for example, in quality improvement of services.

Many changes are occurring in the methods and designs of evaluation. A particularly urgent change that is expected is the development of good original designs - designs that reflect the real-world complexity of interventions.
There is recognition that the statistical assumptions justifying much past practice do not hold in the real world in which empirical research is undertaken; that many of our evaluation tasks will be in settings that preclude the use of good quasi-experimental or experimental designs; and that a revamping of "theories of causation" may be in order.
Higher education was a leader in the origin of the field of evaluation, but some do not see it as a leader now. I do not know all the reasons, and I cannot speak for others in academe, but I believe Sechrest's personal observation, that he had been too easily seduced by some other interest or opportunity, would hold for many others. I also have to say that in the near past, at least at my institution, budget cuts and the ensuing reductions in personnel have taken away the luxury of time to spend in creative contemplation of the field. That excuse will not work for all institutions of higher learning, nor will it suffice for the past thirty years.

Finally, I wish to reflect on my six years as editor of Evaluation Practice. I know that I have not seen a representative sample of all the evaluations that were implemented during that time. However, I have seen examples from many, many areas: the armed services, private foundations, school systems, the Federal Bureau of Investigation, agriculture, health, qualitative and quantitative, formative and summative, inside and outside the U.S., and so on. Only a few of these have yet been published. I am worried about the quality of the product that some clients may be receiving. Some of the problems that surfaced over and over in the manuscripts were: (1) unclear descriptions of the evaluation and the results, that is, I could not determine what was intended and/or what occurred in the study; (2) unclear evaluation questions and a lack of congruence between reported questions and the types/amounts of collected evidence; (3) lack of steps taken to ensure the reliability and validity of data-gathering tools; (4) haphazard definition and selection of sources for providing evidence; and (5) conclusions not sticking close enough to the evidence.

Even though I might agree with Bickman that evaluation as practiced at the local level does not require a doctoral degree, it does require a broad repertoire of methodological competencies and a high degree of skill within them. Based on my last six years of reading about other evaluators' experiences, I believe that what the field of evaluation needs more than anything else is to increase the skills and competencies of those who perform evaluations. There is a field here requiring unique knowledge and skills; there are proper and improper ways of doing things; there are appropriate and inappropriate studies; and there are methodological and ethical standards for how we practice evaluation.

REFERENCE

Pournelle, J. (1993). Chaos manor. Byte, 217.