Evaluation and Program Planning, Vol. 8, pp. 251-253, 1985
Copyright © 1985 Pergamon Press Ltd
THE ERS STANDARDS FOR PROGRAM EVALUATION: Guidance for a Fledgling Profession
YVONNA S. LINCOLN
University of Kansas
ABSTRACT: The ERS Standards, together with those of the Joint Committee, symbolize a commitment to the professionalization of evaluation. The two sets of standards, however, differ on at least five important dimensions: the perceived linearity of evaluation activities; preferred and permitted approaches (with one set heavily favoring quantitative/experimental approaches over any others); the purposes of evaluation; obligations to clients and stakeholders; and whether or not social science can be value-free (objective). Because these fundamental assumptions are so divergent, the possibility of combining or effecting a compromise between the two sets appears unlikely at this time.

That the practice of evaluation is growing toward maturity is evidenced by three hallmarks: its coalescence into several national professional organizations; its consideration of various subspecializations within the field (front-end analysis or needs assessment, evaluability assessment, formative evaluation, impact or summative evaluation, program monitoring, and metaevaluation [ERS Standards Committee, 1982]); and its creation and adoption of several sets of standards to guide professional practice. The adoption of standards is typically a signal to clients, consumers, and professionals alike that substantial agreement exists among practitioners of a craft as to what constitutes good, or responsible, service, and it often (although not in this case) also sets the boundaries outside of which mispractice may be assumed. The debate over this signal has begun and is likely to continue, as several sets of standards now exist (Joint Committee, 1982; ERS Standards Committee, 1982). While there is substantial overlap, there is little doubt that strong value and methodological assumptions of a very different character undergird both sets.

Stufflebeam (1982) and others have called for discussions to determine how the ERS and Joint Committee's standards might be unified. Presumably, unification of existing standards would communicate further agreement in the field regarding competence and excellence in practice. But before the standards can be unified, serious issues remain to be resolved. The overlap between the two sets (see Rossi, 1982) obscures the extent to which the evaluation profession remains sharply and deeply divided over what constitutes adequate practice. Indeed, adequate practice can be described quite differently, depending on a practitioner's model-linked persuasions, methodological training and preferences, and paradigmatic assumptions (Guba & Lincoln, 1981, 1982). Models, methods, and paradigms suggest powerful value options and choices. It is to some of those value assumptions that this article is addressed.

The ERS Standards are divided, for purposes of convenience, into six sections felt to be "roughly in order of occurrence" (ERS Standards Committee, 1982): formulation and negotiation; structure and design; data collection and preparation; data analysis and interpretation; communication and disclosure; and utilization. This order represents what was deemed to be the sequence of an ordinary evaluation contract, from initial contact and contract to guidance in using the final recommendations. The structure of the list itself embodies the first assumption, which is roughly this: proceeding in a straight line, unless design changes are necessary, the first communication an evaluator is likely to have with a client follows data analysis.
By implication, once the contract is sealed, the first real information to be delivered to the client is the data, derived from a first-cut analysis. Evaluators operating in a more responsive mode, however, would argue that negotiation occurs continuously throughout the evaluation, as do data analysis, interpretation, and full disclosure. Thus, one set of assumptions leads to linear and highly sequenced activities, while another leads to a cyclic, feedback-feedforward, iterative set of evaluation activities.

It is clear, of course, that standards or guidelines of any length and complexity need to be organized into rubrics or taxonomies for access. But the categories chosen do ultimately represent larger understandings and assumptions. The linear quality of this particular set carries with it implications for the structure of evaluation activity with which the entire evaluation community may not be entirely comfortable.

A second value assumption embedded in the ERS Standards is their strongly quantitative and experimental bent. Words like nontreatment, sampling, reliability, validity, generalization, replicable, and cause-and-effect leave little doubt as to which methods are thought to possess the most power. The preface to the "structure and design" section carefully specifies that case studies are "as subject to specification as the design of an experimental study" (ERS Standards Committee, 1982, p. 13), nowhere noting that case studies are often the product of very different evaluation purposes and methods and often cannot be fully specified a priori. And nowhere in that section (or any other) are qualitative measures given a direct nod. The orthodox (quantitative) methods are clearly given and, where not given, strongly implied. Assuming a preference for a priori (rather than emergent) designs, experimental or quasi-experimental models, and quantitative methods, new evaluators would find clear and ample direction for ensuring the technical adequacy of their evaluation efforts. Evaluators persuaded to more ethnographic and/or responsive models, however, would find no direction in defining the adequacy of their reports. As a result, the methodological values that undergird the design "admonitions" virtually ignore one set of strongly persuaded evaluation practitioners.

It is not just that the standards mention case studies without providing commentary on ways to ensure their technical adequacy. The implication is clear that a case study is, or ought to be, a variant of a technical report. For some social scientists, nothing could be farther from the truth. Case studies and technical reports are constructed from very different kinds of data, collected by different methods, and intended to serve different purposes. The presumption that a case study can be treated just like any other technical report would be viewed by some evaluators, including the author, as erroneous.

The standards are also value-laden on a third dimension: the purposes that evaluation might serve. For instance, the reliance on cause-effect relationships hides the fact that often no satisfactory "cause" may be isolated for a given effect. The search for statistically significant cause-effect relationships has blinded evaluators and clients alike to more diffuse, but also more powerful, social forces operating within the context. The press to determine the most appropriate instrumentation for a given evaluation effort has occasionally obscured two painful facts: first, instruments may not exist for the most salient features of an evaluation context; and second, some webs, patterns, and meshes of social behavior may not be explorable at all with paper-and-pencil instrumentation. Equally important, the description of what is, and of why people think it to be so, may be as important a function of evaluation as the verification of cause-effect hypotheses. The roles of description, judgment, and characterization may ultimately serve policy matters better than dozens of impact studies full of no-significant-difference statistical analyses. Ironically, the purposes of description, judgment, and characterization are those best served by the narrative-style case study rather than the technical report; and the case study is often more accessible to both audiences and clients than the more statistically oriented technical report.

A fourth assumption, one that has plagued evaluation from the beginning, is what might be called the "client blinders." It is surely easier to think, as other professions do, of the client as the agency or individual who requests and pays for services. It is most assuredly the case that those who contract for evaluation services are those in whom power and authority for a program's delivery are vested. But shaping an evaluation effort solely to serve the information needs of those who enjoy formal authority ignores other audiences who have felt or will feel the effects of a program's continuation, modification, or discontinuance. Evaluators may share an unwritten but moral obligation to seek out those who have a legitimate interest, although no formal authority. This is nowhere more critical than in the arena of social action programs, where the enabling legislation may be sufficiently broad, and program targets sufficiently diffuse or unclear, to require extra attention to defining audiences and those presently or potentially affected. While the term client may clearly denote the commissioning authority for an evaluation, it fails to recognize adequately the legitimate moral claims of other audiences to evaluation results.

Finally, the program standards fail to recognize the role of values in evaluation efforts. The continuing emphasis on objectivity and freedom from bias ignores mounting recognition within the scientific community that all sciences, but especially the social sciences, are value-bound. The search for objectivity, for freedom from bias and the influence of values, is a modern-day quest for the Holy Grail. The press for objectivity camouflages the need for presentation of competing value systems, for open acknowledgment of bias, and for evaluation findings that are firmly anchored in the multiple realities of multiple audiences' experience.

These assumptions do not negate the positive contributions that both sets of standards have made and will make in the future. Perhaps their most powerful purpose lies in provoking serious and concerned debate within the profession and among clients for evaluation reports, including federal, state, and local government agencies, by far the most frequent clients for evaluation services. The contrasts between the ERS and Joint Committee Standards, however, point to serious conflict over the meaning of adequate, competent, or excellent practice. While ongoing discussion of unifying the standards will be useful over time, it is unlikely that compromise will be possible in the near future.

REFERENCES

ERS Standards Committee. (1982, September). Evaluation Research Society Standards for Program Evaluation. New Directions for Program Evaluation, 15, 7-19.

Guba, E. G., & Lincoln, Y. S. (1981). Effective Evaluation. San Francisco: Jossey-Bass.

Guba, E. G., & Lincoln, Y. S. (1982, Winter). Epistemological and methodological bases for naturalistic inquiry. Educational Communications and Technology Journal, 31, 233-252.

Joint Committee on Standards for Educational Evaluation. (1982). Standards for Evaluation of Educational Programs, Projects and Materials. Boston: Allyn-Bacon.

Rossi, P. H. (Ed.). (1982, September). Standards for evaluation practice. New Directions for Program Evaluation, 15.

Stufflebeam, D. L. (1982, September). A next step: Discussion to consider unifying the ERS and Joint Committee Standards. New Directions for Program Evaluation, 15, 27-36.