THE EDUCATOR ROLE OF EDUCATIONAL EVALUATORS: A TRIBUTE TO ARIEH LEWY
Lorin W. Anderson
University of South Carolina, USA
In a paper written in 1995, Arieh Lewy questioned the impact of evaluation on educational practice, offering several reasons for the lack of impact. One of the major reasons, as Arieh saw it, was the communication gap between researchers and decision makers. In order to close this gap, evaluators must become educators, working with a variety of intended audiences to help them understand the methodology and results of evaluation studies. The role of conceptual frameworks in developing shared understanding is explored. Five recommendations for helping evaluators become educators are offered.
Abstract

Educators, practitioners, and policy makers have long debated the impact of the results of evaluation studies on educational practice. If evaluation is to influence practice, the communication gap between researchers and decision makers must be reduced or eliminated. In order to close this gap, evaluators must become educators, working with a variety of intended audiences to help them understand the methodology and results of evaluation studies. To do this, a greater understanding of the intended audience(s) is needed. In addition, conceptual frameworks must be developed to facilitate a common or shared understanding of the results of evaluation studies.
Although I had met Arieh Lewy two or three times while a doctoral student at the University of Chicago in the early 1970s, my first opportunity to really sit down and talk with him came in Toronto in February, 1980. We were attending a planning meeting for the IEA Classroom Environment Study (Anderson, Ryan, & Shapiro, 1989). Although Arieh had attended previous meetings of the planning committee, this was the first meeting that I attended. In fact, I was invited to the meeting to see if I could find a way of moving the study forward in light of substantial differences that existed within the planning committee.
The morning after the first day's sessions we talked about the nature of these differences over breakfast at McDonald's (Arieh's choice). As Arieh saw it, there were, in fact, three subgroups. One group wanted to focus on the direct link between specific teacher behaviors and student achievement. A second group, the one with which Arieh identified, wanted to include additional student variables in the study (e.g., engaged time, perceptions, aspirations) and emphasized a more macro-examination of the classroom, focusing on curriculum units rather than daily lessons. The third group sat on the proverbial fence, with a greater interest in getting the study started than in debating what they saw as rather esoteric matters. I remember asking Arieh how he thought I could get the job done. He began by admitting that it would not be an easy task. He then opined that if I could find a way of moving the group beyond the details to the consideration of larger issues, I might be able to find common ground. It was good advice and I took it.

I began by asking the planning group members to combine variables to form fewer, larger categories, which were termed "constructs." Two major criteria were used to determine whether specific variables would be included in a particular construct. First, the data pertaining to all of the variables within a construct had to be derived from the same type of instrument (e.g., observation measures, self-report instruments). Second, the hypothesized relationships among the variables within a construct as well as the relationships between the variables and measures of student achievement had to be in the same direction (i.e., positive or negative). After much discussion and debate, fifteen constructs were identified from an initial list containing almost 200 specific variables.

After the fifteen constructs (and related variables) were endorsed by all members of the planning committee, the constructs were placed in a causal model or path diagram – a conceptual framework. After another round of discussion and debate, the causal model also received unanimous support. Arieh was right. Moving people toward a comfortable level of abstraction and away from a focus on concrete details and instances enabled consensus to be achieved. Following consensus, the study began.
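The two inclusion criteria lend themselves to a simple illustration. The sketch below is hypothetical – the variable names, instrument types, and hypothesized directions are invented for the example, not the committee's actual list – but it shows how variables sharing an instrument type and a hypothesized direction of relationship with achievement would be grouped into candidate constructs.

```python
from collections import defaultdict

# Hypothetical variable records: (name, instrument type, hypothesized direction
# of relationship with achievement). Illustrative only, not the actual variables
# considered by the planning committee.
variables = [
    ("engaged_time",        "observation", "+"),
    ("teacher_questioning", "observation", "+"),
    ("off_task_behavior",   "observation", "-"),
    ("student_aspirations", "self-report", "+"),
    ("perceived_clarity",   "self-report", "+"),
    ("test_anxiety",        "self-report", "-"),
]

# Criterion 1: all variables in a construct come from the same type of instrument.
# Criterion 2: all variables in a construct relate to achievement in the same direction.
candidate_constructs = defaultdict(list)
for name, instrument, direction in variables:
    candidate_constructs[(instrument, direction)].append(name)

for (instrument, direction), members in sorted(candidate_constructs.items()):
    print(f"{instrument:12s} {direction}  ->  {', '.join(members)}")
```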
The Impact of Evaluation on Educational Practice

In 1995, Arieh wrote a paper in which he considered the impact of evaluation studies on educational practice. Accepting the frequently made claim that evaluation has had little impact on practice, Arieh offered several reasons for this lack of impact. Among the reasons were the lack of proper funding for research and evaluation, the reluctance of teachers to change their practices, and the conservative nature of the educational bureaucracy. He also examined some of the ways in which evaluation studies could have a greater impact on practice. Among the solutions he considered were improving the quality of research and evaluation methodology, placing a greater emphasis on the "practical consequences of [evaluation] findings" (p. 202), and finding ways of "decreasing the communication gap between researchers and decision makers" (p. 200).

In the dozen years that have passed since this paper was published, great improvements in methodology have been made. These improvements have occurred in all aspects of research and evaluation methodology: sampling, instrument development, psychometrics, and statistical models and procedures. Nonetheless, I would venture to guess that the impact of evaluation studies on educational practice has increased little, if at all, during this same time period – certainly not in proportion to the advancements in methodology. I would argue that the primary reason for the lack of impact of evaluation results on educational practice is, in fact, the communication gap that exists between evaluators and decision makers. Furthermore, this communication gap exists because (1) decision makers are not a homogeneous group, and (2) evaluators and decision makers use quite different conceptual frameworks in their attempts to understand the world around them. Until evaluators begin to assume a greater role as educators, the impact of evaluation studies on educational practice will continue to be minimal.

The Heterogeneity of Decision Makers

In his interview with Adrienne Bank (1985), Arieh differentiated between two types of evaluation studies that have the potential to improve education. "First, there are small-scale studies that do not stir up emotional reactions in the public. … A great many studies carried out in curriculum evaluation have had an impact on action. They affected immediate decisions, led to the changing of instructional materials, and so forth. But, they did not cause a great public outcry. They constituted a matter of concern only to the evaluator and to the program developer. They did not appear in newspaper headlines" (p. 88). In contrast to these studies are the large-scale evaluation projects that deal with basic issues of educational policy. "These evaluations served as a stimulus to public awareness of issues. They created public debates. But, on the issues themselves the decisions were made on the basis of political considerations. Evaluation data had only a small effect, but they were never fully disregarded. … Both sides tried to highlight the results that supported their approach" (p. 88).

In reading these descriptions, one can incorrectly focus on the distinction between large- and small-scale studies. The primary difference, however, lies in the audiences of the evaluation reports: program developers vs. politicians. It is not surprising that program developers are more likely to use evaluation data properly since, by definition, they are interested in programmatic change. In addition, they quite likely understand the results since at least some of the data were collected in accordance with program specifications (e.g., objectives, implementation guidelines). Finally, in an ideal world, evaluators work closely with program developers from the beginning to the end of the project, thus increasing the likelihood that there will be a shared understanding of both program and evaluation.

Politicians, on the other hand, represent varying viewpoints – viewpoints that are consistent with the philosophies and platforms of their political parties. Worldwide, one party typically stands for the status quo, while at least one other party desires change. Not surprisingly, then, each side emphasizes the results that support its position. Furthermore, unlike researchers, politicians look for data to support their often staunchly held positions. I am reminded of a study that I conducted in the mid-1980s on the implementation and effectiveness of compensatory and remedial programs. After I had presented the results of the study to a legislative committee, one of the committee members came up to me and told me that although it was a "fine study" and the recommendations were "reasonable," the timing was bad. "Bad timing?" I asked. "Yes," he said. "We've already made our decision. We just haven't announced it."
It is noteworthy that teachers are perhaps the least likely audience for evaluation studies. Increasingly, teachers are not primary decision makers within educational systems. Although they still make micro-level decisions (e.g., What questions should I ask my students?), macro-level decisions (e.g., What objectives should I pursue?) are made at higher levels of the educational system, by local, state, and/or national authorities. Even the professional development opportunities offered to teachers by local education authorities (LEAs) are based primarily on what administrators believe teachers need, not what teachers themselves believe would be beneficial to them or improve their effectiveness.

The presence of different audiences typically requires being aware of different motives. Following Arieh's description of the two types of evaluation studies, it is entirely possible that the program developer has motives that cause him or her to skew his or her perceptions of the data, focusing primarily on those data that support program adoption. Is this really any different from the data selectivity of politicians that Arieh mentioned? Evaluators operate on the assumption that data are used to find answers to important questions and they suppose that everyone else operates the same way. Policy makers, however, may be more interested in data that support their answers to important questions. But policy makers are not alone in this regard. I have had far too many graduate students who, when asked about the purpose of their study, replied: "I want to prove that … "

Different audiences also have different information needs. Many decision makers (including politicians) have a "bottom line mentality." They do not have time to read a 150-page evaluation report. To meet the information needs of these decision makers, then, it is important to prepare and distribute a well-written executive summary of the evaluation report. Unlike policy makers, reviewers of manuscripts submitted for publication are likely to read the entire report, examining methodology, results, and the extent to which the recommendations offered are consistent with the results obtained. With respect to the need to connect recommendations with the available data, Arieh offered the following observation: "Unfortunately, researchers, who in their empirical work apply rigorous methods in proving the correctness of their hypotheses, tend to provide less satisfactory evidence for the veracity of their statements when they deal with the effect of research findings on educational practice" (Lewy, 1995, p. 199).

Evaluators need to understand the motives of diverse audiences and do a better job of meeting varying information needs. We need to remember that not everyone thinks the way we do. We also need to remember that writing for ourselves (or people like us) is much easier than writing for others.

Understanding the Results of Evaluation Studies

Stated simply, our understanding of evaluation results depends to a large extent on the conceptual framework we bring to the data collected during the evaluation study. Stated differently, the data do not speak for themselves. Rather, we interpret the data using the lens through which we look at them. These lenses are our conceptual frameworks. Arieh understood the importance of conceptual frameworks: "Without a conceptual framework data summaries remain a disjointed set of facts, which do not contribute to our understanding of the dynamics of various phenomena" (Lewy, 1981, p. 60, emphasis mine).
His point is clear: understanding is not derived from facts; rather, understanding is derived from examining the facts through the lens provided by our conceptual frameworks. In fact, our understanding of a given phenomenon can be defined in terms of how well our conceptual framework describes or explains the phenomenon. Because conceptual frameworks are abstract, we must rely on empirical evidence (i.e., data) to test their validity. Our level of understanding, then, can be determined by the extent to which the data support our conceptual framework. Stated somewhat differently, we examine how well the data "fit" our framework. The better the fit, the more complete our understanding.

This concept of understanding as the fit between abstract concepts and empirical evidence raises two important questions. First, how well do the data have to fit our conceptual framework? Second, what do we do when the data do not fit our framework very well?

Relative to the first question, a series of data analytic rules have been established to guide our decision making. For example, the proportion of variance in the outcome measure accounted for by the other measures included in the conceptual framework is a standard way of examining the fit of the data to the framework using multiple regression techniques. The greater the proportion of variance explained by the framework, the greater our understanding of the phenomenon under investigation. Similarly, path coefficients below .10 typically result in the elimination of variables from conceptual frameworks when path analytic techniques, such as LISREL, are used. The elimination of these variables generally results in a better fit between the conceptual framework and the data and, consequently, a greater understanding.
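A minimal numerical sketch of these two rules of thumb follows. The data are simulated and the variable names are invented for illustration; only the logic – variance explained as a summary of fit, and small standardized paths as candidates for elimination – comes from the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data for an illustrative framework: two predictors and an outcome.
engaged_time = rng.normal(size=n)
aspirations = rng.normal(size=n)
achievement = 0.5 * engaged_time + 0.05 * aspirations + rng.normal(size=n)

# Fit the framework with ordinary least squares and compute R^2, the proportion
# of variance in the outcome accounted for by the measures in the framework.
X = np.column_stack([np.ones(n), engaged_time, aspirations])
beta, *_ = np.linalg.lstsq(X, achievement, rcond=None)
predicted = X @ beta
r_squared = 1 - np.sum((achievement - predicted) ** 2) / np.sum(
    (achievement - achievement.mean()) ** 2
)
print(f"R^2 = {r_squared:.3f}")

# Standardized coefficients stand in for path coefficients here; paths below the
# conventional .10 cutoff become candidates for elimination from the framework.
standardized = beta[1:] * np.array([engaged_time.std(), aspirations.std()]) / achievement.std()
for name, coef in zip(["engaged_time", "aspirations"], standardized):
    status = "retain" if abs(coef) >= 0.10 else "candidate for elimination"
    print(f"{name:14s} path = {coef:+.2f}  ({status})")
```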
Relative to the second question – what to do when the data do not fit our conceptual framework as well as we had hoped? – the answer is more complex. For example, we may eliminate obvious "outliers" in the data set and retest the framework. Alternatively, we may eliminate variables from the framework (i.e., remodel the framework) and re-examine the fit. The goal of both of these actions is the same; namely, to increase the fit between our conceptual framework and our data. There are occasions, of course, where the most prudent action is to abandon the framework and start over. On these occasions, the misfit is so large that no amount of remodeling will result in an appropriate degree of fit.

Abandoning a conceptual framework is extremely difficult for most of us. After all, we have spent a great deal of time and effort developing the framework and, consequently, often have a vested interest in it. If the desire to "save" the framework is very great, evaluators may feel pressure – internal or external – to create data so that the fit of the data with the framework is virtually guaranteed. Although this option has been used in the past (see, for example, Fletcher, 1991) and today's evaluators may feel some degree of pressure to produce "positive results" when they receive money to conduct evaluations of someone's "pet programs," manipulating data to increase fit is at best poor evaluation and at worst unethical.

It is important to point out that both evaluators and decision makers have and use conceptual frameworks. However, the formulation of conceptual frameworks by members of the two groups is quite different (Fenstermacher, 1982). Evaluators' conceptual frameworks tend to be grounded in the paradigms that define the field. In contrast, decision makers' conceptual frameworks are grounded in their experiences – past, present, and, increasingly, future. This distinction is not meant to denigrate the conceptual frameworks of decision makers. As House, Mathison, and McTaggart (1989) have argued: "the choice [between evaluators' and decision makers' conceptual frameworks] is not between valid scientific knowledge on the one hand and invalid … superstition on the other, as the problem is often posed" (p. 15). Rather, the distinction suggests that because the interpretation of data depends primarily on one's conceptual framework, different interpretations of the same data by evaluators and decision makers should be expected.

Fenstermacher (1982) has suggested three ways in which the results of evaluation studies can have an impact on educational practice. First, empirically derived rules can be established for decision makers to follow. Second, evaluation results can be presented to decision makers for their examination and subsequent acceptance or refutation. Third, schemata (that is, representations that "provide a way to 'see' a phenomenon and a way to think about it" (p. 9)) can be offered for consideration by decision makers. In reviewing these three ways in which bridges can be built between research and practice, Fenstermacher concludes that if he were a bridge inspector, he would "condemn bridges built with rules, certify bridges built with evidence, and commend bridges built with schemata" (p. 7).

If evaluation studies are to influence educational policy and practice, then, evaluators must become educators. That is, they must help decision makers build conceptual frameworks (schemata) that will enable them (the decision makers) to not only understand the results of the studies, but to understand them in a manner more consistent with the evaluators' understanding. The influence of evaluation studies on educational practice is likely to be increased substantially when, and only when, evaluators and practitioners share a common understanding.

The Evaluator as an Educator

What can evaluators do to become more effective educators? In this section, I offer five recommendations. In combination, the recommendations are based on the complex interrelationships among evaluators, decision makers, conceptual frameworks, data, and recommendations.

1. Evaluators must make explicit their conceptual frameworks.

It has been my experience that evaluators enjoy methodological issues and arguments far more than they do substantive ones. Emotionally charged debates over the merits of qualitative versus quantitative studies, two- vs. three-parameter latent trait models, and structured vs. narrative records of classroom observations litter the pages of evaluation journals. Debates over substantive issues such as the conditions under which smaller class sizes could be expected to yield greater student achievement (Anderson, 2000) or the various roles that principals can and/or should play in school reform (Anderson & Shirley, 1995), on the other hand, are represented far less often and, when represented, generate far less emotion.

Concerns for substantive issues lie at the heart of the development of conceptual frameworks. And, without conceptual frameworks, data are meaningless. In this regard, it is instructive to remember the distinction between data and information made by Stufflebeam, Foley, Gephart, Guba, Hammond, Merriman, and Provus (1971). In essence, information is data that are meaningful and useful to the decision maker. Merwin (1973) elaborated on this point as follows: "If the data are in a form such that the measurement sophistication of the decision maker permits understanding (i.e., it is meaningful to him), and it is relevant to and useful in making the decision, it is given the label 'information'" (p. 5).

If improved communication with decision makers is an important goal for evaluators, then developing conceptual frameworks for every evaluation study and explicitly sharing those frameworks with others (including decision makers) is an important first step.
2. Evaluators must get to know decision makers.

I believe that one of the things that made Arieh an excellent evaluator was that he understood decision makers. This knowledge did not come from his formal graduate education. Rather, it came from his experiences prior to this education. "If I compare my way in evaluation to that of my colleagues, I see the major difference not in the basics of training but in our age of entrance into evaluation work and our previous experiences. I worked for twenty years in education (from 1943 till 1962) before I took up Ph.D. studies in evaluation. I was a grade school teacher. … I have been a high school teacher and a school principal. I have a great familiarity with the objects of evaluation through personal experience. Many of my colleagues have gone directly into evaluation without that personal experience" (Bank, p. 82).

Often the relationship between evaluators and decision makers can be best summed up as a "we-they" sort of thing. We know what we're doing; they do not. We have pure motives; they are motivated by "politics." This we-they dichotomy must be replaced with "us." We're all in this together. Without sound evaluations education cannot improve. At the same time, however, without wise decisions based on sound evaluations education cannot improve either.

Not all of us can move into evaluation rather late in life as Arieh did, accumulating a wealth of experiences prior to entering the field. If we cannot, however, it would behoove us to study decision makers in much the same way as we study reading programs, school organization, and parent education and involvement. Getting to know decision makers – who they are and what makes them "tick" – will help evaluators close the communication gap.

3. Evaluators must use the language of the decision maker.

This recommendation follows directly from the previous one. If evaluators do not understand decision makers, it is extremely difficult to use their language. Based on my three decades working with decision makers at all levels – local, state, national, and international – I would offer the following observations. First, although the distinction between formative and summative evaluation is important to evaluators, it is far less important to decision makers. They would like all evaluations to serve both purposes. Second, statistical significance without practical significance is irrelevant to decision makers. They want to know if a program or practice is worth all of the time, effort, and money needed to make the change. Third, well-prepared, clearly labeled graphs and charts are far easier for decision makers to understand than are tables containing numbers with four digits to the right of the decimal point. Fourth, raw regression coefficients are more important to decision makers than standardized regression coefficients. Raw regression coefficients are in the "metric of decision making" while standardized coefficients are not (a brief sketch below illustrates the difference). Fifth, although we understand our abbreviations and acronyms, decision makers often do not. Start tossing around AERA, TIMSS, HLM, and ANOVA and watch decision makers' eyes glaze over. Every evaluator could certainly add to the list I started above. The point, however, is that evaluators must take the time to find out what "works" and doesn't "work" in communicating with decision makers.
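The sketch below illustrates the fourth observation with made-up numbers: a raw (unstandardized) coefficient speaks in the units decision makers actually budget and schedule, whereas a standardized coefficient speaks in standard deviations. The tutoring-hours example and all figures are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Invented example: weekly tutoring hours (0-5) and a test score on a 0-100 scale.
tutoring_hours = rng.uniform(0, 5, size=n)
test_score = 60 + 2.0 * tutoring_hours + rng.normal(scale=8, size=n)

# Raw slope: expected score gain per additional hour of tutoring
# (the decision maker's metric: hours scheduled, points gained).
X = np.column_stack([np.ones(n), tutoring_hours])
intercept, raw_slope = np.linalg.lstsq(X, test_score, rcond=None)[0]

# Standardized slope: expected gain in standard deviations of the score
# per standard deviation of tutoring hours.
std_slope = raw_slope * tutoring_hours.std() / test_score.std()

print(f"Intercept:          {intercept:.1f} points")
print(f"Raw slope:          {raw_slope:.2f} points per tutoring hour")
print(f"Standardized slope: {std_slope:.2f} SD per SD of tutoring")
```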
4. Evaluators must be aware of the role that timing plays in accepting and using the results of evaluation studies.

Timing plays a critical role in all of education (see, for example, Anderson, 1985). So, it should come as no surprise that timing is critical in our attempts to use evaluation studies to influence educational practice. From a personal perspective, my experience with "bad timing" (described earlier) made me more aware of the importance of "good timing." When is the best time to release the results of the evaluation study? What is the "window of opportunity" for making decisions related to the results of the study? What are the steps involved in the decision making process and who is involved at each step? Answers to questions such as these will help bridge the gap between evaluation studies and educational practice.

5. Evaluators must clearly show the connection between their data and their recommendations.

About ten years ago I reviewed a manuscript that was submitted for publication to a reputable journal. Among the comments I made to the authors of the manuscript was that I failed to see the connection between their results and their recommendations. I suggested that they make the connection explicit in their discussion section. About a month later, I received a fairly standard letter from the editor of the journal thanking me for reviewing the manuscript and informing me of the publication decision (which was to reject the manuscript). In a handwritten note at the bottom of the letter, the editor had written: "The authors saw no need to further discuss the connection between their data and their recommendations. They said it was 'self-evident.'"

Many of the truths that we as evaluators hold to be self-evident are not self-evident to others. This is particularly true when our audience is decision makers. Connecting our recommendations with our data serves two purposes. First, the data provide the basis (i.e., reason or justification) for our recommendations. Second, without the connection, our recommendations are no better (or no worse) than anyone else's recommendations. If we can ignore our data as we make our recommendations, how can we criticize others who do the same?

Evaluators and Educators: A Postscript

Most evaluators recognize that Arieh Lewy was a prominent educational evaluator with an international reputation. What may be less well known is that Arieh was an educator at heart. His love of education began early. "I studied in Budapest in the high school section of the rabbinical seminary. I was attracted to the profession of teaching, mainly toward secondary school teaching. For me it was an avenue for social mobility" (Bank, 1985, p. 80). For me, also.
As mentioned earlier, Arieh's path to evaluation was not a direct one. "When I was in Israel working as a high school teacher in Hebrew and Talmudic studies, I again became attracted to mathematics. I moved into the fields of psychometrics and measurement, which eventually brought me to testing. Probably this is another case where persons with frustrated ambitions in the field of mathematics go into evaluation" (Bank, 1985, p. 81). As did I.

Something else Arieh said in his interview with Adrienne Bank explains why he was so successful in the field of educational evaluation: "My previous studies in Talmud and philosophy have had a great impact on my work, in the sense that I always try to examine issues from angles or points of view that are different from the ones that have been presented to me" (p. 81). A tolerance of different viewpoints while remaining true to his principles: this not only summarizes Arieh's life and work, it is an inspiration to us all.
References

Anderson, L.W. (1985). Time and timing. In C.W. Fisher & D.C. Berliner (Eds.), Perspectives on instructional time. New York: Longman.

Anderson, L.W. (2000). Why should reduced class size lead to increased student achievement? In M.C. Wang & J.D. Finn (Eds.), How small classes help teachers do their best (pp. 3-24). Philadelphia, PA: Temple University Center for Research in Human Development and Education.

Anderson, L.W., Ryan, D.W., & Shapiro, B.J. (1989). The IEA Classroom Environment Study. Oxford, England: Pergamon Press.

Anderson, L.W., & Burns, R.B. (1990). The role of conceptual frameworks in understanding and using classroom research. South Pacific Journal of Teacher Education, 18, 5-18.

Anderson, L.W., & Shirley, J.R. (1995). High school principals and school reform: Lessons learned from a statewide study of Project Re:Learning. Educational Administration Quarterly, 31, 405-423.

Bank, A. (1985). Evaluation in Israel: A conversation with Arieh Lewy. New Directions for Program Evaluation, 1985 (25), 79-92.

Fenstermacher, G.D. (1982). On learning to teach effectively from research on teacher effectiveness. Journal of Classroom Interaction, 17 (2), 7-12.

Fletcher, R. (1991). Science, ideology, and the media: The Cyril Burt scandal. Somerset, NJ: Transaction Publishers.

House, E.R., Mathison, S., & McTaggart, R. (1989). Validity and teacher inference. Educational Researcher, 18 (7), 11-15, 26.

Lewy, A. (1981). Timeliness and efficiency in serial evaluation activities. Educational Evaluation and Policy Analysis, 3 (3), 55-65.

Lewy, A., & Kugelmass, S. (Eds.) (1981). Decision-oriented evaluation in education: The case of Israel. Proceedings of an Israel-American seminar on educational evaluation. Glenside, PA: International Science Services.

Lewy, A. (1995). The impact of evaluation studies: The researcher's view. In W. Bos & R.H. Lehmann (Eds.), Reflections on educational achievement: Papers in honor of T. Neville Postlethwaite. Münster: Waxmann Publishing Company.

Merwin, J.C. (1973). Educational measurement of what characteristic of whom (or what) by whom and why? Journal of Educational Measurement, 10 (1), 1-6.

Stufflebeam, D.L., Foley, W.J., Gephart, W.J., Guba, E.G., Hammond, R.L., Merriman, H.O., & Provus, M. (1971). Educational evaluation and decision making. Itasca, IL: F.E. Peacock.
The Author

LORIN ANDERSON is a Distinguished Professor Emeritus of Educational Leadership and Policies at the University of South Carolina, where he served on the faculty for 33 years. His research and writing interests include curriculum and course design, educational programs for educationally disadvantaged children and youth, and data-based decision making in education.