Program Evaluation


Christiane Spiel, Barbara Schober, and Evelyn Bergsmann, University of Vienna, Vienna, Austria
© 2015 Elsevier Ltd. All rights reserved.

Abstract

Evaluation systematically investigates characteristics and merits of programs and can be done in all phases of a program. The purpose of evaluation is to provide information on the effectiveness of programs so as to optimize outcomes, quality, and efficiency. The article gives an introduction to program evaluation by considering the following four questions: What is program evaluation? At what levels can programs be evaluated? What standards can be set to enhance the quality and fairness of evaluation? How can benefits to users be achieved?

Evaluations are, in a broad sense, concerned with the effectiveness of programs. While common-sense evaluation has a very long history, evaluation research, which relies on the scientific method, is a young discipline that has nevertheless grown massively in recent years. In the last few decades, program evaluation has made important contributions to various social domains, e.g., education, health, economics, and the environment. This article gives an introduction to program evaluation by considering the following four questions: What is program evaluation? At what levels can programs be evaluated? What standards can be set to enhance the quality and fairness of evaluation? How can benefits to users be achieved?

What Is Program Evaluation? Definition, Distinction from Other Activities, Users and Doers of Evaluations

Evaluation, broadly construed, has a very long history. For example, in ancient Rome, tax policies were altered in response to observed fluctuations in revenues, and many thousands of years ago, hunters developed complicated procedures for fishing and trapping. In recent years, common-sense program evaluation has evolved into evaluation research, which relies on scientific research methods. This article focuses on this latter type of evaluation (for introductory texts on evaluation research see, e.g., Berk and Rossi, 1999; Fink, 2005; Mathison, 2005; Mertens and Wilson, 2012; Rossi et al., 2004; Stufflebeam and Shinkfield, 2007; Wholey et al., 2010). In what follows, the terms evaluation, program evaluation, and evaluation research are used synonymously. The first large-scale national evaluations of social programs in the USA arose as part of the War on Poverty, and methods for monitoring programs are still evolving rapidly. Evaluation has proven its usefulness to such an extent that most federal and state agencies require evaluations of new programs.

Programs, sometimes also called projects or interventions, are the objects of evaluation. Program evaluation derives from the idea that social programs should have demonstrable merits and explicit aims by which success or failure may be empirically judged. For example, literacy programs for high school students should lead to measurable improvements in their reading skills. Introductory texts on evaluation describe programs as systematic activities that are provided on a continuing basis to achieve preplanned purposes, e.g., a national program to train teachers to work in rural and underserved areas, or a campaign to vaccinate all children in a school district (e.g., Fink, 2005; Joint Committee on Standards for Educational Evaluation, 2011).

Evaluation systematically investigates characteristics and merits of programs. Its purpose is to provide information on the effectiveness of programs so as to optimize outcomes, quality, and efficiency. Program evaluations are conducted by applying many of the same methods that social researchers rely on to gather reliable and valid evidence. These include formulating the study questions, designing the evaluation, setting standards to assess the program impact, conducting ongoing monitoring of the program functioning, collecting and analyzing data, assessing program effectiveness relative to costs, and reporting results. For examples of evaluation in various fields of clinical and applied psychology see, e.g., Bickman et al. (1995), Gueron and Pauly (1991), Hope and Foster (1992), Maggin et al. (2011), Spiel et al. (2006), or Wong et al. (2008).

While a few authors (e.g., Scriven, 2002) take the view that theories are a luxury for the evaluator and not essential to explanations, the majority of researchers call for injections of theory into every aspect of evaluation design and analysis (e.g., Berk and Rossi, 1999; Coryn et al., 2011; Jacobs et al., 2012; Mathison, 2005). This implies that program evaluation capitalizes on existing theory and empirical generalizations from the social sciences and is, therefore, applied research. By relying on the scientific method, program evaluation differs from activities such as newspaper reporting and social commentary. Program evaluation differs from the social sciences in its goals and audience. Evaluation goals are defined by the client, that is, the individual, group, or organization that commissions the evaluator(s), e.g., policy makers. However, programs and clients often do not have clear and consistent goals that can be evaluated for effectiveness. Therefore, to meet evaluation standards, program evaluation should be conducted by professional evaluators who are experts in the social sciences, using a broad repertoire of concepts and methods, but who also have expertise in project and staff management and have at least basic knowledge of the evaluated domain.

While the impetus for evaluation rests in policy, information from evaluations is used by the stakeholders, that is, all individuals or groups that may be involved in or affected by a program evaluation. There are at least six such groups (Fink, 2005): (1) federal, state, and local government, e.g., a city or the National Institutes of Health; (2) program developers, e.g., health care service providers or the curriculum department in a school; (3) communities, e.g., a geographically contiguous area or people with a shared health-related problem; (4) policy makers, e.g., a state subcommittee on human services; (5) program funders, e.g., the Department of Education; (6) social researchers, e.g., in government, public agencies, and universities. Program evaluation is therefore also an interdisciplinary task in which evaluators and stakeholders work together (Cousins and Chouinard, 2012).

Key Concepts and Assignment Criteria

Evaluation can and should be done in all phases of a program. Baseline data collected before the start of the program are used to describe the current situation, e.g., the need for an intervention. In addition, baseline data are used to monitor and explain changes. For example, if the emergency medical services attempt to reorganize their service to better care for children, they may collect interim data to answer questions such as: In what cases did the emergency medical services fail to meet the needs of critically ill children? What are the differences between their needs and the needs of adults? What is the current performance of medical staff? Before the start of the program, a prospective evaluation can be employed to determine the program's potential for realization, its effectiveness, and its impact, that is, the scope of its effects.

In formative evaluation, interim data are collected after the start of a program but before its conclusion. The purpose of formative evaluation is to describe the progress of the program and, if necessary, to modify and optimize the program design. A process evaluation is concerned with the extent to which planned activities are executed and is therefore nearly always useful. For example, process evaluation can accompany the implementation of a community-oriented curriculum in medical schools.

Outcome evaluation deals with the question of whether programs achieve their goals. However, as mentioned before, goals are often stated vaguely and broadly, and therefore program effectiveness is difficult to evaluate. In practice, the concept of effectiveness must always address the issue 'compared to what?' Often, it is difficult to distinguish program effects from chance variation and from other forces affecting the outcome. Berk and Rossi (1999) distinguish between marginal effectiveness, relative effectiveness, and cost-effectiveness. Marginal effectiveness concerns the dosage of an intervention measure. For example, to test the assumption that a better student–teacher ratio would improve student performance, the classroom size has to be reduced; two or more different student–teacher ratios are the doses of intervention, and their outcomes are compared. Relative effectiveness is evaluated by contrasting two or more programs, or a program and the absence of the program. For example, lecture-based learning can be contrasted with problem-based learning in its effects on students' practical performance. Cost-effectiveness considers effectiveness in monetary units. Evaluation is usually concerned with the costs of programs and their relationship to effects and benefits. For example, in the evaluation of a physical education program, one can ask how much gain in monetary units can be expected due to reduced use of health services.

The entire impact of a program, e.g., the extent of its influence in other settings, is very difficult, maybe even impossible, to assess. Here, impact evaluations are applied, that is, historical reviews of programs are performed after the programs have been in operation for some period of time. If a new medical education program was implemented, an improvement in the quality of the entire health system might be expected as an impact of the program. However, evaluating such an improvement might be extremely difficult.
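To make these distinctions concrete, the following Python sketch compares a program group with a comparison group (relative effectiveness, answering 'compared to what?') and relates the observed gain to program costs (a simple cost-effectiveness ratio). All scores, the assumed cost per participant, and the helper functions are invented for illustration; they do not come from any real evaluation.

```python
# Illustrative sketch only: invented numbers, not from any real evaluation.
# Relative effectiveness: compare mean outcomes of a program group with a
# comparison group; cost-effectiveness: relate the observed gain to the
# monetary cost of producing it.

from statistics import mean, stdev

program_scores = [68, 74, 71, 80, 77, 69, 75]      # hypothetical outcomes with the program
comparison_scores = [65, 70, 66, 72, 68, 64, 69]   # hypothetical outcomes without the program

def relative_effectiveness(treated, control):
    """Mean difference and a standardized effect size (mean gap over pooled SD)."""
    diff = mean(treated) - mean(control)
    pooled_sd = ((stdev(treated) ** 2 + stdev(control) ** 2) / 2) ** 0.5
    return diff, diff / pooled_sd

def cost_per_unit_gain(gain_per_participant, cost_per_participant):
    """Cost-effectiveness expressed as cost per outcome unit gained (lower is better)."""
    return cost_per_participant / gain_per_participant

diff, effect_size = relative_effectiveness(program_scores, comparison_scores)
print(f"Mean gain: {diff:.1f} points, standardized effect size: {effect_size:.2f}")
print(f"Cost per point gained: {cost_per_unit_gain(diff, 120.0):.1f} monetary units")
```

In a real evaluation these comparisons would of course rest on an appropriate design (e.g., random assignment or a matched comparison group) rather than on raw group means.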

At What Levels Can Programs Be Evaluated?

Programs can be evaluated not only in different phases but also on different levels. Kirkpatrick (1998; Kirkpatrick and Kirkpatrick, 2010; see also Spiel et al., 2010a) defined four sequential levels; each level is important and has an impact on subsequent levels. The evaluation process becomes more difficult and time-consuming at higher evaluation levels, but it also provides more valuable information. In the following paragraphs, the four levels are briefly described using a parenting program as an example (see Spiel and Strohmeier, 2012).

Level 1 – Reactions

Evaluation on this level measures how program participants react to the program; in effect, it measures customer satisfaction. For instance, when parents are asked how much they liked attending a course on parenting styles, this kind of evaluation taps the level of reactions.

Level 2 – Learning

Learning can be defined as the extent to which participants change attitudes, improve knowledge, and/or increase skills and competencies as a result of attending the program. This kind of evaluation would investigate, for example, whether parents have improved their competencies after attending a course on parenting styles. Such an evaluation could apply questionnaires or tests before and after the course.
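A minimal sketch of such a pre/post comparison is shown below, using hypothetical competence ratings from the same parents before and after the course; the data and variable names are invented for illustration.

```python
# Minimal pre/post sketch for a Level 2 (learning) evaluation.
# Scores are hypothetical questionnaire results from the same parents
# before and after the parenting course.

pre_scores = [3.1, 2.8, 3.5, 2.9, 3.2]    # competence self-ratings before the course
post_scores = [3.9, 3.4, 3.8, 3.6, 3.7]   # ratings after the course

gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
mean_gain = sum(gains) / len(gains)

print(f"Individual gains: {gains}")
print(f"Mean gain in competence ratings: {mean_gain:.2f}")
# A real evaluation would additionally test whether this gain exceeds chance
# variation (e.g., with a paired t-test) and use validated instruments.
```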

Level 3 – Behavior

Evaluation on this level investigates whether changes in behavior have occurred because of attending a program. Thus, the question of whether the learning can be transferred into everyday life is crucial for this kind of evaluation. In the example of the parents attending a course on parenting styles, an evaluation study focusing on this level would mainly examine whether they apply what they have learned in their daily parenting behavior. Observations or interviews with the children might be the methods of choice in this specific case.

Level 4 – Results

Here, the focus of the evaluation is on the final results occurring because of participation in the program. The final results can include, for instance, increased production, improved quality, increased sales, decreased costs, or higher profit. Taking the parenting styles course as an example, an evaluation on this level would measure the long-run adaptation outcomes of the children of participating parents. However, the population and sample selection have to be carefully defined for such an evaluation.
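To keep the four levels apart in practice, an evaluation plan can simply record, for each level, the question asked and the instrument used. The sketch below does this in Python for the parenting-course example; the concrete questions and instruments are illustrative assumptions drawn from the descriptions above, not part of Kirkpatrick's model itself.

```python
# One possible way to organize an evaluation plan along Kirkpatrick's four
# levels, using the parenting course as the running example. The questions
# and instruments are illustrative assumptions.

evaluation_plan = {
    1: {"level": "Reactions",
        "question": "How much did parents like the course?",
        "instrument": "satisfaction questionnaire at the end of the course"},
    2: {"level": "Learning",
        "question": "Did parents improve their parenting competencies?",
        "instrument": "competence questionnaire or test before and after the course"},
    3: {"level": "Behavior",
        "question": "Do parents apply what they learned in daily parenting?",
        "instrument": "observations and interviews with the children"},
    4: {"level": "Results",
        "question": "Do the children show better long-term adaptation?",
        "instrument": "follow-up measures on a carefully defined sample"},
}

for number, entry in sorted(evaluation_plan.items()):
    print(f"Level {number} ({entry['level']}): {entry['question']}")
```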

What Standards Can Be Set to Enhance the Quality and Fairness of Evaluation?

Because of the increasing number of evaluations and the high impact that decisions based on their results may have, there has been an ongoing discussion since the early 1970s about guidelines for effective evaluations. In the absence of any clear definition of what constitutes a reasonable program evaluation, the Joint Committee on Standards for Educational Evaluation (JCSEE), in cooperation with evaluation and research specialists, compiled knowledge about program evaluation and established a systematic process for developing, testing, and publishing evaluation standards. The Joint Committee began its work in 1975. In 1989, its process for developing standards was approved by the American National Standards Institute, and the standards became available worldwide. Originally, the standards were developed to guide the design, implementation, and assessment of evaluations of educational programs, projects, and materials. Since then, the Program Evaluation Standards (JCSEE, 2011) have expanded to include applications of the standards in such other settings as medicine, nursing, the military, business, law, government, and social service agencies. The standards are published for people who commission and conduct evaluations as well as for people who use the results of evaluations.

The Joint Committee defined an evaluation standard as a principle that is mutually agreed upon by people engaged in the professional practice of evaluation and that, if met, will enhance the quality and fairness of an evaluation (JCSEE, 2011). The standards provide advice on how to judge the adequacy of evaluation activities, but they do not present specific criteria for such judgments. The reason is that there is no such thing as a routine evaluation, and the standards encourage the use of a variety of evaluation methods. The 30 standards are organized around five important attributes of sound and fair program evaluation: utility, feasibility, propriety, accuracy, and evaluation accountability (JCSEE, 2011).

Utility standards guide evaluations so that they will be informative, timely, and influential. They require evaluators to keep the stakeholders' needs in mind in all phases of the evaluation. The eight standards included in this category are Evaluator Credibility, Attention to Stakeholders, Negotiated Purposes, Explicit Values, Relevant Information, Meaningful Processes and Products, Timely and Appropriate Communication and Reporting, and Concern for Consequences and Influence.

Feasibility standards call for evaluations to be realistic, prudent, diplomatic, and economical. They recognize that evaluations usually are conducted in natural settings. The four standards included in this category are Project Management, Practical Procedures, Contextual Viability, and Resource Use.


Propriety Standards are intended to facilitate protection of the rights of individuals affected by an evaluation. They urge evaluators to act lawfully, scrupulously, and ethically. In this category, seven standards are included: Responsive and Inclusive Orientation, Formal Agreements, Human Rights and Respect, Clarity and Fairness, Transparency and Disclosure, Conflicts of Interests, and Fiscal Responsibility.

Accuracy Standards are intended to ensure that an evaluation will reveal and transmit accurate information about the program's merits. They determine whether an evaluation has produced sound information. The eight accuracy standards include Justified Conclusions and Decisions, Valid Information, Reliable Information, Explicit Program and Context Descriptions, Information Management, Sound Designs and Analyses, Explicit Evaluation Reasoning, and Communication and Reporting. Validity refers to the degree to which a piece of information or measure assesses what it claims to assess, while reliability refers to the degree to which a piece of information is free from 'measurement error,' that is, the exactness of the information.

Evaluation Accountability Standards propose the need to investigate, document, and improve evaluation quality and accountability. The three standards included in this category are Evaluation Documentation, Internal Metaevaluation, and External Metaevaluation. Metaevaluation is the evaluation of an evaluation. Internal Metaevaluations are done by the commissioners of the evaluation, whereas External Metaevaluations are conducted by evaluators who are not involved in the evaluation at hand.

For each standard, the Joint Committee has summarized guidelines for application and common errors. In addition, one or more illustrations of the standard's application are given (JCSEE, 2011). However, the Joint Committee acknowledges that the standards are not all equally applicable in every evaluation. Professional evaluators must make a selection, identifying those that are most applicable in a particular context. Evaluation standards are also provided by other committees and associations, such as Guiding Principles for Evaluators (American Evaluation Association, 2004; for further information on different standards see Patton, 2012).
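As a rough illustration of how the five attributes can structure an internal metaevaluation, the sketch below encodes the attribute names and the number of standards in each category (as listed above) and prints a simple coverage summary. The checklist mechanics and the example ratings are assumptions for illustration only, not a procedure prescribed by the JCSEE.

```python
# Sketch of a simple metaevaluation checklist built from the five JCSEE
# attributes described above. Attribute names and standard counts follow the
# text; the checklist mechanics and example ratings are purely illustrative.

jcsee_attributes = {
    "Utility": 8,
    "Feasibility": 4,
    "Propriety": 7,
    "Accuracy": 8,
    "Evaluation Accountability": 3,
}  # 30 standards in total

def checklist_summary(ratings):
    """ratings: dict mapping attribute -> number of standards judged to be addressed."""
    for attribute, total in jcsee_attributes.items():
        met = ratings.get(attribute, 0)
        print(f"{attribute}: {met}/{total} standards addressed")

# Hypothetical self-assessment of an evaluation design:
checklist_summary({"Utility": 6, "Feasibility": 4, "Propriety": 5,
                   "Accuracy": 7, "Evaluation Accountability": 1})
```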

How Can Benefits to Users Be Achieved?

Types of Evaluation Utilization

The utility standards present different approaches to enhance the utility of an evaluation. But there is not just one point of view concerning evaluation utilization; different types of evaluation utilization have emerged in recent decades (Contandriopoulos and Brousselle, 2012; Fleischer and Christie, 2009). Rossi et al. (2004) mentioned three types of utilization of evaluations: instrumental, conceptual, and persuasive/symbolic (see also Balthasar, 2006; Johnson, 1998). Balthasar and Johnson added process use (also called process-related use), a type that was originally suggested by Patton (1997; see also Alkin, 2005; Weiss et al., 2005). Weiss et al. (2005) added the so-called imposed use to the four aforementioned types. Imposed use is, of course, not yet an established type of use. Table 1 presents descriptions of these five types of evaluation utilization (see also Bergsmann and Spiel, 2009).

Table 1    Types of evaluation utilization

Instrumental use: Indicates “instances where respondents in the study could document the specific way in which the social science knowledge had been used for decision-making or problem-solving purpose” (Alkin, 2005: p. 435).

Conceptual use: Indicates “instances in which the knowledge about an issue influences a policy maker’s thinking although the information was not put to a specific, documentable use” (Alkin, 2005: p. 435).

Symbolic use: Indicates instances in which evaluations are used to “justify preexisting preferences and actions” (Weiss et al., 2005: p. 13).

Process use: “The evaluation process itself could have impact; that is, it could lead to utilization ... the act of conducting an evaluation and the interactions between the evaluator and various stakeholders can lead to changes in the individuals or in the organisation. Thus it is not the findings of the evaluation that contribute to use but the process itself” (Alkin, 2005: p. 435).

Imposed use: “It is a type of use that comes about because of pressure from outside” (Weiss et al., 2005: p. 16).

User-Oriented Evaluations

The future of evaluation is tied to the future effectiveness of programs (Patton, 1997). However, in order to improve programs, users' perspectives have to be integrated into program evaluation. Moreover, according to Patton (2012), the specific use of an evaluation should be defined by the users. Consequently, user-oriented evaluation is recommended because it requires and promotes systematic and continual critical reflection and feedback. Participants have the opportunity to learn the logic of evaluation and the discipline of evaluation reasoning. Prominent evaluation concepts in this field are utilization-focused (Patton, 1997), empowerment (Fetterman and Wandersman, 2005), participatory (King and Cousins, 2007), and collaborative evaluations (Cousins and Shulha, 2008).

Patton (1997; see also 2012) proposed the concept of utilization-focused evaluation. According to him, the evaluator and the users work together to reach the utilization defined by the users. Patton (2012) divides the evaluation into several steps (see Figure 1); each step should contribute to the specific and intended use. The first step is a stakeholder analysis: after identifying potential intended users of the evaluation, the primary intended users are determined. They are then invited to clarify and prioritize the primary purposes and intended uses of the evaluation. Determining purposes and intended uses of the evaluation is a further crucial step in the utilization-focused evaluation process. This step, including the prioritization of the evaluation questions, is elaborated in detail in the following section.

Figure 1    Concept of utilization-focused evaluation. The process starts with a stakeholder analysis and ends by evaluating the evaluation; in between, Patton (2012) distinguishes 12 steps: (1) identify interests and commitments of potential users; (2) determine primary intended users; (3) negotiate a process to involve primary intended users in making evaluation decisions; (4) determine the primary purposes and intended uses of the evaluation; (5) focus: prioritize evaluation questions and issues; (6) make design, methods, and measurement decisions; (7) simulate use with fabricated potential findings; (8) collect data; (9) organize data to be understandable to users; (10) actively involve users in interpreting findings; (11) facilitate intended use by intended users; (12) disseminate findings to potential users.

The following steps of the utilization-focused evaluation process (Figure 1) are similar to those of scientific research. However, users should be actively involved in all steps. The final step of the process is a metaevaluation of the evaluation (see also JCSEE, 2011).

Empowerment evaluation is another model that aims at improving the evaluated program, mostly through the development of the skills and autonomy of selected participants (Fetterman, 2005; Fetterman and Wandersman, 2005). It is generally conducted with a pool of program staff members and stakeholders who will be actively involved in identifying evaluation questions, collecting and analyzing data, and formulating next steps.

Goal Explication as an Important Prerequisite for User Orientation

Evaluation goals, including the intended uses, are defined by the clients who commission the evaluation. However, clients often do not have clear and consistent goals – either for their programs or for the evaluation – that can be evaluated for effectiveness. Consequently, they have not developed a kind of working model that relates actions within the program to the program goals. In such cases, professional evaluators should systematically work with the clients and representatives of all stakeholder groups to answer the following central questions (Spiel et al., 2010a; see also Spiel et al., 2012): (1) What are the goals of the program? (2) How should these goals be achieved? (3) How could program effectiveness (goal attainment) be recognized? (4) Are the needed resources available?

A workshop on goal explication is commonly organized to answer these four questions (see Spiel et al., 2010b). Using a wide range of facilitation techniques, the clients and stakeholders are supported in precisely describing their evaluation goals and prioritizing them (Question 1). Very often, the congruence between goals and activities is rather weak. Evaluators help the clients identify how the intended activities might work and how they can contribute to reaching the intended goals (Question 2). This also includes the systematic discussion of whether the needed resources (money and staff, but also the stakeholders' willingness to participate in the program) are available (Question 4). Answering the third question is typically very difficult for the clients of an evaluation. They have to identify measurable indicators of effectiveness, a step labeled 'operationalization' in scientific research (Vogt and Johnson, 2011). Consequently, answering the third question also includes defining the goals of the evaluation. Again, evaluators use facilitation techniques to support the process of identifying the indicators of effectiveness. If possible, criteria for effectiveness are also defined (Fink, 2005). Applying such a procedure should also contribute to establishing an 'evaluative attitude' in the clients and stakeholders in the long run.
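As an illustration of what such a working model might look like once the four questions have been answered, the sketch below records goals, activities, indicators, a criterion, and resources in a simple data structure. The literacy-program example and all concrete entries are hypothetical, not taken from an actual workshop.

```python
# Sketch of a "working model" that a goal-explication workshop might produce,
# linking a program goal to planned activities, measurable indicators
# (the operationalization), a criterion for effectiveness, and resources.
# All concrete entries are hypothetical examples.

goal_explication = [
    {
        "goal": "Improve high school students' reading skills",          # Question 1
        "activities": ["weekly reading tutorials", "teacher coaching"],  # Question 2
        "indicators": ["standardized reading test score",                # Question 3
                       "minutes of voluntary reading per week"],
        "criterion": "mean test-score gain of at least 0.3 SD within one school year",
        "resources": {"budget": "secured", "staff": "2 tutors per school",
                      "stakeholder buy-in": "to be clarified"},          # Question 4
    },
]

for entry in goal_explication:
    print(f"Goal: {entry['goal']}")
    print(f"  Indicators (operationalization): {', '.join(entry['indicators'])}")
    print(f"  Criterion for effectiveness: {entry['criterion']}")
```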

Concluding Remarks

In the last few decades, the demands for sound program evaluation have continued to grow to such an extent that some authors (e.g., Pawson and Tilley, as early as 1997) fear it may become mandatory that everything be evaluated. Therefore, it is essential that quality control procedures and standards are introduced for all facets and tasks of evaluation and that the evaluation itself is evaluated against pertinent standards. Furthermore, user orientation should be systematically considered. Several authors have made recommendations on how to recognize and consider the users' needs. However, there is also a risk of oversimplification, of merely working through a checklist of standards: there is no fixed recipe for how to run a successful evaluation.

See also: Alcohol Interventions: Disease Models vs. Harm Reduction; Economic Evaluations and Electoral Consequences; Educational Evaluation: Overview; Empowerment Evaluation; Implementation Science; Performance Appraisal and Evaluation; Social Experiments; Social Psychology: Research Methods.

Bibliography

Alkin, M., 2005. Utilization of evaluation. In: Mathison, S. (Ed.), Encyclopedia of Evaluation. Sage, Thousand Oaks, CA, pp. 434–436.
American Evaluation Association, 2004. Guiding Principles for Evaluators. American Evaluation Association, Fairhaven, MA.
Balthasar, A., 2006. The effects of the institutional design on the utilization of evaluation: evidence using qualitative comparative analysis (QCA). Evaluation: The International Journal of Theory, Research and Practice 12 (3), 1–16.
Bergsmann, E., Spiel, C., 2009. Context, standards and utilisation of evaluation in Austrian federal ministries. In: Fouquet, A., Méasson, L. (Eds.), L’évaluation des politiques publiques en Europe : Cultures et futurs/Policy and Programme Evaluation in Europe: Cultures and Prospects. L’Harmattan, Grenoble, pp. 187–198.
Berk, R.A., Rossi, P.H., 1999. Thinking about Program Evaluation. Sage, Thousand Oaks, CA.
Bickman, L., Guthrie, P.R., Foster, M., Lambert, W.E., Summerfelt, T., Breda, C., Heflinger, C., 1995. Managed Care in Mental Health: The Fort Bragg Experiment. Plenum, New York.
Contandriopoulos, D., Brousselle, A., 2012. Evaluation models and evaluation use. Evaluation 18 (1), 61–77.
Coryn, C.L.S., Noakes, L.A., Westine, C.D., Schroter, D.C., 2011. A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation 32 (2), 199–226.
Cousins, B., Chouinard, J.A., 2012. Participatory Evaluation Up Close: An Integration of Research-based Knowledge. Information Age Publishing, Charlotte, NC.
Cousins, B., Shulha, L., 2008. Complexities in setting program standards in collaborative evaluation. In: Smith, N., Brandon, P. (Eds.), Fundamental Issues in Evaluation. Guilford, New York, pp. 139–158.
Fetterman, D.M., 2005. Empowerment evaluation. In: Mathison, S. (Ed.), Encyclopedia of Evaluation. Sage, Thousand Oaks, CA.
Fetterman, D.M., Wandersman, A., 2005. Empowerment Evaluation: Principles in Practice. Guilford Press, New York.
Fink, A., 2005. Evaluation Fundamentals: Insights into Outcome, Effectiveness and Quality of Health Programs. Sage, Thousand Oaks, CA.
Fleischer, D.N., Christie, C.A., 2009. Evaluation use: results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation 30 (2), 158–175.
Gueron, J., Pauly, E., 1991. From Welfare to Work. Russell Sage, New York.
Hope, T., Foster, J., 1992. Conflicting forces: changing the dynamics of crime and community on a ‘problem’ estate. British Journal of Criminology 32 (4), 488–504.
Jacobs, W.J., Sisco, M., Hill, D., Malter, F., Figueredo, A.J., 2012. Evaluating theory-based evaluation: information, norms, and adherence. Evaluation and Program Planning 35 (3), 354–369.
JCSEE – Joint Committee on Standards for Educational Evaluation, 2011. The Program Evaluation Standards, third ed. Sage, Thousand Oaks, CA.
Johnson, R.B., 1998. Toward a theoretical model of evaluation utilization. Evaluation and Program Planning 21 (1), 93–110.
King, J.A., Cousins, J.B., 2007. Making sense of participatory evaluation: framing participatory evaluation. New Directions for Evaluation 2007 (114), 83–105.


Kirkpatrick, D., 1998. Evaluating Training Programs, second ed. Berrett-Koehler, San Francisco, CA.
Kirkpatrick, D.L., Kirkpatrick, J.D., 2010. Evaluating Training Programs: The Four Levels, third ed. Berrett-Koehler, San Francisco, CA.
Maggin, D.M., Chafouleas, S.M., Goddard, K.M., Johnson, A.H., 2011. A systematic evaluation of token economies as a classroom management tool for students with challenging behavior. Journal of School Psychology 49 (5), 529–554.
Mathison, S. (Ed.), 2005. Encyclopedia of Evaluation. Sage, Thousand Oaks, CA.
Mertens, D.M., Wilson, A.T., 2012. Program Evaluation Theory and Practice: A Comprehensive Guide. Guilford Press, New York.
Patton, M.Q., 1997. Utilization-focused Evaluation: The New Century Text, third ed. Sage, Thousand Oaks, CA.
Patton, M.Q., 2012. Essentials of Utilization-focused Evaluation. Sage, Thousand Oaks, CA.
Pawson, R., Tilley, N., 1997. Realistic Evaluation. Sage, London.
Rossi, P.H., Lipsey, M.W., Freeman, H.E., 2004. Evaluation: A Systematic Approach, seventh ed. Sage, Thousand Oaks, CA.
Scriven, M., 2002. Evaluation Thesaurus, fourth ed. Sage, Newbury Park, CA.
Spiel, C., Gradinger, P., Lüftenegger, M., 2010a. Grundlagen der Evaluationsforschung [Evaluation research basics]. In: Holling, H., Schmitz, B. (Eds.), Handbuch Statistik, Methoden und Evaluation [Handbook of Statistics, Methods, and Evaluation]. Hogrefe, Göttingen, pp. 223–232.
Spiel, C., Lüftenegger, M., Gradinger, P., Reimann, R., 2010b. Zielexplikation und Standards in der Evaluationsforschung [Goal explication and standards in evaluation research]. In: Holling, H., Schmitz, B. (Eds.), Handbuch Statistik, Methoden und Evaluation [Handbook of Statistics, Methods, and Evaluation]. Hogrefe, Göttingen, pp. 252–260.
Spiel, C., Schober, B., Reimann, R., 2006. Evaluation of curricula in higher education: challenges for evaluators. Evaluation Review 30 (4), 430–450.
Spiel, C., Schober, B., Reimann, R., 2012. Modelling and measurement of competencies in higher education: the contribution of scientific evaluation. In: Zlatkin-Troitschanskaia, O., Blömeke, S. (Eds.), Modeling and Measurement of Competencies in Higher Education. SensePublishers, Rotterdam, pp. 183–194.
Spiel, C., Strohmeier, D., 2012. Evidence-based practice and policy: when researchers, policy makers, and practitioners learn how to work together. European Journal of Developmental Psychology 9 (1), 150–162.
Stufflebeam, D.L., Shinkfield, A.J., 2007. Evaluation Theory, Models, & Applications. Jossey-Bass, San Francisco, CA.
Vogt, W.P., Johnson, R.B., 2011. Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. Sage, Thousand Oaks, CA.
Weiss, C., Murphy-Graham, E., Birkeland, S., 2005. An alternate route to policy influence: how evaluations affect D.A.R.E. American Journal of Evaluation 26 (1), 12–30.
Wholey, J.S., Hatry, H.P., Newcomer, K.E., 2010. Handbook of Practical Program Evaluation, third ed. Wiley, San Francisco, CA.
Wong, V.C., Cook, T.D., Barnett, W.S., Jung, K., 2008. An effectiveness-based evaluation of five state pre-kindergarten programs. Journal of Policy Analysis and Management 27 (1), 122–154.

Relevant Websites

http://www.eval.org/ – American Evaluation Association (AEA).
https://netforum.avectra.com/eWeb/StartPage.aspx?Site=APPAM&WebCode=HomePage – Association for Public Policy Analysis and Management (APPAM).
http://www.degeval.de/ – DeGEval – Gesellschaft für Evaluation.
http://www.europeanevaluation.org/ – European Evaluation Society (EES).
http://www.europeanevaluation.org/community/nese.htm – Network of Evaluation Societies in Europe (NESE).
http://www.evaluation.org.uk/ – UK Evaluation Society (UKES).