Evaluation and Program Planning 35 (2012) 439–444
Essential competencies for program evaluators in a diverse cultural context

Yi-Fang Lee (a), James W. Altschuld (b,*), Lung-Sheng Lee (c)

(a) National Taiwan Normal University, 162 HePing East Road Section 1, Taipei 10610, Taiwan
(b) The Ohio State University, 3253 Newgate Court, Dublin, OH 43017, United States
(c) National United University, 1 Lienda, Miaoli 36003, Taiwan

* Corresponding author. Tel.: +1 614 389 4585; fax: +1 614 688 3258. E-mail addresses: [email protected] (Y.-F. Lee), [email protected] (J.W. Altschuld), [email protected] (L.-S. Lee).

doi:10.1016/j.evalprogplan.2012.01.005
ARTICLE INFO

Article history: Received 21 February 2011; Received in revised form 10 July 2011; Accepted 21 January 2012; Available online 8 February 2012

Keywords: Cultural context; Evaluator competencies; Fuzzy Delphi study; Professional development

ABSTRACT

Essential evaluator competencies as identified by Stevahn, King, Ghere, and Minnema (2005) were studied in regard to how well they generalize to an Asian (Taiwan) context. A fuzzy Delphi survey with two iterations was used to collect data from 12 experts. While most competencies fit Taiwan, there were a number of unique ones. A complete set of results is provided, along with the implications of the findings and what they might mean for evaluation in Taiwan, particularly in relation to the professionalization of evaluation. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
It is accepted that a set of core competencies is a sine qua non of a profession (Stevahn, King, Ghere, & Minnema, 2006; Worthen, 1999). What are the core competencies of evaluation, and where do evaluators stand in thinking about them and about the training needed for quality assessments of programs and projects? A variety of skills, knowledge, and dispositions have been proposed for evaluators (Anderson & Ball, 1978; Dewey, Montrosse, Schroter, Sullins, & Mattox, 2008; Sanders & Worthen, 1970; Treasury Board of Canada Secretariat, 2001; Worthen, 1975), although consensus on what is required has not yet been reached. King and her colleagues (King et al., 2001; Stevahn et al., 2005, 2006) identified what they termed the essential competencies for program evaluators (ECPE). That groundbreaking work would be especially valuable if it generalized to a context quite different from the one in which it was initially developed. To that end, the ECPE was studied in Taiwan through a fuzzy Delphi survey. The major questions were:

(1) Does the ECPE framework fit the context of an Asian country?
(2) Are there unique competencies in Taiwan?
(3) If so, what factors contribute to them?

2. Theoretical review

Competence signifies some level of expertise with the multifaceted abilities a person needs to be successful in a field (Stevahn et al., 2005). Competencies are complex action systems of knowledge, skills, and strategies that can be applied to work within a profession, in conjunction with proper emotions, attitudes, and effective self-regulation (Rychen, 2001). They guide training programs, enhance reflective practice, and drive credentialing, licensure, and program accreditation procedures (Altschuld, 1999; Ghere, King, Stevahn, & Minnema, 2006; Stevahn et al., 2005).

Recently, evaluation has directed attention to core competencies to improve its status as a profession or "near-profession" (Worthen, 1999). Smith (1999, 2003) concluded that although the theory and practice of evaluation continued to evolve, there were persistent schisms within it. One is an epistemological divide between constructivists (Lincoln, 1994) and positivists (Shadish, Cook, & Campbell, 2002), although this may be diminishing with time. Another is whether an evaluator should be a program advocate, who assists programs (Fetterman, 2001), or an independent assessor, who values the importance of objectivity (Scriven, 1997). Despite such differences, other writers suggest that the field has attained legitimacy, with a recognized body of knowledge, methods of inquiry, and procedures (Altschuld, 2005; Davis, 1986; Gussman, 2005).

Two evaluation association-based projects (the Canadian Evaluation Society, CES, and the American Evaluation Association, AEA) focused on developing inventories of evaluator competencies. The Canadian project was led by Zorzi, McGuire, and Perrin (McGuire & Zorzi, 2005; Zorzi, McGuire, & Perrin, 2002; Zorzi, Perrin, McGuire, Long, & Lee, 2002), and for AEA it was
conducted by King, Stevahn, Ghere, and Minnema (Ghere et al., 2006; King et al., 2001; Stevahn et al., 2005, 2006). While the goals of the two efforts were similar (producing lists of competencies applicable in a wide range of settings), the approaches differed somewhat. The CES model included not only competencies but also types and phases of evaluation, which led to some overlap or repetition of competencies (Huse & McDavid, 2006). ECPE, on the other hand, linked or cross-walked several sources of competencies (CES Essential Skills Series in Evaluation, 1991; AEA Guiding Principles for Evaluators, 1995; Joint Committee on Standards for Educational Evaluation, 1994). Due to that feature, we chose ECPE as the framework for our study.

ECPE, based upon a review of the literature, consisted of an initial list of competencies which was then validated for appropriateness (King et al., 2001). Feedback was collected from conference presentations, expert consultations, and comparisons to existing sources in order to be as comprehensive as possible (Stevahn et al., 2005). The final product is a user-friendly taxonomy of 61 competencies in six categories: (a) professional practice (professional norms and values); (b) systematic inquiry (technical aspects of evaluations); (c) situational analysis (analyzing and attending to the contextual and political issues related to the evaluation); (d) project management (the nuts and bolts of managing an evaluation); (e) reflective practice (understanding one's practice and level of evaluation expertise); and (f) interpersonal competence (skills needed for implementing an evaluation). The six categories and the skills within them were a springboard for engaging evaluators in self-analysis and group discussion on an array of issues associated with practice. Stevahn et al. (2005) stressed that systematic studies are needed to achieve agreement on the importance of the essential competencies.

We explored the applicability of ECPE to an Asian context distinct from the one in which it was developed. Chouinard and Cousins (2009) proposed that cultural differences influence evaluation methodology and methods selection, intergroup dynamics, cross-cultural understanding, and evaluator roles, and provide insight into value perspectives and conflicts (SenGupta, Hopson, & Thompson-Robinson, 2004). Lee (1997) noted contrasts between Eastern and Western views: the East, to a notable degree, sees interpersonal relationships, cooperation, and an authoritarian orientation as more meaningful, compared with the West's focus on self-development, self-fulfillment, competition, and democratic decision making. Such conceptual views affect evaluation practice and what it means to be a qualified evaluator; for example, the East's greater emphasis on relationships could lead to less willingness to deal with or report negative results.

3. Description of research context

The focus of this study was the perceptions of evaluators involved in the Program Accreditation administered by the Higher Education Evaluation and Accreditation Council (HEEAC) of Taiwan. The missions of the HEEAC are to: develop indicators and mechanisms for university evaluation; conduct higher education evaluation and accreditation commissioned by the Ministry of Education (MOE); and advise the MOE on policy based on evaluation reports. Predicated on accreditation results, all programs are classified as accredited, conditionally accredited, or failed.
Such designations are important: they affect funding allocations and the size of student enrollments. Accreditation began in 2006, in response to a national law requiring that all programs be evaluated once every 5 years. The first-cycle evaluation (2006-2010) stressed input and process evaluations, with the emphasis shifting to process and outcome in the second cycle (2011-2015). Procedures include a self-evaluation and a 2-day on-site visit. The HEEAC invites 4-6 professors or experts in related fields to serve as evaluators for each site visit; they attend an orientation and a short training workshop held by the HEEAC. Activities during the site visit consist of interviews with stakeholders (faculty, staff, students, graduates, etc.), document review, survey administration, and classroom observation.

If the visits were to be done in a competent manner, there was a need to identify essential competencies for sound evaluation practice. This is especially relevant in Taiwan, where there has been progress toward the professionalization of evaluation but, at the same time, shortages of experienced evaluators are apparent, stable career opportunities in the field are lacking, and formal preparation programs need to be developed (Lee, Altschuld, & Hung, 2008). Evaluation in Taiwan has been influenced by the US (Lee et al., 2008), yet cultural forces shape what an evaluator does and what constitutes meaningful evaluation work. These were major considerations in this study.
4. Methodology

The main goal was to understand whether a western model of essential competencies for evaluators would be viewed as compatible with an Asian setting. To what degree does it fit or not, and if there are unique skills in Taiwan, what are the reasons for them? A secondary issue was to pinpoint needs for training evaluators.

A Delphi technique with two iterations was utilized. Its basic feature is sequential questionnaires interspersed with summary and feedback of opinions derived from previous responses (Dalkey & Helmer, 1963; Hung, Altschuld, & Lee, 2008; Linstone & Turoff, 1975; Powell, 2003). The researchers generated a nationwide list of 12 panelists using the criteria of acknowledged expertise and 10 years of evaluation experience. A letter was sent inviting them to join the panel, and all agreed to serve on two rounds of the survey. They averaged 17 years of teaching evaluation, doing relevant research, and conducting evaluations. They also had extensive service with prior evaluations of higher education, as well as involvement in numerous other evaluation endeavors.

4.1. First survey

The first survey was developed from the Stevahn et al. framework of 61 competencies in six categories. The quality of the Mandarin translation was reviewed by several faculty members, and minor modifications were made in accord with their suggestions. The panelists then judged the fit of each item to Taiwan in terms of whether it should be retained, removed, or modified; for items to be removed or modified, the panel provided concrete suggestions for refinement. One open-ended question was at the end of the survey for other comments. All panelists completed the instrument.

Nine items were retained and 45 were modified, with changes noted in the text (see Table 1). It must be emphasized that most modifications were considered minor and straightforward (language), with little or no alteration to the main concept/content of the item. A few needed more than fine-tuning. One example is the item using the phrase 'specify program theory': respondents noted that program theory would not have meaning in Taiwan, so words related to program logic models and the assumptions underlying the effectiveness of a program were used instead. Another case was the wording 'writes formal agreements', which would not be commonly understood; it was changed to 'writes evaluation proposals'. In addition, 10 new items were generated and eight were deleted or merged based on panel feedback (Table 2).
Table 1
Retention, removal, modification, or addition of essential competency items for program evaluators in Taiwan from the first survey. Retention, removal, modification, and addition counts reflect first-round panel judgments; Total is the resulting number of items for the second survey.

Category | # of items (Stevahn et al., 2005) | Retention | Removal | Modification | Addition | Total
Professional practice | 6 | 0 | 2 | 4 | 7 | 11
Systematic inquiry | 20 | 4 | 2 | 14 | 0 | 18
Situational analysis | 12 | 2 | 2 | 8 | 0 | 10
Project management | 12 | 3 | 2 | 7 | 1 | 11
Reflective practice | 5 | 0 | 0 | 5 | 0 | 5
Interpersonal competence | 6 | 0 | 0 | 6 | 2 | 8
Total | 61 | 9 | 8 | 45 | 10 | 63
Table 2
Changes made to the first survey for the second iteration.

Category | Removals/mergers | Additions
Professional practice | Conveys personal evaluation approaches and skills to potential clients (a); Contributes to the knowledge base of evaluation (a) | Prepares well prior to the evaluation process; Ensures the confidentiality of information; Respects and follows the evaluation process; Avoids possible inside benefit; Remains open to input from others (moved from another category); Recognizes and responds to policies related to evaluation practice; Identifies and supports the purposes of evaluation tasks
Systematic inquiry | Assesses validity and reliability of data (merged two items); Analyzes and interprets data (merged from 'Analyzes data' and 'Interprets data') | -
Situational analysis | Remains open to input from others (moved to the first category) (a); Modifies the study as needed (a) | -
Project management | Responds to requests for proposals (a); Budgets an evaluation and justifies cost needed (merged two items) | Assesses the cost and benefit for the project
Interpersonal competence | - | Conducts interviewing task; Considers peer relationship

(a) Item has been removed.
Many of the adjustments arose from cultural considerations (see later discussion). The result was a 63-item survey for the second round.

4.2. Second survey

The second survey had two scales per item: importance and current level of competency. Each competency was rated as to how important it is for a qualified evaluator in Taiwan to have, and as to the current competency level of evaluators (the experts on the panel were instructed to answer the latter based on their understanding of all evaluators in the country, not on their own level of skill).

A fuzzy Delphi technique was used, which employs a scale range instead of a single score to represent a rating. The fuzzy approach has the potential to deal with uncertainty/ambiguity in ratings, as is often the case in social science (Ross, 2004). It may be more realistic in terms of how respondents think when providing a score for an item, rather than making a single-point value judgment. Other benefits of the scale are novelty and an improved response rate, since respondents can more honestly make their ratings (Chang & Wang, 2006; Kaufmann & Gupta, 1988).

Fig. 1 is an example of the fuzzy Delphi scales employed in the second survey. Panelists marked a range from 0 on the low side to 1 on the high end, in .1 increments, for importance (what should be) and competency (what is). Complex mathematical calculation is required to generate fuzzy number results (detailed information is in an in-process paper). Each item yields the following summary results:
mR: the highest level of the panel's judgment
mL: the lowest level of the panel's judgment
mT: a single score representing the overall group judgment

The distance between mR and mL is an indication of spread, and only mT values higher than a specified criterion indicate essential competencies in terms of importance. It is also possible to derive need indices from the gap between the mT for importance and the mT for competency, and mT scores for an entire category can be estimated. The rationale for needs identification was based on the gap between "what is," the current state of affairs, and "what should be," the desired state (Altschuld & Witkin, 2000). All 12 panelists responded to the second round. A criterion of .675 for importance was determined by averaging group perceptions of what the standard should be for high-importance areas.
Fig. 1. Fuzzy Delphi scales used in the second survey.
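The exact fuzzy-number computation is described only in the authors' in-process paper, so the sketch below is a hypothetical reconstruction rather than their procedure. It assumes each panelist supplies a [lower, upper] interval on the 0-1 scale, aggregates each bound with a geometric mean (consistent with the table notes, which say the calculations come from geometric means), and defuzzifies mT as the geometric mean of mL and mR; that last choice, and all function and variable names, are our assumptions.

from math import prod

def fuzzy_summary(intervals):
    # A lower bound of 0 cannot enter a geometric mean, so such
    # responses are dropped, as the table notes in this article describe.
    usable = [(lo, hi) for lo, hi in intervals if lo > 0]
    n = len(usable)
    mL = prod(lo for lo, _ in usable) ** (1 / n)  # lowest level of the panel's judgment
    mR = prod(hi for _, hi in usable) ** (1 / n)  # highest level of the panel's judgment
    mT = (mL * mR) ** 0.5  # overall group judgment (assumed defuzzification)
    return mL, mR, mT

# Illustrative importance ratings from 12 panelists for one item,
# marked in .1 increments; the (0.0, 0.6) response is excluded.
ratings = [(0.5, 0.9), (0.6, 0.8), (0.4, 0.9), (0.5, 0.8),
           (0.6, 0.9), (0.3, 0.7), (0.5, 0.9), (0.6, 0.8),
           (0.4, 0.8), (0.5, 0.9), (0.0, 0.6), (0.6, 0.9)]
mL, mR, mT = fuzzy_summary(ratings)
print(f"mL={mL:.2f}  mR={mR:.2f}  mT={mT:.2f}")
print("essential item:", mT >= 0.675)  # the panel-set importance criterion

Under this reading, the spread mR - mL mirrors the article's use of the distance between the two values, and an item counts as essential only when mT clears the .675 criterion.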
5. Results

5.1. Judgments of essential competencies in the setting

Thirty-eight of the 63 items had mT values for importance at or surpassing the criterion, with the highest percentage (80%) in the reflective practice category, followed closely by interpersonal competence (75%), as noted in Table 3. The competencies rated highest in importance across categories were 'Avoids possible inside benefit' (mT = .77), 'Assesses validity and reliability of data' (.76), 'Develops recommendations' (.75), 'Prepares well prior to the evaluation process' (.75), and 'Acts ethically and strives for integrity and honesty in conducting evaluations' (.75). Five items exceeding the criterion for importance were new ones not in Stevahn et al.'s framework: 'Prepares well prior to the evaluation process', 'Ensures the confidentiality of information', 'Respects and follows the evaluation process', 'Avoids possible inside benefit', and 'Conducts interviewing task'.

Table 3
Number of items per competency category and number exceeding the importance criterion (.675).

Category | Number of items | Number exceeding .675
Professional practice | 11 | 8
Systematic inquiry (a) | 18 | 11
Situational analysis | 10 | 4
Project management (a) | 11 | 5
Reflective practice | 5 | 4
Interpersonal competence | 8 | 6
Total | 63 | 38

(a) Some respondents gave a score of 0 for the lower limit of an item, which made their ratings unusable for fuzzy numbers, whose calculations come from geometric means. These scores were excluded, so computations were based on fewer respondents.

All six categories attained mT values for importance across their item sets of greater than .66, and four surpassed the .675 criterion (Table 4). Because all of the categories were preselected, such patterns were not surprising. 'Professional practice' was considered the most important; the least important were 'Project management' and 'Situational analysis'. Generally, the results fell in a restricted range of mT values, from .66 to .70.
Table 4
Fuzzy number values for importance and current competency per category.

Category | Importance (mR / mL / mT) | Current competency (mR / mL / mT)
Professional practice | .82 / .43 / .70 | .74 / .53 / .60
Systematic inquiry (a) | .80 / .43 / .69 | .74 / .54 / .60
Situational analysis | .79 / .47 / .66 | .73 / .56 / .59
Project management (a) | .78 / .46 / .66 | .73 / .55 / .59
Reflective practice | .81 / .51 / .68 | .75 / .57 / .61
Interpersonal competence | .81 / .42 / .69 | .76 / .52 / .62

(a) Some respondents gave a score of 0 for the lower limit of an item, which made their ratings unusable for fuzzy numbers. These scores were excluded from the analysis.

5.2. Comparison of importance and current competency levels

Similar to importance, a narrow range of mT values for competency per category was observed, .59-.62 (Table 4). Evaluators on the whole were rated more competent in interpersonal competence and less so in project management and situational analysis. Among the individual items in the framework, evaluators were strongest in 'Aware of self as an evaluator' (mT for current competency = .67), 'Acts ethically and strives for integrity and honesty in conducting evaluations' (.66), and 'Communicates with clients throughout the evaluation process' (.66). While mT ratings for importance tended to be higher than those for current competency, the discrepancies for categories were small, ranging from .07 to .1.

5.3. Needs index values for competencies

Means difference analysis (MDA) is a commonly accepted procedure for studying discrepancies, and it was assumed to be a reasonable index for fuzzy scores. A mean of the importance and competency ratings across all items in a category was computed, and the gap between these means became the standard for judging discrepancies of individual items: if an item's gap was higher than the standard, the item was a need. Thirty-four items were greater than the standard, but the highest MDA value was only .19, suggesting that the needs for the competencies were not especially large (Table 5). The competencies with higher needs indices than others were 'Prepares well prior to the evaluation process', 'Assesses validity and reliability of data', and 'Reflects on personal evaluation practice'.
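For concreteness, here is a minimal sketch of the means difference analysis as just described: the category standard is the mean importance-competency gap across the category's items, and an item is flagged as a need when its own gap exceeds that standard. The item names and mT scores below are illustrative, not the study's data.

# Sketch of the MDA-based needs index (illustrative values only).
items = {  # item: (mT importance, mT current competency)
    "Prepares well prior to the evaluation process": (0.75, 0.56),
    "Ensures the confidentiality of information":    (0.72, 0.57),
    "Acts ethically in conducting evaluations":      (0.75, 0.66),
}

# Per-item gap between "what should be" and "what is".
gaps = {name: imp - comp for name, (imp, comp) in items.items()}

# Category standard: the mean gap across all items in the category.
cutoff = sum(gaps.values()) / len(gaps)

for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    status = "need" if gap > cutoff else "ok"
    print(f"{gap:.2f}  {status}  {name}")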
6. Discussion

6.1. Generalization of competencies to Taiwan

From the first survey, all panel members agreed with the approach of Stevahn et al. in relation to the clustered competencies, with mostly small changes needed for the Taiwanese context. A majority of items in each category were retained, with minor modifications in wording, for the second instrument, and on the second round they were judged high in importance. Based upon the data, Stevahn et al.'s framework worked relatively well, although contextual impacts were identified.

6.2. Contextual influences on the competencies

A few competencies were different for evaluators in Taiwan, such as skills in program management: 'Writes formal agreements', 'Budgets an evaluation', and 'Trains others involved in conducting the evaluation'. Despite the fact that the panel was expert in evaluation, the HEEAC context probably influenced responses. The tasks noted above were done by an HEEAC team, not by the evaluators, and in Taiwan a large number of evaluators have quite variable knowledge of and experience in program management. Similar observations were seen for competencies that are normally exercised before an evaluation begins or early in the process: 'Conducts literature review', 'Frames evaluation questions', and 'Develops evaluation designs'. Evaluators seldom or less than fully deal with these activities, so their judgments of importance in the second survey tended to be on the lower end of the spectrum (mT values less than .60). In regard to 'Systematic inquiry' skills, respondents commented that a detailed list of such competencies was not germane, since evaluation as a career was not well developed in the country; most faculty, even those more specialized in evaluation, work on it only part-time.
Table 5
Examples of items with higher needs indices per category.

Category | Examples of items exceeding the cutoff score (a) | MDA (b)
Professional practice | Prepares well prior to the evaluation process | .19
| Ensures the confidentiality of information | .15
| Avoids possible inside benefit | .13
Systematic inquiry | Assesses validity and reliability of data | .17
| Analyzes and interprets data | .14
| Identifies data sources | .12
Situational analysis | Serves the information needs of intended users | .14
| Respects the uniqueness of the evaluation site and client | .13
| Addresses conflicts | .11
Project management | Trains others involved in conducting the evaluation | .14
| Budgets an evaluation and justifies cost needed | .14
| Assesses the cost and benefit for the project | .10
Reflective practice | Reflects on personal evaluation practice | .16
| Pursues professional development in evaluation | .11
| Pursues professional development in relevant content areas | .10
Interpersonal competence | Uses written communication skills | .11
| Conducts interviewing task | .10
| Uses negotiation skills | .10

(a) The overall importance and satisfaction (competency) difference per category.
(b) The top three discrepancies that exceeded the cutoff score per category. The cutoff scores were .1, .09, .07, .07, .07, and .07 for the six categories, respectively.
This was also apparent in items dealing with reliability and validity; these topics were combined on the second survey, as were 'Analyzes data' and 'Interprets data' (Table 2).

Two items that differ from Stevahn et al.'s framework related to contextual influences. One was 'Conducts interviewing task': in Taiwan, all evaluators were required to conduct interviews with faculty, students, and staff during on-site visits, so the ability was viewed as important. The other was the removal of 'Contributes to the knowledge base of evaluation', where the panel commented that they contributed more to enhancing program quality than to enriching knowledge.

6.3. The impact of culture on responses

In Table 2, several items revealed the top-down, or authoritarian, environment. This was obvious from the items added to the survey ('Respects and follows the evaluation process', 'Recognizes and responds to policies related to evaluation practice', and 'Identifies and supports the purposes of evaluation tasks') and those removed ('Modifies the study as needed' and 'Conveys personal evaluation approaches and skills to potential clients'). Some respondents mentioned that because program accreditation was a government-mandated process, evaluators were expected to adhere to (as opposed to 'follow', in the American context) HEEAC procedures designed to ensure consistency and fairness. In Taiwan it was critical that evaluators assess outcomes related to policies; accreditation was obviously a means of carrying out government policy.

Another cultural dimension was apparent in two new items in professional practice: 'Avoids possible inside benefit' and 'Ensures the confidentiality of information'. Due to the relatively small size of the country, evaluators and faculty members in a program being evaluated might know each other, even to the extent of long-term friendships. It was observed (anecdotally) that a few evaluators leaked preliminary results or inside information, undermining the fairness of the evaluation; hence the concern noted above. On the other hand, these delicate interpersonal connections are intricately interwoven into the fabric of the society, and rigid or harsh rules might not fully work. Instrument reviewers were keenly aware of this, leading to the item 'Considers peer relationship'.
6.4. The usage of fuzzy numbers (potential biases in results)

The idea of using a fuzzy scale to capture the subtle nature of responses appeared to work well in the study. All respondents made judgments in a range of score values, and discrepancies between the upper and lower ends were obtained (Table 4). The approach seemed utilitarian, but a few problems occurred in data analysis. First, it relies heavily on complex mathematical models, and there are multiple ways to deal with the data (Chen & Hwang, 1992). The technique, mainly employed in fields like engineering and computer science, has had limited application and research in social science and education. Second, this was a needs assessment, where such research is even less abundant, and it was more of a preliminary or exploratory study. Given these factors, a straightforward way to calculate scores based on Chen and Hwang was used. Does it really fit this situation? Is it appropriate? Are there problems in the calculations? What about the validity and reliability of collecting data with fuzzy scales? These questions underscore the need for further investigation. Another issue is that some respondents gave a score of 0 for the lower limit of an item, which made their ratings unsuitable for fuzzy numbers; the calculations are based on geometric means, so in the future such responses will have to be eliminated through better directions to respondents.

6.5. No competencies obtained high need index values

As mentioned previously, a secondary interest of the study was to begin to identify training needs for evaluators. In Table 5, even though some competencies had higher need indices than others, the indices were not very pronounced, and pressing needs for improvement were not clear. This might stem from the HEEAC's requirement that all evaluators complete short-term training on evaluation knowledge, skills, and ethics prior to the evaluation. Moreover, there may be errors inherent in the mR and mL values underlying the needs index, due to how they were calculated and the small number of panel members.

7. Lessons learned

Our study to a high degree validated Stevahn et al.'s taxonomy of evaluator competencies in a context quite different from the one
in which it was developed. It fit Taiwan, with the exception of some unique competencies. Additional items (like 'Avoids possible inside benefit' and 'Ensures the confidentiality of information') were highly associated with the norms and values of the national culture. Thus the assessment of the core body of skills, knowledge, and dispositions for evaluators was improved by taking into consideration subtle aspects of the setting.

Another relevant factor affecting how competencies are viewed might be the maturity of the evaluation profession in the country. Compared with the West, evaluation in Taiwan is in a premature state: unstable career opportunities, limited need for evaluation specialists, and less formal preparation for evaluators. Under these conditions, detailed descriptions of competencies might not have mattered that much, which is why some items had to be merged or were rated low in importance.

Lastly, this was an exploratory investigation, and more validation is needed. Would other ways of calculating fuzzy numbers produce different results? Will the same results occur with a larger sample? Would there be similar outcomes in other Asian countries? If not, what cultural factors influence the results? We just do not know. To answer a few of these questions, the researchers are conducting a follow-up study to contrast fuzzy and Likert scales, as well as to replicate the findings with a much bigger group of respondents.

Acknowledgement

This study was supported under a National Science Council Research Grant in Taiwan (NSC 98-2410-H-260-034).

References

Altschuld, J. W. (1999). The certification of evaluators: Highlights from a report submitted to the Board of Directors of the American Evaluation Association. American Journal of Evaluation, 20(3), 481-493.
Altschuld, J. W. (2005). Certification, credentialing, licensure, competencies, and the like: Issues confronting the field of evaluation. Canadian Journal of Program Evaluation, 20(2), 157-168.
Altschuld, J. W., & Witkin, B. R. (2000). From needs assessment to action. Thousand Oaks, CA: Sage Publications.
Anderson, S. B., & Ball, S. (1978). The profession and practice of program evaluation. San Francisco, CA: Jossey-Bass.
Chang, P. C., & Wang, Y. W. (2006). Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry. Expert Systems with Applications, 30(4), 715-726.
Chen, S. J., & Hwang, C. L. (1992). Fuzzy multiple attribute decision making: Methods and applications. New York: Springer-Verlag.
Chouinard, J. A., & Cousins, J. B. (2009). A review and synthesis of current research on cross-cultural evaluation. American Journal of Evaluation, 30(4), 457-494.
Dalkey, N. C., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458-467.
Davis, B. G. (1986). Overview of the teaching of evaluation across disciplines. New Directions for Program Evaluation, 29, 5-14.
Dewey, J. D., Montrosse, B. E., Schroter, D. C., Sullins, C. D., & Mattox, J. R. (2008). Evaluator competencies: What's taught versus what's sought. American Journal of Evaluation, 29(3), 268-287.
Fetterman, D. M. (2001). The transformation of evaluation into a collaboration: A vision of evaluation in the 21st century. American Journal of Evaluation, 22(3), 381-385.
Ghere, G., King, J. A., Stevahn, L., & Minnema, J. (2006). A professional development unit for reflecting on program evaluator competencies. American Journal of Evaluation, 27(1), 108-123.
Gussman, T. K. (2005). Improving the professionalism of evaluation. Ottawa: Centre for Excellence in Evaluation, Treasury Board Secretariat of Canada.
Hung, H. L., Altschuld, J. W., & Lee, Y. F. (2008). Methodological and conceptual issues confronting a cross-country Delphi study of educational program evaluation. Evaluation and Program Planning, 31(2), 191-198.
Huse, I., & McDavid, J. C. (2006). Literature review: Professionalization of evaluators. Prepared for the CES Evaluation Professionalization Project. University of Victoria.
Kaufmann, A., & Gupta, M. M. (1988). Fuzzy mathematical models in engineering and management science. Amsterdam: North-Holland.
King, J. A., Stevahn, L., Ghere, G., & Minnema, J. (2001). Toward a taxonomy of essential evaluator competencies. American Journal of Evaluation, 22(2), 229-247.
Lee, E. (1997). Overview: The assessment and treatment of Asian American families. In E. Lee (Ed.), Working with Asian Americans: A guide for clinicians (pp. 3-36). New York: Guilford Press.
Lee, Y. F., Altschuld, J. W., & Hung, H. L. (2008). Practices and challenges in educational program evaluation in the Asia-Pacific region: Results of a Delphi study. Evaluation and Program Planning, 31(4), 368-375.
Lincoln, Y. S. (1994). Tracks toward a postmodern politics of evaluation. Evaluation Practice, 15(3), 299-310.
Linstone, H. A., & Turoff, M. (1975). The Delphi method: Techniques and applications. Reading, MA: Addison-Wesley.
McGuire, M., & Zorzi, R. (2005). Evaluator competencies and professional development. Canadian Journal of Program Evaluation, 20(2), 73-99.
Powell, C. (2003). The Delphi technique: Myths and realities. Journal of Advanced Nursing, 41(4), 376-382.
Ross, T. J. (2004). Fuzzy logic with engineering applications. Hoboken, NJ: John Wiley & Sons.
Rychen, D. S. (2001). Introduction. In D. S. Rychen & L. H. Salganik (Eds.), Defining and selecting key competencies (pp. 1-15). Seattle, WA: Hogrefe and Huber.
Sanders, J. R., & Worthen, B. R. (1970). An analysis of employers' perceptions of the relative importance of selected research and research-related competencies and shortages of personnel with such competencies (Technical Paper No. 3). Boulder, CO: AERA Task Force on Research Training, Laboratory of Educational Research.
Scriven, M. (1997). Truth and objectivity in evaluation. In E. Chelimsky & W. R. Shadish (Eds.), Evaluation for the 21st century: A handbook. Thousand Oaks, CA: Sage Publications.
SenGupta, S., Hopson, R., & Thompson-Robinson, M. (2004). Cultural competence in evaluation: An overview. New Directions for Evaluation, 102, 5-19.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Smith, M. F. (1999). Should AEA begin a process for restricting membership in the profession of evaluation? American Journal of Evaluation, 20(3), 521-531.
Smith, M. F. (2003). The future of the evaluation profession. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 373-386). Boston, MA: Kluwer Academic.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2005). Establishing essential competencies for program evaluators. American Journal of Evaluation, 26(1), 43-59.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2006). Evaluator competencies in university-based evaluation training programs. Canadian Journal of Program Evaluation, 20(2), 101-123.
Treasury Board of Canada Secretariat. (2001). Evaluation policy. Retrieved December 10, 2010, from http://www.tbs-sct.gc.ca.
Worthen, B. R. (1975). Some observations about the institutionalization of evaluation. Evaluation Practice, 16, 29-36.
Worthen, B. R. (1999). Critical challenges confronting certification of evaluators. American Journal of Evaluation, 20(3), 533-555.
Zorzi, R., McGuire, M., & Perrin, B. (2002). Evaluation benefits, outputs, and knowledge elements: Canadian Evaluation Society project in support of advocacy and professional development. Retrieved July 28, 2006, from http://consultation.evaluationcanada.ca/pdf/ZorziCESReport.pdf.
Zorzi, R., Perrin, B., McGuire, M., Long, B., & Lee, L. (2002). Defining the benefits, outputs, and knowledge elements of program evaluation. Canadian Journal of Program Evaluation, 17(3), 143-150.
Yi-Fang Lee, Ph.D., is an associate professor at National Taiwan Normal University in Taiwan. She has published and presented in the areas of needs assessment and Science, Technology, Engineering, and Mathematics (STEM) retention among underrepresented minorities.

James W. Altschuld, Ph.D., is a Professor Emeritus at The Ohio State University. He has presented and published extensively on evaluation topics, especially needs assessment and the evaluation of science and technology education.

Lung-Sheng Lee, Ph.D., is a professor and the president of National United University, Taiwan. He has participated in a variety of institutional and program evaluations for more than 30 years.