Evaluation and Program Planning 41 (2013) 12–18
Further considerations of evaluation competencies in Taiwan

Yi-Fang Lee (a), James W. Altschuld (b), Lung-Sheng Steven Lee (a)

(a) National Taiwan Normal University, 162 HePing East Road Section 1, Taipei 10610, Taiwan
(b) The Ohio State University, 3253 Newgate Court, Dublin, OH 43017, United States
Article history: Received 4 August 2012; received in revised form 6 May 2013; accepted 13 June 2013.

Keywords: Evaluator competency; Training needs

Abstract

A list of evaluator competencies (Stevahn, King, Ghere, & Minnema, 2005) was adapted to fit the Taiwanese context by Lee, Altschuld, and Lee (2012). The present study examined how well that adaptation generalized to a large sample in Taiwan. Likert and Fuzzy surveys with needs assessment formats (importance and competence) were mailed, via random assignment, to two groups of participants. The questions for the study were: do the modified competencies apply country-wide in Taiwan, did the investigation uncover training needs for evaluators, and were there convergent rating patterns across the two forms of the instrument? The results supported a fit of the modified competencies to the context and convergent validity was observed, but strong competency needs were not apparent. Reasons for the findings and implications for future research are discussed.

© 2013 Elsevier Ltd. All rights reserved.
1. Introduction

The field of educational program evaluation (EPE) is progressing toward becoming a profession around the world. This is true for the Asia-Pacific region, but at the same time concerns have arisen there about the quality of evaluations, shortages of experienced, trained evaluators, and identification of the skills or competencies for qualified persons (Garden, 2010; Hay, 2010; Hung, Altschuld, & Lee, 2012; Kumar, 2010; Lee, Altschuld, & Hung, 2008). Accordingly, in 2011 Lee et al. (2012) conducted research in the region related to the topic. It was based on the work of Stevahn and colleagues (King, Stevahn, Ghere, & Minnema, 2001; Stevahn, King, Ghere, & Minnema, 2005, 2006), which focused on the Essential Competencies for Program Evaluators (ECPE). Their fit to an Asian country (Taiwan) was investigated via a two-round Delphi technique with a small panel (n = 12) of experts, which led to minor adaptations to the framework. The current effort reexamined that preliminary work with hundreds of respondents in Taiwan. Two scales (Likert, Fuzzy) were used to measure competencies via a needs format (desired and current statuses), which permitted the examination of discrepancies. By this means, further evidence was obtained for the validity and generalizability of the competencies to the context in question. What was learned can also be used to enhance the professionalization of the field of evaluation. The major questions were:
(1) Do the modified competencies fit for a large sample of Taiwanese faculty members who are regularly involved in the evaluation (accreditation) of higher education? (2) If they do, what are their perceptions of the "what should be" status and of their own personal status? (Are needs apparent?) (3) Are the results from the two instrument formats alike or different? (Was there convergent validity?)

2. Summary from the authors' prior 2011 research

The taxonomy of evaluator competencies of Stevahn et al. (2005) was reviewed by experts in Taiwan in a Delphi survey, leading to small wording and content changes. The revised competencies in six clusters were then put into a traditional needs assessment format and rated by the same panelists on Fuzzy importance and attainment scales. Ratings were high and the experts felt that the items and categories made sense for Taiwan. While this finding supported the relevance of Stevahn's framework to Taiwan, some unique skills were identified ('Respects and follows the evaluation process' and 'Ensures the confidentiality of information'), making for a more culturally sensitive investigation. For example, a top-down or authoritarian environment, a strong link between outcomes and government policy, and the delicate interpersonal balance between the evaluators and those being evaluated were factors noted in panelist comments. By building in such factors, the assessment of evaluator skills, knowledge, and dispositions would be improved. Areas for change, based on discrepancies between importance and current levels of competency, were not found. What might have contributed to this result (small panel size, issues in the Fuzzy scales, among others) was discussed, and further research with a larger sample and alternative scaling was recommended.
3. Evaluator roles and competencies

Evaluators deal with conflicts and complex conditions in practice, as noted in the literature (Lynch, 2007; Mark, 2002; Montgomery, 2000; Patton, 2008; Russ-Eft, Bober, de la Teja, Foxon, & Koszalka, 2008; Turner, 2006). An especially important investigation was that of Skolits, Morrow, and Burr (2009), who posited three phases of evaluation (pre, active, and post) in which evaluators serve as managers, detectives, designers, negotiators, diplomats, researchers, judges, reporters, use advocates, and learners. They generated descriptions of each role and the kinds of competencies required.

Another effort was that of King, Stevahn, and their colleagues. They perceived that tasks, skills, and content areas had to be "derived from a systemic process or validated by empirical consensus building among diverse professionals" (King et al., 2001, p. 230). To that end, they produced 61 competencies in six domains: systematic inquiry, professional practice, situational analysis, project management, reflective practice, and interpersonal competence (King et al., 2001; Stevahn et al., 2005). The categories and the items in them should guide evaluators in reflection, self-analysis, and discussion about an array of knowledge, skills, and dispositions affecting practice. Zorzi, McGuire, and Perrin (2002) used the framework for a study completed under the aegis of the Canadian Evaluation Society (CES). Forty-nine competencies in five practices were generated: reflective, technical, situational, management, and interpersonal. After member consultation and expert review, what they produced was approved by CES as related to its new credentialing system for evaluators (CES, 2009).

The efforts of all of the above researchers align well with Spencer and Spencer's (1993) model of competency design. Its steps are: (1) define performance effectiveness criteria; (2) identify a criterion sample; (3) collect data; (4) analyze data and develop a competency model; (5) validate the competency model; and (6) prepare applications of the competency model. Most of what was done previously tied into the first four steps but not so much to steps 5 and 6, especially the idea of validation (Wilcox, 2012). This presents an opportunity.

Wilcox (2012) explored this gap using a unified theory of validity from Messick (1989, 1995). Evidence was collected to demonstrate the extent to which the ECPE met five validity criteria: content-related, substantive-related, consequence-related, generalizability-related, and externally related evidence. For each criterion, the respective questions were: to what extent do the ECPE measure an evaluator's competence, are they inclusive of all competencies necessary for an evaluator to conduct evaluations, do the uses or interpretations of the competencies avoid negative consequences for evaluators, are they applicable to practice in various areas, and does competence correlate with other measures of competencies? Data from a survey and interviews were gathered and analyzed, indicating strong support for the first three criteria with mixed and limited evidence for the latter two. An additional concern is how well the competencies work across different contexts and samples (Stevahn et al., 2005; Wilcox, 2012). The current effort is exactly along these lines.

4. Setting and sample
The context was University Evaluation Programs administered by the Higher Education Evaluation and Accreditation Council (HEEAC) of Taiwan. The Council was established in 2005 to
determine whether degree-granting programs in a spectrum of fields in colleges and universities meet standards of quality. Institutions must take part in the process once every five years, except for engineering, computing, technical, and architectural education, which are accredited by the Institute of Engineering Education. Accredited status has to be renewed periodically to demonstrate continuing fulfillment of HEEAC criteria. Programs classified as conditionally accredited or failing may have their funding and/or the size of their student enrollments reduced. With stakes this high, the Council strove for consistency and fairness in its procedures (self-evaluation, site visits).

Four to six professors or experts in appropriate disciplines are invited to be 'evaluators' for the visit. They have variable knowledge of and experience in program management and evaluation, and receive some training (usually one day) from HEEAC for the accreditation process (being on site, interviewing, compiling results, writing the report). Most, even those who are considered specialists, have only part-time involvement in evaluation. 'Evaluators' in this setting differ from what would be the case in other parts of the world, where the term refers more to individuals professionally trained in the field. Although the sample was drawn from those trained for site visits, the experience of some of its members was not limited to accreditation but extended to various other evaluation activities in higher education (see Section 6.1). Therefore, they would be capable of judging the competencies essential for a qualified program evaluator.

5. Methodology

Surveys with Likert and Fuzzy formats were generated about the aforementioned evaluator competency framework as adapted to Taiwan. Is it valid for the country in the eyes of hundreds of respondents, and if so, are training needs uncovered? Does the structure of the instruments produce similar or different results?

5.1. Instrument design

The instruments were based upon findings from the authors' 2011 study. They had six categories with 63 modified competencies from Stevahn's framework: Professional practice (n = 11), Systematic inquiry (18), Situational analysis (10), Project management (11), Reflective practice (5), and Interpersonal competence (8). Demographic and evaluation experience questions were included, e.g., how many times was a person involved in evaluation for higher education in the last 3 years, do you teach/conduct evaluation, do relevant research, etc.? An open-ended question was provided for comments. Each item in a category was rated in terms of its importance for a qualified evaluator in Taiwan and the current competency of the respondent in regard to it. One form utilized a traditional 5-point Likert scale and the other a Fuzzy one. Instead of a single score, the Fuzzy scale asks for a range, a low end and a high end, going from just above 0 to 1 in .1 increments; higher values denote more positive ratings (see Figs. 1 and 2).

Fig. 1. Examples of Likert scales used in this study.
Fig. 2. Examples of Fuzzy scales used in this study.

5.2. Data analysis

Descriptive statistics (mean, standard deviation) were calculated for Likert scores. A critical value (3.5) was chosen arbitrarily by the researchers for a one-sample t-test, and if an item was rated significantly higher in importance it was designated an essential competency. Chen and Hwang's (1992) method was employed for Fuzzy scores, in which respondents' ratings were transformed to triangular Fuzzy numbers and then 'defuzzified' to synthesize the scores into a consensus of all participants. Three values are reported: mR, the highest level of the judgment; mL, the lowest; and mT, a single score for the overall group judgment. The distance between mR and mL represented variability. The middle point of the interval (0.5) was the screen for importance (Bodjanova, 2006); only items above it were examined in subsequent analyses.
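To make the two scoring procedures concrete, the following is a minimal Python sketch. It assumes Likert ratings stored as integers from 1 to 5 and Fuzzy ratings stored as (low, high) pairs in [0, 1]; the function names are hypothetical, and the triangular construction and defuzzification rule are simplified stand-ins for the Chen and Hwang (1992) procedure, not the authors' exact computation.

```python
import numpy as np
from scipy import stats

CRITICAL_VALUE = 3.5  # cutoff chosen by the researchers for the Likert importance ratings

def essential_by_ttest(likert_item_ratings, cutoff=CRITICAL_VALUE, alpha=0.05):
    """One-sample t-test: is an item's mean importance rating significantly above the cutoff?"""
    ratings = np.asarray(likert_item_ratings, dtype=float)
    t_stat, p_two_sided = stats.ttest_1samp(ratings, cutoff)
    # one-sided decision: the mean must exceed the cutoff and the halved p-value be significant
    return ratings.mean() > cutoff and (p_two_sided / 2) < alpha

def fuzzy_consensus(range_ratings):
    """Aggregate (low, high) Fuzzy range ratings into group values mL, mR, mT.

    Illustrative only: each range is read as a triangular fuzzy number with its
    midpoint as the mode; the group numbers are simple averages and mT is the
    centroid of the averaged triangle. Chen and Hwang (1992) describe the full
    procedure; this sketch only mirrors the quantities reported in the paper.
    """
    lows, highs = np.asarray(range_ratings, dtype=float).T
    mL, mR = lows.mean(), highs.mean()                 # lowest and highest group judgments
    mT = (mL + (lows + highs).mean() / 2 + mR) / 3.0   # centroid of the group triangle
    return mL, mR, mT

# Example: five respondents rating one item on each form (hypothetical data)
print(essential_by_ttest([5, 4, 5, 4, 5]))   # True: mean 4.6 exceeds 3.5 significantly
print(fuzzy_consensus([(0.5, 0.8), (0.6, 0.9), (0.4, 0.7), (0.6, 0.8), (0.5, 0.9)]))
```

Under this reading, an item passes the Fuzzy importance screen when its mT exceeds the 0.5 midpoint of the scale.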
Results from the two forms were examined to see whether they were similar. A multi-trait, multi-method approach, as outlined by Campbell and Fiske (1959), suggests that a strong correlation between different measures of one trait is evidence of convergent validity. That concept was incorporated here by comparing rating patterns by category and item across the Likert and Fuzzy scales. Mean difference analysis (MDA), in which the gap between the overall means of the importance and current status scores in a category serves as the standard for judging individual item discrepancies, was used to identify needs: if an item's gap was higher than that value, it was flagged as a need. A similar process was applied to the Fuzzy scores, with mT standing in for the mean.
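The needs screen described above reduces to comparing item-level gaps against the category-level gap. A minimal sketch follows, assuming per-item mean scores are already available (Likert means, or mT values for the Fuzzy form); the function name and data layout are illustrative.

```python
import numpy as np

def mda_needs(importance_means, competency_means):
    """Flag items whose importance-competency gap exceeds the category-level gap (MDA cutoff).

    importance_means, competency_means: per-item mean ratings for one category,
    in the same item order (Likert means, or mT values for the Fuzzy form).
    """
    imp = np.asarray(importance_means, dtype=float)
    comp = np.asarray(competency_means, dtype=float)
    item_gaps = imp - comp
    cutoff = imp.mean() - comp.mean()   # overall category discrepancy
    return item_gaps > cutoff           # True where an item is flagged as a need

# Example with hypothetical per-item means for one category
print(mda_needs([4.7, 4.5, 4.2, 4.4], [4.5, 4.4, 4.1, 4.0]))
```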
5.3. Procedures

The survey was mailed in 2011. The intended population was faculty/experts participating in HEEAC accreditation. A complete listing was not available for public use, but the Council had suggested 10–20 evaluators to each university program and asked for others to avoid possible conflicts of interest. The researchers were able to access 45 such lists and from them compiled a group of 720 'evaluators', the total sample. A letter was sent inviting participation; 18 individuals indicated they did not fit, reducing the sample to 702. The Likert and Fuzzy scaled forms were distributed randomly, with 351 people assigned to each form. Three reminders were sent at reasonable intervals.

6. Results

6.1. Return rates and participant characteristics

The returns were 119 for the Likert form and 117 for the Fuzzy one (33.90% and 33.33%). The respondents were very similar in characteristics: experience, involvement in evaluation, and having served as HE program evaluators (Table 1). Demographics (position, affiliation) were also parallel. Compared to the respondents in the prior study, the overall group was larger, more diverse in professional background, and less experienced/involved in evaluation (Table 2). The latter differences were expected due to the enlarged sample.

Missing data were observed for both forms but were slightly more pronounced for the Fuzzy survey. For the Likert form there were 14,994 possible responses across the double scales, with 463 (3.09%) missing; for the other form the percentage was 6.99% (1,030/14,742). Fifteen respondents left all items in one category blank on the Fuzzy form; this did not occur on the Likert form. Analyses were done with the maximum number of usable responses per item.

Table 1
Number of respondents, return rates, and participants' characteristics for the forms.

                                                        Likert scale   Fuzzy scale
Number of respondents                                   119            117
Return rate (%)                                         33.90          33.33
Average years teaching/conducting evaluation and
  doing relevant research (mean/SD)                     12.21/10.71    12.30/10.39
Average times being an evaluator for HE programs
  in the past 3 years (mean/SD)                         5.57/3.64      5.32/4.31
Table 2
A comparison of characteristics of respondents from the prior study and the current one.

                                                  Prior study in 2011   Current study
Total number of respondents                       12                    236
Professional fields (%)
  Education                                       75.00                 17.3
  Business                                        8.33                  32.7
  Humanity                                        8.33                  19.4
  Science                                         8.33                  4.55
  Others                                          0.00                  26.5
Average years teaching/conducting evaluation and
  doing relevant research (mean/SD)               16.92/9.91            12.04/10.56
Average times being an evaluator for HE programs
  in the past 3 years (mean/SD)                   11.5/9.16             5.23/4.03
6.2. Overall results for the competency framework

Sixty-two of the 63 items surpassed the standard for importance on the Likert instrument, and all items were above the standard on the Fuzzy one (Table 3). Because items were preselected from the previous exploratory study (Lee et al., 2012), this was anticipated. For the Likert importance ratings, the overall mean was 4.38 and the six category means ranged from 4.21 to 4.69 (Table 4).
Table 3
Items in competency categories exceeding the criterion for importance for the forms.

Category                   Number of items   Number exceeding 3.5    Number exceeding .5
                                             in Likert form          in Fuzzy form
Professional practice      11                11                      11
Systematic inquiry         18                18                      18
Situational analysis       10                10                      10
Project management         11                11                      11
Reflective practice        5                 5                       5
Interpersonal competence   8                 7                       8
Total                      63                62                      63
Cronbach's alpha was .97 for the 63 items and ranged from .81 to .97 for the categories; reliability was high. For the Fuzzy ratings, the range was also restricted, with mT values of .60 to .68, and reliability ranged from .87 to .99 for the categories and was .96 for all items (Table 5). For both forms, professional practice was the highest category and project management the lowest.

Tables 4 and 5 also contain category competency scores. The results fell into small ranges, 4.14 to 4.65 (Likert) and .60 to .69 (Fuzzy). Competency alphas were comparable to those for importance, and on both instruments respondents saw themselves as more competent in professional practice and less so in project management. An overall view of rating patterns across forms suggested that the Likert and Fuzzy scales produced nearly the same results for importance and current competency. There was support for convergent validity across the forms.

6.3. Item highlights
The categories were rated as important, as were most items. It is noteworthy that the highest rated items in importance were primarily in one category, Professional practice. The top four on the Likert form were 'Ensures the confidentiality of information' (M = 4.88), 'Acts ethically and strives for integrity and honesty in conducting evaluations' (4.86), 'Avoids possible inside benefit' (4.85), and 'Makes judgments' (4.78), with three of them in the professional practice category. Analogous findings came from the Fuzzy instrument, where the top two items were the same as above (mT = .74 and .72), followed by 'Respects and follows the evaluation process' (.71) and 'Applies professional evaluation standards' (.70). Likewise, the highest competencies matched across forms: 'Ensures the confidentiality of information,' 'Avoids possible inside benefit,' 'Respects and follows the evaluation process,' and 'Acts ethically and strives for integrity and honesty in conducting evaluations.'
Table 4
Descriptive data for importance and current competency per category in the Likert scaled form.

                            Importance                        Current competency
Category                    Mean   SD    Cronbach's alpha     Mean   SD    Cronbach's alpha
Professional practice       4.69   .31   .81                  4.65   .35   .87
Systematic inquiry          4.35   .47   .93                  4.30   .46   .94
Situational analysis        4.26   .49   .90                  4.23   .55   .92
Project management          4.21   .62   .93                  4.14   .59   .93
Reflective practice         4.53   .46   .85                  4.45   .55   .91
Interpersonal competence    4.44   .46   .85                  4.41   .46   .84
Total                       4.38   .38   .97                  4.32   .44   .98
Table 5
Fuzzy numbers for importance and current competency per category.

                            Importance                              Current competency
Category                    mR    mL    mT    Cronbach's alpha(a)   mR    mL    mT    Cronbach's alpha(a)
Professional practice       .83   .46   .68   .87                   .83   .45   .69   .99
Systematic inquiry          .79   .54   .62   .99                   .79   .52   .63   .99
Situational analysis        .78   .54   .62   .97                   .78   .52   .63   .96
Project management          .76   .56   .60   .96                   .76   .56   .60   .95
Reflective practice         .80   .59   .62   .94                   .80   .57   .64   .72
Interpersonal competence    .79   .52   .63   .92                   .79   .52   .64   .91
Total                       .79   .54   .63   .96                   .79   .52   .64   .97

(a) The value was calculated based on the geometric mean of the highest and lowest scores for each range in the Fuzzy scale.
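One reading of footnote (a) is a two-step computation: collapse each respondent's (low, high) range to a single score via the geometric mean, then apply the standard Cronbach's alpha formula across the items of a category. The sketch below follows that reading; the data layout and function name are assumptions, not the authors' code.

```python
import numpy as np

def fuzzy_alpha(low, high):
    """Cronbach's alpha for Fuzzy items, each scored as the geometric mean of its range.

    low, high: arrays of shape (n_respondents, n_items) holding the low and high
    ends of each respondent's rating range for the items in one category.
    """
    scores = np.sqrt(np.asarray(low, dtype=float) * np.asarray(high, dtype=float))
    k = scores.shape[1]                           # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Toy check: 20 respondents, 5 items sharing a common person effect
rng = np.random.default_rng(0)
person = rng.uniform(0.4, 0.8, size=(20, 1))
low = np.clip(person - 0.1 + rng.normal(0, 0.03, (20, 5)), 0.05, 1.0)
high = np.clip(person + 0.1 + rng.normal(0, 0.03, (20, 5)), 0.05, 1.0)
print(round(fuzzy_alpha(low, high), 2))  # high alpha, since items share the person effect
```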
Table 6
Numbers of items exceeding MDA cutoff scores for the forms.

Category                   # of items exceeding the       # of items exceeding the      # of items exceeding the
                           cutoff score(a) in Likert(b)   cutoff score(a) in Fuzzy(c)   cutoff scores in both scales
Professional practice      7                              3                             3
Systematic inquiry         11                             2                             3
Situational analysis       5                              1                             1
Project management         6                              6                             4
Reflective practice        2                              1                             1
Interpersonal competence   6                              3                             3
Total                      37                             16                            15

(a) The overall difference between importance and current competency per category.
(b) The cutoff scores in Likert were .04, .05, .03, .07, .08, and .03 for the six categories.
(c) Analogously, the cutoff scores in Fuzzy were .01, .01, .01, 0, .02, and .01.
Table 7
Items with higher needs indices per category for the forms.

Category                   Items exceeding the cutoff score per category across forms                       MDA in Likert   MDA in Fuzzy
Professional practice      Applies professional evaluation standards                                        .16             .01
                           Acts ethically and strives for integrity and honesty in conducting evaluations   .10             .01
                           Prepares well prior to the evaluation process                                    .09             .02
Systematic inquiry         Makes judgments                                                                  .19             .01
                           Develops recommendations                                                         .12             .02
                           Understands the knowledge base of evaluation (terms, concepts, theories,
                             assumptions)                                                                   .10             .02
Situational analysis       Serves the information needs of intended users                                   .07             .01
Project management         Trains others involved in conducting the evaluation                              .11             .01
                           Budgets an evaluation and justifies cost needed                                  .09             .01
                           Uses appropriate technology                                                      .09             .01
                           Identifies needed resources for evaluation, such as information, expertise,
                             personnel, instruments                                                         .08             .01
Reflective practice        Reflects on personal evaluation practice                                         .11             .03
Interpersonal competence   Uses written communication skills                                                .10             .02
                           Uses verbal/listening communication skills                                       .09             .02
                           Facilitates constructive interpersonal interaction (teamwork, group
                             facilitation, processing)                                                      .07             .01
Table 8
Key themes from participants' comments for the forms.

Key themes                                                                         Frequency in Likert   Frequency in Fuzzy
Not in charge of certain evaluation tasks                                          4                     3
Unfamiliar with some terms (meta-evaluation, program theory, logic model, etc.)    2                     4
Some content is abstract/unclear/hard to link to practical evaluation tasks        3                     3
Too many items                                                                     2                     2
Not fitting the context of the 'evaluators'                                        2                     1
A tendency to rate competencies high                                               2                     1
A question to identify participants' roles in evaluation would have been helpful   1                     2
Hard to rate the Fuzzy scale                                                       0                     2
6.4. Needs indices for competencies

Ratings for category importance in the Likert format were higher than those for current competency, but the discrepancies were small (.03–.07, Table 4). Gaps for the Fuzzy survey were negative and minuscule (0 to .02 in magnitude, Table 5). While 37 Likert items were greater than the MDA cutoff score (Table 6), the discrepancies, as described, were not substantive. Sixteen Fuzzy items exceeded the standard but again were of little consequence. When reviewing individual items, fifteen were in common across the two forms in exceeding the MDA standard (Table 7). As indicated, they were not of import for the Fuzzy scale; for the Likert form, somewhat higher needs indices were observed for 'Makes judgments,' 'Applies professional evaluation standards,' and 'Develops recommendations.'

6.5. Results of the qualitative data

One open-ended question was used to collect opinions about the surveys and the study; 30 individuals provided comments, 16 from the Likert form and 14 from the Fuzzy one. A prominent theme (Table 8) was that faculty 'evaluators' were not in charge of certain evaluation tasks.
Some recommendations were made about survey design, terminology, wording of items, length of the instrument, and the Fuzzy scales.

7. Discussion

7.1. Similarities of respondents across forms

One-third of each of the two random samples responded to the survey forms, in about equal numbers (119, 117). Since survey respondents in Taiwan are often paid (not in this case), the response rates (Table 1) were perceived as good. Going further, the samples were close in regard to demographics, experience in teaching/conducting evaluation, and having been evaluators for higher education programs in recent years; random assignment worked well.
7.2. How was the overall evaluator competency framework received?

The categories were viewed the same as they were in the previous study, but now by a large sample (n = 236) as compared to a very small one (n = 12). The highest to lowest categories were generally in the same order as for the smaller group. In other words, the pattern repeated and the framework generally worked.

7.3. Perspectives about importance and current competency

In Tables 4 and 5, high category ratings for importance were achieved but with very limited variation, and similar results occurred for competence. Possible factors contributing to the responses were described in the open-ended comments. For example, a representative comment (confirmed by the demographics) was that 'most people involved in evaluation practice are faculty or experts in certain fields, rather than full-time evaluators.' A number of respondents remarked that they did not know some terms (meta-evaluation, program theory, logic model, and stakeholders) with which most full-time evaluators would be familiar. Some respondents were evaluators, but most were not, and this may not have been the best sample for the study. Its members could not make subtle distinctions when judging competencies. They could not discern what being competent means and perhaps just gave socially acceptable responses. One participant noted that 'no one will rate himself low in terms of competency level.' This is typical in Taiwan, especially for those from an upper socioeconomic status; faculty members are considered to be at this level.

7.4. Patterns across importance and competency

All categories were rated high but in a narrow range, with professional practice being highest in importance and competency for both forms. This reflected what was stressed in HEEAC training and was consistent with the Russ-Eft et al. (2008) study, where professional foundation was rated as the top domain for evaluators by the International Board of Standards for Training, Performance, and Instruction. It seems that the importance of the fundamental norms and values underlying evaluation practice is highly recognized. The least important category was project management, and its competency ratings were even lower. This makes sense because in Taiwan a government team did most project management tasks, and thus the results were pro forma, in contrast to the high ratings in the Russ-Eft et al. study. Looking deeper into the influence of geographic location, respondents in North America tended to rate items in 'Managing the evaluation' (such as 'Collect data' and 'Analyze and interpret data') higher than those from other regions. A possible reason was that respondents from other regions relied on others to undertake the data collection and analysis work (Russ-Eft et al., 2008). This is an example of the impact of contextual/cultural factors on evaluation practices.

7.5. Cultural concerns

The culture of Taiwan is prominent in this study. Several unique items from our previous study that were based on the milieu of the country were rated high in importance in this investigation: 'Ensures the confidentiality of information,' 'Avoids possible inside benefit,' and 'Respects and follows the evaluation process.' Participants saw themselves as being competent on them, and what HEEAC promotes resonated in the responses.

7.6. High need indices were not achieved

One goal of the effort was to identify evaluation training needs. In Table 7, some competencies had higher need indices than others, but the differences were not meaningful and pressing needs for improvement were not apparent. This might be due to prior training on aspects of evaluation or, again, to the sample's tendency to give socially acceptable answers (see below).

7.7. Issues of instrument design

The instrument could have been structured differently. First, questions should have been included about the roles respondents play in HEEAC work. Most did just a couple of things, with a few doing more. If such information had been collected, respondents could have been classified into subgroups for scrutiny. Another question might be about formal evaluation training, to see what the baseline is. Second, evaluators assume varied roles: educator, consultant, facilitator, counselor, summarizer of data, and so forth (Morabito, 2002). For the most part, they should be aware of, or be expected to know something about, all of them. This is truer in the West but less so in Taiwan. An instrument with a long list of competencies that a portion of the population may not know could have contributed to answering in a certain way, as noted in the open-ended remarks. The validity of the instrument for examining competencies could be improved. Third, socially acceptable responses were apparent in the findings. Better instructions might have prompted more honest assessments of competence. Fourth, since categories were in a narrow range for importance, having the respondents place them in rank order before rating items might have pinpointed the more salient categories. It might cause participants to think more deeply about items within categories. More research on this is warranted.

8. Conclusions and lessons learned

8.1. Generalization of the framework

Evaluator competencies from the authors' previous study generally fit the perceptions of a larger sample in Taiwan. The respondents had a sense of the categories and items, and the ratings of the highest to lowest categories were congruent with the prior research (Table 9). Furthermore, the Likert and Fuzzy scales led to roughly equal patterns in importance and competency scores. The convergent results from different methods served to increase confidence that the construct we wished to assess was being measured. The competencies assessed were modified from Stevahn's original framework and generalized to Taiwan. Generalizability validity was achieved by taking cultural influences into consideration, in accord with Wilcox's (2012) observation that competencies might not be uniform across settings. Analogously, Patton proposed that evaluators need to adapt their practices in ways that are sensitive to and respect diversity in cross-cultural environments (Coffman, 2002).
Table 9
Overall ratings for importance per category in the prior (2011) and current studies.

                            Prior study    Current study
Category                    mT (Fuzzy)     Mean (Likert)   mT (Fuzzy)
Professional practice       0.70           4.69            0.68
Systematic inquiry          0.69           4.35            0.62
Situational analysis        0.66           4.26            0.62
Project management          0.66           4.21            0.60
Reflective practice         0.68           4.53            0.62
Interpersonal competence    0.69           4.44            0.63
The view that evaluations occur in context and are not routine in form without adaptations (Patton, 1999) was supported by this study.

8.2. Perspectives about competencies

The respondents gave high ratings for both importance and current competence, leading to low variability in both sets of ratings. This might arise from self-report instruments being subject to participants' interpretation of items and to issues of social desirability (Colton & Covert, 2007). This point is consistent with the findings of Dunaway, Morrow, and Porter (2012), where participants wanted to make themselves look good when responding to the Cultural Competence of Program Evaluators self-report scale. It may also partly come from the sample in this study. Although the respondents had some training for accreditation, it was not about the many topics that are the foundation of evaluation, leading to only superficial understandings of some categories and the items within them. The open-ended comments support this conclusion, but it is to be underscored that the sample was the only feasible one in Taiwan. Extensive evaluation training is simply not available via universities or other programs, and while there are evaluation projects, there is little emphasis on systematic educational programming. We are hopeful that in the not too distant future this will change in Taiwan.

8.3. Discrepancies between importance and current levels

The importance and current status ratings were parallel to the earlier findings (Lee et al., 2012), with discrepancies mainly in one category, matching the perceptions of the small panel of 12 individuals. Training needs were not identified. Aside from sampling and social desirability problems, the instruments might not have been sensitive enough to detect meaningful differences.

9. Final thoughts

Evaluation in Taiwan is in a premature state: unstable career opportunities, limited need for evaluation specialists, and not much formal preparation (Lee et al., 2012). Under such conditions a survey study has its limitations, and it may have been better to interview a small but carefully targeted group of trained, experienced evaluators to gain a greater understanding of competencies and the needs thereof (such a study is currently in its early stages).

Acknowledgements

This study was supported under a National Science Council Research Grant in Taiwan (NSC 98-2410-H-260-034).

References

Bodjanova, S. (2006). Median alpha-levels of a fuzzy number. Fuzzy Sets and Systems, 157(7), 879–891.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Canadian Evaluation Society. (2009). Competencies for Canadian evaluation practice. http://www.evaluationcanada.ca/txt/20090531_competencies_companion.pdf (accessed 03.12.10).
Chen, S. J., & Hwang, C. L. (1992). Fuzzy multiple attribute decision making: Methods and applications. New York: Springer-Verlag.
Coffman, J. (2002). A conversation with Michael Quinn Patton. The Evaluation Exchange, 8(Spring), 10–11.
Colton, D., & Covert, R. W. (2007). Designing and constructing instruments for social research and evaluation. San Francisco: John Wiley & Sons, Inc.
Dunaway, K., Morrow, J., & Porter, B. (2012). Development and validation of the Cultural Competence of Program Evaluators (CCPE) self-report scale. American Journal of Evaluation, 33(4), 496–514.
Garden, F. (2010). Introduction to the forum on evaluation field building in South Asia. American Journal of Evaluation, 31(2), 219–221.
Hay, K. (2010). Evaluation field building in South Asia: Reflections, anecdotes, and questions. American Journal of Evaluation, 31(2), 222–231.
Hung, H. L., Altschuld, J. W., & Lee, Y. F. (2012). Exploring training needs of educational program evaluators in the Asia-Pacific region. Evaluation and Program Planning, 35(1), 501–507.
King, J. A., Stevahn, L., Ghere, G., & Minnema, J. (2001). Toward a taxonomy of essential evaluator competencies. American Journal of Evaluation, 22(2), 229–247.
Kumar, A. K. S. (2010). A comment on "evaluation field building in South Asia: Reflections, anecdotes, and questions". American Journal of Evaluation, 31(2), 238–240.
Lee, Y. F., Altschuld, J. W., & Hung, H. L. (2008). Practices and challenges in educational program evaluation in the Asia-Pacific region: Results of a Delphi study. Evaluation and Program Planning, 31(4), 368–375.
Lee, Y. F., Altschuld, J. W., & Lee, L. S. (2012). Essential competencies for program evaluators in a diverse cultural context. Evaluation and Program Planning, 35(4), 439–444.
Lynch, K. D. (2007). Modeling role enactment: Linking role theory and social cognition. Journal for the Theory of Social Behaviour, 37, 379–399.
Mark, M. M. (2002). Toward better understanding of alternative evaluator roles. In K. E. Ryan & T. A. Schwandt (Eds.), Exploring evaluator role and identity (pp. 17–36). Greenwich, CT: Information Age Publishing.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan Publishing Company.
Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.
Montgomery, J. D. (2000). The self as a fuzzy set of roles: Role theory as a fuzzy system. Sociological Methodology, 30, 261–314.
Morabito, S. (2002). Evaluator roles and strategies for expanding evaluation process influence. American Journal of Evaluation, 23(3), 321–330.
Patton, M. Q. (1999). Utilization-focused evaluation in Africa. Lectures delivered to the inaugural conference of the African Evaluation Association, September 1999.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Russ-Eft, D. F., Bober, M. J., de la Teja, I., Foxon, M., & Koszalka, T. A. (2008). Evaluator competencies: Standards for the practice of evaluation in organizations. San Francisco: Jossey-Bass.
Skolits, G. J., Morrow, J. A., & Burr, E. M. (2009). Re-conceptualizing evaluator roles. American Journal of Evaluation, 30(3), 275–295.
Spencer, L. M., & Spencer, S. M. (1993). Competence at work: Models for superior performance. New York: Wiley.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2005). Establishing essential competencies for program evaluators. American Journal of Evaluation, 26(1), 43–59.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2006). Evaluator competencies in university-based evaluation training programs. Canadian Journal of Program Evaluation, 20(2), 101–123.
Turner, J. H. (2006). Role theory. In J. H. Turner (Ed.), Handbook of sociology (pp. 233–254). New York: Springer-Science.
Wilcox, Y. (2012). An initial study to develop instruments and validate the essential competencies for program evaluators (ECPE) (Unpublished doctoral dissertation). Minneapolis, MN: The University of Minnesota.
Zorzi, R., McGuire, M., & Perrin, B. (2002). Canadian Evaluation Society project in support of advocacy and professional development: Evaluation benefits, outputs, and knowledge elements. http://consultation.evaluationcanada.ca/results.htm (accessed 25.11.11).

Yi-Fang Lee, Ph.D., is an Associate Professor at National Taiwan Normal University in Taiwan. She has published and presented in the areas of needs assessment, program evaluation, and evaluation in STEM education.

James W. Altschuld, Ph.D., is a Professor Emeritus at The Ohio State University. He has presented and published extensively on evaluation topics, especially on needs assessment and the evaluation of science and technology education.

Lung-Sheng Steven Lee, Ph.D., is a Professor and the President of National United University, Taiwan. He has participated in a variety of institutional and program evaluations for more than 30 years.