Studies in Educational Evaluation 50 (2016) 33–45
Unravelling continuous assessment practice: Policy implications for teachers' work and professional learning

Jerome De Lisle
School of Education, Faculty of Humanities & Education, The University of the West Indies, St. Augustine, Trinidad and Tobago
ARTICLE INFO

Article history: Received 14 August 2015; Received in revised form 13 June 2016; Accepted 14 June 2016; Available online xxx.

Keywords: Assessment policy; Continuous assessment (CA); Formative assessment; Canonical correlation analysis (CCA); Policy evaluation; Trinidad and Tobago

ABSTRACT
Assessment policy in some countries continues to promote the use of continuous assessment (CA) within classrooms. Several policy claims have been made about the potential of CA to improve education. Using data from a 2010 evaluation of the Trinidad and Tobago Continuous Assessment Programme (CAP), canonical correlation analysis (CCA) was used to unravel the pattern of relationships between institutional variables, professional learning, teacher beliefs, and CA components. The CA components measured were (1) overall use, (2) multiple assessment formats use, and (3) formative feedback use. Each CA component was shown to be associated with different variables, with overall use related to several teacher belief factors such as extra-role behaviour. However, multiple assessment formats use added only a small amount of unique variance to the CCA solution. Moreover, the canonical variate for formative feedback use was not statistically significant. These findings have implications for CA as a policy tool. Successful implementation of CA may require high quality professional learning as well as teacher workforce remodelling. Even when implemented successfully, however, CA may not be a useful vehicle for promoting high quality formative assessment or use of multiple assessment formats. © 2016 Elsevier Ltd. All rights reserved.
1. Introduction

Continuous assessment (CA) policy is enacted through national programmes directing curriculum-based classroom assessment (De Lisle, 2015; Le Grange & Reddy, 1998). As an educational innovation, CA policy has been implemented in diverse contexts, including countries such as Albania, Morocco, Brazil, Swaziland, Ethiopia, Zambia, Uganda, Ghana, Malawi, and South Africa (Altinyelken, 2010; Chulu, 2013; Perry, 2013; UNESCO, 2008; Vandeyar & Killen, 2007). The major CA components are: (1) continuous summative assessment (primarily continuous testing) (Nitko, 1995a); (2) formative assessment (formal and informal) (Nitko, 1995a; Pennycuick, 1990); (3) authentic assessment (Le Grange & Reddy, 1998); and (4) multiple measures (holistic, continuous, and multiple assessment formats) (Le Grange & Reddy, 1998; Nitko, 1995a). Therefore, CA is much more than repeated measurement of student learning (Pennycuick, 1990; Nitko, 1995a; Sadler, 1998; Brookhart, 2009).

The theoretical impetus for current CA policy is found in the work of Nitko (1995a, 1995b). Nitko (1995a) believed that it was possible for a system of CA to be successfully combined with high
stakes public examinations used for selection and certification purposes. From this perspective, school based assessments (SBAs) are a kind of CA attached to high stakes purposes. Nitko (1995a) suggested that, ideally, CA should be curriculum-based, criterion referenced, seamlessly aligned with both curriculum intent and outcomes, and inclusive of both summative and formative assessments. As an assessment system, the ultimate purpose of CA, then, is to improve student learning.

2. Policy claims about continuous assessment

Policy rationales and academic texts have made several claims about CA, similar to this one from Altinyelken (2010), writing about Uganda:

Furthermore, the new curriculum adopts continuous assessment and requires teachers to assess their students on a daily basis. The purpose of such assessment is considered to be diagnostic and remedial. It is assumed that frequent assessment would facilitate appropriate feedback and corrective action on the part of teachers. For instance, it would enable teachers to identify individual problems and provide adequate help so that the child would catch up with the rest of the class. Likewise, high achievers can be identified and given more challenging tasks to stimulate their learning (p. 154).
An analysis of the academic literature and policy documents suggests that the most common claims about CA are that it: (1) successfully serves multiple assessment purposes and uses; (2) facilitates the integration of formative and summative functions; (3) is equivalent to classroom assessment; (4) promotes high quality teaching and learning; (5) is a Western innovation; and (6) improves the validity and minimizes the impact of high stakes examinations. Some of these claims are more ambitious than others, judging from the logic and rationale of current assessment theory (Kane, 2013). Such ambitious claims will require greater theoretical support in an interpretation and use argument (IUA) (Kane, 2013). The following section discusses these six claims in greater detail, focusing on practice and justification.

Claim 1. CA successfully serves multiple assessment purposes and uses.

Nitko (1995a) recognized the diversity in the understandings and uses of CA. Some CA systems are attached to high stakes certification schemes, further encouraging a focus on summative judgments. However, there are also CA systems that emphasize mainly the formative function, such as those in Malawi and Trinidad and Tobago (Mchazime, 2003; Trinidad and Tobago Ministry of Education, 1998, 2000). Both CA and SBAs are multiple purpose or multiple use assessment systems (Bennett, 2015; Stobart, 2009). Koch (2010, 2013) considered a multiple-use, single assessment to be an assessment in which the primary intended use is supplemented by additional intended uses. However, CA is an assessment system with multiple components. Thus, it is the system itself which serves multiple uses, and therefore the maxim that a single assessment cannot effectively serve multiple purposes may not be violated (Bennett, 2010, 2015). However, some single CA components do serve multiple purposes. Multiple purposes present a difficulty for validation. However, both Kane's (2013) IUA and Bennett's (2011) theory of action can accommodate multiple claims and uses.

Claim 2. CA provides a vehicle for integrating formative and summative functions.

A neglected aspect when considering multiple uses is how data might be employed by teachers. Data use is fundamental to good assessment practice in CA. Both formative assessment and data use represent different dimensions of practice in which evidence and decision-making are central (Moss, 2007; Van der Kleij, Vermeulen, Schildkamp, & Eggen, 2015). Data use outcomes may be ideal use, non-use, or misuse (inappropriate use). Misuse might be common given the multiple purposes promoted by CA (Heritage, Kim, Vendlinski, & Herman, 2009; Kapambwe, 2010). The early work of Nitko (1995a) and Pennycuick (1990) suggested that it might be possible to integrate formative and summative functions in CA. However, De Lisle (2015) questioned the likelihood and efficiency of such integration on the basis of evidence from CA implementation in different countries (Hayford, 2007; Nsibande, 2007). In the extended study of CA in Trinidad and Tobago, it was found that teachers consistently privileged the summative function, minimizing most types of formative assessment practice (De Lisle, 2013, 2015). The tendency of teachers to place higher value on the summative function might be inherent to CA practice in real settings or might be attributed to lack of training or strongly held pedagogical beliefs.

Claim 3. CA is equivalent to classroom assessment.
CA is not classroom assessment although teacher judgments are central to both activities. High quality classroom assessment is usually more varied and strongly dependent upon teachers’ assessment literacy and preferences. Classroom assessment is
also more responsive to students' needs and interests (McMillan, Myran, & Workman, 2002). By contrast, national CA policy is both prescriptive and restrictive because the external policy determines which assessments are used and for which purposes. Therefore, in CA, the teacher has much lower autonomy than in classroom assessment (World Bank, 2009; Kanjee & Acana, 2013). Understanding that CA is essentially state policy enacting control over educational assessment practice in the classroom sheds light on (1) the expected role of the teacher in implementation, (2) expectations for teacher professionalism in the context, and (3) the student-directed character of formative assessment. In CA policy, there are many externally developed rules and guidelines, although detailed professional learning schemes are rarely explicated. Since CA policy represents system rules applied to classrooms, CA will function as a boundary object subject to local rules and variation (Moss, Girard, & Haniford, 2006). This means that what is actually done in the classroom varies across sites and among teachers.

Claim 4. CA promotes high quality teaching and learning.

Pennycuick (1990) outlined nine reasons for using CA in a policy document from Sri Lanka, all related to some aspect of improving teaching and learning. Pennycuick considered the formative function to be the essential and active ingredient of CA because it was directly related to improving student learning. The assertion that CA will lead to improved teaching and learning belies a belief in the tenets of measurement-driven instruction, which assumes that assessment policy can positively change teaching and learning (Airasian, 1988; Popham, 1987). In practice, there is little evidence for this assertion, especially for assessments used in high stakes contexts (Bracey, 1987; Luxia, 2007). This remains true even when the assessment vehicle is authentic or continuous assessment (Hayford, 2007; Nsibande, 2007; Torrance, 1993).

Claim 5. CA is a Western innovation.

Shandomo (2008) considered CA to be a Western innovation and believed that this explained some of the implementation difficulties. However, Shandomo (2008) also observed the early implementation of CA in Nigeria in the 1970s. The evidence suggests that CA has a long history in non-Western countries. Indeed, Nitko (1995a) mentioned the utility of CA in Jamaica back in the 1980s. There is evidence of CA as both a national philosophy and political ideology in Tanzania from as early as 1967 (De Lisle, 2013). In Tanzania, CA was seen as a solution to the problem of high examination failure and is said to have grown out of a German tradition (National Examinations Council of Tanzania, 2003). Some funding agencies have supported the implementation of CA in African countries (Chulu, 2013; Mchazime, 2003). It appears, therefore, that the focus upon CA as a policy tool has grown out of education practice and experiences in several developing countries (Nitko, 1995a; Pennycuick, 1990). After the work of Black and Wiliam (1998) and Sadler (1998), the focus of classroom assessment in many Western countries shifted towards formative assessment (Birenbaum et al., 2015). However, in some non-Western countries, the focus has remained on CA. Nevertheless, Perry (2013) considered CA to be an effective vehicle for formative assessment.
Given the evolving demands of formative assessment, differences between formative assessment and traditional teaching, and questions about authentic assessment for learning (Bennett, 2011; Heritage et al., 2009; Swaffield, 2011), it seems unlikely that formative assessment can be easily implemented in non-Western contexts.

Claim 6. CA improves the validity and minimizes the impact of high stakes examinations.
Nitko's (1995a) discussion of the value of combining CA with high stakes public examinations implies that CA can lessen the impact and improve the validity of high stakes public examinations. Pennycuick (1990) clearly identified enhancing the validity of assessment as one of the three broad aims of CA policy. This assertion is fundamental to the original rationale for introducing continuous assessment in Tanzania, as identified in the 1974 MUSOMA resolution. Nyerere (1967) himself talked passionately about the value of holistic CA as a tool for improving relevance and reducing the negative impact of the examination system. Recent justifications for using performance assessments in accountability systems also mimic such claims (Marion & Buckley, 2016). The assertion that CA can enhance the validity of final examinations is fundamental to past and future assessment reform within Trinidad and Tobago and the wider Anglophone Caribbean. In Trinidad and Tobago, public examinations at 11+, 16+, and 18+ include CA or School Based Assessment (SBA) components (De Lisle, 2009). One of the arguments put forward in both the Caribbean and in Hong Kong is that the multidimensional and authentic nature of SBA or CA is able to enhance construct
representativeness, construct relevance, and accountability of final scores in public examinations (Berry, 2011; Hamp-Lyons, 2009).

3. Theorizing continuous assessment

3.1. Theory of action

The six claims described above suggest specific intended effects resulting from the actions of the different CA components. A theory of action might be used to illustrate the mechanism of such actions. This is defined as "a graphical or textual depiction of an intervention that explains the cause-effect relationships among inputs, activities, and intended outcomes" (Bennett, 2010, p. 71). For an assessment system, the graphic explicates the hypothesized mechanisms and intended effects of the different components. The graphic includes (1) the components of the assessment system; (2) the intended effects of the assessment system; (3) action mechanisms; (4) interpretive claims from the results; and (5) potential unintended negative effects. Fig. 1 provides a possible theory of action for CA, with four components/tools specified (Bennett, 2011).
[Fig. 1. Generalized theory of action for continuous assessment. The figure links four CA components/tools (continuous summative assessment, i.e., continuous testing; formal and informal formative assessment, including formative feedback; authentic assessment; multiple measures, i.e., multiple format, holistic, and continuous) to hypothesized action mechanisms (teachers regularly collect scores from quizzes and tests; clarify learning intentions; engineer classroom experiences to elicit evidence of understanding; provide feedback; activate students as sources of peer feedback and as owners of learning; use real life and interdisciplinary activities; use multiple formats, assessing over extended times in different ways), to intended intermediate effects (records of student performance established and documented; formative use of summative assessment; practice of assessment for and as learning; practice of meaningful assessment; multiple aspects of intelligence assessed), and to intended ultimate effects (improved student learning; increased student engagement).]
The hypothesized action mechanisms represent the way each component or tool generates an intended effect (intermediate and final). For assessment for and as learning, formative assessment theory, as outlined by Leahy, Lyon, Thompson, and Wiliam (2005) and Wiliam and Thompson (2007), explains the hypothesized action. The continuous testing component involves summative teacher judgments and includes record keeping and formative use of summative test data (FUST). The authentic assessment component involves thematic, real life learning, which will ensure more meaningful assessment. The use of multiple measures is designed to assess different competencies and skills. The pathways for the different effects might be independent and achieved through different components of the system. The ultimate intended effects include improved (1) record keeping, (2) student learning, and (3) student engagement.

3.2. Confronting complexity in implementation

The implementation process mediates the transition from policy to practice. The Achilles' heel of CA could be its multi-component structure, which makes effective implementation highly dependent upon appropriate use by teachers. Thus, implementation will require high quality professional learning to ensure appropriate teacher judgment. Smith (2010) has argued that all assessment practice is complex and cannot be divorced from the pedagogical perspectives of teachers. This might be doubly true for CA, with the requirement of multiple purposes likely to result in policy and personal tensions for the teacher (Firestone, 1998). These tensions can inhibit implementation efficiency and fidelity. Carless (2005) provided insight into the implementation of assessment for learning. He showed the process to be complex and layered, with several factors in different sub-systems influencing outcomes. His framework includes factors in the macro, micro, and personal domains. The macro level includes the general reform climate and the functioning of high stakes examinations. The micro level includes the support of schools and the views of parents. The personal domain captures teachers' understandings and conceptions. Also from a complexity perspective, Flórez Petour (2015) proffered polysystems theory as an overarching framework for examining assessment change in Chile; the systems examined in that study included political parties, the economic sector, and education governance. Compared with assessment for learning, CA might be more difficult to implement because of its complex, multi-component structure. The context of reform in some countries will add further to such complexity. For example, the high stakes testing cultures of the Anglophone Caribbean could inhibit the practice of formative assessment without hindering summative assessment. Teachers' conflicting worldviews and belief systems may also make some elements of CA easier to implement than others. It becomes useful, then, to explore how different factors influence the implementation and practice of CA in different contexts.

3.3. Modelling CA implementation in the Trinidad and Tobago context

The Trinidad and Tobago CAP was implemented in all 477 primary schools from the year 2000. Guidelines for CA practice were detailed in policy documents (Trinidad and Tobago Ministry of Education, 1998, 2000). The policy emphasized formative and diagnostic assessment even though protocols for continuous testing and record keeping were included (De Lisle, 2015).
Since the Trinidad and Tobago CAP is a formatively focused CA, training and implementation might be hindered by personal and institutional factors different from those in a high stakes scheme focused upon summative assessments.
For this study, a multilevel system model of assessment change was constructed which focused upon three groups of antecedent variables: (1) institutional factors, (2) professional learning, and (3) individual teacher cognitions and intentions (Johnson, 2008; Waugh & Punch, 1987; Weiner, 2009). The model was partly based on the organizational change and implementation science literature and focused upon the complexity of the intervention, process, and contextual variables (Al-Haddad & Kotnour, 2015; May, 2013). Complexity was attributed to the number of factors at different levels influencing change, the multi-component nature of the intervention, and the unknown variation among teachers and sites (Carless, 2005; Van der Voet, Kuipers, & Groeneveld, 2015). The model in this study did not include a direct consideration of societal level variables as proposed by Carless (2005). For each category, specific factors were identified based upon current theories of educational assessment and teacher change as proposed by Carless (2005), Brown and Lake (2006), and Waugh and Punch (1987). A definition of each variable is provided in Table 1.

The category teacher beliefs and attitudes consisted of (1–11) conceptions of assessment, teaching, and curriculum, (12) attitude towards the specific innovation, and (13) extra-role behaviour (willingness to go beyond their designated roles). The category professional learning and skills included (14) professional learning communities at the school; (15) self-reported assessment skills; and (16) system level training on the innovation. The category institutional context and climate was measured by (17) the academic achievement context, (18) leadership of the CAP, (19) organizational readiness for change, (20) organizational innovation, (21) collective teacher efficacy, and (22) organizational citizenship behaviour. The dependent variable CA included three linked but overlapping components: (23) overall CAP use; (24) multiple assessment formats use; and (25) formative feedback use. CAP use is a global measure, but the other components are expected to add unique variance in a formatively focused CA scheme. The CAP use measure was based upon an inventory of all CA tasks listed in the policy document. Formative feedback from the teacher is just one aspect of formative assessment (Havnes, Smith, Dysthe, & Ludvigsen, 2012; Leahy et al., 2005). However, Bennett (2011) has argued that the evidence for the effect of this aspect on student learning might be more credible (Hattie & Timperley, 2007; Kluger & DeNisi, 1996; Voerman, Korthagen, Meijer, & Simons, 2014). Additionally, the CAP policy document identified formative feedback from the teacher as the key element directly leading to improved student learning (Trinidad and Tobago Ministry of Education, 1998, 2000). Multiple assessment formats use involves teachers employing diverse assessment formats in the classroom. This approach extends classroom assessment practice from traditional paper and pencil assessments to include performance assessments such as exhibitions, posters, portfolios, and presentations (Pennycuick, 1990). Using a variety of assessments might ensure that multiple aspects of student learning are measured, students are engaged, and there are opportunities for high quality formative assessment.
Thus, an argument for multiple assessment formats use is that it improves the quality of teaching, learning, and assessment by increasing opportunities for (1) diagnosis and feedback and (2) demonstrations of student understanding through performances (Darling-Hammond & Adamson, 2014; Fischer & Frey, 2015; Moss & Brookhart, 2015; Shepard et al., 1996; Wiggins, 1998).

4. Purpose and research questions

The purpose of this study was to explore the inter-relationships between the identified variables and the different components of CA practice for the 2000 Trinidad and Tobago CAP.
Table 1
Definitions and sources of instruments used in study.

1. Assessment for Student Accountability: A conception of teachers focused upon the purpose of assessment as student accountability.
2. Assessment for School Accountability: A conception of teachers focused upon the purpose of assessment as school accountability.
3. Assessment for Improvement: A conception of teachers focused upon the purpose of assessment as educational improvement.
4. Assessment as Irrelevant: A conception of teachers focused upon the purpose of assessment as irrelevant.
5. Curriculum-Academic: Belief in discipline and content in the curriculum.
6. Curriculum-Social Reconstruction: Belief in curriculum as a vehicle for facilitating social change.
7. Curriculum-Technological: Belief in the application of the disciplines to technology and industries.
8. Teaching-Apprentice-Developmental: A perspective on effective teaching as a process of enculturating students into a set of social norms and ways of working, planned and conducted "from the learner's point of view".
9. Teaching-Nurturing: A perspective on effective teaching that assumes that long-term, hard, persistent effort to achieve comes from the heart as well as the head.
10. Teaching-Social Reform: A perspective on effective teaching that seeks to change society in substantive ways.
11. Teaching-Transmission: A perspective on effective teaching that requires a substantial commitment to the content or subject matter.
12. Attitude towards CAP: Attitudes, intentions, and behaviours related to using the CAP.
13. Extra Role Behaviour: Individual behaviour that goes beyond stated role requirements but contributes to institutional effectiveness.
14. Professional Learning Community: The level of shared values, norms of collaboration, and professional learning within the school.
15. Reported Assessment Skill: Self-reported inventory of skills in assessment.
16. Received Training on CAP: The extent and quality of training received for implementing the CAP.
17. Academic Achievement: Categorization of overall school performance based on the Trinidad and Tobago Academic Performance Index using data generated from national tests.
18. Leadership of CAP: Principal leadership of teaching, learning, and assessment tasks in the CAP.
19. Readiness for Change: The extent to which individuals are cognitively and emotionally inclined to accept, embrace, and adopt a particular change plan.
20. Organization Innovation: A group level variable capturing the vision, policies, structures, and processes facilitating the institution's capacity to seek and prepare for future events.
21. Collective Teacher Efficacy: Shared belief that the efforts of staff will have a positive effect on student learning.
22. Group Organizational Citizenship Behaviour: The extent to which the group acts to support the social and psychological environment in which task performance takes place.
23. Overall CAP Use: Completion to a high degree of all listed tasks as prescribed in the CAP 2000 policy.
24. Multiple Formats Use: The frequent use of multiple formats as prescribed in the CAP 2000 policy.
25. Formative Feedback Use: Information communicated to the learner intended to modify the learner's thinking or behaviour for the purpose of improving learning.

Note. Sources: Brown (2006); Pratt, Collins, & Selinger (2001); Shute (2008); Hord, Meehan, Orletsky, & Saltes (1999); Holt, Armenakis, Feild, & Harris (2007); Goddard, Hoy, & Hoy (2000); Cheung (2000); Noonan and Renihan (2006); Zhang and Burry-Stock (2003).
This pattern of relationships will provide insight into the factors influencing CA implementation practice. Additionally, the fit of the components within CA is explored. The key research questions guiding the study were:

1. Which institutional factors influenced the different components of CA?
2. To what extent are the CA components compatible, integrated, and synergistic?

Evidence for both questions is provided by the patterns of correlations in canonical correlation analysis (CCA). These two research questions cover 5 of 7 possible questions answerable by CCA research (Thompson, 1984). The second research question is directly related to the claims that CA (1) can successfully serve multiple assessment purposes and uses and (2) facilitates the integration of formative and summative functions. Results on formative feedback use also provide insight into the third claim, that CA promotes high quality teaching and learning.

5. Method

5.1. Canonical correlation analysis as an analytical strategy

The main analytical strategy in this quantitative evaluation is CCA, an exploratory statistical procedure for examining inter-relationships between multiple independent and dependent variables. CCA is the most generalized of the multivariate techniques and subsumes multiple regression, principal component analysis, and discriminant analysis (Thompson, 1984, 1991; Sherry & Henson, 2005). Levine (1977) showed CCA to be closely related to factor analysis, although it answers different research questions. CCA is able to capture complexity by describing the multiple inter-relationships between variables while reducing Type 1 error (Hair, Black, Babin, & Anderson, 2010; Thompson, 1988). In this study, capturing complexity is also achieved through (1) conceptualizing CA as a multiple component dependent variable and (2) presenting a generalized solution explaining the influences of contextual factors on different CA components (Sherry & Henson, 2005). CCA may also prove useful in theory building despite being a correlational strategy without random assignment of participants (Nimon & Reio, 2011). Although no definitive evidence on causal mechanisms is provided, inferences from CCA can inform the exclusion of alternative models (Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005). CCA has not been frequently used as a statistical technique, perhaps because of the difficulty of interpreting solutions (Pugh & Hu, 1991; Wanders, Mendez, & Downer, 2007). Since the seminal works of Thompson (1984, 2000), updated CCA protocols have been published (Fan & Konold, 2010; Hair et al., 2010; Sherry & Henson, 2005; Tabachnick & Fidell, 2013). Hair et al. (2010) listed six stages in CCA: (1) identifying the appropriate research problem; (2) ensuring appropriate research design; (3) clarifying assumptions; (4) deriving and assessing canonical functions; (5) interpreting canonical variates; and (6) validation and diagnosis. In assessing canonical functions, Hair et al. (2010) suggested three criteria: (1) level of statistical significance, (2) magnitude of the canonical correlation, and (3) redundancy measures of shared variance. To interpret the canonical variates, Hair et al. (2010) also suggested use of (1) canonical weights, (2) canonical loadings, and (3) canonical cross-loadings.
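To make the derivation of these quantities concrete, the sketch below shows how canonical correlations, loadings, and cross-loadings can be computed for data of this kind. It is an illustration only: it uses scikit-learn in Python rather than the SPSS and SYSTAT procedures employed in the study, and the file name and column names are hypothetical stand-ins for the study's 22 predictors and three CA components.

```python
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import CCA

# Hypothetical file and column names standing in for the study's variables.
PREDICTORS = ["extra_role_behaviour", "plc", "reported_assessment_skill"]   # ...22 in total
CRITERIA = ["overall_cap_use", "multiple_formats_use", "formative_feedback_use"]

# Listwise deletion of incomplete records, mirroring the study's handling of missing data.
df = pd.read_csv("cap_survey.csv").dropna(subset=PREDICTORS + CRITERIA)
X = (df[PREDICTORS] - df[PREDICTORS].mean()) / df[PREDICTORS].std()
Y = (df[CRITERIA] - df[CRITERIA].mean()) / df[CRITERIA].std()

# Fit the canonical model and obtain the synthetic (canonical variate) scores.
cca = CCA(n_components=len(CRITERIA), scale=False).fit(X, Y)
U, V = cca.transform(X, Y)

# Canonical correlation for each function (the magnitude criterion).
canonical_r = np.array([np.corrcoef(U[:, i], V[:, i])[0, 1] for i in range(U.shape[1])])

# Canonical loadings: correlation of each observed variable with its own synthetic
# variate; cross-loadings: correlation with the opposite set's synthetic variate.
loadings_x = np.corrcoef(np.c_[X, U], rowvar=False)[: X.shape[1], X.shape[1]:]
cross_loadings_x = np.corrcoef(np.c_[X, V], rowvar=False)[: X.shape[1], X.shape[1]:]
```

Because scikit-learn's CCA is iterative, results can differ slightly from SPSS CANCORR or SYSTAT output, and any rotation of the solution would be an additional step.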
Thompson (2000) identified additional useful indices which aid overall interpretation, including (1) squared canonical loadings (an effect size giving the proportion of variance each variable in the set shares with the synthetic variable) and (2) canonical communality coefficients (the proportion of variance of each variable explained across the entire solution).

In this study, the CCA solution was obtained from multiple statistical reports generated using IBM SPSS version 20 (MANOVA and CANCORR syntax) as well as SYSTAT 13, the latter providing a rotated solution maximizing the correlations for a more efficient interpretation (Tabachnick & Fidell, 2013). The SPSS output provided canonical cross-loadings, which Hair et al. (2010) suggested provide the best approach to determining the contribution of individual variables to each variate.

5.2. Sampling & instrumentation

Data were obtained from a Trinidad and Tobago Government-commissioned evaluation of a random sample of 402 teachers in 35 primary schools conducted in 2010. For the CCA strategy, 138 records were omitted because of missing data, leaving a sample size of 264. With 25 variables, this sample size still exceeded the recommended ratio of 10 respondents per variable (Pugh & Hu, 1991; Thompson, 1990).

The data collection tool was a multi-instrument questionnaire. The instrument contained several multiple-item scales either adapted from the literature or constructed specifically for this study. The items were piloted on 67 teachers enrolled in a Bachelor of Education programme. The scales and Cronbach's alpha values for the independent variables are reported in Table 2. Two of the 22 scales were below the benchmark of 0.7 (Nunnally, 1978), and therefore caution is exercised in the analysis of both variables (School Accountability, 3 items, α = 0.626; Transmission Teaching, 2 items, α = 0.352).

Table 2
Scale and descriptive statistics for independent and dependent variables.

1. A-Student Accountability: Reliability 0.626, Mean 13.38, SD 3.59, Max 18.0, Min 0.0
2. A-School Accountability: Reliability 0.824, Mean 13.74, SD 3.13, Max 18.0, Min 0.0
3. A-Improvement: Reliability 0.886, Mean 50.78, SD 9.59, Max 66.0, Min 0.0
4. A-Irrelevant: Reliability 0.811, Mean 25.00, SD 8.63, Max 60.0, Min 0.0
5. C-Academic: Reliability 0.772, Mean 14.41, SD 2.81, Max 18.0, Min 3.0
6. C-Social Reconstruction: Reliability 0.774, Mean 13.59, SD 3.06, Max 19.0, Min 3.0
7. C-Technological: Reliability 0.710, Mean 9.26, SD 1.95, Max 12.0, Min 0.0
8. T-Apprentice-Developmental: Reliability 0.799, Mean 14.88, SD 2.64, Max 18.0, Min 5.0
9. T-Nurturing: Reliability 0.869, Mean 16.02, SD 2.42, Max 18.0, Min 3.0
10. T-Social Reform: Reliability 0.859, Mean 14.68, SD 2.82, Max 18.0, Min 4.0
11. T-Transmission: Reliability 0.352, Mean 8.37, SD 2.36, Max 12.0, Min 1.0
12. Attitude towards CAP: Reliability 0.915, Mean 111.33, SD 23.96, Max 183.0, Min 0.0
13. Extra Role Behaviour: Reliability 0.892, Mean 104.01, SD 20.43, Max 143.0, Min 0.0
14. PLC: Reliability 0.915, Mean 19.48, SD 6.81, Max 28.0, Min 0.0
15. Reported Assessment Skill: Reliability 0.980, Mean 194.27, SD 75.38, Max 335.0, Min 0.0
16. Received Training on CAP: Reliability 0.981, Mean 7.22, SD 8.19, Max 33.0, Min 0.0
17. Academic Achievement: Reliability NA, Mean 1.16, SD 0.74, Max 2.0, Min 0.0
18. Leadership of CAP: Reliability 0.827, Mean 71.58, SD 21.39, Max 102.0, Min 0.0
19. Readiness for Change: Reliability 0.893, Mean 116.32, SD 22.66, Max 171.0, Min 12.0
20. Organization Innovation: Reliability 0.752, Mean 58.51, SD 11.73, Max 87.0, Min 19.0
21. Collective Teacher Efficacy: Reliability 0.803, Mean 81.30, SD 12.42, Max 123.0, Min 0.0
22. OCB: Reliability 0.884, Mean 108.8, SD 21.13, Max 184.0, Min 38.0
23. Overall CAP Use: Reliability 0.845, Mean 52.87, SD 9.45, Max 72.0, Min 11.0
24. Multiple Formats Use: Reliability 0.841, Mean 25.56, SD 7.85, Max 46.0, Min 1.0
25. Formative Feedback Use: Reliability 0.780, Mean 37.93, SD 6.02, Max 53.0, Min 0.0

Note. N = 356–401. A = Assessment; C = Curriculum; T = Teaching.
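For readers checking scale reliabilities of this kind, a minimal sketch of the coefficient alpha computation is given below; the function and the simulated scale are illustrative only and are not drawn from the study's data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for one scale; rows are respondents, columns are items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Example: five items that all track a common trait yield a high alpha.
rng = np.random.default_rng(1)
trait = rng.normal(size=200)
scale_items = np.column_stack([trait + rng.normal(scale=0.5, size=200) for _ in range(5)])
print(round(cronbach_alpha(scale_items), 3))
```

A value below the 0.7 benchmark, as reported for the School Accountability and Transmission Teaching scales, would signal that the items do not hang together well enough for confident interpretation.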
For the dependent variate operationalizing CA, the CAP use scale listed 18 activities specifically identified in the CAP manual (Trinidad & Tobago Ministry of Education, 1998). Teachers indicated the extent to which they were currently engaged in each activity on a 5-point scale ranging from Always to Never. Examples of activities included (1) "Conduct ongoing, continuous testing" and (8) "Integrate teaching and assessment." The formative feedback use scale listed 11 feedback activities identified by Shute (2008). Examples of these activities include (1) "I give an overall score or grade on the assessment" and (6) "I provide detailed feedback focusing primarily upon the target concept or required skill." Respondents were asked how often they practiced each activity on a 5-point rating scale ranging from Daily to Never. The scale measuring multiple assessment formats use listed 10 assessment formats recommended in the CAP manual and asked teachers how often they used each within the school year, again ranging from Daily to Never. Half of the formats were performance-based; therefore, higher scores suggested greater use of performance assessments. All of the constructed scales reported adequate to good internal consistency as measured by Cronbach's alpha (CAP use, 18 items, α = 0.845; Multiple Format Assessment use, 11 items, α = 0.841; Formative Feedback use, 11 items, α = 0.780).

6. Results

6.1. Descriptive & correlational analysis

Table 2 provides the means, standard deviations, and minimum and maximum scores for all 25 variables in the study. As shown, for the dependent variables, teachers reported comparatively lower scores for multiple assessment formats use. Also, the higher standard deviation suggested greater variation in practice across classrooms. For the independent variables, there was great variation in the three variables used to measure professional learning: PLCs, reported assessment skill, and formal training on the CAP.

Table 3 provides the inter-correlation matrix for all 25 variables included in the study. Correlations above 0.30 were flagged. As shown, the greatest numbers of flagged correlations were for variables related to the beliefs and perceptions of teachers; however, these variables were correlated primarily with each other. Assessment for improvement and an academic orientation to the curriculum each reported five flagged correlations. A social reconstruction curriculum orientation reported four flagged correlations. As expected, the perception of assessment as irrelevant was negatively correlated with professional learning communities, perhaps showing the value of site based learning. In terms of organizational variables, leadership of CAP was positively correlated with extra-role behaviour and professional learning community. Group organizational citizenship behaviour was also positively correlated with extra-role behaviour and professional learning community.

6.2. Canonical correlation analysis

Fig. 2 models the overall canonical solution. As shown, the synthetic predictor consisted of twenty-two variables in three groups, and the synthetic criterion consisted of the three CA components. Data for the overall CCA solution are presented in Table 4, and an analysis of individual variables using canonical weights, loadings, and cross-loadings is presented in Table 5. Table 6 provides additional information for interpreting the influence of individual variables in each canonical variate, including squared loadings, canonical communality coefficients, and the adequacy and redundancy indices.

For the overall canonical solution presented in Table 4, the Stewart-Love Canonical Redundancy Index, Wilks' Lambda, and Pillai's Trace were all statistically significant. Therefore, the implied null hypothesis of no relationship between the variable sets is rejected.
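As a worked illustration of how the omnibus statistics in Table 4 relate to the three squared canonical correlations (standard definitions, e.g., Tabachnick & Fidell, 2013; the figures below are recomputed from the reported values, not additional study output):

$$\Lambda = \prod_{i=1}^{3}\bigl(1 - R_{c_i}^{2}\bigr) = (1 - 0.296)(1 - 0.116)(1 - 0.101) \approx 0.559, \qquad V = \sum_{i=1}^{3} R_{c_i}^{2} \approx 0.51,$$

$$\chi^{2} = -\Bigl[N - 1 - \tfrac{p + q + 1}{2}\Bigr]\ln\Lambda = -\Bigl[264 - 1 - \tfrac{22 + 3 + 1}{2}\Bigr]\ln(0.559) \approx 145, \qquad df = pq = 22 \times 3 = 66,$$

where $p = 22$ predictors and $q = 3$ criterion variables; $\Lambda$ is Wilks' Lambda, $V$ is Pillai's Trace, and $\chi^{2}$ is Bartlett's test statistic for functions 1 through 3.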
Table 3
Inter-correlation matrix for variables in study. Each row lists the correlations of that variable with itself and with each higher-numbered variable (columns run from the variable's own number through 25).

1. A-Student Accountability: 1.00 0.34 0.59 0.20 0.18 0.26 0.28 0.21 0.26 0.24 0.12 0.15 0.07 0.20 0.17 0.15 0.18 0.10 0.22 0.07 0.01 0.18 0.21 0.10 0.14
2. A-School Accountability: 1.00 0.47 0.07 0.20 0.14 0.25 0.21 0.25 0.22 0.15 0.11 0.16 0.13 0.07 0.06 0.00 0.17 0.11 0.01 0.11 0.08 0.18 0.05 0.08
3. A-Improvement: 1.00 0.19 0.23 0.33 0.37 0.36 0.43 0.36 0.16 0.06 0.14 0.14 0.18 0.14 0.06 0.18 0.19 0.03 0.09 0.28 0.31 0.06 0.15
4. A-Irrelevant: 1.00 0.07 0.02 0.01 0.10 0.16 0.01 0.02 0.01 0.16 0.34 0.03 0.09 0.08 0.19 0.18 0.22 0.19 0.17 0.11 0.11 0.12
5. C-Academic: 1.00 0.43 0.35 0.44 0.47 0.38 0.17 0.12 0.08 0.15 0.10 0.13 0.12 0.11 0.09 0.08 0.07 0.06 0.17 0.00 0.10
6. C-Social Reconstruction: 1.00 0.51 0.55 0.46 0.50 0.22 0.10 0.02 0.10 0.04 0.12 0.10 0.18 0.10 0.04 0.18 0.12 0.25 0.11 0.16
7. C-Technological: 1.00 0.36 0.40 0.33 0.27 0.13 0.05 0.16 0.05 0.07 0.13 0.07 0.10 0.14 0.07 0.21 0.24 0.03 0.11
8. T-Apprentice-Developmental: 1.00 0.62 0.55 0.15 0.03 0.14 0.13 0.16 0.19 0.03 0.15 0.20 0.03 0.06 0.13 0.22 0.02 0.14
9. T-Nurturing: 1.00 0.56 0.16 0.03 0.11 0.15 0.07 0.18 0.02 0.15 0.21 0.05 0.01 0.24 0.24 0.06 0.22
10. T-Social Reform: 1.00 0.24 0.02 0.05 0.09 0.04 0.11 0.19 0.07 0.14 0.13 0.11 0.12 0.18 0.04 0.16
11. T-Transmission: 1.00 0.19 0.07 0.09 0.03 0.02 0.05 0.10 0.01 0.10 0.11 0.14 0.27 0.03 0.00
12. Attitude towards CAP: 1.00 0.24 0.24 0.27 0.18 0.07 0.28 0.48 0.23 0.11 0.25 0.20 0.19 0.02
13. Extra Role Behaviour: 1.00 0.22 0.10 0.16 0.16 0.34 0.03 0.29 0.14 0.40 0.36 0.21 0.16
14. PLC: 1.00 0.21 0.17 0.17 0.44 0.29 0.24 0.07 0.37 0.24 0.12 0.17
15. Assessment Skill: 1.00 0.22 0.09 0.12 0.18 0.02 0.05 0.08 0.14 0.14 0.06
16. Training on CAP: 1.00 0.05 0.07 0.17 0.06 0.07 0.09 0.18 0.24 0.05
17. Academic Achievement: 1.00 0.07 0.02 0.13 0.11 0.08 0.14 0.01 0.05
18. Leadership of CAP: 1.00 0.17 0.06 0.08 0.47 0.24 0.07 0.10
19. Readiness for Change: 1.00 0.09 0.35 0.10 0.20 0.11 0.13
20. Organization Innovation: 1.00 0.29 0.13 0.02 0.04 0.02
21. CTE: 1.00 0.19 0.07 0.03 0.01
22. OCB: 1.00 0.26 0.15 0.21
23. Overall CAP Use: 1.00 0.31 0.21
24. Multiple Formats Use: 1.00 0.19
25. Formative Feedback Use: 1.00

Note. N = 356 to 401. A = Assessment; C = Curriculum; T = Teaching. Correlation coefficients in Bold > 0.30.
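The flagging rule applied in Table 3 is a simple filter on the absolute size of the coefficients; a sketch with a placeholder data frame is given below (the data and column names are illustrative, not the study's).

```python
import numpy as np
import pandas as pd

# Placeholder data: 264 respondents on 25 illustrative variables.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(264, 25)), columns=[f"v{i}" for i in range(1, 26)])

corr = df.corr()                                        # Pearson inter-correlation matrix
flagged = corr.where(corr.abs() > 0.30)                 # keep only coefficients above the cut-off
flagged = flagged.mask(np.eye(len(corr), dtype=bool))   # drop the unit diagonal
print(flagged.stack().sort_values(ascending=False))     # list the flagged pairs (if any)
```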
[Fig. 2. Theoretical model for canonical correlation analysis of factors in continuous assessment. The synthetic predictor comprises the factors involved in implementation practice (teacher beliefs & attitudes; professional learning & skills; institutional context & climate), the synthetic criterion comprises the three CA components (overall CAP use; multiple assessment formats use; formative feedback use), and the two synthetic variates are linked by the canonical correlation.]
Table 4
Measures of overall fit for CCA.

Function 1: canonical correlation 0.544, canonical R² 0.296, redundancy (independent set) 0.049, redundancy (dependent set) 0.132; Bartlett test (correlations 1 through 3): chi-square 145.4, df 66, p = 0.000.
Function 2: canonical correlation 0.341, canonical R² 0.116, redundancy (independent set) 0.008, redundancy (dependent set) 0.030; Bartlett test (correlations 2 through 3): chi-square 57.5, df 42, p = 0.056.
Function 3: canonical correlation 0.318, canonical R² 0.101, redundancy (independent set) 0.004, redundancy (dependent set) 0.030; Bartlett test (correlations 3 through 3): chi-square 26.7, df 20, p = 0.143.

Stewart-Love Canonical Redundancy Index = 0.182; Wilks' Lambda = 0.559 (approx. F = 2.33, p < 0.00); Pillai's Trace = 0.514 (approx. F = 2.26, p < 0.00); Hotelling's Trace = 0.665 (approx. F = 2.39, p < 0.00); Roy's gcr = 0.296.

Note. N = 264 (138 cases deleted because of missing data).
The value of Wilks' Lambda suggests that the overall effect (R², computed as 1 − Λ) is 0.44. The more conservative Stewart-Love index suggests that the shared variance between the sets is only 18.2%. Table 4 shows that, of the three canonical functions derived, function 1 was both statistically and practically significant (R² = 0.296). Function 2 met the 0.30 practical significance criterion for the canonical correlation (Rc = 0.341, R² = 0.116), but the p-value of 0.056 for functions 2–3 was not statistically significant. Function 3 also exceeded the 0.30 criterion (Rc = 0.318, R² = 0.101) but was not statistically significant (p = 0.143). According to the canonical redundancy analysis, the portion of variance in the criterion set explained by the predictor set is small in functions 2 and 3 (3%) compared with that of function 1 (13.2%). Function 1 loaded on overall CAP use, function 2 on multiple assessment formats use, and function 3 on formative feedback use. These findings suggest that only functions 1 and 2 should be interpreted. Moreover, function 2 should be interpreted with great caution given the borderline practical significance and the number of variables in the independent set (Cohen, 1988).

For the canonical variates presented in Table 5, practically significant canonical loadings were interpreted at a cut-off score of 0.35, based on the sample size of 264 (Hair et al., 2010). Table 6 provides squared loadings and communality coefficients, along with the adequacy and redundancy indices, which further aid interpretation. Bearing in mind the values of the canonical weights, the rotated factor structure for the canonical loadings presented in Table 5 appears to provide an elegant and parsimonious solution. This interpretation is aided by the canonical cross-loadings obtained from the SPSS MANOVA procedure.

The canonical loadings for function 1 suggested that several variables categorized as teacher beliefs predicted overall CAP use. These were (1) conceptions of the curriculum, such as a social reconstruction curriculum orientation (0.47); (2) assessment conceptions, such as assessment for improvement (0.56), assessment for school accountability (0.42), and assessment for student accountability (0.39); and (3) perspectives on teaching, such as nurturing teaching (0.66), social reform teaching (0.58), and apprenticeship-developmental teaching (0.48). Extra-role behaviour (0.47) may also be considered part of this belief structure, since it measures willingness to engage in activities beyond narrow role descriptions.
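To make these interpretive indices concrete, the fragment below sketches how squared loadings, communality, adequacy, and redundancy coefficients are typically derived from a structure matrix (following Thompson, 2000); the loading matrix here is a random placeholder, and only the canonical correlations are taken from Table 4.

```python
import numpy as np

canonical_r = np.array([0.544, 0.341, 0.318])   # canonical correlations reported in Table 4
rng = np.random.default_rng(0)
loadings = rng.uniform(-1, 1, size=(22, 3))     # placeholder structure coefficients (22 predictors x 3 functions)

squared_loadings = loadings ** 2                # variance each variable shares with each synthetic variate
communality = squared_loadings.sum(axis=1)      # per variable, summed across the interpreted functions
adequacy = squared_loadings.mean(axis=0)        # per function: mean squared loading within the set
redundancy = adequacy * canonical_r ** 2        # variance in this set reproduced by the opposite variate
```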
Table 5 Canonical weights, rotated canonical loadings, cross loadings for three canonical functions. Independent Variables
CAP Overall Use Canonical Weights
Student 1. Assessment Accountability 2. Assessment School Accountability 3. Assessment for Improvement 4. Assessment as Irrelevant 5. Academic Curriculum 6. Social Reconstruction Curriculum 7. Technological Curriculum 8. ApprenticeshipDevelopmental Teach 9. Nurturing Teaching 10. Social Reform Teaching 11. Transmission Teaching 12. Attitude towards CAP 13. Extra Role Behaviour 14. PLCs 15. Self-Reported Assessment Skill 16. Received Training on CAP 17. Academic Achievement Context 18. Leadership of CAP 19. Readiness for Change 20. Organizational Innovativeness 21. Collective Teacher Efficacy 22. Group OCB Dependent variables 23. Overall CAP Use 24. Multiple Formats Use 25. Formative Feedback Use
0.04
Rotated Canonical Loadings
CAP Multiple Format Use CrossLoadings
Canonical Weights
Rotated Canonical Loadings
0.26
0.17
CAP Formative Feedback Use Cross loadings
Rotated Canonical Loadings
Cross Loadings
0.13
0.36
0.04
0.39
0.23
0.05
0.42
0.16
0.28
0.56
0.31
0.02 0.01 0.24
0.28 0.40 0.47
0.01
0.41
0.23
0.09
0.03
0.48
0.22
0.16
0.14 0.10 0.31 0.16 0.50 0.18 0.17
0.66 0.58 0.22 0.07 0.47 0.46 0.42
0.26 0.18 0.24 0.21 0.38 0.26 0.16
0.15
0.03
0.21
0.05
0.02
0.11
0.15
0.01
0.01
0.01 0.01 0.02
0.36 0.37 0.02
0.25 0.22 0.03
0.07 0.26 0.10
0.17 0.21 0.10
0.04 0.03 0.02
0.02
0.12
0.06
0.02
0.04
0.55
0.29
0.14
0.28
0.05
0.43
0.86 0.21 0.18
0.98 0.15 0.01
0.52 0.28 0.22
0.21 0.91 0.65
0.15 0.98 0.10
0.02 0.24 0.18
0.59 0.51 0.78
0.10
0.15 0.17 0.27
0.08
0.04
Canonical Weights
0.05
0.08
0.39
0.33
0.03
0.36 0.12 0.24
0.16 0.22 0.12
0.09 0.03 0.03
0.09
0.07
0.30
0.04
0.01
0.11
0.16
0.19
< 0.00
0.09 0.51 0.13 0.65 0.14 0.09 0.05
0.30 0.20 0.10 0.53 0.47 0.23 0.20
0.14 0.19 0.04 0.14 0.01 0.05 0.06
0.30 0.01 0.42 0.50 0.13 0.06 0.23
0.03 0.03 0.60 0.26 0.22 0.04 0.05
0.06 , 0.00 0.14 0.05 0.02 0.05 0.04
0.42
0.61
0.07 0.11 0.07 0.30
0.20
0.14
0.10
0.11
0.24 0.07 -0.20
0.00 0.11 0.06
0.07
0.15
0.05
0.33
0.12
0.05
0.45
0.45
0.26 0.13 0.23
0.27 0.06 0.05
0.14
0.18 0.03
0.10 0.09 0.99
0.06 0.13 0.13 0.04 0.02 0.05 0.09
0.09 0.15 0.24
Note. Loadings and Cross-Loadings in Bold > 0.35.
Moreover, the cross-loading in Table 5 (0.38) and the canonical communality coefficient in Table 6 (0.44) suggested that extra-role behaviour was important in distinguishing overall CAP use from multiple assessment formats use. As shown by the adequacy coefficient in Table 6, however, the independent variable set in function 1 reproduced only 16% of the variance in its synthetic variable. Overall CAP use was also related to 2 of the 3 measures of professional learning, namely PLCs (0.46) and self-reported assessment skills (0.42). Institutional variables such as organizational citizenship behaviour (0.55), readiness for change (0.37), and CAP leadership (0.36) were also important.

The factor structure for function 2 showed that training on CAP (-0.61) and attitude towards CAP (-0.53) were most influential. However, these factors were inversely related to the synthetic predictor variable. This meant that teachers who (1) held a negative attitude towards CAP and (2) defined their teaching role narrowly were more inclined to use multiple assessment formats. Although function 3 is noted here only for completeness, its factor structure is interpretable. Tentatively, it is proposed that formative feedback was more often used in high achievement contexts.

6.3. Discussion and policy implications

CA has emerged as an important assessment policy reform in several countries, such as Trinidad and Tobago.
Justifications for the use of CA have been based on its hypothesized potential for improving education. The design of this study provided information on three of the six claims described in the literature: (1) CA can successfully serve multiple assessment purposes and uses; (2) CA facilitates the integration of formative and summative functions; and (3) CA promotes high quality teaching and learning. In this study, the dependent variable CA was conceptualized as: (1) overall CAP use, (2) multiple assessment formats use, and (3) formative feedback use. Twenty-two predictor variables grouped into teacher beliefs, professional learning, and school-level categories were measured. The inter-relationships between the predictor and dependent variable sets were explored using CCA.

The CCA solution found that most of the variance was explained by the first function, which related the predictor set to CAP use, a global measure of CA. The second function, which related the predictor set to multiple assessment formats use, was small and, although practically significant, of borderline statistical significance. The third function, explaining the relationship between the predictor set and formative feedback use, was not statistically significant. Several variables in all three categories influenced CAP use, but only two variables explained variance in multiple formats use. The findings suggest that CA practice in Trinidad and Tobago may: (1) be primarily focused on mechanically completing tasks; (2) be influenced by both teacher and institutional variables; (3) require professional learning at both system and site-based levels; and (4) not necessarily promote formative assessment.
Table 6 Square loadings, communality coefficients, adequacy and redundancy for canonical functions 1 and 2 only. Independent Variables
Assessment Student Accountability Assessment School Accountability Assessment for Improvement Assessment as Irrelevant Academic Curriculum Social Reconstruction Curriculum Technological Curriculum Apprenticeship-Developmental Teach Nurturing Teaching Social Reform Teaching Transmission Teaching Attitude towards CAP Extra Role Behaviour PLCs Self-Reported Assessment Skill Received Training on CAP Academic Achievement Context Leadership of CAP Readiness for Change Organizational Innovativeness Collective Teacher Efficacy Group OCB ADEQUACY REDUNDANCY
Squared Loadings
Canonical Commonality Coefficient
Function 1
Function 2
0.15
0.03
0.18
0.18
0.04
0.31
Dependent Variables Squared Loadings
Canonical Commonality Coefficient
Function 1
Function 2
Overall CAP Use
0.96
0.02
0.98
0.22
Multiple Format Use
0.02
0.96
0.98
0.01
0.32
0.00
0.01
0.01
0.08 0.16 0.22
0.06 0.00 0.04
0.14 0.16 0.26
Formative Feedback Use ADEQUACY REDUNDANCY
0.33 0.10
0.33 0.04
0.17 0.23
0.00 0.00
0.17 0.23
0.44 0.34 0.05 0.00 0.22 0.21 0.18 0.00 0.00 0.13 0.14 0.00 0.01 0.30 0.16 0.05
0.09 0.04 0.01 0.28 0.22 0.05 0.04 0.37 0.00 0.03 0.04 0.01 0.00 0.08 0.07 0.01
0.53 0.38 0.06 0.29 0.44 0.26 0.22 0.37 0.00 0.16 0.18 0.01 0.02 0.38
Note. Squared Loadings and Coefficients in Bold > 0.35.
The pattern of correlations in CCA suggests that there was insufficient evidence for the three claims identified. Certainly, overall CAP use appears quite dissimilar to multiple assessment formats use and formative feedback use. These findings can inform the implementation of CA. The three areas of consideration are: (1) workforce remodelling, (2) professional learning, and (3) policy choices.

In this study, the influence of extra-role behaviour on CAP use was notable. It is likely that high intensity CA practice required teachers to go beyond narrowly specified roles (Somech & Drach-Zahavy, 2000). In a low accountability system like Trinidad and Tobago, it is possible that several behaviours might be viewed by the teacher as discretionary, even when clearly spelled out in policy documents. Additionally, CAP may require a collaborative dimension to work, because group organizational citizenship behaviour was also a significant predictor. Complex assessment innovations like CA may either require an explicit, shared redefinition of teachers' roles or may lead to an intensification of teachers' work (Chisholm & Chilisa, 2012; Yigzaw, 2013; Van den Berg, 2002). In Trinidad and Tobago, issues related to changes in teachers' work have not been considered in reform agendas. However, the findings of this study suggest that the nature and definition of teaching could be an important implementation barrier. Therefore, to facilitate change, some aspects of traditional teachers' work might have to be restructured. Modern day teaching is increasingly regarded as emotional labour with a collective character (Connell, 2009). However, the kinds of pedagogic practice required for conducting performance, authentic, or formative assessment will be quite different from traditional teaching (Yu & Frempong, 2012). Therefore, new individual and collaborative professional roles may have to be specified (Kirtman,
2002). Such a reform agenda for Trinidad and Tobago should include workforce remodelling to allow teachers to focus upon core tasks (Hammersley-Fletcher & Adnett, 2009). The multiple demands created by the new tasks required in CA will also demand new teacher competencies and skills. This, in turn, will necessitate greater professional learning at both site and system levels (Little, 1993). High quality CA training must capture the multi-component nature of CA, with a greater emphasis on formative assessment and the use of authentic, performance assessments. In the past, CA training for teachers in Trinidad and Tobago has been mostly didactic, focused upon transmission of content and concepts (Webster-Wright, 2009). Such poor quality professional learning could be the single most important reason for the failure of CA implementation in Trinidad and Tobago and elsewhere (De Lisle, 2015; Kapambwe, 2010). Modern professional learning systems for CA should involve active learning, continuous classroom practice, and peer networking (Garet, Porter, Desimone, Birman, & Yoon, 2001). The findings for the measure of professional learning community (PLC) also pointed to the possible critical role of site based learning for implementing CA. It might certainly be useful for countries like Trinidad and Tobago to emphasize site based professional learning in an attempt to prime organizations for assessment reform. However, the canonical loadings also suggested that the professional learning needs of teachers must be supported by effective site leadership. Site based support might also be generated through instructional coaching systems. Nevertheless, the quality of professional learning is not the sole reason for misuse and distortion when implementing CA. It could be that CA systems are inherently flawed because teachers find it easier to practice summative assessment than formative assessment or authentic performance assessment (Swaffield, 2011; Black, 2015).
7. Conclusion

Overall, the findings of this study have implications for policy choices both in countries that currently use CA and in those embarking on assessment reform. Some countries are increasingly using assessments and assessment systems for multiple purposes (Bennett, 2010; Bennett & Gitomer, 2009). However, there are fundamental differences between the tools, processes, and outcomes of high quality assessment for learning and those of assessment of learning (Swaffield, 2011). Such differences might mean that different components and purposes must be compartmentalized for the fullest effect, especially in high stakes contexts. Certainly the installation of new assessment systems should be accompanied by a clear theory of action, which will ensure that stakeholders and evaluators are able to judge the effectiveness of outcomes (Sireci, 2016). For many countries such as Trinidad and Tobago, installing continuous assessment systems to promote learning has not proved effective. In many instances, and as found in this study, teachers in continuous assessment practice have focused on the summative function and have ignored the formative purpose, which is the critical ingredient in enhancing learning (Pennycuick, 1990). For countries with limited resources, the most effective strategy for assessment reform might be, then, to focus on promoting high quality and authentic formative assessment in classrooms rather than installing elaborate CA systems. This is because it is formative assessment, with its feedback processes, that brings the greatest gains in student learning outcomes (De Lisle, 2015).

References

Airasian, P. W. (1988). Measurement driven instruction: a closer look. Educational Measurement: Issues & Practice, 7(4), 6–11. http://dx.doi.org/10.1111/j.17453992.1988.tb00837.x. Al-Haddad, S., & Kotnour, T. (2015). Integrating the organizational change literature: a model for successful change. Journal of Organizational Change Management, 28 (2), 234–262. http://dx.doi.org/10.1108/JOCM-11-2013-0215. Altinyelken, H. K. (2010). Pedagogical renewal in sub-Saharan Africa: the case of Uganda. Comparative Education, 46(2), 151–171. http://dx.doi.org/10.1080/ 03050061003775454. Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: integrating accountability testing, formative assessment, and professional support. In C. Wyatt-Smith, & J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61).New York: Springer. Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning: a preliminary theory of action for summative and formative assessment. Measurement: Interdisciplinary Research and Perspectives, 8, 70–91. http://dx.doi. org/10.1080/15366367.2010.508686. Bennett, R. E. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. http://dx.doi.org/10.1080/ 0969594X.2010.513678. Bennett, R. E. (2015). The changing nature of educational assessment. Review of Research in Education, 39(1), 370–407. http://dx.doi.org/10.3102/ 0091732X14554179. Berry, R. (2011). Assessment trends in Hong Kong: seeking to establish formative assessment in an examination culture. Assessment in Education: Principles, Policy & Practice, 18(2), 199–211. http://dx.doi.org/10.1080/0969594X.2010.527701. Birenbaum, M., DeLuca, C., Earl, L., Heritage, M., Klenowski, V., Looney, A., . . . Wyatt-Smith, C. (2015).
International trends in the implementation of assessment for learning: implications for policy and practice. Policy Futures in Education, 13(1), 117–140. http://dx.doi.org/10.1080/0969594042000333904. Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Policy, Principles, & Practice, 5(1), 7–74. http://dx.doi.org/10.1080/ 0969595980050102. Black, P. (2015). Formative assessment—an optimistic but incomplete vision. Assessment in Education: Principles, Policy & Practice, 22(1), 161–177. http://dx. doi.org/10.1080/0969594X.2014.999643. Bracey, G. W. (1987). Measurement-driven instruction: catchy phrase, dangerous practice. Phi Delta Kappan, 68(9), 683–686. Brookhart, S. M. (2009). The many meanings of multiple measures. Educational Leadership, 67(3), 6–12. Brown, G. T. L., & Lake, R. (2006). Queensland teachers' conceptions of teaching, learning, curriculum and assessment: comparisons with New Zealand teachers. Paper presented at the annual conference of the Australian association for research in education (AARE).
Brown, G. T. L. (2006). Teachers’ conceptions of assessment: validation of an abridged instrument. Psychological Reports, 99, 166–170.
Carless, D. (2005). Prospects for the implementation of assessment for learning. Assessment in Education: Principles, Policy and Practice, 12(1), 39–54. http://dx.doi.org/10.1080/0969594042000333904.
Cheung, D. (2000). Analyzing the Hong Kong junior secondary science syllabus using the concept of curriculum orientations. Educational Research Journal, 15(1), 69–94.
Chisholm, L., & Chilisa, B. (2012). Contexts of educational policy change in Botswana and South Africa. Prospects, 42(4), 371–388. http://dx.doi.org/10.1007/s11125-012-9247-5.
Chulu, B. W. (2013). Institutionalisation of assessment capacity in developing nations: the case of Malawi. Assessment in Education: Principles, Policy & Practice, 20(4), 407–423. http://dx.doi.org/10.1080/0969594X.2013.843505.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd ed. New Jersey: Lawrence Erlbaum.
Connell, R. (2009). Good teachers on dangerous ground: towards a new view of teacher quality and professionalism. Critical Studies in Education, 50(3), 213–229. http://dx.doi.org/10.1080/17508480902998421.
Darling-Hammond, L., & Adamson, F. (2014). Beyond the bubble test: how performance assessments support 21st century learning. San Francisco: Jossey-Bass.
De Lisle, J. (2009). External examinations beyond national borders: Trinidad and Tobago and the Caribbean Examinations Council. In B. Vlaardingerbroek, & N. Taylor (Eds.), Secondary school external examination systems: Reliability, robustness and resilience (pp. 265–290). New York: Cambria Press.
De Lisle, J. (2013). Exploring the value of integrated findings in a multiphase mixed methods evaluation of the continuous assessment program in the Republic of Trinidad and Tobago. International Journal of Multiple Research Approaches, 7(1), 27–49. http://dx.doi.org/10.5172/mra.2013.7.1.27.
De Lisle, J. (2015). The promise and reality of formative assessment practice in a continuous assessment scheme: the case of Trinidad and Tobago. Assessment in Education: Principles, Policy & Practice, 22(1), 79–103. http://dx.doi.org/10.1080/0969594X.2014.944086.
Fan, X., & Konold, T. R. (2010). Canonical correlation analysis. In G. R. Hancock, & R. O. Mueller (Eds.), Quantitative methods in the social and behavioral sciences: a guide for researchers and reviewers (pp. 29–34). New York, NY: Taylor & Francis.
Firestone, W. A. (1998). A tale of two tests: tensions in assessment policy. Assessment in Education: Principles, Policy & Practice, 5(2), 175–191. http://dx.doi.org/10.1080/0969594980050203.
Fischer, D., & Frey, N. (2015). Checking for understanding: formative assessment techniques for your classroom, 2nd ed. Alexandria, VA: ASCD.
Flórez Petour, M. T. (2015). Systems, ideologies and history: a three-dimensional absence in the study of assessment reform processes. Assessment in Education: Principles, Policy & Practice, 22, 3–26. http://dx.doi.org/10.1080/0969594X.2014.943153.
Garet, M. S., Porter, A. C., Desimone, L., Birman, B. F., & Yoon, K. S. (2001). What makes professional development effective? Results from a national sample of teachers. American Educational Research Journal, 38(4), 915–945. http://dx.doi.org/10.3102/00028312038004915.
Goddard, R. D., Hoy, W. K., & Hoy, A. W. (2000). Collective teacher efficacy: its meaning, measure, and impact on student achievement. American Educational Research Journal, 37, 479–507. http://dx.doi.org/10.3102/00028312037002479.
Hair, J., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis, 7th ed. Upper Saddle River, New Jersey: Pearson Education International.
Hammersley-Fletcher, L., & Adnett, N. (2009). Empowerment or prescription? Workforce remodelling at the national and school level. Educational Management Administration & Leadership, 37(2), 180–197. http://dx.doi.org/10.1177/1741143208100297.
Hamp-Lyons, L. (2009). Principles for large-scale classroom-based teacher assessment of English learners’ language: an initial framework from school-based assessment in Hong Kong. TESOL Quarterly, 43, 524–530. http://dx.doi.org/10.1002/j.1545-7249.2009.tb00249.x.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77, 81–112. http://dx.doi.org/10.3102/003465430298487.
Havnes, A., Smith, K., Dysthe, O., & Ludvigsen, K. (2012). Formative assessment and feedback: making learning visible. Studies in Educational Evaluation, 38(1), 21–27. http://dx.doi.org/10.1016/j.stueduc.2012.04.001.
Hayford, S. K. (2007). Continuous assessment and lower attaining pupils in primary and junior secondary schools in Ghana. UK: University of Birmingham (Unpublished doctoral dissertation).
Heritage, M., Kim, J., Vendlinski, T., & Herman, J. (2009). From evidence to action: a seamless process in formative assessment? Educational Measurement: Issues and Practice, 28(3), 24–31. http://dx.doi.org/10.1111/j.1745-3992.2009.00151.x.
Holt, D. T., Armenakis, A. A., Feild, H. S., & Harris, S. G. (2007). Readiness for organizational change: the systematic development of a scale. Journal of Applied Behavioral Science, 43, 232–255. http://dx.doi.org/10.1177/0021886306295295.
Hord, S. M., Meehan, M. L., Orletsky, S., & Saltes, B. (1999). Assessing a school staff as a community of professional learners. Issues About Change, 7(1). http://www.sedl.org/change/issues/issues71/.
Johnson, E. (2008). Ecological systems and complexity theory: toward an alternative model of accountability in education. Complicity: An International Journal of Complexity & Education, 5(1), 1–10.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Kanjee, A., & Acana, S. (2013). Developing the enabling context for student assessment in Uganda. SABER-Student assessment working paper No. 8. Washington, DC: World Bank.
Kapambwe, W. M. (2010). The implementation of school based continuous assessment (CA) in Zambia. Educational Research and Reviews, 5(3), 99–107.
Kirtman, L. (2002). Policy and practice: restructuring teachers’ work. Education Policy Analysis Archives, 10(25). http://eppaa.asu.edu/eppa/v10n25/.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254–284. http://dx.doi.org/10.1037/0033-2909.119.2.254.
Koch, M. J. (2010). Implications of the multiple-use of large-scale assessments for the process of validation: a case study of the multiple-use of a Grade 9 mathematics assessment. Canada: University of Ottawa (Unpublished doctoral dissertation).
Koch, M. J. (2013). The multiple-use of accountability assessments: implications for the process of validation. Educational Measurement: Issues and Practice, 32(4), 2–15. http://dx.doi.org/10.1111/emip.12015.
Le Grange, L., & Reddy, C. (1998). Continuous assessment: an introduction and guidelines to implementation. Cape Town, South Africa: Juta.
Leahy, S., Lyon, C., Thompson, M., & Wiliam, D. (2005). Classroom assessment minute by minute, day by day. Educational Leadership, 63(3), 19–24.
Levine, M. S. (1977). Canonical analysis and factor comparison. Beverly Hills, CA: Sage.
Little, J. W. (1993). Teachers’ professional development in a climate of educational reform. Educational Evaluation & Policy Analysis, 15(2), 129–151. http://dx.doi.org/10.3102/01623737015002129.
Luxia, Q. (2007). Is testing an efficient agent for pedagogical change? Examining the intended washback of the writing task in a high-stakes English test in China. Assessment in Education, 14(1), 51–74. http://dx.doi.org/10.1080/09695940701272856.
Marion, S. F., & Buckley, K. (2016). Design and implementation considerations of performance-based and authentic assessments for use in accountability systems. In H. Braun (Ed.), Meeting the challenges to measurement in an era of accountability (pp. 49–76). Washington, DC: NCME.
May, C. (2013). Towards a general theory of implementation. Implementation Science, 8, 1–14. http://dx.doi.org/10.1186/1748-5908-8-18.
McMillan, J. H., Myran, S., & Workman, D. (2002). Elementary teachers’ classroom assessment and grading practices. Journal of Educational Research, 95(4), 203–213. http://dx.doi.org/10.1080/00220670209596593.
Mchazime, H. (2003). Integrating primary school curriculum and CA in Malawi. Improving Educational Quality (IEQ) Project. Washington, DC: American Institutes for Research.
Moss, C. M., & Brookhart, S. M. (2015). Formative classroom walkthroughs: how principals and teachers collaborate to raise student achievement. Alexandria, VA: ASCD.
Moss, P., Girard, B., & Haniford, L. (2006). Validity in educational assessment. Review of Research in Education, 30, 109–162. http://dx.doi.org/10.3102/0091732X030001109.
Moss, P. A. (Ed.) (2007). Evidence and decision making. The 106th yearbook of the National Society for the Study of Education. Malden, MA: Blackwell Publishing.
National Examinations Council of Tanzania (2003). History of the National Examinations Council of Tanzania. National Examinations Council of Tanzania. http://www.matokeo.necta.go.tz/history.htm.
Nimon, K., & Reio, T. G. (2011). The use of canonical commonality analysis for quantitative theory building. Human Resource Development Review, 10(4), 451–463. http://dx.doi.org/10.1177/1534484311417682.
Nitko, A. J. (1995a). Curriculum-based continuous assessment: a framework for concepts, procedures and policy. Assessment in Education: Principles, Policy & Practice, 2(3), 321–337. http://dx.doi.org/10.1080/0969595950020306.
Nitko, A. J. (1995b). Is the curriculum a reasonable basis for assessment reform? Educational Measurement: Issues and Practice, 14(3), 5–10. http://dx.doi.org/10.1111/j.1745-3992.1995.tb00862.x.
Noonan, B., & Renihan, P. (2006). Demystifying assessment leadership. Canadian Journal of Educational Administration and Policy, 56, 1–21.
Nsibande, R. (2007). Knowledge and practice of continuous assessment: the barriers for policy transfer. South Africa: University of South Africa (Unpublished doctoral dissertation).
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Nyerere, J. (1967). Education for self-reliance. The Ecumenical Review, 19, 382–403. http://dx.doi.org/10.1111/j.1758-6623.1967.tb02171.x.
Pennycuick, D. B. (1990). The introduction of continuous assessment systems at secondary level in developing countries. In P. Broadfoot, R. Murphy, & H. Torrance (Eds.), Changing educational assessment: international perspectives and trends (pp. 106–135). London: Routledge.
Perry, L. (2013). Review of formative assessment use and training in Africa. International Journal of School & Educational Psychology, 1(2), 94–101. http://dx.doi.org/10.1080/21683603.2013.789809.
Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68, 679–682.
Pratt, D. D., Collins, J. B., & Selinger, S. J. (2001). Development and use of the Teaching Perspectives Inventory (TPI). Presented at the annual meeting of the American Educational Research Association. http://www.academia.edu/317236/Development_and_Use_of_the_Teaching_Perspectives_Inventory_TPI_.
Pugh, R. C., & Hu, Y. (1991). Use of canonical correlation analyses in Journal of Educational Research articles: 1978–1989. Journal of Educational Research, 84(3), 147–152.
Sadler, D. R. (1998). Formative assessment: revisiting the territory. Assessment in Education: Principles, Policy & Practice, 5(1), 77–84. http://dx.doi.org/10.1080/0969595980050104.
Shandomo, H. (2008). Continuous assessment in Swaziland: the predictable fate of western innovation in Africa. Saarbrücken, Germany: VDM Verlag Dr. Müller Aktiengesellschaft.
Shepard, L. A., Flexer, R. J., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1996). Effects of introducing classroom performance assessments on student learning. Educational Measurement: Issues and Practice, 15(3), 7–18. http://dx.doi.org/10.1111/j.1745-3992.1996.tb00817.x.
Sherry, A., & Henson, R. K. (2005). Conducting and interpreting canonical correlation analysis in personality research: a user-friendly primer. Journal of Personality Assessment, 84(1), 37–48. http://dx.doi.org/10.1207/s15327752jpa8401_09.
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189.
Sireci, S. G. (2016). A theory of action for test validation. In H. Jiao, & R. W. Lissitz (Eds.), The next generation of testing (pp. 253–271). Charlotte, NC: Information Age.
Smith, K. (2010). Assessment: complex concept and complex practice. Assessment Matters, 2, 6–20.
Somech, A., & Drach-Zahavy, A. (2000). Understanding extra-role behavior in schools: the relationships between job satisfaction, sense of efficacy, and teachers’ extra-role behavior. Teaching & Teacher Education, 16(5), 649–659. http://dx.doi.org/10.1016/S0742-051X(00)00012-3.
Stobart, G. (2009). Determining validity in national curriculum assessments. Educational Research, 51(2), 161–179. http://dx.doi.org/10.1080/00131880902891305.
Swaffield, S. (2011). Getting to the heart of authentic assessment for learning. Assessment in Education: Principles, Policy & Practice, 18(4), 433–449. http://dx.doi.org/10.1080/0969594X.2011.582838.
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics, 6th ed. Boston: Pearson.
Thompson, B., Diamond, K. E., McWilliam, R., Snyder, P., & Snyder, S. W. (2005). Evaluating the quality of evidence from correlational research for evidence-based practice. Exceptional Children, 71(2), 181–194. http://dx.doi.org/10.1177/001440290507100204.
Thompson, B. (1984). Canonical correlation analysis: uses and interpretation. Newbury Park, CA: Sage.
Thompson, B. (1988). Canonical correlation analysis: an explanation with comments on correct practice. Paper presented at the annual meeting of the American Educational Research Association [ERIC Document Reproduction Service No. ED 295 957].
Thompson, B. (1990). Finding a correction for the sampling error in multivariate measures of relationship: a Monte Carlo study. Educational & Psychological Measurement, 50(1), 15–31. http://dx.doi.org/10.1177/0013164490501003.
Thompson, B. (1991). A primer on the logic and use of canonical correlation analysis. Measurement and Evaluation in Counseling & Development, 24(2), 80–95.
Thompson, B. (2000). Canonical correlation analysis. In L. Grimm, & P. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 285–316). Washington, DC: American Psychological Association.
Torrance, H. (1993). Notes: combining measurement-driven instruction with authentic assessment: some initial observations of national assessment in England and Wales. Educational Evaluation and Policy Analysis, 15, 81–90. http://dx.doi.org/10.3102/01623737015001081.
Trinidad & Tobago Ministry of Education (1998). CAP pilot operational manual. Port of Spain: TTMOE.
Trinidad & Tobago Ministry of Education (2000). Integrating continuous assessment into the teaching and learning process operations manual. Port of Spain: TTMOE.
UNESCO (2008). Education for All by 2015: will we make it? EFA global monitoring report. Paris: UNESCO & Oxford University Press.
Van den Berg, R. (2002). Teachers’ meanings regarding educational practice. Review of Educational Research, 72(4), 577–625. http://dx.doi.org/10.3102/00346543072004577.
Van der Kleij, F. M., Vermeulen, J. A., Schildkamp, K., & Eggen, T. J. H. M. (2015). Integrating data-based decision making, Assessment for Learning and diagnostic testing in formative assessment. Assessment in Education: Principles, Policy & Practice. http://dx.doi.org/10.1080/0969594X.2014.999024.
Vandeyar, S., & Killen, R. (2007). Educators' conceptions and practice of classroom assessments in post-apartheid South Africa. South African Journal of Education, 27(1), 101–115.
Voerman, L., Korthagen, F. A., Meijer, P. C., & Simons, R. J. (2014). Feedback revisited: adding perspectives based on positive psychology. Implications for theory and classroom practice. Teaching & Teacher Education, 43, 91–98. http://dx.doi.org/10.1016/j.tate.2014.06.005.
Wanders, C., Mendez, J. L., & Downer, J. (2007). Parent characteristics, economic stress, and neighborhood context as predictors of parent involvement in preschool children’s education. Journal of School Psychology, 45, 619–636. http://dx.doi.org/10.1016/j.jsp.2007.07.003.
Waugh, R. F., & Punch, K. F. (1987). Teacher receptivity to system-wide change in the implementation stage. Review of Educational Research, 57(3), 237–254. http://dx.doi.org/10.3102/00346543057003237.
Webster-Wright, A. (2009). Reframing professional development through understanding authentic professional learning. Review of Educational Research, 79(2), 702–739. http://dx.doi.org/10.3102/0034654308330970.
Weiner, B. J. (2009). A theory of organizational readiness for change. Implementation Science, 4, 67–75. http://dx.doi.org/10.1186/1748-5908-4-67.
Wiggins, G. (1998). Educative assessment: designing assessments to inform and improve student performance. San Francisco: Jossey-Bass.
Wiliam, D., & Thompson, M. (2007). Integrating assessment with learning: what will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: shaping, teaching, and learning (pp. 53–82). Mahwah, NJ: Erlbaum.
World Bank (2009). Zambia student assessment: SABER country report 2009. Washington, DC: World Bank.
Yigzaw, A. (2013). High school English teachers’ and students’ perceptions, attitudes and actual practices of continuous assessment. Educational Research & Reviews, 8(16), 1489–1498.
Yu, K., & Frempong, G. (2012). Standardise and individualise—an unsolvable tension in assessment? Education as Change, 16(1), 143–157. http://dx.doi.org/10.1080/16823206.2012.692210.
Zhang, Z., & Burry-Stock, J. A. (2003). Classroom assessment practices and teachers' self-perceived assessment skills. Applied Measurement in Education, 16(4), 323–342. http://dx.doi.org/10.1207/S15324818AME1604_4.
van der Voet, J., Kuipers, B., & Groeneveld, S. (2015). Held back and pushed forward: leading change in a complex public sector environment. Journal of Organizational Change Management, 28(2), 290–300. http://dx.doi.org/10.1108/JOCM-09-2013-0182.