Quantitative and qualitative components of teachers' evaluation strategies


Teaching & Teacher Education, Vol. 4, No. 1, pp. 41-51, 1988
Printed in Great Britain
0742-051X/88 $3.00+0.00
Pergamon Press plc


LINDA K. ALLAL

University of Geneva, Switzerland

Abstract - Evaluation of student learning is an essential part of the teaching process. From interviews with 45 elementary school teachers, three aspects of their evaluation practices are analyzed: (a) the implicit references used when assigning grades, (b) the procedures adopted to combine information resulting from several assessments, and (c) the processes leading to decisions regarding student promotion and placement. The results of the analysis show that although teachers’ evaluation strategies vary widely, they are generally more complex than the minimum required by school system regulations and nearly always take into account both quantitative and qualitative sources of information. Implications for teacher training in the area of evaluation are discussed.

Evaluation of student learning is an essential aspect of the teacher’s pedagogical activity. Techniques of formative evaluation - both formal (e.g., unit tests) and informal (e.g., observations during seatwork or lessons) - are used on a daily basis as a means of regulating the processes of learning and teaching. At the end of prescribed periods (trimesters, semesters, etc.), the outcomes of summative evaluations are transcribed, usually in the form of grades, in the student’s report card, communicated to parents and to other persons outside the classroom, and used as a basis for making decisions about promotion and placement. Classroom evaluation is thus a major aspect of the day-to-day functioning of schooling, as well as a major determinant of the student’s access to further educational opportunities.

Over 40 years ago, Scates (1943) defined several fundamental distinctions between scientists’ criteria of measurement and the criteria that govern teachers’ practice of measurement and evaluation. He described teachers’ strategies as being based on “interplay between objectivity and judgement” (p. 6). Over the last four decades, educational measurement specialists and training materials developed for teachers in the U.S. context have tended to emphasize ways of improving objectivity as the primary, if not sole, concern in the area of classroom assessment. An analysis by Stiggins and Bridgeford (1985) has indicated that two topics - interpretation and use of standardized test scores, and techniques for construction of objective tests - are dealt with in around 90% of the published articles on educational measurement and take up around 95% of the space in introductory measurement textbooks used in teacher training.

Several studies show that, although these procedures, and in particular teacher-made objective tests, represent significant components of most teachers’ evaluation strategies, they are almost always combined with other means of assessment which rely heavily on teacher judgements. This is especially true at the elementary-school level, where teacher observations and overall impressions are a major basis for instructional and official decisions (Airasian, 1984). In the survey conducted by Stiggins and Bridgeford (1985), nearly all teachers - across elementary, junior high, and senior high grades - report a high level of use of procedures classified as “structured and spontaneous performance assessments” based on observation and rating of student behavior and products.

Although recent studies provide substantial data regarding the types of assessment procedures used by teachers, relatively less evidence is available concerning the rules that teachers follow in order to combine information from several different sources. In one study of elementary school teachers’ thoughts while assigning grades (Whitmer, 1982), it was found that a variety of factors enter into grading decisions - achievement on tests, daily tasks completed, absences, informal ratings of effort, observations of classroom behavior - among which tallies of daily tasks completed appeared to be the most important. Data from the Netherlands show that in elementary school arithmetic instruction, summative evaluation decisions are based more on “practice tasks” and on curriculum-embedded tests used on a daily basis than on tests especially constructed to assess learning outcomes (Janssens, 1986). Although teacher-made tests (unit tests, midterms, finals, etc.) take on increasing importance in junior and senior high school, spontaneous and structured performance assessments remain major factors in the assignment of grades and the reporting of results to parents (Stiggins & Bridgeford, 1985), and may include aspects of student behavior such as participation in class discussion, or completion of homework (Haertel, 1986).

In the French-speaking research community in Europe, considerable attention has been devoted in recent years to studies of formative evaluation processes as means of regulating learning and teaching in the classroom (Allal, Cardinet, & Perrenoud, 1979). Attempts have been made in particular to identify the way in which evaluational practices are integrated into the daily functioning of the classroom (Perrenoud, 1984), and to define the types of regulation - interactive, retroactive, and proactive - that result from teachers’ formal and informal assessment procedures (Allal, 1983, 1987).
The tendency to focus on various aspects of formative evaluation can be explained in part by the fact that this function of evaluation is closely linked to other topics of theoretical interest to educational researchers, for instance, individualized instruction, classroom management, and learning models. Neglect of teachers’ summative evaluation practices (grading, report cards, decisions about promotion, etc.) probably reflects as well, at least in the European context, a general lack of interest on the part of the research community in the more mundane, “nuts and bolts” aspects of teaching. It is beginning to be recognized, however, that attempts to develop valid conceptual models of the teacher’s professional activity, as well as efforts to improve teacher training, must be based on thorough studies of existing practices, and in particular those practices which teachers devise in response to institutional constraints.

One type of constraint which teachers nearly always encounter is the school system’s regulations governing the formal aspects of summative evaluation: frequency of report cards sent to the students’ parents, grading scale or other type of code used for reporting results, format of the report card, and so on. Since nearly all school systems use summative evaluation outcomes not only for certification purposes but also for predictive decisions pertaining to the student’s status the following year, regulations often stipulate the standards (grade point average or other criteria) for promotion to the next grade, or for placement in the sections or levels of secondary school studies.

Our research on teachers’ strategies for dealing with the summative-predictive functions of student evaluation is based on the general hypothesis that in this area, as in other areas of teaching, practitioners develop a wide variety of “routines” which correspond to different ways in which school system rules are integrated (interpreted, transformed, enlarged, restricted, etc.) within the operational framework governing each teacher’s pedagogical activity. It is probable that initially improvised assessment practices - ways of “playing” with the rules of evaluation (Perrenoud, 1986) - ultimately lead to highly scripted, stable “recipes” that teachers apply year after year (Huberman, 1983).
The study reported here focuses on the evaluation practices of first- through sixth-grade elementary school teachers in Geneva, Switzerland. Three phases of teachers’ strategies for dealing with the summative-predictive functions of evaluation are examined in this paper.

1. The implicit references used when assigning grades for the outcomes of a single assessment (test, or other task).
2. The procedures adopted to combine information resulting from several assessments in order to determine the trimester grade to be transcribed in the student’s report card.
3. The processes leading to decisions at the end of the year regarding student promotion (passing or repeating a grade level within elementary school) and student placement (admission to the sections of junior high school).

Method

The population surveyed consists of the elementary school teachers (grades 1-6) working in the public school system of the canton of Geneva. One aim of the study was to provide the Director of elementary schools with information on teachers’ preferences regarding two report card formats used in the canton. Since the data on this point are of purely local interest, they will not be presented in this paper. A second aim of the survey was to obtain detailed descriptions of the teachers’ existing evaluation practices, particularly as related to the school system regulations on grading, report cards, and promotion/placement decisions. The analysis of these practices is the focus of this paper.

A sample of 45 teachers was selected on the basis of two variables. The first was the type of report card used in the school; 15 teachers were chosen from three schools using an experimental report card, and 30 were sampled from four schools using the traditional report card officially adopted in the canton. The schools were selected on the basis of canton-wide statistics as representative of the elementary school population with respect to the nationality of the children and the socioeconomic status of their families. The variable “type of report card” was crossed with a second variable “grade level” so as to ensure a sample of 15 teacher respondents in each of three strata: first-second grades (students aged 6-8 years), third-fourth grades (8-10 years), and fifth-sixth grades (10-12 years).

An interview lasting approximately one hour was conducted with each teacher on the basis of a semi-structured format. At the end of the interview, a questionnaire designed to collect additional data was left for the teacher to fill out and return to the interviewer. The interview format and the questionnaire were constructed on the basis of information from a series of open-ended pilot interviews. All data collection was carried out by a former elementary school teacher working under the supervision of the author of this paper.

When teachers talk about their evaluation practices, or about any aspect of their teaching practice for that matter, they often use a highly personalized language which they have devised for communicating with their class. One teacher has dubbed the mathematics quiz given first thing each Monday morning the “weekly wakeup”; another teacher talks about his “plus and minus” system for recording homework done each day; still another distinguishes “exams” (given two to three times per trimester) from “check-ups” (short quizzes given every week). Because of this diversity of vocabulary for talking about evaluation, it was necessary during the interview to ask constantly for explanations and concrete examples. Subsequently, in the stage of data processing and analysis, a system of categories had to be defined in order to permit classification and condensation of data that, at a surface level, display a very high degree of heterogeneity. When one first reads the interview protocols, one is tempted to conclude that the 45 teachers have 45 different ways of carrying out student evaluation. But successive operations of coding and classification allow patterns to appear and permit identification of comparable components in the teachers’ evaluation strategies.

After developing and applying a system of data classification for each of the three phases of evaluation dealt with in the interview, χ² tests were carried out to determine whether the teachers’ practices vary systematically according to the type of report card used, or according to the grade level taught.
None of the tests showed a significant effect at the probability level that was adopted (p < .01). This may be partly due to the relatively small size of the sample, but it also reflects the wide range of inter-individual variation inherent in the data. The lack of a grade level effect may be explained at least in part by the fact that teachers in the Geneva school system often change grade from year to year, rather than always teaching the same grade; they therefore tend to develop a personal system of student evaluation which they apply, with minor adaptations, whatever the grade being taught.

Three summary tables grouping the data for all 45 respondents will serve as a basis for presentation of the characteristics of the teachers’ evaluation strategies. Explanations regarding aspects of the Geneva school system that differ from general conditions in the United States are given in notes appended to each table. Occasionally, reference will be made in the text to additional tabulations carried out to verify specific points or to refine interpretations of the basic findings.
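As an illustration of the kind of χ² test mentioned above, the following minimal sketch computes Pearson's χ² statistic for a contingency table and compares it with the tabulated critical value at p = .01. The cell counts are hypothetical: only the column totals (18 and 27) and the strata of 15 teachers come from the study, and the function names are chosen here for illustration.

```python
from typing import Sequence

# Upper critical values of the chi-square distribution at p = .01,
# indexed by degrees of freedom (standard tabulated values).
CHI2_CRITICAL_01 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277}


def chi_square_statistic(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table: Sequence[Sequence[int]]) -> int:
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# HYPOTHETICAL cell counts, not taken from the study.
# Rows: grade-level strata (1-2, 3-4, 5-6), 15 teachers each.
# Columns: strict calculation vs. calculation + adjustments.
hypothetical_table = [[7, 8], [6, 9], [5, 10]]

stat = chi_square_statistic(hypothetical_table)
df = degrees_of_freedom(hypothetical_table)
significant = stat > CHI2_CRITICAL_01[df]
print(f"chi2 = {stat:.3f}, df = {df}, significant at .01: {significant}")
```

With these invented counts the statistic is far below the critical value, which is the pattern of result the text reports. With a sample of 45, expected cell counts can be small, so the χ² approximation should be read with caution.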

Results

The presentation deals with three phases of teachers’ evaluation strategies: (a) the practices followed when assigning grades for the outcomes of a single evaluation, (b) the procedures adopted for determining the trimester grade to be recorded in the student’s report card, and (c) the processes leading to decisions about student promotion and placement.

Implicit References of Teachers’ Grading Practices

In the Geneva school system there are no official regulations indicating how teachers are supposed to assign grades for student results on tests or other types of tasks. In order to determine what procedures teachers generally follow and, in particular, the degree to which their practices implicitly reflect norm-referenced or criterion-referenced principles of measurement, the respondents were presented with brief written descriptions of five typical ways of assigning grades (as derived from the pilot interviews). The teacher was asked to indicate for each grading practice if he or she followed this procedure at least some of the time and, if so, in which subject matter area(s). The teacher was also requested to indicate any additional ways used to assign grades and the subject matter area(s) concerned. The classification of the practices as norm- and/or criterion-referenced was not provided to the teachers and, since this distinction is not dealt with in a systematic way in teacher training in Geneva, it is unlikely that respondents’ answers were influenced by concerns of theoretical acceptability.

A first finding is that nearly all teachers (88%) follow more than one grading practice, and that a sizable number of teachers (36%) apply at least three different procedures which vary according to the subject matter area or the type of instrument being used. As shown in Table 1, strictly norm-referenced grading is mentioned by 42% of the teachers. Practices which reflect a criterion-referenced approach are more often based on the number of mistakes the student made (mentioned by 64% of the teachers) than on the number of points obtained (mentioned by 38%). Both of these approaches represent what might be termed “blind” or “unanalyzed” criterion-referenced grading in the sense that the fixing of the criterion for each grade follows standard intervals (e.g., 6 = 90-100%, 5 = 80-89%, 4 = 70-79%, etc.), without the teacher having analyzed in pedagogical terms the objectives to be attained, or the minimum competency levels required for future work. Given the frequency of the error-counting approach, which is particularly widespread for spelling dictation (the bête noire of all French-speaking school children), it is not surprising that students often view evaluation as an essentially negative process in which one invariably loses one’s initial “capital” of points.

Table 1
Implicit References of Teachers’ Grading Practices (n = 45)

| Reference | Grading practice (a) (for tests and other tasks) | Respondents n | (%) |
|---|---|---|---|
| Criterion-referenced | Grading according to the number of mistakes the student made | 29 | 64 |
| Criterion-referenced | Grading according to the number or percentage of points the student obtains | 17 | 38 |
| Norm-referenced | The grade 4 is located at the median, and the other grades are determined by the distribution of the students’ results | 19 | 42 |
| Mixed reference | The grade 4 is attributed for a predetermined percentage of points, and the other grades by the distribution of the students’ results | 19 | 42 |
| Mixed reference | The grade 4 is determined by the definition of minimal objectives to be attained, and the other grades by the distribution of the students’ results | 13 | 29 |
| Other | | 9 | 20 |

(a) In the Geneva school system, grades are assigned on a scale of 0-6, with 3 as the minimum passing grade; the grade 4 is roughly equivalent to a C in the letter scale commonly used in the United States.

Among the most interesting practices are those that involve a mixed reference. In these cases, one point in the grading scale (usually the grade 4, equivalent to a “C” in the U.S. system) is fixed in a criterion-referenced manner, either on the basis of a predetermined percentage of points (mentioned by 42% of the teachers), or by the specification of minimal objectives (mentioned by 29%). The other grades are then assigned in an intuitive norm-referenced manner according to the distribution of the students’ scores. Although measurement specialists (and measurement textbooks written for teacher training) generally insist on the necessity of clearly distinguishing norm- and criterion-referenced evaluations, the data show that the complexity of classroom practice often leads teachers to combine the two references. Their rationale for doing so seems to entail an intuitive form of “cost-benefit” analysis that can be summarized as follows: It is worth taking the time to decide what students must know for a grade of 4 (seen more or less as a minimal competency level), but it is too time consuming to do a similar analysis for each point in the grading scale, and a student’s position with respect to the others is probably as valid a basis as any other for assigning grades in the zones above and below 4.

Procedures for Determining Trimester Grades

Three times in the year, elementary school teachers must determine the grade that will be recorded for each subject in the report card sent to the student’s parents. The school system regulations state merely that each trimester grade is to be based on the results of at least three tests (contrôles). In order to determine how teachers interpret, modify, or enlarge this rule, respondents were asked to describe the procedures they follow to assign trimester grades in the two major subject matter areas, French and mathematics. The data presented in Table 2 concern the trimester grade for mathematics. The findings are very similar for the trimester grades in French, but for the sake of brevity are not presented in this paper.

Table 2
Teachers’ Procedures for Determining the Trimester Grade in Mathematics (n = 45)

| | Respondents n | (%) |
|---|---|---|
| Elements that enter into the determination of the trimester grade: | | |
| “Big” tests: fewer than 3 per trimester | 11 | 24 |
| “Big” tests: 3 per trimester | 25 | 56 |
| “Big” tests: more than 3 per trimester | 9 | 20 |
| + “Small” tests and quizzes | 24 | 53 |
| + Tallies of daily work done (worksheets, exercises, homework, lesson participation, etc.) | 22 | 49 |
| + Global tasks (research projects done individually or in small groups, games, problem-solving situations) | 7 | 16 |
| Rules for determining the trimester grade: | | |
| Strict calculation (algorithm) | 18 | 40 |
| Calculation + “adjustments” | 27 | 60 |

In Table 2, the data are grouped in two major categories. The first category, elements that enter into the determination of the trimester grade, includes two subcategories:

1. The number of “big” tests administered by the teacher; this subcategory corresponds to teacher-constructed tests of substantial length covering several content chapters and/or objectives, and for which a grade is generally attributed.
2. Other types of information or assessments which are obtained by the teacher as “adjunctions” or means of “supplementing” the results of the “big” tests:
   (a) Results of “small” tests or quizzes given frequently, once or several times each week, and which are often combined to obtain one or more grades that enter into the trimester grade;
   (b) “Tallies” recorded on a daily basis of various tasks carried out by the students (e.g., exercises completed, worksheets and homework assignments turned in, etc.) and/or observations of student behavior (e.g., active participation in lessons, or in group work); these tallies generally take the form of checklists or tabulations in which the teacher’s assessments are recorded by a personally defined set of symbols, for instance, one symbol for homework turned in, a second for homework turned in and over half correct, and a third for homework turned in and entirely correct;
   (c) Assessments of students’ work on “global tasks,” such as research projects done individually or in small groups, or problem-solving situations; in some cases the teacher uses the official grading scale, in other cases he or she devises another symbol system (e.g., E = excellent, S = sufficient, I = insufficient) to record assessments.

One category of instruments often present in U.S. studies of teacher assessment procedures, that is, published tests of the standardized type or of the curriculum-embedded type, does not appear in the survey because tests of these types are virtually nonexistent in elementary schooling in French-speaking Switzerland.

The second category in Table 2 (rules for determining the trimester grade) includes two cases: (a) strict calculation of the grade by a standard algorithm (e.g., calculation of the average of the grades given for various tests and other tasks), and (b) calculation moderated by various types of adjustments. These adjustments occur primarily when the average calculated on the basis of several grades leads to a decimal number (e.g., 4.6) and the teacher is obliged to decide which higher or lower whole number grade (4 or 5) will be attributed for the trimester.

Elements that enter into the determination of the trimester grade. The data show that most teachers (76%) base their trimester grades on at least three big tests, as required by school regulations. It is important to note, however, that those who administer fewer than three big tests always apply one or more complementary procedures (quizzes, tallies, etc.). Moreover, among the teachers who administer three or more big tests, the vast majority (74%) carry out one or more additional procedures that provide complementary information taken into account in the trimester grade. Two complementary procedures, small tests and tallies, are each used by approximately half the teachers. In contrast, procedures based on global tasks are much less frequently used (mentioned by 16% of the teachers). This latter finding shows that the reform of mathematics instruction, introduced in French-speaking Switzerland 15 years ago, has had very little impact on teachers’ evaluation practices. Even teachers who propose research projects and problem-solving situations to their students as “learning activities” tend, for evaluation purposes, to stick with traditional paper-and-pencil assessment devices.

Rules for determining the trimester grade. Since trimester grades are always based on several elements, teachers have to develop rules for combining the information they have collected. Our data show that 40% of the teachers apply a strict calculation algorithm, the most common one being the computation of a simple or weighted average of the grades obtained by the student on tests and other tasks, with application of a standard arithmetic rounding rule to determine the grade to appear in the report card.
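The strict calculation rule just described (simple or weighted average followed by standard arithmetic rounding) can be sketched as follows. This is a minimal illustration, not a transcription of any teacher's actual procedure: the function names and the sample grades are hypothetical, and "standard rounding" is assumed to mean rounding halves upward.

```python
import math
from typing import Optional, Sequence


def round_half_up(value: float) -> int:
    """Arithmetic rounding: 4.4 -> 4, 4.5 -> 5, 4.6 -> 5 (assumed rule)."""
    return math.floor(value + 0.5)


def strict_trimester_grade(
    grades: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Average the grades (optionally weighted) and round to a whole grade."""
    if not grades:
        raise ValueError("at least one grade is required")
    if weights is None:
        weights = [1.0] * len(grades)  # simple average
    if len(weights) != len(grades):
        raise ValueError("grades and weights must have the same length")
    average = sum(g * w for g, w in zip(grades, weights)) / sum(weights)
    return round_half_up(average)


# Hypothetical grades on the Geneva 0-6 scale.
print(strict_trimester_grade([4, 5, 5]))       # average 4.67 -> 5
print(strict_trimester_grade([4, 4, 5]))       # average 4.33 -> 4
print(strict_trimester_grade([5, 4], [2, 1]))  # weighted average 4.67 -> 5
```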
The majority (60%) of the teachers interviewed do not follow such a strictly standardized approach. They carry out calculations based on the grades obtained by the student, but they introduce various types of adjustments which can have a significant impact on the determination of the trimester grade. In approximately one-third of the cases, the adjustments are essentially quantitative. For example, after calculation of the trimester average, any “unused” tenths of points (e.g., 0.4 unused points when 4.4 is rounded to 4.0) or any “additional” tenths of points (e.g., 0.4 added on when 4.6 is rounded to 5.0) are marked down in the teacher’s grade book and, at the next trimester, these tenths are added to or subtracted from the new trimester average. Most often, however, the adjustments are the result of a more qualitative type of synthesis based on a variety of elements: assessment of effort or of perseverance based on the information in the tallies of daily work done, unrecorded and intuitive observations of the child’s attitudes and work habits, global judgements regarding the adequacy of the child’s skills in problem-solving situations, and so on. In some cases, a systematic attempt is made to interrelate quantitative and qualitative sources of information. In other cases, qualitative information is used selectively. For example, if a teacher believes that the average of a student’s test grades does not reflect “true capacity,” a qualitative assessment of the student’s initiative, curiosity, and reasoning ability when carrying out mathematics games will lead the teacher to “round off” an average of 4.3 to 5, even though a similar adjustment is not made for other students whose test results are judged to be a fair measure of their competency level.

To illustrate the complexity of the procedures teachers devise for determining trimester grades in mathematics, the following is an example. A fifth-grade teacher administers three big tests, each of which provides a grade, and around 10 small quizzes whose results are combined to give a fourth grade. The average of the four grades is calculated with the grades for the big tests having twice the weight of the grade for the quizzes. Tallies of worksheets done by the student are converted into an additional grade which, added to the preceding average, leads to a second average. This second average generally becomes the trimester grade. If, however, the student has conducted an individual research project and the quality of work is judged positively, this may lead to a further adjustment, “boosting” the average to the next highest whole number grade.

In summary, elementary school teachers’ procedures for determining trimester grades are considerably more complex than the minimum required by the school system regulations (i.e., administration of three tests and calculation of the average grade). In most teachers’ strategies, there is a quantitative “core” of test scores or grades which is supplemented and/or adjusted by more qualitative types of information corresponding in large part to spontaneous and structured performance assessments as defined in the survey by Stiggins and Bridgeford (1985). A teacher’s evaluation strategy appears to follow a progressively routinized “script” that governs action in a stable manner from trimester to trimester and, within a certain range, for example, fourth to sixth grades, across grade levels.
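Two of the adjustment procedures described above can be made concrete in a short sketch: the fifth-grade teacher's example and the carry-over of tenths between trimesters. Several points are assumptions rather than facts from the interviews: the "second average" is taken to be the mean of the first average and the tally grade, the final rounding is assumed to be arithmetic rounding, the research-project boost is modelled as a yes/no decision by the teacher, and all names and sample grades are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fifth_grade_trimester_grade(
    big_tests: Sequence[float],
    quiz_grade: float,
    tally_grade: float,
    project_judged_positively: bool = False,
) -> int:
    """Fifth-grade example: weighted average, tally grade, optional boost."""
    # First average: each big test counts twice as much as the quiz grade.
    first = (2 * sum(big_tests) + quiz_grade) / (2 * len(big_tests) + 1)
    # ASSUMPTION: the second average is the mean of the first average
    # and the grade derived from the worksheet tallies.
    second = (first + tally_grade) / 2
    if project_judged_positively:
        # "Boost" to the next highest whole-number grade.
        return math.ceil(second)
    # ASSUMPTION: otherwise arithmetic rounding is applied.
    return math.floor(second + 0.5)


def round_with_carry(average: float, carry: float = 0.0) -> Tuple[int, float]:
    """Quantitative adjustment: carry unused or borrowed tenths forward.

    Returns the whole-number grade and the carry for the next trimester
    (positive = unused tenths, negative = tenths added on by rounding up).
    """
    adjusted = average + carry
    grade = math.floor(adjusted + 0.5)
    new_carry = round(adjusted - grade, 1)  # keep the bookkeeping in tenths
    return grade, new_carry


# Hypothetical student: three big tests, one quiz grade, one tally grade.
print(fifth_grade_trimester_grade([4, 4, 4], 5, 4))        # 4
print(fifth_grade_trimester_grade([4, 4, 4], 5, 4, True))  # 5

# Trimester 1: 4.4 is rounded to 4, leaving 0.4 unused.
grade1, carry1 = round_with_carry(4.4)
# Trimester 2: 4.2 + 0.4 = 4.6 is rounded to 5, "borrowing" 0.4.
grade2, carry2 = round_with_carry(4.2, carry1)
print(grade1, carry1, grade2, carry2)  # 4 0.4 5 -0.4
```

The sketch deliberately leaves out the qualitative adjustments (effort, work habits, "true capacity"), which by the teachers' own accounts are not reducible to a fixed rule.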

Decisions of Promotion and Placement In this third and final phase of teachers’ evaluation strategies, decisions are made at the end of the school year concerning the student’s status the following year. At the end of each elementary school grade, it is necessary to decide whether the student will be promoted to the next grade level, or repeat the grade just finished. At the end of sixth grade, decisions are made regarding student placement in the preacademic or preprofessional sections of the junior high school system. According to official regulations, these decisions are based exclusively on the average of the trimester grades attained by the student in the major subject matter areas (French and mathematics), To be promoted to the next grade level, or to be admitted into the college preparatory sections of seventh grade, the student must attain a specified grade point average. In questioning teachers about these decisions, it was hoped to determine the extent to which other factors than average grades are taken into consideration in the decision-making process and the persons (in addition to the teacher) who are involved in this process. The data, after classification and conden-

LINDA

48

K. ALLAL

Table 3 Teachers’ Procedures for Making Decisions of Promotion and of Placement (II = 45) Decisions of promotion to next grade

Decisions of placement in junior high sections”

Respondents n

Respondents W)

n

(%)

6

13

11

24

11

24

7

16

27

60

17

38

1

2

10

22

28 17

62 38

9 22

20 49

0

0

14

31

15 6 3

33 13 7

9 I 7

20 I 16

19

42

12

27

2

4

17

38

Importance attributed to various factors: -Primary importance attributed to grades -Equivalent importance attributed to grades and to other factors -Primary importance attributed to other factors than grades -Nonresponse Decision processes: -Discussion -Discussion

among adults (teacher, parents, inspector”) among adults and with the child

-Nonresponse Determiningopinion. -Teacher --Inspectorb -Parents -Negotiation,

variable

from case to case

-Nonresponse

a In the Geneva school system, “streaming” is introduced at the beginning of junior high school (seventh grade). Students are placed in one of four sections: “Latin” and “scientific” (which are college preparatory sections), “general” and “practical” (which prepare students to enter commercial and vocational training at age 15). h In the Geneva elementary school system, inspectors carry out the administrative and supervisory tasks that principals carry out in the U.S. system. Inspectors are responsible for several schools in a district. According to the school system regulations, they have final responsibility for decisions regarding promotion from one grade to the next in elementary school, but are not concerned with decisions regarding student placement in junior high school sections.

sation, appear in Table 3. For decisions of promotion, all teachers gave an account of how they proceed (notwithstanding occasional omissions of some details). For placement decisions, however, a sizable number of teachers who had never taught sixth grade gave no answer or only a partial answer, which explains the nonresponse rates of 22-38%. The data are grouped in three major categories which will now be discussed. Importance attributed to grades and to other factors. A small minority of teachers (13% for promotion decisions, 24% for placement decisions) attribute primary importance to the student’s grades. For most teachers, promotion decisions are based primarily on other factors such as: the overall psychological and intellectual development of the child, physical matur-

ity, the general evolution of achievement (progression or regression), motivation for school work, relationships with peers and degree of social integration, particular difficulties encountered during the year, for instance, prolonged absences, family problems, the fact that he arrived in the middle of the year, or is not yet at ease speaking French. In general the teachers have no systematic means of record keeping for these factors (except for traditional aspects such as absences); thus, their decisions are based largely on informal observation, intuitive interpretation regarding the relative importance of various factors, and professional judgement concerning the advantages and disadvantages for the child if promoted or, on the contrary, he or she repeats a grade. For placement decisions at the end of sixth grade, slightly more importance is given to grades but other factors remain

an essential basis for decisions, in particular the parents’ wishes regarding the child’s future studies.

Decision process. All teachers state that the decision is based on a discussion with other parties. For promotion decisions, the majority of the teachers (62%) prefer that the discussion take place among adults (teacher, the child’s parents, the inspector in charge of the school district). For placement decisions, on the other hand, most teachers who answered (22/31 = 71%) consider that the student should also participate in the discussion regarding future studies.

Determining opinion. The teachers were asked: In case of conflicting views among the persons involved in the decision, whose opinion has priority? Most teachers answered, “it varies from case to case, sometimes one person, sometimes another,” or “it’s always a matter of negotiation, no one opinion dominates.” Among the remaining teachers, who affirm that one opinion usually has priority, it is found that for promotion decisions, priority tends to be given to the opinion of the teacher and/or inspector rather than to the child’s parents (46% vs. 7%), whereas for placement decisions, priority is attributed with similar frequency to the teacher and to the parents (20% and 15%, respectively).

To summarize, although the school regulations indicate that promotion and placement decisions are determined in a virtually automatic manner by a single quantitative element (grade point average in the principal subject matters), in reality a wide range of qualitative factors are considered and most often given more importance than grades. All teachers state that decisions are based on discussions during which a variety of factors are considered, analyzed, weighed, and compared. These discussions generally entail consultation, if not negotiation, with students’ parents and, in the case of placement decisions, the students themselves often participate. Decision making in the context of summative-predictive evaluations thus often appears to be more a matter of social transaction between interested parties (teacher, inspector, parents, student) than one of simple measurement of learning outcomes.

Conclusions

Analysis of the interview and questionnaire data leads to the following conclusions regarding elementary school teachers’ strategies for dealing with the institutionally defined functions of summative and predictive evaluation.

1. Evaluation practices show a wide range of variation among teachers, as well as considerable diversity within the set of “routines” developed by any given teacher. When attributing grades for student outcomes on tests or other tasks, most teachers use more than one approach: a norm-referenced approach for some tasks, a criterion-referenced approach for others, a mixed norm- and criterion-referenced approach for still others. Diverse references are adopted for pragmatic reasons, and because teachers believe intuitively that a combination of approaches is undoubtedly more dependable than any single approach. The procedures used by teachers to determine trimester grades show a similar pattern of diversity. Nearly all teachers rely on several sources of information: student outcomes on major tests, results on smaller tests or quizzes, tallies of work done on daily tasks and, less frequently, assessments of performance in more global, open-ended situations. The procedures devised by teachers for combining different types of information vary considerably but in most cases entail application of a calculation rule, followed by adjustments which take into account qualitative information that could not be easily included in the initial calculation. Decisions of promotion and placement are also based on diverse types of information: the student’s grade point average is one element, but even more importance is given to global, intuitive assessments of the student’s capacities and needs, and to parents’ wishes.

2. Despite the diversity of teachers’ practices, certain patterns can be identified as common to nearly all their evaluation strategies, in particular the tendency to develop strategies that are considerably more complex than the minimum required by the school system regulations. This tendency appears to be linked to teachers’ intuitive conception of the problem of reliability. Convinced that any given evaluation may not adequately reflect the student’s “true capacity,” teachers tend to increase the number

of evaluation occasions and to diversify the type of evaluation conditions. This approach is fundamentally sound: although the reliability of any typical classroom evaluation instrument is likely to be quite low, the reliability of an end-of-trimester grade or an end-of-year promotion decision can be quite high if it is based on a sufficiently large and varied sample of relevant information.

3. One common aspect of nearly all teachers’ strategies is the interplay between quantitative and qualitative sources of information. Figure 1 summarizes the way in which these types of information are combined at two stages in the evaluation process: first, at the end of each trimester, when quantitative test results (scores transformed into grades) are combined with other more qualitative types of information (tallies, intuitive observations, and judgements) to determine the trimester grade; and second, at the end of the year, when the student’s grade point average is considered along with other largely qualitative sources of information (including parents’ viewpoints) to determine decisions of promotion and placement.

The findings summarized in Figure 1 show that classroom evaluation needs to be approached more as a process of decision making than as a process of measurement. From this point of view, it is possible to identify several important gaps in current teacher training programs. Although the topics typically dealt with, such as the construction of objective testing instruments and the definition of norm- and criterion-referenced interpretation procedures, are important and should not be neglected, other topics corresponding to major components of teachers’ evaluation strategies should be treated in greater detail, both in preservice and inservice training. In particular, with respect to the summative/predictive functions of evaluation linked to the grading and report card systems of the school system, teachers need explicit training in the following areas:

1. How to construct simple instruments (checklists, matrices and charts, coding systems, etc.) for recording qualitative data based on observations and interactions with students.

2. How to avoid or reduce biases (errors of estimation and judgement) that commonly occur in informal, intuitive assessment procedures.

3. How to develop and use techniques for combining quantitative and qualitative information drawn from several different sources.

4. How to conduct discussions with parents and students regarding decisions of promotion and placement.

In addition to defining a “mini-curriculum” of objectives for teacher training in evaluation, it is necessary to reconsider the procedures that are likely to be best suited to training in this aspect of teaching. We would advocate an approach in

Fig. 1. Quantitative and qualitative components of teachers’ evaluation strategies. [Diagram not reproduced. Surviving labels: “Numerical grades in the student’s report card” + “more qualitative types of information, including parents’ ...” + “Decisions regarding promotion and placement”.]

which future teachers are first confronted with the “hard realities,” the inevitable hesitations and doubts provoked by the act of evaluation - as revealed by classroom observations, discussions with students, and interviews with parents - and only at a second stage are introduced to the conceptual and methodological tools that can help provide answers to the questions raised in the first stage. Inservice training needs to be organized in a similar way, starting with an analysis of teachers’ concerns and problems in the area of evaluation, and then going on to the techniques - of measurement, of intuitive appraisal, and of social transaction - that are useful, if not necessary for survival, in the complex enterprise of classroom evaluation.
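The reliability argument in the second conclusion can be illustrated numerically. The sketch below is not drawn from the study: it assumes the classical Spearman-Brown relation, which applies only if the assessments are roughly parallel (equal reliability, uncorrelated errors), and the single-assessment reliability of 0.30 is a hypothetical value chosen for illustration. Under those assumptions, it shows how the reliability of a composite rises as the number of evaluation occasions increases.

```python
def spearman_brown(single_reliability: float, n_occasions: int) -> float:
    """Predicted reliability of a composite of n parallel assessments.

    Assumption (not from the article): the assessments are parallel,
    each with the same reliability and with uncorrelated errors.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must lie between 0 and 1")
    if n_occasions < 1:
        raise ValueError("n_occasions must be at least 1")
    r = single_reliability
    return n_occasions * r / (1.0 + (n_occasions - 1) * r)


# Hypothetical reliability of one classroom quiz (illustrative value only).
SINGLE_QUIZ_RELIABILITY = 0.30

for n in (1, 5, 10, 20):
    composite = spearman_brown(SINGLE_QUIZ_RELIABILITY, n)
    print(f"{n:2d} occasions -> composite reliability {composite:.2f}")
```

With these illustrative figures, ten occasions of modest individual reliability already yield a composite reliability above 0.8, which is consistent with the claim that a trimester grade based on a large and varied sample can be far more dependable than any single instrument. Real classroom assessments are rarely parallel, so the figures should be read as an upper-bound intuition rather than an estimate.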

References

Airasian, P. W. (1984). Classroom assessment and educational improvement. Revised version of a keynote address given at the Northwest Regional Educational Laboratory, Portland, Oregon.

Allal, L. (1983). Évaluation formative: Entre l’intuition et l’instrumentation. Mesure et évaluation en éducation (Measurement and evaluation in education), 6(5), 37-57.

Allal, L. (1987). Vers un élargissement de la pédagogie de maîtrise: Processus de régulation interactive, rétroactive et proactive (Toward an enlargement of mastery learning: Processes of interactive, retroactive and proactive regulation). In M. Huberman (Ed.), Maîtriser les processus d’apprentissage: Fondements et perspectives de la pédagogie de maîtrise (Mastering the processes of learning: Foundations and perspectives of mastery learning). Neuchâtel: Delachaux & Niestlé.

Allal, L., Cardinet, J., & Perrenoud, P. (1979). L’évaluation formative dans un enseignement différencié (Formative evaluation in differentiated instruction). Bern: Lang.

Haertel, E. (1986, April). Choosing and using classroom tests: Teacher’s perspectives on assessment. Paper presented at the meeting of the American Educational Research Association, San Francisco.

Huberman, M. (1983). Répertoires, recettes et vie de classe: Comment les enseignants utilisent l’information (Repertoires, recipes, and classroom life: How teachers use information). Education et recherche (Education and research), 4(2), 157-177.

Janssens, F. J. G. (1986, April). The evaluation practice of elementary school teachers. Paper presented at the meeting of the American Educational Research Association, San Francisco.

Perrenoud, P. (1984). La fabrication de l’excellence scolaire: Du curriculum aux pratiques d’évaluation (The production of scholastic excellence: From the curriculum to evaluation practices). Geneva: Droz.

Perrenoud, P. (1986). L’évaluation codifiée et le jeu avec les règles: Aspects d’une sociologie des pratiques (Codified evaluation and playing with the rules: Aspects of a sociology of practice). In J. M. DeKetele (Ed.), L’évaluation: approche descriptive ou prescriptive? (Evaluation: Descriptive or prescriptive approach?). Brussels: DeBoeck.

States, D. E. (1943). Differences between measurement criteria of pure scientists and of classroom teachers. Journal of Educational Research, 37, 1-13.

Stiggins, R. J., & Bridgeford, N. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22, 271-287.

Whitmer, S. P. (1982, March). A descriptive multimethod study of teacher judgement during the marking period. Paper presented at the annual meeting of the American Educational Association, New York.

Received 30 September 1987