The class size question in primary schools: policy issues, theory, and empirical findings from the Netherlands

International Journal of Educational Research 29 (1998) 763—778

Chapter 5

Roel J. Bosker*
Faculty of Educational Science and Technology, University of Twente, Enschede, The Netherlands

Abstract

Primary schools in the Netherlands are autonomous in their decisions on the grouping of pupils, and therefore on class size, because under the constitution the state may not interfere in pedagogical-didactical issues. Surveys indicate that class sizes consequently vary considerably both across schools and within schools (across grades). Capitalizing on this situation, a randomly drawn sample of 416 schools was used to collect data on class sizes and pupil achievement in grades 2, 4, 6 and 8. Socio-ethnic status, IQ and gender were also examined. Using multi-level statistical models, the strongest size effects were obtained in grade 2 in classes below 25 and above 35. Interaction effects between socio-ethnic status, sex and IQ with class size on achievement showed inconsistent trends. In three out of ten cases the achievement gap widened with decreasing class size. © 1998 Elsevier Science Ltd. All rights reserved.

The first study on class size conducted in the Netherlands dates back to the turn of the century, when pedagogues recommended class sizes of around 20 (Van Gelder, 1959). At that time more than half of the classes in primary schools had 40 or more pupils. The average class size in 1960 was still 36, and from that time a slow but regular decrease in average class size can be observed (Warries, 1985). In 1994, a committee evaluating the integration of kindergarten and primary schools into one cycle ("het basisonderwijs") came to the conclusion that primary education was not functioning well (CEB, 1994), a conclusion underpinned by the low position of 9-year-old pupils in the rankings of the international reading literacy study of the IEA (Ross and Postlethwaite, 1994). In reaction to the committee's report, teacher unions suggested class size as one probable cause. It was only then that the class size issue became an item on the political agenda and the Ministry of Education asked

* E-mail: [email protected]. 0883-0355/98/$ - see front matter © 1998 Elsevier Science Ltd. All rights reserved. PII: S0883-0355(98)00062-7

the inspectorate for further information in this respect. The findings that were presented in the report of the inspectorate were surprising. It was not so much that the average class size in primary schools was 25.6, but more that this figure was 27.8 in schools without pupils-at-risk, and even more perturbing was the fact that some classes consisted of more than 40 pupils. This gave rise to the installation of a national advisory committee on ‘‘Class Size and the Quality of Education’’ to further study this problem and advise on possible improvements (CKAG, 1996). This chapter describes both the theoretical considerations and empirical findings that are relevant in the Dutch context. It begins by describing this context and gives some further information on the class size situation in primary schools. Second, some plausible causes for class size differences between, as well as within, countries are discussed. Then a short account is presented on alternative theoretical explanations on why class size affects (or does not affect) pupil achievement. The empirical contribution to the international knowledge base on this issue is then given by presenting the results of a large-scale observational study on class size and its association with pupil performance. In the final section the main conclusions are presented, as well as some suggestions on how to make further empirical progress in our understanding of the mechanisms that may be the intermediate links between class size and pupil performance.

1. Context

In understanding the issues that are related to the class size debate, it is relevant to give a brief sketch of the Dutch situation, with respect both to policy issues and to class size differences within and between schools. Around the turn of the century the school funding controversy was settled by law, which stated that public schools and religious schools should be equally funded. Moreover, by law, it was agreed that parents had the freedom to found schools; they had the freedom to determine the religious or ideological basis of such a school, and schools accordingly had the freedom to determine the content of instruction. In essence, however, the education system has four types of primary schools: public, religious (e.g., Roman Catholic, Protestant, Islamic), ideologically based (e.g., Jena Plan, Montessori), and private. The first three types are completely state funded. The freedom to set up schools and to determine their content of instruction rests on the ideological principle that the state should not interfere in pedagogical matters that are related to the world views to which parents (and teachers) adhere. For this reason there is no national curriculum in the sense of France or the UK, but instead an indication of the subject matter to be taught (to be verified by the inspectorate) and some global goals to be achieved at the end of primary schooling. Standards are of course implicitly present through the foreshadowing effects of the tracked system of secondary education and, more explicitly, through a test that each school should administer to its pupils in grade 8. Since seventy percent of the schools use a test developed by the national testing service (Cito) for this purpose, these more or less implicit standards are operationally formalized in the content and the domains of

Table 1
Class sizes at the start and at the end of the school year (1994/1995)

Grade                  Beginning   End
Grade 1 (4-yr olds)    19.4        26.1
Grade 2                24.7        26.2
Grade 1/2              23.7        29.0
Grade 3                23.3        23.7
Grade 4                25.4        25.3
Grade 5                25.3        25.2
Grade 6                25.0        24.8
Grade 7                25.2        25.0
Grade 8                25.0        24.9

Source: Inspectorate of education (1995).

this test. The principle of non-interference by the state, however, dominates the education system. Most decisions are made by the school board (for public schools this is the municipality; for religious or ideologically based schools the board consists mostly of parents); in practice, however, much of the decision-making power is delegated to the school's principal. This delegation may be viewed as the driving force behind the enormous variation in class size within schools. One cause of variation among schools is a set of special funding arrangements for schools in rural areas and schools that have many underprivileged pupils. The funding formula, originally based on a minimum pupil-teacher ratio for each age group, has been made (almost) linear. As a consequence multi-grade classes are a common phenomenon: 80% of the primary schools have at least one multi-grade class, and 46% of all classes are multi-graded (Kral, 1997). The actual size of classes in primary schools (operationally defined as the size of the class in which the pupils spend more than half of their time in school) was assessed by the inspectorate (Inspectie van het Onderwijs, 1995). Subsequently, the inspectorate re-analyzed these data to gain greater insight into the within-school variability in class sizes (Inspectie van het Onderwijs, 1996). Table 1 contains descriptive data on class sizes. Two interesting conclusions can be derived from the data. First, with the slight exception of grade 3 classes, Dutch primary schools apparently do not have a policy on class size: classes have approximately the same size in all grades. Second, because of the continuous enrollment of 4-year-olds, class sizes in the lower grades increase during the school year and are greatest at the end of the year. Table 2 contains the relevant data on variation in class size between as well as within schools. The results presented in Table 2 indicate an enormous variation in average class size. First, 17% of the schools (mostly rural) have classes of less than 20 pupils on average, whereas at the upper tail, 15% of the schools (in urban areas, with no or just a small number of underprivileged pupils) have an average class size of over 30, with 3% having an average of over 35. Second, the variation in class

Table 2
Variation in class size across and within schools (1994/1995)

Average class size per school    Percentage of schools
≤15                              6%
16–20                            11%
21–30                            71%
31–35                            12%
>35                              3%

Difference in size between largest and smallest class within schools    Percentage of schools
≤5                               11%
6–10                             24%
11–15                            38%
16–20                            14%
>20                              8%

Source: Inspectorate of education (1996).

size within schools is large as well. In 65% of the schools the difference between the largest and smallest class within the school is more than 10 pupils. Using data from the IEA Reading Literacy Study of 9-year-old pupils (Ross and Postlethwaite, 1994), it can be concluded that with respect to both the pupil-to-teacher ratio and class size, the data for the Netherlands are relatively unfavorable. Of the European countries involved in this IEA study, only Spain (28.8) and Ireland (31.0) have both larger classes and lower staffing levels.

2. Class size differences: causes and mediation of effects

The variations in class sizes among countries, among schools within a country, and even within schools, may have different causes. Variations among countries have to do with national policies regarding the funding of schools as well as with teacher salaries. The Netherlands falls below the OECD average of 6.1%, with only 5% of the gross national product being spent on education. In addition, because of the non-interference principle, there are no restrictions on the maximum class size in the Netherlands, unlike countries such as Denmark, Norway, Germany, Finland, France, Italy, Greece and Scotland, which have maximum class sizes for some or all of the grades in primary education (see RISE, 1994). The variation in class size within the Netherlands is a direct consequence of the funding formula involved, a formula that favors small schools in rural areas and schools with high proportions of underprivileged pupils. Part of the variation across schools, but all of the variation within schools, is a direct consequence of school policy: does the school want to work with a year group system, with multi-grade/multi-age classes, or does it want to use staff for one-to-one tutoring or for other special activities for pupils-at-risk? Only 30% of the schools (according to research conducted by Polder and Gijtenbeek, 1995) refer to the latter two considerations (with respect to pupils-at-risk) when making policy plans for the deployment of their staff. That schools hardly make use of their policy-making capacities can also be demonstrated by looking into the data on class size in schools with many underprivileged pupils (see Table 3).

Table 3
Class size by school score, which indicates the percentage of underprivileged pupils within a school

           School score
Grade      100–124   125–149   150–174   ≥175
Grade 1    26.8      25.0      25.8      22.1
Grade 3    24.1      21.1      19.8      18.4
Grade 5    25.1      22.9      20.8      20.6
Grade 7    24.9      23.1      21.6      19.7

Source: Bosker and Hox (1996).

In the table, the higher the school score, the higher the percentage of underprivileged pupils. The maximum score is 190, indicating a 100% "non-white" school with all pupils stemming from ethnic-minority families and with parents having low levels of education. The minimum score is 100, for schools in which all parents have at least intermediate levels of education (no matter whether these are ethnic-minority or indigenous families). What is most notable about the data in Table 3 is that money given to schools for special purposes, namely helping underprivileged pupils, is more or less "swamped" by the schools. It is used for a general reduction of class size in all grades, rather than for meeting pupils' "special needs". In the discussion section we will revisit this issue.

The statistical meta-analysis of class size and its effects on pupil achievement by Glass and Smith (1978) has usually been the starting point for an empirical discussion on class size issues. Smith and Glass (1980) demonstrated that this effect might well be mediated by instructional differences between small and large classes and by noncognitive effects of class size as well (see also Glass et al., 1982). The theoretical reasoning behind this supposition can be found in the well-known Carroll model, in which time and quality of instruction are seen as the predominant facilitators of pupil learning. Smaller classes provide the opportunity for more individualized instruction and help during practice. For a teacher acting rationally it is worthwhile to spend more time on these activities as class size decreases, as Correa (1993) shows (by using optimization techniques). Engaged learning time is also supposed to increase, because teachers in smaller classes have more opportunities to monitor individual pupils closely. It is also likely, as Blatchford and Mortimore (1994) argue and Smith and Glass (1980) empirically demonstrate, that quality of instruction increases as well when class size decreases. In smaller classes teachers more often provide advance organizers, engage in regular assessment of pupil progress, and provide positive feedback. It should be clear, however, that, contrary to the time factor, these aspects of quality of instruction are not directly related to class size. What may be more directly related is the appropriateness of instruction, i.e., gearing instruction to individual pupil needs (Stringfield and Slavin, 1992). Hill (1998) uses the term "focused instruction" in this

respect, meaning that instruction should be focused to match the prior achievement levels of individual pupils. Another crucial element that may mediate class size effects is high standards and the accompanying expectation that pupils can achieve these standards. This may be an alternative explanation of the effects of class size in the experimental study, STAR (Finn and Achilles, 1990; Word, 1990). Since teachers had to prove the point that a reduction in class size would boost pupil achievement, they may have imposed higher standards on their pupils. In fact, these teachers may have improved on the other instructional characteristics for this reason as well (which would be an example of the Hawthorne effect). Indirect effects of class size on pupil achievement may occur because of an improvement in teachers' working conditions. Not much empirical evidence has been collected on this since the 1980 meta-analysis of Smith and Glass, although a teacher opinion poll in the UK suggests that this still is the case (Bennett, 1996). The results of a large-scale Dutch study into the relation between class size and pupil achievement, to be presented next, do not shed light on these intermediate mechanisms. Having presented the theoretical frameworks, however, we are in a position to come back to these when discussing the results of the study.

3. An observational study into class size effects

In order to study the associations between class size and pupil achievement, use is made of a large-scale cohort study called the PRIMA-cohort (for details see Driessen et al. (1996); Jungbluth and Meijnen (1994); Ledoux and Overmaat (1996)). This is a multi-purpose study containing data in addition to those on class size and pupil achievement.

3.1. Design

The main sample of the PRIMA-cohort consists of 416 randomly sampled primary schools. Within these schools, data were gathered for four groups of pupils (grades 2, 4, 6, and 8) on IQ (but not for grade 2 pupils), gender, socio-ethnic status, arithmetic and language achievement, and well-being at school (rated by the teacher for grades 2 and 4, and by means of a self-completion questionnaire for pupils in grades 6 and 8). The variable that deserves some specific attention is socio-ethnic status. This variable has an ordinal scale that runs as follows (from high to low):

5: pupils whose parents were born in the Mediterranean area, Surinam, or the Dutch Antilles, or whose parents came in as asylum seekers. Extra condition: the parents have at most a junior vocational certificate.
4: pupils whose parents have to travel to make a living (showmen),
3: bargemen's children,
2: pupils whose parents have at most a certificate for junior vocational education, and who do not belong to group 5,
1: other pupils.

Arithmetic and language achievement were measured using scales that were constructed on the basis of Item Response Theory. The tests were administered in October and November for grades 4 and higher, and shortly after Christmas in grade 2. Covariates at the teacher/classroom level were: teacher experience, single-grade or multi-grade class, full-time or part-time teaching job, job satisfaction, and efficacy. Classroom contextual characteristics serving as covariates in the analysis included: percentage of girls, mean IQ, and mean socio-ethnic status. The last contextual variable is needed as a covariate in the design, since in Dutch primary schools the staffing and resourcing of schools depend not only on the number of pupils, but also on their socio-ethnic status. Pupils from disadvantaged groups are therefore weighted as follows: group 5: 1.90; group 4: 1.70; group 3: 1.40; group 2: 1.25; group 1: 1.00. As was described earlier in this chapter, schools with a high percentage of underprivileged pupils therefore regularly have smaller classes, in which case one might erroneously conclude that achievement levels rise with increasing class size. Finally, class size was measured for the class the cohort pupils were in during the previous school year (i.e., when pupils were in grades 1, 3, 5 and 7, respectively). This information was gathered retrospectively (more than half a year later) in a teacher questionnaire. Since class size effects may in reality not be linear, the variable is made polytomous by creating the categories 5–9, 10–14, 15–19, 20–24, 25–29, 30–34 and 35–39. In the analysis Helmert contrasts are used, in which the effect of each class size category is compared with the effects of all smaller class sizes combined.
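The pupil weights listed above can be illustrated with a short calculation. The sketch below is only an illustration: the function names and the example school are hypothetical, and the `school_score` helper assumes that the school score of Table 3 equals 100 times the mean pupil weight. That assumption is consistent with the stated minimum (100) and maximum (190) scores but is not spelled out in the text.

```python
# Funding weights per socio-ethnic status group, as listed in the text.
WEIGHTS = {5: 1.90, 4: 1.70, 3: 1.40, 2: 1.25, 1: 1.00}


def weighted_pupil_count(counts):
    """Return the weighted number of pupils for a dict {group: head count}."""
    return sum(WEIGHTS[group] * n for group, n in counts.items())


def school_score(counts):
    """Hypothetical school score: 100 times the mean weight per pupil.

    This definition is an assumption made for illustration only.
    """
    total = sum(counts.values())
    return 100.0 * weighted_pupil_count(counts) / total


# Hypothetical school with 200 pupils: 100 in group 1, 60 in group 2,
# and 40 in group 5.
example = {1: 100, 2: 60, 5: 40}
print("weighted pupils:", weighted_pupil_count(example))
print("school score:", round(school_score(example), 1))
```

Under these weights, four group 2 pupils count as five pupils for funding purposes, which is why schools with many underprivileged pupils can afford smaller classes and why mean socio-ethnic status has to be controlled for in the analysis.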
Although the data are observational, not experimental, by modeling the hierarchical structure of the data with multi-level statistical models and by including the covariates mentioned in the analysis, it is possible to make causal inferences on class size effects (cf. Raudenbush and Willms, 1996). The only serious drawback is that the achievement of pupils at the end of grades 2, 4, 6 and 8, respectively, is of course an accumulation of all previous learning experiences in all earlier grades, and thus contains effects of varying class sizes. For this reason, the grade 2 results especially are seen as most interesting (since these are less confounded) with respect to the estimation of class size effects. The multi-level statistical model applied can be seen as a regression model that takes into account the clustered structure of the data and the different sample sizes at the pupil, class, and school levels (cf. Goldstein, 1995). Since there were no IQ measures for the grade 2 pupils, separate multi-level regression models were estimated for grade 2 and for grade 4 and up. In the grade 2 analysis there are two levels: school/classroom and pupils, with the school level being confounded with the classroom level (there is only one grade 2 class per school). In the analyses for grades 4, 6, and 8, three levels are distinguished: school, grade/classroom, and pupil. Moreover, the grade that the pupils are in is included as an extra covariate in the analysis. In order to be able to detect possible cross-level interactions between class size and pupil characteristics such as socio-ethnic status, sex, and IQ, a series of additional multi-level statistical models was fitted to estimate the effects of the relevant cross-products (e.g., class size * IQ).
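The structure of the model described above can be written out as a sketch. The notation is introduced here for illustration and is not taken from the study: a three-level random-intercept model is assumed for grades 4, 6 and 8, with pupil i in class j in school k. The exact specification used in the analysis (for instance, whether any slopes were random) is not given in the text.

```latex
\[
y_{ijk} = \beta_0
  + \sum_{p} \beta_p\, x_{p,ijk}
  + \sum_{q} \gamma_q\, z_{q,jk}
  + \sum_{m=1}^{6} \delta_m\, c_{m,jk}
  + v_k + u_{jk} + e_{ijk},
\]
\[
v_k \sim N(0, \sigma_v^2), \qquad
u_{jk} \sim N(0, \sigma_u^2), \qquad
e_{ijk} \sim N(0, \sigma_e^2).
\]
```

Here the x terms stand for pupil-level covariates (grade, IQ, socio-ethnic status, sex), the z terms for class and teacher covariates, and the six c terms for the contrast-coded class size categories. In the grade 2 analysis the school and class terms collapse into a single random intercept, and a cross-level interaction would add a product term such as a coefficient times c times x (for example class size by IQ).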

Table 4
Relation between class size and arithmetic achievement (main effects, standard errors in brackets)

                                      Grade 2            Grade 4, 6, 8
Number of schools                     70                 327
Number of classes                     70                 766
Number of pupils                      1124               12336

Mean                                  892.70             1122.20
Variance                              4291.00            7036.00

Intercept                             818.00             783.40
Grade                                 –                  39.13 (0.56)
IQ                                    –                  4.20 (0.09)
Socio-ethnic status                   −15.55 (2.11)      −5.70 (0.46)
Sex                                   11.25 (3.35)       −11.80 (0.82)
Class mean IQ                         –                  0.69 (0.44)
Class mean SES                        −1.94 (5.11)       0.15 (1.42)
Class mean sex                        5.35 (27.43)       3.02 (5.88)
Experience                            0.78 (0.49)        0.26 (0.09)
Full-time/part-time                   −5.13 (7.02)       2.80 (1.50)
Job satisfaction                      4.78 (7.61)        −0.46 (1.42)
Efficacy                              14.08 (8.61)       −1.14 (1.99)
Sex of teacher                        −2.36 (15.30)      −0.51 (1.79)
Single- vs. multi-grade               5.23 (10.30)       1.74 (1.62)
Class size 5–9                        0.00               0.00
Class size 10–14                      13.77 (10.44)      5.94 (4.69)
Class size 15–19                      −16.81 (11.73)     3.18 (3.42)
Class size 20–24                      −2.27 (9.96)       5.47 (2.63)
Class size 25–29                      −14.07 (7.45)      4.63 (2.31)
Class size 30–34                      −8.30 (7.78)       7.36 (2.30)
Class size 35–39                      −10.81 (8.30)      1.14 (3.03)

Residual variance                     3673.80            2302.90
Variance accounted for by class size  16%                1%

Significance levels: α < 0.05 and α < 0.10 (two-tailed).

4. Results

Table 4 contains the results of the analyses with respect to class size effects on arithmetic achievement. The first blocks of Table 4 contain some descriptives: sample size, mean, and variance (to keep it simple only the total variance is given, although the variance was of course decomposed). Then in the third block the estimated regression coefficients (with the accompanying standard errors in brackets) are presented. With respect to the grade 2 results, only two of the covariates (socio-ethnic status and sex) have a significant effect. Nevertheless, the other covariates are retained

in the model, if only to remind us that the potential effects of, for example, teacher experience are controlled. The grade 2 class size effects show up when classes become larger than 20 pupils: the arithmetic achievement of pupils in classes of that size lags somewhat behind. The variance accounted for by the class size variable is 16% in grade 2. The results of the analyses for grades 4, 6, and 8 are contained in the last column. In this case six covariates appear to have a significant relationship with arithmetic achievement: grade, IQ, socio-ethnic status, sex, teacher's experience, and whether the teacher holds a full-time or part-time job. The class size effects show an unexpected pattern: all signs are positive, indicating that students in larger classes do better than those in smaller classes. However, the variance accounted for by the class size variable amounts to only one percent.

Table 5 contains the results with respect to the estimated association between class size and language achievement. Once again the first blocks of the table contain some descriptive statistics: sample sizes, means, and variances. The results of the analysis for grade 2 show that only three of the covariates (socio-ethnic status, sex, and teacher efficacy) have a significant association with language achievement. As was the case in the analysis on arithmetic and class size, class size appears to have a negative relationship with achievement when classes contain more than 20 pupils. And there is a further (negative) coefficient when classes become larger than 30 pupils. For grade 2, class size accounts for 13% of unique variation in language achievement. The results for grade 4 and higher indicate that six of the covariates appear to have a significant relationship with language achievement: grade, IQ, socio-ethnic status, sex, class mean socio-ethnic status, and class mean sex.
Class size has a negative association with language achievement when classes become larger than 35 pupils. In order to gain more insight into the meaning of the relationship between class size and pupil achievement, effect sizes are calculated. The effects of the various class sizes can be made more insightful by calculating effects using the estimated Helmert contrast coefficients. For grade 2 classes of size 5 to 9, for example, the estimated effect for language achievement is

\[
(-1)(-3.15) + \left(-\tfrac{1}{2}\right)(-2.15) + \left(-\tfrac{1}{3}\right)(-5.89) + \left(-\tfrac{1}{4}\right)(-6.98) + \left(-\tfrac{1}{5}\right)(-4.62) + \left(-\tfrac{1}{6}\right)(-9.76) = 10.484,
\]

whereas for a class of 35 pupils or more it is

\[
\tfrac{5}{6}(-9.76) = -8.13.
\]

To transform effects into effect sizes, the estimated effects for each class size category (as deviations from the average effect of classes of size 20–24) are divided by the square root of the residual variance. Table 6 contains the estimated effect sizes for arithmetic achievement. The largest positive effect size for grade 2 (0.416) is found for classes containing 10–14 pupils, but since there are only a few of them this is statistically non-significant. Statistically significant effect sizes appear when classes contain more than 25 pupils. The effect sizes are −0.252 for class size 25–29, −0.185 for class size 30–34, and −0.256 for class size 35–39. Whether or not these effect sizes are important will be discussed later.
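The conversion from contrast coefficients to effect sizes described above can be sketched in a few lines of Python. The function names are hypothetical, and the contrast coding is an assumption: contrast j is taken to give weight +1 to category j+1 and weight −1/j to each of the j smaller categories. This coding is used here because it reproduces the grade 2 language effect sizes of Table 7 to within rounding; note that the worked example in the text applies a weight of 5/6 to the largest category, which gives −8.13 rather than the −9.76 obtained below.

```python
import math

# Class size categories and the grade 2 language coefficients from Table 5.
CATEGORIES = ["5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
CONTRASTS = [-3.15, -2.15, -5.89, -6.98, -4.62, -9.76]  # categories 2..7
RESIDUAL_VARIANCE = 944.90
REFERENCE = "20-24"


def category_effects(contrasts):
    """Combine contrast coefficients into one estimated effect per category.

    Assumed coding: contrast j compares category j+1 (weight +1) with the
    j smaller categories (weight -1/j each).
    """
    n = len(contrasts) + 1
    effects = []
    for k in range(n):  # k = 0 is the smallest category
        # Contribution of the contrast in which category k is the target.
        own = contrasts[k - 1] if k >= 1 else 0.0
        # Contributions of contrasts in which category k is on the smaller side.
        smaller_side = sum(-contrasts[j] / (j + 1) for j in range(k, n - 1))
        effects.append(own + smaller_side)
    return effects


def effect_sizes(contrasts, residual_variance, reference_index):
    """Express effects as deviations from the reference category in SD units."""
    effects = category_effects(contrasts)
    sd = math.sqrt(residual_variance)
    return [(e - effects[reference_index]) / sd for e in effects]


effects = category_effects(CONTRASTS)
sizes = effect_sizes(CONTRASTS, RESIDUAL_VARIANCE, CATEGORIES.index(REFERENCE))
for label, e, d in zip(CATEGORIES, effects, sizes):
    print(f"{label:>6}  effect {e:8.3f}  effect size {d:6.3f}")
```

For the smallest category the first function returns the 10.484 of the worked example, since that category only ever appears on the smaller side of a contrast.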

Table 5
Relation between class size and language achievement (main effects, standard errors in brackets)

                                      Grade 2            Grade 4, 6, 8
Number of schools                     70                 327
Number of classes                     70                 766
Number of pupils                      1124               12336

Mean                                  968.70             1073.00
Variance                              1145.50            2588.00

Intercept                             943.40             877.60
Grade                                 –                  22.30 (0.33)
IQ                                    –                  2.17 (0.06)
Socio-ethnic status                   −7.97 (1.12)       −6.89 (0.32)
Sex                                   8.08 (1.78)        2.10 (0.57)
Class mean IQ                         –                  0.68 (0.27)
Class mean SES                        3.68 (2.30)        −2.57 (0.90)
Class mean sex                        −1.07 (12.07)      −2.22 (3.54)
Experience                            −0.08 (0.22)       0.09 (0.06)
Full-time/part-time                   2.62 (3.09)        0.68 (0.92)
Job satisfaction                      −0.44 (3.32)       1.04 (0.97)
Efficacy                              9.75 (3.70)        0.07 (1.21)
Sex of teacher                        −2.30 (6.91)       0.03 (1.09)
Single- vs. multi-grade               4.50 (4.44)        0.79 (1.00)
Class size 5–9                        0.00               0.00
Class size 10–14                      −3.15 (4.70)       −3.58 (3.04)
Class size 15–19                      −2.15 (5.31)       −1.94 (2.19)
Class size 20–24                      −5.89 (4.38)       −1.03 (1.69)
Class size 25–29                      −6.98 (3.35)       −2.18 (1.46)
Class size 30–34                      −4.62 (3.49)       −0.60 (1.43)
Class size 35–39                      −9.76 (3.70)       −3.34 (1.83)

Residual variance                     944.90             1035.18
Variance accounted for by class size  13%                0%

Significance levels: α < 0.05 and α < 0.10 (two-tailed).

With respect to grade 4 and higher, the largest (albeit statistically non-significant) effect size shows up for classes containing fewer than 10 pupils, and it is negative (−0.309). The largest positive effect size (0.094) is that for classes of size 30–34, and classes of size 35–39 have a small (but statistically significant) negative effect size of −0.032. Table 7 contains the estimated effect sizes for language achievement. Positive, but statistically non-significant, effect sizes in grade 2 show up for classes with fewer than twenty pupils. Class sizes of 25–29 and 35–39 have statistically significant negative effect sizes of −0.092 and −0.265, respectively. For grades 4, 6 and 8 the only statistically significant effect size is −0.109 (for class size 35–39).

Table 6
Class size and arithmetic achievement; effect sizes

Class size    Grade 2    Grade 4, 6, 8
5–9           −0.038     −0.309
10–14         0.416      −0.062
15–19         −0.227     −0.086
20–24         0.000      0.000
25–29         −0.252     0.006
30–34         −0.185     0.094
35–39         −0.256     −0.032

Significance level: α < 0.10 (two-tailed).

Table 7
Class size and language achievement; effect sizes

Class size    Grade 2    Grade 4, 6, 8
5–9           0.394      0.184
10–14         0.189      −0.038
15–19         0.186      −0.017
20–24         0.000      0.000
25–29         −0.092     −0.052
30–34         −0.044     −0.007
35–39         −0.265     −0.109

Significance level: α < 0.10 (two-tailed).

Table 8 summarizes the results of the multi-level analyses in which interaction effects on pupil achievement were estimated between class size, on the one hand, and socio-ethnic status, sex, and IQ, on the other. Of the four potential interaction effects of socio-ethnic status and class size on pupil achievement, only one is statistically significant: with respect to arithmetic achievement in grade 2, the effect of socio-ethnic status becomes smaller with increasing class size. Phrased differently, when class sizes become smaller the achievement gap between underprivileged and other pupils becomes bigger. The same pattern occurs for the interaction effects of sex and IQ with class size on language achievement in grades 4, 6 and 8: achievement gaps increase as classes become smaller. The only interaction effect that indicates smaller gaps with decreasing class size concerns IQ and arithmetic achievement in grades 4, 6 and 8.

5. Summary and discussion

Class sizes in Dutch primary schools are 25.7 on average, which is, compared to other countries, quite high. Moreover, the pupil-to-teacher ratio is relatively high, with the resources made available to schools being relatively scarce. One of the reasons that class sizes in Dutch primary schools can become very large is that there is no

Table 8
Class size and interaction effects; summary table

                       Language               Arithmetic
                       Grade 2   Grade 4,6,8  Grade 2   Grade 4,6,8
Socio-ethnic status    1         1            −         0
Sex                    0         −            0         1
IQ                     NA        −            NA        +

NA: not applicable; 0: there is no interaction effect; 1: there is a statistically significant interaction effect but the model does not fit the data significantly better than the model without the interaction; +: there is an interaction effect indicating that the effect of the variable involved increases with increasing class size; −: there is an interaction effect indicating that the effect of the variable involved decreases with increasing class size.

maximum imposed on schools as there is in some other countries. If schools are given extra funds (for instance in the context of the educational priorities program), it appears that a large portion of these funds is used for a general class size reduction in all grades, with no priority being given to the lower grades. When looking into the association between class size and pupil achievement, the most negative relationships show up in grade 2 classes. For arithmetic achievement the effect size is in the area of −0.20 when classes are larger than 25 pupils (as compared to the average achievement level in classes of 20–24). For language achievement in grade 2 the results become clear-cut (−0.27) if classes contain more than 35 pupils, although a negative effect for classes of size 25–29 appears as well (−0.10). In Cohen's phrasing, effect sizes of −0.20 might be referred to as small, although it should be added immediately that Cohen cautions against using the word small as a synonym for irrelevant (Cohen, 1988). That this caution applies to these effect sizes may be illustrated by comparing the effects of socio-ethnic status on pupil achievement with the class size effects. The value −0.20 is approximately the same as the (standardized) achievement gap between indigenous pupils whose parents have at most a certificate for junior vocational education and other indigenous pupils. This consistent gap has been one of the reasons for launching the national educational priorities program that has been in operation during the last decade. Consequently, resourcing schemes have been adjusted so that every four pupils from indigenous low socio-ethnic status families count for five in the funding formula. It should be added, however, that an effect size of class size with respect to language achievement in grade 2 that is of comparable magnitude only shows up for classes containing 35 pupils or more. The associations between class size and pupil achievement in grades 4, 6 and 8 are less pronounced, although there are some indications that pupils in classes with 35 or more pupils lag (slightly) behind other pupils.


Interaction effects between socio-ethnic status, sex, and IQ, on the one hand, and class size, on the other, appear to have inconsistent patterns. Quite surprisingly, in three of ten cases, achievement gaps appeared to be widening with decreasing class size.

6. Theoretical considerations

Resourcing is seen as the major cause of large classes in Dutch primary schools. The pupil-to-teacher ratio formally allows for somewhat smaller classes (three fewer pupils per class), but this would be hard to realize since part of the funding has to be used for managerial leadership activities, remedial teaching, and physical education. In addition, schools that are in the financial position to implement staffing policies with respect to class size in specific grades generally refrain from doing so.

Theoretical considerations, however, clearly point to the importance of class size for young children, since they depend more than older pupils on individual interaction with the teacher in order to make good progress. This is especially true for pupils at risk. Slavin (1989) contends that class size only really matters when classes are reduced to a single pupil. For this reason he advocates the Reading Recovery approach (i.e. individualized instruction for pupils lagging behind in their reading performance) for half an hour a day, for as long as it takes (on average two to three months) to get the pupil back on track. If classes contain many underprivileged pupils, the Reading Recovery program should be part of a total approach, called Success for All (Madden et al., 1993; Slavin, 1996), in which pre-school learning activities, a specifically designed reading curriculum, co-operative learning, parental involvement, regular assessments, and team development and team support are included as key elements.

Achilles et al. (1993) empirically demonstrated that low socio-economic status pupils especially may benefit from a class size reduction. That is not what the Dutch results show. In retrospect, the mediating factors may have been time and elements of quality of instruction. Appropriateness of instruction (gearing instruction to differences in prior achievement), however, may have been lacking in the Dutch case, although it was feasible to achieve this in the smaller classes. One reason is that schools and teachers may be somewhat reluctant to prioritize. Their system of beliefs and understandings may not allow them to do so until the beneficial effects of putting extra effort into low achievers have been empirically demonstrated. For this reason Bosker and Meijnen (1996) have proposed a national experiment with different ways of using extra funding in schools, including substantial class size reductions in the earliest grades, one-to-one tutoring for pupils at risk, a scheme in which class sizes steadily increase with increasing pupil age (implying an increase in class size in the upper grades as compared to the current situation), or flexible class sizes depending on the part of the curriculum to be covered (with small classes for basic subjects; see, for example, Tomlinson, 1989; Odden, 1990).


7. Practical implications

A practical implication of the Dutch study, and of the advice (partially) based on it, is that a general class size reduction has been proposed for grades one to four, a proposal endorsed by the major political parties. At this time the Ministry of Education, supported by international data showing that there is under-investment in Dutch education and that money apparently matters (e.g. Wenglinsky, 1997), has developed different scenarios with varying extra investments in primary education, ranging from a resourcing scheme that makes a class size reduction of two pupils feasible in the first four grades to a scheme in which classes in these grades will contain eight fewer pupils (Ministry of Education, 1997). The proposal contains all the necessary ingredients, including building facilities (which should, however, be arranged by the municipalities), the recruitment of more students for teacher education, and the mobilisation of the ‘‘silent reserve’’ (temporarily retired teachers). Although suggestions have been made (Bosker and Meijnen, 1996) to start large-scale experiments with class size, these are not contained in the ministerial proposal. Because schools are free to determine the content of instruction, the proposal refrains from imposing a maximum class size. However, the funding assigned to schools will be earmarked for the lower grades. In addition, standards for grade 4 achievement will be formulated, although it is not clear whether these are supposed to be minimum standards to be reached by virtually all pupils, or what will happen if a school does not achieve them. Nevertheless, pupil-to-teacher ratios will decrease in the coming years, and the knowledge base on the effects of class size reductions has been helpful in underpinning this policy.

Studying how schools and teachers take up, or fail to take up, the opportunities provided, and what the concomitant changes in pupil achievement will be, should further our understanding of the relationship between class size and pupil achievement. In addition, evaluating the implementation of one of the investment scenarios may answer the question of what the returns of extra investments in education may be (in terms of achievement levels, the gap between low and high achievers, referral to special education, and repeating of grades).
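The staffing consequences of the two extreme scenarios (two versus eight fewer pupils per class) can be illustrated with a rough sketch. It assumes that every class must be whole, so the number of classes is the pupil count divided by the target class size, rounded up. The function names and the school figures in the usage note are hypothetical; they are not taken from the ministerial proposal.

```python
import math


def classes_needed(n_pupils: int, class_size: int) -> int:
    """Smallest number of classes such that no class exceeds class_size."""
    if n_pupils < 0 or class_size <= 0:
        raise ValueError("n_pupils must be >= 0 and class_size must be > 0")
    return math.ceil(n_pupils / class_size)


def extra_classes(n_pupils: int, current_size: int, reduction: int) -> int:
    """Additional classes required when the class size ceiling drops by `reduction`."""
    before = classes_needed(n_pupils, current_size)
    after = classes_needed(n_pupils, current_size - reduction)
    return after - before
```

For a hypothetical school with 120 pupils in the first four grades and classes of at most 30, `extra_classes(120, 30, 2)` yields one additional class and `extra_classes(120, 30, 8)` yields two. Each additional class requires a teacher and a classroom, which is why the proposal has to address building facilities and teacher recruitment.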

References

Achilles, C. M., Nye, B. A., Zaharias, J. B., & Fulton, B. D. (1993). Creating successful schools for all children: a proven step. Journal of School Leadership, 3, 606–621.
Bennett, N. (1996). Class size in primary schools: perceptions of headteachers, chairs of governors, teachers and parents. British Educational Research Journal, 22(1), 33–55.
Blatchford, P., & Mortimore, P. (1994). The issue of class size for young children in schools: what can we learn from research? Oxford Review of Education, 20(1), 411–428.
Bosker, R. J., & Meijnen, G. W. (1996). Het nationaal OK-experiment: organisatie van klassen. Enschede/Amsterdam: Universiteit Twente/Universiteit van Amsterdam.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum.
CEB, Commissie Evaluatie Basisonderwijs (1994). Zicht op kwaliteit. Evaluatie van het basisonderwijs. Den Haag: SDU.
CKAG, Commissie Kwalitatieve Aspecten van Groepsgrootte (1996). Klassenverkleining. Den Haag: SDU.
Correa, H. (1993). An economic analysis of class size and achievement in education. Education Economics, 1(2), 129–135.
Driessen, G., Jungbluth, P., van Langen, A., & Vierke, H. (1996). Prima-cohortonderzoek. Technische rapportage ITS-deel. Nijmegen: ITS.
Finn, J. D., & Achilles, C. M. (1990). Answers and questions about class size: A statewide experiment. American Educational Research Journal, 27(3), 557–577.
Gelder, L. van (1959). De klassegrootte bij het lager onderwijs. Pedagogische Studiën, 36, 498–519.
Glass, G. V., & Smith, M. L. (1978). Meta-analysis of research on the relationship of class size and achievement. San Francisco: Far West Laboratory.
Glass, G. V., Cahen, L. S., Smith, M. L., & Filby, N. N. (1982). School class size. London: Sage.
Goldstein, H. (1995). Multilevel statistical models. London: Edward Arnold.
Hill, P. (1998). Shaking the foundations: empirically driven school reform. In: School Effectiveness and School Improvement (to appear).
Inspectie van het Onderwijs (1995). Groepsgrootte in het basisonderwijs. Den Haag: SDU.
Inspectie van het Onderwijs (1996). Schoolkenmerken en groepsgrootte in het basis-onderwijs. Utrecht: Inspectie van het Onderwijs.
Jungbluth, P., & Meijnen, G. W. (1994). Opzet Prima-cohort onderzoek. Nijmegen/Amsterdam: ITS/SCO.
Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L., & Wasik, B. A. (1993). Success for all: Longitudinal effects of a restructuring program for inner city elementary schools. American Educational Research Journal, 30(2), 123–148.
Ministerie van Onderwijs, Cultuur en Wetenschappen (1997). Groepsgrootte en kwaliteit: investeren in de onderbouw van de basisschool. Den Haag: SDU.
Odden, A. (1990). Class size and student achievement: research-based policy alternatives. Educational Evaluation and Policy Analysis, 12(2), 213–227.
Overmaat, M., & Ledoux, G. (1996). School- en klaskenmerken basisonderwijs en speciaal onderwijs. Ubbergen: SCO-Kohnstamm/Tandem Felix.
Polder, K.-J., & Gijtenbeek, J. (1995). Inkomsten en besteding van basisscholen. Survey naar de werking van Londo- en FBS-bekostiging op katholieke basisscholen. Amsterdam: SCO-Kohnstamm Instituut.
Raudenbush, S. W., & Willms, J. D. (1996). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20(4), 307–335.
RISE (1994). Class size regulation. A dossier of international comparisons. London: The Research and Information on State Education Trust.
Ross, K. N., & Postlethwaite, T. N. (1994). Differences among countries in school resources and achievement. In W. B. Elley (Ed.), The IEA study of reading literacy: Achievement and instruction in thirty-two school systems. Oxford: Elsevier.
Slavin, R. E. (1989). Achievement effects of substantial reductions in class size. In R. E. Slavin (Ed.), School and classroom organization (pp. 247–257). Hillsdale, NJ: Lawrence Erlbaum.
Slavin, R. E. (1996). Education for All. Lisse: Swets and Zeitlinger.
Smith, M. L., & Glass, G. V. (1980). Meta-analysis of research on class size and its relationship to attitudes and instruction. American Educational Research Journal, 17(4), 419–433.
Stringfield, S. C., & Slavin, R. E. (1992). A hierarchical longitudinal model for elementary school effects. In B. P. M. Creemers, & G. J. Reezigt (Eds.), Evaluation of effectiveness. Enschede/Groningen: ICO.
Tomlinson, T. M. (1989). Class size and public policy: politics and panaceas. Educational Policy, 3(3), 261–273.
Veenman, S. (1995). Cognitive and noncognitive effects of multigrade and multi-age classes: A best-evidence synthesis. Review of Educational Research, 65(4), 319–382.
Veenman, S. (1996). Effects of multigrade and multi-age classes reconsidered. Review of Educational Research, 66(3), 323–340.
Warries, E. (1985). Klassegrootte en onderwijskwaliteit. Enschede: Universiteit Twente.
Wenglinsky, H. (1997). When money matters. Princeton: ETS.
Word, E. (1990). Student/Teacher Achievement Ratio (STAR). Tennessee’s K–3 Class Size Study. Washington: ERIC Report ED320 692.


Roel J. Bosker is Professor of Education at the Faculty of Educational Science and Technology of the University of Twente (The Netherlands). He has published on substantive issues, such as school and teacher effectiveness, class size and student achievement, inequality of educational opportunities, and school league tables, as well as on methodological issues, most notably simulation models, multilevel statistical models, and evaluation research. Moreover, he is involved in international comparative educational research projects carried out for the Organization for Economic Cooperation and Development.