Is ability grouping beneficial or detrimental to Japanese ESP students' English language proficiency development?



English for Specific Purposes 49 (2018) 39–48

Contents lists available at ScienceDirect

English for Specific Purposes journal homepage: http://ees.elsevier.com/esp/default.asp

From the Editors

Chris Sheppard a, *, Emmanuel Manalo b, Marcus Henning c

a Faculty of Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku, Tokyo, 169-0071, Japan
b Graduate School of Education, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto City, Kyoto, 606-8317, Japan
c Centre for Medical and Health Sciences Education, Faculty of Medical and Health Sciences, University of Auckland, Room 12.025, Grafton, Private Bag 92019, Auckland 1142, New Zealand

Abstract

This study investigated whether ability grouping was beneficial to Japanese university science and engineering students who had taken compulsory ESP (English for specific purposes) courses. By examining the change in their standardized general proficiency test scores (using the Test of English for International Communication or TOEIC) and using data from across six years of enrollment (a cohort of 13,000 students), the performance of students who had been placed into a broader band of English ability (i.e., less similar language proficiency) was compared with the performance of students who had been placed into a narrower band (i.e., more similar language proficiency). Findings showed that ability grouping benefited less proficient learners. By contrast, ability grouping did not appear to be beneficial for more proficient learners. Possible reasons for these findings are discussed, including organizational and instructional features of the program of study the students were taking, and the likely effects of ability grouping on students’ academic self-concept.

© 2017 Elsevier Ltd. All rights reserved.

Keywords: Ability grouping; Placement; Tracking; Second language program design; ESP; Academic self-concept

1. Introduction

Ability grouping is the very common practice of placing students into groups or classes according to their ability in a subject or skill, such as their language proficiency (Richards & Schmidt, 2002). The practice is also referred to as “streaming” (see, e.g., Cross, 1988), “tracking” (see, e.g., Hanushek & Wößmann, 2006; Harklau, 1994; Oakes, Gamoran, & Page, 1992), or “setting” (see, e.g., Boaler, 1997). In second language education, ability grouping often takes one of two forms. The first is where students are placed into classes which have different goals, tailored to the ability level of the students. Language schools and university preparatory classes are examples of programs which use this kind of ability grouping. The second form is where students are placed into classes of different levels, though these classes all have the same language learning goals.

Curriculum design should be an evidence-based process (Nation & Macalister, 2010). Yet, even though ability grouping is a widespread practice, there has been extensive debate about its effectiveness in general education (e.g., Boaler, Wiliam, & Brown, 2000; Kohli, 2014). Furthermore, despite its pervasiveness in high schools, universities, and other language institutions (Nunan, 1988), the use and effectiveness of ability grouping in second language teaching and learning have not been adequately investigated.

* Corresponding author: C. Sheppard.
https://doi.org/10.1016/j.esp.2017.10.002
0889-4906/© 2017 Elsevier Ltd. All rights reserved.


2. Literature review

There are numerous arguments in support of ability grouping. Many educators find that ability-grouped classes are easier to organize than mixed ability classes (see, e.g., Watts, 1985). Teachers believe they are more able to meet the particular learning needs of each group. They are able to adjust the method, materials and pace of instruction to something which is more appropriate for the development of the students’ language skills (Kim, 2012). For students, there is less stress because they are only required to work at their own level, which may lead to a reduction in foreign language learning anxiety (Luo & Tsai, 2002), a reduction that is believed to facilitate language learning (Horwitz, Horwitz, & Cope, 1986). Ability grouping can also lead to improved motivation. Baker (1998) reasoned that providing learning environments and supporting infrastructures that are conducive to successful learning activities can encourage students’ motivation.

There are, however, just as many arguments against, and criticisms leveled at, the practice of ability grouping. Kim (2012) found that the practice of ability grouping can lead to more work for teachers, as they are required to adjust materials for different levels. Also, many teachers indicated that they avoid being assigned to lower level classes due to the stress involved, because grouping poorer students together often results in classroom management and discipline difficulties.

Many of the arguments focus on the detrimental effects of ability grouping on students in the lower levels. Kim (2012) discussed feelings of inferiority and embarrassment from the social stigma of being assigned to lower level classes. Luo and Tsai (2002) also discussed the possibility that, because of their perceived low ability, these students may develop “learned helplessness”: that is, the students may accept the notion that they are of lower level and therefore act accordingly.
There are also claims that ability grouping has a negative, rather than positive, effect on motivation. Ames (1992) suggested that ability grouping can be detrimental because all students compare themselves to other students, possibly leading to an increased negative self-concept. Ability grouping has also been criticized at the social level. In the United States, where general-education lower tracks are populated by larger proportions of limited English proficiency students, placement into these lower tracks appears to limit chances of future academic success. Callahan (2005), for example, found that class placement was a better predictor of attainment than was English proficiency. Once placed into a lower track, it is often very difficult to receive the instruction necessary for university entrance, thus, limiting opportunity. Similarly, Byean (2015) strongly criticized tracking as a part of the “neoliberalization” of the Korean education system. She argued that it limits the future opportunities of lower track students by taking away the possibility of becoming proficient in English. Perhaps more importantly, although in general education L1 contexts, the results of meta-studies on the effectiveness of ability grouping have also been mixed. Slavin (1987), while finding some benefits in grouping plans that involve cross-grade assignment in only one or two subjects, concluded that there is no evidence to support the practice of assigning students to self-contained classes on the basis of their supposed abilities. In a cross-country investigation, Robert (2010) analyzed the 2006 PISA (Program for International Student Assessment) survey data from 23 OECD (Organization for Economic Cooperation and Development) countries and found no evidence that ability grouping contributes to improving student achievement. 
In the Singapore context, Chang (1990) reported indications of some detrimental outcomes for lower track students, with evidence of those children using inappropriate learning strategies in language and mathematics courses.

In contrast to these results, other meta-analyses (also in general education contexts) have shown a small, positive effect for ability grouping. Kulik and Kulik (1982) found “a small but significant” effect for ability grouping on achievement in exams. The average effect size (see footnote 1) was one-tenth of a standard deviation. A further, broader meta-analysis (Kulik & Kulik, 1987), which included findings from the elementary school level, found a similar but slightly lower effect size of 0.06. Kulik (1992) later found, however, a much larger effect for talented students who were ability grouped over those talented students who were not (d = 0.33). A more recent summary of meta-analyses on ability grouping by Hattie (2009), which collated 14 meta-studies examining 500 studies, found comparable results to Kulik and Kulik, with an effect size of 0.11 for the benefit of ability grouping. Hattie (2009) also found that the level of ability group had an influence on the effect size obtained (high level d = 0.14, low level d = 0.09, and mid-level d = 0.03).

Despite the evidence for the minimal educational benefits of ability grouping, it is widely practiced and often considered necessary in the field of language teaching (see, e.g., Cross, 1988) and is a common practice in the second language classes of secondary schools throughout Asia, such as in Hong Kong (Cheung & Rudowicz, 2003), Singapore (Liu, Wang, & Parkins, 2005), Korea (Kim, 2012), and China (Li et al., 2016). Interestingly, its practice is prohibited by the Taiwanese government (Liu & Yang, 2016), even though Taiwan also has important entrance examinations, which effectively place students into schools based on their ability.
Ability grouping is also a common practice at universities where language course credits are required for graduation (e.g., Liu, 2008; Luo, 2005). In Japan, ability grouping is commonly implemented. However, its perceived connection to inequality leads to the practice being frowned upon. Despite this, students are ability grouped through entrance examinations at the institutional level

1 The effect size is the quantitative measure of the size of the relationship between two variables (Kelly & Preacher, 2012). One common measure of effect size is Cohen’s d (Cohen, 1988). This is a representation of the difference between two means, as a proportion of the pooled standard deviation of the data. Hence, a Cohen’s d of 1 is a difference of 1 standard deviation. Cohen (1988) has somewhat arbitrarily determined an effect size of d < 0.2 to be small, an effect size of d = 0.5 to be medium, and an effect size of d > 0.8 to be large.


(LeTendre, Hofer, & Shimizu, 2003). According to Butler and Iino (2005), students’ English scores in entrance examinations in Japanese universities are believed to be indicative of their analytical and logical thinking skills, and are therefore usually highly regarded. These same English entrance examination scores and/or scores in English proficiency tests like the TOEFL (Test of English as a Foreign Language; Educational Testing Service, 2017) are used not just in making decisions about acceptance or otherwise in the institutions and the courses therein, but also in ability grouping students for any compulsory English courses. Thus, in most cases, students who attain high scores will be placed in classes with others who similarly score high in English, and those who score low – but still meet the required criteria for acceptance – will be placed in classes with other low scorers. Irrespective of the widespread use of ability grouping in assigning Japanese university students to their English language classes, the present authors were unable to locate any research studies to verify its efficacy in helping students achieve English language learning goals. Instead, many research studies point to grouped students’ low levels of motivation and dissatisfaction with their English learning experiences. Lafaye and Tsuda (2002), for example, surveyed over 500 university students and found that though they believed in the usefulness of learning English, the majority did not like English study and were unhappy with the levels of proficiency they had managed to achieve. As mentioned above, one of the most common criticisms of ability grouping is that it tends to favor the achievement of students in high ability groups while being detrimental to the achievement of those in low ability groups. 
If this criticism is applied to the practice of ability grouping in English language classes, it would suggest that students who score high in placement tests and are assigned to high ability groups would likely benefit more from their classes, while those who score low and are assigned to low ability groups would benefit less. While such an approach and its associated possible outcomes would appear non-egalitarian, it can be considered as being congruent with some voiced opinions in Japan that advanced and intensive English education should be reserved for select groups of people in order to cultivate the country’s competitiveness in the global environment (see, e.g., Hiraizumi & Watanabe, 1975; Suzuki, 1999). These views are much like those expressed in South Korea (Byean, 2015). Butler and Iino (2005) observed that the notion of developing only a select few “English specialists” in Japan is to some extent aligned with the promotion of “individualized education” by the Ministry of Education, Culture, Sports, Science, and Technology.

Considering the arguments presented here, the uncertainty about the efficacy of ability grouping, its apparent pervasiveness in second language education, and the lack of research conducted at the tertiary level, it would appear crucial that research be conducted to address these issues. In this study, therefore, we sought an answer to the question of whether the practice of ability grouping ESP students into classes at tertiary level according to their English proficiency test scores makes a difference to the development of students’ proficiency in the English language. A related question we also addressed was whether the effects of ability grouping vary according to the English proficiency levels of ESP students.

3. Method

3.1. Participants

The data were collected from students attending compulsory courses of an ESP language program at a science and engineering faculty of a large private university in Japan.
They were enrolled in one of seventeen departments. The study’s cohort comprised the 13,348 students enrolled over the six years the study was conducted. Most of the students (approximately 60%, varying from year to year) in the program were selected through an entrance examination. The remainder entered through a recommendation system, whereby certain high schools that have an agreement with the university could recommend students for acceptance. As not all the students had to meet the standard set in the university’s English entrance examination, those who comprised the participants of this study had a broad range of English abilities as indicated by their TOEIC (Test of English for International Communication; Educational Testing Service, 2017) placement scores, which ranged from 10 to 990.

The purpose of the program they were enrolled in was to develop their English language sufficiently to enable them to work as science or engineering researchers in the international community. All students were required to take two courses each semester of their first two years, comprising a total of eight semester-long compulsory courses. This study focused on the four first-year courses (see Table 1). The two Communication Strategies courses (CS 1 and 2) were run as consecutive one-semester courses over the academic year, as were the Academic Lecture Comprehension courses (ALC 1 and 2). The Communication Strategies courses focused on developing speaking and listening fluency, knowledge about important communication structures, and application of those structures to promote effective communication with others. On the other

Table 1
First year courses included in the study.

Semester    Name
First       Communication Strategies 1 (CS 1)
            Academic Lecture Comprehension 1 (ALC 1)
Second      Communication Strategies 2 (CS 2)
            Academic Lecture Comprehension 2 (ALC 2)


hand, the Academic Lecture Comprehension courses focused on the development of skills in listening to lectures in class, taking structured notes, asking and answering questions, and writing summaries. All teachers followed the same relevant course curricula. They taught to the same goals, used the same textbooks, and followed the same assessment procedures.

The students had a broad range of English proficiency scores and they were placed into classes based on those scores, and on the university department they belonged to. During the period of this study, there were 17 departments in the faculty, and each department had one or more days and class time slots assigned for its students. Students assigned to a particular time slot would then be placed into one of the classes available in that slot based on the language proficiency level indicated by their TOEIC placement score.

3.2. Ability grouping

A reasonable measure of ability grouping is necessary to assist in investigating its effectiveness. The purpose of ability grouping is to reduce the range of ability in a group of students so that they are more similar, effectively reducing the statistical variance of the target ability in the group. On the other hand, in contexts where ability grouping is not used, the range of student ability is usually wider, increasing the statistical variance in ability. For the purposes of this study, the extent of ability grouping was determined by the standard deviation (a measure of variability) of the students’ TOEIC scores in the classes to which they were assigned.

As noted above, all students in this program were ability grouped. However, there was a considerable amount of variability in grouping due to the number of classes being offered in particular time slots.
Each course was taught in many classes, and the number of classes changed according to the time slot they were allocated; for example, period 3 on Monday had only two Communication Strategies classes, whereas period 1 on Wednesday had eleven. Time slots with fewer classes resulted in groups with more variability in student ability, whereas time slots with more classes resulted in groups with less variability in student ability. In this study, the standard deviations in the TOEIC placement scores for each of the classes, which reflect this variation in levels of language proficiency, ranged from 5.5 (very little variation, and thus a high degree of homogeneity and, consequently, “a high degree of ability grouping”) to 139.5 (large variation, much less homogeneity, and consequently “a low degree of ability grouping”).

In order to compare ability groups, students in the bottom quartile of standard deviation were defined as the “high-degree ability group”, as they were placed in those classes which had a lower degree of TOEIC score variation. The students placed in classes in the upper quartile of standard deviation were defined as the “low-degree ability group”, as they were members of those classes which had a greater degree of TOEIC score variation. Table 2 shows the number of students from the cohort who completed both the placement and year-end TOEIC tests, and the resulting mean class standard deviation of TOEIC test scores for the low-degree and high-degree ability groups in the first-year courses. The upper and lower quartile groups each consisted of 2,847 students, or one quarter of the total population (the 11,385 participants for whom there was complete data).

3.3. Ability grouping efficacy

Improvement in TOEIC test scores was used as a measure of the efficacy of ability grouping.
All students were required to take the TOEIC Institutional Program (IP) test twice: once at the beginning of the first year (their score here was used for sorting them into groups according to language proficiency) and once at the end of that first year. Improvement was calculated as the difference between those two test scores. The study was conducted over six years, and involved twelve administrations (two each year) of the TOEIC-IP tests. The scores used in this study were those reported by the Institute for International Business Communication, which operates the test in Japan, and were subject to the same rigor and quality as other administrations of TOEIC, to ensure the comparability of scores.

3.4. Data analysis

In order to determine if there was an overall effect of placement into ability groups, the distributions of the data were appraised, followed by implementation of the most appropriate type of t-test. To determine if proficiency level was a covariate mediating any effects of ability grouping, analyses of covariance (ANCOVAs) were conducted so as to determine any possible interaction with language proficiency. Finally, regression analyses were used to determine the nature of any possible relationships. The regression method (i.e., linear, quadratic, cubic, or exponential) which best fit the data was selected. As is standard, a p < 0.05 threshold was used as the criterion for determining significance.

Table 2
The mean standard deviation in TOEIC scores of high-degree and low-degree ability groups.

            Low-degree ability group    High-degree ability group
Course      N        Mean SD            N        Mean SD
ALC         2847     56.20              2847     16.61
CS          2847     57.89              2847     14.69
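The degree-of-grouping measure described in Section 3.2 can be illustrated with a minimal sketch. It computes the standard deviation of placement scores within each class, then labels classes by the quartile their students fall into. The function names, the class identifiers, and the scores are hypothetical, and the choice of the sample standard deviation and of Python's default quartile cut-point method are assumptions; the article does not state which conventions were used.

```python
from collections import defaultdict
from statistics import quantiles, stdev


def class_score_sds(records):
    """Return {class_id: sample SD of placement scores} for classes with 2+ students."""
    scores_by_class = defaultdict(list)
    for class_id, score in records:
        scores_by_class[class_id].append(score)
    return {c: stdev(s) for c, s in scores_by_class.items() if len(s) > 1}


def grouping_labels(records):
    """Label each class by the quartile its students occupy on class SD."""
    sds = class_score_sds(records)
    # One SD value per student, so quartiles are taken over students, not classes.
    per_student = [sds[c] for c, _ in records if c in sds]
    q1, _, q3 = quantiles(per_student, n=4)
    labels = {}
    for class_id, sd in sds.items():
        if sd <= q1:
            labels[class_id] = "high-degree"  # little spread: tightly grouped
        elif sd >= q3:
            labels[class_id] = "low-degree"   # wide spread: loosely grouped
        else:
            labels[class_id] = "middle"
    return labels


# Hypothetical (class_id, TOEIC placement score) records, not data from the study.
records = (
    [("A", s) for s in (400, 405, 410)]
    + [("B", s) for s in (300, 350, 400)]
    + [("C", s) for s in (200, 300, 400)]
    + [("D", s) for s in (100, 300, 500)]
)
labels = grouping_labels(records)
```

In this toy data, class A (scores within 10 points of each other) is labelled as the high-degree ability group and class D (scores spread over 400 points) as the low-degree ability group, mirroring the definitions used in the study.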


Table 3
Wilcoxon Rank-Sums test results concerning the effect of ability grouping on TOEIC improvement.

                                            5% and 95% Confidence Intervals
Course      W          p       Cohen’s d    Lower      Upper
ALC         3303232    0.01    0.07         0.01       0.12
CS          3116366    0.22    0.01         0.07       0.04
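As an illustration of the kind of statistic reported in Table 3, the following minimal sketch computes a rank-sum statistic for two samples. It pools the observations, assigns midranks to ties, sums the ranks of the first sample, and subtracts the minimum possible rank sum. This is the convention used by R's `wilcox.test`; that the article's W follows this convention is an assumption, and the function names are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Rank sum of x in the pooled sample, minus its minimum possible value."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2
```

With this convention, `rank_sum_w(x, y) + rank_sum_w(y, x)` always equals `len(x) * len(y)`, so a value near half that product indicates little separation between the two groups.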

4. Results

A preliminary investigation of the data showed that the high-degree and low-degree ability groups had unequal variance (Levene’s test, F(1, 5035) = 7.63, p < 0.001) and were not normally distributed (Anderson-Darling normality test, A = 5.77, p < 0.001); therefore, Wilcoxon rank sums tests (see footnote 2) were conducted (see Table 3). The comparison of the groups revealed that ability grouping had an effect for Academic Lecture Comprehension (W = 3303232, p = 0.01). The students assigned to classes with a high degree of ability grouping improved their TOEIC scores more than those in classes with a low degree of ability grouping. The effect size (d = 0.07) was very small, but of the same order as those in similar studies (e.g., Hattie, 2009; Kulik & Kulik, 1987). By comparison, ability grouping had no significant effect on TOEIC score improvement in the other course, Communication Strategies (W = 3116366, p = 0.22).

Further analysis was conducted to determine if students’ level of proficiency was interacting with any possible effect of ability grouping on improvements in their TOEIC scores. Examining the data revealed that the conditions for running a parametric ANCOVA were not met. Neither the assumption of homogeneity of variance nor that of homogeneity of regression slopes was tenable for the data sets. Thus, robust ANCOVAs (20% trimmed means bootstrap-t, n = 600; Wilcox, 2009) using the WRS2 package (Mair, Schoenbrodt, & Wilcox, 2015) in the statistical analysis software R (2015) were run. The TOEIC improvement score was the dependent variable and ability grouping was the independent variable. The placement TOEIC score was the covariate.

The results of the robust ANCOVAs for Communication Strategies are shown in Table 4. A high degree of ability grouping was more effective for learners of lower proficiency (TOEIC scores < 385). This effect was reversed for more proficient learners (TOEIC score = 540), where a lower degree of ability grouping was more effective.
This result, however, was no longer apparent for learners with very high level proficiency, which could be due to a ceiling effect in the TOEIC score (the maximum score is 990). Also, ability grouping had no effect for learners with mid-level proficiency (TOEIC score = 460).

The results of the robust ANCOVA were unsurprisingly very similar for Academic Lecture Comprehension (Table 5). A high degree of ability grouping was more effective for lower proficiency students (TOEIC score < 445). By contrast, a low degree of ability grouping was more effective for higher proficiency students (TOEIC score > 555). The effect, however, was again no longer apparent for the highest-level learners (TOEIC score = 990).

Figure 1 summarizes these results for both courses. Cohen’s d (see footnote 1) was calculated between the mean improvement scores of high-degree and low-degree ability groupings and plotted at seventeen regular intervals along the placement score x-axis, and a loess (locally weighted scatterplot smoothing) line (i.e., the regression line of best fit at multiple points in the plot) was calculated. The shading indicates the 95% confidence interval of the loess line. For low proficiency learners, the maximum benefit for ability grouping attained was of a medium effect size (Cohen’s d = 0.37). The maximum benefit of low ability grouping for higher proficiency learners was also of a medium effect size (Cohen’s d = 0.34). For both classes, the effect was neutral at a TOEIC proficiency score between 478 and 517, which was just above average for the cohort.

In choosing the optimal regression model, an inspection of the shape of the loess line in the graph (Figure 1) indicated that the relationships between variables of interest were likely to be cubic. The subsequent regression analyses determined the extent to which proficiency could predict the TOEIC improvement difference between high-degree and low-degree ability grouping.
Table 6 shows that for both courses, the relationship was very strong. For ALC, 79% of the variance in the TOEIC improvement difference was accounted for by proficiency, F(3, 16) = 24.73, p < 0.001. For CS, 75% of the variance was accounted for, F(3, 15) = 18.78, p < 0.001.

5. Discussion

The findings of this study indicate that there was a small overall effect of ability grouping on attainment (Cohen’s d = 0.07) in one of the courses (Academic Lecture Comprehension). This finding is similar to that of other studies in general education which have found effects of comparable magnitudes (e.g., Hattie, 2009; Kulik & Kulik, 1987).

The results also demonstrate that language proficiency moderates the effect of ability grouping. In the present study, ability grouping was found to be effective for learners of lower proficiency, and detrimental to learners of higher proficiency. However, this result was opposite to that found in much of the ability grouping (tracking) research that has been conducted to date. In previous research, positive effects of ability grouping have been reported for higher proficiency students (Kulik, 1992),

2 In order to run the Student’s t-test, as with all other statistical tests, assumptions about the quality of the data being used must be met. In this case, the assumptions of equal variance, and of normally distributed data, were not met. This led to the selection of the Wilcoxon rank sum test as a viable alternative (Field, Miles, & Field, 2012).


Table 4
Robust ANCOVA results for CS TOEIC improvement.

TOEIC placement    N (Ability grouping)    Trimmed       Confidence (95%)
score (a)          High       Low          mean diff.    Lower     Upper     Statistic    p-Value
160                115        352          28.87         1.70      56.03     2.73         0.01
385                1518       1691         6.36          1.02      13.75     2.22         0.03
460                1725       1935         0.89          7.94      6.16      0.32         0.77
540                1305       1634         11.53         19.71     3.35      3.62         0.00
990                33         62           4.56          23.20     14.08     0.63         0.50

(a) The TOEIC placement scores are selected by the function ancboot() by determining “five values of the covariate for which the relationship between the outcome and covariate is roughly the same in both groups” (Field, Miles, & Field, 2012, p. 484).
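The robust ANCOVAs compare 20% trimmed means rather than ordinary means. A minimal sketch of a trimmed mean follows, assuming the common definition in which the number of observations dropped from each tail is the proportion times the sample size, rounded down. The function name is hypothetical, and the sketch does not reproduce the bootstrap-t procedure or the covariate matching performed by ancboot().

```python
from math import floor
from statistics import mean


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping floor(proportion * n) observations from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = floor(proportion * len(ordered))
    # Slice off the `cut` smallest and `cut` largest observations.
    return mean(ordered[cut: len(ordered) - cut])
```

For example, with the hypothetical improvement scores 1, 2, 3, 4, and 100, the ordinary mean is 22, whereas the 20% trimmed mean drops one observation from each tail and averages 2, 3, and 4, which limits the influence of the outlying score.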

Table 5
Robust ANCOVA results for ALC TOEIC improvement.

TOEIC placement    N (Ability grouping)    Trimmed       Confidence (95%)
score              High       Low          mean diff.    Lower     Upper     Statistic    p-Value
160                117        233          27.30         0.31      54.92     2.50         0.02
385                1432       1617         22.94         15.48     30.39     7.78         0.00
445                1654       1793         15.42         8.17      22.68     5.37         0.00
555                1118       1504         7.91          16.65     0.82      2.29         0.02
990                27         57           2.18          15.70     20.07     0.31         0.76

Figure 1. Cohen’s d (difference) in TOEIC improvement between low-degree and high-degree ability groups and proficiency at placement (first year courses).
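Footnote 1 defines Cohen's d as the difference between two means expressed as a proportion of the pooled standard deviation. A minimal sketch of that calculation follows. The function name is hypothetical, and the use of sample variances with an (n − 1)-weighted pool is the usual textbook convention, assumed here rather than confirmed by the article.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)
```

A positive value means the first group has the higher mean, so passing the high-degree group's improvement scores first yields positive values where tighter grouping was associated with larger gains.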

Table 6
Regression results for the difference in TOEIC improvement between low-degree and high-degree ability groups and proficiency at placement.

Course    Relationship    St. E      df    R2        Adjusted R2    F        df1    df2    p
ALC       Cubic           0.1026     16    0.8226    0.7893         24.73    3      16     0.000
CS        Cubic           0.08295    15    0.7898    0.7477         18.78    3      15     0.000
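The quantities in Table 6 are linked by standard identities for a least-squares regression with an intercept: the adjusted R2 and the F statistic both follow from R2 and the two degrees-of-freedom values. The sketch below applies those identities; the function names are hypothetical, and treating the article's cubic fits as ordinary least squares with an intercept is an assumption.

```python
def adjusted_r2(r2, df1, df2):
    """Adjusted R2 from R2, model df (df1) and residual df (df2)."""
    # With an intercept, n - 1 = df1 + df2.
    return 1 - (1 - r2) * (df1 + df2) / df2


def f_statistic(r2, df1, df2):
    """F ratio: explained variance per model df over residual variance per residual df."""
    return (r2 / df1) / ((1 - r2) / df2)


# Rounded inputs taken from the ALC row of Table 6.
alc_adjusted = adjusted_r2(0.8226, 3, 16)
alc_f = f_statistic(0.8226, 3, 16)
```

Using the rounded R2 values from Table 6, these identities give results that agree with the reported adjusted R2 and F values for both courses to within rounding.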

but no difference (Kulik & Kulik, 1987), or negative effects reported for lower proficiency students (Argys, Rees, & Brewer, 1996; Ireson, Hallam, Hack, Clark, & Plewis, 2002). These previously reported results have led to extensive calls to end ability grouping (tracking) in United States institutions (e.g., Kohli, 2014). An explanation of the difference between this study’s results and the other studies may come from Kulik (1992). In his meta-analysis, he found that the influence of ability grouping on attainment depends on features of the program. The different results that have been found could be explainable in terms of the extent of homogeneity of the lower level classes, the teachers assigned to those classes, and the degree of curriculum differentiation.


One assumption made in research which reports ability grouping to have no effect or negative effects on the attainment of lower proficiency learners (Argys et al., 1996; Kulik & Kulik, 1987; Slavin, 1990) has been that the groups created have been largely homogenous in all aspects. However, it can be argued that this has not often been the case. Although students may have been grouped according to similarity of a single trait (e.g., academic ability, or general intelligence; Tieso, 2003), for the lower level classes, ability grouping would have likely resulted in greater heterogeneity on other variables. In the US, lower level classes tend to have larger proportions of African-Americans (Gamoran & Mare, 1989; Hallinan, 2001) and Hispanics (Robinson, 2008). They also have a higher incidence of students with limited English proficiency (Callahan, 2005), and a higher incidence of learners with a learning disability (Sleeter, 1986). Each of these groups has specific learning needs which are not likely to be met by treating the group as if they were homogenous. On the other hand, higher proficiency groups would have been more likely to contain learners who came from the majority culture and from backgrounds which share the values of the educational institution they are members of (LeTendre et al., 2003), and are therefore more culturally homogeneous (Hallinan, 1994). By contrast, in the present study, the lower level ability groups were much more overtly socially and academically homogeneous. The students were members of very similar cultural groups: the overwhelming majority were Japanese, and the few exceptions were from other East Asian cultural backgrounds (i.e., Chinese or Korean). The students had been admitted to the university based on their entrance examination scores or high school academic performance; most shared the same first language (Japanese); and it was unlikely that they had any serious learning disabilities. 
The greater homogeneity of the students in the lower level English language ability groups in the present study was likely to have made ability grouping more effective for them, unlike the lower ability groups in previous studies, which were likely to have been more heterogeneous with respect to factors other than those on which they were grouped (cf. Ireson et al., 2002; Kulik, 1992; Kulik & Kulik, 1987).
Another possible explanation for the difference in the results is the quality of the teachers, their attitude towards the students, and the classroom activities they designed. Often more senior teachers have been assigned to higher-level classes (Hallinan, 1994). There has been a perception among students too that the more proficient teachers have been assigned to these classes (Kim, 2012). Most likely related to this, teachers also have had differing attitudes towards teaching the different groups. For example, as previously mentioned, many teachers have seen lower level classes as being full of students with behavior problems (Chisaka, 2002; Hallam & Ireson, 2003; Kim, 2012). These differing attitudes have led to different classroom activities. Hallam and Ireson (2005), for example, found that there was more rehearsal, more repetition, and more structured activity in the lower level classes, but less focus on discussion, and lower expectations for critical thinking. Finally, a much larger proportion of time in the lower-level classes has been spent on disciplinary issues (Hallam & Ireson, 2003), which reduced the time spent on the curriculum. However, in the program described in the present study, the teachers were primarily assigned to classes based on their timetable availability. In addition, lower level classes were often assigned native Japanese-speaking teachers when available so that difficult concepts could be explained in Japanese if necessary. When availability allowed, higher level classes were assigned to native English-speaking teachers. There was little consideration of the seniority or ability of teachers. The assignment of teachers in the present study would likely have had a more positive effect for lower level ability groups (i.e., having teachers who could explain difficult concepts in the students’ native language) and a more negative effect for higher level ones (i.e., not being prioritized in the assignment of the best or most senior teachers) when compared to the previous studies.
A third possible explanation for the different results is curriculum differentiation between the ability groupings. The curriculum for different ability groups has usually been differentiated in many ways. These range from a new curriculum in a specialist institution to within-class ability grouping in regular instruction (Rogers, 2002). “Grouping programs that entail more substantial adjustment of curriculum to ability have clear positive effects on children” (Kulik, 1992, p. 1), and this adjustment has been more likely to be made for the gifted and talented (Darling-Hammond, 2010; Rogers, 2002). The curriculum in the program which was the focus of the present study, on the other hand, was not differentiated for ability levels, so the beneficial effects of ability grouping found in previous research for the higher language proficiency learners would have been absent. Support for this assumption comes from a study on ability grouping in non-differentiated tertiary EFL classes in Taiwan. Luo (2005, p. 253) found that “students in the basic level improved the most, while students of the advanced level didn’t make any substantial progress or scored even lower than in the pre-test.”
The above describes some of the possible reasons behind the differences in the results obtained in the present study, compared to those of many previous studies.
However, there may be another mechanism behind the apparent effectiveness of ability grouping for lower-level groups and its ineffectiveness for upper-level groups. Previous research findings have demonstrated a relationship between the proficiency level of the student, ability grouping, and academic self-concept, which pertains to the learner’s beliefs about his/her academic abilities and skills (Payne, 1962). Kulik (1992) found that, when ability grouping is used, the self-esteem of lower aptitude students rises slightly and the self-esteem of high aptitude students drops slightly. These results have been replicated for academic self-concept in Asian contexts, in high schools in Hong Kong (Kong, Hau, & Cheng, 1998; Wong & Watkins, 2001) and in Singapore (Liu, Wang, & Parkins, 2005). There is support for this position from research on Taiwan university freshmen English language classes (Luo & Tsai, 2002, p. 1), which found that “the students in the basic level benefited from achievement grouping by better attitudes towards learning English and increased motivation”.
Ability grouping could influence academic self-concept because students are constantly making comparisons (Nicholls & Miller, 1984), and in ability groups these comparisons are being made with students of similar ability. For lower level students, this leads to a higher academic self-concept because they are in contact with more realistic role models. Allan (1991) has also suggested that learners of low to average ability do not look to high-level learners as role models (Schunk, 1987). In contrast, it appears that for these learners seeing someone of similar ability succeed at a task raises their own motivation to try it (Feldhusen, 1989).
Likewise, higher level learners are placed in groups where their peers have similar abilities to theirs. Marsh (1992) called this the big fish in the little pond effect. Previously, each of these students would have been the best-performing English student in their class, and possibly in their school (a little pond). Then they are placed into a big pond with many other big fish, and they are no longer the biggest fish. Comparing themselves to other high-performing learners effectively reduces their academic self-concept. For the first-year students in the present study, the effect was most likely magnified because, first, they were members of a cooperative culture, whose members are more likely to compare themselves with their peers, and second, they had just entered a high-level institution which was already effectively ability grouped by the entrance examination/recommendation selection procedures.
Academic self-concept has been demonstrated to predict educational attainment level ten years later over and above prior achievement (Guay, Larose, & Boivin, 2004), even when taking other socio-cultural factors into account. Marsh and O’Mara (2008) also demonstrated that academic self-concept has reciprocal effects with achievement. Thus, for lower level proficiency learners, the combination of higher academic self-concept, the presence of realistic role models, and the teacher’s curriculum adjustments made possible by the increased class homogeneity provided an environment which likely promoted learning, resulting in improved achievement. In contrast, any decreases in academic self-concept resulting from ability grouping of the higher proficiency learners were likely to negatively impact their attainment.

6. Conclusion

This study has found that ability grouping can make a difference for ESP second language learners. It has a positive effect for learners of lower than average proficiency, and a negative effect for learners of higher than average proficiency. Previous research findings about academic self-concept provide us with a viable social mechanism to explain why ability grouping might have affected attainment in this way. More proficient learners’ academic self-concept is negatively affected when they are placed with students of similar ability, resulting in poorer attainment. In contrast, less proficient learners’ academic self-concept is positively affected when they are placed with students of similar ability, resulting in better attainment.
These results have implications for ability grouping in language classrooms. Firstly, the results have demonstrated that ability grouping can be effective in some contexts for lower level ESP language learners, and it would be useful to continue with this practice. For more proficient learners, however, it would seem that some changes to the current practices are required. One possible change is to group average to high ability learners together randomly, or by some criterion other than proficiency, thereby reducing the degree to which they are grouped by ability. This is supported by the finding in this study that able language students in mixed ability groups made slightly better gains than those in groups of similar abilities. On the other hand, much of the research into tracking has found that ability grouping can be effective for higher level learners. Kulik (1992), for example, has suggested that the more curricular adjustments are made for higher level learners, the more effective ability grouping could become. It seems that more care in making appropriate curricular adjustments for these students could be valuable. Another possibility is to do away with ability grouping altogether (e.g., as in Finland; Darling-Hammond, 2010).
This study has the usual limitations, given that it was conducted in a unique context: a large-scale ESP program at a Japanese tertiary institution. The results deal with students being placed into different ability groups studying in the same course, towards the same goals, and by and large using the same materials. These results cannot be extended to other programs which place students into different courses with different goals, such as one which has distinct beginning, intermediate, and advanced classes. Another limitation is that this study does not employ an experimental design featuring a control group. All students were placed into ability groups according to their language proficiency level, and the study compared students who were ‘placed’ with students of similar ability, to a greater or lesser extent. Further research needs to be conducted into ability grouping in the second language classroom to determine if academic self-concept does indeed mediate between ability grouping and second language attainment, and to extend our understanding of the effect of ability grouping on attainment in contexts other than ESP.

Acknowledgements

This research was supported by a grant-in-aid (15H01976) received from the Japan Society for the Promotion of Science. We would also like to thank the Center for English Language Education in Science and Engineering (CELESE) at Waseda University and its students for their support of this research.

References

Allan, S. D. (1991). Ability-grouping research reviews: What do they say about grouping and the gifted? Educational Leadership, 48, 60-65.
Ames, C. (1992). Classrooms: Goals, structures and student motivation. Journal of Educational Psychology, 84, 267-271.
Argys, L. M., Rees, D. I., & Brewer, D. J. (1996). Detracking America’s schools: Equity at zero cost? Journal of Policy Analysis and Management, 15, 623-645.
Baker, P. (1998). The impact of teaching on student motivation. In S. Brown, S. Armstrong, & G. Thompson (Eds.), Motivating students (pp. 7-14). London: Kogan Page.


Boaler, J. (1997). Setting, social class and survival of the quickest. British Educational Research Journal, 23, 575-595.
Boaler, J., Wiliam, D., & Brown, M. (2000). Students’ experiences of ability grouping: Disaffection, polarization and the construction of failure. British Educational Research Journal, 26, 631-648.
Butler, Y. G., & Iino, M. (2005). Current Japanese reforms in English language education: The 2003 “action plan”. Language Policy, 4, 25-45.
Byean, H. (2015). English, tracking, and neoliberalization of education in South Korea. TESOL Quarterly, 49, 867-882.
Callahan, R. M. (2005). Tracking and high school English learners: Limiting opportunity to learn. American Educational Research Journal, 42, 305-328.
Chang, A. S. C. (1990, July). Streaming and learning behaviour. Paper presented at the 48th Annual Convention of the International Council of Psychologists, Tokyo, Japan, July 14–18. Retrieved from ERIC database (ED324092).
Cheung, C., & Rudowicz, E. (2003). Academic outcomes of ability grouping among junior high school students in Hong Kong. The Journal of Educational Research, 96, 241-254.
Chisaka, B. C. (2002). Ability grouping in Zimbabwe secondary schools: A qualitative analysis of perceptions of learners in low ability classes. Evaluation and Research in Education, 16, 19-33.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cross, D. (1988). Selection, setting and streaming in language teaching. System, 16, 13-22.
Darling-Hammond, L. (2010). The flat world and education: How America’s commitment to equity will determine our future. New York: Teachers College Press.
Educational Testing Service (ETS). (2017). TOEFL. Retrieved April 29, 2017 from https://www.ets.org/toefl.
Feldhusen, J. P. (1989). Synthesis of research on gifted youth. Educational Leadership, 46, 6-11.
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.
Gamoran, A., & Mare, R. D. (1989). Secondary school tracking and educational inequality: Compensation, reinforcement, or neutrality. American Journal of Sociology, 94, 1146-1183.
Guay, F., Larose, S., & Boivin, M. (2004). Academic self-concept and educational attainment level: A ten-year longitudinal study. Self and Identity, 3, 53-68.
Hallam, S., & Ireson, J. (2003). Secondary school teachers’ attitudes towards and beliefs about ability grouping. British Journal of Educational Psychology, 73, 343-356.
Hallam, S., & Ireson, J. (2005). Secondary school teachers’ pedagogic practices when teaching mixed and structured ability classes. Research Papers in Education, 20, 3-24.
Hallinan, M. T. (1994). School differences in tracking effects on achievement. Social Forces, 72, 799-820.
Hallinan, M. T. (2001). Sociological perspectives on black-white inequalities in American schooling. Sociology of Education, 74(extra issue), 50-70.
Hanushek, E. A., & Wößmann, L. (2006). Does educational tracking affect performance and inequality: Differences-in-differences evidence across countries? The Economic Journal, 116, C63-C76.
Harklau, L. (1994). Tracking and linguistic minority students: Consequences of ability grouping for second language learners. Linguistics and Education, 6, 217-244.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York, NY: Routledge.
Hiraizumi, W., & Watanabe, S. (1975). Eigo kyoiku daironso [A debate on English education]. Tokyo: Bungei Shunju.
Horwitz, E. K., Horwitz, M. B., & Cope, J. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70, 125-132.
Ireson, J., Hallam, S., Hack, S., Clark, H., & Plewis, I. (2002). Ability grouping in English secondary schools: Effects on attainment in English, mathematics and science. Educational Research and Evaluation, 8, 299-318.
Kelly, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137-152.
Kim, Y. (2012). Implementing ability grouping in EFL contexts: Perceptions of teachers and students. Language Teaching Research, 16, 289-315.
Kohli, S. (2014, November 18). Modern-day segregation in public schools. The Atlantic. Retrieved from http://www.theatlantic.com/education/archive/2014/11/modern-day-segregation-in-public-schools/382846/.
Kong, C. K., Hau, K. T., & Cheng, Z. J. (1998). Chinese students’ self-concept and academic performance: Big-fish-little-pond effects and the role of perceived school status. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA, April 13–17.
Kulik, J. (1992). Research on ability grouping: Historical and contemporary perspectives. Storrs: University of Connecticut, National Research Center on the Gifted and Talented.
Kulik, C. C., & Kulik, J. A. (1982). Effects of ability grouping on secondary school students: A meta-analysis of the evaluation findings. American Educational Research Journal, 19, 415-428.
Kulik, J. A., & Kulik, C. C. (1987). Effects of ability grouping on student achievement. Equity & Excellence in Education, 23(1–2), 22-30.
Lafaye, B. E., & Tsuda, S. (2002). Attitudes towards English language learning in higher education in Japan, and the place of English in Japanese society. Intercultural Communication Studies, 11, 145-161.
LeTendre, G. K., Hofer, B. K., & Shimizu, H. (2003). What is tracking? Cultural expectations in the United States, Germany, and Japan. American Educational Research Journal, 40, 43-89.
Li, F., Loyalka, P., Yi, H., Shi, Y., Johnson, N., & Rozelle, S. (2016). Ability tracking and social capital in China’s rural secondary school system. LICOS – Discussion Paper Series 2016, 1-54.
Liu, H. J. (2008). An analysis of the effects of ability grouping on student learning in university-wide English classes. Feng Chia Journal of Humanities and Social Sciences, 16, 217-249.
Liu, W. C., Wang, C. K. L., & Parkins, E. J. (2005). A longitudinal study of students’ academic self-concept in a streamed setting: The Singapore context. British Journal of Educational Psychology, 75, 567-586.
Liu, J., & Yang, C. H. (2016). Between-class ability grouping, cram schooling, and student academic achievement in Taiwan. Sociology Study, 6, 335-341.
Luo, B. (2005). Achievement grouping and students’ progress in freshman English classes at Feng Chia University. Feng Chia Journal of Humanities and Social Sciences, 11, 253-279.
Luo, B., & Tsai, M. (2002). Understanding EFL learners in leveled and mixed classes. Paper presented at the Eleventh International Symposium on English Teachers/Fourth Pan-Asian Conference, Nov. 8–10, Chien Tan Overseas Youth Activity Center, Taipei.
Mair, P., Schoenbrodt, F., & Wilcox, R. (2015). WRS2: Wilcox robust estimation and testing. Retrieved from https://r-forge.r-project.org/projects/psychor/.
Marsh, H. W. (1992). Content specificity of relations between academic achievement and academic self-concept. Journal of Educational Psychology, 84, 35-42.
Marsh, H. W., & O’Mara, A. (2008). Reciprocal effects between academic self-concept, self-esteem, achievement, and attainment over seven adolescent years: Unidimensional and multidimensional perspectives of self-concept. Personality and Social Psychology Bulletin, 34, 542-552.
Nation, I. S. P., & Macalister, J. (2010). Language curriculum design. New York: Routledge.
Nicholls, J., & Miller, A. T. (1984). Development and its discontents: The differentiation of the concept of ability. In J. Nicholls (Ed.), The development of achievement motivation (pp. 185-218). Greenwich, Conn.: JAI Press.
Nunan, D. (1988). The learner-centered curriculum: A study in second language teaching. Cambridge: Cambridge University Press.
Oakes, J., Gamoran, A., & Page, R. N. (1992). Curriculum differentiation: Opportunities, outcomes, and meanings. In P. W. Jackson (Ed.), Handbook of research on curriculum (pp. 571-608). New York: Macmillan.
Payne, D. A. (1962). The concurrent and predictive validity of an objective measure of academic self-concept. Educational and Psychological Measurement, 22, 773-780.
R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
Richards, J. C., & Schmidt, R. W. (2002). Longman dictionary of language teaching and applied linguistics. London: Longman.
Robert, P. (2010). Social origin, school choice, and student performance. Educational Research and Evaluation, 16, 107-129.


Robinson, J. P. (2008). Evidence of a differential effect of ability grouping on the reading achievement growth of language-minority Hispanics. Educational Evaluation and Policy Analysis, 30, 141-180.
Rogers, K. B. (2002). Grouping the gifted and talented: Questions and answers. Roeper Review, 24, 103-108.
Schunk, D. H. (1987). Peer models and children’s behavioral change. Review of Educational Research, 57, 149-174.
Slavin, R. E. (1987). Ability grouping and student achievement in elementary schools: A best-evidence synthesis. Review of Educational Research, 57, 293-336.
Slavin, R. E. (1990). Achievement effects of ability grouping in secondary schools: A best-evidence synthesis. Review of Educational Research, 60, 471-499.
Sleeter, C. (1986). Learning disabilities: The social construction of a special education category. Exceptional Children, 51, 46-54.
Suzuki, T. (1999). Nihonjin-wa naze eigo-ga dekinaika [Why aren’t Japanese good at English?]. Tokyo: Iwanami Shinshu.
Tieso, C. L. (2003). Ability grouping is not just tracking anymore. Roeper Review, 26, 29-36.
Watts, B. (1985). Being a teacher in the AMEP. Report to the committee of review of the adult migrant education program. Canberra: Department of Immigration and Ethnic Affairs.
Wilcox, R. (2009). Robust ANCOVA using a smoother with bootstrap bagging. British Journal of Mathematical and Statistical Psychology, 62, 427-437.
Wong, M. S. W., & Watkins, D. (2001). Self-esteem and ability grouping: A Hong Kong investigation of the big fish little pond effect. Educational Psychology, 21, 79-87.

Chris Sheppard, PhD, is an associate professor at the Faculty of Science and Engineering at Waseda University in Tokyo, Japan. His research interests include curriculum design in ESP contexts, task-based language teaching and learning, the development of critical thinking skills through a second language, and second language acquisition.
Emmanuel Manalo, PhD, is a professor at the Graduate School of Education of Kyoto University in Japan. He teaches educational psychology and academic communication skills to undergraduate and graduate students. His research interests include the promotion of effective learning and instructional strategies, student diagram use, and critical thinking.
Marcus Henning, PhD, is a senior lecturer and the post-graduate academic advisor at the Centre for Medical and Health Sciences Education at the University of Auckland. His research interests include adult education, quality of life, the motivation to teach and learn, organizational behavior, conflict management, and professional integrity.