The Flynn Effect for gains in literacy found in Estonia is not a Jensen Effect

Personality and Individual Differences 34 (2003) 1287–1292 www.elsevier.com/locate/paid

Olev Must*, Aasa Must, Vilve Raudik
Department of Psychology, University of Tartu, Tiigi 78, Tartu 50410, Estonia
Received 27 December 2001; received in revised form 16 February 2002; accepted 15 April 2002

Abstract
In 1990–1991 the International Association for the Evaluation of Educational Achievement (IEA) studied reading skills in 9- and 14-year-old students around the world. In 1994, using the same methodology, similar data were gathered in Estonia on 2901 9- and 14-year-olds. Five years later, 522 (37%) of the original group of Estonian 9-year-olds were re-tested when they reached 14 years of age. The aim of the second testing was to detect any rise in test scores over time. Reading scores did rise, by amounts that varied by subscale: 0.52 S.D.s on the Documents subscale, 0.19 S.D.s on the Expository subscale, and 0.07 S.D.s on the Narratives subscale. The Spearman rank-order correlation of these secular changes with the g loadings from a principal components analysis was rho = −1.00. As such, for these data, the Flynn Effect is not a Jensen Effect.
© 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Mental abilities; Intelligence; g-Factor; Reading literacy; Flynn Effect; Jensen Effect

* Corresponding author. Tel.: +372-7-375-912; fax: +372-7-352-900. E-mail address: [email protected] (O. Must).
0191-8869/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0191-8869(02)00115-0

1. Introduction
The last two decades of research on intelligence and on the secular rise in IQ scores have added two new “effects” to the psychological vocabulary: the “Flynn Effect” and the “Jensen Effect.” The Flynn Effect refers to the secular rise in IQ test scores over time. The Jensen Effect refers to any strong positive correlation between g loadings and other variables (g is the general factor of intelligence; e.g. Jensen, 1998). The two effects may be related, as when we ask whether the changes in test performance over time occur on the g factor, that is, whether the Flynn Effect is a Jensen Effect. Such research has produced mixed results. On the one hand, both Rushton (1999), using mainly US data on the WISC-R, and Must, Must, and Raudik (submitted for publication), using Estonian data on the National Intelligence Test, found that the secular rise in test scores did not occur on the g factor. On the other hand, Colom, Juan-Espinosa, and Garcia (2001), using the Spanish version of the Differential Aptitude Battery, found that the secular trend in test scores did correlate with the g factor. The present study aims to reduce this uncertainty.

The investigations cited above (by Rushton, Must et al., and Colom et al.) used traditional IQ tests. However, there is a large body of data on changes in reading comprehension that is also relevant, although it has not yet been examined in this context. Analysing such non-IQ test data in terms of the Flynn Effect requires the assumption that g enters into every kind of activity requiring mental effort (Jensen, 1998). Although this assumption may be reasonable, it is only fair to point out that Flynn (1999) has contended that the changes in IQ scores are mainly related to the theoretical construct of IQ and not to changes in real-world cognitive abilities. Nonetheless, the present study assumes that reading comprehension tests can be used to test whether the Flynn Effect is related to the g factor.

Reading comprehension and reading literacy research has a long tradition (Thorndike, 1973) and is the subject of world-wide research supported by the Organisation for Economic Co-operation and Development (1999). Data are available from different periods, reflecting changes in reading over time, so changes in reading scores can be analysed to address the same research question as changes in IQ test scores. On the surface, the construct of reading literacy differs from the classic IQ construct: reading is a more specific cognitive activity and more obviously vulnerable to environmental and educational influences (Elley, 1992; Murray, Kirsch, & Jenkins, 1998).
There is research evidence that about 40% of the variance in reading achievement is explained by environmental factors (Munck & Lundberg, 1994). However, reading literacy also shares common features with IQ. In fact, IQ tests often have Verbal subtests. Tests of reading literacy are oriented towards understanding the meaning of words, abstracting themes, identifying relationships, and elaborating information, and their items are often similar to those on Wechsler’s (1981) Similarities subtest. In this regard it is important to note that the largest IQ gains occur on the Similarities subtest (Flynn, 1999). It therefore seems reasonable to assume that reading literacy scores rise for reasons similar to those behind the rise in scores on the Similarities subtest of the WISC. In this paper we examine the nature of changes in literacy test scores over time.
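Whether the Flynn Effect is a Jensen Effect is typically checked by correlating the vector of subtest g loadings with the vector of subtest score gains (the method of correlated vectors described by Jensen, 1998). The following is a minimal Python sketch of that check, assuming the first-principal-component loadings and corrected gains reported in Table 4 of this paper. The function names are illustrative, and the rank-difference formula for Spearman's rho is valid only when there are no tied values.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks (1 = smallest value); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Table 4 values, in the order Narratives, Expository texts, Documents.
g_loadings = [0.924, 0.894, 0.848]   # first principal component, 3rd grade
secular_gains = [0.07, 0.19, 0.52]   # corrected effect sizes (d2 - d1)

print(f"rho = {spearman_rho(g_loadings, secular_gains):.2f}")  # rho = -1.00
print(f"r   = {pearson_r(g_loadings, secular_gains):.2f}")     # r   = -0.99
```

Table 4 ranks from largest to smallest, while `ranks` ranks from smallest to largest. This does not change the correlation, because both vectors are ranked the same way. A negative value means that the largest gains fall on the least g-loaded subscale.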

2. Methods
In 1990–1991 the International Association for the Evaluation of Educational Achievement (IEA) carried out an international project, with 32 participating countries, to evaluate the reading literacy of schoolchildren in two age cohorts: 9- and 14-year-olds (Binkley, Rust, & Winglee, 1994; Elley, 1992). The IEA developed a dedicated research procedure for both groups. The reading literacy score was computed as the sum of the scores on three subscales based on different types of text: Narratives, Expository, and Documents.

In 1994 the IEA literacy tests were translated into Estonian and data were collected (Must, 1998). The testing followed IEA requirements, and two age groups were tested: 9-year-olds (3rd grade students) and 14-year-olds (8th grade students). A stratified random sampling procedure was used. The general population was divided into three strata according to the degree of urbanization: the capital city, provincial centres, and rural areas. The unit of selection was the school class, which led to a small difference (<4%) between the planned and actual sample sizes. The planned sample size was 1500 students in each population. Five years later we continued the research using the same methodology and database. Data on the Estonian target populations and the sampling approach are given in Table 1.

There were three subscales. The first, Narratives, measured comprehension of narrative, emotional texts. The second, Expository, measured comprehension of informational, textbook-like texts. The third, Documents, measured comprehension of tables, figures, maps, etc. Descriptions of these scales and their psychometric properties are given in Table 2.

In 1994, the 9-year-old students attended the 3rd grade. In 1999, the same students were 14 years old and attended the 8th grade. Within the same methodological framework it was therefore possible to carry out longitudinal research, measuring the same students twice (in the 3rd and in the 8th grade), and so to evaluate the students’ progress over the 5-year interval.

Table 1
Sample formation
Group | Year | Population or sample | n | Mean age (years) | Gender: boys | Gender: girls | Area: rural areas | Area: provincial centres | Area: capital
Grade 3 | 1994 | General population | 12 667 | – | – | – | 54% | 25% | 21%
Grade 3 | 1994 | Sample | 1409 | 9.7 | 51% | 49% | 54% | 25% | 21%
Replication | 1999 | Sample | 522 | 14.7 | 49% | 51% | 48% | 30% | 22%
Grade 8 | 1994 | General population | 11 717 | – | – | – | 50% | 29% | 21%
Grade 8 | 1994 | Sample | 1492 | 14.2 | 45% | 55% | 51% | 25% | 24%

Table 2
Reading literacy scales (data from 1994)
Scale | Description of the scale | No. of items, grade 3 | No. of items, grade 8 | Reliability (a), grade 3 | Reliability (a), grade 8 | Mean/S.D., grade 3 | Mean/S.D., grade 8
Narratives | Comprehension of narrative, emotional texts | 22 | 29 | 0.87 | 0.82 | 14.2/5.3 | 20.8/5.1
Expository texts | Comprehension of informational, textbook-like texts | 21 | 26 | 0.86 | 0.74 | 11.7/5.0 | 16.7/4.3
Documents | Comprehension of tables, figures, maps, etc. | 23 | 34 | 0.80 | 0.77 | 15.8/4.4 | 24.8/4.5
(a) Reliability is estimated by Cronbach’s α.

The second testing was carried out in 50% of the schools where the first testing had taken place (every second school was studied). Unfortunately, after 5 years there was some attrition from the first wave because of dropouts, changes of school, etc. Consequently, only 522 students from the original sample (37%, not 50%) were re-tested, and this sample was biased toward the better students (Table 3).
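The 37% retention figure, and how far it falls short of the design, can be checked with a few lines of arithmetic. This is a minimal Python sketch that uses only the counts given in the text. It also assumes, hypothetically, that the revisited schools contained half of the 1994 third graders. The text states school shares, not student counts, so the second figure is only a rough estimate.

```python
# Counts taken from the text; names are illustrative.
GRADE3_SAMPLE_1994 = 1409   # 3rd graders tested in 1994
RETESTED_1999 = 522         # the same students re-tested as 8th graders in 1999
DESIGN_SHARE = 0.5          # every second school was revisited


def retention_rate(retested: int, original: int) -> float:
    """Share of the original sample that was actually re-tested."""
    return retested / original


rate = retention_rate(RETESTED_1999, GRADE3_SAMPLE_1994)
# Hypothetical assumption: revisited schools held half of the 1994 students.
lost_within_design = 1 - rate / DESIGN_SHARE

print(f"re-tested: {rate:.0%} of the 1994 sample")                       # re-tested: 37% of the 1994 sample
print(f"lost within revisited schools: about {lost_within_design:.0%}")  # about 26%
```

Under that assumption, roughly a quarter of the students in the revisited schools were not re-tested. This is the attrition the Results section corrects for.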

3. Results
When tested as 3rd graders in 1994, the re-tested subsample had scored 0.19 S.D.s higher on reading than the full 3rd grade sample. Attrition had thus removed a disproportionate number of low scorers from the second wave, and this sample bias needed to be corrected. In 1999 the re-tested 8th graders scored on average 0.41 S.D.s higher than 8th graders had in 1994. Because the 1999 sample is biased toward higher scorers, this raw change must be corrected for the sample bias (−0.19 S.D. units). After this correction, the average gain in the total score is roughly 0.22 S.D.s (0.41 − 0.19). The largest gain was on the Documents subscale (0.52 S.D.s) and the smallest on the Narratives subscale (0.07 S.D.s), with the Expository subscale intermediate (0.19 S.D.s).

The IEA project defined the literacy score as the sum of the three subscales. A factor analysis allows an alternative interpretation based on the variance that the three subscales share; the first principal component describes the content and structure of this common variance (Table 4). The Narratives subscale had the highest loading on the first principal component and the Documents subscale the lowest. The Spearman rank-order correlation between the secular changes over the 5 years and the g loadings from the first principal component is rho = −1.00 (Pearson’s r = −0.99).

By itself, however, a rank-order correlation of −1.00 across only three subscales has a probability of 1/3! = 1/(3 × 2 × 1) = 1/6, or about 0.17, which does not reach conventional levels of significance. However, an analysis of the dimensionality of the same IEA scales in a US sample (Atash, 1994), based on a full factor analysis at the item level, found exactly the same rank order of subtests. We may therefore be permitted to combine the two probabilities, 1/6 × 1/6 = 1/36 (about 0.03), and find P < 0.05.

Table 3
Changes in reading literacy test scores, 1994–1999: Estonian students
Scale | Grade 3 sample, 1994 (n=1409), Mean/S.D. | Replication subsample, 1994 data (n=522), Mean/S.D. | Effect size d1 (a) | Grade 8 sample, 1994 (n=1492), Mean/S.D. | Replication subsample, 1999 data (n=522), Mean/S.D. | Effect size d2 (a) | Corrected effect size d2 − d1 (a)
Summary score | 41.8/13.1 | 44.2/12.3 | 0.19 | 63.4/11.6 | 67.8/9.5 | 0.41 | 0.22
Narratives | 14.2/5.3 | 15.2/5.0 | 0.19 | 20.8/5.1 | 22.0/4.2 | 0.26 | 0.07
Expository texts | 11.7/5.0 | 12.6/4.7 | 0.18 | 16.7/4.3 | 18.2/3.7 | 0.37 | 0.19
Documents | 15.8/4.4 | 16.4/4.1 | 0.14 | 24.8/4.5 | 27.5/3.6 | 0.66 | 0.52
(a) Effect size is estimated by Cohen’s d: d = (M1 − M2)/s_pooled.

Table 4
The changes in the subtest scores of the reading literacy test
Subscale | Loading on the first principal component (3rd grade students) | Rank order of the factor loadings | Effect size d (a) (from 3rd to 8th grade) | Rank order of the effect sizes
Narratives | 0.924 | 1 | 0.07 | 3
Expository texts | 0.894 | 2 | 0.19 | 2
Documents | 0.848 | 3 | 0.52 | 1
Eigenvalue | 2.2 | | |
(a) Effect size is estimated by Cohen’s d: d = (M1 − M2)/s_pooled.
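The bias correction and the significance argument above can be approximately reproduced from the rounded figures in Tables 3 and 4. The following Python sketch is illustrative only.

The table footnote gives d = (M1 − M2)/s_pooled but does not state how the S.D.s were pooled. This sketch therefore assumes the root mean of the two variances, and it is an assumption that should be checked against the original analysis. Because the inputs are rounded, the recomputed values agree with Table 3 only to within about 0.01. For example, the corrected Narratives gain comes out near 0.06 rather than 0.07. The rank order of the corrected gains is unchanged.

The last part of the sketch enumerates the 3! possible rank orders. It confirms that one specific order has probability 1/6. Following the reasoning in the text, it then treats the US ordering as an independent replication.

```python
from itertools import permutations
from math import sqrt


def cohens_d(m1, s1, m2, s2):
    """Cohen's d = (M1 - M2) / s_pooled, with s_pooled = sqrt((s1^2 + s2^2) / 2) (assumed rule)."""
    return (m1 - m2) / sqrt((s1 ** 2 + s2 ** 2) / 2)


# (mean, S.D.) as printed in Table 3, per scale, in the order:
# grade 3 sample 1994, subsample 1994, grade 8 sample 1994, subsample 1999.
TABLE3 = {
    "Summary":    ((41.8, 13.1), (44.2, 12.3), (63.4, 11.6), (67.8, 9.5)),
    "Narratives": ((14.2, 5.3), (15.2, 5.0), (20.8, 5.1), (22.0, 4.2)),
    "Expository": ((11.7, 5.0), (12.6, 4.7), (16.7, 4.3), (18.2, 3.7)),
    "Documents":  ((15.8, 4.4), (16.4, 4.1), (24.8, 4.5), (27.5, 3.6)),
}

corrected = {}
for scale, (grade3, sub_1994, grade8, sub_1999) in TABLE3.items():
    d1 = cohens_d(*sub_1994, *grade3)   # selection bias of the re-tested subsample
    d2 = cohens_d(*sub_1999, *grade8)   # raw 1994 -> 1999 difference among 8th graders
    corrected[scale] = d2 - d1          # secular gain net of the selection bias
    print(f"{scale:<11} d1={d1:.2f}  d2={d2:.2f}  d2-d1={corrected[scale]:.2f}")

# Exact chance probability of one specific rank order across 3 subscales.
orderings = list(permutations((1, 2, 3)))                     # all 3! = 6 rank orders
p_one_sample = orderings.count((3, 2, 1)) / len(orderings)    # the perfect inverse order
p_two_samples = p_one_sample ** 2                             # US replication treated as independent
print(f"p (one sample) = {p_one_sample:.3f}, p (two samples) = {p_two_samples:.3f}")
# p (one sample) = 0.167, p (two samples) = 0.028
```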

4. Discussion
The discovery of IQ gains over time has enriched our knowledge of the construct of intelligence and re-emphasized its dynamic qualities. An especially important question is whether the changes in test scores over time occur on g, the general factor of intelligence. The present study indicates that the changes on three literacy subscales in Estonia do not appear to be on the g factor. In other words, the Flynn Effect appears not to be a Jensen Effect. However, the present study is very limited in scope, because the changes were observed over only a 5-year period and on only three reading subscales. Nonetheless, the observed changes are in line with previous studies showing that IQ test gains are not changes in g (Rushton, 1998; Must et al., submitted for publication; but see Colom et al., 2001).

Although reading research has a long history, only a few studies have examined changes over time and whether these changes are on the g factor. Some American researchers have claimed that reading skills in the USA are in decline (Dodge, 1991; Feinberg, 1995; Young, 1995). Others, however, suggest modest gains (Campbell, Hombo, & Mazzeo, 2000). A study in Sweden showed no change in reading skills over 21 years (Taube, 1993). These mixed results might be clarified if g-factor analyses were carried out. Clearly, research examining changes on other kinds of cognitive tests in addition to IQ tests might shed much light on both the Flynn Effect and the Jensen Effect.

Acknowledgements
Supported by Grant 2387 from the Estonian Scientific Foundation. We thank all students and teachers for their participation in this study.

References
Atash, N. (1994). Assessing the dimensionality of the IEA reading literacy data. In M. Binkley, K. Rust, & M. Winglee (Eds.), Methodological issues in comparative educational studies. The case of the IEA reading literacy study (pp. 75–103). Washington, DC: US Department of Education, National Center for Education Statistics.

Binkley, M., Rust, K., & Winglee, M. (1994). Methodological issues in comparative educational studies. The case of the IEA reading literacy study. Washington, DC: US Department of Education, Office of Educational Research and Improvement.

Campbell, J. R., Hombo, C. M., & Mazzeo, J. (2000). NAEP 1999 trends in academic progress: three decades of student performance. Washington, DC: US Department of Education, Office of Educational Research and Improvement.

Colom, R., Juan-Espinosa, M., & Garcia, L. F. (2001). The secular increase in test scores is a “Jensen effect.” Personality and Individual Differences, 30, 553–559.

Dodge, S. (1991). Average score on SAT Verbal section falls to all-time low; math score declines for first time in more than 10 years. Chronicle of Higher Education, 38(2), 45–48.

Elley, W. B. (1992). How in the world do students read? Hamburg: International Association for the Evaluation of Educational Achievement.

Feinberg, L. (1995). A new center for the SAT. College Board Review, 174, 8–13, 31–32.

Flynn, J. (1999). Searching for justice: the discovery of IQ gains over time. American Psychologist, 54, 5–20.

Jensen, A. R. (1998). The g factor. Westport, CT: Praeger.

Munck, I., & Lundberg, I. (1994). Multivariate analyses of data from population A. In W. B. Elley (Ed.), The IEA study of reading literacy. Oxford: Pergamon Press.

Murray, T. S., Kirsch, I. S., & Jenkins, L. B. (1998). Adult literacy in OECD countries. Washington, DC: US Department of Education, Office of Educational Research and Improvement.

Must, O. (1998). Literacy: the case of Estonia. Trames, 2, 40–65.

Must, O., Must, A., & Raudik, V. (submitted for publication). The secular rise in IQs: in Estonia the Flynn Effect is not a Jensen Effect. Intelligence.

Organisation for Economic Co-operation and Development. (1999). Measuring student knowledge and skills: a new framework for assessment. Paris: Organisation for Economic Co-operation and Development.

Rushton, J. P. (1998). The “Jensen Effect” and the “Spearman–Jensen Hypothesis” of black–white IQ differences. Intelligence, 26, 217–225.

Rushton, J. P. (1999). Secular gains in IQ not related to the g factor and inbreeding depression—unlike black–white differences: a reply to Flynn. Personality and Individual Differences, 26, 381–389.

Taube, K. (1993). Reading comprehension among Swedish students: a comparative analysis of IEA studies from 1970 and 1991. Scandinavian Journal of Educational Research, 37, 89–97.

Thorndike, R. L. (1973). Reading comprehension education in fifteen countries. Uppsala: Almqvist & Wiksell.

Wechsler, D. (1981). Wechsler adult intelligence scale—revised. San Antonio: The Psychological Corporation.

Young, J. W. (1995). “Recentering” the SAT score scale. College and University, 70(2), 60–62.