Economics Letters 123 (2014) 236–239
Disadvantages of linguistic origin—Evidence from immigrant literacy scores
Ingo E. Isphording ∗
Institute for the Study of Labor (IZA) Bonn, Germany
Highlights
• Estimation of gaps in literacy scores resulting from linguistic origin.
• Unique linguistic data on language differences.
• Cross-national design to control for origin- and destination-effects.
• Sizable disadvantages by linguistic origin, increasing with age at arrival.
• Moderate convergence of linguistically distant immigrants by time of residence.
Article info
Article history: Received 19 November 2013; received in revised form 12 February 2014; accepted 14 February 2014; available online 22 February 2014.
Abstract
This study quantifies the disadvantage in immigrants’ literacy skills that arises from the linguistic distance between their mother tongue and the host country language, combining individual cross-country data on literacy scores with unique information on the linguistic distance between languages.
© 2014 Elsevier B.V. All rights reserved.
JEL classification: F22; J15; J24; J31
Keywords: Linguistic distance; Literacy; Human capital; Immigrants
∗ Correspondence to: IZA, Schaumburg-Lippe-Str. 5-7, 53113 Bonn, Germany.
http://dx.doi.org/10.1016/j.econlet.2014.02.013

1. Introduction

The rise of information and communication technology and the associated increase in the demand for literacy and numeracy skills pose a particular challenge for immigrants from different linguistic backgrounds. Literacy in the destination language, defined as “the ability to understand and employ written information in daily activities, at home, at work and in the community” (OECD, 2000), is a productive trait highly valued in the labor market (Dougherty, 2003), and insufficient levels of literacy lead to significant hurdles for the economic integration of immigrants (Ferrer et al., 2006; Kahn, 2004). Yet despite the importance of literacy and language skills, the literature on their formation remains surprisingly scarce.

Non-native speaking immigrants face the economic decision of whether to acquire a host-country language. The linguistic literature indicates that the costs of language acquisition are associated with the linguistic background of an immigrant. A greater linguistic dissimilarity, or distance, between the mother tongue of an immigrant and the language of the destination country decreases the potential language transfer, i.e. the application of knowledge of the mother tongue when acquiring the destination country language. In economic terms, the linguistic distance reflects the degree to which home country language capital can be transferred to the destination country, analogous to the imperfect portability of education (Friedberg, 2000).

Linguistic differences are not straightforward to measure, and the linguistic literature mainly comprises qualitative or small-scale
quantitative studies. Van der Slik (2010) offers an overview and is a notable exception. A small number of studies have attempted to implement measures that condense linguistic differences into a one-dimensional summary statistic: Chiswick and Miller (1999) define such a measure using classroom assessments of American language students to explain self-reported language fluency; Lohmann (2011) uses grammatical features of languages to explain international trade flows; and Adsera and Pytlikova (2012) use language family relations to explain bilateral migration flows.

Against this background, this study aims to quantify the linguistic barriers in literacy skill formation. Data on literacy scores from the International Adult Literacy Survey (IALS) are combined with a unique measure of linguistic distance, used by Isphording and Otten (2013) to explain bilateral trade flows. It is based on differences in pronunciation between the mother tongue and the host country language. Drawn from linguistic research by the German Max Planck Institute of Evolutionary Anthropology, this measure offers a continuous and cardinally interpretable measurement of linguistic differences for any of the world’s languages. Regressing literacy scores on the linguistic distance yields estimates of score differentials with respect to an immigrant’s linguistic origin.

This data setup offers several advantages that underpin the core contributions of this study. First, the cross-national design of the IALS data allows simultaneously controlling for destination- and origin-country-specific characteristics, which have been omitted in previous studies using national datasets (Chiswick and Miller, 1999; Van der Slik, 2010; Isphording and Otten, 2013). Second, the use of objective literacy scores allows results for subjective measures of language skills to be quantified, avoiding the measurement error in such self-reported indicators.
Third, the combination of this dataset with the innovative measure of linguistic distance allows national results to be broadened to an international perspective. Finally, the study specifically addresses the influence of linguistic origin over time of residence and offers additional evidence for the so-called Critical Period hypothesis, which states that the effort necessary to acquire a language increases with the immigrant’s age at arrival.

2. Material and methods

To assess the magnitude of linguistic barriers in the language acquisition of immigrants, I combine data from two sources: the public use file of the International Adult Literacy Survey (IALS) and the Automatic Similarity Judgement Program (ASJP), a research program by the German Max Planck Institute of Evolutionary Anthropology that aims to explain the historical development and geographical diversity of languages (Brown et al., 2008).

The IALS offers a unique data source on adults’ literacy skills and socio-economic characteristics over the period from 1994 to 1998 (OECD, 2000). After deleting observations with missing information, the dataset covers 1521 immigrants from 70 sending countries in 9 host countries.1

1 Immigrants are defined as individuals not born in the surveyed country. Detailed information on the country of origin is available in Switzerland, the Netherlands, Sweden, Great Britain, Italy, Slovenia, Czech Republic, Finland and Hungary.

The dataset offers information on three dimensions of literacy: prose literacy (the knowledge needed to understand and use information in texts), document literacy (the skills to use information stored in documents such as forms, schedules, tables, etc.) and quantitative literacy (the skill to locate numbers found in printed materials and apply simple arithmetic operations). This direct measurement based on test booklets avoids
dealing with the substantial degrees of misreporting in commonly used self-reported measures of language skills (Charette and Meng, 1994; Dustmann and van Soest, 2001).2

The IALS data is augmented with a measure of the linguistic distance between the mother tongue and the host country language, using the information on the first language of an immigrant. The ASJP method of assessing language differences relies on measuring similarities in pronunciation by directly comparing word pairs with the same meaning across languages. Forty culturally independent words are transcribed in a phonetic script; e.g. the English word mountain is transcribed as maunt3n, while its Spanish counterpart is transcribed as monta5a, with each character in these transcriptions representing a common sound of human communication. For each pair of words with the same meaning, the Levenshtein distance is calculated, i.e. the minimum number of sounds that have to be changed, removed or added to transform the word of one language into the same word in a different language. Table 1 summarizes some computational examples. The average minimum distance across all 40 word pairs is normalized to take into account potential similarities by chance due to shared phonetic inventories, resulting in the final measure of linguistic dissimilarity (Brown et al., 2008).

The distances computed by the ASJP are in line with basic intuition on language differences. The smallest distances emerge within the same language family (Germanic languages for English and German, Romance languages for French and Slavic languages for Czech).
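The Levenshtein step described above can be sketched in a few lines of Python. This is only an illustration of the word-pair comparison using the transcriptions listed in Table 1; the function name `levenshtein` and the dictionary `TABLE_1` are labels chosen here, and the subsequent ASJP normalization across all 40 word pairs is not reproduced.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of symbols changed, removed or added to turn source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j symbols of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i symbols into an empty string takes i removals
        for j, t_char in enumerate(target, start=1):
            change = previous[j - 1] + (s_char != t_char)
            remove = previous[j] + 1
            add = current[j - 1] + 1
            current.append(min(change, remove, add))
        previous = current
    return previous[-1]


# Spanish and English transcriptions as listed in Table 1 (Brown et al., 2008).
TABLE_1 = {
    "You": ("tu", "yu"),
    "Not": ("no", "nat"),
    "Person": ("persona", "pers3n"),
    "Night": ("noCe", "nEit"),
    "Mountain": ("monta5a", "maunt3n"),
}

for word, (spanish, english) in TABLE_1.items():
    print(f"{word:<9}{spanish:<9}{english:<9}{levenshtein(spanish, english)}")
```

Each transcription symbol is treated as one sound, so the comparison is case-sensitive (the symbols E and e denote different sounds in the transcriptions above).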
The smallest non-zero linguistic distance in the present sample is that of Serbian-speaking immigrants in Slovenia, while the largest distance is encountered by Turkish immigrants in the Netherlands.3

To identify systematic disadvantages of linguistic origin in the literacy scores, literacy Y is estimated as a function of linguistic distance LD, years since migration YSM and an indicator for arrival at age 12 or older, AgeEntry12, separately for each of the three literacy dimensions:

\[
Y = \beta_0 + \beta_1 \mathit{LD} + \beta_2 \mathit{YSM} + \beta_3 \mathit{AgeEntry12} + \beta_4 (\mathit{LD} \times \mathit{AgeEntry12}) + \beta_5 (\mathit{LD} \times \mathit{YSM}) + X'\gamma + O'\delta + D'\lambda + \varepsilon. \tag{1}
\]
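As a reading aid, differentiating Eq. (1) with respect to LD shows how the two interaction terms shape the effect of linguistic distance. This is a sketch derived from Eq. (1) alone, taking AgeEntry12 to equal 1 for arrival at age 12 or older (as the row labels of Table 2 suggest); the symbol YSM* is introduced here for illustration and is not the author's notation.

```latex
\[
\frac{\partial Y}{\partial \mathit{LD}}
  = \beta_1 + \beta_4\,\mathit{AgeEntry12} + \beta_5\,\mathit{YSM}
\]
% Special cases:
%   early arrival at the time of migration (AgeEntry12 = 0, YSM = 0): slope = \beta_1
%   late arrival at the time of migration  (AgeEntry12 = 1, YSM = 0): slope = \beta_1 + \beta_4
% If \beta_5 > 0 and the initial slope is negative, the slope reaches zero after
\[
\mathit{YSM}^{*} = -\,\frac{\beta_1 + \beta_4\,\mathit{AgeEntry12}}{\beta_5}
\]
% years of residence.
```

A negative β4 therefore steepens the penalty for late arrivals, while a positive β5 flattens it with each additional year of residence.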
2 Specific answers to the test booklet do not indicate a literacy level with certainty. Because the number of questions is restricted, individuals with different levels of literacy might still produce the same set of answers. To account for this uncertainty, the IALS data provide 5 different plausible values of literacy scores for every individual. To take this procedure into account, I follow the established method of using the simple average of the 5 plausible values of test scores as the outcome variable. Standard errors are subsequently computed using the replicate weights offered by IALS. This method accounts for the unspecified intra-cluster correlation, yet ignores the stratification of the sampling. Brown and Micklewright (2004) show that this method might produce slightly overstated standard errors in some cases.
3 The complete matrix of linguistic distances can be found in the web appendix, Table 8.

The interaction term LD × YSM accounts for convergence in literacy scores between native and non-native speakers over time of residence. LD × AgeEntry12 accounts for an increase in the effect of linguistic origin with age at arrival, as indicated in the psychobiological literature and referred to as the Critical Period hypothesis (Newport, 2002). Accordingly, the coefficients of the main effects of years since migration and age at entry, β2 and β3, indicate the effects for the subpopulation of native-speaking immigrants with LD = 0. Control variables X consist of gender, individual and parental education, birth cohort and the geographic distance between the origin and destination countries. The international design of the IALS allows simultaneous control for origin- and destination-fixed effects (O and D), capturing potentially omitted country characteristics, e.g. differences in
language acquisition support or selective migration policies favoring skilled immigrants for the receiving country, and differences in media exposure to foreign languages or the quality of the education system for the sending country.4

4 The data setup does not allow controlling for unobserved heterogeneity at the level of bilateral origin–destination dyads. Robustness checks that include potentially confounding factors at the bilateral level (cultural differences, migrant stock), at the expense of the number of observations, indicate a very robust pattern with regard to the linguistic distance. The results are available in the web appendix, Table 4.

Table 1
Linguistic distance: computational examples.

Word        Spanish     English     Distance
You         tu          yu          1
Not         no          nat         2
Person      persona     pers3n      2
Night       noCe        nEit        3
Mountain    monta5a     maunt3n     5

Source: Brown et al. (2008).

Table 2
Literacy and linguistic origin.

                                          Prose        Document     Quantitative
Linguistic distance                       −0.328**     −0.128       −0.247*
                                          (0.09)       (0.10)       (0.11)
Ling. dist. × age at entry 12 or older    −0.446***    −0.518***    −0.413***
                                          (0.06)       (0.07)       (0.08)
Ling. dist. × years since migration       0.013***     0.008**      0.011***
                                          (0.00)       (0.00)       (0.00)
Age at entry 12 or older                  0.397        7.211        9.333*
                                          (4.15)       (3.68)       (3.79)
Years since migration                     −0.333       0.054        0.106
                                          (0.22)       (0.22)       (0.22)
Destination-fixed effects                 Yes          Yes          Yes
Origin-fixed effects                      Yes          Yes          Yes
R2                                        0.602        0.589        0.569
N                                         1521         1521         1521

Notes: Standard errors in parentheses, computed using replicate weights and the mean of plausible values to take the sampling structure into account. Education base category: ISCED1/No schooling. Reference birth cohort: born before 1940. Dependent variable: literacy test scores (range 0–500). Control variables on the individual level include gender, individual and parental education, birth cohort and geographic distance. Control variables on the bilateral origin–destination level include migration stock, cultural distance and geographic distance. Full estimates in the web appendix in Table 3.
* Significant at 5% level. ** Significant at 1% level. *** Significant at 0.1% level.

3. Results

The main results of the estimation of Eq. (1) are summarized in Table 2. Estimated separately for each dimension of literacy, the results confirm a strong negative influence of the linguistic background on immigrants’ literacy formation in the destination language. The main effect of linguistic distance captures the initial disadvantage (at YSM = 0) for young arrivals immigrating at the age of 11 or younger. It is significant only for prose and quantitative literacy and remains insignificant for document literacy. The negative effect of the linguistic distance becomes more pronounced for immigrants arriving at an age of 12 or older, as indicated by the significant coefficients of the interaction terms between age at entry 12 or older and the linguistic distance. This supports the Critical Period hypothesis in the linguistic literature: young children are able to acquire new languages almost effortlessly, while the linguistic background plays an increasingly important role as individuals approach adolescence.5

Regarding the relationship between years since migration and the linguistic distance, the results indicate a moderate convergence over time. The positive interaction of linguistic distance and years since migration shows that immigrants with a distant linguistic background face a steeper assimilation profile and are able to catch up over time. The main effects of years since migration and age at entry are small in levels and insignificant in most cases. This indicates a lack of change in literacy scores for native-speaking immigrants. Nor do native-speaking immigrants face a disadvantage from arriving at older ages, as they already speak the destination language prior to migration.6

5 Robustness checks show that the results are not sensitive to the choice of the actual threshold. The results are available in the web appendix in Table 6.
6 Estimations excluding native speakers are available in the web appendix in Table 5. The general pattern remains robust, although the coefficients of interest become more pronounced.

Fig. 1 illustrates the relationship between age at entry, the time of residence and the linguistic distance in terms of predicted means based on the results of Table 2. A similar pattern arises for all three dimensions of literacy in the upper panels (a), (b) and (c). Although the linguistic distance has only a small effect for childhood immigrants (the dark gray line), it distinctly reduces the test scores of late arrivals, as indicated by the much steeper negative slope of the light gray line. A more distant linguistic
background increases the assimilation rate, albeit only marginally (Fig. 1, panels (d)–(f)). The convergence does not compensate for the large initial disadvantage of linguistic origin.7

Fixing covariates at their sample means, the initial disadvantage of a linguistically distant immigrant (e.g. a Turk in the Netherlands, LD = 102.33) compared to a native-speaking immigrant amounts to 33.5 (13.1, 25.3) points on the prose (document, quantitative) scale. This increases to 79.2 (66.1, 67.5) points for immigrants who arrived at the age of 12 or later, and is comparable to the disadvantage of having no formal schooling or schooling of ISCED 1 (only primary schooling) compared to ISCED 5 (short-cycle tertiary education). Because convergence is only moderate, the disadvantage persists over a long period of time: the average disadvantage still amounts to 59.8 (53.9, 50.6) points after 15 years of residence.

4. Conclusion

Insufficient literacy skills in the destination language represent a significant hurdle for the integration and assimilation of immigrants into the labor market of destination countries. This study shows that immigrants’ proneness to insufficient levels of literacy can be explained to a large extent by the barriers that arise from a more or less distant linguistic background. Linguistic barriers lead to a significant disadvantage in literacy scores, which becomes more pronounced with a later age at arrival. Although linguistically distant immigrants seem to be able to catch up over time, this convergence is only moderate and does not offset the initial hurdle. The unique data setup of internationally comparable and objective measures of literacy, combined with a measure of linguistic distances between any of the world’s languages, makes it possible to quantify and generalize results for nationally assessed self-reported language proficiency while simultaneously controlling for unobserved heterogeneity on the origin and destination level.
7 Estimations including quadratic functions of the years since migration show a negligible decrease in the effect of additional exposure. For reasons of clarity, only the linear relationship is reported in the main specifications.
Fig. 1. Interaction effects: linguistic distance, age at entry and years since migration.
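The prose-scale magnitudes reported in Section 3 can be approximately retraced from the rounded point estimates in Table 2. The sketch below is illustrative only: the class `Coefficients` and the function `literacy_gap` are names chosen here, the calculation uses the published rounded prose coefficients, and it assumes that the reported 15-year figure refers to immigrants who arrived at age 12 or older.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    ld: float       # main effect of linguistic distance (beta_1)
    ld_late: float  # interaction with arrival at age 12 or older (beta_4)
    ld_ysm: float   # interaction with years since migration (beta_5)


# Rounded prose-scale point estimates from Table 2.
PROSE = Coefficients(ld=-0.328, ld_late=-0.446, ld_ysm=0.013)

# Linguistic distance quoted in the text for a Turk in the Netherlands.
LD_TURKISH_DUTCH = 102.33


def literacy_gap(coef: Coefficients, distance: float,
                 late_arrival: bool, years: float = 0.0) -> float:
    """Score disadvantage relative to a native-speaking immigrant (LD = 0)."""
    slope = coef.ld + coef.ld_late * late_arrival + coef.ld_ysm * years
    return -slope * distance


early = literacy_gap(PROSE, LD_TURKISH_DUTCH, late_arrival=False)
late = literacy_gap(PROSE, LD_TURKISH_DUTCH, late_arrival=True)
late_15 = literacy_gap(PROSE, LD_TURKISH_DUTCH, late_arrival=True, years=15)

print(f"early arrival, at migration: {early:.1f}")
print(f"late arrival, at migration:  {late:.1f}")
print(f"late arrival, after 15 yrs:  {late_15:.1f}")
```

With rounded coefficients the results land within about one point of the reported 33.5, 79.2 and 59.8; the remaining differences presumably stem from rounding in the published table.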
The results uncover an important source of typically unobserved differences in heterogeneous immigrant populations and shed light on linguistic barriers as a factor for imperfect human capital portability. Although the sample at hand does not allow for a direct assessment of the labor market effects of the linguistic origin, comparisons with the literature on literacy and immigrant earnings (Ferrer et al., 2006; Kahn, 2004) indicate that the estimated disadvantages are likely to lead to increased hurdles in the labor market assimilation.

Acknowledgments

The author is grateful to Sebastian Otten, Marcos A. Rangel and the participants of the Symposium on Migration and Language at Princeton University, the participants of the 10th IZA Migration Week in Jerusalem, and the members of the Chair of Competition Policy, Bochum, for their helpful comments and suggestions.

Appendix. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.econlet.2014.02.013.

References

Adsera, A., Pytlikova, M., 2012. The role of language in shaping international migration. In: CReAM Discussion Paper Series, vol. 1206. Centre for Research and Analysis of Migration (CReAM), Department of Economics, University College London. URL: http://ideas.repec.org/p/crm/wpaper/1206.html.
Brown, C.H., Holman, E.W., Wichmann, S., Velupillai, V., 2008. Automated classification of the World’s languages: a description of the method and preliminary results. In: STUF-Language Typology and Universals, vol. 61. pp. 285–308.
Brown, G., Micklewright, J., 2004. Using International Surveys of Achievement and Literacy: a View from the Outside. UNESCO Institute for Statistics.
Charette, M., Meng, R., 1994. Explaining language proficiency: objective versus self-assessed measures of literacy. Econom. Lett. 44, 313–321. URL: http://ideas.repec.org/a/eee/ecolet/v44y1994i3p313-321.html.
Chiswick, B.R., Miller, P.W., 1999. English language fluency among immigrants in the United States. In: Polachek, S.W. (Ed.), Research in Labor Economics, vol. 17. JAI Press, Oxford, pp. 151–200.
Dougherty, C., 2003. Numeracy, literacy and earnings: evidence from the national longitudinal survey of youth. Econ. Educ. Rev. 22, 511–521. URL: http://ideas.repec.org/a/eee/ecoedu/v22y2003i5p511-521.html.
Dustmann, C., van Soest, A., 2001. Language fluency and earnings: estimation with misclassified language indicators. Rev. Econ. Stat. 83, 663–674.
Ferrer, A., Green, D.A., Riddell, W.C., 2006. The effect of literacy on immigrant earnings. J. Hum. Resour. 41. URL: http://ideas.repec.org/a/uwp/jhriss/v41y2006i2p380-410.html.
Friedberg, R.M., 2000. You can’t take it with you? Immigrant assimilation and the portability of human capital. J. Labor Econom. 18, 221–251. URL: http://ideas.repec.org/a/ucp/jlabec/v18y2000i2p221-51.html.
Isphording, I.E., Otten, S., 2013. The costs of Babylon—linguistic distance in applied economics. Rev. Int. Econ. 21 (2), 354–369. URL: http://ideas.repec.org/p/rwi/repape/0337.html.
Kahn, L.M., 2004. Immigration, skills and the labor market: international evidence. J. Popul. Econ. 17, 501–534. URL: http://ideas.repec.org/a/spr/jopoec/v17y2004i3p501-534.html.
Lohmann, J., 2011. Do language barriers affect trade? Econom. Lett. 110, 159–162. http://dx.doi.org/10.1016/j.econlet.2010.10.023.
Newport, E.L., 2002. Critical periods in language development. In: Encyclopedia of Cognitive Science, Macmillan Publishers Ltd., Nature Publishing Group.
OECD, 2000. Literacy in the Information Age. Final Report of the International Adult Literacy Survey. Technical Report.
Van der Slik, F.W.P., 2010. Acquisition of Dutch as a second language. In: Studies in Second Language Acquisition, vol. 32. pp. 401–432.