Labour Economics 6 (1999) 471–489 www.elsevier.nl/locate/econbase
Estimates of the return to schooling in Sweden from a large sample of twins
Gunnar Isacsson *
School of Transportation and Society, Dalarna University, S-781 88 Borlänge, Sweden
Abstract

A large sample of twins was used to examine whether conventional estimates of the return to schooling in Sweden are biased because ability is omitted from the earnings–schooling relationship. Ignoring measurement error, the results indicate that omitting ability from the earnings–schooling relationship leads to estimates that are positively biased. However, reasonable estimates of the measurement-error-adjusted returns are both above and below the unadjusted estimates, showing that the results depend crucially on a parameter not known at this time. An estimate of the reliability ratio was nevertheless obtained using two measures of educational attainment. With this estimate of the reliability ratio, the measurement-error-adjusted estimate of the return to schooling in the sample of identical twins indicates that there is at most a slight ability bias in the conventional estimates of the return to schooling. The fundamental assumption of this kind of study is that within-pair differences in educational attainment are randomly determined. This assumption was also tested, but no strong evidence to reject it was found. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Schooling; Reliability ratio; Twins
1. Introduction

The primary purpose of this paper is to ascertain whether conventional estimates of the return to schooling in Sweden are biased because ability was omitted from the earnings–schooling relationship. A large sample of twins is used for this study. A fundamental assumption when using data on twins to estimate the return

* Tel.: +46-23-77-85-43; fax: +46-23-77-85-01; E-mail: [email protected]
0927-5371/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0927-5371(98)00014-1
to schooling is that within-pair differences in years of schooling are randomly determined. A secondary purpose of the present paper is, therefore, to check whether this assumption is reasonable. One problem of previous twin studies is that the samples that have usually been employed are rather small and quite special (cf. Griliches, 1979, who calls samples of siblings "opportunity samples"). Hence, it has been difficult to know whether the results from these studies are transferable to the population at large or, for that matter, to the relevant population of twins. This is not a problem for the sample used in the present study, since it is taken from the entire population of twins born in Sweden between 1926 and 1958. Furthermore, it comprises a large proportion of the relevant population. In all, data on 2492 pairs of identical or MZ (that is, monozygotic: from the same egg) and 3368 like-sexed pairs of fraternal or DZ (that is, dizygotic: from two eggs) twins are used in this paper. MZ twins have the same DNA and are thus genetically identical, while DZ twins are as alike genetically as other siblings and need not even be of the same sex. To my knowledge, this is the largest sample of twins ever used in this kind of study. As pointed out by Griliches (1977, 1979), a problem relevant to any twin study is that of measurement error in the explanatory variable; that is, the years of schooling variable. This is due to the fact that when schooling is measured with error, the estimators of the return to schooling that use the within-pair variation in earnings and schooling suffer from a larger inconsistency than do those estimators that use the across-individual variation in earnings and schooling.
Thus, an estimate of the return to schooling that is higher when using the across-individual variation in earnings and schooling than that obtained when using the within-pair variation in earnings and schooling, may not be due to an omitted ability variable, but might just as well be explained by measurement error in the years of schooling variable. Hence, estimators that correct for measurement error in the explanatory variables are normally used in twin (or sibling) studies (see Ashenfelter and Krueger, 1994 and Ashenfelter and Zimmerman, 1997). The paper is organized as follows. The data set is presented in Section 2. The reliability of the years of schooling variable is estimated in Section 3. Econometric models are presented in Section 4 and the econometric results in Section 5. In Section 6, tests of the assumption that any within-pair difference in educational attainment is due to random forces are presented. Section 7 concludes the paper.

2. The data

The data on twins are taken from The Swedish Twin Registry, 1 which is a large sample of twins born in Sweden between the years 1886 and 1967. It was

1 For a presentation of this registry, see Cederlöf and Lorich (1978).
compiled for the purpose of conducting epidemiological studies. I use the subset of the registry containing like-sexed pairs of MZ and DZ twins born between 1926 and 1958. 2 The 1991 wave of the Swedish Level of Living Survey 3 (SLLS) is used to compare the sample of twins with the population at large. The SLLS is a representative sample of the Swedish population. The data in the twin registry were primarily collected from mailed questionnaires, although some information was obtained from registers. Data on earnings in 1987, 1990 and 1993 were collected from register information ("ÅRSYS") provided by employers and held by Statistics Sweden. The classification of zygosity was made on the basis of the answers to a series of questions in a mailed questionnaire. Information on years of schooling for each individual was imputed from data on educational level and type of education. This information was obtained from a register ("Befolkningens utbildning"), which contains information from administrative records on educational qualifications completed within the regular education system. There are approximately 45 different educational categories in terms of educational level and type of education. The information on education pertains primarily to the year 1990. When data on education in 1990 were missing, the corresponding information pertaining to 1993 was taken. If information on education in 1990, as well as in 1993, was missing, the information pertaining to 1987 was taken. The imputations of years of schooling were based on a model estimated on the SLLS data where information on self-reported years of schooling is available. In addition, the SLLS contains self-reported information on educational level and type of education. In other words, the strategy for getting information on years of schooling for the individuals in the twin samples consists of using the SLLS and estimating the model:

$$S_i = \pi E_i + \eta_i, \quad i = 1, 2, \ldots, N$$

on that data set, where $S_i$ is self-reported years of schooling of individual $i$, $E_i$ is a vector of dummy variables that measure the educational level and type of education and $\eta_i$ is an i.i.d. random disturbance assumed to be uncorrelated with $E_i$. If the equation does not contain an intercept term, the jth element of the estimated vector, $\hat{\pi}$, measures the average years of schooling of the individuals in the SLLS observed in category j.

2 The reason for excluding individuals born after 1958 is that they did not receive the questionnaire from which some of the relevant information was collected, in particular the information needed to determine zygosity. The questionnaire from which information on zygosity was obtained was sent to all like-sexed twins born in Sweden between 1926 and 1958, where both of the twins were alive in 1970. Information on zygosity was obtained for 77% of this population.
3 For a presentation of this data set, see Erikson and Åberg (1987).

2.1. Defining the data sets 4
To smooth out any possible transitory component in the error term of the earnings equations, I use the average of the logarithms of earnings in the years 1987, 1990 and 1993 and of the explanatory variables that are used in the econometric analysis (see Sections 4 and 5 below). This makes the data less noisy and hence improves the precision of the estimates. Furthermore, since the information on earnings is given in terms of annual earnings rather than the hourly wage rate, pairs where one or both of the individuals have annual earnings of less than 60,000 Skr in any of the years have been excluded. 5 This reduces problems in the analysis caused by labor supply decisions. I have also excluded twins who were not reared together so as to reduce the effect of non-shared environmental factors on the analysis. I have considered twins as not having been reared together if at least one of them claims that they were separated at an age of less than 15 years.

2.2. Descriptive statistics, representativeness and comparisons with other twin data sets

In Table 1, descriptive statistics for the sample of MZ and DZ twins of the main analysis are presented along with the corresponding figures for the sample of the population at large (the SLLS). It can be seen from this table that the twins are quite similar to the population at large with respect to the means of all variables included in the table. A t-test at a 5% significance level reveals, however, that the means of the following variables are significantly different in the sample of MZ twins compared to the population at large: predicted years of schooling, age and the number of siblings. Similarly, the means of the following variables are significantly different between the samples of DZ twins and of the population at large: male, age and the number of siblings. It is not surprising that twins have more siblings.
The difference between ordinary siblings and twins in the number of siblings (in this case 0.4 to 0.6) is sometimes taken as a measure of the effect of an unexpected birth on family size. See Bronars and Grogger (1994) for more details. Table 2 shows that in terms of within-pair correlations in years of schooling and information on earnings as well as the fraction of twins reported to have the

4 The main results of the paper are not sensitive to these definitions of the data sets.
5 In 1990, US$1 was approximately worth 6 Skr.
Table 1
Descriptive statistics, means and standard deviations

                               MZ twins            DZ twins            SLLS
Earnings (100s Skr) a          1689.50 (647.48)    1673.04 (653.08)    1664.81 (760.47)
Predicted years of schooling   11.54 (3.06)        11.36 (3.03)        11.30 (3.13)
Years of schooling             –                   –                   11.27 (3.51)
Male                           0.52                0.58                0.54
Age                            44.10 (7.36)        43.86 (7.32)        46.82 (8.62)
Big city b                     0.39                0.36                0.36
Married a                      0.66                0.66                0.67
Siblings c                     2.73 d (1.69)       2.92 e (1.74)       2.31 f (1.99)
Sample size                    4984                6736                2214

a Annual earnings in the twins data sets are not completely comparable to annual earnings in the SLLS. However, the differences between the definitions of the earnings variables are likely to be small.
b Big city refers to individuals living in either Stockholm, Göteborg-Bohuslän or Malmö Counties, i.e., counties containing one of Sweden's three largest cities.
c The maximum number of siblings is restricted to 10 both in the twin samples and the SLLS.
d The average number of siblings is based on 4692 individuals.
e The average number of siblings is based on 6226 individuals.
f The average number of siblings is based on 2213 individuals.
same years of schooling, the present sample of Swedish twins is quite similar to the samples used by Ashenfelter and Krueger (1994), Miller et al. (1995) and Ashenfelter and Rouse (1997). Table 2 also shows that the "similarity" of MZ
Table 2
Comparisons of the Swedish twin data set to other twin data sets
S refers to predicted years of schooling in the samples of Swedish twins, to an imputed value for years of schooling in Miller et al. (1995) and to self-reported own years of schooling for the samples used by Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1997). Y refers to the average of the logarithm of earnings in three different years in this study; to the logarithm of the hourly wage rate in Ashenfelter and Krueger and Ashenfelter and Rouse and to the annual income in Miller et al.

                                 Swedish twins   Ashenfelter and Krueger   Miller et al.   Ashenfelter and Rouse
MZ twins
Fraction of twins with same S    0.48            0.49                      0.56            –
Within-pair correlation in Y     0.68            0.56                      0.68            0.66
Within-pair correlation in S     0.76            0.66                      0.70            0.75
Number of pairs                  2492            149                       602             335

DZ twins
Fraction of twins with same S    0.30            0.43                      0.38
Within-pair correlation in Y     0.46            0.36                      0.32
Within-pair correlation in S     0.55            0.54                      0.41
Number of pairs                  3368            46                        568
twins in relation to earnings and schooling is greater than for DZ twins. This might of course be expected since MZ twins are genetically identical and may also be treated more similarly within the family than DZ twins, who are no more alike, genetically, than two ordinary siblings.
3. A reliability estimate of the years of schooling variable

A crucial question in any sibling study of the return to schooling is measurement error in the years of schooling variable. Consequently, a discussion of the reliability ratio of the register-based years of schooling variable presented in Section 2 follows. 6 To estimate this ratio, I use a second measure of educational attainment collected from a questionnaire mailed to the twins in 1972. Answers to these questionnaires were received up to 1974. 7 This survey-collected information was also recorded as educational level and type of education. However, it is difficult to achieve two separate imputations for the two lowest educational levels from the survey. 8 A common imputation for these two levels has therefore been made; that is, the imputation model presented in Section 2 was reestimated with a common category for the two shortest educational levels and used to impute years of schooling from the survey information. Thus, the resulting imputations are somewhat cruder in the lower tail of the educational distribution (7–9 years) when the survey information on educational attainment was used rather than the register information. In the following, I denote the two imputed years of schooling variables $S^{74}$ and $S^{90}$, respectively. The first variable refers to imputed years of schooling from the survey information and the second variable refers to the imputed years of schooling variable using the register information. Furthermore, it should be noted that the two measures were collected approximately 16–17 years apart. This implies: (1) that it is likely that some individuals in the sample have truly changed their educational attainment between the two points in time and (2) that there might have been some changes in the definitions
6 The reliability ratio is the ratio of the variance in true but unobserved years of schooling to the variance in observed years of schooling.
7 The survey information on educational attainment contains several alternative educational levels and types of education whenever the respondent has an educational level above compulsory school. I have selected the highest attained educational level. In the cases where there has been more than one alternative for type of education on the highest educational level, I have selected the type of education with the highest coding number.
8 The reason for this is a major change in the Swedish educational system (Grundskolereformen) and how educational attainment according to the survey measure is recorded.
of the classification scheme between the two points in time to which each of the two measures refers. The reliability ratio of the register information is estimated by using the survey information as follows. First, I estimate the following equation separately for MZ and DZ twins:

$$\Delta S_j^{74} = \delta \, \Delta S_j^{90} + \Delta e_j$$

where $\Delta S_j^{74}$ is the within-pair difference in years of schooling for pair $j$ according to the survey information and $\Delta S_j^{90}$ is the corresponding within-pair difference according to the register information. As Griliches (1979) points out, when there is measurement error in $S^{90}$ the probability limit for the OLS estimator of this type of equation is given by:

$$\operatorname{plim} \hat{\delta}^{OLS} = \delta \left( 1 - \frac{\sigma_e^2}{\sigma_S^2 \left( 1 - \rho_{S^{90}} \right)} \right)$$
where $\delta$ is the true parameter, $1 - \sigma_e^2 / \sigma_S^2$ is the reliability ratio of $S^{90}$ and $\rho_{S^{90}}$ is the within-pair correlation in years of schooling according to $S^{90}$. If it is assumed that: (1) a classical measurement error applies to $S^{90}$, (2) the reliability ratio is the same for MZ and DZ twins and (3) $\delta$ is the same for MZ and DZ twins, the separate MZ and DZ estimates of $\delta$ and $\rho_{S^{90}}$ and the expression for the probability limit can be used to solve for the reliability ratio. 9 The sizes of the samples of MZ twins and DZ twins used here are 2373 and 3226, respectively. This approach gives an estimate of the reliability ratio of approximately 0.88. 10 A weak indication of whether this estimate is reasonable is given by the results from an evaluation of the information on educational levels that Statistics Sweden has made of the register information (Statistics Sweden, 1997). My own computations on the background material of this evaluation suggest that a lower bound for the fraction of correctly classified individuals with non-missing values on educational level is around 0.85. 11

9 The reason for not using a fixed effects by instrumental variables estimator (FE by IV), as Ashenfelter and Krueger (1994) use, is that the FE by IV estimation results are very different for different age cohorts in the sample of MZ twins. The FE by IV estimates of the return to schooling are, for example, well below the conventional estimates of the return to schooling for older individuals in the sample whereas they are above the conventional estimates for younger individuals in the sample. This is likely due to a considerable variation in the correlation between the within-pair differences in the survey-based years of schooling variable and the within-pair differences in the logarithm of annual earnings. The strategy to estimate the reliability ratio that I use does not rely on this correlation and the results obtained when applying this strategy are therefore more robust, in the sense of producing similar estimates of the reliability ratio, when applied to the different age cohorts in the sample.
10 The estimate of $\delta$ when regressing $\Delta S_j^{74}$ on $\Delta S_j^{90}$ is 0.296 in the sample of MZ twins. The corresponding estimated coefficient in the sample of DZ twins is 0.424. Even though these coefficients vary considerably between different age cohorts in the sample, the estimates of the reliability ratio in the different age cohorts are quite stable.
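The step of solving the two probability limits for the reliability ratio can be sketched numerically. The Python below is only an illustration under stated assumptions: it plugs in the coefficients 0.296 (MZ) and 0.424 (DZ) from footnote 10 and, as a stand-in, the within-pair schooling correlations 0.76 and 0.55 from Table 2, which refer to the full samples rather than the slightly smaller samples used in this section. The function name `solve_reliability` and the variable `psi` (one minus the reliability ratio) are hypothetical names introduced here.

```python
def solve_reliability(d_mz, d_dz, rho_mz, rho_dz):
    """Solve d_g = delta * (1 - psi / (1 - rho_g)) for g in {MZ, DZ}.

    psi is one minus the reliability ratio; delta and psi are assumed
    to be common to MZ and DZ twins, as in assumptions (2) and (3).
    """
    k = d_mz / d_dz
    # Taking the ratio of the two equations removes delta and leaves an
    # equation that is linear in psi:
    #   k * (1 - psi / (1 - rho_dz)) = 1 - psi / (1 - rho_mz)
    psi = (1.0 - k) / (1.0 / (1.0 - rho_mz) - k / (1.0 - rho_dz))
    delta = d_mz / (1.0 - psi / (1.0 - rho_mz))
    return 1.0 - psi, delta


# Inputs: footnote 10 coefficients and Table 2 correlations (an assumption).
reliability, delta = solve_reliability(0.296, 0.424, 0.76, 0.55)
print(round(reliability, 2), round(delta, 2))
```

With these inputs the implied reliability ratio comes out close to the value of approximately 0.88 reported in the text, which suggests the reading of the formula is at least consistent with the reported numbers.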
4. Econometric models

Suppose, as in Ashenfelter and Zimmerman (1997), that the true relationships between earnings and schooling are:

$$y_{1j} = A_{1j} + \beta S_{1j} + \varepsilon_{1j}, \tag{1a}$$

$$y_{2j} = A_{2j} + \beta S_{2j} + \varepsilon_{2j}, \tag{1b}$$

where $y_{ij}$ is the logarithm of annual earnings, $S_{ij}$ is years of schooling and $A_{ij}$ is the ability of twin $i$ $(i = 1, 2)$ in pair $j$ $(j = 1, 2, \ldots, N)$. If the covariance between $A$ and $S$ is different from zero and $A_{ij}$ has been omitted from Eqs. (1a) and (1b), an estimator of the return to schooling, $\hat{\beta}$, will be biased and inconsistent. Assume also, as in Ashenfelter and Zimmerman (1997), that the relationship between ability and schooling is given by:

$$A_j = \lambda S_{1j} + \lambda S_{2j} + \mu_j, \tag{2}$$

where $A_j$ is the common ability of twins 1 and 2 in pair $j$ and $\mu_j$ is a pair-specific random component that is assumed to be uncorrelated with the length of schooling of the two twins. The magnitude of $\lambda$ measures the strength of the correlation between ability and schooling. Inserting Eq. (2) into Eqs. (1a) and (1b) gives the following earnings–schooling relationships for twin 1 and twin 2 in pair $j$:

$$y_{1j} = (\beta + \lambda) S_{1j} + \lambda S_{2j} + \mu_j + \varepsilon_{1j}, \tag{3a}$$

$$y_{2j} = \lambda S_{1j} + (\beta + \lambda) S_{2j} + \mu_j + \varepsilon_{2j}. \tag{3b}$$
As pointed out by Ashenfelter and Zimmerman (1997), one problem of comparing OLS estimates of the return to schooling in Eqs. (1a) and (1b) (omitting ability from the equations) and OLS estimates of the return to schooling in Eqs. (3a) and (3b) is due to measurement error in the years of schooling variable. In the econometric analysis, I therefore use the procedure outlined by Ashenfelter and Zimmerman (1997) to adjust the estimates for known magnitudes of the so-called

11 The evaluation was made by first creating a "true" register from other data sources on educational attainment for a sample of individuals from the register that is used here. The information in these two registers was subsequently compared (Statistics Sweden, 1997). However, since it is likely that the "true" register is not completely true, I interpret the resulting estimate of the fraction of correctly classified individuals as a lower bound for the corresponding true fraction.
reliability ratio of the years of schooling variable. That is, the measurement error adjustment is made by estimating the following equations:

$$y_{1j} = [(\beta + \lambda) a + \lambda b] S_{1j} + [\lambda a + (\beta + \lambda) c] S_{2j} + \mu_j + \varepsilon_{1j}, \tag{3a$'$}$$

$$y_{2j} = [\lambda a + (\beta + \lambda) b] S_{1j} + [(\beta + \lambda) a + \lambda c] S_{2j} + \mu_j + \varepsilon_{2j}, \tag{3b$'$}$$

where

$$a = 1 - \frac{\psi}{1 - \rho^2}, \quad b = \frac{\operatorname{cov}(S_{1j}, S_{2j})}{\operatorname{var}(S_{1j})} \cdot \frac{\psi}{1 - \rho^2} \quad \text{and} \quad c = \frac{\operatorname{cov}(S_{1j}, S_{2j})}{\operatorname{var}(S_{2j})} \cdot \frac{\psi}{1 - \rho^2}$$

and where, in turn, $\rho$ is the within-pair correlation in years of schooling, $\psi$ is the so-called noise-to-signal ratio, 12 $\operatorname{cov}(S_{1j}, S_{2j})$ is the within-pair covariance in years of schooling, and $\operatorname{var}(S_{1j})$ and $\operatorname{var}(S_{2j})$ are the variances of years of schooling for those individuals coded as being number one and number two within each pair, respectively. The noise-to-signal ratio is set exogenously and the sample counterparts to $\rho$, $\operatorname{cov}(S_{1j}, S_{2j})$, $\operatorname{var}(S_{1j})$ and $\operatorname{var}(S_{2j})$ are used when estimating Eqs. (3a') and (3b'). The two equations have been estimated jointly with a SUR estimator (Zellner, 1962).
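To make the adjustment terms concrete, the following Python sketch computes a, b and c for a given noise-to-signal ratio (called `psi` in the code) and then maps a pair of unadjusted estimates into adjusted ones. This is a back-of-the-envelope illustration, not the SUR procedure used in the paper: it assumes equal schooling variances for the two twins (so that b equals c), ignores the control variables, and uses the hypothetical helper names `adjustment_terms` and `adjust_symmetric`. The inputs are the MZ within-pair schooling correlation of 0.76 from Table 2 and the unadjusted MZ estimates of 0.022 and 0.014 from Table 4.

```python
def adjustment_terms(psi, cov12, var1, var2):
    """Return (a, b, c) as defined below Eqs. (3a') and (3b')."""
    rho = cov12 / (var1 * var2) ** 0.5
    scale = psi / (1.0 - rho ** 2)
    a = 1.0 - scale
    b = (cov12 / var1) * scale
    c = (cov12 / var2) * scale
    return a, b, c


def adjust_symmetric(beta_u, lam_u, a, b):
    """Map unadjusted (beta, lambda) to adjusted values, assuming b == c.

    Unadjusted own and cross coefficients are (beta_u + lam_u) and lam_u.
    Under (3a') they equal (beta + lam) * a + lam * b and
    lam * a + (beta + lam) * b, so their difference is beta * (a - b)
    and their sum is (beta + 2 * lam) * (a + b).
    """
    beta = beta_u / (a - b)
    lam = ((beta_u + 2.0 * lam_u) / (a + b) - beta) / 2.0
    return beta, lam


# Hypothetical moments: unit variances, so cov12 equals the correlation 0.76.
# psi = 0.15 corresponds to a reliability ratio of 0.85.
a, b, c = adjustment_terms(psi=0.15, cov12=0.76, var1=1.0, var2=1.0)
beta_adj, lam_adj = adjust_symmetric(0.022, 0.014, a, b)
print(round(beta_adj, 3), round(lam_adj, 3))
```

Under these simplifications the adjusted values land near the MZ entries that Table 4 reports for a reliability ratio of 0.85 (0.060 and -0.002); small gaps are to be expected because the sketch leaves out the controls and the joint estimation.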
5. Results

The estimates obtained with the OLS estimator applied to Eqs. (1a) and (1b), including a set of control variables for the sample of the population at large (the SLLS), are presented in column 1 of Table 3. The estimate of the return to schooling is 4.5% in the population at large. Furthermore, columns 2 and 3 of the table show that all of the estimated coefficients in the twin samples resemble those obtained in the population at large. In particular, the estimate of the return to schooling is seen to be 4.6% in the sample of MZ twins and 4.7% in the sample of DZ twins. The estimation results of Eqs. (3a') and (3b'), that is, the equations that use the within-pair variation in earnings and schooling, are presented in Table 4. The results in column 1 of the table, which contains the results obtained for MZ twins unadjusted for measurement error, suggest that the conventional estimate of the return to schooling might suffer from a considerable upward bias due to omission of ability from the earnings–schooling relationship. The corresponding results obtained for the DZ twins, found in column 2 of the table, also suggest that the conventional estimates of the return to schooling might suffer from an upward bias when omitting ability from the earnings–schooling relationship. However, the

12 This is one minus the reliability ratio.
Table 3
Simple cross-section OLS estimates
Predicted years of schooling around 1990 is used in all of the samples. Standard errors in parentheses.

                    OLS-SLLS          OLS-MZ            OLS-DZ
Intercept           5.147 (0.176)     4.980 (0.119)     5.182 (0.106)
Schooling           0.045 (0.002)     0.046 (0.001)     0.047 (0.001)
Age                 0.061 (0.007)     0.068 (0.005)     0.058 (0.005)
Age-squared/100     -0.062 (0.008)    -0.070 (0.006)    -0.059 (0.005)
Male                0.321 (0.012)     0.333 (0.007)     0.339 (0.006)
Married             0.024 (0.014)     0.011 (0.008)     0.019 (0.007)
Big city            0.071 (0.012)     0.090 (0.007)     0.079 (0.006)
R2 a                0.398             0.447             0.440
Individuals         2214              4984              6736
within-pair estimate of the return to schooling obtained for the DZ twins is not as low as that obtained for the MZ twins. In columns 3 and 4 of Table 4, the estimates have been adjusted for a reliability ratio of 0.95. Here, the results obtained for the MZ twins still suggest an upward bias in the conventional estimates of the return to schooling. However, the corresponding estimate obtained for the DZ twins indicates that the conventional estimate of the return to schooling is only slightly biased upwards. The results presented in column 5, where the estimates for the MZ twins have been adjusted for a reliability ratio of 0.90, also indicate that the conventional estimates are biased upwards. However, the corresponding results obtained for the DZ twins in column 6 indicate that the conventional estimates are below the true return to schooling. Columns 7 and 8 display the estimates obtained when using the estimated reliability ratio of Section 3; that is, a reliability ratio of around 0.88. The estimate of the return to schooling in the sample of MZ twins is 0.042, which indicates that the conventional estimate is only slightly biased upwards. The corresponding estimate obtained for the DZ twins is 0.053, which is somewhat above the conventional OLS estimate for the DZ twins. Finally, from columns 9 and 10, it is apparent that the measurement-error-adjusted within-pair estimates for the MZ and DZ twins coincide with each other at a return of 6%. This is the case for a reliability ratio of around 0.85. In summary, these results demonstrate that the conclusion about a potential ability bias in conventional estimates of the return to schooling depends crucially on the magnitude of the reliability ratio. Furthermore, at the estimated reliability ratio (see Section 3) for the years of schooling variable used here, the measurement-error-adjusted estimate of the return to schooling in the sample of MZ twins indicates that conventional estimates of the return to schooling are only slightly biased upwards.
Table 4
Within-pair estimates of the return to schooling ($\beta$) and the "selection effect" ($\lambda$) between ability and schooling
Standard errors in parentheses. Regressions also include an intercept term and control variables for age, age-squared, sex, marital status, and whether the individual lived in one of the counties that contain one of Sweden's three largest cities.

Reliability ratio   1                   0.95                0.90                0.88                0.85
                    MZ        DZ        MZ        DZ        MZ        DZ        MZ        DZ        MZ        DZ
$\beta$             0.022     0.039     0.027     0.044     0.037     0.050     0.042     0.053     0.060     0.060
                    (0.002)   (0.002)   (0.003)   (0.002)   (0.004)   (0.002)   (0.005)   (0.003)   (0.007)   (0.003)
$\lambda$           0.014     0.005     0.012     0.003     0.008     0.001     0.006     0.000     -0.002    -0.002
                    (0.001)   (0.001)   (0.002)   (0.001)   (0.002)   (0.001)   (0.002)   (0.001)   (0.003)   (0.002)
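As a rough consistency check between Tables 3 and 4, one can ask what cross-section OLS coefficient the unadjusted within-pair estimates would imply. Under Eqs. (3a) and (3b), and assuming equal schooling variances for the two twins, no measurement error and no control variables (all simplifications made here, not claims from the paper), a regression of earnings on own schooling alone converges to the return to schooling plus the selection effect times one plus the within-pair schooling correlation. The Python below evaluates that expression using the unadjusted estimates in the first two columns of Table 4 and the correlations in Table 2; the function name `implied_ols` is hypothetical.

```python
def implied_ols(beta, lam, rho):
    """plim of the OLS slope of y on own schooling implied by Eq. (3a).

    cov(y1, S1) / var(S1) = (beta + lam) + lam * cov(S1, S2) / var(S1),
    which equals beta + lam * (1 + rho) when var(S1) == var(S2).
    """
    return beta + lam * (1.0 + rho)


# Unadjusted estimates (reliability ratio 1) and Table 2 correlations.
mz = implied_ols(beta=0.022, lam=0.014, rho=0.76)
dz = implied_ols(beta=0.039, lam=0.005, rho=0.55)
print(round(mz, 3), round(dz, 3))
```

Both implied values fall close to the OLS schooling coefficients in Table 3 (0.046 for MZ twins and 0.047 for DZ twins), which is what one would expect if the within-pair and cross-section estimates come from the same underlying model.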
6. A test of the assumption of equal within-pair abilities

Family background is a major source of the total variation in schooling. This can be seen from a simple comparison of the within-pair and across-individual variances in schooling. In the present data set of MZ twins, the former is 4.53, whereas the latter is 9.38. Note also that the difference is likely to be even larger than is suggested by these numbers since the fraction of measurement error in the within-pair variance in schooling is larger than in the total variance in schooling (cf. Ashenfelter and Rouse, 1997). The corresponding figures are 8.22 and 9.17 in the sample of DZ twins. These differences might indicate that there is good reason for using the within-pair differences in schooling and earnings to get unbiased estimates of the return to schooling. However, if the assumption of equal abilities is not fulfilled, this approach becomes somewhat problematic. To examine whether this basic assumption of the twins approach is reasonable, Ashenfelter and Rouse (1997) compared across-individual correlations between the average years of schooling of twin one and twin two and the corresponding averages of other variables such as marital status, self-employment and job tenure to the within-pair correlations between these variables. The idea was to see if the across-pair correlations between years of schooling and these "explanatory" variables are higher than the corresponding within-pair correlations. If this is the case, it would provide support for the assumption that any within-pair difference in years of schooling is due to random forces. In making a similar analysis for this study, four different variables that describe individual characteristics of the twins are used. Two of these are physiological measures, namely, weight (in kilograms) and height (in centimeters) of the individual. 13 The other two measure two dimensions of the individual's personality: introversion/extroversion and emotionality/stability. They are measured with short forms of the so-called Eysenck Scales. 14 In the present sample, the information on these variables was collected from a mailed questionnaire sent to the respondents in 1972. Preferably, the differences in these "explanatory" variables should precede the within-pair differences in schooling; otherwise, causation might be going in the other direction. In the following, I therefore use the individuals in the samples who were between 14 and 20 years of age when the information was collected.

13 One reason to focus on physical height is that within-pair differences in height have been shown to be related to within-pair differences in the risk of dying from a heart disease (see Vågerö and Leon, 1994).
14 According to Engler (1991), the introversion/extroversion dimension "[...] reflects the degree to which a person is outgoing and participative in relating to other people" and the emotionality/stability dimension "[...] refers to an individual's adjustment to the environment and the stability of his or her behavior over time" (Engler (1991), p. 325).
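The following equations are estimated:

$$\bar{S}_j = \beta_0 + \beta_1 \bar{H}_j + \beta_2 \bar{W}_j + \varepsilon_j, \tag{4a}$$

$$\Delta S_j = \beta_1 \Delta H_j + \beta_2 \Delta W_j + \Delta \varepsilon_j, \tag{4b}$$

$$\bar{S}_j = \beta_3 + \beta_4 \bar{I}_j + \beta_5 \bar{E}_j + \varepsilon_j, \tag{4c}$$

$$\Delta S_j = \beta_4 \Delta I_j + \beta_5 \Delta E_j + \Delta \varepsilon_j, \tag{4d}$$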
where a "bar" over the variable refers to the within-pair average of that variable, $\Delta$ is the difference operator, $S$ is years of schooling, $H$ is height, $W$ is weight, $I$ is a vector of nine dummy variables measuring emotionality/stability and $E$ is a vector of nine dummy variables measuring introversion/extroversion. Since each of these variables might have a different effect on the across-pair and within-pair differences in years of schooling for men and for women, I have also estimated Eqs. (4a), (4b), (4c) and (4d) separately for men and women. The results from estimating Eq. (4a) are presented in columns 1, 2 and 3 of Table 5a. The parameter estimate is positive for height for both MZ twins and for DZ twins. In both cases, the coefficients are significantly different from zero at conventional levels of significance. The parameter estimate for weight is negative and significantly different from zero in both of the twin samples. The corresponding estimates obtained when restricting the samples to men only are higher in absolute terms than they are when restricting the samples to women only. The signs of the coefficients are, however, the same for men and women. Table 5a also shows that the P-values of all the estimated equations imply that the hypothesis that all of the coefficients would be jointly equal to zero is rejected at a significance level of 10%. The within-pair estimates, that is, the estimation results for Eq. (4b), are displayed in columns 4, 5 and 6. The estimated coefficients all have the same sign as the estimated coefficients of Eq. (4a), apart from the estimated coefficient for weight in the equations estimated separately for women, which is now positive. None of the estimated coefficients are, however, significantly different from zero. The P-values of the F-test also suggest that the hypothesis that all of the coefficients are jointly equal to zero cannot be rejected at conventional levels of significance.
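The logic of comparing the across-pair regression (4a) with the within-pair regression (4b) can be illustrated with simulated data. The Python sketch below is purely hypothetical: it generates pairs in which a shared family factor drives both height and schooling, while the twin-specific parts of height and schooling are unrelated. All parameter values, the single regressor and the helper names `ols_slope` and `slope_through_origin` are assumptions made for illustration and are not taken from the paper.

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept, as in (4a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope from an OLS regression of y on x without intercept, as in (4b)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
avg_h, avg_s, diff_h, diff_s = [], [], [], []
for _ in range(5000):
    family = rng.gauss(0.0, 1.0)  # factor shared by both twins in a pair
    h1 = 170.0 + 5.0 * family + rng.gauss(0.0, 3.0)
    h2 = 170.0 + 5.0 * family + rng.gauss(0.0, 3.0)
    s1 = 11.0 + 2.0 * family + rng.gauss(0.0, 1.5)
    s2 = 11.0 + 2.0 * family + rng.gauss(0.0, 1.5)
    avg_h.append((h1 + h2) / 2.0)
    avg_s.append((s1 + s2) / 2.0)
    diff_h.append(h1 - h2)
    diff_s.append(s1 - s2)

between = ols_slope(avg_h, avg_s)               # across-pair association
within = slope_through_origin(diff_h, diff_s)   # within-pair association
print(round(between, 3), round(within, 3))
```

In this constructed example the across-pair slope is clearly positive while the within-pair slope is close to zero, which is the pattern one would look for if within-pair differences in schooling were unrelated to within-pair differences in the characteristic.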
In summary, these estimated equations all suggest that height and weight, which both show a fairly strong correlation with years of schooling in the cross-section, do not exhibit as strong a within-pair correlation. Consequently, the assumption of equal within-pair abilities does not seem to be violated. Turning to Eqs. (4c) and (4d), the results from using the sets of psychological variables one by one will be reported. The reason for doing so is that I think that
Table 5
(a) OLS and fixed effects (FE) estimates from estimating Eqs. (4a) and (4b), pairs born 1954–1958

| | OLS Men+Women | OLS Men | OLS Women | FE Men+Women | FE Men | FE Women |
|---|---|---|---|---|---|---|
| MZ twins | | | | | | |
| Intercept | 6.072 (2.814) | 0.991 (4.331) | 1.641 (5.178) | – | – | – |
| Height | 0.053 (0.021) | 0.088 (0.030) | 0.066 (0.037) | 0.035 (0.031) | 0.023 (0.040) | 0.046 (0.050) |
| Weight | -0.047 (0.018) | -0.070 (0.024) | -0.001 (0.028) | -0.024 (0.024) | -0.040 (0.029) | 0.006 (0.044) |
| P-value of F-test | 0.026 | 0.007 | 0.063 | 0.427 | 0.390 | 0.590 |
| Sample size | 465 | 260 | 205 | 465 | 260 | 205 |
| DZ twins | | | | | | |
| Intercept | 3.308 (2.275) | -2.851 (3.180) | 2.690 (4.605) | – | – | – |
| Height | 0.069 (0.017) | 0.106 (0.023) | 0.071 (0.032) | 0.023 (0.019) | 0.036 (0.024) | 0.001 (0.032) |
| Weight | -0.052 (0.016) | -0.060 (0.020) | -0.039 (0.030) | -0.010 (0.015) | -0.026 (0.020) | 0.020 (0.024) |
| P-value of F-test | 0.000 | 0.000 | 0.094 | 0.487 | 0.301 | 0.651 |
| Sample size | 654 | 414 | 240 | 654 | 414 | 240 |

Standard errors in parentheses.
(b) OLS and fixed effects (FE) estimates from regressing years of schooling on psychological instability, pairs born 1954–1958, MZ twins

| | OLS Men+Women | OLS Men | OLS Women | FE Men+Women | FE Men | FE Women |
|---|---|---|---|---|---|---|
| Intercept | 11.802 (0.285) | 11.474 (0.364) | 12.591 (0.472) | – | – | – |
| I1 | 0.510 (0.426) | 0.951 (0.563) | -0.377 (0.661) | 0.388 (0.243) | 0.319 (0.266) | 0.510 (0.553) |
| I2 | 0.526 (0.436) | 0.589 (0.610) | 0.100 (0.638) | 0.319 (0.261) | 0.398 (0.288) | 0.304 (0.590) |
| I3 | 0.839 (0.478) | 0.885 (0.656) | 0.397 (0.712) | 0.466 (0.294) | 0.441 (0.340) | 0.607 (0.615) |
| I4 | 0.549 (0.529) | 1.026 (0.827) | -0.329 (0.708) | 0.093 (0.328) | 0.001 (0.387) | 0.326 (0.661) |
| I5 | 0.756 (0.659) | 1.290 (0.928) | -0.316 (0.934) | 0.362 (0.334) | -0.004 (0.397) | 0.927 (0.672) |
| I6 | -0.106 (0.653) | -0.445 (1.190) | -0.589 (0.801) | 0.905 (0.371) | 0.931 (0.535) | 0.984 (0.671) |
| I7 | 1.104 (0.886) | 0.628 (1.690) | 0.557 (1.032) | 0.340 (0.455) | -0.353 (0.762) | 0.704 (0.723) |
| I8 | -0.074 (0.982) | -0.548 (2.370) | -0.793 (1.067) | 0.692 (0.490) | 1.781 (0.826) | 0.715 (0.786) |
| I9 | 0.221 (1.581) | -3.534 (3.738) | 0.497 (1.660) | 2.012 (0.793) | 0.191 (1.280) | 3.099 (1.143) |
| P-value of F-test | 0.777 | 0.612 | 0.947 | 0.166 | 0.239 | 0.281 |
| Sample size | 485 | 275 | 210 | 485 | 275 | 210 |

Standard errors in parentheses. The category of those coded as the most unstable is omitted.
(c) OLS and fixed effects (FE) estimates from regressing years of schooling on psychological instability, pairs born 1954–1958, DZ twins

| | OLS Men+Women | OLS Men | OLS Women | FE Men+Women | FE Men | FE Women |
|---|---|---|---|---|---|---|
| Intercept | 11.942 (0.249) | 12.002 (0.287) | 12.014 (0.529) | – | – | – |
| I1 | 0.276 (0.386) | 0.267 (0.447) | 0.308 (0.787) | 0.314 (0.244) | 0.207 (0.278) | 0.497 (0.603) |
| I2 | 0.496 (0.383) | 0.139 (0.476) | 0.883 (0.685) | 0.761 (0.261) | 0.747 (0.302) | 0.602 (0.630) |
| I3 | 0.061 (0.414) | -0.487 (0.541) | 0.468 (0.701) | 0.967 (0.282) | 1.288 (0.338) | 0.312 (0.639) |
| I4 | 0.264 (0.443) | 0.021 (0.615) | 0.345 (0.717) | 0.873 (0.304) | 0.885 (0.380) | 0.590 (0.643) |
| I5 | 0.147 (0.545) | 0.746 (0.832) | -0.388 (0.805) | 1.263 (0.350) | 1.572 (0.482) | 0.855 (0.656) |
| I6 | 0.201 (0.529) | -0.544 (0.783) | 0.627 (0.792) | 0.630 (0.367) | 0.100 (0.530) | 0.706 (0.678) |
| I7 | -0.718 (0.663) | 0.668 (1.111) | -1.453 (0.898) | 1.206 (0.444) | 1.486 (0.714) | 0.883 (0.724) |
| I8 | 0.858 (0.914) | -1.934 (1.645) | 1.934 (1.141) | 1.739 (0.656) | 2.167 (0.907) | 1.037 (1.007) |
| I9 | -0.418 (1.207) | -0.865 (2.365) | -0.488 (1.414) | 0.384 (0.786) | -0.696 (1.325) | 0.811 (1.045) |
| P-value of F-test | 0.860 | 0.865 | 0.226 | 0.003 | 0.000 | 0.957 |
| Sample size | 666 | 421 | 245 | 666 | 421 | 245 |

Standard errors in parentheses. The category of those coded as the most unstable is omitted.
(d) OLS and fixed effects (FE) estimates from regressing years of schooling on psychological introversion/extroversion, pairs born 1954–1958, MZ twins

| | OLS Men+Women | OLS Men | OLS Women | FE Men+Women | FE Men | FE Women |
|---|---|---|---|---|---|---|
| Intercept | 13.590 (1.180) | 9.462 (3.678) | 14.255 (1.107) | – | – | – |
| E1 | -1.669 (1.442) | 3.186 (3.844) | -3.470 (1.548) | 1.050 (0.910) | 0.149 (1.537) | 1.255 (1.165) |
| E2 | -0.915 (1.360) | 2.861 (4.028) | -1.263 (1.298) | 0.797 (0.827) | 0.575 (1.268) | 0.815 (1.100) |
| E3 | -0.519 (1.258) | 2.738 (3.777) | -0.380 (1.216) | 0.840 (0.849) | 0.820 (1.268) | 0.632 (1.162) |
| E4 | -0.975 (1.236) | 4.492 (3.711) | -2.715 (1.195) | 0.842 (0.861) | 0.725 (1.293) | 0.671 (1.172) |
| E5 | -1.619 (1.229) | 2.620 (3.710) | -2.264 (1.194) | 0.526 (0.863) | 0.747 (1.297) | 0.092 (1.174) |
| E6 | -1.362 (1.222) | 2.339 (3.698) | -1.362 (1.198) | 0.687 (0.867) | 1.057 (1.293) | -0.172 (1.200) |
| E7 | -1.435 (1.229) | 2.130 (3.711) | -1.477 (1.199) | 0.475 (0.875) | 1.002 (1.303) | -0.641 (1.210) |
| E8 | -2.184 (1.252) | 1.753 (3.714) | -2.381 (1.295) | 0.338 (0.883) | 0.501 (1.309) | -0.122 (1.239) |
| E9 | -1.653 (1.307) | 2.603 (3.740) | -2.489 (1.543) | 0.303 (0.907) | 0.691 (1.330) | -0.447 (1.324) |
| P-value of F-test | 0.167 | 0.075 | 0.022 | 0.688 | 0.780 | 0.093 |
| Sample size | 485 | 274 | 211 | 485 | 274 | 211 |

Standard errors in parentheses. The category of those coded as the most introverted is omitted.
(e) OLS and fixed effects (FE) estimates from regressing years of schooling on psychological introversion/extroversion, pairs born 1954–1958, DZ twins

| | OLS Men+Women | OLS Men | OLS Women | FE Men+Women | FE Men | FE Women |
|---|---|---|---|---|---|---|
| Intercept | 11.481 (1.001) | 12.064 (1.766) | 11.355 (1.168) | – | – | – |
| E1 | 1.338 (1.204) | 0.149 (2.072) | 1.782 (1.426) | -0.495 (0.774) | -2.711 (1.235) | 1.005 (0.928) |
| E2 | -0.103 (1.105) | -0.631 (1.893) | -0.020 (1.325) | -0.090 (0.691) | -2.389 (1.094) | 1.422 (0.851) |
| E3 | 1.283 (1.102) | 0.222 (1.910) | 1.825 (1.304) | -0.334 (0.628) | -2.858 (1.022) | 1.482 (0.742) |
| E4 | 0.776 (1.056) | -0.024 (1.842) | 1.188 (1.246) | -0.294 (0.638) | -2.702 (1.026) | 1.406 (0.772) |
| E5 | 0.683 (1.060) | 0.410 (1.805) | 0.318 (1.333) | -0.424 (0.632) | -2.888 (1.034) | 1.262 (0.743) |
| E6 | 0.637 (1.036) | 0.414 (1.791) | 0.096 (1.258) | -0.464 (0.644) | -2.822 (1.044) | 1.096 (0.765) |
| E7 | 0.320 (1.035) | -0.902 (1.795) | 1.767 (1.253) | -0.397 (0.653) | -2.894 (1.045) | 1.503 (0.806) |
| E8 | 0.707 (1.065) | 0.038 (1.832) | 0.778 (1.321) | -0.130 (0.662) | -2.552 (1.047) | 1.706 (0.850) |
| E9 | 0.200 (1.147) | -0.518 (1.875) | 1.121 (1.650) | -0.775 (0.726) | -3.280 (1.107) | 1.233 (1.040) |
| P-value of F-test | 0.488 | 0.286 | 0.110 | 0.882 | 0.264 | 0.713 |
| Sample size | 663 | 420 | 243 | 663 | 420 | 243 |

Standard errors in parentheses. The category of those coded as the most introverted is omitted.
there is one quite unexpected result for this specification. It is found in Table 5b and c, which present the estimation results using the instability of the individuals as the single explanatory variable. Here, the P-values of the F-test are, in general, lower when using the within-pair variation than when using the across-pair variation. This is true both for the MZ sample of men and women combined and for the corresponding DZ sample. Furthermore, the P-values are lower for the equations in which only men are included in the sample, especially for the DZ twins, than for the equations in which the samples are restricted to women. The results thus indicate that this within-pair correlation is higher than the across-pair correlation. This would seem to be contrary to the assumption that the variation in ability is reduced within families as compared to the variation across families. However, the P-value for the within-pair equation for the MZ sample of men and women combined is 0.166. Hence, one would have to accept a fairly high level of significance in order to reject the hypothesis that all of the estimated parameters are jointly equal to zero in this equation.

The results from using the psychological measure of introversion/extroversion are shown in Table 5d and e. Here, the P-values are, in general, lower when using the across-pair variation than when using the within-pair variation. This indicates that a variable that could explain some of the across-pair variation in schooling could not explain the corresponding within-pair variation. Consequently, this implies that the assumption of equal within-pair abilities is not violated.

In summary, one of these two measures of an individual's psychological characteristics seems to indicate that the within-pair correlation between the characteristic and years of schooling is stronger than the corresponding across-pair correlation.
This appears contrary to the assumption that any within-pair differences in years of schooling are purely random. However, the P-value for the MZ twins is 0.166 for the within-pair equation. This does not seem to present strong evidence against the assumption that the within-pair differences in years of schooling are purely random, at least not for MZ twins. The other measure of the individual's psychological characteristics indicated, however, that the assumption of equal within-pair abilities does not seem to be violated.
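The comparison used throughout this section can be sketched in a few lines of Python. The sketch assumes, as the layout of Table 5a suggests, that the across-pair equation regresses pair-average schooling on pair-average height and weight with an intercept, and that the within-pair equation regresses the within-pair difference in schooling on the within-pair differences in height and weight without an intercept. The function names, the simulated data and all parameter values are hypothetical illustrations and are not taken from the Swedish Twin Registry or from the estimation code behind Table 5.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(y, X):
    """Return (coefficients, residual sum of squares) for y = X b + e."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    rss = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, rss

def joint_f(y, X, has_intercept):
    """F statistic for H0: all slope coefficients are jointly zero."""
    n, k = len(y), len(X[0])
    _, rss_u = ols(y, X)
    if has_intercept:
        ybar = sum(y) / n
        rss_r = sum((yi - ybar) ** 2 for yi in y)  # intercept-only model
        q = k - 1
    else:
        rss_r = sum(yi ** 2 for yi in y)           # model with no regressors
        q = k
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

def across_and_within_f(pairs):
    """pairs: list of ((S1, H1, W1), (S2, H2, W2)), one tuple per twin."""
    mean_y = [(a[0] + b[0]) / 2 for a, b in pairs]
    mean_X = [[1.0, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2] for a, b in pairs]
    diff_y = [a[0] - b[0] for a, b in pairs]
    diff_X = [[a[1] - b[1], a[2] - b[2]] for a, b in pairs]
    return joint_f(mean_y, mean_X, True), joint_f(diff_y, diff_X, False)

def simulate_pairs(n, seed=0):
    """Hypothetical data: a shared family component drives schooling,
    height and weight; twin-specific deviations are independent noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        fam = rng.gauss(0, 1)
        pairs.append(tuple(
            (12 + 1.5 * fam + rng.gauss(0, 1),   # years of schooling
             175 + 5 * fam + rng.gauss(0, 2),    # height
             70 - 3 * fam + rng.gauss(0, 3))     # weight
            for _ in range(2)))
    return pairs

f_across, f_within = across_and_within_f(simulate_pairs(500))
print(f"across-pair F = {f_across:.1f}, within-pair F = {f_within:.2f}")
```

In this simulated setting the within-pair differences are unrelated to schooling by construction, so the across-pair F statistic is large while the within-pair one is not. That is the pattern the text interprets as support for the assumption of equal within-pair abilities; a large within-pair F statistic would instead point towards the pattern discussed for Table 5b and c.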
7. Conclusions
In this paper, the return to schooling in Sweden was reexamined by using a large sample of MZ and DZ twins. Ignoring measurement error, the results indicate that omitting ability from the earnings–schooling relationship was a problem, and that it produced estimates of the return to schooling in Sweden that were positively biased. This is in accordance with most of the previous twin studies of the return to schooling (see Behrman et al., 1980; Miller et al., 1995; Ashenfelter and Rouse, 1997). However, it was demonstrated that the results were sensitive to the magnitude of the reliability ratio of the years of schooling variable. This reliability ratio was estimated by using two measures of educational attainment, and the estimate obtained was approximately 0.88. At this estimated reliability, the measurement-error-adjusted estimate of the return to schooling in the sample of MZ twins indicated a slight ability bias of approximately 10% in the conventional estimates of the return to schooling. This result implies that previously reported estimates of the return to schooling in Sweden (cf. Björklund and Kjellström (1994) and Edin and Holmlund (1995)) might be slightly positively biased due to omitting ability from the earnings–schooling relationship. Furthermore, this result resembles those of other recent twin studies of the return to schooling (see, for example, Ashenfelter and Rouse, 1997). The corresponding measurement-error-adjusted estimate obtained in the sample of DZ twins was a little higher than the corresponding conventional OLS estimate of the return to schooling. However, the estimated reliability ratio of 0.88 should probably be interpreted cautiously, given the large time interval that separates the two measures of educational attainment and the assumptions that underlie the computations used to obtain this estimate.

The assumption of equal within-pair abilities was tested by comparing across-pair and within-pair correlations between years of schooling and two physiological measures of the individual, and between years of schooling and two psychological measures of the individual's personality. In general, the correlations were stronger across pairs than within pairs.
Thus, this supports the assumption that within-pair differences in years of schooling are due to random factors.
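The sensitivity of the conclusions to the reliability ratio can be illustrated with the textbook classical errors-in-variables correction. The sketch below is an assumption-laden illustration and not necessarily the computation used in this paper. It assumes that measurement error in schooling is classical and uncorrelated between twins, in which case a within-pair estimate is attenuated by the factor 1 - (1 - r)/(1 - rho), where r is the reliability ratio and rho is the correlation between the twins' measured years of schooling. The function names, the value of rho and the unadjusted estimates in the example are hypothetical; only the reliability ratio of 0.88 comes from the text.

```python
def within_pair_attenuation(reliability, twin_corr):
    """Share of the true effect retained by a within-pair estimate under
    classical measurement error: 1 - (1 - r) / (1 - rho)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    if not -1 < twin_corr < reliability:
        raise ValueError("twin_corr must be below the reliability ratio")
    return 1 - (1 - reliability) / (1 - twin_corr)

def adjust_within_pair(beta_within, reliability, twin_corr):
    """Scale a within-pair estimate up by its attenuation factor."""
    return beta_within / within_pair_attenuation(reliability, twin_corr)

def adjust_cross_section(beta_ols, reliability):
    """Bivariate cross-section case only: the estimate is attenuated by r.
    With additional covariates the correction takes a different form."""
    return beta_ols / reliability

# Hypothetical inputs, except for the reliability ratio reported in the text.
r = 0.88
rho = 0.70            # assumed twin correlation in measured schooling
print(adjust_within_pair(0.03, r, rho))
print(adjust_cross_section(0.044, r))
```

Because the within-pair factor depends on 1 - rho, a high correlation in schooling between twins magnifies the effect of a given amount of measurement error, which is why the adjusted within-pair estimate can end up either above or below the unadjusted cross-section estimate depending on the reliability ratio.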
Acknowledgements
The Swedish Twin Registry is administered through the Division of Genetic Epidemiology at the Institute of Environmental Medicine, Karolinska Institute, and is supported by grants from the John D. and Catherine T. MacArthur Foundation and the Swedish Council for Planning and Coordination of Research (FRN). I have benefited from comments made by Orley Ashenfelter, Anders Björklund, Wim Groot, Colm Harmon, Per Johansson, Alan Krueger, Mikael Lindahl, Erik Mellander, Jan-Eric Nilsson, Nancy Pedersen, Håkan Regnér, participants at a meeting of the European Society for Population Economics (ESPE) and participants, in particular Kevin Murphy, at the workshop on labor economics held at the University of Uppsala. I am also grateful to Håkan Malmström at the Swedish Twin Registry for providing the data. The responsibility for any remaining error is, of course, my own.
References
Ashenfelter, O., Krueger, A.B., 1994. Estimates of the economic return to schooling from a new sample of twins. American Economic Review 84 (5), 1157–1173.
Ashenfelter, O., Rouse, C., 1997. Income, schooling and ability: evidence from a new sample of twins. Working Paper 6106. National Bureau of Economic Research, July, 1997.
Ashenfelter, O., Zimmerman, D.J., 1997. Estimates of the returns to schooling from sibling data: fathers, sons and brothers. Review of Economics and Statistics, Feb. 1997.
Behrman, J., Hrubec, Z., Taubman, P., Wales, T., 1980. Socioeconomic Success: A Study of the Effects of Genetic Endowments, Family Environment, and Schooling. North-Holland, Amsterdam.
Björklund, A., Kjellström, C., 1994. Avkastningen på utbildning i Sverige 1968 till 1991. In: Erikson, R., Jonsson, J.O. (Eds.), Skola och sortering. Studier av snedrekrytering och utbildningens konsekvenser. Carlssons Förlag, Stockholm.
Bronars, S.G., Grogger, J., 1994. The economic consequences of unwed motherhood: using twin births as a natural experiment. American Economic Review 84 (5), 1141–1156.
Cederlöf, R., Lorich, U., 1978. The Swedish Twin Registry. In: Nance, W.E., Allen, G., Parisi, P. (Eds.), Twin Research: Biology and Epidemiology. Alan R. Liss, New York.
Edin, P.-A., Holmlund, B., 1995. The Swedish wage structure: the rise and fall of solidarity wage policy? In: Freeman, R.B., Katz, L.F. (Eds.), Differences and Changes in Wage Structures. The University of Chicago Press, Chicago.
Engler, B., 1991. Personality Theories: An Introduction, 3rd edn. Houghton and Mifflin, Boston.
Erikson, R., Åberg, R., 1987. Welfare in Transition: Living Conditions in Sweden 1968–1981. Clarendon Press, Oxford.
Griliches, Z., 1977. Estimating the returns to schooling: some econometric problems. Econometrica 45 (1), 1–22.
Griliches, Z., 1979. Sibling models and data in economics: beginnings of a survey. Journal of Political Economy 87 (5), S37–S64, part 2.
Miller, P., Mulvey, C., Martin, N., 1995. What do twins studies reveal about the economic returns to education? A comparison of Australian and US findings. American Economic Review 85 (3), 586–599.
Statistics Sweden, 1997. Kvalitetsdeklaration Utbildningsregistret 1997-01-16.
Vågerö, D., Leon, D., 1994. Ischaemic heart disease and low birth weight: a test of the fetal-origins hypothesis from the Swedish Twin Registry. The Lancet, January 29 1994, pp. 260–263.
Zellner, A., 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association 57, 348–368.