Socioeconomic classification of countries: A maximum likelihood factor analysis technique


SOCIAL SCIENCE RESEARCH 15, 97-112 (1986)

SOHRAB ABIZADEH
Department of Economics, The University of Winnipeg

AND

ALEXANDER BASILEVSKY
Department of Statistics, The University of Winnipeg

This paper presents an alternative method for classifying countries on the basis of preselected socioeconomic variables. It uses the maximum likelihood factor analysis (MLFA) model, which is deemed to be superior to the other techniques used so far. Data on 21 variables and 64 countries provide a consistent and meaningful classification of countries. The advantages of MLFA over other popular techniques are also discussed. © 1986 Academic Press, Inc.

Send reprint requests to Sohrab Abizadeh, Department of Economics, University of Winnipeg, Winnipeg R3B 2E9, Canada.

0049-089X/86 $3.00. Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.

Despite various attempts made in the past several years to group countries according to preselected socioeconomic variables, there still does not appear to be a general model for doing so, although diverse techniques have been suggested. Per capita income (or its equivalents), which has been the subject of much heated debate, continues to be the most widely used variable (see Chelliah, 1971: 277). Alternatively, the concepts of basic needs fulfillment (BNF) and physical quality of life have led to indices by which countries may be ranked according to their human and social responsiveness. Specifically, the recent development of the physical quality of life index (PQLI), which uses demographic variables, is a direct consequence of dissatisfaction with per capita income as the sole criterion. Since its introduction in 1977 (see the Overseas Development Council, 1977), the PQLI has gained wide acceptance as a supplement to income in classifying countries and in defining their level of development (the Overseas Development Council, 1979: 129). The PQLI uses three variables: infant mortality, life expectancy, and literacy rate. The three variables are then aggregated into a final index by assigning equal weight to each variable, countries being ranked on a scale of 1-100. This, of course, amounts to taking a simple unweighted average of the three demographic variables. Although the PQLI has been used as a supplement to per capita income, it is itself not free of difficulties. For example, it has been pointed out that including both infant mortality and life expectancy amounts to double counting and that, since the three variables tend to be highly correlated, any one of them can serve to classify countries (Larson and Wilford, 1979: 583). In effect, we are therefore back to a single-variable classification. Furthermore, Hicks and Streeten (1979) point out that the term "quality of life" may be a misnomer, since what is really being measured is the effectiveness in reducing mortality rates and increasing the literacy rate, both of which are quantitative rather than qualitative variables, much the same as per capita income. Although the PQLI gives valuable insight, in addition to income levels, into whether human needs are met, it cannot in any sense replace economic variables as a sole criterion for ranking countries on a development-underdevelopment scale.

The basic difficulty, therefore, appears to lie in the fact that the very concept of development (or, conversely, underdevelopment) is theoretically multidimensional. A developed country must first attain a certain level of economic performance, which in turn can be used to raise life expectancy and literacy and to meet other human needs. Conversely, higher literacy rates and the fulfillment of human needs influence economic performance, so the final result is interdependence among economic, demographic, and basic needs criteria. This results in a multicollinear set of variables. Thus the appropriate statistical tool, as Ram (1982) has argued, must theoretically be multivariate in nature.
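The equal-weight aggregation behind the PQLI can be written in a few lines. The sketch below only illustrates "a simple unweighted average" of three rescaled indicators: the rescaling bounds and the country values are hypothetical, and the helper names `rescale` and `equal_weight_index` are our own, not the Overseas Development Council's published procedure or constants.

```python
def rescale(value, worst, best):
    """Map a raw indicator onto 0-100, with `worst` -> 0 and `best` -> 100.

    Indicators where lower is better (e.g. infant mortality) are handled
    simply by passing worst > best.
    """
    return 100.0 * (value - worst) / (best - worst)


def equal_weight_index(infant_mortality, life_expectancy, literacy, bounds):
    """Unweighted average of three rescaled indicators (a PQLI-style sketch).

    `bounds` maps each indicator to a hypothetical (worst, best) pair;
    these are illustrative values, not the official PQLI constants.
    """
    scores = [
        rescale(infant_mortality, *bounds["infant_mortality"]),
        rescale(life_expectancy, *bounds["life_expectancy"]),
        rescale(literacy, *bounds["literacy"]),
    ]
    return sum(scores) / len(scores)


# Hypothetical bounds and data, chosen only to keep the arithmetic easy to follow.
BOUNDS = {
    "infant_mortality": (200.0, 0.0),   # deaths per 1,000 births; lower is better
    "life_expectancy": (30.0, 80.0),    # years at birth
    "literacy": (0.0, 100.0),           # percent of adults
}

index = equal_weight_index(infant_mortality=100.0, life_expectancy=55.0,
                           literacy=40.0, bounds=BOUNDS)
print(round(index, 2))   # (50 + 50 + 40) / 3 -> 46.67
```

Because each indicator receives a weight of 1/3, two closely related inputs such as infant mortality and life expectancy jointly carry two-thirds of the index, which is the double-counting concern raised above.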
Clearly, the multiple regression model cannot serve our purpose because of the high level of multicollinearity and errors in variables, which result in inefficient and biased estimators. On the other hand, the principal components method which Ram (1982) has proposed lends itself poorly to the classification process, although it is an important step in the right direction. Our paper attempts to fill this gap by means of the more general (and theoretically superior) maximum likelihood factor analysis model, which is readily available in most of the well-known statistical packages. In what follows we discuss in general terms the main differences between principal components and maximum likelihood factor analysis (MLFA), and we apply the latter model to a set of preselected economic and demographic variables in order to classify countries on the basis of sample data.

THE FACTOR ANALYSIS MODEL

The most common and familiar technique that lends itself to country classification is the now well-known principal components method. Although principal components has a long history in economic index construction (Rhodes, 1937) and has been used by Adelman and Morris (1965, 1967, 1970, 1973) to classify countries, the procedure did not receive sufficient recognition among applied economists concerned with this problem. Recently Ram (1982) has used this technique to compute a composite index of economic development that captures the joint effects of per capita income, basic needs fulfillment, and the physical quality of life.

One can correctly argue that using more variables makes a technique superior to others which rely on a single variable. Given that most socioeconomic variables (particularly time series) tend to be highly intercorrelated and in this sense exhibit a certain redundancy, it follows that an optimal technique is one which can reduce a relatively large set of classificatory variables to a more compact and independent set. This can be done by capitalizing on the high degree of multicollinearity and by explicitly taking into account the residual error variance. Indeed, as it turns out, the limiting factor is not the availability of relevant variables. What is lacking is general agreement among economists on a well-developed and readily accessible statistical technique which can incorporate as many meaningful variables as possible and which can test theoretical hypotheses (see Morrison, 1967, for more on testing such hypotheses). Maximum likelihood factor analysis appears to be the only statistical model which meets these requirements. Furthermore, since it includes the methods already discussed as special cases, economists need no longer argue the relative merits of these particular techniques. Similar to principal components (but unlike the least squares regression model), MLFA is designed to reduce the dimensionality of highly correlated variables.
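To make the redundancy argument concrete, the sketch below computes the latent roots of a correlation matrix for three variables that share a hypothetical pairwise correlation of 0.9 (an assumed value, not one estimated in this paper). It assumes NumPy is available. For such an equicorrelation matrix the largest root is 1 + (k - 1)ρ, so a single dimension carries about 93% of the total variance and the remaining variables add little independent information.

```python
import numpy as np

def equicorrelation(k, rho):
    """k x k correlation matrix whose off-diagonal entries all equal rho."""
    R = np.full((k, k), rho)
    np.fill_diagonal(R, 1.0)
    return R

# Hypothetical: three indicators (e.g. infant mortality, life expectancy,
# literacy) with a common pairwise correlation of 0.9.
R = equicorrelation(3, 0.9)
roots = np.linalg.eigvalsh(R)[::-1]   # latent roots, largest first
share = roots / roots.sum()           # proportion of total variance per root

print(np.round(roots, 3))   # [2.8 0.1 0.1]
print(np.round(share, 3))   # [0.933 0.033 0.033]
```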
However, variables may also contain measurement error, and thus only MLFA can be used in circumstances where least squares regression and principal components break down. Several other important differences between MLFA and principal components render MLFA the more appropriate procedure. First, unlike the purely empirical principal components method, MLFA is designed to embody a well-defined a priori hypothesis, namely that precisely r < k significant factors are present in the data, where k is the number of variables. This makes explicit an important aspect of classification, namely that no form of ranking or grouping is theory free. Whether we realize it or not, by selecting certain variables to reflect development (or its lack) we are in fact hypothesizing that such variables do define development. In particular, using the PQLI in addition to economic variables is tantamount to assuming that demographic variables contribute an additional dimension to the classification process which economic variables do not. This, of course, may not be so, and the existence of these two (or perhaps more) factors or dimensions must be tested for. The principal components procedure is not designed for such a purpose, since it allows the analyst to extract an arbitrary number of components or "factors."

Second, the MLFA model allows statistical testing of population hypotheses from sample data more readily than principal components, and it results in indices which are not sample specific. The drawback of sample specificity of principal components is also noted by Ram (1982: 238). Third, MLFA is superior to principal components on two additional technical grounds: it yields better estimates than principal components, since it weights variables differentially depending on their error variances (which are assumed to be uncorrelated), and it produces the same factor loadings and scores whether the covariance or the correlation matrix is used. This is not so with principal components. The first point implies that the classification will be statistically more significant and unaffected by residual error variation. The second point ensures that the classification is free of the effects of the unit of measurement, since identical variables should result in an identical ranking (or classification), which should not depend on whether the variables have been standardized beforehand. This is of some importance, since economic and sociodemographic variables are typically measured in different units. Thus, to obtain a meaningful principal components analysis one should employ the correlation matrix. To get around this problem, Ram (who employs the covariance matrix) ranks values of per capita GNP so that they lie in the interval 1-100 and are thus expressed in the same unit as the physical quality of life index (PQLI).¹ This procedure may result in loss of information and inaccuracy, since ranks usually contain less information and also assume equal intervals, which is not necessarily the case.
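Both points in the preceding paragraph can be checked numerically. The sketch below, which assumes NumPy and uses entirely hypothetical data, shows that the leading principal component of the covariance matrix changes when one variable's unit changes (dollars versus thousands of dollars), while the one computed from the correlation matrix does not. It then shows how replacing values by ranks spread over 1-100 maps two very differently spaced GNP series onto identical scores. The function `rank_to_scale` is our own simplified stand-in for rank rescaling, not Ram's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two correlated variables measured on very different scales,
# e.g. per capita GNP in dollars and a 0-100 index.
n = 200
z = rng.standard_normal((n, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])
X_dollars = z * np.array([1000.0, 10.0])
X_thousands = z * np.array([1.0, 10.0])    # same data, GNP now in $1,000s

def first_pc(X, use_correlation):
    """Latent vector of the largest root of the covariance or correlation matrix."""
    S = np.corrcoef(X, rowvar=False) if use_correlation else np.cov(X, rowvar=False)
    _, vecs = np.linalg.eigh(S)            # eigh returns roots in ascending order
    v = vecs[:, -1]
    return v * np.sign(v[0])               # remove the arbitrary sign

def rank_to_scale(values, low=1.0, high=100.0):
    """Replace values by their ranks, spaced evenly on [low, high] (ties not handled)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    scaled = [0.0] * len(values)
    for r, i in enumerate(order):
        scaled[i] = low + (high - low) * r / (len(values) - 1)
    return scaled

print(first_pc(X_dollars, False), first_pc(X_thousands, False))  # very different
print(first_pc(X_dollars, True), first_pc(X_thousands, True))    # the same
# Hypothetical GNP figures with very different spacing map to identical ranks:
print(rank_to_scale([200, 300, 400, 12000]))     # [1.0, 34.0, 67.0, 100.0]
print(rank_to_scale([200, 9000, 10000, 12000]))  # [1.0, 34.0, 67.0, 100.0]
```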
When MLFA is used, the unit-of-measurement problem disappears, since either the covariance or the correlation matrix will result in the same classification, given an identically specified factor model.²

FACTOR ANALYSIS CLASSIFICATION

In order to perform our analysis, data for 21 variables and 64 countries were collected. The data sources, along with the definitions of the variables, are presented in Appendix 1. The variables were selected to reflect three major theoretical criteria or dimensions: industrial development-underdevelopment, high-low income and quality of life, and long-term general economic development and growth. Accordingly, our a priori hypothesis (based on the existing body of economic development literature) is that variables X2 to X6, which represent growth rates of different economic sectors, should load on one common factor, the long-term economic growth dimension, while variables X7 to X10 represent the different sectoral shares of GDP. It is argued that the relative shares of these sectors in GDP change as the economy passes through the different levels of development. The more developed the country, the higher the share of the industrial and manufacturing sector (see Abizadeh, 1979), with the service sector beginning its growth once the country has entered a more highly developed stage. Thus, we should expect these variables to load on a different common factor, which represents an industrial development-underdevelopment dimension. Finally, variables X1, X11, and X13-X21 are expected to load on yet another factor, since they all relate to a quality of life dimension. Maximum likelihood factor analysis loadings are presented in Table 1.

¹ Ram (1982) notes that principal components may display some sensitivity to the rescaling procedure. However, his conclusion that such sensitivity should be "small" is not generally warranted.
² For further detail concerning some of the differences between principal components and MLFA, as well as their statistical properties, the reader is referred to Morrison (1967) and Lawley and Maxwell (1971).

TABLE 1
Obliquely Rotated MLFA Coordinates (Loadings) for the 21 Classification Variables

Variables:
X1  Per capita GNP
X2  Avg. annual growth (rate)
X3  Avg. annual growth in agriculture (rate)
X4  Avg. annual growth in industry (rate)
X5  Avg. annual growth in manufacturing (rate)
X6  Avg. annual growth in services (rate)
X7  % GDP due to agriculture
X8  % GDP due to industry
X9  % GDP due to manufacturing
X10 % GDP due to services
X11 Avg. index food production
X12 Value added (manufacturing)
X13 Energy consumption per capita
X14 Avg. annual pop. growth
X15 Urban % of population
X16 Literacy rate
X17 % in secondary schools
X18 % of 20-24 age group in higher education
X19 Life expectancy at birth
X20 Infant mortality rate
X21 Population per physician

Loadings on F1, F2, F3 (cell positions not recoverable from this copy; values in printed reading order): 0.645, 0.504, -0.295, 0.632, 0.461, 0.881, 0.343, 0.313, 0.577, 0.582, -0.736, 0.542, 0.909, 0.796, 0.592, 0.828, 0.843, 0.252, 0.336, 0.361, -0.259, -0.866, 0.852, 0.358, 0.450, 0.967, -0.963, -0.597

Percentage of variance explained: F1 35.7, F2 14.2, F3 15.8

Note. Variable coefficients less than .2000 are omitted.

Although not all data are available for every country, this does not pose a major problem for the MLFA model, since missing observations can be replaced by mean values (or another suitable alternative; see Anderson et al., 1983), any discrepancy between the actual (but unobserved) numbers and the overall variable means being relegated to the residual error term. Of course, if a country has a large amount of missing data, it should be deleted entirely. The residual errors are then used to weight the loadings such that more "error"-prone variables contribute less to the loading pattern.

Consider a set of observed variables X1, X2, ..., Xk and a set of unobserved factors F1, F2, ..., Fr, where r < k. The factor analysis model is then given by

$$X = F\alpha + E \tag{1}$$

where X is the (n × k) data matrix, F is the (n × r) matrix of factor scores, α is the (r × k) matrix of loadings, and E is the (n × k) residual matrix of uncorrelated errors. Thus Eq. (1) has the same form as a multiple regression model, except that both the coefficients (loadings) α and the explanatory "variables" (factor scores) F must be estimated from the data. Note that the residual error variances need not be equal. The loadings α are estimated from the normal equations

$$\Delta^{-1/2} S \Delta^{-1/2}\,\bigl(\Delta^{-1/2}\alpha^{\mathsf T}\bigr) - \bigl(\Delta^{-1/2}\alpha^{\mathsf T}\bigr)\Lambda^{2} = 0 \tag{2}$$

where Δ is the diagonal covariance matrix of the residuals, S is the correlation (or covariance) matrix of the observed variables, and Λ² is the diagonal matrix of latent roots. Once the loadings α are known, the factor scores are estimated by

$$F^{\mathsf T} = \bigl(\alpha \Delta^{-1} \alpha^{\mathsf T}\bigr)^{-1} \alpha \Delta^{-1} X^{\mathsf T} \tag{3}$$
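Equations (2) and (3) can be checked numerically on a synthetic population. The sketch below assumes NumPy; the sizes, loadings, and residual variances are hypothetical, and `bartlett_scores` is our own helper name. It does not fit an MLFA model (estimation iterates between α and Δ and is best left to a statistical package). It only verifies two facts: when α Δ⁻¹ αᵀ is diagonal, the columns of Δ^(-1/2) αᵀ satisfy Eq. (2) with Λ² = I + α Δ⁻¹ αᵀ, and the weighted scores of Eq. (3) recover F exactly when E = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
k, r, n = 6, 2, 50          # hypothetical sizes: k variables, r factors, n countries

# Residual variances Delta (diagonal) and loadings alpha (r x k), built so that
# alpha Delta^{-1} alpha^T is diagonal, the usual ML identification condition.
delta = rng.uniform(0.2, 0.8, size=k)
U, _ = np.linalg.qr(rng.standard_normal((k, r)))     # k x r, orthonormal columns
D = np.diag([2.0, 1.0])
alpha = (np.sqrt(delta)[:, None] * U @ D).T          # alpha^T = Delta^{1/2} U D

# Eq. (2) on the implied covariance S = alpha^T alpha + Delta.
S = alpha.T @ alpha + np.diag(delta)
d_inv_half = np.diag(1.0 / np.sqrt(delta))
M = d_inv_half @ S @ d_inv_half                      # Delta^{-1/2} S Delta^{-1/2}
V = d_inv_half @ alpha.T                             # Delta^{-1/2} alpha^T, k x r
lam2 = np.diag(1.0 + np.diag(alpha @ np.diag(1.0 / delta) @ alpha.T))
print(np.allclose(M @ V - V @ lam2, 0.0))            # True

def bartlett_scores(X, alpha, delta):
    """Eq. (3): F^T = (alpha Delta^-1 alpha^T)^-1 alpha Delta^-1 X^T; returns F (n x r)."""
    w = alpha / delta                                # alpha Delta^{-1}, r x k
    return np.linalg.solve(w @ alpha.T, w @ X.T).T

F = rng.standard_normal((n, r))
X_clean = F @ alpha                                  # Eq. (1) with E = 0
print(np.allclose(bartlett_scores(X_clean, alpha, delta), F))   # True
```

Setting Δ = I in `bartlett_scores` turns the weighted estimator into an ordinary least squares projection, which is the unweighted case the next paragraph contrasts with MLFA.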

Equations (1)-(3) illustrate the major difference between principal components and MLFA: both the loadings and the scores are weighted estimators, the weights being the inverse variances of the residual matrix E. Principal components, on the other hand, assume Δ = I, the identity matrix. The estimators obtained from Eqs. (2) and (3) satisfy three important optimality criteria: (i) they are maximum likelihood estimators for multivariate normal data, (ii) they maximize the canonical correlations between the observed variables and the common factors, and (iii) they minimize the distance between the population and the estimated covariance matrices. While property (i) holds only for normal data, the remaining two are valid for most nonnormal data as well, although in this case statistical testing is usually precluded.

The first step in the classification is to verify that the three factors do indeed estimate the three dimensions which are hypothesized to exist in the 21 classification variables. This is done in the usual fashion by consulting the factor loadings in Table 1. At this stage the null hypothesis that the data matrix contains precisely r factors (in our case r = 3) can be tested by a χ² statistic when the data are multivariate normal. Owing to the invariance of the MLFA model with respect to units of measurement, the 21 variables need not be rescaled, and the correlation matrix is sufficient to yield a nontrivially unique solution. A source of trivial nonuniqueness nevertheless remains, in the sense that the loadings of Table 1 (as well as the scores in Tables 2, 3, and 5) depend on the orientation of the three-dimensional Cartesian coordinates. This arbitrariness is inherent in the mathematical structure of the model and in this sense does not interfere with empirical analysis. It does require, however, that the factor loadings be rotated in such a manner that each axis corresponds maximally to the variable groupings (clusters) which purportedly exist in the preselected data. Such rotations, which are generally oblique, effectively remove the last source of indeterminacy in the factor model and also enhance empirical interpretability. Note that oblique rotations include orthogonal rotations as a special case and are preferable for our purposes. Moreover, an oblique factor structure can be expected on theoretical grounds, since economic and quality of life dimensions are rarely uncorrelated. For an oblique model, however, the factor loadings no longer correspond to correlations between the variables and the factors. From Table 1 it is evident that the three oblique factors explain a significant portion of the total variance, since F1, F2, and F3 together account for 65.70% of the variance of all 21 variables.
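The χ² test mentioned above needs two ingredients besides the fitted matrices: its degrees of freedom and a small-sample multiplier. The sketch below uses the standard Bartlett-corrected form of this likelihood-ratio test and plugs in n = 64 countries, k = 21 variables, and r = 3 factors from this study. The function name `lr_test_setup` is our own; the statistic itself also requires the determinants of the fitted and sample matrices, which are not reproduced here. The last lines simply confirm the 65.70% total from the percentages printed in Table 1.

```python
def lr_test_setup(n, k, r):
    """Degrees of freedom and Bartlett's multiplier for H0: exactly r common factors.

    The statistic is multiplier * ln(|Sigma_hat| / |S|), with Sigma_hat = alpha^T alpha + Delta
    the fitted matrix and S the sample covariance (or correlation) matrix; under H0 and
    multivariate normality it is approximately chi-square with `df` degrees of freedom.
    """
    df = ((k - r) ** 2 - (k + r)) // 2
    multiplier = n - 1 - (2 * k + 5) / 6 - 2 * r / 3
    return df, multiplier

df, c = lr_test_setup(n=64, k=21, r=3)
print(df, round(c, 3))               # 150 53.167

# Percentages of variance explained, as printed in Table 1.
explained = [35.7, 14.2, 15.8]
print(round(sum(explained), 1))      # 65.7
```

A positive `df` (here 150) is also a quick check that a three-factor model is not over-parameterized for 21 variables.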
The first factor, F1, loads (and also correlates)³ highly and positively with life expectancy at birth (X19), the average literacy rate (X16), the percentage of the 20- to 24-year age group in higher education (X18), and the percentage in secondary schools (X17), but negatively with the infant mortality rate (X20), the average annual population growth (X14), and the number of people per physician (X21). It can therefore be unambiguously defined as estimating the general economic and demographic quality of life. The 64 countries and their scores on factor F1 are given in Table 2, beginning from the lowest values. Factor scores yield continuous measurements of the countries' attainment on the "quality of life" dimension. Of course, countries can be classified into a smaller number of groups if this is desired. For example, we see in Table 2 that the first 21 countries represent highly "underdeveloped" countries (low quality of life) and include mainly African countries, with a few Asian countries in the India-Indonesia region also included in this group. The second group (Egypt to Panama) comprises intermediate or "developing" countries, mainly from South America, the Caribbean, and Asia,⁴ while the last group (Portugal to Sweden) represents high quality of life countries of Western Europe, North America, and Australia-New Zealand, as well as developed Asian economies such as Singapore, Hong Kong, and Japan.

TABLE 2
Quality of Life Scaling of 64 Countries (F1), Ranked from Lowest to Highest

Rank | Country | F1 | Rank | Country | F1
1 | Somalia | -1.838 | 33 | Syria | 0.110
2 | Chad | -1.829 | 34 | Colombia | 0.122
3 | Malawi | -1.764 | 35 | Paraguay | 0.212
4 | Senegal | -1.719 | 36 | Malaysia | 0.290
5 | Mali | -1.600 | 37 | Sri Lanka | 0.423
6 | Ethiopia | -1.593 | 38 | Korea | 0.561
7 | Sudan | -1.307 | 39 | Argentina | 0.579
8 | Bangladesh | -1.305 | 40 | Jamaica | 0.582
9 | Nigeria | -1.297 | 41 | Uruguay | 0.607
10 | Ivory Coast | -1.296 | 42 | Panama | 0.650
11 | Togo | -1.247 | 43 | Portugal | 0.657
12 | Pakistan | -1.159 | 44 | Costa Rica | 0.743
13 | Zaire | -1.077 | 45 | Singapore | 0.936
14 | India | -0.973 | 46 | Germany | 0.937
15 | Algeria | -0.861 | 47 | Austria | 0.983
16 | Tanzania | -0.833 | 48 | Ireland | 0.990
17 | Morocco | -0.785 | 49 | Italy | 0.995
18 | Indonesia | -0.732 | 50 | Hong Kong | 1.019
19 | Burma | -0.673 | 51 | New Zealand | 1.021
20 | Kenya | -0.669 | 52 | United Kingdom | 1.032
21 | Zimbabwe | -0.637 | 53 | Belgium | 1.038
22 | Egypt | -0.580 | 54 | Australia | 1.079
23 | Nicaragua | -0.472 | 55 | Switzerland | 1.131
24 | Turkey | -0.399 | 56 | Finland | 1.131
25 | Peru | -0.391 | 57 | Denmark | 1.145
26 | Tunisia | -0.281 | 58 | Canada | 1.173
27 | El Salvador | -0.191 | 59 | France | 1.187
28 | Ecuador | -0.128 | 60 | Netherlands | 1.214
29 | Dominican Republic | -0.120 | 61 | United States of America | 1.214
30 | Brazil | 0.031 | 62 | Norway | 1.216
31 | Thailand | 0.089 | 63 | Japan | 1.272
32 | Philippines | 0.105 | 64 | Sweden | 1.281

³ Note that for oblique factors the loadings no longer correspond to simple correlation coefficients. The former is termed the "pattern," whereas the latter is known as the "structure."
⁴ In the subsequent section of this paper we label the two groups of "underdeveloped" and "developing" countries as "less developed countries."

Note that by necessity such a grouping is not always clear-cut, since borderline areas between groups can be arbitrary to a certain extent. This marginal indeterminacy is due not so much to the MLFA model as to the fact that the variables on which the analysis is based are intrinsically continuous. For example, Portugal can be included either in the second or in the third group, although clearly this country is somewhat different from the Scandinavian and Northwestern European countries. The inclusion of such diverse countries in a single group is an outcome of our requirement that the factor scores not merely group countries into discrete categories, but provide an ordered linear ranking of these countries; to do otherwise would be to lose valuable information contained in the observed continuous variables. Of course, the fuzzy nature of groupings based on continua, or the lack of clear separation points, does not necessarily imply that the pursuit of classifying objects is futile or that groupings do not in fact exist. We are all aware of the existence of the color spectrum and of the distinct colors contained therein, although we can never hope to determine clearly, by means of a single "cut-off" point, where the color blue ends and the color purple begins. Such a color spectrum classification, although logically different from a "mosaic" type of grouping, is nevertheless as real and can provide us with a realistic and useful comparison of geographic and political regions or entities. If broad groupings are undesirable, countries can be split into a larger number of clusters, which can then be plotted as a histogram. The factor scores may also be used as further input into a classification/cluster analysis program if a dendrogram-type classification is required, say by constructing a first-difference (21 × 21) distance matrix (see Smil and Kuz, 1979, for such a classification).

The second factor, F2 (Table 3), represents a purely industrial factor indicating an industrial-agricultural split, with a certain emphasis on urbanism and energy consumption. Because of this, F2 is correlated with the quality of life factor F1 (r = .61; see Table 4), although it does represent a logically distinct underdevelopment-development dimension.
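How far the F2 ordering departs from the F1 ordering can be quantified with a rank correlation. The sketch below is our own illustration, not part of the original analysis: it transcribes the country orderings of Tables 2 and 3 and computes Spearman's ρ = 1 - 6Σd²/(n(n² - 1)), using the printed rank positions directly (so there are no ties). On these orderings ρ is about 0.65, broadly in line with the factor correlation of .61 in Table 4 while still leaving considerable reordering. Note that a rank correlation between score orderings is not the same quantity as the model's factor correlation.

```python
# Country orderings transcribed from Table 2 (F1) and Table 3 (F2), lowest first.
TABLE2_F1 = [
    "Somalia", "Chad", "Malawi", "Senegal", "Mali", "Ethiopia", "Sudan",
    "Bangladesh", "Nigeria", "Ivory Coast", "Togo", "Pakistan", "Zaire", "India",
    "Algeria", "Tanzania", "Morocco", "Indonesia", "Burma", "Kenya", "Zimbabwe",
    "Egypt", "Nicaragua", "Turkey", "Peru", "Tunisia", "El Salvador", "Ecuador",
    "Dominican Republic", "Brazil", "Thailand", "Philippines", "Syria", "Colombia",
    "Paraguay", "Malaysia", "Sri Lanka", "Korea", "Argentina", "Jamaica", "Uruguay",
    "Panama", "Portugal", "Costa Rica", "Singapore", "Germany", "Austria", "Ireland",
    "Italy", "Hong Kong", "New Zealand", "United Kingdom", "Belgium", "Australia",
    "Switzerland", "Finland", "Denmark", "Canada", "France", "Netherlands",
    "United States of America", "Norway", "Japan", "Sweden",
]
TABLE3_F2 = [
    "Tanzania", "Bangladesh", "Ethiopia", "Burma", "Malawi", "Mali", "Sudan",
    "India", "Zaire", "Kenya", "Sri Lanka", "Pakistan", "Paraguay", "Colombia",
    "El Salvador", "Costa Rica", "Ivory Coast", "Thailand", "Philippines", "Togo",
    "Turkey", "Malaysia", "Nicaragua", "Switzerland", "Indonesia", "Ireland",
    "Hong Kong", "Senegal", "Nigeria", "Egypt", "Syria", "Dominican Republic",
    "Zimbabwe", "Korea", "New Zealand", "Tunisia", "Chad", "Somalia", "Portugal",
    "Panama", "Morocco", "Brazil", "Argentina", "Jamaica", "Finland", "Uruguay",
    "Ecuador", "Italy", "Peru", "Norway", "Australia", "Denmark", "Netherlands",
    "Canada", "Sweden", "France", "United States of America", "Japan", "Austria",
    "United Kingdom", "Belgium", "Germany", "Algeria", "Singapore",
]

def spearman_from_orderings(order_a, order_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("the two rankings must contain the same items")
    rank_b = {name: i for i, name in enumerate(order_b)}
    n = len(order_a)
    d2 = sum((i - rank_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))

rho = spearman_from_orderings(TABLE2_F1, TABLE3_F2)
print(round(rho, 3))   # 0.647
```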
In line with this correlation, economic income or wealth variables such as per capita income (X1), the average rate of growth (X2), the average index of food production (X11), and urban population (X15) also correlate, to a lesser extent, with factor F1, indicating that physical quality of life and economic variables cannot be separated in a meaningful way. Note that the scores, or scaling, of Table 3 do not produce the same ranking as in Table 2. In Table 3 we still tend to have African, Asian, and South American countries (which are relatively unindustrialized) at one end of the spectrum, with Europe, North America, and certain industry-dependent Asian and South American countries at the other end. The differences between the rankings of Tables 2 and 3 indicate that factor F2 captures the logical independence which exists between pure industrialism on the one hand and quality of life and income on the other, although these factors cannot be totally distinct. For example, although Singapore and Algeria rank as the most dependent on industry (Table 3), their ranking (particularly Algeria's) on the quality of life dimension (Table 2) is much lower, since industrial activity has not yet translated itself into high life expectancy and incomes for these countries.⁵

TABLE 3
The Industry-Agriculture Split Scaling of 64 Countries (F2), Ranked from Lowest to Highest

Rank | Country | F2 | Rank | Country | F2
1 | Tanzania | -2.691 | 33 | Zimbabwe | 0.172
2 | Bangladesh | -2.685 | 34 | Korea | 0.243
3 | Ethiopia | -2.418 | 35 | New Zealand | 0.389
4 | Burma | -2.278 | 36 | Tunisia | 0.409
5 | Malawi | -1.735 | 37 | Chad | 0.431
6 | Mali | -1.712 | 38 | Somalia | 0.449
7 | Sudan | -1.444 | 39 | Portugal | 0.453
8 | India | -1.373 | 40 | Panama | 0.561
9 | Zaire | -1.098 | 41 | Morocco | 0.576
10 | Kenya | -0.923 | 42 | Brazil | 0.590
11 | Sri Lanka | -0.880 | 43 | Argentina | 0.607
12 | Pakistan | -0.712 | 44 | Jamaica | 0.639
13 | Paraguay | -0.699 | 45 | Finland | 0.695
14 | Colombia | -0.698 | 46 | Uruguay | 0.715
15 | El Salvador | -0.646 | 47 | Ecuador | 0.735
16 | Costa Rica | -0.478 | 48 | Italy | 0.821
17 | Ivory Coast | -0.407 | 49 | Peru | 0.831
18 | Thailand | -0.346 | 50 | Norway | 0.860
19 | Philippines | -0.324 | 51 | Australia | 0.898
20 | Togo | -0.295 | 52 | Denmark | 0.893
21 | Turkey | -0.249 | 53 | Netherlands | 0.909
22 | Malaysia | -0.283 | 54 | Canada | 0.919
23 | Nicaragua | -0.130 | 55 | Sweden | 0.921
24 | Switzerland | -0.135 | 56 | France | 0.930
25 | Indonesia | -0.127 | 57 | United States of America | 0.961
26 | Ireland | -0.085 | 58 | Japan | 0.989
27 | Hong Kong | -0.046 | 59 | Austria | 0.997
28 | Senegal | -0.045 | 60 | United Kingdom | 1.063
29 | Nigeria | -0.035 | 61 | Belgium | 1.124
30 | Egypt | 0.039 | 62 | Germany | 1.156
31 | Syria | 0.112 | 63 | Algeria | 1.343
32 | Dominican Republic | 0.123 | 64 | Singapore | 1.431

TABLE 4
Oblique Factor Correlation Matrix

   | F1 | F2 | F3
F1 | 1.00 | |
F2 | .610 | 1.00 |
F3 | -.104 | -.125 | 1.00

TABLE 5
High-Low Rate of Economic Growth Scale of 64 Countries (F3), Ranked from Lowest to Highest

Rank | Country | F3 | Rank | Country | F3
1 | Zaire | -1.810 | 33 | Malawi | -0.160
2 | Jamaica | -1.465 | 34 | Zimbabwe | -0.091
3 | Sweden | -1.457 | 35 | Tanzania | -0.089
4 | United Kingdom | -1.412 | 36 | India | -0.006
5 | United States of America | -1.195 | 37 | Japan | 0.104
6 | Denmark | -1.159 | 38 | Burma | 0.109
7 | Senegal | -1.133 | 39 | Sri Lanka | 0.146
8 | Nicaragua | -1.092 | 40 | Morocco | 0.239
9 | Togo | -0.937 | 41 | Pakistan | 0.239
10 | Argentina | -0.919 | 42 | Portugal | 0.304
11 | Chad | -0.866 | 43 | Hong Kong | 0.341
12 | Netherlands | -0.842 | 44 | Ivory Coast | 0.448
13 | Canada | -0.833 | 45 | Turkey | 0.488
14 | Belgium | -0.769 | 46 | Colombia | 0.501
15 | Somalia | -0.700 | 47 | Dominican Republic | 0.515
16 | Peru | -0.694 | 48 | Costa Rica | 0.579
17 | France | -0.689 | 49 | Bangladesh | 0.726
18 | Ethiopia | -0.666 | 50 | Nigeria | 0.760
19 | Germany | -0.659 | 51 | Philippines | 0.791
20 | Uruguay | -0.630 | 52 | Kenya | 0.877
21 | Finland | -0.592 | 53 | Algeria | 0.938
22 | Sudan | -0.581 | 54 | Brazil | 1.250
23 | El Salvador | -0.518 | 55 | Singapore | 1.306
24 | Norway | -0.466 | 56 | Egypt | 1.309
25 | Italy | -0.463 | 57 | Tunisia | 1.365
26 | New Zealand | -0.451 | 58 | Thailand | 1.413
27 | Australia | -0.451 | 59 | Ecuador | 1.524
28 | Mali | -0.450 | 60 | Paraguay | 1.563
29 | Austria | -0.433 | 61 | Malaysia | 1.621
30 | Panama | -0.362 | 62 | Syria | 1.785
31 | Switzerland | -0.334 | 63 | Indonesia | 1.912
32 | Ireland | -0.124 | 64 | Korea | 2.340

Finally, the third factor, F3 (Table 5), measures the rate of growth of industry, agriculture, and services (see also Table 1). That is, F3 represents a general economic growth dimension which can be used to predict future development (or underdevelopment). Referring to Table 5, we see that the previous ranking of Table 2 is upset. Many Western European and North American countries which exhibited high scores for the quality of life index (Table 2) and industrial development (Table 3) now rank at the bottom in terms of their rate of economic growth, when compared to less developed and low quality of life countries such as Korea, Indonesia, and Syria. On the other hand, relatively underdeveloped countries such as Ecuador, Tunisia, and Kenya rank highly on the basis of F3. This confirms our a priori notion that at times the rate of growth of the less developed countries can be significantly higher than that of those which are already developed,⁶ since the more developed the country, the more difficult it can be to maintain high growth rates. The phenomenon seems to be best described by a quadratic curve. At the early dawn of industrialization, rates of growth are low by virtue of the fact that capital equipment is scarce and the labor force is scant and unskilled. Once development is underway, however, policies can be pursued which dramatically increase the rate of growth. This results in relatively developed economies, which in turn find it more difficult to maintain high growth rates (see Gillis et al., 1983: 11-12). Thus one would expect the countries in the "developing" group to rank high (in the upper 50% of the sample) on the basis of F3. Indeed, such is the case for the majority of these countries: 14 of the 21 rank very highly with respect to the rate of growth, F3 (Tables 2 and 5). Conversely, the group of countries labeled "developed" should rank low on the basis of F3. Here again, 17 out of 22 are ranked very low. Interestingly, the underdeveloped countries fall either at the lower end of the F3 spectrum (Zaire) or right after the group of countries labeled developed; examples of the latter include Mali, Malawi, and Zimbabwe. There are also countries from the developed group, such as Japan, which rank high on the basis of F3. This is expected, since Japan is one of the few developed countries which have continued to maintain the momentum of a high rate of economic growth.

The consistency of our classification with a priori knowledge depends not only on the MLFA model but also on the variables, which are selected on the basis of generally accepted theories of economic development. Thus the MLFA model, like any statistical technique, does not replace prior economic experience and theory, but in fact confirms and combines these two aspects in a consistent way in order to produce a realistic and theoretically plausible classification. Of course, a different choice of variables may not result in the same classification, even though the same statistical model is employed, if these alternative variables measure a completely different set of dimensions. Also, once additional experience is gained in using factors to classify countries, the so-called confirmatory model (see Jöreskog and Lawley, 1968) could be used in place of Eq. (2), the main difference being that the confirmatory model imposes a priori zero restrictions on the loadings. Finally, the factors F1, F2, and F3 can be aggregated into a single quality of life-industrialism-rate of growth index by using the scores of the three factors weighted by the proportion of variance which they explain.

⁵ An inconsistency nonetheless exists in that Chad possesses a much higher score for F2 than for F1, which is clearly unrealistic, since Chad is certainly not a very highly industrialized country. This inconsistency is due to the relatively large amount of missing data for the industrial/agriculture variables, which are replaced by overall means. Since the overall mean industrial output of all countries is much higher than that of Chad, the effect of the substitution is to place Chad in the center of the table. Thus a better alternative to employing mean values would be to use only countries similar to Chad in terms of other variables which correlate highly with industrial output (such as GNP), or else to delete Chad entirely from the analysis.
⁶ Note that we originally grouped the sample countries into three roughly equal groups using F1 in Table 2. The resulting grouping, into (1) underdeveloped, (2) developing, and (3) developed countries, can readily be checked in Table 2.

CONCLUSION

Although useful to some degree, most attempts in past decades to classify countries on the basis of socioeconomic variables have been neither very successful nor free of criticism and difficulties, although the principal components method seems to have been a step in the right direction. This paper attempts to overcome some of the problems of the earlier methods by employing the maximum likelihood factor analysis model. Using 21 variables and 64 countries, selected on the basis of economic development theory, a maximum likelihood factor model is used to estimate the factor scores for each country. Three factors are identified on the basis of the factor loadings and are then used to classify the 64 sample countries. The first factor or dimension, which captures the largest percentage of the variance, is factor F1. It estimates the general level of economic development and the physical quality of life of the sample countries. As such, it demonstrates that both economic and demographic welfare variables can be combined into a single dimension owing to their high intercorrelation.
That economic and demographic welfare variables combine into a single dimension is further reinforced by the F1 factor scores, which provide a country classification consistent with our empirical a priori knowledge. The next factor, F2, estimates a second dimension of development/underdevelopment, one based on an industrial/agricultural sectoral split of the economy. The F2 scores then classify countries on the basis of high industrial/low agricultural activity, which is again consistent with a priori experience. Although these two factors measure logically distinct dimensions of development, they are somewhat correlated, indicating that in practice the general quality of life (F1) tends to go hand in hand, to a certain extent, with extensive industrial activity (F2). Finally, the last factor, F3, estimates the general rate of economic growth in industry, services, and agriculture, which turns out to be uncorrelated with the first two dimensions. The F3 factor scores therefore classify countries on the basis of their growth performance alone, and as such can be used to pick out those countries which have a high potential for future development. A further advantage of the method is that the relative importance of each factor can be measured in terms of the percentage of total variance which the corresponding classification explains. Given the importance of country classification and its relevance to economic policy decisions,⁷ the MLFA model can serve as a superior technique, free of some of the statistical deficiencies attributed to older methods.

⁷ As Kellman (1976) pointed out, the lack of success in designing economic development indicators was a fundamental problem in development economics whenever policy questions such as economic aid and loans from world agencies were involved. This paper provides a reliable technique for achieving such an end.

APPENDIX 1: DATA SOURCES AND THE VARIABLES DEFINED

Variable | Definition | Year | Source of data (a)
--- | --- | --- | ---
X1 | Per capita GNP in United States dollars | 1981 | Table 1, Basic indicators
X2 | Average annual growth rate of per capita GNP (percentage) | 1960-81 | Table 1, Basic indicators
X3 | Average annual growth rate in agricultural sector (percentage) | 1970-81 | Table 2, Growth of production
X4 | Average annual growth rate in industrial sector (percentage) | 1970-81 | Table 2, Growth of production
X5 | Average annual growth rate in manufacturing sector (percentage) | 1970-81 | Table 2, Growth of production
X6 | Average annual growth rate in service sector (percentage) | 1970-81 | Table 2, Growth of production
X7 | Percentage share of agricultural sector in GDP | 1981 | Table 3, Structure of production
X8 | Percentage share of industrial sector in GDP | 1981 | Table 3, Structure of production
X9 | Percentage share of manufacturing sector in GDP | 1981 | Table 3, Structure of production
X10 | Percentage share of service sector in GDP | 1981 | Table 3, Structure of production
X11 | Average index of food production per capita (1969-71 = 100) | 1979-80 | Table 6, Agriculture and food
X12 | Value added in manufacturing (millions of 1975 dollars) | 1980 | Table 7, Industry
X13 | Energy consumption per capita (kilograms of coal equivalent) | 1980 | Table 8, Commercial energy
X14 | Average annual growth of population (percentage) | 1970-81 | Table 19, Population growth, past and projected, and hypothetical stationary population
X15 | Urban population as percentage of total population | 1981 | Table 22, Urbanization
X16 | Adult literacy rate | 1980 | Table 25, Education
X17 | Number of people enrolled in secondary schools as percentage of age group | 1980 | Table 25, Education
X18 | Number of people enrolled in higher education as percentage of population aged 20-24 | 1979 | Table 25, Education
X19 | Life expectancy at birth (years) | 1981 | Table 23, Indicators related to life expectancy
X20 | Infant mortality rate (aged 0-1) | 1981 | Table 23, Indicators related to life expectancy
X21 | Population per physician | 1980 | Table 24, Health-related indicators

(a) Data were collected from World Bank (1983), World Development Report, Oxford University Press, New York. The table numbers cited here refer to those of this publication.

REFERENCES

Abizadeh, S. (1979). "Tax ratio and the degree of economic development," Malayan Economic Review 24(1), 21-34.
Adelman, I., and Morris, C. T. (1965). "Factor analysis of the interrelationships between social and political variables and per capita G.N.P.," Quarterly Journal of Economics 79, 555-578.
Adelman, I., and Morris, C. T. (1967). Society, Politics and Economic Development: A Quantitative Approach, Johns Hopkins Univ. Press, Baltimore.
Adelman, I., and Morris, C. T. (1970). "Factor analysis and gross national product: a reply," Quarterly Journal of Economics 84, 651-662.
Adelman, I., and Morris, C. T. (1979). Economic Growth and Social Equity in Developing Countries, Stanford Univ. Press, Stanford, CA.
Anderson, A. B., Basilevsky, A., and Hum, P. J. (1983). "Missing data," in Handbook of Survey Research (P. H. Rossi, J. D. Wright, and A. B. Anderson, Eds.), Chap. 12, pp. 415-494, Academic Press, New York.
Chelliah, R. J. (1971). "Trends in taxation in developing countries," International Monetary Fund Staff Papers 18, 254-331.
Gillis, M., Perkins, D. H., Roemer, M., and Snodgrass, D. K. (1983). Economics of Development, Norton, New York.
Hicks, N., and Streeten, P. (1979). "Indicators of development: the search for a basic needs yardstick," World Development 7(7), 567-580.
Jöreskog, K. G., and Lawley, D. N. (1968). "New methods in maximum likelihood factor analysis," British Journal of Mathematical and Statistical Psychology 21, 85-95.
Kellman, M. (1976). "The measurement of development effort: a suggestion," Journal of Development Studies 12, 429-437.
Larson, D. A., and Wilford, W. T. (1979). "The physical quality of life index: a useful social indicator?," World Development 7(7), 581-584.
Lawley, D. N., and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method, 2nd ed., Butterworth, London.
Morrison, D. F. (1967). Multivariate Statistical Methods, McGraw-Hill, New York.
Overseas Development Council (1977). The United States and World Development: Agenda for 1977, Praeger, New York.
Overseas Development Council (ODC) (1979). The United States and World Development: Agenda 1979, ODC, Washington, DC.
Ram, R. (1982). "Composite indices of physical quality of life, basic needs fulfilment, and income: a principal component representation," Journal of Development Economics 11, 227-247.
Rhodes, E. C. (1937). "The construction of an index of business activity," Journal of the Royal Statistical Society 100, 18-66.
Smil, V., and Kuz, T. (1979). "China: a quantitative comparison of development, 1950-1970," Economic Development and Cultural Change 27(4), 653-667.
World Bank (1983). World Development Report, Oxford Univ. Press, New York.