Computers, Environment and Urban Systems 59 (2016) 86–94
Computers, Environment and Urban Systems journal homepage: www.elsevier.com/locate/ceus
City size distribution across the OECD: Does the definition of cities matter?
Paolo Veneri
Regional Policy Division, GOV, Organisation for Economic Co-operation and Development (OECD), 2 Rue André-Pascal, 75016 Paris, France
Article info
Article history: Received 2 September 2014; Received in revised form 29 May 2016; Accepted 29 May 2016; Available online xxxx
Keywords: City size distribution; Zipf's law; Rank–size rule; Metropolitan areas
Abstract
This study provides new comparative evidence on city size distribution in OECD (Organisation for Economic Co-operation and Development) countries, using consistently defined functional urban areas (FUAs). FUAs are identified by an algorithm based on gridded population density and commuting patterns, and they are thought to approximate economic agglomerations and their internal spatial organisation more closely. Results show that Zipf's law describes city size distribution better when cities are measured as FUAs rather than with traditional administrative definitions. In addition, Zipf's law describes city size distribution well both at country level and at wider spatial scales, that is, by continent and for the OECD as a whole. Finally, the power law hypothesis, of which Zipf's law is a special case, was not rejected in most countries. © 2016 Elsevier Ltd. All rights reserved.
1. Introduction
In the context of city size, a power law states that, within a given urban system, the frequency of cities of a given population varies as a power of that population. Under the hypothesis of a Pareto distribution, the log(rank)–log(size) relationship is linear, and a slope of −1 indicates that Zipf's law holds (Zipf, 1949). This law implies that the largest city is twice as large as the second largest city, three times as large as the third, and so on along the whole urban hierarchy. The relevance of Zipf's law for city size distribution rests on at least two facts. The first is the desire to understand why activities distributed across space follow a specific pattern. This has been summarised by Krugman's claim that such a stable regularity is ‘spooky’ and calls for a theoretical explanation (Krugman, 1996: 40). The distribution of people and employment across space reflects the spatial allocation of resources, and different allocations, reflected in differently shaped urban systems, may entail different levels of economic efficiency (Storper, 2013). The second is that the validity of Zipf's law, together with its stability over time, may also constrain patterns of urban growth. While individual cities can follow very heterogeneous trajectories of growth and decline, in both the short and the long term, these trajectories are likely to respect the overall structure of the urban system, which remains substantially stable (Duranton, 2007). Hence, theories that aim to explain the determinants and patterns of urban growth should be consistent with a regular city size distribution.
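The rank–size logic stated at the start of this section can be illustrated with a minimal sketch. It assumes a hypothetical urban system that follows Zipf's law exactly, so that the city of rank r has size S_1/r. The largest-city population of 1,200,000 and the helper names `zipf_sizes` and `rank_size_slope` are illustrative only. The sketch then recovers the slope of the log(rank)–log(size) line by OLS.

```python
import math


def zipf_sizes(largest, n):
    """Sizes implied by an exact Zipf's law: the city of rank r has size largest / r."""
    return [largest / r for r in range(1, n + 1)]


def rank_size_slope(sizes):
    """OLS slope of ln(rank) on ln(size), ranking cities from largest to smallest."""
    ordered = sorted(sizes, reverse=True)
    x = [math.log(s) for s in ordered]                      # ln(size)
    y = [math.log(r) for r in range(1, len(ordered) + 1)]   # ln(rank)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


sizes = zipf_sizes(1_200_000, 5)
print(sizes)                                     # [1200000.0, 600000.0, 400000.0, 300000.0, 240000.0]
print(sizes[0] / sizes[1], sizes[0] / sizes[2])  # 2.0 3.0
print(round(rank_size_slope(sizes), 6))          # -1.0
```

With observed city sizes the slope is never exactly −1, and the estimation caveats discussed in Section 2 apply.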
http://dx.doi.org/10.1016/j.compenvurbsys.2016.05.007 0198-9715/© 2016 Elsevier Ltd. All rights reserved.
During the last decade, several attempts have been made to provide a theoretical foundation for Zipf's law. Many of these are based on random growth models, that is, on the hypothesis that the growth of cities is unrelated to their initial size (Gibrat's law). In this respect, Zipf's law can be seen as the steady-state distribution that follows from Gibrat's law. Gabaix (1999) proposes a model in which variations in city size are caused by random amenity shocks. Similar approaches, in which city size is driven by productivity shocks, were proposed by Eeckhout (2004) and Rossi-Hansberg and Wright (2007), while in Duranton (2007), changes in city size are driven by innovation shocks. In a recent study, Lee and Li (2013) developed a model in which a Zipfian city size distribution results from the combination of many random factors, which may be more or less correlated with one another. From a more static perspective, Behrens, Duranton, and Robert-Nicoud (2014) proposed a model in which differences in city size are explained by small differences in productivity. Productivity in turn depends on the talent of residents and on the balance between agglomeration economies and congestion costs. Finally, Hsu (2012) considered the location of cities in geographic space, following a Christallerian approach, and developed a model in which a power law in city size distribution is generated by scale economies in the production of goods. There is ample empirical literature testing the validity of Zipf's law. This regularity has been tested for cities in various countries, including, among others, China (Song & Zhang, 2002), France (Guérin-Pace, 1995), Germany (Giesen et al., 2010), Greece (Petrakos, Mardakis, & Caraveli, 2000) and the United States (Black & Henderson, 2003; Eeckhout, 2004; González-Val, 2010). Zipf's law has also been applied to data other than city populations.
For Chinese provinces, for example, Peng and Xia (2014) investigated the
size distribution of exporting and non-exporting firms. Some empirical studies have also argued that a Zipfian rank–size relationship can be found for many distributions, and thus that it might be a mere statistical phenomenon requiring no theoretical explanation (Gan, Li, & Song, 2006). Zipf's law has been found to be sensitive to the definition of cities (Gomez-Lievano, Youn, & Bettencourt, 2012; Rozenfeld, Rybski, Gabaix, & Makse, 2011). As in many spatial analyses, the validation of Zipf's law can be affected by the modifiable areal unit problem (MAUP). The MAUP is a well-known phenomenon whereby results vary with the way data are aggregated into areal units of different size (Openshaw & Taylor, 1979). Because of the MAUP, the extent to which city size distribution deviates from Zipf's law may depend on how the units of analysis (cities) are defined. To overcome the scaling bias caused by the MAUP, the observation units should correspond closely to the areas that are relevant for the phenomenon under study. In the context of this analysis, the most appropriate definition of cities is taken to be an economic and functional one, rather than an administrative one. Given these premises, the main purpose of this study is to test whether Zipf's law describes city size distribution well across OECD countries when using consistently identified urban agglomerations rather than administratively defined cities. An accurate definition of cities should merge contiguous geographical units that are economically part of the same city but would otherwise be treated as separate units. In this study, the issue of city definition is treated carefully. The units of analysis are functional urban areas (FUAs), consistently identified across countries using a methodology recently proposed in OECD (2012).
FUAs provide an economic definition of cities and consist of urban cores and their commuting zones. Zipf's law has been tested using different definitions of cities, such as ‘economic areas’ (Berry & Okulicz-Kozaryn, 2012), ‘metropolitan areas’ (among others, Ioannides & Overman, 2003), or ‘natural cities’ (Jiang & Jia, 2011). Studying the size distribution of US cities, Berry and Okulicz-Kozaryn (2012) found that the goodness of fit of Zipf's law increases once the units of observation are properly defined. Functionally defined urban areas have also been found to fit the rank–size rule better than administratively defined cities (Cheshire, 1999; Rosen & Resnick, 1980). A meta-analysis of 29 studies by Nitsch (2005) showed that rank–size coefficients are, on average, significantly higher than 1. However, the same coefficients were smaller when cities were considered in their metropolitan definition rather than as inner cities only. These results are in line with those obtained in this study when comparing FUAs with administratively defined cities. Other recent work has tested Gibrat's law across 40 countries by changing the units of analysis from municipalities to ‘integrated urban areas’ formed by contiguous built-up areas (Portnov, 2012; Portnov, Reiser, & Schwartz, 2012). Using the latter definition of cities, the authors found that city growth was positively correlated with city size. This study makes three main novel contributions. First, it uses a functional definition of cities, where ‘functional’ approximates an economic definition based on the extent of the labour market, and compares the results of testing Zipf's law with those obtained using traditional administrative city boundaries. Second, it makes this comparison across OECD countries, consistently applying the same economic definition of cities.
Most existing empirical work on city size distribution has focused on US cities, and cross-country analyses are scarce. Among these, it is worth mentioning Rosen and Resnick (1980) and Soo (2005), who compared the rank–size relationship across countries using both administrative cities and functionally defined urban areas. To the best of our knowledge, this study is the first to test the validity of Zipf's law across different countries by using a
consistent economic definition of cities and by comparing the results with administratively defined cities. In this way, it sheds light on the relevance of the MAUP for testing Zipf's law. In addition, the use of FUAs allows Zipf's law to be tested for urban systems that extend beyond national boundaries. For the whole set of cities in the OECD, Zipf's law fits the FUA data very well, suggesting that a functional definition of cities is appropriate for investigating city size distribution. A third novel contribution of this study is that it compares results obtained at the country level with those at the continent level and for the OECD as a whole. In an era of globalisation, national urban systems could be considered too small, as cities, especially those at the top of the urban hierarchy, are now connected internationally in a global network of socioeconomic relationships. As observed by Jiang, Yin, and Liu (2015), Zipf's law might be a universal law of city size distribution, and failure to observe it might be due to too narrow a scope of analysis, such as national boundaries. In addition, some countries simply have too few cities for Zipf's law to be observed. The use of FUAs ensures a meaningful comparison of cities belonging to different countries, as the same identification method is applied to all countries. The rest of this study is organised as follows. Section 2 reviews the empirical literature on Zipf's law, highlighting the main findings and open issues. Section 3 describes the data and the method used in the analysis. Section 4 presents the results on the validation of Zipf's law, including tests of the functional form of city size distribution. These results are reported for individual OECD countries and for wider urban systems, at the continent level and for the whole OECD.
Section 5 compares the main results with those obtained using administratively defined cities, while Section 6 concludes.
2. Zipf's law literature: a reminder
A Zipf distribution is a particular type of Pareto distribution, which can be represented as in Eq. (1):
$$y = \frac{a}{S^{\zeta}} \qquad (1)$$
where S is city size in terms of population; y is the number of cities with population greater than S; and a is a positive constant, equal to the population of the largest city when ζ = 1. In this case, the size of a city times its rank (the number of cities of larger size) is constant. Zipf's law holds when ζ = 1. Zipf's law can be approximated empirically by a deterministic rule called the rank–size rule, obtained by log-transforming Eq. (1) into the following linear equation:
$$\ln(y) = \ln(a) - \zeta \ln(S) \qquad (2)$$
where ζ can be estimated by ordinary least squares (OLS) and should be close to 1 when Zipf's law holds. Several issues should be considered before attempting to verify the validity of Zipf's law. First, the rank–size rule is only an approximation of Zipf's law. Hence, the latter can still hold when the rank–size rule is only partially verified (Gabaix & Ioannides, 2004). Second, the rank–size rule approximates Zipf's law well when only large cities are considered, but not necessarily when all cities are included. In the latter case, city size distribution has been found to follow a lognormal rather than a Pareto distribution (Eeckhout, 2004; Parr & Suzuki, 1973). This may explain the high sensitivity of ζ to the minimum city size threshold in the data (Fazio & Modica, 2015). In an analysis of the United States, González-Val (2010) found that Zipf's law holds only if the sample is sufficiently restricted to the top of the distribution. Using untruncated city size data for eight developed countries, Giesen and Südekum (2011) found that the double Pareto lognormal distribution is the one with the
best fit to the data. Similar results have been obtained recently by González-Val, Ramos, Sanz, and Vera-Cabello (2015). A third point concerns the properties of OLS for estimating ζ. As argued by Gabaix and Ioannides (2004), the OLS estimate of ζ is likely to be biased downward, especially in small samples, as the largest cities appear too big. In addition, OLS standard errors can be underestimated and, as a consequence, Zipf's law can be rejected too often. Gabaix and Ibragimov (2011) proposed a simple way to overcome this problem within an OLS framework by running $\ln(y - 1/2) = \ln(a) - \zeta \ln(S)$. They also propose replacing the OLS standard error with the asymptotic one, equal to $(2/n)^{1/2}\zeta$, where n is the number of cities. Finally, the validation of Zipf's law should not rely on the statistical acceptance or rejection of the hypothesis that ζ = 1. As already argued, this rule cannot be taken too strictly, as the rank–size rule only approximates the underlying Zipf's law. Hence, as Gabaix and Ioannides (2004) suggested, the empirical debate on Zipf's law should focus more on how well the law fits the data than on its strict statistical acceptance or rejection. In this respect, an estimated value of ζ close to 1, between 0.8 and 1.2, indicates that Zipf's law describes city size distribution in an urban system well. For the sake of robustness, however, this study also reports the t-test of the null hypothesis that ζ = 1.
3. Data
3.1. Units of analysis
Carrying out international comparisons requires comparable units of analysis. It is well known that countries define cities differently, which can make cross-country comparisons meaningless. To overcome this problem, OECD (2012) recently proposed a harmonised methodology for identifying FUAs, which has been applied to 29 countries.
FUAs are composed of a core and a commuting zone, consistent with most of the algorithms used for the same purpose (Cheshire & Hay, 1989). The OECD methodology applied in this study starts by identifying the urban core(s) of each area. Cores are identified using residential density thresholds for each 1-km² cell of a regular grid. For European countries, population grid data are provided by the Joint Research Centre for the European Environment Agency (EEA), while for all other countries harmonised gridded population data are provided by LandScan.¹ More specifically, all cells with at least 1500 inhabitants are selected as urban core cells.² The urban core is then obtained by aggregating all contiguous local administrative units (LAUs), such as wards, municipalities and census tracts, in which more than 50% of the area is covered by urban core cells, provided that their total population exceeds 50,000 inhabitants.³ The second step consists in verifying whether two or more cores are part of a single polycentric metropolitan area, rather than treating each core separately. This makes it possible to detect regions whose spatial structure is more complex than the traditional monocentric one, composed of a single urban core and its surrounding hinterland. Two or more cores are considered part of the same FUA if at least 15% of the resident population of one core commutes to the other core. Data on commuting flows at the municipal level are needed to determine the relationships among different urban cores.
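The first two steps of the procedure can be sketched as follows. This is a minimal illustration, not the OECD's implementation. It assumes hypothetical inputs: grid-cell populations, the area of each cell lying in each LAU, LAU adjacency, and LAU-to-LAU commuting counts. It uses the thresholds stated above, ignores the country-specific exceptions mentioned in the footnotes, and omits the commuting-zone step described in the following paragraph. All function and variable names are hypothetical.

```python
from collections import deque

# Thresholds as stated in the text (OECD, 2012). The country-specific
# exceptions (e.g. Australia, Canada, the United States) are NOT modelled.
CELL_MIN_POP = 1500        # inhabitants in a 1-km2 grid cell
LAU_MIN_CORE_SHARE = 0.5   # share of LAU area in core cells (must be exceeded)
CORE_MIN_POP = 50_000      # population of an urban core (must be exceeded)
COMMUTE_MIN_SHARE = 0.15   # commuting share that links two cores


def core_laus(cell_pop, lau_cell_area):
    """Step 1a: LAUs with more than 50% of their area in high-density cells.

    cell_pop:      {cell_id: population of the 1-km2 cell}
    lau_cell_area: {lau_id: {cell_id: km2 of that cell lying inside the LAU}}
    """
    selected = set()
    for lau, overlaps in lau_cell_area.items():
        total = sum(overlaps.values())
        dense = sum(area for cell, area in overlaps.items()
                    if cell_pop.get(cell, 0) >= CELL_MIN_POP)
        if total > 0 and dense / total > LAU_MIN_CORE_SHARE:
            selected.add(lau)
    return selected


def urban_cores(selected, adjacency, lau_pop):
    """Step 1b: group contiguous selected LAUs and keep groups above 50,000."""
    seen, cores = set(), []
    for start in sorted(selected):
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:  # breadth-first search over adjacent selected LAUs
            lau = queue.popleft()
            group.add(lau)
            for neighbour in adjacency.get(lau, ()):
                if neighbour in selected and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if sum(lau_pop[lau] for lau in group) > CORE_MIN_POP:
            cores.append(frozenset(group))
    return cores


def merge_polycentric(cores, flows, lau_pop):
    """Step 2: merge cores if >= 15% of one core's residents commute to the other.

    flows: {(origin_lau, destination_lau): number of daily commuters}
    """
    parent = list(range(len(cores)))  # union-find over core indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, origin in enumerate(cores):
        residents = sum(lau_pop[lau] for lau in origin)
        for j, dest in enumerate(cores):
            if i == j:
                continue
            commuters = sum(n for (o, d), n in flows.items()
                            if o in origin and d in dest)
            if commuters / residents >= COMMUTE_MIN_SHARE:
                parent[find(i)] = find(j)

    merged = {}
    for i, core in enumerate(cores):
        merged.setdefault(find(i), set()).update(core)
    return [frozenset(group) for group in merged.values()]


# Toy example: four hypothetical LAUs (A-D) overlapping four grid cells.
cell_pop = {"c1": 3000, "c2": 2000, "c3": 200, "c4": 4000}
lau_cell_area = {"A": {"c1": 1.0}, "B": {"c2": 0.8, "c3": 0.2},
                 "C": {"c3": 1.0}, "D": {"c4": 1.0}}
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
lau_pop = {"A": 40_000, "B": 30_000, "C": 5_000, "D": 60_000}
flows = {("D", "A"): 10_000}  # about 17% of D's residents work in A

cores = urban_cores(core_laus(cell_pop, lau_cell_area), adjacency, lau_pop)
print(cores)                                      # cores {A, B} and {D}
print(merge_polycentric(cores, flows, lau_pop))   # one merged core {A, B, D}
```

In the toy data, LAU C is excluded because none of its area lies in a dense cell. A–B and D therefore form separate cores, and only the D-to-A commuting flow (10,000 of 60,000 residents) links them into one polycentric area. Union-find is used so that chains of linked cores end up in a single group.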
¹ Source: http://www.ornl.gov/sci/landscan/landscan_references.shtml.
² There are exceptions to this threshold for non-European countries such as Australia, the United States and Canada. See OECD (2012) for more details.
³ Although there is a minimum threshold of 50,000 inhabitants for identifying the urban core, the total population of an FUA may be slightly lower than that amount. This may be either because less than 50% of a municipality's area is covered by urban core cells (so that the municipality is excluded) or because of differences between census and gridded population data.
The same data are used for the third and last step of the procedure, which identifies the area of influence of each core (its commuting zone). All municipalities in which more than 15% of the resident population commutes daily to the core are considered part of the commuting zone of the FUA. This threshold is arbitrary in certain respects, but it is consistent with those used by other official country-based methodologies and was set following a sensitivity analysis. A full description of the methodology and its results can be found in OECD (2012). Table 1 summarises the number of FUAs identified, together with basic population statistics, for each of the 29 OECD countries included in this analysis. Population data for all FUAs come from the 2001 population census (or the closest year available) of each country. Population censuses provide precise information on resident population at the municipal level, which is then aggregated to the FUA level. The whole data set includes 1179 FUAs, ranging from 40,000 to 33 million inhabitants (Tokyo, Japan). The number of FUAs per country also varies widely, reflecting country size, from a single FUA in Luxembourg to 262 in the United States. A comparison between FUAs and administratively defined cities is provided in the Appendix (Table A1). On average, FUAs have more than double (2.2 times) the population of administratively defined cities.
4. Rank–size regularities at different spatial levels
4.1. City size distribution by country
Consistent with most previous studies, the fit of Zipf's law to city size distribution at country level is assessed only for countries with at least 20 cities (Giesen & Südekum, 2011).
Table 2 reports the OLS estimates of ζ from both the classic rank–size equation and the version proposed by Gabaix and Ibragimov (2011) (G–I), in which rank enters as ln(rank − 1/2). The table also reports the R² and the t-test of the null hypothesis that the estimated ζ equals 1, that is, that Zipf's law holds. The results show that the estimated ζ is close to 1 under both the traditional and the G–I equations. However, with the traditional equation the t-test fails to reject the hypothesis that ζ = 1 in only three cases out of 12, whereas the same hypothesis is never rejected with the G–I correction. Gabaix and Ioannides (2004: 2350) suggested that Zipf's law should not be tested but only estimated, because it should be evaluated on its capacity to fit the data. The Zipf plots in Fig. 1 further support the hypothesis that Zipf's law describes the rank–size relationship of urban systems in OECD countries well. In the case of Japan, the rank–size relationship appears slightly convex, an issue noted in the literature for the world's largest cities (Ettlinger & Archer, 1987). Fig. 1 also shows that the accuracy of the predicted size of the largest city differs across countries. In some cases the regression underpredicts the population of the largest city (France, Japan, UK), in others it overpredicts it (Germany, Italy, Poland, USA), and in yet others the prediction is fairly accurate (Spain, Mexico, Korea, the Netherlands). The deviation of the largest cities from the fitted rank–size relationship has recently been analysed by Portnov (2011). He found that the population gap between the first and second largest cities is positively correlated with the largest city's status as capital and negatively correlated with the country's stage of development. In summary, the results in Table 2 and Fig. 1 show that Zipf's law fits city size distribution at country level well when using FUAs. Differences across countries in the magnitude of the estimated coefficients remain, although they should not necessarily be considered deviations from Zipf's law (under the G–I specification, the t-test never rejects the hypothesis that ζ = 1). The factors underlying these differences have not yet been investigated extensively. In a meta-analysis of 1669 estimates from 59 studies, Mulder (2014) shows that economic and institutional factors explain little of the variation in Zipf coefficients, whereas time and continent explain much of it. One interpretation is that history and geography matter, although this might also depend on how the boundaries of urban systems are defined, an issue addressed in the following paragraphs.

Table 1
Population in OECD functional urban areas: basic statistics by country, 2001.
Source: Elaboration based on OECD (2012).

Country               No. of      Average    Std. dev.      Minimum      Maximum  Country pop.
                        FUAs   population                population   population   (thousands)
Austria                    6      742,213      850,919      243,858    2,454,241          8042
Belgium                   11      552,942      633,100      118,732    2,273,476        10,281
Canada                    34      635,658    1,103,443       75,385    5,450,470        31,019
Chile                     26      438,978    1,138,009       49,503    5,929,563        15,572
The Czech Republic        16      294,148      405,105       72,858    1,682,032        10,224
Denmark                    4      727,887      794,685      269,774    1,915,285          5357
Estonia                    3      248,685      247,259       73,275      531,481          1367
Finland                    7      372,637      440,979      115,903    1,356,482          5188
France                    83      457,885    1,195,887       80,123   10,900,000        61,163
Germany                  109      481,033      628,987       78,946    4,334,215        82,340
Greece                     9      621,646    1,175,550       70,006    3,671,587        10,950
Hungary                   10      497,272      808,262      134,433    2,790,878        10,188
Ireland                    5      418,651      561,272       90,743    1,413,073          3864
Italy                     74      395,804      727,784       50,190    3,867,226        56,977
Japan                     76    1,262,851    4,179,645      125,814   32,700,000       127,132
Korea                     45      864,218    3,007,069       45,262   20,100,000        47,357
Luxembourg                 1      388,217            –      388,217      388,217           442
Mexico                    77      748,054    1,984,947      117,829   17,200,000       102,122
The Netherlands           35      333,772      416,115       59,569    2,175,368        16,043
Norway                     6      338,455      371,323       72,758    1,073,554          4513
Poland                    58      362,511      536,363       65,175    2,881,670        38,248
Portugal                  13      426,444      740,392       63,470    2,650,467        10,293
The Slovak Republic        8      247,154      196,904      113,259      689,848          5380
Slovenia                   2      381,648      212,694      231,250      532,045          1992
Spain                     76      358,422      740,752       47,652    5,533,488        40,721
Sweden                    12      392,312      506,827       96,883    1,838,377          8896
Switzerland               10      409,945      333,676      116,145    1,114,737          7285
UK                       101      425,160    1,063,771       82,384   10,600,000        59,113
USA                      262      725,646    1,688,834       40,373   16,100,000       285,225

Table 2
Results of OLS regression for Eq. (2) and its corrected version (G–I).
Source: Elaborations on OECD data.

                 Rank–size                         (Rank − 1/2)–size (G–I)
Country            ζ coeff.     R²  t-test (ζ = −1)  ζ coeff.     R²  t-test (ζ = −1)
Canada               −0.798   0.99        169.12***    −0.887   0.97           −0.541
Chile                −0.832   0.94         14.55***    −0.962   0.96           −0.144
France               −1.128   0.97         33.47***    −1.209   0.97            1.114
Germany              −1.106   0.96         26.59***    −1.161   0.95            1.024
Italy                −1.019   0.97             0.71    −1.092   0.96            0.511
Japan                −0.957   0.93              1.9    −1.035   0.94            0.210
Korea                −0.715   0.99        467.37***    −0.788   0.99           −1.273
Mexico               −0.999   0.95                0    −1.072   0.95            0.418
The Netherlands      −1.067   0.98           5.46**    −1.187   0.96            0.660
Poland               −1.008   0.99             0.26    −1.087   0.97            0.433
Spain                −0.997   0.98             0.03    −1.068   0.97            0.391
UK                   −1.215   0.96         73.66***    −1.292   0.97            1.606
USA                  −0.864   0.97         183.6***    −0.888   0.96           −1.449
* p < .1. ** p < .05. *** p < .01.

4.2. City size distribution in supranational urban systems
Empirical analysis (Cheshire & Magrini, 2009) shows that national boundaries significantly affect economic adjustments and spatial disparities, even in integrated markets such as Europe. Nevertheless, the Zipfian shape of city size distribution is not necessarily confined to the country level. Gibrat's law, and therefore Zipf's law, tends to hold at different spatial scales. Giesen and Südekum (2011) found that Zipf's law fits German city size distribution at both the national and the regional level. Using the OECD definition of FUAs, this study provides some evidence on city size distribution for aggregations of countries. More specifically, OECD countries are aggregated by continent as well as into a single OECD-wide urban system, to check whether Zipf's law still fits data on developed countries well. Fig. 2 shows the Zipf plots for Europe, America, Asia and the whole sample of OECD countries. With the exception of Asia (Japan and Korea), all plots show an almost perfectly linear relationship over most of the distribution. The estimated coefficients are reported in Table 3, which shows that Zipf's law fits the data well even when countries are aggregated by continent or at the OECD level. When all OECD cities are considered, the hypothesis of a coefficient exactly equal to −1 cannot be rejected. With the G–I correction, the statistical validity of Zipf's law is also accepted for the American and Asian aggregations. Interestingly, the Zipf coefficient is almost exactly equal to −1 when all cities are considered part of an integrated urban system. This treatment seems reasonable for developed and economically integrated countries such as those considered here. It suggests that Zipf's law may hold as a universal law, valid for urban systems of different scopes. The absolute value of the coefficient on city size ranges from a minimum of 0.86 for Asia to a maximum of 1.14 for Europe.
Consistent with ESPON (2005), Europe emerges as the most polycentric continent, while Asia has the highest urban primacy, with the largest cities dominating the whole urban system. The degree of polycentricity of an urban system is measured by the magnitude of the coefficient on the logarithm of city size: the higher the coefficient in absolute value, the more balanced the spatial structure of the urban system (Brezzi & Veneri, 2015; Meijers & Burger, 2010).
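The G–I correction and the asymptotic standard error described in Section 2 underlie the t-tests reported for the continental aggregates. They can be sketched as follows. This is a minimal illustration assuming plain Python lists of city populations, and the function names are hypothetical. As a check, the t-statistic formula is applied to the G–I coefficients of Table 3. For each aggregate, n is taken as the number of FUAs obtained by summing the relevant rows of Table 1 (America 399, Asia 121, Europe 659, OECD 1179). That choice of n is an assumption about how the table was computed.

```python
import math


def ols(x, y):
    """Slope and intercept of a simple OLS regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gi_zipf(sizes):
    """Gabaix-Ibragimov regression ln(rank - 1/2) = ln(a) - zeta * ln(S).

    Returns zeta (as a positive number) and its asymptotic standard
    error zeta * (2 / n) ** 0.5, where n is the number of cities.
    """
    ordered = sorted(sizes, reverse=True)
    n = len(ordered)
    x = [math.log(s) for s in ordered]
    y = [math.log(rank - 0.5) for rank in range(1, n + 1)]
    slope, _ = ols(x, y)
    zeta = -slope
    return zeta, zeta * math.sqrt(2.0 / n)


def zipf_t_stat(zeta, n):
    """t-statistic for H0: zeta = 1 using the asymptotic standard error."""
    return (zeta - 1.0) / (zeta * math.sqrt(2.0 / n))


def within_zipf_band(zeta, low=0.8, high=1.2):
    """Rule of thumb from Section 2: Zipf's law describes the data if 0.8 <= zeta <= 1.2."""
    return low <= zeta <= high


# Synthetic check on sizes that follow Zipf's law exactly (S_r = S_1 / r).
zeta, se = gi_zipf([1_000_000 / r for r in range(1, 1001)])
print(round(zeta, 3), round(se, 3))

# G-I coefficients from Table 3 (absolute values) and FUA counts from Table 1.
for name, z, n in [("America", 0.921, 399), ("Asia", 0.864, 121),
                   ("Europe", 1.143, 659), ("OECD", 1.015, 1179)]:
    print(name, round(zipf_t_stat(z, n), 2), within_zipf_band(z))
```

The loop prints t-statistics of about −1.21, −1.22, 2.27 and 0.36. Table 3 reports 1.21, 1.22, 2.27 and 0.35, that is, the magnitudes. The small OECD difference presumably reflects rounding of the published ζ. For exactly Zipfian synthetic sizes, the G–I estimate is close to, though not exactly, 1, because the −1/2 shift affects the lowest ranks most.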
Fig. 1. Zipf plots for 12 OECD countries using consistently identified functional areas. Source: Elaboration on OECD data.
4.3. Pareto vs. lognormal: testing city size distribution
If Zipf's law fits urban population data, city sizes should follow a Pareto distribution. As discussed in Section 2, however, several analyses have found that a lognormal distribution better describes city size data, especially when all cities are included in the analysis. This study conducts two simple tests to provide some evidence on the shape of the FUA size distribution and, in turn, to strengthen the results on Zipf's law presented above. The first test checks whether the hypothesis of a lognormal distribution is rejected. Following Soo (2012), this can be done by running the standard Shapiro–Wilk and Shapiro–Francia tests for normality on the log of city population. Table 4 shows the results of both tests for all countries with at least 20 cities, as well as by continent and for the whole OECD urban system. The hypothesis of a lognormal distribution is rejected in all cases at the 95% confidence level. In addition, the z-statistics are considerably higher when the city size distribution is tested at the continent level and for the whole OECD than for national urban systems. The second test checks whether a power law describes city size distribution. Proposed by Gabaix (1999), it checks for quadratic deviations in the rank–size equation, as specified in Eq. (3), which embeds the −1/2 correction of Gabaix and Ibragimov (2011):
$$\ln(i - 1/2) = \ln(a) - \zeta \ln(S_i) + q\,(\ln S_i - \gamma)^2 \qquad (3)$$
where $\gamma \equiv \operatorname{cov}\big((\ln S_i)^2, \ln S_i\big) / \big(2\operatorname{var}(\ln S_i)\big)$; i is the rank of a city; and S_i is its population. The parameters ζ and q are estimated by OLS. A high value of |q| indicates a deviation from a power law distribution. The test rejects the power law hypothesis if |q| > q_c, where $q_c = 2.57\,\zeta^2/(2N)^{1/2}$, N is the number of cities and 2.57 is the critical value of the standard normal distribution at the 99% confidence level (Gabaix, 1999). Columns 6–8 of Table 4 show the results of the power law test. At country level, pure power law behaviour is rejected only for Mexico, Poland and Spain; in all other cases, the power law is not rejected. At wider spatial scales, the power law is not rejected for the European and American urban systems, whereas it is rejected for Asian cities and for the whole set of OECD cities. These results further support the idea that Zipf's law generally fits the data well, including beyond the boundaries of national urban systems.
5. Does the city definition matter?
This section examines whether the shape of city size distribution differs substantially when cities are defined comparably on the basis of economic functions rather than administrative boundaries. Administratively defined cities were compared with FUAs for the OECD countries included in this analysis, with the exception of Canada, the United States, the United Kingdom, Japan and Korea (see Table A2 in the Appendix for descriptive statistics on urban population when cities are administratively defined). These countries were excluded because determining the proper administrative boundaries at city level was not straightforward and required some discretionary choices.
Fig. 2. Zipf plots for Europe, America, Asia and OECD functional urban areas. Source: Elaborations on OECD data.
For the remaining countries, the first main finding is that Zipf's law approximates the actual city size distribution even when administrative definitions of cities are used. This holds both for countries and for larger geographical domains: the estimated coefficients of Eq. (2) for administrative cities always lie between 0.8 and 1.2 (Table 5). Again, Zipf's law is less likely to be statistically rejected with the G–I correction. A closer look at the estimates in Table 5 highlights some of the differences that emerge when different city definitions are used. To begin with, when FUAs are used, the estimated coefficients are always closer to −1 than when administrative units are used, both within countries and within continents. Two factors might explain the better fit of Zipf's law for FUAs. First, the actual size of cities should be better identified when economic self-organisation, for example through commuting flows, is taken into account rather than administrative boundaries. Second, identifying city boundaries consistently through the same functional approach ensures higher comparability, as local administrative boundaries can be very different across countries. For example, the average population of municipalities ranges from fewer than 2000 inhabitants in France, the Czech Republic and the Slovak Republic to more than 50,000 inhabitants in Denmark, Lithuania and the United Kingdom.

Table 3
OLS results for Eq. (2) and its corrected version (G–I) by continent.
Source: Elaborations on OECD data.

                 Rank–size                         (Rank − 1/2)–size (G–I)
Country            ζ coeff.     R²  t-test (ζ = −1)  ζ coeff.     R²  t-test (ζ = −1)
America              −0.903   0.97        129.98***    −0.921   0.96             1.21
Asia                 −0.820   0.93         75.06***    −0.864   0.93             1.22
Europe               −1.127   0.97        274.63***    −1.143   0.97           2.27**
OECD (29)            −1.005   0.97             1.07    −1.015   0.97             0.35
* p < .1. ** p < .05. *** p < .01.

Another difference between administrative and functionally defined city boundaries is visible in Fig. 3. In most cases, a linear relationship between log(rank) and log(size) fits better when a functional definition of cities is used rather than an administrative one, as already suggested by the higher average R² reported in Table 5. In summary, a functional definition of cities yields a rank–size coefficient closer to −1 and it
Table 4 Results of lognormality and power law tests for city size distribution in OECD countries (Functional urban areas). Country
(1)
Canada Chile France Germany Italy Japan Korea Mexico The Netherlands Poland Spain UK USA America Europe Asia OECD
Lognormality test
Power law test (Gabaix, 2009)
Shapiro–Wilk test
Shapiro–Francia test
|q|
qc
0 If rejecting power law; 1 otherwise
(2)
(4)
(6)
(7)
(8)
0.100 0.041 0.016 0.219 0.037 0.035 0.021 0.062 0.177 0.125 0.081 0.023 0.116 0.117 0.056 0.150 0.106
0.943 0.404 0.185 0.606 0.108 0.361 0.597 0.046 0.676 0.005 0.033 0.139 0.549 0.262 0.408 0.029 0.024
1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 0
(3)
(5)
z
p Value
z
p Value
2.55 3.16 4.57 3.49 4.61 5.36 3.41 3.84 1.70 3.50 3.71 4.81 5.86 6.79 8.26 4.58 9.81
0.005 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.045 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
2.39 3.17 4.37 3.25 4.31 5.05 3.30 3.70 1.77 3.25 3.56 4.59 5.45 6.33 7.74 4.36 9.25
0.008 0.001 0.000 0.001 0.000 0.000 0.000 0.000 0.038 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
92
P. Veneri / Computers, Environment and Urban Systems 59 (2016) 86–94
Table 5 OLS results for Eq. ((2) by type of city definition. Source: Elaborations on OECD data. FUA
Chile France Germany Italy Mexico The Netherlands Poland Spain OECD (22) America Europe France Germany
Administrative city
Beta coeff. (rank–size)
sq∙R
t-test β = −1
Beta coeff. (rank–size)
sq∙R
t-test β = −1
−0.832 −1.128 −1.106 −1.019 −1.123 −1.067 −1.008 −0.997 −1.098 −1.019 −1.106 −1.128 −1.106
0.94 0.97 0.96 0.97 0.93 0.98 0.99 0.98 0.96 0.94 0.97 0.97 0.96
14.55*** 33.47*** 26.59*** 0.71 12.29*** 5.46** 0.26 0.03 131.01*** 0.52 148.98*** 33.47*** 26.59***
−1.563 −1.142 −1.107 −1.114 −1.149 −1.345 −1.169 −1.090 −1.15 −1.17 −1.19 −1.14 −1.11
0.80 0.97 0.97 0.99 0.88 0.95 0.97 0.98 0.95 0.90 0.97 0.97 0.97
12.64*** 34.61*** 32.33** 70.18*** 8.99*** 39.02*** 41.57*** 20.37*** 212.86*** 19.95*** 419.9*** 34.61*** 32.33***
FUA
Chile France Germany Italy Mexico The Netherlands Poland Spain OECD (22) America Europe France Germany * ** ***
Administrative city
Beta coeff. (Rank-1/2)–size
sq∙R
t-test β = −1
Beta coeff. (Rank-1/2)–size
sq∙R
t-test β = −1
−0.962 −1.209 −1.161 −1.092 −1.072 −1.187 −1.087 −1.068 −1.110 −1.070 −1.120 −1.100 −1.080
0.96 0.97 0.95 0.96 0.95 0.96 0.97 0.97 0.96 0.93 0.96 0.81 0.82
−0.144 1.114 1.024 0.511 0.418 0.660 0.433 0.391 1.83* 0.49 1.81* 0.58 0.56
−1.74 −1.22 −1.17 −1.19 −1.21 −1.50 −1.26 1.17 −1.17 −1.23 −1.21 −1.22 −1.17
0.76 0.96 0.96 0.99 0.85 0.94 0.95 0.97 0.94 0.88 0.96 0.96 0.96
1.53 1.16 1.05 0.99 1.08 1.39 1.10 0.88 2.56** 1.33 2.89*** 1.16 1.05
p b .1. p b .05. p b .01.
Fig. 3. Zipf plot: administrative vs. functional definition of city.
P. Veneri / Computers, Environment and Urban Systems 59 (2016) 86–94
93
Table 6 Results of lognormality and power law tests for city size distribution in OECD countries (administrative cities). Country
Lognormality test
Power law test (Gabaix, 1999)
Shapiro–Wilk test
Chile France Germany Italy Mexico The Netherlands Poland Spain America (Chile and Mexico) Europe
Shapiro–Francia test
z
p Value
z
p Value
−1.07 3.85 4.16 4.43 0.94 2.29 4.25 3.76 1.37 7.55
0.858 0.000 0.000 0.000 0.174 0.011 0.000 0.000 0.085 0.000
−1.24 3.71 3.90 4.15 0.82 2.33 3.88 3.57 1.18 7.06
0.893 0.000 0.000 0.000 0.207 0.010 0.000 0.000 0.118 0.000
increases, on average, the goodness of fit of Zipf's law both at country level and wider geographical scales. The two tests proposed in the previous section were performed to investigate whether the size of administratively defined cities is distributed according to a Pareto or lognormal distribution. Compared with results obtained for FUAs, the hypothesis of lognormal distribution cannot be rejected in the case of Chile and Mexico, while the same hypothesis is rejected in all the other countries. Regarding the second test, it is found that three out of eight countries rejected the hypothesis of a Pareto distribution. As for city size distributions at continent level, the two tests do not allow either of the two distributions to be rejected in the case of Mexican and Chilean administrative cities taken together. Both distributions are instead rejected for Europe. On the whole, a power law seems to better describe the city size distribution than a lognormal function does (Table 6).
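The estimators used in Sections 4 and 5 can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the author's code: it assumes that Eq. (2) is the OLS regression of ln(rank) on ln(size), that the G–I version replaces rank with rank − 1/2 and uses the standard error |ζ|(2/n)^{1/2} proposed by Gabaix and Ibragimov (2011), and that the power law test regresses ln(rank − 1/2) on a constant, ln S_i and (ln S_i − γ)^2, with γ and q_c as defined above. All function names are hypothetical.

```python
import numpy as np
from dataclasses import dataclass


def log_rank_and_size(sizes, shift=0.0):
    """Sort sizes in descending order; return ln(rank - shift) and ln(size)."""
    s = np.sort(np.asarray(sizes, dtype=float))[::-1]
    rank = np.arange(1, s.size + 1, dtype=float)
    return np.log(rank - shift), np.log(s)


def rank_size_slope(sizes, shift=0.0):
    """OLS slope of ln(rank - shift) on ln(size).

    shift=0.0 is the plain rank-size regression; shift=0.5 is the
    Gabaix-Ibragimov 'rank - 1/2' variant. Zipf's law implies a slope near -1.
    """
    y, x = log_rank_and_size(sizes, shift)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def gi_t_stat(sizes):
    """t statistic for H0: slope = -1, using the G-I standard error |slope| * sqrt(2/n)."""
    slope = rank_size_slope(sizes, shift=0.5)
    se = abs(slope) * np.sqrt(2.0 / len(sizes))
    return (slope + 1.0) / se


@dataclass
class PowerLawTest:
    zeta: float     # coefficient on ln(size); negative, about -1 under Zipf's law
    q: float        # coefficient on the quadratic (curvature) term
    q_crit: float   # 2.57 * zeta**2 / sqrt(2N)
    rejected: bool  # True if |q| > q_crit


def power_law_test(sizes, z_crit=2.57):
    """Curvature test: regress ln(rank - 1/2) on 1, ln S and (ln S - gamma)^2."""
    y, x = log_rank_and_size(sizes, shift=0.5)
    # gamma = cov((ln S)^2, ln S) / (2 var(ln S)) makes the quadratic term
    # uncorrelated with ln S, so zeta equals the slope of the linear fit.
    gamma = np.cov(x**2, x)[0, 1] / (2.0 * np.var(x, ddof=1))
    X = np.column_stack([np.ones_like(x), x, (x - gamma) ** 2])
    (_a, zeta, q), *_ = np.linalg.lstsq(X, y, rcond=None)
    q_crit = z_crit * zeta**2 / np.sqrt(2.0 * x.size)
    return PowerLawTest(float(zeta), float(q), float(q_crit), bool(abs(q) > q_crit))


# Usage: an exact power law with slope -1 (sizes proportional to 1/(rank - 1/2)).
sizes = 1e6 / (np.arange(1, 501) - 0.5)
print(rank_size_slope(sizes, shift=0.5), gi_t_stat(sizes), power_law_test(sizes))
```

On such an exact power law the slope is −1, the t statistic and q are numerically zero, and the power law is not rejected; the same |q| > q_c rule is the one reported in the last columns of Tables 4 and 6.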
6. Concluding remarks
This study provides new statistical evidence on the shape of the city size distribution in OECD countries. The evidence is based on a consistent definition of the units of analysis, which are identified through functional criteria and allow a robust measurement of the actual economic size of cities. The main result is that Zipf's law fits the data on city size distribution well, and that the fit improves when functionally defined urban areas are used instead of administratively defined cities. This finding supports the idea that part of the deviation from a Zipfian coefficient may depend on the choice of the units of analysis, which recalls the long-standing debate on the modifiable areal unit problem (MAUP) in the study of spatial phenomena. Besides the importance of a sound spatial definition of cities, this study shows that the boundaries of the urban system can also affect the fit of Zipf's law. Country boundaries might be an artificially narrow scope of analysis for a universal law on the size distribution of cities, which are increasingly and strongly connected in many respects (Jiang et al., 2015). The fact that treating the whole set of OECD cities as a single integrated urban system yields a very good fit of Zipf's law, with a coefficient almost exactly equal to −1, supports this idea. A possible next step is to examine whether the shape of the city size distribution has changed over a long time horizon. This would require adjusting city boundaries at each point in time, which is not an easy task if the actual economic size of cities is to be taken into account. Extending the analysis over time would also make it possible to investigate whether the empirical regularity of the city size distribution (Zipf's law) is associated with the independence of city growth from city size (Gibrat's law) in different countries, using comparable spatial units.
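Table 6 (continued) Power law test (Gabaix, 1999), administrative cities.
Country                       |q|      q_c       Power law not rejected (1) / rejected (0)
Chile                         1.63     429.88    1
France                        0.11       0.03    0
Germany                       0.17       0.21    1
Italy                         0.07       0.02    0
Mexico                        0.54      14.13    1
The Netherlands               0.22       2.81    1
Poland                        0.22       1.22    1
Spain                         0.12       0.03    0
America (Chile and Mexico)    0.459      7.894   1
Europe                        0.189      0.161   0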
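The lognormality columns of Tables 4 and 6 can be illustrated with SciPy, assuming (as is standard) that lognormality of city sizes is tested as normality of their logarithms. This is a sketch, not necessarily the procedure or software used for the paper: SciPy's `stats.shapiro` returns the W statistic and its p value rather than the z score reported in the tables, and SciPy has no Shapiro–Francia routine, so that test is coded by hand with Blom scores and Royston's normal approximation (the one used, for instance, by `sf.test` in the R package nortest). The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def lognormality_tests(sizes):
    """Test normality of ln(size): Shapiro-Wilk (SciPy) and Shapiro-Francia (by hand)."""
    x = np.sort(np.log(np.asarray(sizes, dtype=float)))
    n = x.size  # Royston's approximation below is intended for 5 <= n <= 5000

    sw = stats.shapiro(x)  # W statistic and its p value

    # Shapiro-Francia W': squared correlation between the ordered data and
    # Blom scores, an approximation to expected normal order statistics.
    blom = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    w_sf = np.corrcoef(x, blom)[0, 1] ** 2

    # Royston's normal approximation for ln(1 - W').
    u = np.log(n)
    v = np.log(u)
    mu = -1.2725 + 1.0521 * (v - u)
    sigma = 1.0308 - 0.26758 * (v + 2.0 / u)
    z_sf = (np.log(1.0 - w_sf) - mu) / sigma
    return {
        "sw_W": float(sw.statistic),
        "sw_p": float(sw.pvalue),
        "sf_W": float(w_sf),
        "sf_z": float(z_sf),
        "sf_p": float(stats.norm.sf(z_sf)),  # one-sided: large z means non-normal
    }


# Usage: exact quantiles of a Pareto law, whose logarithm is exponential, not normal.
pareto_sizes = 1e6 / (np.arange(1, 501) - 0.5)
print(lognormality_tests(pareto_sizes))
```

Exact Pareto quantiles are a convenient input because the logarithm of a Pareto variable is exponentially distributed, so both tests should reject lognormality; on real data the p values are read in the same way as in Table 6.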
Acknowledgements
The author wishes to thank Joaquim Oliveira Martins (OECD) for initial inspiration and discussions on the ideas that led to this paper. Valuable comments by Marco Modica and three anonymous referees are also acknowledged. Any remaining errors are the author's own. The views expressed herein are those of the author and do not reflect those of the OECD or of its member countries.
Appendix A. Appendix
Table A1 Comparisons of city definitions: average population in cities using both administrative and FUA definitions.
Country                Administrative city    FUA        Mean of FUA/admin
Austria                384,344                742,213    2.5
Belgium                157,958                552,942    3.6
Switzerland            125,916                409,945    3.3
Chile                  160,584                438,978    2.4
The Czech Republic     186,182                294,148    1.6
Germany                243,526                481,033    2.2
Denmark                282,792                727,887    2.2
Estonia                190,076                248,685    1.3
Spain                  219,614                358,422    1.5
Finland                186,689                372,637    1.8
France                 135,442                457,885    3.3
Greece                 211,842                621,646    1.8
Hungary                304,807                497,272    1.7
Italy                  208,881                395,804    1.8
Luxembourg             76,688                 388,217    5.1
Mexico                 386,044                532,073    1.4
The Netherlands        157,367                333,772    1.9
Norway                 190,010                338,455    1.6
Poland                 187,968                362,511    1.8
Sweden                 217,385                392,312    1.5
Slovenia               188,275                381,648    2.0
The Slovak Republic    79,382                 247,154    3.3
Note: The last column reports the average ratio, by country, between the FUA population and the population of the respective administrative unit (i.e. municipality).
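The note above defines the last column of Table A1 as a mean of city-level ratios. The minimal sketch below (hypothetical function names; it assumes each administrative city is paired with one FUA, and the city-level data are not reproduced here) shows why this column need not equal the ratio of the two country averages: for Austria, 742,213/384,344 ≈ 1.93, against a reported mean ratio of 2.5. The two measures coincide trivially when a country has a single city, as Luxembourg does.

```python
def mean_ratio(pairs):
    """Mean over cities of FUA population / administrative-city population."""
    ratios = [fua / admin for admin, fua in pairs]
    return sum(ratios) / len(ratios)


def ratio_of_means(pairs):
    """Average FUA population divided by average administrative population."""
    admin_total = sum(admin for admin, _ in pairs)
    fua_total = sum(fua for _, fua in pairs)
    return fua_total / admin_total


# Luxembourg has a single city, so both measures equal 388,217 / 76,688.
luxembourg = [(76_688, 388_217)]
print(round(mean_ratio(luxembourg), 1))  # 5.1, as in Table A1

# Two hypothetical (admin, FUA) pairs: the two measures differ.
toy = [(100, 400), (300, 600)]
print(mean_ratio(toy), ratio_of_means(toy))  # 3.0 2.5
```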
Table A2 Population in OECD administrative cities: basic statistics by country, 2001. Source: National Census data.
Country                No. of cities   Average population   Std. dev.   Minimum population   Maximum population
Austria                6               384,344              573,193     90,141               1,550,123
Belgium                11              157,958              109,728     67,500               447,664
Switzerland            10              125,916              97,214      26,560               363,273
Chile                  26              160,584              72,034      49,503               323,184
The Czech Republic     16              186,182              278,255     50,702               1,169,106
Germany                109             243,526              386,202     49,776               3,382,169
Denmark                4               282,792              154,185     161,661              499,148
Estonia                3               190,076              182,850     68,680               400,378
Spain                  76              219,614              378,856     44,980               2,938,723
Finland                7               186,689              168,469     78,996               555,474
France                 83              135,442              247,396     25,204               2,125,852
Greece                 9               211,842              238,602     61,373               789,166
Hungary                10              304,807              519,142     81,920               1,777,921
Italy                  74              208,881              348,804     45,501               2,546,804
Luxembourg             1               76,688               n.a.        76,688               76,688
Mexico                 76              386,044              316,084     52,364               1,632,795
The Netherlands        35              157,367              146,555     41,004               734,533
Norway                 6               190,010              169,624     60,418               512,093
Poland                 58              187,968              198,406     55,224               933,258
Sweden                 12              217,385              197,549     96,883               750,348
Slovenia               2               188,275              109,752     110,668              265,881
The Slovak Republic    8               79,382               23,171      40,870               117,227
References
Behrens, K., Duranton, G., & Robert-Nicoud, F. (2014). Productive cities: Sorting, selection and agglomeration. Journal of Political Economy, 122(3), 507–553.
Berry, B. J. L., & Okulicz-Kozaryn, A. (2012). The city size distribution debate: Resolution for US urban regions and megalopolitan areas. Cities, 29, s17–s23.
Black, D., & Henderson, V. (2003). Urban evolution in the USA. Journal of Economic Geography, 3, 343–372.
Brezzi, M., & Veneri, P. (2015). Assessing polycentric urban systems in the OECD: Country, regional and metropolitan perspectives. European Planning Studies, 23(6), 1128–1145.
Cheshire, P. (1999). Trends in sizes and structures of urban areas. In P. C. Cheshire, & E. S. Mills (Eds.), Handbook of regional and urban economics (pp. 1339–1373). Amsterdam: North Holland.
Cheshire, P., & Hay, D. G. (1989). Urban problems in Western Europe. London: Unwin Hyman.
Cheshire, P., & Magrini, S. (2009). Urban growth drivers in a Europe of sticky people and implicit boundaries. Journal of Economic Geography, 9, 85–115.
Duranton, G. (2007). Urban evolutions: The fast, the slow, and the still. American Economic Review, 97, 197–221.
Eeckhout, J. (2004). Gibrat's law for (all) cities. American Economic Review, 94, 1429–1451.
ESPON (2005). ESPON 1.1.1 Potentials for polycentric development in Europe: Final report. Luxembourg: ESPON.
Ettlinger, N., & Archer, J. C. (1987). City-size distributions and the world urban system in the twentieth century. Environment & Planning A, 19(9), 1161–1174.
Fazio, G., & Modica, M. (2015). Pareto or log-normal? Best fit and truncation in the distribution of all cities. Journal of Regional Science, 55(5), 736–756.
Gabaix, X. (1999). Zipf's law for cities: An explanation. Quarterly Journal of Economics, 114, 739–767.
Gabaix, X. (2009). Power laws in economics and finance. Annual Review of Economics, 1, 255–293.
Gabaix, X., & Ibragimov, R. (2011). Rank-1/2: A simple way to improve the OLS estimation of tail exponents. Journal of Business and Economic Statistics, 29, 24–39.
Gabaix, X., & Ioannides, Y. M. (2004). The evolution of city size distributions. In J. V. Henderson, & J. F. Thisse (Eds.), Handbook of regional and urban economics (pp. 2341–2378). Amsterdam: North Holland.
Gan, L., Li, D., & Song, S. (2006). Is the Zipf law spurious in explaining city-size distributions? Economics Letters, 92, 256–262.
Giesen, K., & Südekum, J. (2011). Zipf's law for cities in the regions and the country. Journal of Economic Geography, 11, 667–686.
Giesen, K., Zimmermann, A., & Suedekum, J. (2010). The size distribution across all cities: Double Pareto lognormal strikes. Journal of Urban Economics, 68, 129–137.
Gomez-Lievano, A., Youn, H., & Bettencourt, L. M. A. (2012). The statistics of urban scaling and their connection to Zipf's law. PloS One, 7(7), e40393. http://dx.doi.org/10.1371/journal.pone.0040393.
González-Val, R. (2010). The evolution of US city size distribution from a long term perspective (1900–2000). Journal of Regional Science, 50(5), 952–972.
González-Val, R., Ramos, A., Sanz, F., & Vera-Cabello, M. (2015). Size distributions for all cities: Which one is best? Papers in Regional Science, 94(1), 177–197.
Guérin-Pace, F. (1995). Rank–size distribution and the process of urban growth. Urban Studies, 32(3), 551–562.
Hsu, W.-T. (2012). Central place theory and city size distribution. The Economic Journal, 122, 903–932.
Ioannides, Y. M., & Overman, H. G. (2003). Zipf's law for cities: An empirical examination. Regional Science and Urban Economics, 33, 127–137.
Jiang, B., & Jia, T. (2011). Zipf's law for all the natural cities in the United States: A geospatial perspective. International Journal of Geographical Information Science, 25(8), 1269–1281.
Jiang, B., Yin, J., & Liu, Q. (2015). Zipf's law for all the natural cities around the world. International Journal of Geographical Information Science, 29(3), 498–522.
Krugman, P. (1996). The self-organizing economy. Cambridge, MA: Blackwell.
Lee, S., & Li, Q. (2013). Uneven landscapes and city size distributions. Journal of Urban Economics, 78, 19–29.
Meijers, E. J., & Burger, M. J. (2010). Spatial structure and productivity in US metropolitan areas. Environment & Planning A, 42(6), 1383–1402.
Mulder, P. (2014). Unravelling the urban hierarchy: A meta-analysis on the rank-size rule for city-size distributions. Paper presented at the GIGA seminar in socio-economics, Hamburg, 24 November 2014.
Nitsch, V. (2005). Zipf zipped. Journal of Urban Economics, 57(1), 86–100.
OECD (2012). Redefining urban: A new way to measure metropolitan areas. Paris: OECD Publishing.
Openshaw, S., & Taylor, P. J. (1979). A million or so correlation coefficients: Three experiments on the modifiable areal unit problem. In N. Wrigley (Ed.), Statistical applications in the spatial sciences (pp. 127–144). London: Pion.
Parr, J. B., & Suzuki, K. (1973). Settlement populations and the lognormal distribution. Urban Studies, 10, 335–352.
Peng, G., & Xia, F. (2014). The size distribution of exporting and non-exporting firms in a panel of Chinese provinces. Papers in Regional Science, 95, S71–S86.
Petrakos, G., Mardakis, P., & Caraveli, H. (2000). Recent developments in the Greek system of urban centres. Environment and Planning B, 27(2), 169–181.
Portnov, B. A. (2011). Does Zipf's law hold for primate cities? Some evidence from a discriminant analysis of world countries. Review of Regional Research, 31, 113–129.
Portnov, B. A. (2012). Does the choice of the geographic units matter for the validation of Gibrat's law? Région et Développement, 36, 79–106.
Portnov, B. A., Reiser, B., & Schwartz, M. (2012). Does Gibrat's law for cities hold when location counts? Annals of Regional Science, 48, 151–178.
Rosen, K. T., & Resnick, M. (1980). The size distribution of cities: An examination of the Pareto law and primacy. Journal of Urban Economics, 8, 165–186.
Rossi-Hansberg, E., & Wright, M. L. J. (2007). Urban structure and growth. Review of Economic Studies, 74(2), 597–624.
Rozenfeld, H., Rybski, D., Gabaix, X., & Makse, H. (2011). The area and population of cities: New insights from a different perspective on cities. American Economic Review, 101(5), 2205–2225.
Song, S., & Zhang, K. H. (2002). Urbanisation and city size distribution in China. Urban Studies, 39(12), 2317–2327.
Soo, K. T. (2005). Zipf's law for cities: A cross-country investigation. Regional Science and Urban Economics, 35, 239–263.
Soo, K. T. (2012). The size and growth of state populations in the United States. Economics Bulletin, 32(2), 1238–1249.
Storper, M. (2013). Keys to the city. How economics, institutions, social interaction, and politics shape development. Oxford: Princeton University Press.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.