Physics Letters A 374 (2009) 22–27
Contents lists available at ScienceDirect
Physics Letters A www.elsevier.com/locate/pla
Modeling worldwide highway networks Paulino R. Villas Boas a,∗ , Francisco A. Rodrigues a , Luciano da F. Costa a,b a b
Institute of Physics at São Carlos, University of São Paulo, PO Box 369, São Carlos, São Paulo 13560-970, Brazil Institute of Science and Technology for Complex Systems, Brazil
a r t i c l e
i n f o
Article history: Received 13 April 2009 Received in revised form 28 September 2009 Accepted 11 October 2009 Available online 14 October 2009 Communicated by C.R. Doering
a b s t r a c t This Letter addresses the problem of modeling the highway systems of different countries by using complex networks formalism. More specifically, we compare two traditional geographical models with a modified geometrical network model where paths, rather than edges, are incorporated at each step between the origin and the destination vertices. Optimal configurations of parameters are obtained for each model and used for the comparison. The highway networks of Australia, Brazil, India, and Romania are considered and shown to be properly modeled by the modified geographical model. © 2009 Elsevier B.V. All rights reserved.
PACS: 89.75.Fb 02.10.Ox 89.75.Da Keywords: Complex networks Highway networks Complex systems Modeling
Complex systems are composed of a large number of components obeying rules that are frequently not well understood. Nevertheless, their most intrinsic dynamics can be inferred, to some approximation, from observation of their behavior and used to devise network models capable of explaining the existing structures and predicting the network growth and behavior. Special types of complex systems include human-made structures, such as the Internet, power grids, and highway networks [1]. These systems are particularly important because they can provide fundamental clues about human activity and dynamics and help planning effective and sustainable schemes for development. In the current work, we are interested in the characterization, classification and modeling of highway networks in different worldwide regions. Questions of particular relevance which are addressed here include: (i) Can highways be modeled by single local rules and provide an emergent topology that differs from random networks? (ii) Are there specific/universal features and patterns to be found in highways in different countries? (iii) What are the optimization processes underlying the highway topology?
*
Corresponding author. E-mail addresses:
[email protected] (P.R. Villas Boas),
[email protected] (F.A. Rodrigues),
[email protected] (L. da F. Costa). 0375-9601/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.physleta.2009.10.028
The terrestrial communication between cities is established through an integrated system of railways and highways. These systems give rise to complex networks trying to optimize connections between nearby cities while minimizing the overall extension [2–5]. More specifically, the number of connections of a single vertex (city) is typically constrained by its proximity to other vertices, while the establishment of long range connections is restricted by the distance-dependent cost of edges. Because of these properties, highways differ from other complex systems and do not present either scale-free distribution in the number of connections or hierarchical structure such as the World Wide Web [6]. The majority models of geographical networks that have been developed, e.g. the Waxman model for Internet [7], involve selecting pairs of vertices and connecting them with probability inversely proportional to the distance between them. Despite their elegance, such models do not take into account optimization rules, such as constructing roads that minimize the distance between population and some facilities, as in the spatial distribution model developed by Gastner and Newman [2]. In this Letter, we present an extension of the geographical network model introduced recently in [8] — henceforth called the Geographic Path Network model (GPN) — and use it to model different worldwide highway networks. These two recent models can be understood as generalizations of more traditional geographical networks (e.g. [9]), where cities found between two reference ver-
P.R. Villas Boas et al. / Physics Letters A 374 (2009) 22–27
tices have some chance of being incorporated. However, we use a path, instead of a single edge, to connect cities. Although a little bit different, our model is more general than that presented in [8], because that model has one parameter that restricts the number of possible cities which can be included in the path between the reference vertices. In addition, the currently reported approach takes into account four different countries, which are investigated, modeled and compared with other geographical models. Moreover, optimal parameters are obtained for each model. The four different countries differ one another with respect to the size of the highways, the number of cities and the economic development level. Our analysis shows that the highway networks can be accurately modeled by the currently proposed model, based on path transformations [8]. In fact, the networks obtained by the application of a relatively simple set of rules are verified to present topological features substantially similar to those observed in the real-world networks. Moreover, the rules applied to construct the networks seem to be universal, irrespectively to the country. In order to characterize the geometrical networks considered in this article, the following measurements have been used [10]: (i) average strength, i.e. average distance between neighbor connected cities; (ii) average of the average strength between the neighbors of the cities; (iii) Pearson correlation coefficient between the vertex strength at both ends of the edges; (iv) average weighted clustering coefficient (a measurement for the local cohesiveness between vertices) calculated by considering the inverse of the edge’s weight, which means that triangles formed by three near connected cities have a greater value for this measurement than those triangles formed by three farther connected cities; (v) average shortest path length between the cities; (vi) average betweenness centrality — a measurement related to the number of shortest paths passing the vertices; (vii) central point dominance, which is related to the importance of the central vertex (i.e. the one where the most number of shortest paths pass); (viii) average concentric degree of level 2 (degree of a vertex is the number of its connections); (ix) average concentric clustering coefficient of level 2; and (x) average concentric divergence ratio of level 2. For the last three measurements, the edge weight is not taken into account. All these measurements are described and discussed in [10]. The GPN model has been evaluated with respect to the highways of networks of countries in different continents, i.e. Australia, Brazil, India, and Romania. The respective highway networks and
23
coordinates of the cities of these countries have been compiled using a script program. The map of these countries, obtained from the Internet, were preprocessed in order to remove irrelevant data. The cities were then represented by vertices, while the roads were mapped as links. This representation is known as primal, which is different from the dual road networks representation [11], where vertices correspond to highways, and edges, to cities or crossroads where two or more roads cross. At first, the vertices were considered to be only the cities, so that there was an edge between two cities whenever they shared at least one road. However, we also noticed an important effect implied by crossings between roads, which define cliques between the involved cities. In order to take this fact into account, the crossroads were also considered as vertices. This approach is reasonable since the crossroads also play an important rule to the highway network formation. The Australian, Brazilian, Indian, and Romanian highway networks obtained automatically using the script program have respectively 816, 800, 572, and 224 vertices, which respectively include 574, 548, 537, and 153 are cities (the remainders correspond to the crossings). The real-world highways are established according to some intrinsic optimization rules, which are often applied in an empirical fashion. In the same way, our model establishes paths that minimize the distance between cities while trying to maximize the number of nearby cities covered by the incorporated paths. The model starts with isolated cities, and every city is then connected to one or more destination cities with priority to the nearest ones by paths that should pass through a great number of other cities. This model reflects the fact that the destination is generally chosen by planners to be the most important nearby city, while the length of the chosen path has to be minimized while maximizing the number of covered cities and crossroads. By using this procedure, the distance between any pair of cities and the sum of the length of all highways should be kept small. One example of how cities can be connected to their destination is presented in Fig. 1, with respect to connections from Oxford to London, Bristol, Birmingham, and Peterborough (dashed lines). The chosen paths (solid lines) are those with relatively short lengths which pass through other cities. More formally, for an origin city i, a destination city j is chosen according to the probability:
P i j ∼ e −λdi j ,
(1)
Fig. 1. The model for highway construction. Every city is connected to some nearby cities through short paths (solid lines) instead of considering direct connections (dashed lines), as it would be done in traditional geographical complex network models. In this case, the starting city, Oxford, is connected to London, Bristol, Birmingham, and Peterborough through the paths indicated by the solid lines.
24
P.R. Villas Boas et al. / Physics Letters A 374 (2009) 22–27
where λ is a parameter of this model, and di j is the geographical distance between i and j. Notice that P i j is such that j P i j = 1, ∀i. If λ is large, there is a higher chance of choosing a destination city nearer the origin. For small values of λ, the chance of choosing a distant city is increased. After choosing a destination, a similar rule for including cities or crossroads along the path is applied, but those cities or crossroads whose distances to either the origin or the destination are greater than the distance between the origin and the destination are not taken into account. Then, the set S of all possible cities and crossroads including i and j is sorted (increasing order) according to the distance from i. Starting from i, the next city or crossroad k is chosen with probability given by,
P ik ∼ e −γ dik
cos(θ) + 1 2
,
(2)
where γ is the second parameter of the model, and θ is the angle between the straight lines ik and i j. The term (cos(θ) + 1)/2 was introduced to reduce the probability to choose a city or crossroad k with large absolute angle θ . As −π < θ < π , this term ranges from 0 to 1. After choosing k, all cities and crossroads with distance from i less than the distance from i to k are removed from the set S, and the next one is chosen. This procedure is repeated until there is just one element left in set S, which is the destination city. The resulting sequence of cities and crossroads defines the path from i to j. Overall, the model consists in the repetition of the above steps until the desired total length of highways is achieved. In order to ensure that no city result disconnected, a preliminary step is applied in which every city is connected to another one by using the above described approach. After having the preliminary step completed, the respective weighted, undirected network is obtained, where every city is a vertex, and two cities are connected through an edge whose weight is given by their distance if they are neighbors along at least one of the obtained paths. Alternative models have been compared to our suggested GPN model. Although there are many geographical network models [12], only those which allow the specification of the position of the vertices have been used in our analysis. This constraint is necessary since all considered models have the same number of vertices having the same positions as the original network. The
process of building the models is the same: the first step is to start with a set of disconnected vertices whose positions are given by the original network. Then, the cities are connected through rules which depend on each model. In order to have its accuracy determined, the GPN model has been compared to two other geographical models. The first one is a version of the Waxman geographical model [7] (WGN) in which the probability of connecting two vertices i and j is proportional to α e −λdi j , where α and λ are the parameters of the model, which control the choosing j near or far away, and di j is the geographical distance between vertices i and j. It has been previously observed [13] that the WGN model generate networks whose topological features are close to the US highway network. Indeed, the currently proposed GPN model is a generalization of the WGN model. The second model is geographical and scale-free (e.g. [9]) (GSF) whose construction is similar to the Barabási and Albert scale-free model [14], in which vertices with a small number of connections are added sequentially, and the probability of choosing a vertex is proportional to its degree. In the GSF model, however, we start with a set of disconnected vertices and connect twovertices i and j with probability proportional to e −λdi j (k j + 1)/ l (kl + 1), where λ and di j are the same as for WGN, and k j is the degree of j. In order to achieve a fair comparison between the models, the = { p 1 , p 2 , . . . , pn } of each of them have been adn parameters p justed so that they are optimized with respect to each of the three highway networks. This is obtained by varying all parameters, through successive approximations, in order to find those which imply the smallest Euclidean distance [15], which can be obtained by:
10 2 D p = μlReal − μlp ,
(3)
l =1
where the sum is performed considering the set of 10 measurements described before, D p is the Euclidean distance from the real ; highway network to the corresponding model with parameters p
μlReal is the measurement l for the real highway network; and μlp
is the average measurement l over all realizations of the corre . In order to be comparable, sponding model with parameters p all measurements have been standardized before the calculation
Fig. 2. Methodology for the classification of highway networks. A set of M models is chosen and its best configuration of parameters determined, which allows determining networks whose topological properties are most similar to the real world network. Next, P network realizations with the established parameters are generated for each i j . These feature vectors are then projected into three dimensions considering the model and a set of measurements is estimated and stored into respective feature vectors μ canonical variable analysis. Finally the real network is projected and classified into the model whose likelihood is maximum according to the maximum likelihood decision theory.
P.R. Villas Boas et al. / Physics Letters A 374 (2009) 22–27
Table 1 Best parameters found for the models. Model
Parameters
Australia
Brazil
India
Romania
GWN
α
0.21 0.0038
0.30 0.0091
0.65 0.0037
0.50 0.053
m
1 0.0034
1 0.0068
1 0.0026
1 0.044
λ (km−1 )
0.0012 0.070
0.022 0.12
0.00041 0.090
0.0060 0.091
GSF
GPN
λ (km−1 ) λ (km−1 )
γ (km−1 )
25
of the Euclidean distance. A measurement l is standardized by subtracting the respective average and dividing by the respective standard deviation [15]. The best parameters found for all models are presented in Table 1, after 200 realizations of each set of of each model have been performed. parameters p It is interesting to note that the Indian highway network presents the smallest value of λ and therefore exhibits the longest highways. On the other hand, the highways in Brazil are the shortest ones, as indicated by the highest respective value of λ. This fact in mainly a consequence of the high concentration of cities in southern Brazil, where cities are close one another. For instance,
Fig. 3. The 3D phase space defined by the first three canonical variable projections for the (a) Australian, (b) Brazilian and (c) Indian, (d) Romanian highway networks considering the models: GPN (red circle), WGN (green triangle), and GSF (blue square). The highway networks are already assigned with the symbol of the respective model obtained by the maximum likelihood methodology and are indicated by black arrows. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this Letter.)
26
P.R. Villas Boas et al. / Physics Letters A 374 (2009) 22–27
Fig. 4. The visualization of the (a) original Indian network and one realization of the corresponding (b) WGN, (c) GSF, and (d) GPN models. It is interesting to note that only the GPN model yields a single connected component. The isolated vertices in (d) are some of the crossroads of the original network, which were not chosen for the paths.
in the state of São Paulo, most cities are less than 50 km apart one another. In addition, the parameter γ is similar to India and Romania, but smaller than for Australia. This suggests that the probability to choose the cities in the path are similar in these countries. On the other hand, the Brazilian highway network has the highest value of γ , which indicates that the cities in the path are very close one another. Therefore, the most similar networks with respect to the set of parameters are the highway networks of India and Romania. These networks have homogeneous highway distributions, i.e. a more integrated highway system, which connects most locations through the territory. In contrast, Australia and Brazil present highway networks with the most different parameters (high λ and γ ), which is a consequence of geographical and geophysical heterogeneities (e.g. large forests and deserts). At the same time, both Brazil and Australia tend to have most of the roads interconnecting coastal cities. Another set of 500 realizations and the corresponding measurements have been obtained for each model using the best parameters of Table 1. Since these measurements are often correlated, the canonical variable analysis [16,17] has been applied in order to reduce the correlation between them, ensuring an optimal separation between all the involved categories, and providing a means for respective visualization. The classification has been performed by extracting a set of measurements for each network model realization. These measurements were first standardized and then projected into the threedimensional spaces by the canonical variable analysis [16]. Finally, the classification was performed by the maximum likelihood decision theory [15]. When all categories involve the same number of individuals (as is the case in this work), the maximum likelihood methodology determines the probability of each model with respect to a given set of attributes and associates the model that yields the maximum likelihood to the original highway network.
The overall approach adopted for classifying the highways networks is illustrated in Fig. 2. The results of this methodology are shown in Fig. 3, where the classification of the four highway networks is also provided. These results clearly indicate that the GPN allows the best reproduction of the respective highway networks for all four countries. Notice that each point in the scatterplots represent a network and each real-world highway network is indicated by an arrow. We adopted a relatively large number of measurements (10) in order to provide a more comprehensive characterization of the topological and geographical properties of the networks, reflecting geographical distances, density of interconnections between neighbors, etc. In order to verify the robustness of the results, we also performed the classification using subgroups of the whole set of ten measurements and obtained the same classification (not shown here), i.e. the GPN model was the most accurate in all cases. Fig. 3 also shows that the networks generated by the GSF are the most different from the original networks. This fact in mainly due to the occurrence of hubs in these networks, while the highway networks do not present highly connected vertices. Since the GPN model is a generalization of the WGN model, the clusters formed by networks generated by these models tend to be close one another. Therefore, if we do not take into account the GPN model, the highway would be classified as belonging to the WGN cluster. Thus, the GPN model provides an improvement of the traditional geographical model. The accuracy of the GPN model can be verified in Fig. 4, where the visualization of the Indian highway network and one realization of the corresponding models are shown. It can be noticed in this figure that only the GPN model, Fig. 4(d), has just one connected component. The isolated vertices in this figure are due to the crossroads which were not included in the paths of the model.
P.R. Villas Boas et al. / Physics Letters A 374 (2009) 22–27
In this Letter we have presented the GPN model — a geographic model where paths instead of just distances between vertices were considered. Our results have shown that different worldwide highway system can be accurately modeled by the simple, possibly universal, rules embedded in the GPN model. There are some limitations, however, with the GPN model where the geography is not considered. This model could be improved including the geography of the analyzed country by considering lakes and the coasts prohibiting some connections. Besides, additional studies could investigate how the proposed model reproduces the time evolution of highway networks in different countries. Acknowledgements
[2] [3] [4] [5] [6] [7] [8]
[9] [10] [11] [12]
Luciano da F. Costa is grateful to FAPESP (05/00587-5), CNPq (301303/06-1 and 573583/2008-0) for financial support. Francisco A. Rodrigues acknowledges FAPESP sponsorship (07/50633-9), Paulino R. Villas Boas acknowledges FAPESP sponsorship (08/53721-9).
[13]
[14] [15]
References [1] L. da F. Costa, O.N. Oliveira Jr., G. Travieso, F.A. Rodrigues, P.R. Villas Boas, L. Antiqueira, M.P. Viana, L.E.C. da Rocha, Analyzing and modeling real-world
[16] [17]
27
phenomena with complex networks: A survey of applications, arXiv:0711.3199, 2008. M.T. Gastner, M.E.J. Newman, Phys. Rev. E 74 (1) (2006) 016117. M.T. Gastner, M.E.J. Newman, Eur. Phys. J. B 49 (2) (2006) 247. M.T. Gastner, M.E.J. Newman, J. Stat. Mech. (2006) P01015. N. Mathias, V. Gopal, Phys. Rev. E 63 (2) (2001) 021117. E. Ravasz, A.-L. Barabási, Phys. Rev. E 67 (2) (2003) 26112. B.M. Waxman, IEEE J. Selected Areas Commun. 6 (9) (1988) 1617. P.R. Villas Boas, F.A. Rodrigues, L. da F. Costa, Modeling highway networks with path-geographical transformations, in: Studies in Computational Intelligence, Springer-Verlag, 2009, unpublished. Y. Hayashi, IPSJ Digital Courier 2 (0) (2006) 155. L. da F. Costa, F.A. Rodrigues, G. Travieso, P.R. Villas Boas, Adv. Phys. 56 (1) (2007) 167. V. Kalapala, V. Sanwalani, A. Clauset, C. Moore, Phys. Rev. E 73 (3) (2006) 26130. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.U. Hwang, Phys. Rep. 424 (4–5) (2006) 175. P.R. Villas Boas, F.A. Rodrigues, L. da F. Costa, Modeling the evolution of complex networks through the path-star transformation and optimal multivariate methods, Int. J. Bifur. Chaos, in press. A.-L. Barabási, R. Albert, Science 286 (5439) (1999) 509. L. da F. Costa, R.M. Cesar Jr., Shape Analysis and Classification: Theory and Practice, CRC Press, 2001. N.A. Campbell, W.R. Atchley, Systematic Zoology 30 (3) (1981) 268. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley–Interscience, 2000.