Accepted Manuscript Effects in the network topology due to node aggregation: Empirical evidence from the domestic maritime transportation in Greece Dimitrios Tsiotas, Serafeim Polyzos
PII: S0378-4371(17)30862-2
DOI: https://doi.org/10.1016/j.physa.2017.08.130
Reference: PHYSA 18576
To appear in:
Physica A
Received date: 8 February 2017
Revised date: 17 July 2017
Please cite this article as: D. Tsiotas, S. Polyzos, Effects in the network topology due to node aggregation: Empirical evidence from the domestic maritime transportation in Greece, Physica A (2017), https://doi.org/10.1016/j.physa.2017.08.130
Effects in the network topology due to node aggregation: empirical evidence from the domestic maritime transportation in Greece

Dimitrios Tsiotas*, Serafeim Polyzos
Department of Planning and Regional Development, University of Thessaly, Pedion Areos, Volos, 38 334, Greece
Tel: +30 24210 74446, fax: +30 24210 74493
*Corresponding author

Abstract
This article studies the topological consistency of spatial networks under node aggregation, examining the changes captured between different network representations that result from grouping nodes and that refer to the same socioeconomic system. The main purpose of this study is to evaluate what kind of topological information remains unaltered under node aggregation and, further, to develop a framework for linking the data of an empirical network with data of its socioeconomic environment, when the latter are available at hierarchically higher levels of aggregation, in an effort to promote interdisciplinary research in the field of complex network analysis. The research question is empirically tested on topological and socioeconomic data extracted from the Greek Maritime Network (GMN), which is modeled as a non-directed multilayer (bilayer) graph consisting of a port-layer, where nodes represent ports, and a prefecture-layer, where nodes represent coastal and insular prefectural groups of ports. The analysis highlights that the connectivity (degree) of the GMN is the most consistent aspect of this multilayer network, preserving both the topological and the socioeconomic information through node aggregation. In terms of spatial analysis and regional science, such effects illustrate the effectiveness of the prefectural administrative division for the functionality of the Greek maritime transportation system. Overall, this approach proposes a methodological framework that can be further applied to study the grouping effects induced on the network topology, providing physical, technical, socioeconomic, strategic or political insights.

Keywords: complex network analysis, spatial networks, multilayer networks, layer transformations, node scale.

1. Introduction
Spatial networks, and transportation networks in particular, have developed over time to serve economic, social, political, and cultural needs among the social configurations (such as the urban units) they connect (Ducruet and Beauguitte, 2014). In this respect, they constitute communication systems whose structure reflects information about their socioeconomic environment, and thus they may operate as economic indicators of their interconnected spatial units (Tsiotas and Polyzos, 2015a). This implies that the information immanent in the structure of networks representing the same socioeconomic system should have a common basis in the way they describe their environment, regardless of their level of resolution or the scale of the spatial units that their nodes refer to. A relevant study on this topic was conducted by Ducruet et al. (2011), which examined the participation of cities in worldwide air and sea flows through three different
levels of urban delineation: cities, regional areas and megalopolises. Despite the noticeable differences in their spatial and economic organization, the authors chose to study these two networks because both have essential, and sometimes complementary, roles in urban development. The aim of the study was to reveal the complementarities between air and sea transport networks in shaping an urban hierarchy and to evaluate the influence of their aggregation on the network structure, the hierarchy of centrality, and the correlations between these networks. This work first pointed out that the chosen three-level aggregation structure grasped some important topological and geographical properties of these two networks that are interdependent with urban development. Next, it was shown that the network topological features were more similar at the level of larger urban units (urban regions and megalopolises) than at the level of single and separated terminals, implying that global players in the transport industry tend to follow the main paths of urban development. Additionally, the majority of the urban regions and megalopolises benefited from a combined air and sea network, while such coexistence at the city level was competitive rather than cooperative. Finally, the study highlighted that the largest nodal regions were specializing in air traffic and that independent cities tend to be connected with those of similar specializations, as an effect of the distinctive geographic coverage between air and sea networks (Ducruet et al., 2011). In methodological terms, in the work of Ducruet et al. (2011) node aggregation was performed to make possible the analysis between two different types of networks, which otherwise would not connect the same nodes (airports and seaports), except in the case that these networks were layers of a multimodal transportation system (Boccaletti et al., 2014; Ducruet, 2017).
Within this context, this paper goes beyond the former perspective by lowering the geographical scale to the national level, in order to examine the consistency and the effect of node aggregation on the topological properties between two networks (the port-level Gα and the regional-level Gβ) that represent the same socioeconomic system (the Greek Maritime Network, GMN). In the port-level model (Gα) nodes represent ports and edges represent inter-port shipping connections (Tsiotas and Polyzos, 2015a), whereas in the regional-level model (Gβ) nodes represent collections of ports grouped per prefecture and edges represent the potential of maritime interconnections among these groups. Modeling a national maritime transportation system by using the network paradigm (Brandes et al., 2013) advances both complex network analysis (CNA) (Albert and Barabasi, 2002; Barthelemy, 2011; De Montis et al., 2011) and spatial economic analysis (Ducruet and Beauguitte, 2014; Makkonen et al., 2013; Tsiotas and Polyzos, 2015a). The GMN is a network with a noticeable set of peculiarities in its geomorphology, economic basis, and competition with other transportation modes. Compared with international transportation networks, which constitute broader socioeconomic systems of interconnected national economies and include only indirect information about their national components (Tsiotas and Polyzos, 2015a), national transportation networks are directly related to the socioeconomic environment of their countries, and thus their study can provide more useful information at the national level. Under this perspective, the port-level (Gα) GMN represents a physical socioeconomic (transportation) system of interconnected ports participating in an economic market of exchanging passenger and trade flows. Depending on their types, network flows may express either the state of supply or demand in economic terms.
For instance, when flows in the GMN express product exchange, the network's destinations configure a market of demand, whereas when the GMN's flows express an exchange of services, the same destinations represent a supply market. Within this context, the Gα layer expresses a productivity system at physical scale, which produces flows of exchanging either products or customers enjoying services at the place of supply (Polyzos and Petrakos, 2001; Polyzos, 2009; Tsiotas and Polyzos, 2015a,b). On the other hand, ports are not autonomous economic entities. Behind each of them, a latent market exists that produces or consumes the goods or services transported by this physical port system. In other words, ports operate as gates of their hinterland markets (Ducruet et al., 2011), which interact through the GMN. Even the intensity of the flows transported within this physical network is a result of the productive dynamism of the hinterland markets corresponding to each port (Polyzos, 2009; Tsiotas and Polyzos, 2015a). The existence of such markets is latent in the port-level representation of the GMN, and this study makes it explicit through the regional-level consideration (Gβ). In particular, nodes in the Gβ representation are not physical but administrative units that express discrete or substantial market entities of the Greek spatio-economic system. Additionally, nodes at the regional level carry policy information, to the extent that the administrative division of a country into regions is a political act representing a structure for the implementation of its national policies.
Through the comparison of the port- and regional-level of the GMN, this paper aims to facilitate the linking of information between different levels of hierarchy in node aggregation, which may serve interdisciplinary purposes. First, in the field of spatial and regional statistics, it aims to facilitate the management and the linking of data between different levels of territorial units for statistics, such as in the NUTS classification (see Eurostat, 2017), and to further provide a standardization reference for those spatial units that are not included in a typical classification (e.g. NUTS I, II, and III), such as cities, towns, urban settlements, or other special forms of urban configurations.
In the field of regional planning and policy, it proposes a methodological framework for linking and evaluating the different levels of spatial and administrative division existing in the institutional framework for the strategic and spatial planning of countries worldwide, which in many cases is highly complex (Salet and Kreukels, 2003). For instance, in Greece, the current institutional framework for spatial planning is composed of many layers and categories of spatial organization (see YPEKA, 2017) that are expressed at different scales (e.g. administrative units, land uses, functional areas, zones of regulated construction, of environmental protection, of cultural heritage, etc.), which often overlap and become entangled and thus necessitate an integrated management (Polyzos, 2011). Linking the information between these layers promotes spatial planning and policy, because it contributes to their management and helps planners and policy makers to evaluate to what extent planning and decision making applied in a single layer affects or is affected by the other layers. In the field of CNA (Albert and Barabasi, 2002; Barthelemy, 2011; De Montis et al., 2011), this paper proposes a methodological framework for capturing the topological changes between networks considered as different layers in a multilayer model. Provided that the network layers are constructed as a result of node grouping, any changes observed between the topology of different layers illustrate the effects of node aggregation. This approach facilitates the linking of information related to an empirical network with data of its socioeconomic environment available at hierarchically higher levels of aggregation, and thus it may serve interdisciplinary research in CNA.
Within this context, the major research question of the manuscript is how different levels of node aggregation can be compared in order to detect their topological consistency, given that comparing node sets of different length (where nodes are not of the same number and do not refer to the same spatial units) is not a trivial issue in statistics and involves a high level of complexity. The remainder of this article is organized as follows: Section 2 presents the methodology, namely the graph models of the GMN, the available data, the quantitative tools used in the analysis and the proposed methodological framework. Section 3 presents and interprets the results of the analysis, Section 4 describes the limitations of the study and suggests avenues for further research, and finally Section 5 gives the conclusions.

2. Methodology and Data
2.1. Terminology-notations
In the following we present some essential terminology for the multilayer network consideration, complying with the formalism provided by Boccaletti et al. (2014) and Kivela et al. (2014).
$G(V,E)$: Graph expressed as the pair $(V,E)$, consisting of the node-set V and the edge-set E.
$A\times B$: Cartesian product (pair set) of the sets A, B, defined by the set-theoretic expression $A\times B=\{(a,b): a\in A \text{ and } b\in B\}$.
$\mathcal{M}$: Multilayer graph or network $\mathcal{M}=(\mathcal{G},\mathcal{C})$, composed of the family of networks (named layers or components) $\mathcal{G}=\{G_\alpha\}=\{(V_\alpha,E_\alpha)\}$, $\alpha=1,2,\dots,p$, and the set of the interlayer connections $\mathcal{C}=\{E_{\alpha\beta}\subseteq V_\alpha\times V_\beta\}$, with $\alpha,\beta\in\{1,2,\dots,p\}$, where p is the number of layers.
$G_\alpha\equiv G_\alpha(V_\alpha,E_\alpha)$: The αth component (or layer) of the multilayer network $\mathcal{M}$. For p=2 the multilayer graph is considered bilayer, $\mathcal{M}=(\mathcal{G},\mathcal{C})=(\{G_\alpha,G_\beta\},E_{\alpha\beta})$, implying that it is composed of a pair of layers.
$v_\alpha,u_\alpha$: Nodes that belong to the set $V_\alpha$ ($v_\alpha,u_\alpha\in V_\alpha$) of the αth layer.
$e_{ij,\alpha}$: Edge $e_{ij}$ shaped by the pair of nodes $v_\alpha,u_\alpha$ within the layer α. The length of the edge $e_{ij,\alpha}$ expresses the distance $d(e_{ij,\alpha})\equiv d_{ij,\alpha}$ between the pair of nodes $v_\alpha,u_\alpha$.
$\lvert A\rvert$: The length (number of elements) of a set A. For the multilayer representation we define $\lvert V_i\rvert=\lvert V(G_i)\rvert=n_i$ and $\lvert E_i\rvert=\lvert E(G_i)\rvert=m_i$, where $i=\alpha,\beta$.
$\langle X\rangle$: Average value of a measure or attribute X.
Node scale: Conventional term, expressing the geographical reference or resolution (i.e. port- or prefectural-level) of the network nodes. For example, the node scale of the Gα layer is port-level and that of the Gβ layer is prefectural-level.

2.2. Methodological framework
The methodological framework followed for the analysis of the GMN has a two-dimensional configuration, as shown in figure 1.
Its vertical structure corresponds to the layer where the analysis takes place, whereas its horizontal configuration represents the different parts (steps) of the multi-level analysis. The rationale behind the construction of this framework is that any comparisons between layers Gα and Gβ of a multilayer network $\mathcal{M}=(\{G_i,\ i\in\{1,\dots,p\}\},\mathcal{C})$, with $\alpha,\beta\in\{1,\dots,p\}$ and $\alpha\neq\beta$, determine transformations of the form $g_X: G_\alpha\to G_\beta$, where X is a given topological variable (e.g. degree, betweenness, etc.), $\mathcal{G}=\{G_i\}$ is the family of layers, and $\mathcal{C}$ is the set of the interlayer connections. Within this context, the set T={gX: X=network attribute} represents the total effects of node aggregation on the network topology. This framework proposes a novel approach for detecting and evaluating the effects on the network's topology due to node aggregation, and its applicability is evaluated on empirical observations of the Greek maritime transport. The following paragraphs briefly present the successive parts of the analysis composing the methodological framework of the study.
Figure 1: The methodological framework of the study.

2.3. Graph modelling and Data
The GMN is modeled in the L-space representation (Barthelemy, 2011) as a multilayer (bilayer), non-directed, weighted graph $\mathcal{M}=(\mathcal{G},\mathcal{C})$ (Kivela et al., 2014), according to the conceptual framework shown in figure 2. The first layer Gα(nα=229, mα=231) represents the port version of the GMN, whereas the second layer Gβ(nβ=35, mβ=43) represents its prefecture version.

Figure 2: The conceptual framework of the study, based on multilayer representation. The topologies of the layers Gα and Gβ are compared, which refer to the same socioeconomic environment, but they are constructed on different node scale (port-level vs. prefecture-level).
Let us loosely suppose that $S$ denotes the space representing the wider socioeconomic environment of the GMN. Then, the networks Gα and Gβ may be considered as discrete representations of $S$, according to relation (1), where the symbol "×" is the Cartesian product operator.
$$G_\alpha\subseteq S\times S,\ G_\alpha\equiv G_\alpha(V_\alpha,E_\alpha) \quad\text{and}\quad G_\beta\subseteq S\times S,\ G_\beta\equiv G_\beta(V_\beta,E_\beta) \qquad (1)$$
In the Gα representation, the nodes $v_{i,\alpha}\in V_\alpha$ refer to the GMN ports $P_i\equiv v_{i,\alpha}$ (i=1,…,229) and they are located at the center of the geographic area defined by the infrastructure coverage of each port (Tsiotas and Polyzos, 2015a). The coordinates of the Greek ports were taken from the Google Maps site (Google Maps, 2013). The nodes $v_{i,\beta}\in V_\beta$ in the Gβ representation are regional groups of $v_{i,\alpha}\in V_\alpha$ nodes, which correspond to the Greek coastal or insular prefectures $R_i$ (i=1,…,35) (figure 3). Each node $v_{i,\beta}$ is a set and we write $v_{i,\beta}\equiv V_{\alpha,R_i}=\{v_{i,\alpha}\in V_\alpha : v_{i,\alpha}\in R_i\}$, where $V_\beta=\{v_{1,\beta},v_{2,\beta},\dots,v_{35,\beta}\}=\{V_{\alpha,1},V_{\alpha,2},\dots,V_{\alpha,35}\}$. In terms of cardinality, it holds that $\sum_{i=1}^{35}\lvert V_{\alpha,R_i}\rvert=n_\alpha$. The graph models are illustrated using the network manipulation software of Bastian et al. (2009).
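The grouping of port nodes into prefecture nodes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `aggregate_layer`, the toy ports and the prefecture labels are hypothetical, and the rule that two prefectures are linked when at least one inter-port edge joins them is an assumed reading of "potential of maritime interconnection".

```python
from collections import defaultdict


def aggregate_layer(edges, group_of):
    """Collapse a port-level edge list into a prefecture-level one.

    edges    : iterable of (port_u, port_v) pairs of the undirected port layer
    group_of : dict mapping each port to its prefecture label R_i
    Returns (groups, agg_edges): the port sets V_{alpha,R_i} and the set of
    undirected prefecture pairs joined by at least one inter-port edge
    (assumed linking rule, see the lead-in).
    """
    groups = defaultdict(set)
    for port, prefecture in group_of.items():
        groups[prefecture].add(port)

    agg_edges = set()
    for u, v in edges:
        ru, rv = group_of[u], group_of[v]
        if ru != rv:  # links inside one prefecture vanish after grouping
            agg_edges.add(frozenset((ru, rv)))
    return dict(groups), agg_edges


# Hypothetical toy data (not taken from the GMN)
group_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4")]

groups, agg_edges = aggregate_layer(edges, group_of)
# Cardinality check: the group sizes add up to the number of ports
n_alpha = sum(len(ports) for ports in groups.values())
```

In this toy case the four port edges collapse to two prefecture edges (A-B and B-C), which illustrates how multiple inter-port links between the same pair of groups merge into a single edge of the aggregated layer.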
Figure 3: The Gα(nα=229, mα=231) (left) and Gβ(nβ=35, mβ=43) layers of the GMN (see node coding in table 1).

The $v_{i,\beta}$ nodes are located at the geographic coordinates of each prefectural group's hub (i.e. at the port within each prefecture with the maximum number of connections), where cases with more than one hub are located in the coordinates calculated by their spatial means (Tsiotas and Polyzos, 2015a). The sizes of the Gβ layer's nodes are shown in table 1.

Table 1. Nodes' size $n_{i,\beta}$ in the Gβ layer
Rank | Prefecture | Size*
1 | Evrou | 2
2 | Kavalas | 7
3 | Chalkidikis | 9
4 | Thessalonikis | 2
5 | Magnessias | 9
6 | Fthiotidos | 7
7 | Euvoias | 16
8 | Kefallonias | 10
9 | Lefkadas | 4
10 | Kerkyras | 6
11 | Prevezas | 2
12 | Aitoloakarnanias | 5
13 | Achaeas | 5
14 | Fokidos | 2
15 | Heleias | 2
16 | Zakynthou | 2
17 | Messenias | 2
18 | Lakonias | 4
19 | Korenthias | 5
20 | Veotias | 2
21 | Argoledos | 8
22 | Chaneon | 9
23 | Rethemnou | 2
24 | Herakleiou | 2
25 | Lasithiou | 3
26 | Dodekanesou | 22
27 | Lesvou | 12
28 | Cyclades | 27
29 | Larissas | 1
30 | Thesprotias | 1
31 | Xanthis | 1
32 | Arkadias | 1
33 | Chiou | 5
34 | Attikis | 27
35 | Samou | 6
* number of ports included in each node $v_{i,\beta}$
Edges of the first layer, e∈Eα, represent direct shipping connections between pairs of nodes v,u∈Vα and they are drawn as linear segments (figure 3), whereas those of the second layer, e∈Eβ, represent the potential of a pair of prefectures Ri, Rj (i≠j; i,j=1,…,35) to have a maritime connection. Generally, the edge weights of the bilayer network express spatial distances that are measured in nautical miles. In addition to being edge-weighted, the GMN was also considered node-weighted (Tsiotas and Polyzos, 2015a), implying that each node is described by a set of measurable socioeconomic attributes. This procedure produces the node-sets Vα(X)={xi} (i=1,…,229) and Vβ(X)={xj} (j=1,…,35) for each socioeconomic attribute X, which form vector socioeconomic variables participating in the empirical analysis. The available socioeconomic variables of the GMN concern population, accessibility, economic and maritime traffic data and they are shown in table 2. Variables of the Gα layer have a port reference (including 229 elements), whereas those of the Gβ layer have prefectural scale (including 35 elements).

Table 2. Network variables participating in the analysis
Variable | Description | Gα | Gβ
Structural (mixing space and topology) variables
DEG | Degree | The number of connections (degree) of each port | The degree of each prefecture
STR | Strength | Sum of distances of the edges adjacent to a node-port | Sum of distances of the edges adjacent to a node-prefecture
CB | Betweenness centrality | CB scores of each port (intermediacy of ports) | CB scores of each prefecture (intermediacy of prefectures)
CC | Closeness centrality | CC scores of each port (accessibility of ports) | CC scores of each prefecture (accessibility of prefectures)
CLUST | Clustering coefficient | Probability of meeting connecting neighbors per port | Probability of meeting connecting neighbors per prefecture
MOD | Modularity | Scores indicating the community classification that each port belongs to | Scores indicating the community that each prefecture belongs to
ACC | Accessibility Zone | The area of the region that has access to each port (km2) | The area of each prefecture (km2)
EDGE | Edge length | The length (km) of the inter-port connections of the GMN | The length (km) of the inter-prefectural connections of the GMN
Socioeconomic variables
POP | Port Population | The population of the urban units (cities, villages, etc.) where the ports belong | The population of the prefectures where the ports belong
ASEC | Primary Sector | The participation of a port's prefecture to the GDP of the primary sector | The participation of a prefecture to the primary sector's GDP
CSEC | Tertiary Sector | The participation of a port's prefecture to the GDP of the tertiary sector | The participation of a prefecture to the tertiary sector's GDP
AR | Arrivals | Tourism arrivals per port (defined as differences on summer-winter data) | Tourism arrivals per prefecture (defined as differences on summer-winter data)
DEP | Departures | Tourism departures per port (defined as differences on summer-winter data) | Tourism departures per prefecture (defined as differences on summer-winter data)
LD | Package Load | Tourism package loads per port (defined as differences on summer-winter data) | Tourism package loads per prefecture (defined as differences on summer-winter data)
UNLD | Package Unload | Tourism package unloads per port (defined as differences on summer-winter data) | Tourism package unloads per prefecture (defined as differences on summer-winter data)
(sources: ELSTAT, 2011; Tsiotas and Polyzos, 2015a)
2.4. Network analysis
The second step of the methodology includes calculations of the network measures, detection of the scale-free (SF) and small-world (SW) attributes, and examination of the typology of the rank-size distributions for each layer (Gα, Gβ). Each of these parts composing the network analysis is described in brief in the following paragraphs.

2.4.1. Network measures
At first, some of the most common network measures are calculated for the analysis of the GMN, which allow capturing different aspects of each layer's topology. The measures used in the analysis are shown in table 3.

Table 3. Topological network measures used in the analysis
Measure | Description | Math formula | Reference
Graph density ($\rho$) | Fraction of the existing connections of the graph to the number of the possible connections. It expresses the probability to meet in the GMN a connected pair of nodes. | $\rho = m/\binom{n}{2} = \frac{2m}{n(n-1)}$ | (Diestel, 2005)
Node degree ($k$) | Number of the edges adjacent to a given node, expressing the node's communication potential. | $k_i = k(i) = \sum_{j\in V(G)}\delta_{ij}$, where $\delta_{ij}=1$ if $e_{ij}\in E(G)$ and $\delta_{ij}=0$ otherwise | (Diestel, 2005)
Node (spatial) strength ($s$) | The sum of edge distances being adjacent to a given node. | $s_i = s(i) = \sum_{j\in V(G)}\delta_{ij}\,d_{ij}$, where $d_{ij}=w(e_{ij})$ in km | (Barthelemy, 2011)
Average network's degree ($\langle k\rangle$) | Mean value of the node degrees $k(i)$, with $i\in V(G)$. | $\langle k\rangle = \frac{1}{n}\sum_{i=1}^{n}k(i)$ | (Diestel, 2005)
Closeness centrality(*) ($C^C(i)$) | Total binary distance $d(i,j)$ computed on the shortest paths originating from a given node $i\in V(G)$ with destinations all the other nodes $j\in V(G)$ in the network. This measure expresses the node's reachability in terms of steps of separation. | $C^C(i) = \frac{1}{n-1}\sum_{j=1,\,j\neq i}^{n}d_{ij} = \langle d_i\rangle$ | (Koschutzki et al., 2005)
Betweenness centrality(*) ($C^B(k)$) | The proportion of the ($\sigma$) shortest paths in the network that pass through a given node k. | $C^B(k) = \sigma(k)/\sigma$ | (Koschutzki et al., 2005)
Local clustering coefficient ($C(i)$) | Probability of meeting linked neighbors around a node, which is equivalent to the number of the node's connected neighbors $E(i)$ (i.e. the number of triangles), divided by the number of the total triplets shaped by this node, which equals $k_i(k_i-1)$. | $C(i) = \frac{E(i)}{k_i(k_i-1)}$ | (Barthelemy, 2011)
Modularity ($Q$) | Objective function expressing the potential of a network to be subdivided into communities. In its mathematical formula, $g_i$ is the community of node $i\in V(G)$, $[A_{ij}-P_{ij}]$ is the difference of the actual minus the expected number of edges falling between a particular pair of vertices $i,j\in V(G)$, and $\delta(g_i,g_j)$ is an indicator function returning 1 when $g_i=g_j$. | $Q = \frac{\sum_{i,j}[A_{ij}-P_{ij}]\,\delta(g_i,g_j)}{2m}$ | (Blondel et al., 2008; Fortunato, 2010)
Average path length ($\langle l\rangle$) | Average length $d(i,j)$ of the total of network shortest paths. | $\langle l\rangle = \frac{\sum_{v\in V}d(v_i,v_j)}{n(n-1)}$ | (Barthelemy, 2011)
* when the measure is computed on binary distances it is considered binary (bin), whereas when it is computed on nautical distances it is considered weighted (wei)
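Several of the measures of table 3 can be illustrated with a short, self-contained Python sketch. It is only a sketch under stated assumptions: the helper names and the toy edge list are hypothetical, the graph is assumed connected and undirected, and closeness is computed as the mean binary distance, following the formula as printed in table 3.

```python
from collections import deque


def build_adjacency(weighted_edges):
    """Undirected adjacency map {node: {neighbor: distance}}."""
    adj = {}
    for u, v, d in weighted_edges:
        adj.setdefault(u, {})[v] = d
        adj.setdefault(v, {})[u] = d
    return adj


def density(adj):
    """rho = 2m / (n (n - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def degree(adj, i):
    """k_i: number of edges adjacent to node i."""
    return len(adj[i])


def strength(adj, i):
    """s_i: sum of the distances of the edges adjacent to node i."""
    return sum(adj[i].values())


def average_degree(adj):
    """<k>: mean of the node degrees."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def binary_distances(adj, source):
    """Breadth-first search: steps of separation from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """Mean binary distance from i to all other nodes (connected graph)."""
    return sum(binary_distances(adj, i).values()) / (len(adj) - 1)


def average_path_length(adj):
    """<l>: sum of all ordered-pair binary distances over n (n - 1)."""
    n = len(adj)
    total = sum(sum(binary_distances(adj, i).values()) for i in adj)
    return total / (n * (n - 1))


# Hypothetical toy network (not GMN data): (node, node, distance)
toy_edges = [("a", "b", 10.0), ("b", "c", 20.0), ("c", "d", 5.0), ("a", "c", 15.0)]
adj = build_adjacency(toy_edges)
```

Running the same functions on a port-level and on a prefecture-level adjacency map would give the paired scores that the layer comparison relies on. Isolated nodes are not represented in this sketch, since the adjacency map is built from edges only.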
Comparisons between homologous Gα and Gβ scores for the same network measure (figure 1) provide an initial quantification of how the network topology is affected by node aggregation.

2.4.2. Pattern recognition
This part detects the existence of the scale-free (SF) and small-world (SW) properties in the GMN layers. Networks with the SF attribute have a degree distribution fitting a power-law curve (Stumpf and Porter, 2012) of the form $f(x)=b\,x^{-a}$, with the exponent a lying within the typical range 2 < a < 3. SF networks consist of a minority of nodes with many connections (called hubs), whereas the majority of their nodes have significantly fewer links. The mechanism describing the evolution of such networks is the so-called preferential attachment, implying that new nodes entering a SF network “prefer” to connect with the already highly connected nodes (i.e. the hubs), in order to exploit the connectivity benefits of the latter (Barabasi and Albert 1999; Albert and Barabasi, 1999; Boccaletti et al., 2006). The authors found in a previous work that the degree distribution of Gα follows a power-law pattern (Tsiotas and Polyzos, 2015a), with an almost perfect determination and an exponent close to the typical interval 2 < a < 3. This is an indication that the Gα layer may have scale-free characteristics. Based on this observation, the SF property for the Gβ layer is detected comparatively, by applying the two-independent-sample Kolmogorov-Smirnov (K-S) test (Norusis, 2004; Stumpf and Porter, 2012), in order to detect whether the degree distributions of the Gα and Gβ layers follow the same pattern. In particular, this test compares the distributions of two variables xα and xβ, where the null hypothesis states that they follow the same continuous
distribution, whereas the alternative states that they do not. The significance level is set to 5% and the test statistic is described by the formula:
$$t_{KS}=\max_x\left|F_\alpha(x)-F_\beta(x)\right| \qquad (2)$$
where Fα(x) is the proportion of xα values that are less than or equal to x and Fβ(x) is the proportion of xβ values that are less than or equal to x. The Fα(x) and Fβ(x) express the empirical cumulative distribution functions (CDFs). The two-sample K-S test is further applied to compare the empirical distributions of all the topological variables between the Gα and Gβ layers (betweenness and closeness centrality, clustering coefficient and modularity), in order to detect which topological characteristic is preserved under node aggregation, expressed by the transformation $g: G_\alpha\to G_\beta$.
The next step of pattern recognition aims to detect the SW topology. The SW attribute describes networks that are significantly more clustered than random networks, while their average path length remains on the same scale as that of the latter. This property is rigorously defined on an available family of graphs, by detecting that the average path length scales logarithmically as the number of nodes tends to infinity, namely $\langle l\rangle=O(\log n)$, with n→∞ (Xu and Sui, 2007; Porter, 2012). Because a family of graphs is unavailable in most empirical cases, since it is quite data-demanding to collect many instances of the same network for different time periods, the small-world attribute of the GMN is detected using the approximation ω index proposed by Telesford et al. (2011). This index compares the clustering $\langle c\rangle$ of the empirical network with that of a p(k)-equivalent (i.e. with the same degree distribution) lattice network, $\langle c\rangle_{latt}$, and the empirical network's path length $\langle l\rangle$ with that of a p(k)-equivalent random network, $\langle l\rangle_{rand}$, according to the formula:
$$\omega=\frac{\langle l\rangle_{rand}}{\langle l\rangle}-\frac{\langle c\rangle}{\langle c\rangle_{latt}} \qquad (3)$$
The null models are computed using a randomization algorithm (Maslov and Sneppen, 2002) and the "latticization" algorithm (Rubinov and Sporns, 2010), which both preserve the degree distribution of the original network. Values of ω are restricted to the interval [-1,1], where those close to zero illustrate the SW attribute, positive values indicate random characteristics, whereas negative values indicate more regular or lattice-like characteristics (Tsiotas and Polyzos, 2015b).
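Relations (2) and (3) of this subsection can be sketched as follows. This is an illustrative sketch only: the function names and sample values are hypothetical, the K-S part returns just the statistic (no p-value or decision at the 5% level), and `omega_index` assumes that the null-model averages have already been obtained by separate randomization and latticization runs, which are not implemented here.

```python
def ecdf(sample, x):
    """Empirical CDF: proportion of sample values less than or equal to x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def ks_statistic(x_alpha, x_beta):
    """Two-sample K-S statistic: max over x of |F_alpha(x) - F_beta(x)|.

    The maximum is attained at an observed value, so it suffices to
    evaluate both empirical CDFs on the pooled set of observations.
    """
    points = sorted(set(x_alpha) | set(x_beta))
    return max(abs(ecdf(x_alpha, x) - ecdf(x_beta, x)) for x in points)


def omega_index(l, l_rand, c, c_latt):
    """Small-world index: omega = l_rand / l - c / c_latt.

    l, c       : average path length and clustering of the empirical network
    l_rand     : average path length of the degree-preserving random null model
    c_latt     : clustering of the degree-preserving lattice null model
    """
    return l_rand / l - c / c_latt


# Hypothetical degree samples for two layers (not GMN data)
deg_alpha = [1, 1, 2, 3, 5]
deg_beta = [1, 2, 2, 4]
t_ks = ks_statistic(deg_alpha, deg_beta)

# Hypothetical averages for an empirical network and its null models
omega = omega_index(l=2.0, l_rand=1.8, c=0.5, c_latt=0.6)
```

A value of `omega` near zero, as in this toy case, would be read as a small-world indication under the interpretation given above.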
2.4.3. Rank-size distributions
Rank-size (RS) distributions are used in Statistics to quantify descending arrangements of cases, usually with a power-law expression $f(x)=a\,x^{-\alpha}$. The alpha (α) exponent of a RS distribution operates as an indicator of homogeneity for the source dataset. RS distributions have rarely been applied in CNA, but they are a rather common and effective tool in the fields of Regional and Spatial Analysis, particularly in the study of urban systems (Tsiotas, 2016). Given that the GMN is a spatial network of interregional scale with socioeconomic activity, such an approach is considered suitable for this maritime network as well, and it is expected to serve interdisciplinary purposes and to provide interesting insights. Moreover, there is also a practical necessity for applying RS distribution analysis in this paper, because sufficient (frequency-based) distribution data cannot be collected for the Gβ layer, which shows only 7 discrete classes of node degree (k=1,2,3,4,5,6,12). With so few degree classes, none of the curve fitting methods can produce fittings of high determination and provide useful insights, thus necessitating a supplementary analysis. Rank-size
distributions are computed for the most common network node-measures, such as degree, centrality, clustering coefficient and modularity classification. In particular, the RS distributions are generated from collections of node scores that refer to a certain network measure (or attribute) X and they are extracted from the same layer Gi, i=α,β, after they are ranked in descending order, according to the relation: X N xi i
p : X N X N : p ( n ) xn xi xi 1 , i 1,..., n -1
(4) The RS distributions of the layer Gα include 229 elements, whereas these of Gβ include 35. For each distribution a scatter plot is drawn to visualize the pattern of the data. In this paper the RS distributions are not restricted to power-law fitting patterns, but they are modeled to the most optimal possible curve. Precisely, the technique of parametric fitting is applied on each distribution p (n) xn , choosing among the available
exponential, power-law (Stumpf and Porter, 2012), and linear polynomial (Tsiotas and Polyzos, 2015a,b) fitting curves f(x) the one with the maximum coefficient of determination (R²), according to the relation:

$$f \equiv f(x) \in \{f_i(x)\}_{i=1,2,3} = \{a e^{bx},\; a x^{b},\; a x + b \;|\; a, b \in \mathbb{R}\} \subseteq \mathbb{R}[x] \quad \text{subject to} \quad R^2_f = \max_i \{R^2_{f_i(x)}\} \tag{5}$$

where R²f is the coefficient of determination computed for the model f and ℝ[x] expresses the set of the real functions. Each fitting curve f(x) is an extension of the distribution p(n) to the positive real line. After the estimation of the best fitting pattern f(x), the corresponding RS cases of the Gα and Gβ layers are compared and the analytical forms of the transformations g:Gα→Gβ are computed. In particular, for a given network attribute X, with fα(x) the fitting curve of the Gα layer and fβ(x) the fitting curve of the Gβ layer, the transformation g(x) that produces fβ(x) from fα(x) is given by the relation:

$$f_\alpha: X_\alpha \to Y_\alpha, \quad f_\beta: X_\beta \to Y_\beta, \quad g: Y_\alpha \to Y_\beta, \quad g(f_\alpha(x)) = f_\beta(x) \tag{6}$$

This process generates a transformation gX:Gα→Gβ for each network attribute X, and thus comparisons may provide insights either for the network attribute X or for the layers' consistency with respect to this attribute. The outcomes of this analysis are evaluated comparatively with other results of the empirical examination.
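The ranking of relation (4) and the model selection of relation (5) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' fitting procedure: it assumes that the exponential and power-law curves are estimated by least squares on log-transformed data (so only positive scores are used), while R² is evaluated on the original scale. The function names `rank_size`, `linfit`, `r_squared` and `best_fit` are hypothetical.

```python
import math

def rank_size(scores):
    """Relation (4): ranks 1..n paired with the scores in descending order."""
    sizes = sorted(scores, reverse=True)
    return list(range(1, len(sizes) + 1)), sizes

def linfit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(ys, preds):
    """Coefficient of determination on the original (untransformed) scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def best_fit(scores):
    """Relation (5): pick the candidate curve with the maximum R^2.

    Assumption of this sketch: non-positive scores are dropped so that
    the logarithms used for the exponential and power-law fits exist.
    """
    ranks, sizes = rank_size([s for s in scores if s > 0])
    log_r = [math.log(r) for r in ranks]
    log_s = [math.log(s) for s in sizes]

    candidates = {}
    b, ln_a = linfit(ranks, log_s)            # exponential: y = a * exp(b x)
    a = math.exp(ln_a)
    candidates["exponential"] = (a, b, [a * math.exp(b * x) for x in ranks])
    b, ln_a = linfit(log_r, log_s)            # power-law: y = a * x^b
    a = math.exp(ln_a)
    candidates["power-law"] = (a, b, [a * x ** b for x in ranks])
    a, b = linfit(ranks, sizes)               # linear: y = a x + b
    candidates["linear"] = (a, b, [a * x + b for x in ranks])

    scored = {name: (a, b, r_squared(sizes, pred))
              for name, (a, b, pred) in candidates.items()}
    best = max(scored, key=lambda name: scored[name][2])
    return (best,) + scored[best]

# Synthetic scores that follow a power law exactly (illustrative data only).
name, a, b, r2 = best_fit([20.0 * r ** -0.5 for r in range(30, 0, -1)])
print(name, round(a, 3), round(b, 3), round(r2, 3))
```

Fitting on log-transformed data is only one of several possible estimators; a nonlinear least-squares fit on the original scale may select a different curve when the R² values are close.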
2.5. Conversion of network variables
This part deals with the problem of insufficient connectivity (Koschutzki et al., 2005), which concerns the conversion of local measures (i.e. computed within each connected component) into aggregate ones (i.e. computed for the total network composed of disconnected components), within the same layer. In practice, for the GMN this problem concerns the handling of the isolated nodes in the calculation of the global measures, since the network measures are well defined only within connected components. Two of the available connectivity-repairing methods are used in this study, the local restriction method (LRM) and the proportional conversion method (PCM) (Tsiotas and Polyzos, 2015a). According to the LRM, calculations are restricted within connected components and zeros are assigned to the isolated nodes, whereas in the PCM the local node-scores are converted to their corresponding global (aggregate) scores proportionally to the size of the components they belong to. The PCM was found empirically to outperform the LRM in the description of the GMN's connectivity, by comparing a pair of linear regression models (response variable: network degree) constructed on a set of topological and socioeconomic variables that were converted to global using each of these methods. The model with the PCM variables showed about 5 percentage points better determination ability than the model with the LRM variables (R²PCM=0.729 > R²LRM=0.676), which was interpreted as a quantification of better performance (Tsiotas and Polyzos, 2015a). Based on this consideration, the PCM was chosen for the regression analysis. However, for all the other parts of the analysis (rank-size distributions, comparison of means, correlations), which focus more on the statistical mechanics than on the socioeconomic aspect of this network, the simpler LRM is considered sufficient to provide insights.
2.6. Empirical analysis
This part applies three separate statistical methods for the analysis of the vector network variables (table 2): the independent-samples t-test for the comparison of means, Pearson's bivariate coefficient of correlation, and the multivariate linear regression model. First, the independent-samples t-test (Hays, 1981; Norusis, 2004) compares the means μα and μβ between two groups of cases (e.g. Gα and Gβ) originating from the same initial set (e.g. the GMN). Levene's test is applied to test the equality of variances between these groups, and two series of t-test results (t-values, degrees of freedom df) are produced per case, one for pooled and one for separate (unpooled) variances. The researcher chooses the proper one according to the significance of Levene's test, while the significance of each t-value is obtained from the t-distribution (Norusis, 2004). Next, Pearson's bivariate coefficients of correlation (Walpole et al., 2012) capture the existence of linear relations between pairs of topological variables (X,Y). The coefficient ranges within the interval [-1,1], indicating a perfect linear relation between X and Y when |rxy|=1. The final part of the empirical analysis applies a multivariate linear regression model, constructed with the Backward Elimination Method (BEM), for the description of the GMN's connectivity (variable degree, DEG) as a function of topological and socioeconomic variables. The BEM algorithm starts with the full model including all the available predictors xi and produces a sequence of models yk, in which the most insignificant predictor is removed in each loop, among those with statistical significance (p-value) p≥0.1. For a given set of predictor (independent) variables Xn={x1, x2,…, xn} the sequence of the BEM response (dependent) variables (yk)k>0 is described as follows (Walpole et al., 2012; Tsiotas and Polyzos, 2015a,b):

$$(y_k)_{k=1,\dots,n}: \quad y_k = \sum_{i=1}^{n-k+1} b_i x_i + c_k, \qquad x_i \in X_{n-k+1} \subseteq X_n = \{x_1, x_2, \dots, x_n\},$$

$$X_{n-k} = X_{n-k+1} \setminus \{x_p\}, \qquad x_p \in X_{n-k+1}: \; P[b(x_p) = 0] = \max_i \{P[b_i = 0] \ge 0.1\} \tag{7}$$

The final (optimum) model of the sequence (yk)k>0 includes only the significant predictors, where each of the standardized coefficients bi quantifies the participation of the i-th component to the formation of the response variable y (Tsiotas and Polyzos, 2015a,b).
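The elimination loop described by relation (7) can be illustrated with a small, dependency-free Python sketch. It is not the statistical package used in the paper and it rests on stated simplifications: the intercept is always kept, the p-values come from a normal approximation to the t-distribution (adequate only for large samples), and the helper names `invert`, `ols` and `backward_elimination` are hypothetical.

```python
import math
from statistics import NormalDist

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(columns, y):
    """Fit y = c + sum(b_i x_i); return coefficients and slope p-values."""
    n = len(y)
    rows = [[1.0] + [col[r] for col in columns] for r in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yr for row, yr in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yr - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yr in zip(rows, y))
    sigma2 = rss / (n - k)
    pvalues = []
    for i in range(1, k):                      # skip the intercept
        z = abs(beta[i]) / math.sqrt(sigma2 * inv[i][i])
        # Normal approximation to the t-distribution (assumption of this sketch).
        pvalues.append(2.0 * (1.0 - NormalDist().cdf(z)))
    return beta, pvalues

def backward_elimination(predictors, y, threshold=0.1):
    """Drop, one per loop, the least significant predictor with p >= threshold."""
    kept = list(predictors)
    while kept:
        _, pvalues = ols([predictors[name] for name in kept], y)
        worst = max(range(len(kept)), key=lambda i: pvalues[i])
        if pvalues[worst] < threshold:
            break                              # all remaining predictors are significant
        kept.pop(worst)
    return kept

# Toy data: x2 is constructed to be uncorrelated with y, so it carries no signal.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [1, -1, 1, -1, -1, 1, -1, 1]
y = [1 + 2 * a + 0.1 * e for a, e in zip(x1, noise)]
print(backward_elimination({"x1": x1, "x2": x2}, y))
```

For samples as small as the 35 nodes of the Gβ layer, exact t-distribution p-values from a statistics library would be preferable to the normal approximation used here.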
3. Results and Discussion
3.1. Network measures and descriptives
The GMN connects insular with coastal Greece via shipping routes, constituting a transportation system of major importance for the national economy. According to the national 2011 census (ELSTAT, 2011), the population of the municipalities that include the 229 ports of the Gα layer is 2,595,272 inhabitants, i.e. 24% of the total population, whereas the insular or coastal prefectures (nodes) of the Gβ layer hold 87% of the country's population. The network measures of the GMN are shown comparatively in table 4. The Gα layer connects 229 Greek ports via 231 bidirectional shipping routes, whereas in the Gβ layer the 35 coastal and insular Greek prefectures have 43 possible maritime interconnections. In the majority of cases the measures of the Gβ layer are smaller than those of Gα, which complies with the intuition that reducing the data resolution (i.e. decreasing the scale's fraction) causes an inevitable loss of information. However, the average degree ⟨k⟩, the average edge length ⟨dij⟩, the graph density ρ, and the average clustering coefficient ⟨c⟩ run against this rule and suggest points of interest.

Table 4. Network measures of the Gα (port level) and Gβ (prefectural level) layers

| Metric | Measure | Gα | Gβ | Scale coefficient (a),(b) | Change (a),(c) |
|---|---|---|---|---|---|
| Network type | | undirected | undirected | | |
| No. Vertices / n | # | 229 | 35 | 15.28 | -84.72 |
| No. Edges / m | # | 231 | 43 | 18.61 | -81.39 |
| Max. degree / kmax | # | 19 | 12 | 63.16 | -36.84 |
| Average degree / ⟨k⟩ | # | 2.017 | 2.457 | 121.81 | 21.81 |
| Average degree / ⟨k⟩΄ (d) | # | 3.42 | 2.966 | 86.725 | -13.275 |
| Total length / Σdij | nm | 9136.91 | 7468.16 | 81.74 | -18.26 |
| Average edge length / ⟨dij⟩ | nm | 40.89 | 86.84 | 212.37 | 112.37 |
| Graph density / ρ | net | 0.009 | 0.072 | 816 | 716 |
| Graph density / ρ΄ (d) | net | 0.020 | 0.106 | 526 | 426 |
| Connected components | # | 87 | 8 | 9.20 | -90.80 |
| Isolated nodes | # | 77 | 6 | 7.79 | -92.21 |
| Nodes with loops | # | 12 | Ø | NaN (e) | -100 |
| Network diameter / d(G) | # | 15 | 5 | 33.33 | -66.67 |
| Network diameter's distance / d(G) | nm | 624.14 | 434.65 | 69.64 | -30.36 |
| Average clustering coefficient / ⟨c⟩ | net | 0.345 | 0.598 | 173.33 | 73.33 |
| Average path length (bin) / ⟨l⟩ | # | 5.317 | 2.314 | 43.52 | -56.48 |
| Average path length (wei) / ⟨l⟩ | nm | 217.39 | 188.50 | 86.71 | -13.29 |
| Average nearest neighbours degree | # | 5.88 | 4.47 | 76.02 | -23.98 |
| Modularity / Q | net | 0.764 | 0.585 | 76.57 | -23.42 |

(a) Values shown in percentages
(b) Calculated as the % ratio Gβ/Gα
(c) Calculated as the % ratio (Gβ-Gα)/Gα
(d) Measures with the mark (΄) calculated without isolated nodes
(e) NaN=Not a number
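Some entries of table 4 can be recomputed from the node and edge counts alone. The sketch below does so under the assumption that the average degree is ⟨k⟩=2m/n and that the density of an undirected graph is ρ=2m/(n(n-1)), with the primed density obtained by removing the isolated nodes from n. The function names are illustrative, and the dictionary only restates counts reported in the table.

```python
def average_degree(n, m):
    """Average degree of an undirected graph with n nodes and m edges."""
    return 2 * m / n

def density(n, m):
    """Density of an undirected graph: realized over possible edges."""
    return 2 * m / (n * (n - 1))

# Node, edge and isolated-node counts as reported in table 4.
layers = {
    "G_alpha": {"n": 229, "m": 231, "isolated": 77},
    "G_beta": {"n": 35, "m": 43, "isolated": 6},
}

for name, g in layers.items():
    n, m, iso = g["n"], g["m"], g["isolated"]
    print(
        name,
        "avg degree:", round(average_degree(n, m), 3),
        "density:", round(density(n, m), 3),
        "density without isolated nodes:", round(density(n - iso, m), 3),
        "isolated share (%):", round(100 * iso / n, 2),
    )
```

Under these assumptions the recomputed values agree, after rounding, with the ⟨k⟩, ρ and ρ΄ rows of the table, and the isolated shares give the 33.62% and 17.14% discussed in the text.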
The maximum degree of the Gα is kmax,α=19 and belongs to the port of Piraeus, whereas the maximum degree of the Gβ is kmax,β=12 and belongs to the prefecture of Attiki, which includes the Piraeus port. This observation, regardless of the numerical inequality, illustrates a structural consistency of the hubs' connectivity under node aggregation. Respectively, both layers have isolated nodes (kmin,α=kmin,β=0), implying that the node aggregation at the prefectural level is insufficient to eliminate disconnectedness in the GMN. Both the high and the zero connectivity cases also seem to preserve their geography according to figure 3. However, further observations about the intermediate degree cases k∈(kmin,kmax) are not clear from the visual inspection of the topological map and they necessitate distribution testing. The average degree ⟨k⟩ appears to grow slightly through the transformation g:Gα→Gβ and this exceptional performance is further tested in the empirical analysis.

Next, the observed increase in the average edge length for the Gβ layer seems to be expected, taking into consideration that distance is related to "transportation cost" (Tsiotas and Polyzos, 2015a). This is because node aggregation in the GMN has generated a new layer (Gβ), where the groups of nodes (vβ) incorporate within their boundaries the majority of the neighborhood connectivity of the former layer (Gα). As a result, the remaining connections are developed between these new nodes (vβ) and thus they are more "distant". However, the measure of graph density provides interesting insights at the administrative level. The node aggregation in this case produced a layer that is denser in connectivity (ρα<ρβ), which appears to be the result of the reduction in the isolated nodes, from 33.62% of the Gα's size to 17.14% of the Gβ's size. This implies that the node grouping has induced better communication ability in the generated network, mainly due to the reduction of the isolated connectivity. Finally, the increase of the average clustering coefficient also supports the previous picture. The inequality ⟨c⟩α=0.345<⟨c⟩β=0.598 implies that it is more probable for a node of the prefectural layer (vβ) to have its neighbors connected within the GMN than for a node of the port layer (vα). Given that distance increases in the Gβ, this observation implies that node aggregation makes distant communication more accessible.

3.2. Pattern Recognition
This subsection examines how the node aggregation affects the scale-free and small-world attributes. In the first (SF) case, the two-sample K-S test is applied to detect whether the degree distributions of the Gα and Gβ layers follow the same power-law pattern (Tsiotas and Polyzos, 2015a) and consequently whether the SF characteristics of the GMN are preserved. Figure 4 shows the overlaid plots of the empirical degree CDFs of the Gα and Gβ layers, where it can be observed that these two distributions do not have considerably different patterns.

Figure 4: Empirical CDFs of the variable degree k for the Gα and Gβ layers.
The previously detected similarity is verified by the results of the two-sample K-S test shown in table 5, where the null hypothesis of equality of the degree distributions of the Gα and Gβ layers is retained. This implies that the node aggregation g:Gα→Gβ preserves the typology of the degree distribution in the GMN and thus both layers of this multilayer network have the same possibility to possess the scale-free property. All the other K-S tests reject the null hypothesis (betweenness, closeness, clustering, modularity, edge distribution), implying that the node aggregation in the GMN affects all the examined topological attributes except for the degree distribution.

Table 5. Results of the two-sample Kolmogorov-Smirnov Test (a)

| Null Hypothesis | nα | nβ | Abs(Δx) | Δx>0 | Δx<0 | Statistic | Sig. (b),(c) | Decision |
|---|---|---|---|---|---|---|---|---|
| The distributions DEGα & DEGβ are the same | 229 | 35 | .181 | .181 | -.032 | 0.998 | .272 | Retain |
| The distributions CBbin,α & CBbin,β are the same | 229 | 35 | .251 | .036 | -.251 | 1.380 | .044 | Reject |
| The distributions CBw,α & CBw,β are the same | 229 | 35 | .249 | .062 | -.249 | 1.371 | .046 | Reject |
| The distributions CCbin,α & CCbin,β are the same | 229 | 35 | .414 | .235 | -.414 | 2.282 | .000 | Reject |
| The distributions CCw,α & CCw,β are the same | 229 | 35 | .339 | .339 | -.127 | 1.871 | .002 | Reject |
| The distributions CLUSTα & CLUSTβ are the same | 229 | 35 | .377 | .377 | .000 | 2.079 | .000 | Reject |
| The distributions MODα & MODβ are the same | 229 | 35 | .952 | .000 | -.952 | 5.245 | .000 | Reject |
| The edge distributions of Gα & Gβ are the same | 231 | 43 | .530 | .530 | -.009 | 3.190 | .000 | Reject |

The columns Abs(Δx), Δx>0 and Δx<0 are the most extreme differences (absolute, positive, negative).
a. Cases include isolated nodes
b. Asymptotic significances are displayed
c. The significance level is 0.05 (2-sided test)
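For readers who want to reproduce the layout of table 5, the following Python sketch computes the most extreme differences between two empirical CDFs and a scaled statistic. It assumes that the "Statistic" column is the Kolmogorov-Smirnov Z, i.e. D·sqrt(nα·nβ/(nα+nβ)), which is consistent with the reported rows but is not stated in the text; the sign convention (first sample minus second) and the function names are also assumptions.

```python
import math
from bisect import bisect_right

def ks_z(d, n, m):
    """Scale the maximum absolute difference D to the K-S Z statistic."""
    return d * math.sqrt(n * m / (n + m))

def ks_two_sample(a, b):
    """Most extreme differences between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d_pos = d_neg = 0.0
    for x in sorted(set(a) | set(b)):
        # Empirical CDF value = share of observations <= x.
        diff = bisect_right(a, x) / n - bisect_right(b, x) / m
        d_pos = max(d_pos, diff)
        d_neg = min(d_neg, diff)
    d_abs = max(d_pos, -d_neg)
    return d_abs, d_pos, d_neg, ks_z(d_abs, n, m)

# Toy samples (illustrative only, not GMN data).
print(ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6]))

# Consistency check against the degree row of table 5 (D=.181, n=229, m=35).
print(round(ks_z(0.181, 229, 35), 3))
```

The asymptotic significance in the table would additionally require the Kolmogorov distribution, which is omitted here to keep the sketch short.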
Next, the results of the approximate SW detection analysis are shown in table 6, where the omega index of the Gα layer is (in numerical terms) smaller than that of the Gβ layer. Nevertheless, the analysis indicates lattice-like (LL) characteristics for both layers, which (at first glance) may appear contradictory to the observation of Tsiotas and Polyzos (2015a) that the port-level layer (Gα) of the GMN is ruled by some scale-free characteristics.

Table 6. Results of the approximate small-world detection analysis

| Layer | ⟨c⟩ | ⟨c⟩latt | ⟨l⟩ | ⟨l⟩rand | ω | Indication |
|---|---|---|---|---|---|---|
| Gα | 0.345 | 0.208 | 5.317 | 4.22 | -0.865 | LL |
| Gβ | 0.598 | 0.415 | 2.314 | 2.909 | -0.184 | LL |

SW=Small-World characteristics, RL=Random-like characteristics, LL=Lattice-like characteristics
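The ω values of table 6 can be checked with the omega index of Telesford et al. (2011), ω = ⟨l⟩rand/⟨l⟩ - ⟨c⟩/⟨c⟩latt. The short Python sketch below only recombines the four averages reported in the table; the function name is illustrative and no classification thresholds are assumed.

```python
def omega_index(c, c_latt, l, l_rand):
    """Omega index: path-length ratio to a random graph minus
    clustering ratio to a lattice. Negative values point towards
    lattice-like structure, positive values towards random-like structure."""
    return l_rand / l - c / c_latt

# Inputs taken from table 6.
omega_alpha = omega_index(c=0.345, c_latt=0.208, l=5.317, l_rand=4.22)
omega_beta = omega_index(c=0.598, c_latt=0.415, l=2.314, l_rand=2.909)
print(round(omega_alpha, 3), round(omega_beta, 3))
```

Both values are negative, with the Gβ value closer to zero than the Gα value.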
However, the approximate small-world detection proposed by Telesford et al. (2011) distinguishes only among three typologies, namely small-world, random-like, or lattice-like characteristics, and thus it is deficient in the detection of the SF property. Due to this constraint, the outcomes of the degree distribution and the omega-index analysis appear complementary rather than contradictory for the pattern recognition in networks. Overall, this approach shows that the node aggregation in the GMN produces a layer preserving the lattice-like characteristics that are immanent in the network's structure, as well as in the structure of the majority of spatial networks, but in detail it appears to smooth the lattice configuration of this network with some small-world attributes.

3.3. Rank-Size distributions
The rank-size (RS) distributions of the variables DEG and STR are shown in figure 5. In the pRS(k) case (degree) the optimum fits fα,k(x) and fβ,k(x) are power-law curves with determination R²α=0.853 and R²β=0.891, whereas for the pRS(s) case (strength) they are exponential curves fα,s(x) and fβ,s(x) with determination R²α=0.996 and R²β=0.982.

Figure 5: Fitting curves for the degree RS distributions pRS(k) (left) and for the strength distributions pRS(s) (right) of the Gα and Gβ layers (log-log axes).

Figure 6 shows the edge RS distributions pRS(e), which are best described by exponential fittings (R²α=0.942 and R²β=0.891).

Figure 6: Fitting curves for the RS distribution of the edge-length pRS(e) (log-log axes).

Next, figure 7 shows the RS distributions of the betweenness centrality pRS(CB), calculated either on binary or on weighted distances. All these cases are described by power-law fitting curves. Despite any numerical differences observed in their formulas, the patterns of the Gα and Gβ layers retain the same typology (power-law or exponential). However, the closeness centrality RS distributions pRS(CC) appear to break this rule (figure 8), illustrating that the Gα layer is described by an exponential pattern, whereas the Gβ layer by a linear polynomial.

Figure 7: Fitting curves for the betweenness centrality RS distributions pRS(CB) (log-log axes, weighted and binary distances considered).
Legend: CBi,j is the betweenness centrality of layer i=α,β computed on j=bin (binary) or w (weighted) distances.
CBα,w: f(x) = 5937·x^(-0.6851) (R²=0.809)
CBα,bin: f(x) = 7728·x^(-0.8404) (R²=0.946)
CBβ,w: f(x) = 206.5·x^(-1.0) (R²=0.939)
CBβ,bin: f(x) = 120.4·x^(-1.304) (R²=0.988)

Figure 8: Fitting curves for the RS distributions of the closeness centrality pRS(CC) for the Gα and Gβ models (log-log axes).
Legend: CCi,j is the closeness centrality of layer i=α,β computed on j=bin (binary) or w (weighted) distances.
CCα,w: f(x) = 407.4·exp{-0.01436x} (R²=0.886)
CCα,bin: f(x) = 4.739·exp{-0.01437x} (R²=0.875)
CCβ,w: f(x) = -9.075·x+295.8 (R²=0.965)
CCβ,bin: f(x) = -0.08451·x+3.458 (R²=0.789)

Similarly, the RS distributions of the Gα and Gβ layers are not of the same typology for the cases of the clustering coefficient pRS(c) and the modularity classification pRS(Q) (figure 9).

Figure 9: Fitting curves for the RS distributions of the clustering coefficient (left) and for the modularity data (right) of the Gα and Gβ models (log-log axes).
Table 7 tabulates the results of the RS distribution analysis, showing for each layer the formulas of the fitting curves fα(x) and fβ(x) and of their transformations gX(fα(x))=fβ(x).

Table 7. Results of the rank-size distributions analysis

| Measure X | Gα fitting curve | Gβ fitting curve | Transformation g:Gα→Gβ |
|---|---|---|---|
| DEG | y = 20.8·x^(-0.5466) (power-law) | y = 11.47·x^(-0.6525) (power-law) | g(x) = (11.47/20.8^1.194)·x^1.194 (power-law) |
| STR | y = 861·exp{-0.028x} (exponential) | y = 1307·exp{-0.092x} (exponential) | g(x) = (1307/861^3.286)·x^3.286 (power-law) |
| EDGE | y = 305·exp{-0.019x} (exponential) | y = 368·exp{-0.044x} (exponential) | g(x) = (368/305^2.316)·x^2.316 (power-law) |
| CBw | y = 5937·x^(-0.6851) (power-law) | y = 206.5·x^(-1.05) (power-law) | g(x) = (206.5/5937^1.532)·x^1.532 (power-law) |
| CBbin | y = 7728·x^(-0.8404) (power-law) | y = 120.4·x^(-1.304) (power-law) | g(x) = (120.4/7728^1.551)·x^1.551 (power-law) |
| CCw | y = 407.4·exp{-0.0143x} (exponential) | y = -9.075·x + 295.8 (linear) | g(x) = 631.96·ln(x) + 295.8 - ln(407.4^631.96) (logarithmic) |
| CCbin | y = 4.74·exp{-0.0143x} (exponential) | y = -0.0845·x + 3.458 (linear) | g(x) = 5.885·ln(x) + 3.458 - ln(4.74^5.885) (logarithmic) |
| CLUST | y = 3.138·x^(-0.8248) (power-law) | y = 1.383·exp{-0.0867x} (exponential) | g(x) = 1.383·exp{A}, A = -0.0867·(x/3.138)^(-1/0.8248) (exponential) |
| MOD | y = 46.28·exp{-0.0112x} (exponential) | y = -0.384·x + 12.11 (linear) | g(x) = 34.286·ln(x) + 12.11 - ln(46.28^34.286) (logarithmic) |
According to table 7, the cases preserving the typology of their RS distributions are the variables degree, strength, betweenness centrality, and the edge length, whereas the closeness centrality, clustering coefficient, and modularity alter their RS typology through this transformation. Amongst these results, the case of degree verifies the previous findings illustrating the robustness of this network measure to node aggregation, whereas the other cases provide further insights about the topological consistency of the GMN in terms of the RS distribution typology preservation.
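The closed forms of the transformations in table 7 follow from relation (6) by inverting fα and substituting into fβ. The derivation below is a worked sketch for the three typology pairs that occur in the table; the symbols aα, bα, aβ, bβ, c and d are generic curve parameters introduced here for illustration.

```latex
% Power-law to power-law: y = f_\alpha(x) = a_\alpha x^{b_\alpha}
x = \left(\frac{y}{a_\alpha}\right)^{1/b_\alpha}
\;\Rightarrow\;
g(y) = a_\beta \left(\frac{y}{a_\alpha}\right)^{b_\beta/b_\alpha}
     = \frac{a_\beta}{a_\alpha^{\,b_\beta/b_\alpha}}\, y^{\,b_\beta/b_\alpha}

% Exponential to exponential: y = f_\alpha(x) = a_\alpha e^{b_\alpha x}
x = \frac{1}{b_\alpha}\ln\frac{y}{a_\alpha}
\;\Rightarrow\;
g(y) = a_\beta \exp\!\left\{\frac{b_\beta}{b_\alpha}\ln\frac{y}{a_\alpha}\right\}
     = \frac{a_\beta}{a_\alpha^{\,b_\beta/b_\alpha}}\, y^{\,b_\beta/b_\alpha}

% Exponential to linear: f_\beta(x) = c\,x + d
g(y) = \frac{c}{b_\alpha}\ln y + d - \frac{c}{b_\alpha}\ln a_\alpha

% Numerical check against table 7
\text{DEG: } \frac{b_\beta}{b_\alpha} = \frac{-0.6525}{-0.5466} \approx 1.194,
\qquad
\text{STR: } \frac{b_\beta}{b_\alpha} = \frac{-0.092}{-0.028} \approx 3.286
```

In both same-typology cases the transformation is a power law in y whose exponent is the ratio of the two fitted exponents, which is why the DEG, STR, EDGE and CB rows of table 7 all report power-law transformations.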
3.4. Empirical analysis
3.4.1. Comparison of means
This subsection examines the effects of node aggregation on the scale (i.e. mean values) of the network measures, by applying an independent-samples t-test for the comparison of means. In particular, the test examines the statistical equality of the means μi,α and μi,β that are computed for a given network measure i within the layers Gα and Gβ respectively.
Table 8 shows the results of this analysis, where it can be observed that only the measures of degree and (weighted) closeness centrality retain the null hypothesis of equality. This implies that the variables of degree and accessibility are indifferent to the effects of node aggregation in the GMN, whereas the cases of neighborhood accessibility (strength), neighborhood connectivity (clustering), intermediacy (betweenness centrality), topological distance (binary closeness centrality), and community structure (modularity) are affected and change their scale.

Table 8. Independent samples t-tests for the comparison of means μi,α, μi,β, i=1,…,9

| Variable | Equal variances | Levene F | Levene Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| (i=1) DEG | assumed | .503 | .479 | .999 | 262 | .319 | .440 | .440 | -.427 | 1.306 |
| | n/a (a) | | | 1.062 | 47.156 | .294 | .440 | .414 | -.393 | 1.272 |
| (i=2) STR | assumed | 7.744 | .006 | 4.057 | 262 | .000 | 247.29 | 60.95145 | 127.27 | 367.31 |
| | n/a (a) | | | 2.818 | 37.808 | .008 | 247.29 | 87.74439 | 69.63 | 424.95 |
| (i=3) CB_bin | assumed | 9.489 | .002 | -1.844 | 262 | .066 | -112.03 | 60.76084 | -231.67 | 7.61 |
| | n/a | | | -4.668 | 238.431 | .000 | -112.03 | 24.00034 | -159.31 | -64.75 |
| (i=4) CB_w | assumed | 19.140 | .000 | -2.323 | 262 | .021 | -267.16 | 115.01343 | -493.63 | -40.69 |
| | n/a | | | -5.888 | 237.467 | .000 | -267.16 | 45.37646 | -356.55 | -177.77 |
| (i=5) CC_bin | assumed | 75.527 | .000 | -1.737 | 262 | .084 | -.800 | .46071 | -1.71 | .107 |
| | n/a | | | -3.599 | 181.291 | .000 | -.800 | .22235 | -1.24 | -.362 |
| (i=6) CC_w | assumed | 19.863 | .000 | .538 | 262 | .591 | 11.25 | 20.91014 | -29.93 | 52.42 |
| | n/a | | | .682 | 56.248 | .498 | 11.25 | 16.48495 | -21.77 | 44.27 |
| (i=7) CLUST | assumed | 8.261 | .004 | 3.997 | 262 | .000 | .229 | .057180396 | .116 | .341 |
| | n/a | | | 3.293 | 40.340 | .002 | .229 | .069404525 | .088 | .369 |
| (i=8) MOD | assumed | 49.038 | .000 | -10.348 | 262 | .000 | -41.74 | 4.034 | -49.68 | -33.80 |
| | n/a | | | -24.363 | 260.023 | .000 | -41.74 | 1.713 | -45.11 | -38.37 |
| (i=9) EDGE | assumed | 5.629 | .018 | 6.891 | 272 | .000 | 87.53 | 12.70220 | 62.52 | 112.53 |
| | n/a | | | 6.104 | 53.525 | .000 | 87.53 | 14.33981 | 58.77 | 116.28 |

(a) n/a=not assumed
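The two rows reported per variable in table 8 correspond to the pooled-variance and the separate-variance (Welch) forms of the t-test. A minimal Python sketch of both statistics is given below; it uses toy samples rather than GMN data, omits the p-values, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """t statistic and degrees of freedom assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """t statistic and Welch-Satterthwaite df without assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy samples with clearly unequal variances (illustrative only).
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(pooled_t(a, b))
print(welch_t(a, b))
```

With sample sizes of 229 and 35 the pooled form has 229+35-2=262 degrees of freedom, matching the "assumed" rows of the node-level variables, while the fractional df of the other rows are typical of the Welch-Satterthwaite approximation.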
3.4.2. Correlation analysis
This part detects linear relations between pairs of network variables within each layer in order to examine in which cases the correlations are preserved between the layers. The further purpose of this approach is to investigate whether and to what extent distance affects the topology of the GMN, which is inferred from the correlations between homologous binary and distance-weighted network variables (Tsiotas and Polyzos, 2015a), in particular from the pairs r(DEGi,STRi), r(CBi,bin,CBi,w), and r(CCi,bin,CCi,w), where i=α,β. The results of the analysis are shown in table 9, where correlations within the Gα layer are shown in the upper triangular matrix and within the Gβ layer in the lower triangular matrix. In the case of the Gα layer, all three coefficients are high (>0.75) and significant (r(DEGα,STRα)=0.840**, r(CBα,bin,CBα,w)=0.803**, r(CCα,bin,CCα,w)=0.968**), whereas for the Gβ layer (r(DEGβ,STRβ)=0.957**, r(CBβ,bin,CBβ,w)=0.958**, r(CCβ,bin,CCβ,w)=-0.299) the coefficient r(CCβ,bin,CCβ,w) is neither high nor significant.

Table 9. Pearson's bivariate coefficients of correlation r(xi,α, xj,α) (a)(b) and r(xi,β, xj,β) (a)(c). Each cell shows the coefficient with its 2-tailed significance in parentheses; N=229 for the Gα layer and N=35 for the Gβ layer.

| | DEG | STR | CBbin | CBw | CCbin | CCw |
|---|---|---|---|---|---|---|
| DEG | | .840** (.000) | .655** (.000) | .604** (.000) | .441** (.000) | .457** (.000) |
| STR | .957** (.000) | | .791** (.000) | .569** (.000) | .290** (.000) | .364** (.000) |
| CBbin | .854** (.000) | .876** (.000) | | .803** (.000) | .227** (.001) | .268** (.000) |
| CBw | .869** (.000) | .848** (.000) | .958** (.000) | | .305** (.000) | .307** (.000) |
| CCbin | .088 (.616) | -.039 (.825) | .004 (.980) | -.011 (.951) | | .968** (.000) |
| CCw | .305 (.074) | .308 (.072) | .015 (.931) | .040 (.819) | -.299 (.081) | |

a. i,j=DEG, STR, CBbin, CBw, CCbin, CCw
b. correlations within the Gα layer are shown in the upper triangular matrix
c. correlations within the Gβ layer are shown in the lower triangular matrix
**. Correlation is significant at the 0.01 level (2-tailed).
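The comparison between a binary variable and its distance-weighted counterpart reduces to a Pearson coefficient computed within one layer. The following Python sketch shows the calculation on made-up degree and strength scores; the data and the function name `pearson_r` are illustrative assumptions, not values from the GMN.

```python
import math

def pearson_r(xs, ys):
    """Pearson's bivariate coefficient of correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical node scores: binary degree and distance-weighted strength.
degree = [1, 2, 3, 4]
strength = [10.0, 30.0, 20.0, 40.0]
print(round(pearson_r(degree, strength), 3))
```

A coefficient close to 1 for such a pair would indicate that weighting by distance barely reorders the nodes, which is the sense of "structural indifference to distance" discussed in the text.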
In conjunction with the results of the RS distribution analysis in table 7, we can observe that the network measures preserving both their high and significant correlations r(xi,bin, xi,w) and the typology of their RS distributions are degree and betweenness centrality. This indicates a potential for linking the RS distribution analysis with the detection of spatial effects in the network topology, and it suggests a topic of further research. Also, the authors interpreted in one of their previous works (Tsiotas and Polyzos, 2015a) that “capturing high and significant correlations between pairs of binary and their homologous distance-weighted variables illustrates an attribute of structural indifference to distance that probably is related with the scale-free property of a network”. The results of table 9 provide further evidence for the evaluation of this statement: both the Gα and Gβ layers are related to the scale-free attribute (through their power-law degree distribution patterns) and both share high and significant correlations r(DEG,STR) and r(CBbin,CBw). However, the insignificant coefficient r(CCβ,bin,CCβ,w)=-0.299 breaks the previous hypothesis and constrains its power to those variables that do not belong to the family of accessibility measures, given that closeness centrality is a measure of accessibility, whereas degree and betweenness are metrics of network topology. This interpretation requires further empirical evaluation.

3.4.3. Regression analysis
The final part of the analysis examines the consistency of the GMN with respect to its socioeconomic information. This is tested by comparing two linear regression models (constructed with the BEM algorithm) that express the connectivity (variable: degree) of the GMN, where the first corresponds to the Gα and the second to the Gβ layer.
In each case (i=α,β) the predictor variables entered in the models are those shown in table 2, except for the variable Strength (STR), which is highly correlated with the degree variable (see table 9). The results of the regression analysis are shown in table 10, where firstly it can be observed (sub-table a) that both models have significant and high determination ability in the description of the variation of the available topological and socioeconomic data (R²α=0.729 for the Gα layer and R²β=0.936 for the Gβ), according to the results of the ANOVA analysis shown in table 10(b). This indication verifies the effectiveness of modeling the GMN's connectivity with a multivariate regression approach. Moreover, the inequality R²β=0.936>R²α=0.729 implies that the node aggregation has generated a network layer (Gβ) that is more efficient than the physical network (Gα) in representing information about the socioeconomic environment of the maritime transportation system in Greece.

Table 10.
a. BEM Model Summary

| Model (a) | Loops until optimum model | N | R² | Adjusted R² | Std. Error of the Estimate |
|---|---|---|---|---|---|
| Gα | 10 | 229 | .729 | .722 | .648 |
| Gβ | 8 | 35 | .936 | .922 | .628 |

b. ANOVA table

| Model (a) | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Gα | Regression | 250.50 | 6 | 41.75 | 99.53 | .000 (b) |
| | Residual | 93.12 | 222 | .42 | | |
| | Total | 343.62 | 228 | | | |
| Gβ | Regression | 161.65 | 6 | 26.94 | 68.33 | .000 (b) |
| | Residual | 11.04 | 28 | 0.39 | | |
| | Total | 172.69 | 34 | | | |

c. BEM Coefficients

| Layer | Optimum Model (a),(b),(c) | Unstandardized B | Std. Error | Standardized B | t | Sig. |
|---|---|---|---|---|---|---|
| Gα | (Constant) | -.755 | .251 | | -3.005 | .003 |
| | CBbin,α | .001 | .000 | .176 | 2.240 | .026 |
| | CBw,α | .001 | .000 | .259 | 4.162 | .000 |
| | CCw,α | .006 | .001 | .281 | 6.911 | .000 |
| | CLUSTα | 2.120 | .363 | .227 | 5.844 | .000 |
| | C_SECα | 1.232 | .352 | .125 | 3.504 | .001 |
| | ARα | .177 | .030 | .305 | 5.841 | .000 |
| Gβ | (Constant) | -.776 | .632 | | -1.229 | .229 |
| | CCbin,β | .000 | .000 | .132 | 2.154 | .040 |
| | CBw,β | .040 | .004 | .711 | 9.375 | .000 |
| | CCw,β | .004 | .002 | .138 | 2.027 | .052 |
| | CLUSTβ | 1.207 | .338 | .211 | 3.570 | .001 |
| | C_SECβ | 1.901 | 1.067 | .112 | 1.782 | .086 |
| | ARβ | .062 | .021 | .234 | 2.991 | .006 |

a. Dependent Variable: DEG
b. Predictors: (Constant), CBbin, CBw, CCbin, CCw, CLUST, MOD, ACC, POP, ASEC, CSEC, AR, DEP, LD, UNLD
c. Variables converted to global using PCM
Finally, despite the (expected) changes observed in the values of the standardized coefficients, it is evident that both BEM models have retained the same significant predictor variables in their final expressions. This verifies once more the consistency of the degree variable under node aggregation, which was detected in each of the previous steps of the analysis. In socioeconomic terms, this result illustrates that the GMN is consistent with its “socioeconomic determinants” when it is subjected to node aggregation, which implies that the socioeconomic information of this system is described by the same predictors for both levels of its spatial (port) and administrative (prefectural) organization.
4. Limitations of the study and evaluation
The limitations of this study mainly concern the complexity of the graph modeling and the empirical constraints of the analysis. On the one hand, modeling the GMN as a multilayer network is a complex procedure subject to limitations that depend on the researcher's choices in the effort to construct the most representative network model. Such limitations are inevitable regardless of the effectiveness of CNA in the study of maritime transportation systems (see Koschutzki et al., 2005; Barthelemy, 2011; Tsiotas and Polyzos, 2015a; Ducruet, 2017). Given that both the Gα and Gβ layers are constructed under the same modeling constraints (e.g. undirected, spatially weighted, etc.), the topologies of these networks should differ only in the effects caused by node aggregation, and thus the analysis in this paper suffices to provide insights illustrating the transformation g:Gα→Gβ. On the other hand, this study is subject to limitations concerning data considerations due to its empirical nature. For example, both the correlation and the linear regression analysis are applied under the assumptions of normality and linearity, whereas the error term is assumed to follow the standard normal distribution and to be homoscedastic. Normality is also assumed for the independent-samples t-test (Norusis 2004; Hastie et al. 2009; Devore and Berk, 2012), and all the fittings f(x) are limited by their determination ability. However, the effects of such limitations are considered to be mutually reduced by the multilevel empirical approach applied in this paper and by the fact that some interpretations of the results are based on typologies and ignore numerical terms. Within this context, neither the graph modeling nor the data consideration constraints seem to raise a considerable concern for this study.
Overall, the utility of the methodological approach rests on the multilevel consideration of the grouping effect, which is invariant to the empirical specializations of the analysis. This multilayer consideration of node aggregation proposes a methodological framework (figure 1) that advances the quantitative analysis of multilayer networks, promotes interdisciplinary research in the field of complex networks, and can find a variety of applications examining the effects of a grouping criterion (node aggregation in this study) on the network topology.

5. Conclusions
This article modeled the Greek Maritime transportation Network (GMN) in the L-space representation as a non-directed multilayer (bilayer) graph consisting of two layers, the port-layer Gα(229,231), where nodes represent ports, and the prefecture-layer Gβ(35,43), where nodes represent coastal and insular prefectural groups of these ports. The purpose of this study was to evaluate the kind of topological information that remains unaltered under node aggregation and to develop a framework linking the data of an empirical network with data of its socioeconomic environment, when the latter are available for hierarchically higher levels of aggregation. In technical terms, this study developed a methodological framework for the examination of the effects of node aggregation on the network topology, which are expressed by the set T={gX: X=network attribute} of transformations gX:Gα→Gβ captured for different network attributes {X}.
The research question was examined empirically on topological and socioeconomic data from the GMN, providing a novel methodological framework for further research. The general outcome of the analysis verifies the intuition that reducing the data resolution (i.e. moving in the node aggregation hierarchy from a subject to a supernatant level) results in an inevitable loss of information. However, the multilevel statistical approach provided some interesting insights that run against this intuition. Firstly, the GMN appeared to preserve its structural consistency with respect to borderline connectivity under node aggregation, implying that both hubs and isolated nodes preserved their geographic locations in the network map. However, node aggregation shrank neighborhood connections and made network neighborhoods more distant, while it simultaneously made communication between the nodes of the GMN more accessible. On the other hand, it generated a network layer with better communication ability, which was relieved of redundant or isolated mass. Node aggregation in the GMN did not affect the average connectivity (degree) and accessibility (closeness centrality) of the network, but it affected its neighborhood accessibility (strength), neighborhood connectivity (clustering), intermediacy (betweenness centrality), topological distance (binary closeness centrality), and community structure (modularity). The analysis also highlighted that the indifference of a network to spatial distance, which is captured as high and significant correlations r(xi,bin,xi,w) between homologous binary and distance-weighted network variables, is reflected in those cases that preserve the typology of their RS distributions under node aggregation. Finally, the GMN proved consistent with its “socioeconomic determinants”, because it preserved the same predictors describing its connectivity (variable degree) when expressed in terms of linear regression, even though the model of the supernatant (prefectural) level showed better determination ability in the description of the socioeconomic environment of this transportation system. Through the comparison between these two layers (port- and prefectural-) of the GMN, this paper aimed to facilitate the linking of information between different levels of hierarchy in node aggregation and to serve interdisciplinary purposes.
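The correlation r(xi,bin,xi,w) between a binary network variable and its distance-weighted homologue can be sketched as a plain Pearson coefficient. The snippet below is a minimal illustration under that assumption; the function name and the toy node values are hypothetical and are not taken from the GMN.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values per node: binary degree and distance-weighted strength.
degree_binary = [1, 2, 2, 3, 4, 5]
strength_weighted = [12.0, 30.0, 21.0, 44.0, 50.0, 71.0]

r_bin_w = pearson_r(degree_binary, strength_weighted)
print(r_bin_w)
```

A coefficient close to 1 for such a pair would indicate that adding spatial distance as a weight changes little in the ranking of the nodes, which is the sense of indifference to spatial distance used above; the significance of the coefficient would need a separate test.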
In the field of spatial and regional statistics, this approach facilitates the management and linking of data between different levels of territorial units for statistics and is capable of providing a standardization reference for those spatial units that are not included in a typical classification (e.g. NUTS I, II, and III). In the field of regional planning and policy, it proposes a methodological framework for linking and evaluating the different levels of spatial and administrative division that exist in the complex institutional framework for the strategic and spatial planning of countries worldwide. In the field of CNA, this paper proposed a methodological framework for capturing and evaluating the consistency of the topological changes induced by node aggregation between networks representing different layers in a multilayer model. Finally, in the study of the GMN, the analysis illustrated the effectiveness of the administrative division in the Greek maritime transportation system, because the topology of the supernatant (prefectural) level facilitates communication between distant places, reduces the amount of redundant information in the network, gains determination of socioeconomic representation, and thus favors the application of control and policy in this macroscopic system.
6. References

Albert, R., Barabasi, A-L., (2002) “Statistical Mechanics of Complex Networks”, Reviews of Modern Physics, 74(1), pp.47–97.
Barabasi, A-L., Albert, R., (1999) “Emergence of Scaling in Random Networks”, Science, 286(5439), pp.509–512.
Barthelemy, M., (2011) “Spatial networks”, Physics Reports, 499, pp.1–101.
Bastian, M., Heymann, S., Jacomy, M., (2009) “Gephi: An Open Source Software for Exploring and Manipulating Networks”, Proceedings of the Third International ICWSM Conference, pp.361–362.
Blondel, V., Guillaume, J.-L., Lambiotte, R., Lefebvre, E., (2008) “Fast unfolding of communities in large networks”, Journal of Statistical Mechanics: Theory and Experiment, 10, P10008.
Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gomez-Gardenes, J., Romance, M., Sendina-Nadal, I., Wang, Z., Zanin, M., (2014) “The structure and dynamics of multilayer networks”, Physics Reports, 544, pp.1–122.
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.-U., (2006) “Complex networks: structure and dynamics”, Physics Reports, 424, pp.175–308.
Brandes, U., Robins, G., McCranie, A., Wasserman, S., (2013) “What is network science?”, Network Science, 1, pp.1–15.
De Montis, A., Caschili, S., Chessa, A., (2011) “Time evolution of complex networks: commuting systems in insular Italy”, Journal of Geographical Systems, 13(1), pp.49–65.
Devore, J., Berk, K., (2012) Modern Mathematical Statistics with Applications, London, UK, Springer-Verlag Publications.
Diestel, R., (2005) Graph Theory, Third Edition, Heidelberg, Germany, Springer-Verlag Publications.
Ducruet, C., (2017) “Multilayer dynamics of complex spatial networks: The case of global maritime flows (1977–2008)”, Journal of Transport Geography, 60, pp.47–58.
Ducruet, C., Beauguitte, L., (2014) “Spatial Science and Network Science: Review and Outcomes of a Complex Relationship”, Networks and Spatial Economics, doi:10.1007/s11067-013-9222-6.
Ducruet, C., Ietri, D., Rozenblat, C., (2011) “Cities in Worldwide Air and Sea Flows: A multiple networks analysis”, European Journal of Geography, 528, doi:10.4000/cybergeo.23603.
Eurostat, (2017) NUTS – Nomenclature of territorial units for statistics, Statistics illustrated, available at the URL: ec.europa.eu/Eurostat/web/nuts [last accessed: 025-2017].
Fortunato, S., (2010) “Community detection in graphs”, Physics Reports, 486, pp.75–174.
Google Maps, (2013) “Google Mapping Services”, available at the URL: www.google.gr/maps?hl=el [last accessed: 29-8-2013].
Hastie, T., Tibshirani, R., Friedman, J., (2009) The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition, New York, Springer.
Hays, L., (1981) Statistics, Third Edition, New York, Holt, Rinehart and Winston Publications.
Hellenic Statistical Service (ELSTAT), (2011) “Results of the Census of Population-Habitat 2011 referring to the permanent population of Greece”, Newspaper of Government (ΦΕΚ), Second Issue (Τ-Β), Number 3465, 28-12-12.
Kivela, M., Arenas, A., Barthelemy, M., Gleeson, J., Moreno, Y., Porter, M. A., (2014) “Multilayer networks”, Journal of Complex Networks, 2, pp.203–271.
Koschutzki, D., Lehmann, K., Peeters, L., Richter, S., (2005) “Centrality Indices”, in Brandes, U., Erlebach, T., (Eds.), Network Analysis, Berlin, Germany, Springer-Verlag Publications, pp.16–61.
Makkonen, T., Salonen, M., Kajander, S., (2013) “Island accessibility challenges: Rural transport in the Finnish archipelago”, European Journal of Transport Infrastructure Research, 13, pp.274–290.
Maslov, S., Sneppen, K., (2002) “Specificity and stability in topology of protein networks”, Science, 296, pp.910–913.
Greek Ministry of Environment and Energy (YPEKA), (2017) “Regional planning and urban development”, available at the URL: www.ypeka.gr/Default.aspx?tabid=2288&language=en-US [accessed: 5/5/17].
Norusis, M., (2004) SPSS 13.0 Statistical Procedures Companion, New Jersey (USA), Prentice Hall Publications.
Polyzos, S., Petrakos, G., (2000) “Interregional Distances and Productivity of Regions: An Empirical Approach”, Technical Chronicles Scientific Journal, TCG, II, 1-2, pp.5968.
Polyzos, S., (2009) “The Egnatia motorway and the changes in interregional trade in Greece: an ex ante assessment”, European Spatial Research and Policy, 16(2), pp.23–47.
Porter, M. A., (2012) “Small-world network”, Scholarpedia, 7, 1739.
Rubinov, M., Sporns, O., (2010) “Complex network measures of brain connectivity: uses and interpretations”, Neuroimage, 52, pp.1059–1069.
Salet, W. G., Kreukels, A., (2003) Metropolitan governance and spatial planning: comparative case studies of European city-regions, London, Spon Press (Taylor & Francis).
Stumpf, M. P., Porter, M. A., (2012) “Critical truths about power laws”, Science, 335(665), doi:10.1126/science.1216142.
Telesford, Q., Joyce, K., Hayasaka, S., Burdette, J., Laurienti, P., (2011) “The Ubiquity of Small-World Networks”, Brain Connectivity, 1(5), pp.367–375.
Tsiotas, D., Polyzos, S., (2015a) “Analyzing the Maritime Transportation System in Greece: a Complex Network approach”, Networks and Spatial Economics, 15(4), pp.981–1010.
Tsiotas, D., Polyzos, S., (2015b) “Decomposing multilayer transportation networks using complex network analysis: A case study for the Greek aviation network”, Journal of Complex Networks, 3(4), pp.642–670.
Tsiotas, D., (2016) “City-size or rank-size distribution? An empirical analysis on Greek urban populations”, Theoretical and Empirical Researches in Urban Management (TERUM), 11(4), pp.1–16.
Walpole, R. E., Myers, R. H., Myers, S. L., Ye, K., (2012) Probability & Statistics for Engineers & Scientists, Ninth Edition, New York, Prentice Hall Publications.
Xu, Z., Sui, D. Z., (2007) “Small-world characteristics on transportation networks: a perspective from network autocorrelation”, Journal of Geographical Systems, 9(2), pp.189–205.
HIGHLIGHTS
- The manuscript studies the topological consistency of spatial networks under node aggregation, by examining the changes captured between two different network representations (layers) of the same socioeconomic system (maritime network).
- The main purpose of this study is to evaluate what kind of topological information remains unaltered under node aggregation and, further, to develop a framework for linking the data of an empirical network with data of its socioeconomic environment, when the latter are available at hierarchically higher levels of aggregation, in an effort to promote interdisciplinary research in the field of complex network analysis.
- The methodological approach is built on comparisons applied to a pair of network layers at different levels of node aggregation (port-level vs. prefectural-level) with respect to a set of network measures and attributes.
- The results of the analysis generally verify the intuition that the reduction of data resolution leads to an inevitable loss of information. However, they provide insights that run against this intuition, concerning the connectivity, some distance-based measures, and the socioeconomic information included in the network.
- The overall approach formulates a methodological framework that can support further applications concerning the effects that a grouping criterion induces on the network’s topology, providing physical, technical, and socioeconomic insights.