Analyzing multiplex networks using factorial methods



Social Networks 59 (2019) 154–170

Contents lists available at ScienceDirect

Social Networks journal homepage: www.elsevier.com/locate/socnet

Analyzing multiplex networks using factorial methods

Giuseppe Giordano a, Giancarlo Ragozini b, Maria Prosperina Vitale c,⁎

a Dept. of Political and Social Studies, University of Salerno, Italy
b Dept. of Political Science, University of Naples Federico II, Italy
c Dept. of Political and Social Studies, University of Salerno, Via Giovanni Paolo II, n. 132, IT-84084 Fisciano, SA, Italy

ARTICLE INFO

MSC: 00-01, 99-00

Keywords: Multiplex network data; DISTATIS; Multidimensional scaling; Simulated data

ABSTRACT

Multiplex networks arise when more than one source of relationships exists for a common set of nodes. Many approaches to deal with this kind of complex network data structure are reported in the literature. In this paper, we propose the use of factorial methods to visually explore the complex structure of multiplex networks. Specifically, the adjacency matrices derived from multiplex networks are analyzed using the DISTATIS technique, an extension of multidimensional scaling to three-way data. This technique allows the representation of the different types of relationships in both separate spaces for each layer and a compromise space. The analytical procedure is illustrated using a real world example and simulated data.

1. Introduction

Social relationships are the result of several types of interaction among actors (e.g., friendship, neighborship, kinship, membership) and can be described as complex networks in which these connections act together in a tie formation mechanism. Such complexity can be represented by a multilayer network, which arises when there is more than one source of connection for a common set of nodes or different sets of nodes. In this framework, multiplex networks consisting of a fixed set of nodes interacting through different relationships were introduced in the late 1970s (Lazega and Pattison, 1999; Pattison and Wasserman, 1999; White et al., 1976). More recently, multiplex networks have been considered a particular specification of the more general class of multilayer networks (Bianconi, 2018; Kivelä et al., 2014).

Many issues have been addressed in exploring and analyzing this type of network. These issues include visualization (Erten et al., 2005; Fatemi et al., 2018; Matsuno and Murata, 2018; Xu et al., 2017), the procedures relating to the aggregation and flattening of layers (De Domenico et al., 2015a; Kanawati, 2015; Kivelä et al., 2014), community detection (Bothorel et al., 2015; De Bacco et al., 2017; Hmimida and Kanawati, 2015; Kuncheva and Montana, 2015; Mucha et al., 2010), blockmodeling (e.g., see Barbillon et al., 2017; Brusco et al., 2013; Doreian et al., 2005), and statistical network models (ERG or p* models, Shafie, 2015, 2016; Snijders et al., 2013). In parallel, a wide range of software tools¹ have been developed. Empirical studies on real multilayer network data have appeared in several fields (De Stefano and Zaccarin, 2013; Heaney, 2014; Rossi and Magnani, 2015; Santana et al., 2017; Simpson, 2015), as have newer and more sophisticated network measures and models (Battiston et al., 2017; Bródka et al., 2018; Halu et al., 2013; Magnani and Wasserman, 2017; Menichetti et al., 2014; Ostoic, 2017; Solá et al., 2013; Solé-Ribalta et al., 2014).

In this paper, we use factorial methods to statistically analyze and visually explore multiplex networks: we compute a data-driven optimal weighting system for layer aggregation and analyze the hidden structure of this kind of network while preserving its inherent complexity. In general, factorial methods have been proposed in the social network analysis (SNA) framework to explore different network structures (see, e.g., D’Esposito et al., 2014a; Faust, 2005; Ragozini et al., 2015; Roberts, 2000), including the attributes of nodes and events (Giordano and Vitale, 2007, 2011), or to analyze network-derived measures (Liberati and Zappa, 2013). In the case of multiplex networks, canonical correlation analysis has been adopted to identify the dimensions along which two networks are related to each other (Carroll, 2006), and an analytical procedure was recently introduced for dimension reduction using cluster analysis (Vörös and Snijders, 2017). However, several scholars have suggested the usefulness of singular value decomposition (SVD) techniques for three-way data for community detection in multiplex networks, similar to what has been done in monoplex networks (i.e., networks with one single layer) and referenced in Kivelä et al. (2014).

In this line of research, we propose the use of the DISTATIS method (Abdi and Valentin, 2007; Abdi et al., 2005, 2007, 2009, 2012; Chollet et al., 2014; Lahne et al., 2018) to explore, visualize, and analyze a multiplex network. This method is a type of multidimensional scaling technique applied to a set of distance matrices derived from the same set of objects. More specifically, we show how to apply DISTATIS to multiplex network data and how to interpret the original procedure from a network perspective. Indeed, the method allows us to obtain a linear combination of the layers through optimal weights (the compromise) and to simultaneously visualize both the aggregated network and the single layers. The method enhances the visual exploration of: (i) the similarity among the structures of layers; (ii) the common structure of all layers; (iii) the network structure in terms of the similarity of nodes in each single layer; and (iv) the variation of nodes across layers.

The remainder of the paper is organized as follows. Section 2 reviews related works and compares the proposed approach with other methods for analyzing multiplex networks, highlighting differences and similarities. Section 3 describes the analytical procedure for handling multiplex network data in detail using the DISTATIS method. Section 4 discusses the potential of the proposed procedure and how it can be exploited when applied to a real dataset. Section 5 discusses the robustness and capability of the method through a simulation study. Section 6 concludes with suggestions for future lines of research.

⁎ Corresponding author. E-mail address: [email protected] (M.P. Vitale).

¹ See, for instance, the MuxViz platform (De Domenico et al., 2015b), a framework for the multilayer analysis and visualization of networks; XPNET, an extension of PNet (Wang et al., 2006) software incorporating the analysis of multivariate networks; and the R packages Multiplex (Ostoic, 2017) and Multinet (Dickison et al., 2016), developed for the treatment of multiplex and multilayer networks.

https://doi.org/10.1016/j.socnet.2019.07.005

Available online 18 August 2019. 0378-8733/ © 2019 Elsevier B.V. All rights reserved.

2. Related works and comparisons

Multilayer networks explicitly incorporate multiple types of ties among nodes. They constitute a natural environment for describing complex systems in which different sets of nodes, or the same set of nodes in multiplex networks, can be connected according to different relationships, with each layer representing a relationship. In multilayer networks, it is possible to observe two sets of edges: (i) intralayer connections, that is, the edges that remain inside each layer, and (ii) interlayer connections, that is, the edges that cross the layers.

More formally, a multilayer network consists of a pair $(\mathcal{G}, \mathcal{E})$, with $\mathcal{G} = \{G_k\}_{k = 1, \dots, K}$, the collection of $K$ networks, and $\mathcal{E} = \{E_{kk'}\}$ $(k, k' = 1, \dots, K)$, the collection of both intralayer ($k = k'$) and interlayer ($k \neq k'$) edges. Note that, in the case of pure multiplex networks, such as that under analysis, the set of nodes is fixed, that is, $V_1 = V_2 = \dots = V_K = V$, and the interlayer edges are constant and indicate only that the nodes are present in the different layers (Kivelä et al., 2014). Each layer $G_k$ is then defined as $(V, E_{kk})$, with $V = (v_1, \dots, v_n)$, the set of $n$ nodes of each network, and $E_{kk} \subseteq V \times V$, the set of edges. For $k = 1, \dots, K$, let us consider the adjacency matrix $A_k = (a_{ijk})$ corresponding to the network $G_k$, with $a_{ijk} = 1$ if $(v_i, v_j) \in E_{kk}$, and $a_{ijk} = 0$ otherwise.

The methodological approaches developed for handling multiplex networks can be classified into several theoretical frameworks dealing with many different issues. An exhaustive literature review of related works is out of the scope of this contribution; however, a brief discussion of some related works will help introduce our proposal as a complementary approach, highlighting some advantages and possible drawbacks.

In the class of statistical network models, exponential random graph models (ERGM or p* models; Pattison and Wasserman, 1999) have been extended to deal with this kind of network structure. An extension of the p* models to multivariate social network data was proposed for evaluating a wide range of hypotheses about the forms of structural interdependence in multiple relationships (Lazega and Pattison, 1999; Wang, 2013). These models express the probability of an overall multirelational network structure in terms of parameters associated with specific network substructures, and they are based on the multivariate Markov assumption to explore the interdependencies among the different types of relationships at the level of ties. The application of the multivariate p* model for analyzing three network relationships in a corporate law firm was discussed in Lazega and Pattison (1999). Recently, statistical models for multiplexity in the case of dynamic networks in the organizational field were discussed by Snijders et al. (2013). Random multigraph models for networks with multiple edges and loops of different kinds have been proposed for dealing with independent edge assignments (Shafie, 2015, 2016). This approach is not comparable to our proposal, either from the theoretical perspective or in relation to its goals.

In a dimension reduction framework, clustering network data using blockmodeling techniques (e.g., Barbillon et al., 2017; Doreian et al., 2005; White et al., 1976) has been considered for handling multiplex data. Within the blockmodeling framework, the seminal work of White et al. (1976) redefined the usual concepts of role and position in the context of several distinct types of ties present across all pairs in a network. Beyond the classical approach, Doreian et al. (2005) discussed multiple relationships among possible extensions to the generalized blockmodeling. More recently, the multiobjective blockmodeling and stochastic blockmodeling approaches (Brusco et al., 2013; Barbillon et al., 2017, respectively) have been introduced for classifying multirelational networks. These methods aim at clustering relationally equivalent nodes into blocks based on more than one type of relationship. Thanks to the similarities among structural equivalence measures in blockmodeling and the metrics adopted in factorial methods shown in D’Esposito et al. (2014a,b), our approach can be used in a complementary way to the blockmodeling analysis (Ragozini et al., 2016).

With a similar aim, community detection approaches look for communities in graphs based on high edge density inside and low density outside the group (Bothorel et al., 2015; De Bacco et al., 2017; Hmimida and Kanawati, 2015; Kuncheva and Montana, 2015; Mucha et al., 2010). In this case too, the proposed method aids in visually highlighting the presence of communities.

In the framework of machine learning, visualization and exploratory methods are used for handling multiplex data by means of embedding algorithms and flattening procedures. Regarding the latter, distinct layer aggregation approaches are used for transforming a multiplex network into a monoplex graph. The two standard ways of aggregating a multiplex network into a single-layer one are the binary function and the weighted aggregation function (Berlingerio et al., 2011; Kanawati, 2015). Common embedding algorithms can then be applied to the flattened network. In the literature, embedding algorithms for multiplex networks have also been proposed. More specifically, some authors have extended the force-directed layout to the case of multiplex networks (see, e.g., Erten et al., 2005; Fatemi et al., 2018); other contributions have adopted approaches that learn the embedding vectors via link prediction and enforcing an extra information-sharing embedding (Matsuno and Murata, 2018; Xu et al., 2017).

Returning to the specific context of the factorial methods of analyzing multiplex networks, canonical correlation has been adopted for identifying the dimensions along which two networks are related (Carroll, 2006). Cluster analysis has been employed for dimension reduction (Vörös and Snijders, 2017), and correspondence analysis has been used to study multiple relationships and attributes at both the individual and group levels in affiliation matrices (Zhu et al., 2016).

In this paper, we consider factorial methods of handling systems consisting of networks with the same set of nodes interacting through multiple types of edges. Our goal is to start from the most general structure of this type of multiple network data and then use statistical methods designed for three-way data, as the set of $K$ adjacency matrices gives rise to a three-way relationship matrix $\mathcal{A} = (A_1, \dots, A_K)$. Among the plethora of statistical and factorial methods for dealing with three-way data (Kiers and Mechelen, 2001), we select the DISTATIS² method (Abdi et al., 2005).

Before detailing the proposed procedure, we want to stress some similarities and differences with other methods presented in the literature. First, it is worth noting that DISTATIS can be considered a particular flattening procedure followed by a certain spring embedding. Indeed, on the one hand, the compromise matrix, which is one of the main analytical features of the method, is the distance matrix induced by the flattened network, where the weights of the aggregation function are data driven and optimal from a statistical point of view. This optimality comes from the use of the first eigenvector and the first eigenvalue of the layers’ similarity matrix (Abdi et al., 2012). On the other hand, because the multidimensional scaling (MDS) applied to the monoplex can be viewed as a spring-embedding algorithm (Freeman, 2005), our approach can be considered a spring embedding applied to the flattened network. In addition, the proposed method provides a set of statistical and analytical tools for evaluating the quality of representation.

It should be noted that the other factorial methods (Carroll, 2006; Vörös and Snijders, 2017) focus on evaluating and representing the similarity among the layers, neglecting the node-level analysis. In DISTATIS, the comparison among the layers is just one step of the procedure, which allows the simultaneous analysis of the node similarity in each layer and in the global compromise space.

Moreover, in terms of the reported embedding approaches, like the others, DISTATIS looks for a lower dimensional embedding space to visualize all the layers at the same time, considering the multiplex network structure. Clearly, the function to be optimized will be different according to the problem setting (Cai et al., 2018). In the generalization of the force-directed embedding algorithm (Fatemi et al., 2018), artificial interlayer links are added to keep together the same nodes in different layers that are all superimposed. The possibility of having a network summarizing all layers is not foreseen; the layers’ similarity cannot be directly evaluated, and the objective function cannot be interpreted as a quality function of the overall representation. In addition, considering an embedding algorithm focused on link prediction tasks (Matsuno and Murata, 2018), it is possible to obtain both layer and node feature vectors. Thus, the nodes’ and layers’ similarities can be visualized through the layers and embedding vectors.

Like the other approaches, the proposed method aims to approximate the layers’ structure in a low dimensional space. The starting point is the set of distance matrices of all the layers, and DISTATIS allows an easy representation of the layers’ similarity and node proximity in all the layers, describing the topological structure of the network. In addition, it provides a representation of nodes that is able to synthesize all the layers, that is, the compromise. However, in this space, it is not possible to represent the edges among the nodes. Like the other embedding approaches, the method is well suited in the presence of similar layer structures, and it provides some statistical measures for evaluating the level of similarity of the nodes. The latter issue will be addressed in Section 5 through a simulation study to assess how the proposed method works when layers have different structures.
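To make the notation of this section concrete, the following minimal sketch stores a multiplex network as a three-way array of adjacency matrices and applies the two standard flattening rules mentioned above. The toy edge lists, the function names `flatten_binary` and `flatten_weighted`, and the choice of equal weights $1/K$ for the weighted aggregation are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical toy multiplex: K = 3 undirected layers on the same n = 4 nodes.
K, n = 3, 4
A = np.zeros((K, n, n), dtype=int)           # A[k] is the adjacency matrix of layer k
edges = {
    0: [(0, 1), (1, 2)],
    1: [(0, 1), (2, 3)],
    2: [(0, 1), (1, 2), (0, 3)],
}
for k, pairs in edges.items():
    for i, j in pairs:
        A[k, i, j] = A[k, j, i] = 1          # symmetric ties, no loops


def flatten_binary(A):
    """An edge is present in the flattened graph if it exists in at least one layer."""
    return (A.sum(axis=0) > 0).astype(int)


def flatten_weighted(A, weights=None):
    """Weighted sum of the layers; equal weights 1/K unless others are supplied."""
    w = np.full(A.shape[0], 1.0 / A.shape[0]) if weights is None else np.asarray(weights, float)
    return np.tensordot(w, A, axes=1)        # sum_k w[k] * A[k]


B = flatten_binary(A)
W = flatten_weighted(A)
```

With equal weights, each entry of `W` is the share of layers in which the tie is present. DISTATIS replaces these fixed weights with data-driven ones.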

3. DISTATIS to handle multiplex network data

The main idea behind DISTATIS is to define and analyze a common structure of a set of distance matrices among units by deriving from this structure an optimal set of weights. These weights are used to compute the best common representation of the units, called the compromise. The compromise is derived as the weighted sum of each distance matrix multiplied by the optimal weight (i.e., the compromise is a linear combination of the original data matrices).

In many empirical frameworks where multiplex network analysis is applied, the relationships embedded in the layers can be considered different facets of a common underlying relational structure. This assumes that one of the main goals of the analysis is to uncover a latent space. To pursue this aim, a three-way data analysis method can be used. Given the characteristics of the adjacency matrices and the assumption we made, we believe that DISTATIS fits our goals in a substantive way.

DISTATIS is a generalization of classical MDS (Torgerson, 1958) in the STATIS approach (Escoufier, 1985) designed to analyze a set of distance matrices. Indeed, the MDS is obtained through the SVD of the double centered distance matrix. The method allows analyzing both the relational structure embedded in each single layer and the global relational structure derived as a linear combination (resembling a layer aggregation function) of the layers with data-driven weights, and it provides a rich set of analytical and graphical results. Below, we illustrate how the method works and how it can be applied to the analysis of multiplex networks by discussing the following three points:

1. Deriving a three-way array of interlayer distance matrices;
2. Applying DISTATIS to the derived three-way array of distance matrices;
3. Interpreting the DISTATIS results and representations in terms of relational data.

3.1. Deriving a three-way array of distance matrices for a multiplex network

To apply DISTATIS to analyze a multiplex network, a three-way array of distance matrices $\mathcal{D} = (D_1, \dots, D_K)$ has to be derived from the multiplex adjacency matrix $\mathcal{A}$. If two nodes are reachable from each other, a natural choice is to use the geodesic distance between the nodes $v_i$ and $v_j$ in the layer $k$, that is, $d_{ijk} = \mathrm{geo}_k(v_i, v_j)$ if $\mathrm{geo}_k(v_i, v_j) < \infty$. In the case of disconnected nodes, it is possible to set their geodesic distance to $d_{ijk} = k \cdot \max[\mathrm{geo}_k(v_i, v_j)]$, where the multiplier $k$ is a constant.³

Alternative distance/dissimilarity measures can be used to derive matrices that can be treated in the multidimensional scaling setting. One possibility is to consider the complement to 1 of the adjacency matrix, that is, $D_k = \mathbf{1} - A_k - I$, where $\mathbf{1}$ is a matrix of 1s of the same size as $D_k$. Alternatively, any dissimilarity or distance measure for binary data can be considered (for a review, see Batagelj and Bren, 1995). In the simulation study reported in Section 5, we use the Hamann coefficient, stated as the S9 index in Gower and Legendre (1986), that is,

$$d_{ijk} = \frac{(a_{ijk} + d_{ijk}) - (b_{ijk} + c_{ijk})}{a_{ijk} + b_{ijk} + c_{ijk} + d_{ijk}},$$

where, on the right-hand side, $a_{ijk}$ is the number of common neighbors of $v_i$ and $v_j$ in the layer $k$, $b_{ijk}$ is the number of nodes that are neighbors of $v_i$ and are not neighbors of $v_j$, $c_{ijk}$ is the number of nodes that are not neighbors of $v_i$ and are neighbors of $v_j$, and $d_{ijk}$ is the number of nodes that are neighbors of neither $v_i$ nor $v_j$ (a count, not to be confused with the coefficient on the left-hand side).

² For a comparison of DISTATIS with other factorial methods for three-way data, see Abdi et al. (2012).

³ Different approaches are proposed in the literature to handle the issue of geodesic distance for unreachable disconnected nodes. Such undefined distance is usually replaced by setting this value either to the number of nodes, or to the maximum observed distance (i.e., diameter) plus 1. Here, we consider the empirical diameter multiplied by a constant k (a scalar multiplier, distinct from the layer index). If k is too small, the disconnected nodes tend to be too close to the others. If k is too large, these nodes tend to be too extreme, pushing all the others into a small region in the center of the graph. Based on the main results of a simulation study not shown here, we suggest using k = 2, which yields balanced representations and consistent results.
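The three choices of distance discussed in Section 3.1 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the neighbor counts for the Hamann coefficient are taken over all $n$ nodes (so that $a + b + c + d = n$), and the Hamann similarity $s$ is turned into a dissimilarity as $1 - s$, a conversion the text does not specify.

```python
import numpy as np


def geodesic_distances(A, k_const=2.0):
    """Shortest-path distances for one unweighted layer (Floyd-Warshall).

    Unreachable pairs are set to k_const times the largest finite distance,
    following the rule described in footnote 3 (k_const = 2 is suggested there).
    """
    n = A.shape[0]
    D = np.where(np.asarray(A) > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for m in range(n):                        # allow paths passing through node m
        D = np.minimum(D, D[:, [m]] + D[[m], :])
    finite = np.isfinite(D)
    D[~finite] = k_const * D[finite].max()
    return D


def complement_distance(A):
    """D = 1 - A - I: 1 for non-adjacent pairs, 0 for adjacent pairs and the diagonal."""
    n = A.shape[0]
    return np.ones((n, n)) - A - np.eye(n)


def hamann_dissimilarity(A):
    """1 - Hamann similarity of the neighborhood profiles (the 1 - s step is an assumption)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    a = A @ A.T                               # common neighbors
    d = (1 - A) @ (1 - A).T                   # common non-neighbors
    s = (2 * (a + d) - n) / n                 # ((a + d) - (b + c)) / (a + b + c + d)
    return 1.0 - s


# Toy layer: path 0 - 1 - 2 plus the isolated node 3.
A_toy = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
D_geo = geodesic_distances(A_toy)
D_comp = complement_distance(A_toy)
D_ham = hamann_dissimilarity(A_toy)
```

For a multiplex stored as a `(K, n, n)` array `A`, the three-way distance array would then be `np.stack([geodesic_distances(A_k) for A_k in A])`.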

3.2. Applying DISTATIS

Given the three-way distance matrix $\mathcal{D} = (D_1, \dots, D_K)$, the DISTATIS algorithm can be described as follows.

In a first step, according to the classical MDS procedure, each matrix $D_k$ is double centered to obtain the so-called crossproduct matrix $\tilde{S}_k$. Given the centering operator $C = I - \mathbf{1}\mathbf{n}^T$, where $I$ is the $n$-dimensional identity matrix, $\mathbf{1}$ is an $n$-dimensional vector of ones, and $\mathbf{n}$ is an $n$-dimensional vector with elements equal to $1/n$ (the mass of each node), the crossproduct matrices $\tilde{S}_k$ are defined as

$$\tilde{S}_k = -\frac{1}{2} C D_k C^T.$$

In a second step, the $\tilde{S}_k$ matrices are linearly combined to obtain the compromise. In the DISTATIS approach, the compromise is a weighted average of the distance matrices using a double system of weights. The first set of weights aims at normalizing each layer in terms of variability, whereas the second set of weights expresses the similarity among the layers. Both weighting systems are data driven and depend on the relationship structure embedded in the multiplex network.

More formally, we compute the first eigenvalue $\gamma_{1k}$ of each $\tilde{S}_k$ and normalize with $S_k = \gamma_{1k}^{-1} \tilde{S}_k$. As the first eigenvalue of the crossproduct matrices depends on the density and topology of the network, after the normalization the maximum eigenvalue of each layer is equal to 1, and the layers are thus made comparable.

To compute the second set of weights, we have to evaluate the similarity among layers. To do that, we use the cosine between the matrices $S_k$. As the matrices are normalized, these cosines correspond to the RV coefficient (Robert and Escoufier, 1976), a measure of similarity between positive semidefinite square symmetric matrices. Let $H = (h_{k,k'})$ be the similarity matrix collecting all the pairwise RV coefficients, that is,

$$h_{k,k'} = \frac{s_k^T s_{k'}}{\|s_k\| \, \|s_{k'}\|}, \qquad k, k' = 1, \dots, K,$$

with $s_k$ being the vectorization of $S_k$. Given the matrix $H$, we compute its eigenvalues and eigenvectors, that is, $H = P \Lambda P^T$. The first eigenvector $p_1$ is used to determine the second set of weights,

$$\alpha_k = \frac{p_{k1}}{\|p_1\|_1},$$

with $p_{k1}$ being the coordinate of the $k$th layer on the subspace spanned by the first eigenvector of $H$, and $\|p_1\|_1$ the sum of the absolute values of the elements of $p_1$. The $\alpha$s thus reflect the similarities among the normalized crossproduct matrices. In particular, the algorithm yields larger weights $\alpha_k$ for layers with similar relational patterns. This implies that the compromise tends to catch the underlying common features, as in the assumptions of the method. The compromise matrix $S$ is the weighted sum of the normalized crossproduct matrices,

$$S = \sum_{k=1}^{K} \alpha_k S_k.$$

Note that the compromise matrix is still a crossproduct matrix, and thus it can be decomposed through the SVD in the MDS framework. Finally, following the classical MDS, we perform the eigenvalue decomposition of the compromise matrix, $S = V \Sigma V^T$, to obtain the factorial coordinates for plotting the nodes in the common space:

$$F = V \Sigma^{1/2} = S V \Sigma^{-1/2}.$$

It is also possible to represent each crossproduct matrix $S_k$ in the compromise space by projecting the matrices as supplementary points. The coordinates can be easily computed as $F_k = S_k V \Sigma^{-1/2}$ and are called partial scores. They represent the position of each node in each layer, and all the coordinates have a common reference space given by the compromise.

3.3. Interpreting the DISTATIS results and representations in terms of network data

Given the above analytical results, it is possible to represent the following: (i) each layer as a single point in the space defined by the first two or three eigenvectors of the between-distance matrix $H$, highlighting the similarity of the layers in terms of the whole network structure (Map1); (ii) the relational pattern of the nodes in the space given by the very first (say two or three) eigenvectors of the compromise, in which the position of each node derives from the combination of all the layers (Map2); (iii) nodes related to each layer in the common reference space, analyzing and comparing the relational pattern for each layer (Map3); and (iv) all the layers and the compromise jointly, taking into consideration the nodes’ variability in the different layers (Map4). In the following, we describe the four different graphical representations more precisely.

• Map1: Between-layers analysis. This map is obtained by considering the first two (or three) columns of the matrix $P \Lambda^{1/2}$, which are the coordinates of a factorial map representing the similarities among the different kinds of relationships forming the multiplex network. In this map, each layer is represented as a point. If two points are close on the factorial plane, the global relational patterns of the corresponding layers are similar. The coordinates on the first axis also provide an idea of the weights of the layers in determining the compromise. Note that this map is the only one not based on the compromise space. We can measure the quality of the compromise space by the $\tau_j$ values, which are derived by dividing the $j$th eigenvalue $\lambda_j$ by the trace of $\Lambda$; that is, $\tau_j = \lambda_j / \sum_{k=1}^{K} \lambda_k$. $\tau_1$ represents a measure of the degree of unidimensionality of the layers; high values of $\tau_1$, associated with low values of $\tau_2$, denote the presence of a strong unidimensional underlying structure (Lahne et al., 2018, p. 5).

• Map2: Compromise analysis. Each actor can be represented as a point in a two-dimensional map using the factor scores, that is, the first two columns of the matrix $F$. Two points are close on the factorial plane if the corresponding actors have both similar relational patterns in almost all layers (the two actors are connected almost to the same actors) and a short distance between them. In the compromise space, we can search for clusters, communities, or disconnected nodes; because the factorial plane is a metric space, we can also evaluate how different the global relational behaviors of actors are. It is possible to evaluate the contribution of each node to the factorial axes. The contribution of the $i$th node to the $l$th factorial axis is derived as $\mathrm{crt}_{il} = f_{il}^2 / \sigma_l$, where $\sigma_l$ denotes the $l$th eigenvalue of $S$ (the $l$th diagonal element of $\Sigma$). Finally, as usual, the eigenvalues in the matrix $\Sigma$ can be used to evaluate the quality of representation of the factorial maps as a measure of the explained inertia.

• Map3: Single-layer analyses. These factorial maps, one for each layer, are obtained using the first two columns of the matrices $F_k$. In these maps, the relative positions of points express the similarities of the relational pattern of the corresponding actors, layer by layer. Note that, because each layer is projected through the same matrix $V \Sigma^{-1/2}$, the maps have a common reference space and can be compared.

• Map4: Joint representation. In this map, for each actor we plot $K + 1$ points (one for the compromise and one for each layer) by jointly using the first two columns of matrices $F$ and $F_k$. For each actor, we connect the point representing the compromise with the points representing the layers. By doing so, we obtain a star for each actor, whereby the compromise is the barycenter of the layers. The shape and size of each star give an idea of the variability of the relational patterns of each actor in the layers.

4. A real world example: the AUCS data

To illustrate how the DISTATIS procedure works in practice to treat multiplex network data, we use the CS-AARHUS dataset (AUCS data; Dickison et al., 2016; Rossi and Magnani, 2015). This dataset covers five types of online and offline relationships between 61 employees of the Computer Science Department at Aarhus University (Denmark).⁴ The connections among the employees are as follows: coauthoring a publication [coauthor]; being friends on Facebook [FB]; being involved in repeated leisure activities [leisure]; regularly eating lunch together [lunch]; and working together [work]. All relationships are undirected and unweighted. In addition, two attribute variables are measured for each employee: research group [G1-G8] and academic position (i.e., professor, postdoc researcher, PhD student, and administrative staff). Fig. 1 displays each layer as a separate simple graph, enhanced by coloring and shaping each actor according to the research group, together with the multiplex network obtained by the flattening procedure. For an in-depth description of the networks’ characteristics, readers should consult Dickison et al. (2016) and Rossi and Magnani (2015).

⁴ For details, see the website relating to the book, Dickison et al. (2016): http://multilayer.it.uu.se/datasets.html.

Fig. 1. Simple graphs for the five networks and flattened network of AUCS data. All employees are colored and shaped according to the research group to which they belong.
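For readers who want to try the procedure of Section 3.2 on layers such as these, a minimal NumPy sketch follows. It is an illustration under stated assumptions rather than the authors' implementation: the function name `distatis` and the returned dictionary are hypothetical, the double centering is applied to the distances as written in Section 3.2 (whether to square them first, as in classical MDS, is left open), and the demonstration data are synthetic point configurations, not the AUCS layers.

```python
import numpy as np


def distatis(D):
    """Sketch of DISTATIS for a (K, n, n) array of symmetric distance matrices."""
    K, n, _ = D.shape
    C = np.eye(n) - np.full((n, n), 1.0 / n)          # centering operator, equal masses 1/n

    # Step 1: double centering, then normalization by the first eigenvalue.
    S = np.empty((K, n, n), dtype=float)
    for k in range(K):
        S_tilde = -0.5 * C @ D[k] @ C.T
        S[k] = S_tilde / np.linalg.eigvalsh(S_tilde)[-1]   # eigvalsh sorts ascending

    # Step 2: RV (cosine) similarities between layers, alpha weights, tau values.
    s = S.reshape(K, -1)
    norms = np.linalg.norm(s, axis=1)
    H = (s @ s.T) / np.outer(norms, norms)
    lam, P = np.linalg.eigh(H)
    lam, P = lam[::-1], P[:, ::-1]                    # descending order
    p1 = P[:, 0] * np.sign(P[:, 0].sum())             # fix the arbitrary sign
    alpha = p1 / np.abs(p1).sum()
    tau = lam / lam.sum()

    # Step 3: compromise, factor scores F and partial scores F_k.
    S_comp = np.tensordot(alpha, S, axes=1)           # sum_k alpha[k] * S[k]
    sig, V = np.linalg.eigh(S_comp)
    sig, V = sig[::-1], V[:, ::-1]
    keep = sig > 1e-10                                # drop null or negative dimensions
    sig, V = sig[keep], V[:, keep]
    F = V * np.sqrt(sig)                              # V Sigma^{1/2}
    F_k = np.stack([S[k] @ V / np.sqrt(sig) for k in range(K)])   # S_k V Sigma^{-1/2}
    return {"H": H, "alpha": alpha, "tau": tau, "F": F, "F_k": F_k}


# Synthetic demonstration: three noisy copies of one planar configuration of 6 nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
layers = []
for _ in range(3):
    Y = X + 0.3 * rng.normal(size=X.shape)
    layers.append(np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1))
res = distatis(np.stack(layers))
```

In this sketch, `res["F"][:, :2]` would give the coordinates for Map2 and `res["F_k"][k][:, :2]` those for the $k$th panel of Map3. The compromise scores equal the $\alpha$-weighted average of the partial scores, which is the barycenter property used in Map4.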

Table 1 RV coefficients matrix among the five layers.

            Lunch    FB      Coauthor   Leisure   Work
Lunch       1.00
FB          0.22     1.00
Coauthor    0.41     0.38    1.00
Leisure     0.32     0.23    0.35       1.00
Work        0.58     0.25    0.51       0.27      1.00
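Because Table 1 gives the full similarity matrix $H$, the second step of Section 3.2 can be checked directly on the published values. The sketch below enters the RV coefficients of Table 1 (read column by column in the order Lunch, FB, Coauthor, Leisure, Work, which is an assumption about the flattened layout) and derives the eigenvalues, the $\tau$ values, and the $\alpha$ weights. Up to rounding, the results should come out close to the values reported in Tables 2 and 3, for example a first eigenvalue of about 2.44.

```python
import numpy as np

layers = ["Lunch", "FB", "Coauthor", "Leisure", "Work"]

# RV coefficients from Table 1, mirrored into a full symmetric matrix.
H = np.array([
    [1.00, 0.22, 0.41, 0.32, 0.58],
    [0.22, 1.00, 0.38, 0.23, 0.25],
    [0.41, 0.38, 1.00, 0.35, 0.51],
    [0.32, 0.23, 0.35, 1.00, 0.27],
    [0.58, 0.25, 0.51, 0.27, 1.00],
])

lam, P = np.linalg.eigh(H)                 # eigh returns eigenvalues in ascending order
lam, P = lam[::-1], P[:, ::-1]             # reorder to descending
p1 = P[:, 0] * np.sign(P[:, 0].sum())      # fix the arbitrary sign of the eigenvector

alpha = p1 / p1.sum()                      # weights of the layers in the compromise
tau = lam / lam.sum()                      # share of inertia per dimension
scores = P * np.sqrt(lam)                  # columns of P Lambda^{1/2} (Map1), up to sign

for name, a in zip(layers, alpha):
    print(f"{name:9s} alpha = {a:.2f}")
print("tau (%):", np.round(100 * tau).astype(int))
```

The signs of the columns of `scores` other than the first are arbitrary, so individual factor scores may appear with the opposite sign to those in Table 2.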

158

Social Networks 59 (2019) 154–170

G. Giordano, et al.

and groups of employees based on the complete set of relationships taken as a whole? (5) How similar are the network structures achieved by the different types of relationships? Starting with the five adjacency matrices describing the different collaborations and interactions among employees, all the steps in the DISTATIS analytical procedure described in Section 3 are performed. Some derived measures are summarized in Tables 1–3. First, the RV's coefficient matrix is displayed in Table 1, showing the similarities between each pair of layers. The two layers lunch and work have the highest value, suggesting that people who work together also tend to spend lunch time together. On the contrary, FB and lunch have the lowest RV coefficient. It is interesting to note the quite high value of the RV coefficient of coauthor with work. Even if the coauthor layer has almost no edges, as it appears in Fig. 1, the few edges that are present are also present in the work layer. This is confirmed by the value of the coverage edge index (Bródka et al., 2018) equal to 0.85 between coauthor and work. The α weights of each layer in the compromise are shown in Table 2 along with the corresponding factor scores. The weights of the layers lunch, coauthor and work are slightly higher than those of the other two layers. These three layers that have the largest common structure play a slightly more important role in determining the compromise. Looking at the factor scores in Table 2, the first factor is characterized by the high scores of work, coauthor, and lunch, the second is characterized by a large negative score for FB, and the third is characterized by a large score for leisure. Table 3 reports the eigenvalues and corresponding percentage of quality of compromise (τ values), along with their cumulative percentages. The first three dimensions, retained for further investigation, account for about 81% of the dissimilarity of the interlayers. 
The value of τ1 is 49%, denoting that the layers have about half of their variability in common, with a certain degree of unidimensionality. This implies that the relationship structures of the layers differ somewhat, and three dimensions are needed to describe the data. We recall that the compromise is the best solution, in the least squares sense, for the aggregation of the original cross-product matrices. For the practitioner, it is important to establish how good this best solution is. To evaluate the quality of the compromise, we need a statistical measure. This is usually given by the first eigenvalues of the matrix S, denoted as λk. An alternative measure that is easier to interpret is the ratio between the sum of the first eigenvalues of S and the sum of all its eigenvalues; thus, we can say that the two-dimensional compromise explains 66% of the inertia of the original set of layers. This is a relatively small value, indicating that the layers differ substantially in the relationships they capture among the social actors. Having established this as a fact inherent to the information captured by the different layers, we decided to analyze further

Table 2 Factor scores (F1, F2, and F3) for the five layers in the three dimensions and α weights.

Layer       F1      F2      F3      α
Lunch       0.75    0.41   −0.05    0.22
FB          0.55   −0.72   −0.33    0.16
Coauthor    0.78   −0.11   −0.01    0.23
Leisure     0.59   −0.19    0.78    0.17
Work        0.78    0.36   −0.22    0.23
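The link between the first-axis scores and the α weights in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes the usual DISTATIS convention in which the α weights are the first eigenvector of the RV matrix rescaled to sum to one, so that rescaling the F1 column should reproduce α up to rounding; it is a check on the published numbers, not the authors' code.

```python
# First-axis factor scores of the five layers, copied from Table 2.
f1 = {"Lunch": 0.75, "FB": 0.55, "Coauthor": 0.78, "Leisure": 0.59, "Work": 0.78}

# Assumption: alpha weights are the first-axis scores rescaled to sum to one.
total = sum(f1.values())
alpha = {layer: score / total for layer, score in f1.items()}
alpha_rounded = {layer: round(weight, 2) for layer, weight in alpha.items()}

# Alpha column of Table 2, for comparison.
alpha_table = {"Lunch": 0.22, "FB": 0.16, "Coauthor": 0.23, "Leisure": 0.17, "Work": 0.23}

for layer in f1:
    print(f"{layer:9s} computed={alpha_rounded[layer]:.2f} table={alpha_table[layer]:.2f}")
```

Under this assumption the rescaled scores agree with the α column to two decimals, which is consistent with lunch, coauthor, and work carrying slightly more weight in the compromise.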

Table 3 Eigenvalues λk, relative (τk) and cumulative percentage of explained inertia.

                  dim 1   dim 2   dim 3   dim 4   dim 5
λk                2.44    0.87    0.77    0.54    0.39
τk (%)            49      17      15      11      8
Cumulative (%)    49      66      81      92      100
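The τ values and cumulative percentages in Table 3 follow directly from the eigenvalues. A minimal sketch of that arithmetic is given below, together with the "at least 70% of the total inertia" rule of thumb cited in the text; the helper name `dims_needed` is hypothetical.

```python
# Eigenvalues of the five dimensions, copied from Table 3.
eigenvalues = [2.44, 0.87, 0.77, 0.54, 0.39]

total = sum(eigenvalues)
tau = [100 * lam / total for lam in eigenvalues]  # percentage of explained inertia

cumulative = []
running = 0.0
for t in tau:
    running += t
    cumulative.append(running)

def dims_needed(cumulative_pct, threshold=70.0):
    """Smallest number of dimensions whose cumulative inertia reaches the threshold."""
    for k, value in enumerate(cumulative_pct, start=1):
        if value >= threshold:
            return k
    return len(cumulative_pct)

print([round(t) for t in tau])
print([round(c) for c in cumulative])
print(dims_needed(cumulative))
```

With the 70% threshold, two dimensions (66%) fall just short and three dimensions (81%) are retained, in line with the choice made in the text.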

Fig. 2. Representation of the five layers as points in the three dimensional space spanned by the distances between pairs of layers in the AUCS data (Map1).

4.1. AUCS data with DISTATIS: analytical results

Although each single relationship among employees in the AUCS data could be of interest in its own right, the proposed approach uses a unifying method that treats the underlying relationships as a whole. Therefore, the following research questions must be addressed: (1) Do employees’ positions vary across the layers? (2) Are there different relational patterns for each type of relationship? (3) Are there groups based on the position in the networks and relational similarities? (4) Are there relational patterns

Fig. 3. Representation of the actors as points in the two dimensional compromise subspace of the AUCS data (Map2).


Fig. 4. Representation of actors layer by layer in the common space of the AUCS data (Map3).

dimensions. Indeed, the first three dimensions account for about 81% of the interlayer dissimilarity. As a rule, users should assess the quality of the compromise by tabulating the successive cumulative percentages of explained inertia. A large value of τ1 implies that the original layers are coherent enough to be well represented along the first dimension. In contrast, flattening layers, which indicate uncorrelated relationships, will require a larger number of dimensions. The rule of thumb suggested in the main handbooks (Jolliffe and Cadima, 2016) is to retain at least 70% of the total inertia.

4.2. AUCS data with DISTATIS: interpretation of factorial maps

All the previously discussed analytical results related to layer similarity can be visualized using the four graphical representations described in Section 3. Fig. 2 (Map1) shows the five layers as points in a 3D scatter plot with coordinates given by the first three factor scores. Recalling that the α weights are related to the coordinates on the first factorial axis, we note that all the layers have a positive role in weighting the final configuration (the first component can be read as a size-effect component). The second and third axes reveal the shape of the configuration. The three layers (lunch, work, and coauthor) appear


Fig. 5. Representation of actor variations across the layers of the AUCS data (Map4): (a) employees U130 and U110; (b) employees U138 and U53.

close together in the space, while the FB and leisure layers are opposite on the second and third axes, respectively. The relative position of each layer determines the different weights and roles they play in building the compromise space. For the sake of simplicity, having established the contribution of the five layers to the three factorial axes, we consider only two dimensions in the following maps. In the representation of the employees on the compromise factorial plane of the first two factorial scores, in which each employee is colored and shaped according to his/her research group, some groups clearly emerge (Map2, Fig. 3). These are mostly consistent with research group membership, even if some employees

bridge different groups. The factorial maps (Map3) in Fig. 4 show the positions of the employees in the five layers. Note that all five maps have a common reference space given by the compromise, so that the relative position of each actor can be compared across layers. This is not possible with a separate force-based layout for each layer, as in the graphs of Fig. 1. In general, some layers present groups that accurately reproduce the research groups to which the employees belong. The three layers showing coauthorship, lunch, and leisure relationships separate the employees into groups with a different composition. The affiliation with a research group seems to be relevant for coauthoring a paper and having lunch


together, whereas it is less important for doing activities during leisure time. FB and work relationships show a low and high degree of variability, respectively, in the employees’ position on the factorial maps. The proximity of employees in the work network clearly relates to the research group. It is also possible to appreciate how each employee can change or not change his/her position in the various layers. For example, actor U130 (a professor in research group G2) works with the PhD student U109, but he has mainly coauthored publications with the two PhD students U99 and U18 from his research group and postdoctoral student U134 in research group G3. The representation of actor variations across the layers (Map4) shows in Fig. 5a two professors, U130 and U110, who belong to two different research groups, G2 and G7; in Fig. 5b it shows the postdoctoral student U138 and the PhD student U53, who belong to the same research group, G7. The size of the stars (compromise and layers) represents the variability of the relational pattern of each employee across the layers. In particular, for actor U130, the size and the shape of his/her star is due to the different neighbors in each layer. The actors U130 and U110 are close only in the FB layer, in which they actually share five common neighbors. Actors U138 and U53 are very close both in the compromise space and in some layers (coauthor and leisure), as they share a set of common neighbors and are both unconnected in the FB layer.
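The reading of Map4 in terms of shared neighbors can be made concrete with a small per-layer count. The sketch below uses a toy multiplex with invented actor labels and edges (not the AUCS data) and hypothetical helper names; it simply lists, layer by layer, the neighbors that two actors have in common.

```python
def neighbours(edges, node):
    """Set of nodes adjacent to `node` in an undirected edge list."""
    return {v if u == node else u for u, v in edges if node in (u, v)}

def common_neighbours_by_layer(layers, a, b):
    """For each layer, the neighbours shared by actors a and b."""
    return {name: neighbours(edges, a) & neighbours(edges, b)
            for name, edges in layers.items()}

# Toy multiplex (assumption: actor labels and edges are made up).
toy_multiplex = {
    "fb": [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("A", "B")],
    "work": [("A", "C"), ("B", "E")],
}

shared = common_neighbours_by_layer(toy_multiplex, "A", "B")
for layer, nodes in shared.items():
    print(layer, sorted(nodes))
```

Two actors with many shared neighbors in a layer tend to have similar distance profiles in that layer, which is why they end up close in the corresponding map while possibly being far apart in the others.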

5. A simulation study to assess the discriminatory power and robustness of DISTATIS

To study and assess the capability of the proposed procedure as a tool for exploring, analyzing, and visualizing multiplex networks, we perform a Monte Carlo simulation study under different conditions based on the design of experiments (DOE). In Section 3, we showed that each layer can be represented as a single point in a factorial subspace defined by the first two eigenvectors of the between-distance similarity matrix H, highlighting the similarity of the layers in terms of the whole network structure. Based on the first two factorial axes and using different weights, the compromise space distinguishes each layer's capability to hold relevant information in terms of the nodes’ distance. As stated in Section 4, when the approach is applied to real data, it helps to uncover the role of each layer in defining a compromise configuration. In Fig. 2, for instance, we see that in the case of unidimensionality of the hidden structure the first axis assigns a positive weight to all the layers, whereas the second axis and the third axis are able to discriminate among them. Hence, we analyze these patterns through a simulation study by considering the following aims:

• A1. To furnish empirical evidence of the discriminatory power of the proposed procedure, that is, if and how our method can highlight differences among the layers in terms of the network structure under different conditions;
• A2. To analyze the stability and/or variability of the αk weights, that is, how different layers contribute to determining the compromise space under different conditions;
• A3. To determine the network characteristics and distance measures that most affect this capability.

Table 4 Simulation design. Orthogonal array with four factors at three levels each. The design is generated through the DoE.base-package in R (Groemping, 2018).

Run   # random layers   Density   Distance   Topology           yi
1     One               2.5%      I-A        Pref. attachment   24.170
2     One               2.5%      I-A        Two-isles          68.842
3     One               2.5%      I-A        Small world        54.745
4     One               5%        Hamann     Pref. attachment   15.015
5     One               5%        Hamann     Two-isles          23.742
6     One               5%        Hamann     Small world        11.613
7     One               10%       Geodesic   Pref. attachment   20.248
8     One               10%       Geodesic   Two-isles          28.799
9     One               10%       Geodesic   Small world        28.423
10    Two               2.5%      Hamann     Pref. attachment   26.857
11    Two               2.5%      Hamann     Two-isles          92.241
12    Two               2.5%      Hamann     Small world        23.926
13    Two               5%        Geodesic   Pref. attachment   26.256
14    Two               5%        Geodesic   Two-isles          61.163
15    Two               5%        Geodesic   Small world        34.769
16    Two               10%       I-A        Pref. attachment   82.207
17    Two               10%       I-A        Two-isles          58.725
18    Two               10%       I-A        Small world        56.474
19    Three             2.5%      Geodesic   Pref. attachment   20.141
20    Three             2.5%      Geodesic   Two-isles          21.837
21    Three             2.5%      Geodesic   Small world        22.705
22    Three             5%        I-A        Pref. attachment   48.952
23    Three             5%        I-A        Two-isles          50.561
24    Three             5%        I-A        Small world        50.560
25    Three             10%       Hamann     Pref. attachment   90.566
26    Three             10%       Hamann     Two-isles          58.220
27    Three             10%       Hamann     Small world        104.913

Table 5 ANOVA model.

            Df   Sum Sq    Mean Sq   F value   Pr(> F)
Nrandom     2    2674.26   1337.13   2.90      0.0812
Density     2    2720.65   1360.32   2.95      0.0782
Distance    2    3297.40   1648.70   3.57      0.0494
Topology    2    701.92    350.96    0.76      0.4821
Residuals   18   8312.46   461.80

Table 6 Regression coefficients for the response variable yi explained by the four experimental factors.

                            Estimate   Std. Error   t-value   Pr(> |t|)
(Intercept)                  44.6914   4.1357       10.81     0.0000
One random                  −14.0696   5.8487       −2.41     0.0271
Two random                    6.7106   5.8487        1.15     0.2663
Density 2.5%                 −5.1955   5.8487       −0.89     0.3861
Density 5%                   −8.8435   5.8487       −1.51     0.1479
Complement to 1 distance     10.3348   5.8487        1.77     0.0942
Geodesic distance           −15.3204   5.8487       −2.62     0.0174
Pref. attachment topology    −5.3124   5.8487       −0.91     0.3757
Two-isles topology            6.8787   5.8487        1.18     0.2549
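Because the design in Table 4 is balanced, the main-effects sums of squares in Table 5 and the sum-to-zero estimates in Table 6 can be recomputed from the yi column alone. The sketch below is such a recomputation, not the authors' R code: the rule in `design_row` is an assumption that merely encodes the row order of Table 4, and small differences from the published values are expected because the yi values are rounded to three decimals.

```python
# Response values y_i of the 27 runs, in the row order of Table 4.
Y = [24.170, 68.842, 54.745, 15.015, 23.742, 11.613, 20.248, 28.799, 28.423,
     26.857, 92.241, 23.926, 26.256, 61.163, 34.769, 82.207, 58.725, 56.474,
     20.141, 21.837, 22.705, 48.952, 50.561, 50.560, 90.566, 58.220, 104.913]

N_RANDOM = ["One", "Two", "Three"]
DENSITY = ["2.5%", "5%", "10%"]
DISTANCE = ["I-A", "Hamann", "Geodesic"]
TOPOLOGY = ["Pref. attachment", "Two-isles", "Small world"]

def design_row(run):
    """Factor levels of run 0..26, encoding the layout of Table 4."""
    block, within = divmod(run, 9)   # block = number of random layers
    dens, topo = divmod(within, 3)
    dist = (dens + block) % 3        # distance level rotates by one step per block
    return {"Nrandom": N_RANDOM[block], "Density": DENSITY[dens],
            "Distance": DISTANCE[dist], "Topology": TOPOLOGY[topo]}

design = [design_row(r) for r in range(len(Y))]
grand_mean = sum(Y) / len(Y)         # intercept under sum-to-zero coding

def level_effects(factor):
    """Level mean minus grand mean for each level, plus the level counts."""
    sums, counts = {}, {}
    for row, y in zip(design, Y):
        level = row[factor]
        sums[level] = sums.get(level, 0.0) + y
        counts[level] = counts.get(level, 0) + 1
    effects = {lvl: sums[lvl] / counts[lvl] - grand_mean for lvl in sums}
    return effects, counts

factors = ["Nrandom", "Density", "Distance", "Topology"]
effects, sum_sq = {}, {}
for factor in factors:
    eff, counts = level_effects(factor)
    effects[factor] = eff
    sum_sq[factor] = sum(counts[lvl] * e * e for lvl, e in eff.items())

total_ss = sum((y - grand_mean) ** 2 for y in Y)
resid_ss = total_ss - sum(sum_sq.values())
resid_df = len(Y) - 1 - 2 * len(factors)   # 26 - 8 = 18
resid_ms = resid_ss / resid_df
f_values = {factor: (sum_sq[factor] / 2) / resid_ms for factor in factors}

for factor in factors:
    print(f"{factor:9s} SumSq={sum_sq[factor]:9.2f} F={f_values[factor]:.2f}")
print(f"Residuals SumSq={resid_ss:9.2f} Df={resid_df}")
print({lvl: round(e, 4) for lvl, e in effects["Density"].items()})
```

The effect printed for the 10% density level is the one that Table 6 leaves out; under the sum-to-zero constraint it equals minus the sum of the other two density effects.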

For the first aim (A1), we are especially interested in giving empirical evidence concerning how layers with different topologies are located far from each other in the compromise space. Therefore, for the second aim (A2), we look at the distribution of the αk weights generated in a long sequence of runs under the different experimental conditions. For the third aim (A3), we define a measure of discriminative power based on the compromise space, and this is used as a response variable in the experiment.

5.1. Setting the experiment

According to the DOE methodology, we establish a set of factors⁵ relevant for the experiment and their working levels. The simulation design is based on a multiplex network composed of four layers with 200 nodes in each layer. According to the three aforementioned aims, we consider different conditions that define each multiplex network by changing the following:

• The network topology of the layers;
• The amount of randomness in the multiplex network;
• The network density Δ;
• The type of distance for network data.

⁵ The term factor is used here to be consistent with the terminology adopted in the context of DOE.

Fig. 6. Effect plots of the four experimental factors with the response variable y.

Hence, we define four experimental factors. The first factor is related to the presence of a well defined topological structure in the layers. We assume that the four layers can be characterized by either random behavior or a predefined topology. In each multiplex network, the topology is the same across all layers except for the random ones (e.g., a multiplex network could be composed of four layers: three small world networks and one random network). When simulating the layers with a specified topology, we constrain the nodes to have similar neighbors in each layer (this is achieved by rewiring a small number of edges from one layer to another). The three topology levels taken into account are: preferential attachment, small world, and two-isles with few interconnecting edges. The second factor is the amount of randomness in the multiplex network. The random topology can be present in one, two, or three of the four layers. This experimental factor then consists of three levels: one random layer out of four (prevalence of a similar topological structure), two random layers (balance of random and topological structures), and three random layers (prevalence of random structures). The third factor is the network density Δ, which is held constant across the layers. We consider three levels: Δ = 2.5%, Δ = 5%, and Δ = 10%. Network density is thus the same across the layers and changes only between the experimental conditions; changing the density among layers would have resulted in an excessive number of experimental conditions. So, the effect we are controlling relates to: “How does the (common) density impact on the response variable?” and “Is the procedure able to separate layers across the density levels, all other factors being constant?”. The fourth factor is how to compute the distance between any pair of nodes in each layer. Considering the peculiar nature of binary network data, we use the three distances discussed in Section 3.1: geodesic distance, the complement to 1 distance (I-A), and the Hamann coefficient.

The whole set of experimental conditions gives rise to a full factorial design made up of 54 different combinations (three factors with three levels and one factor with two levels: $3^3 \times 2^1$). Therefore, we reduce the complexity of the design by using a half-fraction factorial design (Gunst and Mason, 2009) that accounts for 27 out of the 54 possible total combinations (Table 4), where all factors are orthogonal. The study is carried out using a Monte Carlo simulation to deal with aims A1 and A2 and an analysis of variance (ANOVA) model to deal with aim A3. For each run, according to the reduced factorial design in Table 4, 200 random replications of the multiplex network are generated.

For the first aim, A1, to evaluate the procedure's capability to discriminate between layers with different topologies, we look at the distribution over the 200 replicates of the points representing the layers on the factorial maps (these are the Map1 plots described in Section 3.3). We expect that layers with the same topology should be close on the map, while layers with different topologies should be well separated. To evaluate the variability over the replicates and the degree of separation/overlapping of the points representing layers with different/equal topologies on the map, we represent the layers’ replications by their convex hulls (Edelsbrunner, 2012) projected onto the DISTATIS subspace. Here, we only consider a visual exploration of the hulls’ separation. The representation of the 27 subspaces is provided in Appendix A. Then, according to aim A2, we visually explore the box plots of the distribution of the 200 αk values for each of the four layers in the 27 running conditions. These graphical representations are reported in Appendix B. Once these first two aims are attained, we discuss the robustness of the method to specific running conditions.

For the third aim, A3, we first define, as a performance measure, the degree of separation between the layers with a predefined topology and the random layers (on average over the 200 simulated replications). Specifically, we compute the barycenter of the 200 replicated points for each layer in each multiplex network under all 27 experimental conditions. We then determine the Euclidean distances between the four barycenters on the factorial plane, obtaining distance matrices of order 4 × 4. Let Li (i = 1, …, 27) be the distance matrix computed under the i-th configuration of the simulation design, lwi be the average distance within layers with the same topology (either random or predefined) in Li, and lbi be the average distance between layers with different topologies in Li; we define the response variable yi as:

$$y_i = \frac{lb_i}{lw_i}, \qquad i = 1, \ldots, 27.$$

The numerator accounts for the distance between the random layers and the other topologies, quantifying the discriminant capability of the procedure under the running conditions; a larger value is better. The denominator controls for the natural dispersion of the points, given that it considers the average distance between layers of the same type. We would like these latter distances to be as small as possible. Considering the between-within ratio, the greater the value of yi, the more separated the topology based layers are from the random ones, and the closer together are layers of the same type. This measure is used as the response variable of an ANOVA model where the explanatory variables are the four experimental factors. Indeed, we are looking for the factors that produce significant differences in the response variable when their levels change.

5.2. Main results

In this section, we report the main results of the simulation study. For each run, we visualize the layers’ coordinates on Map1 of the between-layers analysis. In Appendix A, Figs. A.1–A.3 show all the maps for the 27 runs reported in Table 4. Looking at these plots, we observe a good separation among the convex hulls representing layers with different topologies. The procedure furnishes extremely good results in terms of the stability (low dispersion) of the 200 replicated layer points. Specifically, when the number of random layers is set to 1 (Fig. A.1) or 3 (Fig. A.3), the scatters of layer points with different topologies are well separated on both the first and second axes. When the random layers are balanced with structured topologies (Fig. A.2), only the second axis can discriminate among the different types of layers. Regarding the stability of the αk weights in determining the compromise, we look at the box plots portrayed in Figs. B.1–B.3 in Appendix B. These graphical representations confirm that the cases with a balance of random and topological structures show some peculiarities (see Fig. B.2). In the presence of geodesic distance and the two-isles network configuration, the results are stable. Instead, in the presence of preferential attachment and small world topologies, combined with a high network density and the complement to 1 distance, the box plots overlap. This is the only case that shows unclear separation of the layers. In addition, to determine and quantify the main effects of the four factors on the discriminative power of the procedure, an ANOVA model is employed. The results are given in Table 5. These analytical results seem to indicate that the procedure is affected by the distance type (p-value = 0.0494). It is quite robust across the other three experimental conditions, as none of them shows a significant effect. Topology is the least relevant factor (p-value = 0.4821).

To better understand the critical conditions that could affect the procedure, we set a regression model for the response variable yi defined above, explained by the four experimental factors.⁶ The results are summarized in Table 6 and in the effect plots in Fig. 6. It is evident that no factors significantly affect the robustness of the procedure. The type of distance is the most relevant factor in affecting the discriminant power of the procedure, followed by the number of random layers and network density. In fact, the effect of topology is negligible.

⁶ For each factor we impose the constraint that effects have to add up to zero. The results are shown in Table 6; the missing level for each factor is easily reconstructed from the constraints. For instance, we may be interested in estimating the effect of the density level set at 10%; from the table, we may derive this as −(−5.1955 − 8.8435) = +14.039, the associated t statistic with 27 − (8 + 1) = 18 degrees of freedom as 14.039/5.8487 = 2.4, and the corresponding p-value as Pr(|t18| > 2.4) = 0.0274.

6. Conclusion

In this paper, we propose an analytical method for treating multiplex network data based on factorial techniques. Specifically, the statistical analysis yields a data driven optimal weighting system for the aggregation of layers and factorial maps in line with spring-embedding algorithms. The DISTATIS procedure analyzes the complexity of the network structure embedded in multiplex networks. The use of this approach for this type of data allows the representation of both actor nodes and each single layer as points in a common space. Indeed, the method allows the combination of different types of analysis into a common procedure defining a reference factorial subspace. In addition, it shows at the same time the actors and the association between their behavior in the different relational types. Of course, given the analytical properties of the compromise space, DISTATIS works properly in all those cases where the layers are different expressions of an underlying multifaceted relationship. The method's capability to capture such an underlying common structure is not affected by the presence of layers with opposite relationships, but it could produce poor results in the case of unrelated relationships embedded in layers. It is worth noting that it is also possible to measure the quality of the unidimensionality using the τ index. The results of the illustrative example also underline the high explanatory power of the method in capturing the similarities among relationships. The possibility of measuring the dissimilarity between layers allowed the definition of a suitable subspace where comparisons can be made at both the layer and node level. In this sense, we illustrate the case study as a sort of guideline to highlight the richness of the analytical and graphical results.

By using the proposed methodology, the analyst could gain knowledge, especially with respect to the presence of communities and/or topological facets, that can be used in combination with other techniques (e.g., ERG models, community detection, and so on). With the simulation study, we addressed the problem of establishing whether the procedure can recognize and separate layers with specific topologies when combined with random graph layers. We observed that in all replications and under several control conditions, the layers with a specific topology were all well separated. Specifically, the results showed that the second factor is best able to discriminate among the different topologies. The study highlights those circumstances under which this can happen, and it furnishes the best and worst conditions that may influence results. In conclusion, we provide some suggestions for future lines of research. The analytical results of DISTATIS are useful for the substantive interpretation of multiplex relationships. These results could also be used to compute new measures for multiplex network data. Moreover, as network data allows for several ways of computing distances, a comparison of other distance measures affecting the results and the visualization of compromise space should be addressed. The analyzed real world example considered dichotomous, undirected, one-mode networks, and the attribute data were used only for coloring nodes’ partitions. Extensions of the method for dealing with attribute data in


defining the analytical results could be envisioned in line with the approach that includes external information in factorial methods (Giordano and Vitale, 2011).

Acknowledgment

We thank Matteo Magnani for sharing the AUCS data with us.

Appendix A. Simulation results: between analysis

Appendix A reports the 27 plots representing the DISTATIS two-dimensional compromise subspace. In each plot, four convex hulls of 200 replicated layer points are represented. Layers showing networks with the same topology should stay close on the DISTATIS compromise, whereas the random layers should appear well separated from the structured networks (preferential attachment, two-isles, and small world). The 27 plots are arranged in three figures (A.1, A.2, A.3), following the Graeco-Latin square that defines the experimental runs blocked by the number of random layers; each figure holds nine configurations with a fixed number of random layers in the simulated multiplex networks. Fig. A.1 shows the configurations with one random layer out of four; Fig. A.2 shows the nine configurations with a balanced number of random layers (two out of four); finally, Fig. A.3 shows the results for multiplex networks dominated by random layers (three out of four). In each plot, each layer has a different color, which is not related to the topologies.
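For readers who want to reproduce this kind of display, the following is a minimal sketch of how the convex hull of a cloud of replicated layer points can be computed. It uses Andrew's monotone chain algorithm, which is one standard choice and not necessarily the routine used by the authors; the two point clouds are simulated with arbitrary centres and spreads and do not come from the study.

```python
import random

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def replicate(centre, spread, n=200):
    """Simulate n replicated positions of one layer around a centre (assumption)."""
    return [(rng.gauss(centre[0], spread), rng.gauss(centre[1], spread))
            for _ in range(n)]

# Hypothetical layer positions on the first two factorial axes.
layer_points = {
    "structured_1": replicate((0.8, 0.3), 0.02),
    "random_1": replicate((0.8, -0.6), 0.02),
}
hulls = {name: convex_hull(pts) for name, pts in layer_points.items()}

for name, hull in hulls.items():
    print(name, len(hull), "hull vertices")
```

Plotting each hull as a closed polygon gives one shape per layer; small, non-overlapping shapes correspond to the stable and well separated configurations described above.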

Fig. A.1. Convex hulls of 200 layer points of the between analysis – one random layer.


Fig. A.2. Convex hulls of 200 layer points of the between analysis – two random layers.


Fig. A.3. Convex hulls of 200 layer points of the between analysis – three random layers.
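The visual separation of the hulls in Figs. A.1–A.3 has a numeric counterpart in the between/within ratio yi of Section 5.1. The sketch below computes that ratio from the barycenters of the replicated layer points; the four layers, their labels, and their coordinates are invented for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist

def barycenter(points):
    """Mean position of the replicated points of one layer."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def separation_ratio(replicates, kind):
    """y = (mean distance between different-type layers) / (mean distance within same-type layers)."""
    centres = {layer: barycenter(pts) for layer, pts in replicates.items()}
    within, between = [], []
    for a, b in combinations(centres, 2):
        d = dist(centres[a], centres[b])
        (within if kind[a] == kind[b] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))

# Toy replicated points (assumption: two replicates per layer instead of 200).
toy_replicates = {
    "SW1": [(-0.1, 0.0), (0.1, 0.0)],
    "SW2": [(0.0, 0.9), (0.0, 1.1)],
    "RG1": [(2.9, 0.0), (3.1, 0.0)],
    "RG2": [(3.0, 3.9), (3.0, 4.1)],
}
toy_kind = {"SW1": "structured", "SW2": "structured", "RG1": "random", "RG2": "random"}

y = separation_ratio(toy_replicates, toy_kind)
print(round(y, 4))
```

A value above 1 means that layers of different types are, on average, farther apart than layers of the same type, which is the situation the hull plots are meant to reveal.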


Appendix B. Simulation results: box plots of the α weights

Appendix B reports the 27 plots, each representing the four box plots of the α weight distributions. Each of the three figures shows nine plots arranged according to the Graeco-Latin square defining the experimental runs blocked by the number of random layers factor. For each of the 27 experimental conditions, the 200 α weights represent the importance of the four layers in the definition of the DISTATIS compromise space. The following labels identify the network topologies: random graph RG#; preferential attachment PA#; two-isles IS#.

Fig. B.1. Box plots of 200 α weights – one random layer.

Fig. B.2. Box plots of 200 α weights – two random layers.


Fig. B.3. Box plots of 200 α weights – three random layers.

D’Esposito, M.R., De Stefano, D., Ragozini, G., 2014a. On the use of multiple correspondence analysis to visually explore affiliation networks. Soc. Netw. 38, 28–40. D’Esposito, M.R., De Stefano, D., Ragozini, G., 2014b. A comparison of χ2 metrics for the assessment of relational similarities in affiliation networks. Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Springer, pp. 113–122. Dickison, M.E., Magnani, M., Rossi, L., 2016. Multilayer Social Networks. Cambridge University Press. Doreian, P., Batagelj, V., Ferligoj, A., 2005. Generalized Blockmodeling, vol. 25 Cambridge University Press. Edelsbrunner, H., 2012. Algorithms in Combinatorial Geometry, vol. 10 Springer Science & Business Media. Erten, C., Kobourov, S.G., Le, V., Navabi, A., 2005. Simultaneous graph drawing: layout algorithms and visualization schemes. J. Graph Algorithms Appl. 9 (1), 165–182. Escoufier, Y., 1985. Objectifs et procédures de l’analyse conjointe de plusieurs tableaux de données. Stat. Anal. Donnees 10 (1), 1–10. Fatemi, Z., Salehi, M., Magnani, M., 2018. A generalized force-directed layout for multiplex sociograms. In: International Conference on Social Informatics. Springer. pp. 212–227. Faust, K., 2005. Using correspondence analysis for joint displays of affiliation networks. In: In: Carrington, P.J., Scott, J., Wasserman, S. (Eds.), Models and Methods in Social Network Analysis, vol. 27. Cambridge University Press, pp. 117–147. Freeman, L.C., 2005. Graphic techniques for exploring social network data. In: In: Carrington, P.J., Scott, J., Wasserman, S. (Eds.), Models and Methods in Social Network Analysis, vol. 27. Cambridge University Press, pp. 248–269. Giordano, G., Vitale, M.P., 2007. Factorial contiguity maps to explore relational data patterns. Stat. Appl. 19 (4), 297–306. Giordano, G., Vitale, M.P., 2011. On the use of external information in social network analysis. Adv. Data Anal. Classif. 5 (2), 95–112. Gower, J.C., Legendre, P., 1986. 
Metric and Euclidean properties of dissimilarity coefficients. J. Classif. 3 (1), 5–48. Groemping, U., 2018. CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data. https://cran.r-project.org/web/views/ExperimentalDesign.html. Gunst, R.F., Mason, R.L., 2009. Fractional factorial design. Wiley Interdiscip. Rev.: Comput. Stat. 1 (2), 234–244. Halu, A., Mondragón, R.J., Panzarasa, P., Bianconi, G., 2013. Multiplex PageRank. PLOS ONE 8 (10), e78293. Heaney, M.T., 2014. Multiplex networks and interest group influence reputation: an exponential random graph model. Soc. Netw. 36, 66–81. Hmimida, M., Kanawati, R., 2015. Community detection in multiplex networks: a seedcentric approach. NHM 10 (1), 71–85. Jolliffe, I.T., Cadima, J., 2016. Principal component analysis: a review and recent developments. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 374 (2065), 20150202. Kanawati, R., 2015. Multiplex network mining: a brief survey. IEEE Intell. Inform. Bull. 16 (1), 24–27. Kiers, H.A., Mechelen, I.V., 2001. Three-way component analysis: principles and illustrative application. Psychol. Methods 6 (1), 84. Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J.P., Moreno, Y., Porter, M.A., 2014. Multilayer networks. J. Complex Netw. 2 (3), 203–271. Kuncheva, Z., Montana, G., 2015. Community detection in multiplex networks using locally adaptive random walks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ACM. pp.

References

Abdi, H., Dunlop, J.P., Williams, L.J., 2009. How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the bootstrap and 3-way multidimensional scaling (DISTATIS). NeuroImage 45 (1), 89–95.
Abdi, H., O’Toole, A.J., Valentin, D., Edelman, B., 2005. DISTATIS: the analysis of multiple distance matrices. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, 2005. CVPR Workshops. IEEE. pp. 1–8.
Abdi, H., Valentin, D., 2007. Some new and easy ways to describe, compare, and evaluate products and assessors. New Trends in Sensory Evaluation of Food and Non-food Products. pp. 5–15.
Abdi, H., Valentin, D., Chollet, S., Chrea, C., 2007. Analyzing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Qual. Prefer. 18 (4), 627–640.
Abdi, H., Williams, L.J., Valentin, D., Bennani-Dosse, M., 2012. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. Wiley Interdiscip. Rev.: Comput. Stat. 4 (2), 124–167.
Barbillon, P., Donnet, S., Lazega, E., Bar-Hen, A., 2017. Stochastic block models for multiplex networks: an application to a multilevel network of researchers. J. R. Stat. Soc. Ser. A (Stat. Soc.) 180 (1), 295–314.
Batagelj, V., Bren, M., 1995. Comparing resemblance measures. J. Classif. 12, 73–90.
Battiston, F., Nicosia, V., Latora, V., 2017. The new challenges of multiplex networks: measures and models. Eur. Phys. J. Spec. Top. 226 (3), 401–416.
Berlingerio, M., Coscia, M., Giannotti, F., 2011. Finding and characterizing communities in multidimensional networks. In: 2011 International Conference on Advances in Social Networks Analysis and Mining. IEEE. pp. 490–494.
Bianconi, G., 2018. Multilayer Networks: Structure and Function. Oxford University Press.
Bothorel, C., Cruz, J.D., Magnani, M., Micenkova, B., 2015. Clustering attributed graphs: models, measures and methods. Netw. Sci. 3 (3), 408–444.
Bródka, P., Chmiel, A., Magnani, M., Ragozini, G., 2018. Quantifying layer similarity in multiplex networks: a systematic study. R. Soc. Open Sci. 5 (8), 1–16.
Brusco, M., Doreian, P., Steinley, D., Satornino, C.B., 2013. Multiobjective blockmodeling for social network analysis. Psychometrika 78 (3), 498–525.
Cai, H., Zheng, V.W., Chang, K.C.-C., 2018. A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans. Knowl. Data Eng. 30 (9), 1616–1637.
Carroll, C., 2006. Canonical correlation analysis: assessing links between multiplex networks. Soc. Netw. 28 (4), 310–330.
Chollet, S., Valentin, D., Abdi, H., 2014. Free sorting task. In: Valera, P., Ares, G. (Eds.), Novel Techniques in Sensory Characterization and Consumer Profiling. Taylor and Francis, Boca Raton, pp. 207–227.
De Bacco, C., Power, E.A., Larremore, D.B., Moore, C., 2017. Community detection, link prediction, and layer interdependence in multilayer networks. Phys. Rev. E 95 (4), 042317.
De Domenico, M., Nicosia, V., Arenas, A., Latora, V., 2015a. Structural reducibility of multilayer networks. Nat. Commun. 6, 6864.
De Domenico, M., Porter, M.A., Arenas, A., 2015b. MuxViz: a tool for multilayer analysis and visualization of networks. J. Complex Netw. 3 (2), 159–176.
De Stefano, D., Zaccarin, S., 2013. Modelling multiple interactions in science and technology networks. Ind. Innov. 20 (3), 221–240.

1308–1315.
Lahne, J., Abdi, H., Heymann, H., 2018. Rapid sensory profiles with DISTATIS and barycentric text projection: an example with amari, bitter herbal liqueurs. Food Qual. Prefer. 66, 36–43.
Lazega, E., Pattison, P.E., 1999. Multiplexity, generalized exchange and cooperation in organizations: a case study. Soc. Netw. 21 (1), 67–90.
Liberati, C., Zappa, P., 2013. Dynamic Patterns Analysis Meets Social Network Analysis in the Modeling of Financial Market Behavior. International Statistical Institute.
Magnani, M., Wasserman, S., 2017. Introduction to the special issue on multilayer networks. Netw. Sci. 5 (2), 141–143.
Matsuno, R., Murata, T., 2018. MELL: effective embedding method for multiplex networks. In: Companion of the The Web Conference 2018 on The Web Conference 2018. International World Wide Web Conferences Steering Committee. pp. 1261–1268.
Menichetti, G., Remondini, D., Panzarasa, P., Mondragón, R.J., Bianconi, G., 2014. Weighted multiplex networks. PLOS ONE 9 (6), e97857.
Mucha, P.J., Richardson, T., Macon, K., Porter, M.A., Onnela, J.-P., 2010. Community structure in time-dependent, multiscale, and multiplex networks. Science 328 (5980), 876–878.
Ostoic, J.A.R., 2017. Creating context for social influence processes in multiplex networks. Netw. Sci. 5 (1), 1–29.
Pattison, P., Wasserman, S., 1999. Logit models and logistic regressions for social networks: II. Multivariate relations. Br. J. Math. Stat. Psychol. 52 (2), 169–193.
Ragozini, G., De Stefano, D., D’Esposito, M.R., 2015. Multiple factor analysis for time-varying two-mode networks. Netw. Sci. 3 (1), 18–36.
Ragozini, G., Serino, M., D’Ambrosio, D., 2016. On the analysis of time-varying affiliation networks: the case of stage co-productions. Convegno della Società Italiana di Statistica. Springer, pp. 119–129.
Robert, P., Escoufier, Y., 1976. A unifying tool for linear multivariate statistical methods: the RV-coefficient. Appl. Stat. (3), 257–265.
Roberts, J.M., 2000. Correspondence analysis of two-mode network data. Soc. Netw. 22 (1), 65–72.
Rossi, L., Magnani, M., 2015. Towards effective visual analytics on multiplex and multilayer networks. Chaos Solitons Fractals 72, 68–76.
Santana, J., Hoover, R., Vengadasubbu, M., 2017. Investor commitment to serial entrepreneurs: a multilayer network analysis. Soc. Netw. 48, 256–269.
Shafie, T., 2015. A multigraph approach to social network analysis. J. Soc. Struct. 16.
Shafie, T., 2016. Analyzing local and global properties of multigraphs. J. Math. Sociol. 40 (4), 239–264.
Simpson, C.R., 2015. Multiplexity and strategic alliances: the relational embeddedness of coalitions in social movement organisational fields. Soc. Netw. 42, 42–59.
Snijders, T.A., Lomi, A., Torló, V.J., 2013. A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice. Soc. Netw. 35 (2), 265–276.
Solá, L., Romance, M., Criado, R., Flores, J., García del Amo, A., Boccaletti, S., 2013. Eigenvector centrality of nodes in multiplex networks. Chaos 23 (3), 033131.
Solé-Ribalta, A., De Domenico, M., Gómez, S., Arenas, A., 2014. Centrality rankings in multiplex networks. In: Proceedings of the 2014 ACM Conference on Web Science. ACM. pp. 149–155.
Torgerson, W.S., 1958. Theory and Methods of Scaling. John Wiley, New York.
Vörös, A., Snijders, T.A., 2017. Cluster analysis of multiplex networks: defining composite network measures. Soc. Netw. 49, 93–112.
Wang, P., 2013. Exponential random graph model extensions: models for multiple networks and bipartite networks. In: Lusher, D., Koskinen, J., Robins, G. (Eds.), Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications. Cambridge University Press, pp. 115–129.
Wang, P., Robins, G., Pattison, P., 2006. PNet: A Program for the Simulation and Estimation of Exponential Random Graph Models.
White, H.C., Boorman, S.A., Breiger, R.L., 1976. Social structure from multiple networks. I. Blockmodels of roles and positions. Am. J. Soc. 81 (4), 730–780.
Xu, L., Wei, X., Cao, J., Yu, P.S., 2017. Multi-task network embedding. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE. pp. 571–580.
Zhu, M., Kuskova, V., Wasserman, S., Contractor, N., 2016. Correspondence analysis of multirelational multilevel networks. In: Lazega, E., Snijders, T. (Eds.), Multilevel Network Analysis for the Social Sciences. Springer, pp. 145–172.
