Effects of data quality in an animal trade network and their impact on centrality parameters


Social Networks 54 (2018) 73–81


Social Networks journal homepage: www.elsevier.com/locate/socnet

Effects of data quality in an animal trade network and their impact on centrality parameters Kathrin Büttner ∗ , Jennifer Salau, Joachim Krieter Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Olshausenstr. 40, D-24098 Kiel, Germany

Article info

Keywords: Network analysis; Trade network; Pig; Boundary specification problem; Targeted removal scenarios

Abstract

The analysis of animal trade networks always faces the challenge of imperfect data sets, mainly due to country borders or different producer communities. In the present study, the network robustness, i.e. the point at which false positive nodes or edges may influence the network structure and the results of the centrality parameters, was analysed for a pork supply chain of a producer community in Northern Germany. The analysis of animal trade networks mainly focusses on disease transmission and the development and implementation of targeted prevention and intervention strategies based on centrality parameters. Here, the inclusion criteria may impact the prediction of disease transmission as well as the outcome of the applied control measures. Thus, four different removal scenarios, all based on the boundary specification problem (removal of arcs according to their frequency of appearance, removal of nodes according to their general frequency of appearance and according to their frequency of appearance as supplier or purchaser), were established to analyse the network robustness. In order to evaluate the changes in the rank order of the nodes, a Spearman Rank Correlation Coefficient (rs) was calculated between the original network and each removal step. The removal of nodes according to their frequency of appearance showed the most robust results. The values of rs stayed above the threshold of 0.70 until at least 80% of the arcs had been removed. For the other removal scenarios, the centrality parameters under investigation showed varying degrees of robustness concerning the ranking of the nodes. Therefore, the exclusion of farms that trade infrequently in the network would not be associated with a significant change in network structure and centrality parameters.
For targeted disease prevention and intervention strategies based on centrality parameters, it is of great relevance to be able to evaluate the influence of inclusion criteria on the network structure and thus on the speed and the extent of possible disease transmission. © 2018 Elsevier B.V. All rights reserved.

1. Introduction

Network theory has become a valuable framework in many different research areas. Examples include the social sciences, e.g. contacts between individuals (Büttner et al., 2015b,c; Kasper and Voelkl, 2009; Krause et al., 2007; Lewis et al., 2008; Makagon et al., 2012), epidemiological studies, e.g. disease transmission over trade networks (Bigras-Poulin et al., 2007; Büttner et al., 2013a, 2016; Konschake et al., 2013; Lentz et al., 2016; Nöremark et al., 2011; Rautureau et al., 2012), and the analysis of technological networks, e.g. the World Wide Web or the internet (Albert et al., 1999; Barabási et al., 2000; Cohen et al., 2000), to name but a few. However, network analysis always faces the challenges associated with imperfect data, e.g. some nodes or edges are

∗ Corresponding author. E-mail address: [email protected] (K. Büttner). https://doi.org/10.1016/j.socnet.2018.01.001 0378-8733/© 2018 Elsevier B.V. All rights reserved.

missing or information about some of their attributes is missing, and issues related to inclusion criteria for nodes or edges, e.g. nodes or edges which are not actual members of the network under investigation are included (false positives). According to Kossinets (2006), much of this imperfection arises from the following sources: the so-called boundary specification problem (Laumann et al., 1983) as well as inaccuracy or non-response in data recording, which applies especially to surveys (Brewer and Webster, 1999; Butts, 2003; Robins et al., 2004; Stork and Richards, 1992). For the analysis of animal trade networks, which focusses primarily on disease transmission and the development of prevention and control strategies, data quality and the inclusion criteria used may influence the network structure and therefore the prediction of disease transmission as well as the outcome of the applied control measures. Only if the ranking of the farms based on the centrality parameters remains stable can the results of the network analysis be used as reliable indicators for targeted removal strategies


K. Büttner et al. / Social Networks 54 (2018) 73–81

in the case of an epidemic. This means that, to ensure reliable results, the ranking of the centrality parameters must not change easily if only a small number of elements are removed from or added to the network. In the case of pig trade networks, country borders or specific producer communities are often used as the inclusion criteria when building these networks, which can be associated with false positives. In addition, data quality and inclusion criteria may be affected by the fact that additional information such as farm type or size is recorded only in specific marketing programs in which only a small number of farms participate. The boundary specification problem (Laumann et al., 1983) deals with the issue of specifying system boundaries. The outcome of the network analysis depends on the nodes and edges included in the whole system, meaning special care must be taken in specifying the rules of inclusion of network elements (both nodes and edges). In the case of pig trade networks, the question to be answered is: which farms or trade contacts qualify as legitimate members of the producer community under investigation? It can be assumed that network elements (i.e. nodes and edges) which appear less frequently in the network in the time period under analysis do not really belong to the studied producer community but sold their animals via this specific producer community only exceptionally. Therefore, these can be considered false positive nodes resulting from supply and demand or different payment conditions within the pork supply chain. According to Barnes (1979), the centrality parameters may be underestimated if too restrictive a boundary is chosen. Additionally, according to Wang et al. (2012), false positive nodes or edges, i.e. elements that are erroneously present in the network, can have a great impact on the outcome of the network analysis.
Although data of differing quality are often encountered, little research has been carried out to evaluate the effect of false positive nodes or edges on the network-level or node-level parameters (Kossinets, 2006). Therefore, the aim of this study was to evaluate the network robustness under different network boundary definitions, i.e. to analyse how the rules of inclusion for nodes or edges in a network may influence the centrality parameters (e.g. in-degree and out-degree centrality, betweenness centrality, ingoing and outgoing closeness centrality). To carry out a sensitivity analysis, different removal scenarios representing various rules of inclusion and thus different exclusion criteria were established. For each removal step the development of the centrality parameters was recorded, and the results of the removal steps were then compared with each other. Especially for disease control strategies based on centrality parameters, it is of great relevance to understand how the rules of inclusion, i.e. how the boundary of the network is defined, influence the network structure and thus the speed and the extent of possible disease transmission.

2. Materials and methods

2.1. Data basis and network construction

In an observation period from 1st January 2013 to 31st December 2014, pig trade data from a producer community in Northern Germany were recorded. The producer community organizes the marketing of live pigs for its members and in this context also registers all trade contacts between its members, i.e. the transportation of live animals between a supplier and a purchaser on a given day. Here, the suppliers and purchasers were the nodes of the network, which were connected by the trade contacts, i.e. the edges of the network. Each trade contact had one specific supplier and one specific purchaser, i.e. each edge has a certain direction; such directed edges are hereafter referred to as arcs. Furthermore, a farm was categorized as supplier if it delivered animals to another farm, and it was categorized as purchaser if it got animals from another farm. Due to this categorization there were also farms which could be both supplier and purchaser. In addition to the complete network which covered the whole observation period, yearly networks were constructed. For the yearly networks, the description of the network topology as well as the results of the sensitivity analysis can be found in the Supplementary material. Out of all the daily records a static aggregated network was constructed, i.e. repeated trade contacts between the same farms during the observation period were aggregated to a single one.

2.2. Boundary specification problem

All farms and trade contacts which were listed in the recordings were first allocated to the producer community in Northern Germany used for this study. According to Borgatti and Halgin (2011), it is important to realize that due to these settings the network itself and simultaneously its boundaries were predefined. The question is, however, whether these boundaries were properly chosen to correctly describe the producer community under investigation or whether the data set contains so-called false positive nodes or arcs. This is exactly the issue addressed by the boundary specification problem (Laumann et al., 1983). It deals with the question of which set of units should be included in the network, i.e. the choice of the right exclusion or inclusion criteria for network elements. According to Marsden (1990), this is comparable to the general problem of defining the population to which research results are to be generalized. In the case of the pig trade network of the present producer community, the question that has to be answered is: Do the farms or trade contacts really belong to this producer community, or were they recorded in the data set for some other reasons? For this purpose it was assumed that nodes or arcs which occurred less frequently in the analysed time periods were not legitimate members of the producer community under investigation. One explanation for their low occurrences in the data set might be that these farms sold their animals via this specific producer community only exceptionally. Thus, they can be seen as artefacts which may result from supply and demand or different payment conditions within the pork supply chain. If the inclusion or exclusion criteria, and therefore also the system boundaries, are chosen too restrictively, the network structure may change, which also influences the results of the centrality parameters (Barnes, 1979). In addition, false positive nodes or edges, i.e. elements that are erroneously present in the network, have a great impact on the outcome of the network analysis (Wang et al., 2012). In the case of the producer community under investigation, these artefacts can be considered false positive nodes or edges.

2.3. Definition and implementation of removal scenarios

2.3.1. Definition of removal scenarios Based on the above described considerations four removal scenarios were established to investigate the influence of false positive nodes or arcs on the outcome of the network analysis: removal scenario 1 (removal of trade contacts according to their frequency of occurrence in the data set), removal scenario 2 (removal of farms according to their frequency of occurrence in the data set), removal scenario 3 (removal of suppliers according to their frequency of occurrence in the data set) and removal scenario 4 (removal of purchasers according to their occurrence in the data set). The distinction between the removal of nodes and arcs was made because of the hypothesis that the removal of nodes may have


a greater impact on the network structure due to the fact that not only the node is removed but simultaneously also the connected arcs (Borgatti et al., 2006). The term “farms” refers to all nodes of the network, regardless of whether animals were only delivered, only received, or both delivered and received. Furthermore, a distinction of farms into suppliers and purchasers was made because previous studies showed clear differences between these two categories (Büttner et al., 2013b, 2015a). Farms that belong to both categories, e.g. a farrowing farm that gets its breeding sows from the breeding farms and delivers piglets to the finishing farms, are included in both removal scenario 3 and removal scenario 4.

2.3.2. Implementation of removal scenarios

Out of the daily records the occurrences of each trade contact, farm, supplier and purchaser were calculated. These frequencies were the basis for the different removal scenarios and define the single removal steps. Firstly, all elements which occurred least often in the data set were removed simultaneously, followed by the elements with the second-least occurrence and so on until all elements were removed from the network. If any isolated nodes emerged during the application of the different removal scenarios, these were kept in the network. Therefore, the number of isolated nodes could increase with an increasing number of removed network elements.

Let $G(V, E)$ be a network with a set of nodes $V$ and a set of arcs $E$. A removal scenario for $G(V, E)$ is then a finite sequence of networks $(G_n(V_n, E_n))_{n \in \mathbb{N}_0}$ with $G_0(V_0, E_0) = G(V, E)$, whereby $V = V_0 \supseteq V_1 \supseteq V_2 \supseteq \ldots \supseteq V_n$ and $E = E_0 \supseteq E_1 \supseteq E_2 \supseteq \ldots \supseteq E_n$. The single steps of the removal scenarios were as follows:

1. Consider the original network without any removed nodes or arcs, $G(V, E)$.
2. Calculate the frequency of occurrence for the specific removal scenario.
3. Apply a removal scenario to the original network. For each removal step $n$, this results in a modified network $G_n(V_n, E_n)$; together these networks form the removal scenario $(G_n(V_n, E_n))_{n \in \mathbb{N}_0}$.
4. For all removal steps $n$, measure the centrality parameters of $G_n(V_n, E_n)$ and compare them to the “reference” values received from the original network $G(V, E)$.
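The removal procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, and the function names (`removal_steps`, `arc_removal_scenario`, `node_removal_scenario`), the `role` parameter and the representation of daily records as `(supplier, purchaser)` tuples are assumptions made for illustration.

```python
from collections import Counter


def removal_steps(frequency):
    """Group elements into removal steps, least frequent elements first."""
    by_count = {}
    for element, count in frequency.items():
        by_count.setdefault(count, set()).add(element)
    return [by_count[count] for count in sorted(by_count)]


def arc_removal_scenario(daily_records):
    """Scenario 1 sketch: remove arcs in order of increasing frequency.

    Returns the sequence of (nodes, arcs) pairs, starting with the original
    aggregated network. Isolated nodes are kept, as described in the text.
    """
    farms = {farm for record in daily_records for farm in record}
    frequency = Counter(daily_records)   # occurrences per (supplier, purchaser)
    arcs = set(frequency)                # static aggregated arc set
    sequence = [(set(farms), set(arcs))]
    for batch in removal_steps(frequency):
        arcs = arcs - batch
        sequence.append((set(farms), set(arcs)))
    return sequence


def node_removal_scenario(daily_records, role="farm"):
    """Scenarios 2 to 4 sketch: remove nodes in order of increasing frequency.

    role="farm" counts every appearance, role="supplier" or role="purchaser"
    counts only appearances in that role (hypothetical parameter).
    """
    frequency = Counter()
    for supplier, purchaser in daily_records:
        if role in ("farm", "supplier"):
            frequency[supplier] += 1
        if role in ("farm", "purchaser"):
            frequency[purchaser] += 1
    nodes = {farm for record in daily_records for farm in record}
    arcs = set(daily_records)
    sequence = [(set(nodes), set(arcs))]
    for batch in removal_steps(frequency):
        nodes = nodes - batch
        # Removing a node also removes every arc attached to it.
        arcs = {(s, p) for s, p in arcs if s in nodes and p in nodes}
        sequence.append((set(nodes), set(arcs)))
    return sequence


# Toy daily records (invented for illustration, not data from the study).
records = [("A", "B")] * 3 + [("B", "C")] * 2 + [("D", "C")]
arc_sequence = arc_removal_scenario(records)
farm_sequence = node_removal_scenario(records)
```

Each entry of the returned sequence corresponds to one removal step, so centrality parameters can be computed per entry and compared with those of the first entry.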

2.4. Definition of general network and centrality parameters

2.4.1. Definition of general network parameters

To characterize the whole network structure of the analysed pig trade network, the density, the diameter, the number and size of the weakly and strongly connected components as well as the fragmentation were calculated. The density describes the proportion of the trade contacts that are present in the network compared to the number of all possible trade contacts, ranging from 0 (no contacts) to 1 (all possible contacts) (Newman, 2010). The diameter is the length of the shortest path between the two most distant farms in the network (Newman, 2010). The higher the diameter, the less connected a network tends to be. Two farms are part of the same weakly connected component if they are connected by at least one path through the network. Here, the directed nature of the pig trade network is neglected, so that all paths are allowed to go either way along any arc (Kao et al., 2006). Two farms are part of the same strongly connected component if they are connected by directed paths in both directions. Strongly connected components which consist of more than one farm therefore contain a cycle. In the case of the pig trade network, that means that there is a path from the supplier to the purchaser but also a path from the purchaser to the supplier (Newman, 2010). The fragmentation measures the number of components in relation to the number of nodes in the network (Borgatti, 2003). It ranges between 0 and 1. The maximum value is reached when every node is isolated. For the calculation of the fragmentation the network was considered undirected.

2.4.2. Definition of centrality parameters

To describe the importance of specific nodes in the network, the centrality parameters for the original network as well as for each removal step were calculated based on the static aggregated networks. All parameters were calculated with the NetworkX Python Module (Hagberg et al., 2008). The in-degree and out-degree centrality as local centrality parameters measure how well-connected a farm is, i.e. how many direct trade partners a farm has (Newman, 2010). The in-degree centrality measures the number of trade partners which deliver animals to a specific farm. The out-degree centrality measures the number of trade partners which receive animals from a specific farm. The betweenness centrality as a global centrality parameter measures the extent to which a farm is located on the shortest paths between other farms (Freeman, 1977; Wasserman and Faust, 1994). Farms with a high betweenness centrality have a lot of influence on the flow within the network, especially if they form so-called bridges between different network parts. If the farms with high values for the centrality parameters are removed, the network can rapidly be decomposed into fragments and thus further disease transmission can be prevented (Newman, 2010). The closeness centrality as a global centrality parameter measures the reciprocal mean distance from one farm to all other reachable farms in the network (Bavelas, 1950; Sabidussi, 1966; Wasserman and Faust, 1994). The ingoing closeness centrality measures the reciprocal mean topological distance from all other reachable farms to one specific farm. The outgoing closeness centrality measures the reciprocal mean topological distance from one farm to all other reachable farms. High values for the ingoing closeness centrality mean that a farm can be reached by its trade partners in only a few steps and is thus prone to infection, whereas high values for the outgoing closeness centrality mean that the specific farm reaches its direct and indirect purchasers in only a few steps and is thus prone to infecting other farms.

2.5. Sensitivity analysis of the centrality parameters depending on the different removal scenarios

For the evaluation of the robustness of the different centrality parameters, Spearman Rank Correlation Coefficients (rs) between the centralities of the original network $G(V, E)$ and the centralities of the single removal steps $G_n(V_n, E_n)$ with $G_0(V_0, E_0) = G(V, E)$ were calculated using the SAS statistical software package (SAS Institute Inc, 2013). To derive the p-value for the calculated correlation coefficients, Fisher's z transformation was used under the null hypothesis H0: rs = 0 in a two-sided test. The rs was used as evaluation criterion because it does not directly compare the values obtained, but their rank order (Martin and Bateson, 2007). For the present study, this ranking is of special importance as the question was whether the ordering of the nodes stays the same, or if there is a shift within the ranking during the removal process. If the node with the highest rank for e.g. the degree centrality loses half of its edges, but still remains top-ranked, this would reduce the linear correlation (Pearson Product-Moment Correlation Coefficient) between the original network and the single removal steps. However, there would be no effect on the results of the rank correlation (rs), due to the fact that the node rankings are identical. In the case of an epidemic, this is of special importance for control strategies based on centrality parameters. Here, a targeted removal of the farms with the highest centrality parameters results in a


Fig. 1. Visualisation of the pig trade network of a producer community in Northern Germany including all trade contacts between 1st January 2013 and 31st December 2014. The localisation of the farms is not related to their real geographical location.

rapid fragmentation of the network, which interrupts the infection chain and thereby prevents further disease transmission (Büttner et al., 2013a, 2016). The values of rs can range from −1 to 1, where, regarding the present research question, higher rs values are preferred. To be more precise, a good robustness of the different centrality parameters is assumed for rs values equal to or greater than 0.70 (Martin and Bateson, 2007). The rs was calculated only on those nodes that appear in both the original network $G(V, E)$ and the network of the respective removal step $G_n(V_n, E_n)$. For the presentation of the results, only significant rs values (p < 0.05) are illustrated. Additionally, to allow a comparison, all removal scenarios are presented as a function of the fraction of removed arcs and illustrated up to the second-to-last removal step, provided that significance is given. Thus, for the different removal scenarios the maximum percentage of removed arcs can vary and may not reach a value close to 100%. This is especially true for removal scenarios 3 and 4 due to the limited number of suppliers and purchasers in the present pork supply chain. In addition to the sensitivity analysis, the fraction of simultaneously removed nodes, the mean centrality values as well as the fragmentation for each removal step were calculated and plotted as a function of the fraction of removed arcs.

3. Results

3.1. General network characteristics

The pig trade network under investigation considering the whole observation period is illustrated in Fig. 1. In total, the pig trade network consists of 978 farms which are connected by 18,871 daily and 2,280 static aggregated trade contacts. Out of the 978 farms, 233 could be categorized as supplier, 505 out of 978 could

be categorized as purchaser and 240 farms could be categorized as both supplier and purchaser. The network was very sparsely connected with a density of 0.0002 and had a diameter of 4. It consisted of 4 weakly connected components. The largest weakly connected component contained 970 nodes (99%). Only strongly connected components of one node each could be detected which means that the present network does not contain any cycles. Due to the fact that there was one large weakly connected component which included nearly all nodes of the network the fragmentation was very low with 0.02. In the Supplementary material the results of the yearly networks are illustrated (Section A).
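The general network parameters reported above can be sketched in a few lines of plain Python. This is an illustration only, not the NetworkX-based code used in the study. The function names are hypothetical, and the fragmentation formula (one minus the share of ordered node pairs that lie in the same weakly connected component) is an assumption about the measure attributed to Borgatti (2003), since the exact formula is not given in the text.

```python
def density(nodes, arcs):
    """Share of realised arcs among all n * (n - 1) possible directed arcs."""
    n = len(nodes)
    return len(arcs) / (n * (n - 1)) if n > 1 else 0.0


def weak_components(nodes, arcs):
    """Weakly connected components: arc direction is ignored."""
    neighbours = {node: set() for node in nodes}
    for supplier, purchaser in arcs:
        neighbours[supplier].add(purchaser)
        neighbours[purchaser].add(supplier)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def fragmentation(nodes, arcs):
    """Assumed formula: 1 - sum(s_k * (s_k - 1)) / (n * (n - 1)).

    s_k are the component sizes of the undirected network; the value is 1
    when every node is isolated and 0 when all nodes share one component.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    connected_pairs = sum(len(c) * (len(c) - 1)
                          for c in weak_components(nodes, arcs))
    return 1 - connected_pairs / (n * (n - 1))


# Toy network (invented for illustration, not data from the study).
nodes = {"A", "B", "C", "D", "E"}
arcs = {("A", "B"), ("B", "C"), ("D", "C")}
toy_density = density(nodes, arcs)              # 3 of 20 possible arcs
toy_fragmentation = fragmentation(nodes, arcs)  # components of size 4 and 1
```

Under this assumed formula a network with one dominant weakly connected component yields a fragmentation close to zero, which is in line with the qualitative statement in the text.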

3.2. Frequency distributions used for the removal scenarios

Fig. 2 visualizes how often each trade contact (a), farm, supplier and purchaser (b) appeared in the network. The maximum number of occurrences of trade contacts in the data set was 452, i.e. one trade contact appeared 452 times in the data set. The maximum number of occurrences for the farms was 4825 (regardless of whether the farm was listed as a supplier or a purchaser). Dividing the farms further into suppliers and purchasers, the maximum number of occurrences was 1516 for the suppliers and 4825 for the purchasers. In all four categories (trade contacts, farms, suppliers and purchasers), the most common number of occurrences was one. That means that 767 trade contacts (34%), 113 farms (12%), 62 suppliers (13%) and 98 purchasers (13%) were only listed once in the data set during the whole observation period of two years. Furthermore, 79% of the trade contacts, 52% of the farms, 44% of the suppliers and 63% of the purchasers appeared less than 10 times in the data set, i.e. all occurrences showed a right-skewed distribution.
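The four occurrence counts summarised above can be derived from daily records along the following lines. This is a hedged sketch with hypothetical names (`occurrence_counts`, `share_below`), assuming each daily record is a `(supplier, purchaser)` tuple; it does not reproduce the study's data.

```python
from collections import Counter


def occurrence_counts(daily_records):
    """Count occurrences of trade contacts, farms, suppliers and purchasers."""
    contacts = Counter(daily_records)
    suppliers = Counter(supplier for supplier, _ in daily_records)
    purchasers = Counter(purchaser for _, purchaser in daily_records)
    farms = suppliers + purchasers  # appearances in either role
    return contacts, farms, suppliers, purchasers


def share_below(counts, threshold):
    """Fraction of elements that occur fewer than `threshold` times."""
    return sum(1 for count in counts.values() if count < threshold) / len(counts)


# Toy daily records (invented for illustration).
records = [("A", "B")] * 3 + [("B", "C")] * 2 + [("D", "C")]
contacts, farms, suppliers, purchasers = occurrence_counts(records)
listed_once = share_below(contacts, 2)  # share of contacts recorded only once
```

The same `share_below` helper, called with a threshold of 10, corresponds to the "less than 10 times" shares quoted in the text.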


Fig. 2. Frequency distributions of the number of occurrences of the trade contacts (a) as well as the farms (suppliers, purchasers or both), suppliers and purchasers (b). For illustration purposes the x-axis was only plotted up to 50 records to put more emphasis on the occurrences which appeared more often.

Additionally, Fig. 2b shows that the frequency distribution of the farms (suppliers, purchasers or both) and the distribution of the purchasers followed a similar course. The results for the yearly networks are illustrated in the Supplementary material (Section B).

3.3. Sensitivity analysis of the robustness of the centrality parameters depending on the different removal scenarios

In the present section, the results of the sensitivity analysis of the robustness of the centrality parameters depending on the different removal scenarios are described. Fig. 3 shows the results for the whole observation period of two years. In the Supplementary material the results of the sensitivity analysis for the yearly networks are illustrated (Section C). The results of the single removal scenarios are described in more detail hereafter.

3.3.1. Removal scenario 1: removal of trade contacts according to their frequency of occurrence in the data set

Removal scenario 1 (Fig. 3a) showed the most robust results for the centrality parameters based on the outgoing trade contacts, i.e. out-degree and outgoing closeness centrality. Only if more than 70% of the arcs were removed from the network did rs drop below the threshold of 0.70. However, beyond this percentage of removed arcs, rs decreased rapidly to a value close to zero for both centrality parameters. For the centrality parameters based on the ingoing trade contacts, i.e. the in-degree centrality and the ingoing closeness centrality, only 40% of the arcs could be removed until rs dropped below 0.70. As for the out-degree centrality and the outgoing closeness centrality, rs then decreased further to values close to zero. For the betweenness centrality, similar but slightly better results compared to the in-degree centrality and the ingoing closeness centrality were obtained. Here, the rs value fell below the threshold of 0.70 when 53% of the arcs were removed from the network and decreased further to an rs value close to zero. Additionally, Fig. 3a clearly shows that the number of isolated nodes increased almost linearly with the fraction of removed arcs. At the end, the fraction of isolated nodes reached 100%.

3.3.2. Removal scenario 2: removal of farms according to their frequency of occurrence in the data set

In contrast to removal scenario 1, removal scenario 2 (Fig. 3b) showed highly robust results for all calculated centrality parameters. Especially in-degree centrality, ingoing closeness centrality

and betweenness centrality showed more robust results compared to removal scenario 1. For the centrality parameters based on the ingoing trade contacts, i.e. the in-degree centrality and the ingoing closeness centrality, rs stayed above the threshold of 0.70 until 99% of the arcs were removed. The centrality parameters based on the outgoing trade contacts were the second most robust. The outgoing closeness centrality first dropped below the threshold of 0.70 when 96% of the arcs were removed from the network. The out-degree centrality showed slightly less robust results compared to the outgoing closeness centrality. Here, 83% of the arcs could be removed until rs dropped below the threshold of 0.70. The minimum rs value for this parameter was 0.58 when 97% of the arcs were removed from the network. For the betweenness centrality, results similar to those of the out-degree centrality were obtained. For this centrality parameter, rs remained above the threshold of 0.70 until 80% of the arcs were removed from the network. The minimum rs value for the betweenness centrality was 0.44 when 97% of the arcs were removed. For removal scenario 2, the number of isolated nodes increased slightly only in the very last removal steps.
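The rs values reported in this section compare node rankings on the nodes present in both the original and the reduced network. The following sketch shows one way to compute such a value in plain Python. It is an illustration with hypothetical function names, not the SAS procedure used in the study, and the centrality values in the example are invented.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def shared_values(reference, reduced):
    """Centrality values of the nodes that appear in both networks."""
    shared = sorted(set(reference) & set(reduced))
    return [reference[v] for v in shared], [reduced[v] for v in shared]


def spearman_on_shared_nodes(reference, reduced):
    """Spearman rs as the Pearson correlation of the (tie-averaged) ranks."""
    x, y = shared_values(reference, reduced)
    return pearson(average_ranks(x), average_ranks(y))


# Invented centrality values: node "A" loses half of its value but stays
# top-ranked, and node "D" has been removed from the reduced network.
reference = {"A": 10, "B": 6, "C": 3, "D": 1}
reduced = {"A": 5, "B": 4, "C": 2}
rs = spearman_on_shared_nodes(reference, reduced)
linear_r = pearson(*shared_values(reference, reduced))
is_robust = rs >= 0.70  # threshold used in the text
```

In this toy case the ranking of the shared nodes is unchanged, so rs stays at its maximum while the linear correlation on the raw values is lower, which mirrors the argument for using a rank correlation.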

3.3.3. Removal scenario 3: removal of suppliers according to their frequency of occurrence in the data set

Removal scenario 3 (Fig. 3c) analysed the robustness of the centrality parameters according to the removal of nodes based on their number of records listed as supplier and can therefore be seen as a special case of removal scenario 2. In contrast to removal scenario 1 and removal scenario 2, removal scenario 3 showed less robust results. The rs values of all centrality parameters dropped below the threshold of 0.70 when 44% of the arcs were removed from the network. Ranking the centrality parameters according to their robustness, the centrality parameters based on the ingoing trade contacts, i.e. in-degree and ingoing closeness centrality, showed the most robust results. They were followed by the centrality parameters based on the outgoing trade contacts, i.e. out-degree and outgoing closeness centrality, which showed an almost identical course. The least robust results were obtained for the betweenness centrality. Here, only 30% of the arcs could be removed until the rs values dropped below 0.70. Additionally, the figure clearly shows that the number of isolated nodes increased continuously when more than 20% of the arcs were removed from the network. At the end, the fraction of isolated nodes reached 100%.

3.3.4. Removal scenario 4: removal of purchasers according to their frequency of occurrence in the data set

Removal scenario 4 (Fig. 3d) is also a special case of removal scenario 2. Here, the robustness of the centrality parameters according to the removal of purchasers based on their frequency of occurrence in the data set was analysed. Contrary to the results of removal scenario 3, the robustness differed more between the centrality parameters under investigation. Removal scenario 4 showed the highest robustness for the centrality parameters based on the outgoing trade contacts, i.e. out-degree centrality and outgoing closeness centrality. Here, about 56% and 49% of the arcs, respectively, could be removed until the rs values fell below the threshold of 0.70. The centrality parameters based on the ingoing trade contacts, i.e. in-degree centrality and ingoing closeness centrality, showed less robust results compared to the out-degree centrality and the outgoing closeness centrality. For these two parameters, which showed an almost identical course, the rs values fell below the threshold of 0.70 when about 33% of the arcs were removed. The least robust results were obtained for the betweenness centrality. Only 23% of the nodes could be removed until the rs values dropped below the threshold of 0.70. Additionally, the figure shows that the number of isolated nodes increased continuously when more than 20% of the arcs were removed from the network. At the end, the fraction of isolated nodes reached its maximum of 64%.

Fig. 3. Sensitivity analysis for the four different removal scenarios: Removal scenario 1 (a; removal of trade contacts according to their frequency of occurrence in the data set), removal scenario 2 (b; removal of farms according to their frequency of occurrence in the data set), removal scenario 3 (c; removal of suppliers according to their frequency of occurrence in the data set) and removal scenario 4 (d; removal of purchasers according to their frequency of occurrence in the data set). Additionally, the development of the fraction of isolated nodes in the network depending on the fraction of removed arcs is plotted.

3.3.5. Mean centrality values and fraction of simultaneously removed nodes depending on the fraction of removed arcs

Fig. 4 shows the mean centrality values as well as the fraction of simultaneously removed nodes as a function of the fraction of removed arcs for the whole observation period. The results of the yearly networks, as well as a table containing the descriptive statistics of the number of simultaneously removed network elements, can be found in the Supplementary material (Section C). Removal scenarios 1 and 2 showed comparable results for the mean centrality values. For both scenarios, a larger variation of the mean centrality values was observed only when more than 80% of the arcs were removed from the network. For removal scenarios 3 and 4, these larger variations were already observed at a lower fraction of removed arcs (55%). Moreover, these two removal scenarios showed higher mean values compared to removal scenarios 1 and 2. In the Supplementary material a detailed illustration of each centrality parameter is provided, which also includes the standard deviation of the mean centrality parameters (Section C). The results for the fraction of simultaneously removed nodes showed that for removal scenario 1 about 34% of the arcs were removed in the first removal step, which corresponds to about 15% of simultaneously removed nodes. After the first removal step, the percentage of simultaneously removed nodes decreased to about 5%. In the first removal step of removal scenario 2, about 12% of the nodes were removed simultaneously. After the first removal step the percentage of simultaneously removed nodes decreased to 5% until 40% of the arcs were removed from the network. Removal scenario 4 showed a course similar to that of removal scenario 2. Removal scenario 3 started with a lower fraction of simultaneously removed nodes (6%).

3.3.6. Network fragmentation depending on the fraction of removed arcs

Fig. 5 shows the network fragmentation for the four removal scenarios depending on the fraction of removed arcs. If this figure is compared to Fig. 3, which illustrates the fraction of isolated nodes, it becomes obvious that these two parameters showed nearly identical courses. The results for the yearly networks are illustrated in the Supplementary material (Section C).

4. Discussion

In the present study, the network robustness, i.e. how the rules to include elements in the network may influence the network structure and, thus, the results of the centrality parameters, was analysed for a pork supply chain of a producer community in Northern Germany. Indeed, for the analysis of animal trade networks with focus on disease transmission and the development and implementation of targeted prevention and control strategies, these inclusion rules may influence the network structure and the outcome of the network analysis. Previous studies also showed that other factors


Fig. 5. Network fragmentation for the four removal scenarios depending on the fraction of removed arcs.

Fig. 4. Mean centrality values depending on the fraction of removed arcs for the different removal scenarios: Removal scenario 1 (a; removal of trade contacts according to their frequency of occurrence in the data set), removal scenario 2 (b; removal of farms according to their frequency of occurrence in the data set), removal scenario 3 (c; removal of suppliers according to their frequency of occurrence in the data set) and removal scenario 4 (d; removal of purchasers according to their frequency of occurrence in the data set). Additionally, the development of the fraction of simultaneously removed nodes depending on the fraction of removed arcs is plotted.

such as the inclusion of dynamic trade contacts or the analysis of different observation periods had an influence on the calculated centrality parameters and thereby also on the ranking of the farms (Büttner et al., 2015a; Konschake et al., 2013; Lebl et al., 2016). Only if the centrality parameters yield robust results, meaning a stable ranking of the farms, can these results be used as reliable predictors of disease transmission and for the implementation of targeted prevention and control strategies in the case of an epidemic. Based on the boundary specification problem (Laumann et al., 1983), different removal scenarios were established. In order to characterize the network structure and the centrality parameters for this specific producer community, the false positive nodes or arcs, respectively, should be excluded from the network to avoid unreliable results. For each removal step the results were compared with each other within a sensitivity analysis. For all analysed centrality parameters, the removal of farms according to their frequency of occurrence in the data set (removal scenario 2) showed a higher network robustness than the removal of arcs (removal scenario 1). Only the centrality parameters based on the outgoing trade contacts, i.e. out-degree centrality and outgoing closeness centrality, also showed robust results for removal scenario 1. Here, 70% of the arcs could be removed before the rs values dropped below the threshold of 0.70. This is contrary to the expectation that node removal would result in a lower network robustness than arc removal, because the removal of a node simultaneously causes the loss of at least one edge. However, Borgatti et al. (2006) stated that the random removal of edges had more impact on the network robustness and that the removal of nodes was in general more forgiving than the removal of edges. They argued that these findings may be limited to random graphs. The present study nevertheless showed that this finding also holds for a targeted removal of arcs in a real-world network. Furthermore, the results showed that for removal scenario 1 the fraction of isolated nodes and the network fragmentation increased simultaneously with an increasing fraction of removed arcs, whereas removal scenario 2 showed nearly no isolated nodes and fragmentation values close to zero. The development of the size of the largest weakly connected component (results not shown) showed a horizontally mirrored image compared to the results of the fragmentation. For removal scenario 2 the largest weakly connected component contained approximately all nodes of the network for almost all removal steps.
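The contrast between arc removal and node removal can be made concrete with a small example. The following is a minimal sketch, not the authors' implementation: it assumes the component-based fragmentation measure of Borgatti (2003), F = 1 - sum_k s_k (s_k - 1) / (N (N - 1)) over the sizes s_k of the weakly connected components, assumes that nodes stay in the network when their arcs are removed, and uses a hypothetical five-farm network with made-up contact frequencies.

```python
def component_sizes(nodes, arcs):
    """Sizes of the weakly connected components (arc direction is ignored)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for supplier, purchaser in arcs:
        parent[find(supplier)] = find(purchaser)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def fragmentation(sizes):
    """Share of node pairs that are NOT in the same component (assumed measure)."""
    n = sum(sizes)
    if n < 2:
        return 0.0
    return 1.0 - sum(s * (s - 1) for s in sizes) / (n * (n - 1))


# Hypothetical data: arc (supplier, purchaser) -> number of recorded contacts.
arc_freq = {("A", "B"): 5, ("B", "C"): 1, ("C", "D"): 5, ("D", "E"): 1}
nodes = sorted({n for arc in arc_freq for n in arc})

# Frequency of a farm = number of records in which it appears.
node_freq = {n: 0 for n in nodes}
for (supplier, purchaser), freq in arc_freq.items():
    node_freq[supplier] += freq
    node_freq[purchaser] += freq

# Step in the style of removal scenario 1: drop rare arcs, keep all nodes.
arcs_1 = [arc for arc, freq in arc_freq.items() if freq > 1]
sizes_1 = component_sizes(nodes, arcs_1)
frag_1 = fragmentation(sizes_1)

# Step in the style of removal scenario 2: drop rare nodes and their arcs.
nodes_2 = [n for n in nodes if node_freq[n] > 1]
arcs_2 = [(s, p) for (s, p) in arc_freq if s in nodes_2 and p in nodes_2]
sizes_2 = component_sizes(nodes_2, arcs_2)
frag_2 = fragmentation(sizes_2)

print("arc removal :", sizes_1, round(frag_1, 2))
print("node removal:", sizes_2, round(frag_2, 2))
```

In this toy case the rare arc B to C sits in the middle of the chain, so removing it splits the network, whereas the only rare node, E, sits at the margin and its removal leaves one connected component.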
Removal scenario 1 showed a clearer decline of the size of the largest weakly connected component, which indicates a more pronounced decomposition of the network and is in accordance with the results for the isolated nodes and the fragmentation. These findings can be explained by the network topology of the present pork supply chain. Here, the farms with a low frequency were located at the margins of the network. Thus, a removal of nodes with a low number of occurrences in the data set only trims the margins of the network. On the other hand, arcs with a low frequency can appear at every position in the network. A removal of those arcs can therefore lead to a higher fragmentation of the network, which has a higher impact on the centrality parameters. Thus, for removal scenario 1 a more pronounced decline of the rs values was observed. This means for the pork supply chain under investigation that a lot of false positive nodes can be included in the data set without


having a great impact on the outcome of the analysed centrality parameters. Thus, even severe exclusion criteria for nodes may not affect the network structure, and the ranking of the nodes therefore remained stable. In contrast, the application of severe exclusion criteria to the arcs of the network did not yield stable rankings for the analysed centrality parameters. Thus, when dealing with data sets containing a lot of false positive information, inclusion rules based on the nodes of the network provide more reliable results and should therefore be used for the implementation of targeted prevention and control strategies in the case of an epidemic. Removal scenarios 3 and 4, which analysed the robustness of the centrality parameters based on the removal of nodes according to their frequency of occurrence as supplier or purchaser, respectively, have to be considered as special cases of removal scenario 2. Removal scenario 3 showed only minor differences between the calculated centrality parameters. However, for removal scenario 4, clear differences were observed. Here, the centrality parameters based on the outgoing trade contacts, i.e. out-degree centrality and outgoing closeness centrality, showed a higher robustness than the in-degree centrality, the ingoing closeness centrality and the betweenness centrality. This dependence of the robustness on the centrality parameter in removal scenario 4 can be explained by the distribution of the calculated centrality parameters. Most of the purchasers (e.g. finishing farms, abattoirs) tend to have higher values for the centrality parameters based on the ingoing trade contacts and zero to low values for the centrality parameters based on the outgoing trade contacts. Furthermore, they also showed a smaller range in the centrality parameters based on the ingoing trade contacts compared to the out-degree centrality and the outgoing closeness centrality.
Due to the smaller range, removing some of the purchasers from the network may easily change the ranking of the centrality parameters. Thus, removing only purchasers from the network had more impact on the centrality parameters based on the ingoing trade contacts than on those based on the outgoing trade contacts. Consequently, in removal scenario 4, the robustness of the out-degree centrality and the outgoing closeness centrality was higher than that of the centrality parameters based on the ingoing trade contacts. If the results for the different centrality parameters are compared, the most prominent differences between the centrality parameters were observed for removal scenarios 1 and 4. Here, the most robust results were obtained for the centrality parameters based on the outgoing trade contacts, i.e. the out-degree centrality and the outgoing closeness centrality. The differences between the centrality parameters in removal scenarios 2 and 3 were small and, to avoid overinterpretation, these differences can be neglected because they are not considered to be basic structural concepts of the network under analysis. The centrality parameters chosen in this study are widely used and well known. They have also been used for the description and characterization of animal trade networks (Büttner et al., 2013a,b, 2015a, 2016; Lentz et al., 2016; Natale et al., 2009, 2011; Rautureau et al., 2012). Furthermore, previous studies showed that the targeted removal of nodes based on centrality parameters regarding the outgoing trade contacts, i.e. out-degree centrality, outgoing closeness centrality, and the betweenness centrality led to a rapid fragmentation of animal trade networks (Büttner et al., 2013a, 2016; Kiss et al., 2006; Natale et al., 2009).
These results can be explained by the right-skewed distribution of the centrality parameters in this kind of network, which makes them highly vulnerable to the targeted removal of highly central network elements. In order to enable the comparison of the present results with other studies, the analysis focused on these centrality parameters. However, there are plenty of other centrality parameters which should also be tested regarding their robustness. For instance, the ingoing infection chain (Nöremark et al., 2011) and the outgoing infection chain (Dubé et al., 2008; Webb, 2006) are of special interest for the analysis of animal trade networks regarding disease transmission. These two parameters measure the number of direct and indirect trade contacts which lead to or come from a specific farm considering the chronological order of the trade contacts. Thus, the temporal aspect of the trade contacts is included in the calculation, which is important for the analysis of disease transmission. Another centrality parameter which can be seen as a further development of the ingoing and outgoing infection chain is the disease flow centrality (Natale et al., 2011). In addition to the chronological order of the trade contacts, the length of the single trade contacts is also included in the analysis. Thereby, the epidemiology of the disease under investigation can be taken into account, enabling the approximation of the actual course of the disease.

Network analysis always faces the problem of choosing the right boundary of the system under investigation. Due to the highly connected international market, even country borders might not be the appropriate boundary for the network, which in turn can affect the outcome of the network analysis. However, the present study showed that even when investigating a smaller trade network of a producer community, the inclusion or exclusion criteria for the farms or trade contacts that really belong to the producer community had different effects on the centrality parameters. These results showed that one should be aware of the fact that the definition of the legitimate members of the group under study, here the farms and the trade contacts, could have an immense impact on the results of the network analysis. This is of special importance for the implementation of targeted prevention and control strategies in the case of an epidemic. Only stable results can provide reliable indicators.
For future studies, a clear definition of the farms and trade contacts which are included or excluded, respectively, in the network analysis should be given to enhance comparability between the studies in this research area. Because the findings reported in the present paper are based on a case study, it should be noted that the obtained results may not apply to all animal trade networks.
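The ingoing and outgoing infection chains mentioned in the discussion differ from static reachability because they respect the chronological order of the trade contacts. The following is a minimal sketch under stated assumptions, not the procedure of the cited studies: the dated contacts and farm names are hypothetical, and it is assumed that a farm can only pass on an infection through contacts on a strictly later day than the contact through which it was reached.

```python
def outgoing_infection_chain(contacts, source):
    """Farms reachable from `source` via time-respecting paths.

    `contacts` is an iterable of (day, supplier, purchaser) tuples.
    """
    reached = {source: float("-inf")}  # farm -> earliest day it was reached
    for day, supplier, purchaser in sorted(contacts):
        if (supplier in reached and reached[supplier] < day
                and purchaser not in reached):
            reached[purchaser] = day
    return set(reached) - {source}


def ingoing_infection_chain(contacts, target):
    """Farms from which `target` can be reached via time-respecting paths.

    Reversing both the arc direction and the time axis turns the ingoing
    problem into an outgoing one.
    """
    reversed_contacts = [(-day, purchaser, supplier)
                         for day, supplier, purchaser in contacts]
    return outgoing_infection_chain(reversed_contacts, target)


# Hypothetical dated trade contacts (day, supplier, purchaser).
contacts = [(1, "A", "B"), (2, "B", "C"), (1, "C", "D"), (3, "C", "E")]

out_chain_a = outgoing_infection_chain(contacts, "A")
in_chain_e = ingoing_infection_chain(contacts, "E")
print(sorted(out_chain_a), sorted(in_chain_e))
```

In this toy example farm D is statically reachable from A, but the contact from C to D takes place before C itself is reached, so D is not part of the outgoing infection chain of A.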

5. Conclusion

The aim of this study was to evaluate the effect of inclusion criteria for nodes or arcs in the network, how the community of producers was defined, and how this impacted the outcome of the network analysis. For this purpose, four different removal scenarios were established which represent possible inclusion criteria. For each removal step within each removal scenario, the development of the centrality parameters was recorded and compared with each other. The most robust results were obtained for removal scenario 2 (removal of farms according to their frequency of appearance). For at least 80% removed arcs, the values of rs stayed above the threshold of 0.70. This means for the pork supply chain under investigation that even if there are false positive nodes, the calculated centrality parameters remained stable enough to implement prevention and control strategies based on the ranking of the most central network elements considering removal scenario 2. Particularly for disease control strategies based on centrality parameters, it is of great importance to know about the influence of the data quality on the network structure and thus on the speed and the extent of possible disease transmission.

Acknowledgement

This research was funded by the German Research Foundation (DFG) (Grant No. BU 3077/1-1, BU 3077/1-2).


Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.socnet.2018.01.001.

References

Albert, R., Jeong, H., Barabási, A.-L., 1999. Diameter of the world-wide web. Nature 401, 130–131.
Barabási, A.-L., Albert, R., Jeong, H., 2000. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A 281, 69–77.
Barnes, J.A., 1979. Network analysis: orienting notion, rigorous technique or substantive field of study. In: Holland, P.W., Leinhardt, S. (Eds.), Perspectives on Social Network Research. Academic Press Inc., New York, pp. 403–423.
Bavelas, A., 1950. Communication patterns in task-oriented groups. J. Acoust. Soc. Am. 22 (6), 725–730.
Bigras-Poulin, M., Barfod, K., Mortensen, S., Greiner, M., 2007. Relationship of trade patterns of the Danish swine industry animal movements network to potential disease spread. Prev. Vet. Med. 80 (2–3), 143–165, http://dx.doi.org/10.1016/j.prevetmed.2007.02.004.
Borgatti, S.P., Halgin, D.S., 2011. On network theory. Organ. Sci. 22 (5), 1168–1181, http://dx.doi.org/10.1287/orsc.1100.0641.
Borgatti, S.P., Carley, K.M., Krackhardt, D., 2006. On the robustness of centrality measures under conditions of imperfect data. Soc. Netw. 28 (2), 124–136, http://dx.doi.org/10.1016/j.socnet.2005.05.001.
Borgatti, S.P., 2003. The key player problem. In: Breiger, R.L., Carley, K.M., Pattison, P. (Eds.), Dynamic Social Network Modeling and Analysis. Workshop Summary and Papers. National Academies Press, Washington, D.C.
Brewer, D.D., Webster, C.M., 1999. Forgetting of friends and its effects on measuring friendship networks. Soc. Netw. 21 (4), 361–373, http://dx.doi.org/10.1016/S0378-8733(99)00018-0.
Büttner, K., Krieter, J., Traulsen, A., Traulsen, I., 2013a. Efficient interruption of infection chains by targeted removal of central holdings in an animal trade network. PLoS One 8 (9), e74292.
Büttner, K., Krieter, J., Traulsen, A., Traulsen, I., 2013b. Static network analysis of a pork supply chain in Northern Germany – characterisation of the potential spread of infectious diseases via animal movements. Prev. Vet. Med. 110 (3–4), 418–428.
Büttner, K., Krieter, J., Traulsen, I., 2015a. Characterization of contact structures for the spread of infectious diseases in a pork supply chain in Northern Germany by dynamic network analysis of yearly and monthly networks. Transbound. Emerg. Dis. 62 (2), 188–199.
Büttner, K., Scheffler, K., Czycholl, I., Krieter, J., 2015b. Network characteristics and development of social structure of agonistic behaviour in pigs across three repeated rehousing and mixing events. Appl. Anim. Behav. Sci. 168, 24–30.
Büttner, K., Scheffler, K., Czycholl, I., Krieter, J., 2015c. Social network analysis – centrality parameters and individual network positions of agonistic behavior in pigs over three different age levels. SpringerPlus 4, 185.
Büttner, K., Krieter, J., Traulsen, A., Traulsen, I., 2016. Epidemic spreading in an animal trade network – comparison of distance-based and network-based control measures. Transbound. Emerg. Dis. 63 (1), e122–e134.
Butts, C.T., 2003. Network inference, error, and informant (in)accuracy: a Bayesian approach. Soc. Netw. 25, 103–140.
Cohen, R., Erez, K., ben-Avraham, D., Havlin, S., 2000. Resilience of the internet to random breakdowns. Phys. Rev. Lett. 85 (21), 4626–4628.
Dubé, C., Ribble, C., Kelton, D., McNab, B., 2008. Comparing network analysis measures to determine potential epidemic size of highly contagious exotic diseases in fragmented monthly networks of dairy cattle movements in Ontario, Canada. Transbound. Emerg. Dis. 55, 382–392.
Freeman, L.C., 1977. A set of measures of centrality based on betweenness. Sociometry 40 (1), 35–41.
Hagberg, A., Schult, D., Swart, P.J., 2008. Exploring network structure, dynamics and function using NetworkX. In: Varoquaux, G., Vaught, T., Millman, J. (Eds.), Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena, CA, USA, pp. 11–15.


Kao, R.R., Danon, L., Green, D.M., Kiss, I.Z., 2006. Demographic structure and pathogen dynamics on the network of livestock movements in Great Britain. Proc. R. Soc. B 273 (1597), 1999–2007.
Kasper, C., Voelkl, B., 2009. A social network analysis of primate groups. Primates 50 (4), 343–356.
Kiss, I.Z., Green, D.M., Kao, R.R., 2006. The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread. J. R. Soc. Interface 3 (10), 669–677, http://dx.doi.org/10.1098/rsif.2006.0129.
Konschake, M., Lentz, H.H.K., Conraths, F.J., Hövel, P., Selhorst, T., 2013. On the robustness of in- and out-components in a temporal network. PLoS One 8 (2), e55223, http://dx.doi.org/10.1371/journal.pone.0055223.
Kossinets, G., 2006. Effects of missing data in social networks. Soc. Netw. 28 (3), 247–268, http://dx.doi.org/10.1016/j.socnet.2005.07.002.
Krause, J., Croft, D.P., James, R., 2007. Social network theory in the behavioural sciences: potential applications. Behav. Ecol. Sociobiol. 62 (1), 15–27.
Laumann, E.O., Marsden, P.V., Prensky, D., 1983. The boundary specification problem in network analysis. Res. Methods Soc. Netw. Anal. 61, 87.
Lebl, K., Lentz, H.H.K., Pinior, B., Selhorst, T., 2016. Impact of network activity on the spread of infectious diseases through the German pig trade network. Front. Vet. Sci. 3, 48, http://dx.doi.org/10.3389/fvets.2016.00048.
Lentz, H.H.K., Koher, A., Hövel, P., Gethmann, J., Sauter-Louis, C., Selhorst, T., Conraths, F.J., 2016. Disease spread through animal movements: a static and temporal network analysis of pig trade in Germany. PLoS One 11 (5), e0155196, http://dx.doi.org/10.1371/journal.pone.0155196.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., Christakis, N., 2008. Tastes, ties, and time: a new social network dataset using Facebook.com. Soc. Netw. 30 (4), 330–342.
Makagon, M.M., McCowan, B., Mench, J.A., 2012. How can social network analysis contribute to social behavior research in applied ethology? Appl. Anim. Behav. Sci. 138 (3–4).
Marsden, P.V., 1990. Network data and measurement. Annu. Rev. Sociol. 16, 435–463.
Martin, P., Bateson, P., 2007. Measuring Behaviour: An Introductory Guide, 3rd ed. Cambridge University Press, Cambridge, UK.
Nöremark, M., Håkansson, N., Lewerin, S.S., Lindberg, A., Jonsson, A., 2011. Network analysis of cattle and pig movements in Sweden: measures relevant for disease control and risk based surveillance. Prev. Vet. Med. 99 (2–4), 78–90.
Natale, F., Giovannini, A., Savini, L., Palma, D., Possenti, L., Fiore, G., Calistri, P., 2009. Network analysis of Italian cattle trade patterns and evaluation of risks for potential disease spread. Prev. Vet. Med. 92 (4), 341–350, http://dx.doi.org/10.1016/j.prevetmed.2009.08.026.
Natale, F., Savini, L., Giovannini, A., Calistri, P., Candeloro, L., Fiore, G., 2011. Evaluation of risk and vulnerability using a Disease Flow Centrality measure in dynamic cattle trade networks. Prev. Vet. Med. 98 (2–3), 111–118.
Newman, M.E.J., 2010. Networks: An Introduction. Oxford University Press Inc., New York.
Rautureau, S., Dufour, B., Durand, B., 2012. Structural vulnerability of the French swine industry trade network to the spread of infectious diseases. Anim.: Int. J. Anim. Biosci. 6 (7), 1152–1162.
Robins, G., Pattison, P., Woolcock, J., 2004. Missing data in networks: exponential random graph (p*) models for networks with non-respondents. Soc. Netw. 26 (3), 257–283, http://dx.doi.org/10.1016/j.socnet.2004.05.001.
SAS Institute Inc, 2013. User’s Guide (release 9.4). Cary, North Carolina, USA.
Sabidussi, G., 1966. The centrality index of a graph. Psychometrika 31 (4), 581–603, http://dx.doi.org/10.1007/BF02289527.
Stork, D., Richards, W.D., 1992. Nonrespondents in communication network studies: problems and possibilities. Group Organ. Manage. 17 (2), 193–209, http://dx.doi.org/10.1177/1059601192172006.
Wang, D.J., Shi, X., McFarland, D.A., Leskovec, J., 2012. Measurement error in network data: a re-classification. Soc. Netw. 34 (4), 396–409, http://dx.doi.org/10.1016/j.socnet.2012.01.003.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, New York.
Webb, C.R., 2006. Investigating the potential spread of infectious diseases of sheep via agricultural shows in Great Britain. Epidemiol. Infect. 134 (1), 31–40.