A note on reconstructing animal social networks from independent small-group observations



Animal Behaviour 80 (2010) 551–562

Contents lists available at ScienceDirect

Animal Behaviour journal homepage: www.elsevier.com/locate/anbehav

Charles Perreault*

Department of Anthropology, University of California at Los Angeles

Article info

Article history: Received 13 April 2010; Initial acceptance 7 May 2010; Final acceptance 26 May 2010; Available online 22 July 2010

MS. number: A10-00253

Keywords: animal social network; clustering coefficient; degree distribution; network density; path length; power-law; random network; sampling; scale-free network; small-world network

Animal social networks are often built by aggregating a series of independent observations of two or more members of a group interacting or in association. Every time an observation is made, edges are drawn between each pair of individuals involved. I examined the effect of edge sample size on the reconstruction of social networks. I created different artificial networks and sampled edges from each. I estimated and compared the number of nodes, number of components, path length, clustering coefficient, network density, mean degree, betweenness centrality and degree probability distribution of the reconstructed networks to the true value of the network. I describe how the accuracy of these measures changes as the fraction of sampled edges increases. I show that edge sample size affects network measures in different ways and that when an incomplete sample is analysed, network properties can be considerably misrepresented. I also show that, because animal networks are typically small, simple curve fitting to the degree distribution P(k) should be done with caution, because different curve models can show significant fit for the same data. Overall, the results indicate that strong claims about animal social networks should not be made unless considerable effort has been made to collect an exhaustive number of association/interaction data points. If observations of associations/interactions are accumulated over a long period, the effect of increasing edge sample size could be mistaken for temporal change in social network and could also muddy the comparison of network structure between populations and between species.

© 2010 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.

Applying network methods to animal populations, including human groups and organizations, can reveal aspects of their sociobiology that are hardly detectable when looking solely at individual-level data (Krause et al. 2007; Croft et al. 2008; Wey et al. 2008; Whitehead 2008a). Making these aspects of sociobiology visible depends on avoiding a series of problems that the original body of theory, largely borrowed from the fields of mathematics and physics, was not designed to deal with (reviewed in: Marsden 1990; Kossinets 2006; Wey et al. 2008; Whitehead 2008a, b; James et al. 2009). Chief among these issues are the difficulty of providing a behaviourally meaningful definition of an association or interaction (i.e. the criteria by which two individuals will be considered linked in the network), the problem of accurately observing these associations, and finally, the problem of observing a sufficient number to get an accurate representation of a population’s network. My paper addresses this last issue.

* Correspondence: C. Perreault, Department of Anthropology, University of California at Los Angeles, 341 Haines Hall, CA 90095, U.S.A. E-mail address: [email protected]

0003-3472/$38.00 © 2010 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.anbehav.2010.06.020

It is often difficult to observe all of the individuals and their interactions at any particular moment in time. As a result, animal social networks are often built by aggregating a series of independent observations of two or more members of a group interacting or in association (for instance, two individuals are seen grooming or swimming in the same school). Every time such an observation is made, edges are drawn between each pair of individuals involved (e.g. Lusseau 2003; Croft et al. 2004, 2005; Lusseau & Newman 2004; Flack et al. 2006; Lusseau et al. 2006; McDonald 2007, 2009; Godfrey et al. 2009; Henzi et al. 2009; Naug 2009; Ramos-Fernández et al. 2009). Describing individuals as ‘nodes’ and their associations or interactions as ‘edges’, I examine the impact of edge sample size on the reconstruction of social networks. Since biologists working in the wild are probably forever condemned to dealing with incomplete information about the social structure of their study population, understanding the impact of edge sample size on network topology is important if network analysis is to become a useful scientific method. The effect of sample size on animal network topology has largely been overlooked, despite numerous concerns about it (Marsden 1990; Kossinets 2006; Croft et al. 2008; Lusseau et al. 2008; Wey et al. 2008; Franks et al. 2009; James et al. 2009). The few existing studies on the topic have tended to rely on resampling of real networks (Marsden 1990; Costenbader & Valente 2003; Wey et al. 2008). One problem with this approach is that the sampling errors contained in these networks are unknown (Borgatti et al. 2006). It is also unclear how representative these networks are of animal social networks in general. One way to work around these problems is to create an artificial network and sample from it, while introducing a controlled amount of error in the sampling procedure. So far, the studies that have used this method are not directly applicable to the study of animal social life. This is because they have either simulated networks that are uncharacteristically large for animal groups (i.e. with thousands of nodes) (Kossinets 2006; Lee et al. 2006; Yoon et al. 2007), have focused only on certain measures of network topology, such as the shape of the degree probability distribution P(k) (Stumpf et al. 2005), or have used random networks (Borgatti et al. 2006), while animal social networks are typically nonrandom. Sampling methods are currently being developed to assess the uncertainty of sampling procedures (e.g. Lusseau et al. 2008; Franks et al. 2009), but at present, we lack an exhaustive and general description of the impact of edge sample size on animal social networks. Here I simulate five networks that are, by their size and topology, relevant to the study of animal social networks. I randomly sample edges from them and compare the networks constructed from these samples to the actual networks.
I examine how sample size impacts nine network measures most frequently used by behavioural ecologists: (1) number of nodes (n); (2) number of components in the network (G); (3) mean path length (L); (4) mean clustering coefficient (C); (5) small-world features (L ≈ L_random and C ≫ C_random); (6) network density (ρ); (7) mean degree (K); (8) mean normalized betweenness centrality (B); (9) shape of the degree probability distribution P(k).

METHODS

Network Construction

I artificially created five networks: two random networks, two ‘small-world’ networks and one ‘scale-free’ network. Each network had the same number of nodes (n = 125), but differed in terms of the number and distribution of edges (E). The random and small-world networks were constructed following the methods described in Watts & Strogatz (1998). I started with a regular network of 125 nodes, in which each node was linked to its k nearest neighbours (thus, the average degree of the network was k). I generated one random and one small-world network for each of k = 6 and k = 10. For instance, when k = 6, every node in the network was initially connected to its six nearest neighbours by six edges. Each edge of these regular networks was considered in turn, in a clockwise fashion, and one of its ends was reconnected to a randomly chosen node with probability p. In the case of the random networks, p was set to 1. The two random networks thus approximated the intensively studied random graphs of Erdös & Rényi (1959), in which E edges are randomly placed between n nodes. I included random networks in this study because they are frequently used as a null model to which real-world networks are compared (Watts 1999; Pastor-Satorras & Vespignani 2001; Lusseau 2003; Croft et al. 2004, 2005; Lusseau et al. 2008; Wey et al. 2008; Naug 2009; Ramos-Fernández et al. 2009). For the small-world networks, the edges of the regular networks were rewired with a probability p = 0.1.
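The rewiring procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `rewired_ring` and the seeds are hypothetical, k is assumed even and smaller than n, and self-loops and duplicate edges are excluded when a new end point is drawn.

```python
import random

def rewired_ring(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbours,
    with one end of every edge rewired at random with probability p.
    Assumes k is even and k < n."""
    rng = random.Random(seed)
    half = k // 2
    adj = {i: set() for i in range(n)}
    # Regular lattice: link each node to its k/2 neighbours on each side.
    for i in range(n):
        for step in range(1, half + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Visit the lattice edges "clockwise", nearest neighbours first.
    for step in range(1, half + 1):
        for i in range(n):
            if rng.random() >= p:
                continue  # this edge is left untouched
            old = (i + step) % n
            # Assumption: no self-loops and no duplicate edges.
            candidates = [v for v in range(n) if v != i and v not in adj[i]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[i].remove(old)
            adj[old].remove(i)
            adj[i].add(new)
            adj[new].add(i)
    return sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})

small_world_k6 = rewired_ring(125, 6, 0.1, seed=1)   # p = 0.1
random_k10 = rewired_ring(125, 10, 1.0, seed=2)      # p = 1
print(len(small_world_k6), len(random_k10))
```

Because rewiring moves edges without adding or deleting any, the edge counts stay at nk/2, that is 375 for k = 6 and 625 for k = 10, in line with the values given for E in Table 1.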
Small-world networks are characterized by a low mean path length (L) and a high mean clustering coefficient (C) (Watts & Strogatz 1998; Watts 1999; Amaral et al. 2000; Barrat & Weigt 2000; Latora & Marchiori 2001; Newman 2003). From a biological point of view, small-world networks are interesting because information and diseases can spread rapidly through them even though the social network is highly clustered. It turns out that many real networks display such characteristics (Watts 1999; Latora & Marchiori 2001; Humphries et al. 2006), including possibly animal social networks such as those of the bottlenose dolphins, Tursiops truncatus (Lusseau 2003; Lusseau et al. 2006), Trinidadian guppies, Poecilia reticulata (Croft et al. 2004), sticklebacks (Croft et al. 2005) and social wasps (Naug 2009).

The scale-free network was constructed following Ravasz & Barabási’s (2003) iterative method, according to which a series of fully connected clusters of five nodes are replicated and linked in a hierarchical fashion, with each node also connected to the central nodes of the clusters (see Ravasz & Barabási 2003 for a depiction of the scale-free network used in this paper). This hierarchical network structure is interesting because it is highly modular and displays a degree distribution that follows a power-law, $P(k) \sim k^{-\gamma}$, which is why it is called ‘scale-free’ (Barabási & Albert 1999). This means that the degree distribution P(k) is highly skewed to the right because of the presence of a few highly connected individuals. These ‘superhubs’ are not predicted by the Poisson distribution, which characterizes random and small-world networks. It is interesting that so many real networks appear to be scale-free, including human communication and collaboration networks (Faloutsos et al. 1999; Amaral et al. 2000; Pastor-Satorras & Vespignani 2001; Albert & Barabási 2002; Ebel et al. 2002; Boccaletti et al. 2006; but see Willinger et al. 2009). Cases of human and animal social systems displaying scale-free characteristics have also been reported (Lusseau 2003; Schneeberger et al. 2004).
The power-law distribution is thought to arise notably by a growth and preferential attachment process (Barabási & Albert 1999). New nodes are continuously added to the network and connect preferentially to nodes which are already well connected. Table 1 summarizes the characteristics of the five baseline networks used in this study. As pointed out by Croft et al. (2008), many key properties of random, small-world and scale-free networks only become apparent when networks are very large. Here I used networks for which the topologies are well understood at large sizes, but I scaled them down to a dimension that is more natural to many animal groups.

Sampling Procedure

To investigate the impact of edge sample size, I used a simple random draw to select a number e of edges, without replacement, from the E edges of the network. For each of the five baseline networks, 10 independent samples were taken at each of the sizes e/E = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0. I used UCINET 6.266 (Borgatti et al. 2002) to construct and analyse the networks

Table 1. Characteristics of the five baseline networks used in the simulation

Measure                           Random   Random   Small-world  Small-world  Scale-free
                                  K = 6    K = 10   K = 6        K = 10
Number of nodes (n)               125      125      125          125          125
Number of edges (E)               375      625      375          625          451
Number of components (G)          1        1        1            1            1
Mean path length (L)              2.86     2.32     3.95         2.87         1.94
Mean clustering coefficient (C)   0.04     0.07     0.45         0.49         0.96
Mean density (ρ)                  0.05     0.08     0.05         0.08         0.06
Mean degree (K)                   6        10       6            10           7.22
Mean betweenness centrality (B)   1.51     1.08     2.4          1.52         0.77


Figure 1. Examples of networks constructed from edge samples of size e/E = 1.0, 0.5 and 0.1 drawn from the random K = 6, small-world K = 10 and scale-free networks. Drawing produced with Netdraw (Borgatti 2002).
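The sampling procedure (a simple random draw of e edges without replacement, repeated 10 times per sample size) can be sketched as follows. The helper names, the seed and the ring-shaped stand-in network are hypothetical and serve only to make the example self-contained; the study itself used UCINET, not this code.

```python
import random

def sample_edges(edges, fraction, rng):
    """Draw a fraction e/E of the edges at random, without replacement."""
    e = round(fraction * len(edges))
    return rng.sample(edges, e)

def observed_nodes(edges):
    """Individuals that appear in at least one sampled edge."""
    return {node for edge in edges for node in edge}

def mean_normalized_nodes(edges, fraction, replicates=10, seed=0):
    """Average n(e)/n(E) over independent samples of the same size."""
    rng = random.Random(seed)
    n_true = len(observed_nodes(edges))
    counts = [len(observed_nodes(sample_edges(edges, fraction, rng)))
              for _ in range(replicates)]
    return sum(counts) / replicates / n_true

# Hypothetical stand-in for a baseline network: a ring of 125 nodes,
# each linked to its six nearest neighbours (375 edges).
baseline = [(i, (i + j) % 125) for i in range(125) for j in (1, 2, 3)]

for tenth in range(1, 11):
    fraction = tenth / 10
    print(fraction, round(mean_normalized_nodes(baseline, fraction), 3))
```

Each printed value is the mean number of observed nodes divided by the true number of nodes, which is the kind of normalized measure used throughout the Results.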

based on each of these samples. Examples of networks reconstructed from these samples are presented in Fig. 1.

The different network measures were averaged over the 10 independent samples and normalized against the true value of the network. Assume for instance that we take 10 independent samples of size e/E = 0.2 from the scale-free network. Ten different networks are generated from these 10 samples, each with its own clustering coefficient (C). If the mean of these 10 clustering coefficients is 0.19 and the actual clustering coefficient of the scale-free network is 0.96 (see Table 1), the normalized mean clustering coefficient is 0.19/0.96 = 0.198. Normalized values represent the fraction of the true clustering coefficient captured in the sample. Normalization is important because it allows for the comparison of the impact of sample size across different network measures and across the five different networks. A normalized measure of 1 indicates a perfect fit between the average value of the samples and the true measure. A value below 1 indicates that the measure is being underestimated, and a value above 1 indicates that it is being overestimated.

RESULTS

The network measures examined in this paper were affected differently by changes in edge sample size. Depending on the size of the fraction of observed edges, the same network can look drastically different. For any given sample size, some network measures will be estimated with greater accuracy than others. Here I start by describing how these measures change as sample size e/E goes from 0.1 to 1.0. I then examine the impact of sample size on the shape of degree probability distributions P(k). The mean and standard deviation of the network measures, calculated over each set of 10 independent samples (before normalization), are presented in the Appendix.

Number of Nodes (n)

In my simulations I sampled associations or interactions between individuals (i.e. edges), not the individuals themselves. This means that for any given edge sample size, the number of individuals contained in the network may vary. Figure 2 illustrates why adding an edge to the sample can lead to the addition of either 0, 1 or 2 nodes to the network. Figure 2a depicts a network composed of two edges and three nodes. One edge is added to the sample in Fig. 2b. Because the edge represents an association between two individuals that have not been observed before, two new nodes are added to the network. When an edge is added in Fig. 2c, only one new node is added to the network, because the association involves an individual that is already part of the network. Finally the addition of an edge between individuals that have already been seen interacting with others does not increase the number of nodes in the network (Fig. 2d). How many individuals exist in a group of gregarious species such as primates is oftentimes not difficult to estimate. Such is not

Figure 2. The different ways by which adding an edge can affect the number of nodes in a network: (a) initial network; (b) + 2 nodes; (c) + 1 node; (d) + 0 node.
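The three cases shown in Figure 2 amount to counting how many end points of a newly observed edge have not been seen before. A minimal sketch, with a hypothetical function name and hypothetical node labels:

```python
def nodes_added(edge, seen):
    """Number of new individuals (0, 1 or 2) revealed by an observed edge.

    `seen` is the set of individuals already in the network; it is
    updated in place.
    """
    new = {node for node in edge if node not in seen}
    seen.update(new)
    return len(new)

# Starting point as in Fig. 2a: three individuals already observed.
seen = {"A", "B", "C"}
print(nodes_added(("D", "E"), seen))  # 2: both individuals are new
print(nodes_added(("C", "F"), seen))  # 1: C was already in the network
print(nodes_added(("A", "F"), seen))  # 0: both already observed
```

As more edges are sampled, the first two cases become rarer, which is one way to see why the node count rises quickly at first and then levels off.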

the case for many other species, perhaps because they are aquatic, nocturnal or nongregarious. In these instances, the total number of individuals that exist in the network remains unknown and is only revealed as associations or interactions are observed. Knowing the number of individuals in a network is important for many scientific reasons, notably because it enters into the calculation of measures such as network density (ρ). Figure 3a shows that the relationship between edge sample size and the number of nodes in the network is not linear. Nodes are added at a higher rate when sample size e/E is small (e.g. e/E = 0.1–0.2). As e/E increases, the rate at which nodes are added decreases, and when e/E is equal to or larger than 0.5, almost every node has been observed. The reason is that the situations depicted in Fig. 2b and c become less frequent as e/E goes from 0.1 to 1. Interestingly, in the small-world K = 10 network and the random K = 10 network, the number of individuals is more accurately estimated for small e/E values than it is in other network topologies. This suggests that the more densely connected a network is, the more rapidly the different individuals it contains are observed.

Note, however, that the networks in this paper were unweighted and that my simulations did not directly address the problem of sampling weighted networks. In a weighted network, edges are given a weight, which is often based on the number of times that two animals have been observed together. Sampling from a weighted network can be considered to some extent as the reverse of the filtering procedure, a method used to identify the core structure of a network. When filtering a network, edges with a weight smaller than a certain threshold value are weeded out (Croft et al. 2008). Consequently, pairs of individuals that interact frequently are more likely to stay in the filtered network. The same can also be true of sampling from weighted networks: individuals with stronger ties are more likely to be observed and included in the network than solitary individuals. This can potentially impact network measures such as the number of nodes. For instance, individuals with strong ties might be sampled rapidly, while individuals with weak ties might be included in the network only after a significant research effort. This can lead to a bias in the estimation of the number of nodes in the network. Analogously, Croft et al. (2008) filtered a network of 342 red deer, by keeping only the edges linking individuals seen together at least six times, and ended up with a network composed of only 46 nodes.

Number of Components (G)

Components are groups of interconnected nodes that are not connected to the rest of the network. The number of components G in a network may offer an indication of how fragmented a population is (Croft et al. 2008, page 51). It is important to understand how G is affected by sample size because network measures such as path length are calculated only between nodes belonging to the same component and because the smaller components of a network are sometimes excluded from analysis (e.g. Fowler & Christakis 2008). Networks constructed from very small samples appear to be highly fragmented and may contain as many as 21 components (see Fig. 3b). However, this overestimation decreases rapidly with increased sampling, so that G(e)/G(E) approaches 1 when e/E = 0.5. Bollobás (1985) described how large random networks are composed of many small components when fewer than n/2 edges are sampled. Once n/2 edges have been sampled, the network goes through a percolation threshold, after which the number of components decreases very rapidly. Croft et al. (2008, pp. 76–77) noted that if the mean degree K of a network is less than or close to 1, which means that fewer than n/2 edges have been sampled, then it is normal for this network to be fragmented. We should thus be careful about regarding this fragmentation as biologically informative as it could be due to poor sampling. In line with other work (e.g. Newman & Watts 1999; Callaway et al. 2000; Moore & Newman 2000), the results presented in Fig. 3b suggest that a percolation threshold also exists for small nonrandom networks. However, if each edge in the network represents a large number of observed interactions, as is the case for strong ties in a weighted network, then it is possible that the multiple components being observed are indeed accurate, even though the mean degree K of a network approaches 1.

Mean Path Length (L)

Path length is a measure of how distant two nodes are from one another in a network. It is calculated as the minimum number of edges that links them. Mean path length (L) is the average shortest path taken over every reachable (i.e. belonging to the same component) pair of nodes within a network. Path length is often analysed because it affects how easily information and epidemic diseases spread through a network (e.g. Watts & Strogatz 1998; Watts 1999, 2004). Figure 3c shows that L(e)/L(E) plotted against sample size e/E follows a peak-shaped path. L is only underestimated when the sample size is very small (e/E ≤ 0.2). This is because the network at such a sample size is still composed of many small components. As the ‘percolation threshold’ is crossed, L becomes abruptly overestimated. This overestimation peaks in the region e/E = 0.2–0.4. Components in that region tend to be shaped like chains rather than webs (similar to the two components in Fig. 2c), thus decreasing the number of possible paths between the pairs of nodes. At its peak, L is overestimated by factors of about 2 or 3, depending on the network topology in question.
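The effect of fragmentation and of chain-like components on L can be illustrated with a small breadth-first-search sketch. It counts components among the observed nodes and averages shortest paths over reachable pairs only, as in the definition above. The function names and the toy edge lists are hypothetical, and this is not the UCINET routine used in the study.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest path length from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def components_and_mean_path(edges):
    """Return (G, L): number of components among observed nodes and
    mean shortest path over all reachable pairs of nodes."""
    adj = adjacency(edges)
    unvisited = set(adj)
    n_components = 0
    total = pairs = 0
    for node in adj:
        dist = bfs_distances(adj, node)
        if node in unvisited:
            n_components += 1
            unvisited.difference_update(dist)
        total += sum(dist.values())
        pairs += len(dist) - 1
    mean_path = total / pairs if pairs else float("nan")
    return n_components, mean_path

fragments = [("A", "B"), ("C", "D")]                  # two tiny components
chain = [("A", "B"), ("B", "C"), ("C", "D")]          # chain-like component
web = chain + [("A", "C"), ("A", "D"), ("B", "D")]    # fully linked "web"

print(components_and_mean_path(fragments))  # (2, 1.0)
print(components_and_mean_path(chain))
print(components_and_mean_path(web))
```

On the same four nodes, the fragmented sample gives the shortest mean path, the chain the longest and the web an intermediate value of 1, which mirrors the underestimation, peak and decline described for Fig. 3c.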
Beyond this peak, all L estimates decrease following a power-law-like decay function and all converge to similar values. For example, when 70% of the edges have been sampled, L is overestimated by about 20% for all five networks.

Mean Clustering Coefficient (C)


The mean clustering coefficient C is a measure of the cliquishness of the network. An individual’s clustering coefficient c is the fraction of the edges that could exist between its direct neighbours that actually exist. C is the average clustering coefficient taken over all nodes in the network. C is always underestimated when sample size e/E < 1 (Fig. 3d). Networks reconstructed from e/E ≤ 0.2 of the edges of the random K = 6 network showed no clustering at all (C = 0). Interestingly, the ratios C(e)/C(E) for the five networks converged when e/E > 0.3, and then increased linearly with a slope of ≈1. Thus, a 50% sample of a network’s edges will yield a mean clustering coefficient of about half the true value, independent of the type of network.
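The definition of C given above can be written out directly. In this sketch the function name is hypothetical, and nodes with fewer than two neighbours are given a coefficient of 0, which is an assumption (software packages differ on this point, and the text does not say which convention was used).

```python
from itertools import combinations

def mean_clustering(edges):
    """Mean over nodes of the fraction of possible links between a
    node's neighbours that are actually present."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return 0.0
    coefficients = []
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)  # assumption: undefined counts as 0
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        coefficients.append(links / (k * (k - 1) / 2))
    return sum(coefficients) / len(coefficients)

triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(mean_clustering(triangle))      # 1.0
print(mean_clustering(triangle[:2]))  # 0.0 once one edge is missing
```

Dropping a single edge from the triangle removes all clustering, which gives an intuition for why C can only be underestimated when edges are missing from the sample.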

Figure 3. Normalized mean (a) number of nodes (n), (b) number of components (G), (c) shortest path length (L) and (d) clustering coefficient (C) as a function of edge sample size (e/E).

Small-world Features

Small-world networks are characterized by a short mean path length approximating that of a random network with the same number of nodes and edges (L ≈ L_random) and a mean clustering coefficient much greater than that of an equivalent random network (C ≫ C_random) (Watts & Strogatz 1998). After discussing how edge sample size affects L and C, it is interesting to examine how sampling impacts the detection of the small-world features (L ≈ L_random and C ≫ C_random) of the two small-world networks. In Fig. 4, C(e) and L(e) values are taken from the small-world K = 6 and K = 10 networks, normalized against the expected C and L values for a random network with the same number of nodes and edges. When e/E = 0.3 edges are sampled from the small-world K = 6 network, 113 edges are observed (30% of E = 375). On average, such a sample contains 111 nodes (thus, the average degree K = 2.02). A corresponding random network with 111 nodes and a mean degree K of 2.02 is expected to have C_random = 0.02 and L_random = 6.65 (see Croft et al. 2008, pp. 75–76). Figure 4 indicates that the clustering coefficients of the networks constructed by sampling from the small-world networks are always many factors greater than the random expectation, even when sample size is small. Conversely, the mean path length relative to the random expectation is always close to 1. This result suggests that these small-world characteristics can be detected even when the sample size is small.

Figure 4. Mean clustering coefficient (C) and mean path length (L) normalized over the expected C and L values of the corresponding random network, as a function of edge sample size (e/E).

Network Density (ρ)

Network density (ρ) is the fraction of all possible links that can exist between the nodes of the network that actually exist ($\rho = e / [n(n-1)/2]$). It is a measure of network sparseness. Croft et al. (2008, page 66) noted that in most social networks ρ is much smaller than 1, meaning that most individuals are only indirectly connected to each other. Since networks of similar density are often compared in an attempt to control for sample size effect (see Croft et al. 2008, page 153), it is important to know how sample size actually impacts estimates of network density. My simulations show that network density is always underestimated when calculated from incomplete samples (Fig. 5a). The ρ(e)/ρ(E) function is concave when sample size is small because in the region e/E = 0.2–0.3, nodes are added more rapidly than edges (see how Fig. 5a mirrors Fig. 3a). This means that for any given sample of edges e, fewer nodes are present in the sample; hence the denominator of the network density equation presented above is larger. Once e/E ≥ 0.4, this sampling bias is attenuated and density estimates increase linearly with sample size with a slope of 1.

Mean Degree (K)

The mean degree K of a network is the expected number of edges for any node in the network. Like network density ρ and clustering C, K is always underestimated when samples are incomplete. The impact of edge sample size on K(e)/K(E) is linear and largely the same for the five networks (Fig. 5b). Differences between networks occur when samples are small (e/E = 0.1–0.4). With such sample sizes, K increases with sample size at a rate of about 0.7 in sparser networks (random, K = 6; small-world, K = 6; scale-free) and at a rate of about 0.82 for denser ones (random, K = 10; small-world, K = 10). However, when e/E > 0.4, the values

Figure 5. Normalized (a) network density (ρ), (b) mean degree (K) and (c) mean node betweenness centrality (B) as a function of edge sample size (e/E).
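The density and mean-degree definitions, and the random-network benchmark used for the small-world comparison, reduce to a few lines of arithmetic. The sketch below assumes the widely used approximations C_random ≈ K/n and L_random ≈ ln(n)/ln(K). Whether these are exactly the expressions behind the values quoted from Croft et al. (2008) is an assumption, and the function names are hypothetical.

```python
import math

def density(e, n):
    """Fraction of the n(n - 1)/2 possible edges that are present."""
    return e / (n * (n - 1) / 2)

def mean_degree(e, n):
    """Each edge contributes to the degree of two nodes."""
    return 2 * e / n

def random_expectations(n, k):
    """Approximate C and L of a random network with n nodes and mean
    degree k (assumed formulas: C ~ k/n, L ~ ln n / ln k)."""
    return k / n, math.log(n) / math.log(k)

# Full small-world K = 6 network (values from Table 1).
rho_true = density(375, 125)
# Sample quoted in the text: 113 edges observed among 111 nodes.
rho_sample = density(113, 111)
print(round(rho_true, 3), round(rho_sample, 3), round(rho_sample / rho_true, 2))

# Scale-free network: mean degree from n and E in Table 1.
print(round(mean_degree(451, 125), 2))

# Random benchmark for n = 111 and K = 2.02, as quoted in the text.
c_random, l_random = random_expectations(111, 2.02)
print(round(c_random, 3), round(l_random, 2))
```

Under these assumed approximations the benchmark values come out close to, though not identical with, the C_random = 0.02 and L_random = 6.65 reported in the text.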

Figure 6. P_ccdf(k) of samples of size e/E = 0.1 (■), 0.5 (▲) and 1.0 (●) drawn from the (a) random K = 6 network, (b) random K = 10 network, (c) small-world K = 6 network, (d) small-world K = 10 network and (e) scale-free network, plotted on linear–linear (left side) and log–log (right side) scales.
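The quantity plotted in Figure 6 is the complementary cumulative degree distribution, defined in the text as the probability that a node has a degree greater than or equal to k. A minimal sketch of how it could be computed from an edge list, with a hypothetical function name and toy data:

```python
from collections import Counter

def degree_ccdf(edges):
    """Map each k from 1 to the maximum observed degree to the fraction
    of observed nodes whose degree is greater than or equal to k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    k_max = max(degree.values())
    return {k: sum(1 for d in degree.values() if d >= k) / n
            for k in range(1, k_max + 1)}

# Toy example: one hub linked to three peripheral individuals.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(degree_ccdf(star))  # {1: 1.0, 2: 0.25, 3: 0.25}
```

Only nodes that appear in at least one sampled edge are counted, so the distribution always starts at 1 for k = 1, as in the curves of Figure 6.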

Table 2. Significance (P) and R² values of different trend lines fitted to the P_ccdf(k) of the random K = 6 network when e/E = 0.1, 0.5 and 1

Random, K = 6
                e/E = 0.1         e/E = 0.5         e/E = 1
Trend line      P       R²        P       R²        P       R²
Linear          0.105   0.801     0.000   0.89      0.000   0.944
Exponential     0.002   0.996     0.000   0.931     0.000   0.811
S-shaped        0.093   0.822     0.048   0.45      0.097   0.277
Power-law       0.032   0.937     0.004   0.727     0.009   0.547

Tail of random, K = 6
                e/E = 0.1         e/E = 0.5         e/E = 1
Trend line      P       R²        P       R²        P       R²
Linear          0.256   0.847     0.02    0.779     0.012   1.0
Exponential     0.021   0.999     0.000   0.978     0.099   0.976
S-shaped        0.1     0.975     0.004   0.903     0.145   0.949
Power-law       0.041   0.996     0.001   0.955     0.122   0.964
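To illustrate how several trend lines can fit the same short distribution well, the sketch below fits linear, exponential and power-law models by ordinary least squares on suitably transformed data and reports R² for each. The fitting procedure behind Table 2 is not described in this section, so least squares on transformed values is an assumption, as are the function names and the synthetic data. The S-shaped model and the significance values are not reproduced here.

```python
import math

def r_squared(xs, ys):
    """R² of the least-squares straight line fitted to ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def trend_line_fits(ks, pccdf):
    """R² of three competing models for the same P_ccdf(k) values."""
    log_k = [math.log(k) for k in ks]
    log_p = [math.log(p) for p in pccdf]
    return {
        "linear": r_squared(ks, pccdf),        # P = a + b k
        "exponential": r_squared(ks, log_p),   # ln P = a + b k
        "power-law": r_squared(log_k, log_p),  # ln P = a + b ln k
    }

# Synthetic, exactly exponential values over a narrow range of k.
ks = [1, 2, 3, 4, 5]
pccdf = [math.exp(-0.8 * k) for k in ks]
for model, r2 in trend_line_fits(ks, pccdf).items():
    print(model, round(r2, 3))
```

Even though the synthetic data are exponential by construction, the power-law and linear models also reach high R² values over such a narrow range of k, which is the caution the paper raises for small networks.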

removal (Albert et al. 2000; Albert & Barabási 2002). Lusseau (2003) analysed the P(k) distribution of bottlenose dolphins’ social network and found that it does not fit a Poisson distribution, but rather decays following a power-law for k  7. Although studying the shape of P(k) may be interesting, it is important to stress that the features of the P(k) distributions mentioned above only emerge when networks are large. Because the maximum degree k that a node can have is n  1, it is only in large networks that k is free to increase by many orders of magnitude. As Croft et al. (2008) have noted, animal groups are often small, and thus, the range of k is restricted. As a result, it is unclear whether strong claims can be drawn from fitting, for instance, a power-law to the P(k) distribution of small animal groups (Croft et al. 2008). The results of my simulations support Croft et al.’s (2008) view. For any given sample size, different competing models can be fitted to the P(k) distribution with high statistical significance. As many have done before, I analysed the ‘complementary cumulative distribution function’ (ccdf), Pccdf(k), which specifies the probability that a node has a degree greater than or equal to k. Assuming that the cumulative distribution function is Pcum(k), Pccdf(k) ¼ 1  Pcum(k). Analysing the Pccdf(k) has the advantage of smoothing the statistical errors present in the tail of small samples (Boccaletti et al. 2006). The Pccdf(k) distributions of the small-world and random networks (Fig. 6) were similar to each other and resembled the distributions reported for different freshwater fish populations (Croft et al. 2008, page 81) and bottlenose dolphins (Lusseau 2003). When sample size e/E is small, most nodes have a degree k of 1 and the Pccdf(k) appears to decrease seemingly exponentially, with a sharp drop at k ¼ 2. As more edges are added to the sample, more connected nodes start to appear. 
The overall shapes of the distributions also change and appear to be S-shaped on a linearelinear scale, with a plateau in the very small k region. This is not surprising since the cumulative distribution of a unimodal distribution such as a Poisson distribution is also S-shaped. The plateau in the small k region occurs because nodes with very small k are rare in the network. Thus, as edge sample size changes, so does the range of observed ks. The upper limit of the observed k range is the first to be affected by increases in e/E. For example, when sampling from the random K ¼ 10 network, as e/E goes from 0.1 to 0.5, the range of observed ks goes from 1e5 to 1e14 (Fig. 6b). Eventually, the lower limit of the observed k range also increases, as very small k values

for all networks converge and increase with sample size at a rate of 1. For example, with 50% of the edges sampled, mean degree K is underestimated by half.

Mean Normalized Node Betweenness Centrality (B)

The betweenness centrality of a node is the number of shortest paths between other pairs of nodes that pass through that node. The normalized betweenness of a node is its betweenness centrality divided by the maximal betweenness centrality possible in the network. Node betweenness centrality B is used to identify the key individuals of a group in terms of information transmission and social cohesion (Newman 2003; Lusseau & Newman 2004). The impact of sample size on B (Fig. 5c) resembles that of path length L (Fig. 3c). The average betweenness centrality is underestimated for very small sample sizes (this is especially true for the small-world networks) because most nodes are connected on average to only one node (k = 1). The sharp increase in average B around e/E = 0.2–0.4 occurs because, as mentioned previously, networks built from such samples tend to be composed of multiple chain-like components. Individuals in the centre of such chains lie on the shortest paths between the other nodes on the chain. As more edges are sampled, components become bigger and more web-like than chain-like. When this occurs, the overestimation of the mean betweenness centrality B decreases rapidly.

P(k) Distribution

The degree distribution P(k) specifies the probability that a node is connected to k other nodes, and networks are often classified according to the shape of this distribution (Albert et al. 2000; Amaral et al. 2000; Albert & Barabási 2002; Boccaletti et al. 2006). For instance, both random and small-world networks are characterized by a Poisson P(k) distribution that peaks at K, the mean of the distribution, and decays exponentially for k ≫ K (Bollobás 1985; Albert & Barabási 2002).
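For reference, the Poisson form mentioned above can be written explicitly. This is the standard Poisson probability mass function with mean K, given here as a reminder rather than as a result of the simulations:

```latex
P(k) = \frac{K^{k} e^{-K}}{k!}, \qquad k = 0, 1, 2, \dots
```

Its mean and variance both equal K, so most of the probability mass lies within a few multiples of \sqrt{K} of the mean.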
On the other hand, scale-free networks have a degree distribution with a tail decaying in a power-law fashion (Barabási & Albert 1999). The reason why the shape of P(k) has drawn so much attention is that it is thought to be informative of many properties of a network, such as the processes by which it has formed (Barabási & Albert 1999; Amaral et al. 2000; Albert & Barabási 2002) or its resistance to attack by node or edge

Table 3 Significance (P) and R2 values of different trend lines fitted to the Pccdf(k) of the random K = 10 network when e/E = 0.1, 0.5 and 1

Random, K = 10
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.035  0.817     0.000  0.876     0.000  0.889
Exponential    0.002  0.974     0.000  0.936     0.000  0.756
S-shaped       0.08   0.693     0.023  0.36      0.093  0.188
Power-law      0.023  0.862     0.000  0.695     0.004  0.463

Tail of random, K = 10
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.091  0.826     0.005  0.696     0.000  0.966
Exponential    0.01   0.931     0.000  0.989     0.001  0.91
S-shaped       0.074  0.857     0.000  0.938     0.000  0.707
Power-law      0.01   0.98      0.000  0.979     0.000  0.82
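To illustrate how several trend lines can all fit a short Pccdf(k) series well, here is a minimal sketch of three of the four fits. Several things are assumptions rather than the paper's procedure: each model is fitted by ordinary least squares after linearization, R2 is computed on the transformed scale, the S-shaped model is omitted, and the function names and data points are hypothetical.

```python
import math

def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def fit_models(ks, pccdf):
    """R^2 of three trend lines, each fitted after linearization."""
    log_k = [math.log(k) for k in ks]
    log_p = [math.log(p) for p in pccdf]
    return {
        "linear": r_squared(ks, pccdf),        # P = a + b*k
        "exponential": r_squared(ks, log_p),   # ln P = ln a + b*k
        "power-law": r_squared(log_k, log_p),  # ln P = ln a + b*ln k
    }

# Hypothetical tail of a small network: only five distinct degree values.
ks = [6, 7, 8, 9, 10]
pccdf = [0.40, 0.26, 0.15, 0.08, 0.04]
fits = fit_models(ks, pccdf)
for name, r2 in fits.items():
    print(name, round(r2, 3))
```

With these made-up values all three models give an R2 above 0.9, because five points spanning less than one order of magnitude in k cannot discriminate between the curve shapes.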


Table 4 Significance (P) and R2 values of different trend lines fitted to the Pccdf(k) of the small-world K = 6 network when e/E = 0.1, 0.5 and 1

Small-world, K = 6
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.105  0.8       0.000  0.953     0.013  0.67
Exponential    0.017  0.966     0.006  0.809     0.032  0.563
S-shaped       0.151  0.721     0.149  0.368     0.307  0.172
Power-law      0.072  0.862     0.043  0.592     0.126  0.344

Tail of small-world, K = 6
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.256  0.847     0.068  0.869     0.02   0.874
Exponential    0.082  0.984     0.041  0.92      0.045  0.787
S-shaped       0.203  0.902     0.097  0.816     0.119  0.609
Power-law      0.144  0.95      0.066  0.872     0.077  0.701

cease to be observed. Still sampling from the random K = 10 network, the range of observed ks goes from 1–14 to 6–16, when e/E = 0.5 and 1.0, respectively. The Pccdf(k) of the scale-free network looks substantially different (Fig. 6e). The distribution drops sharply before k = K, the mean of the distribution, and a long tail is created by a few highly connected nodes. The upper limit of observed k is the most affected by an increase in edge sample size. The range of observed ks is 1–20, 1–67 and 4–124 for e/E = 0.1, 0.5 and 1.0, respectively. The scale-free network contains one superhub (k = 124) and four other well-connected nodes with k = 25. The other nodes are connected to four, five or six nodes. The results suggest that, even though their actual degree will be underestimated, the superhubs in the network can be accurately identified as such, even when edge sample size is small, because the probability that k ≥ 6 is greater than zero even when e/E = 0.1. While the Pccdf(k) distributions differ visually both between the five networks and between sample sizes, they can often be fitted with statistical significance by the same models. Tables 2–6 present the significance and R2 values of various trend lines fitted to both the entire distributions and their tails, with each tail defined as the values greater than the mean of the distribution (k > K). The models tested were linear, exponential, S-shaped (logistic) and power-law curves. Most data sets can be fitted significantly by at least two models, and sometimes by all four. Interestingly, a power-law model fitted the random network tails very well (Tables 2, 3). I also used one-sample Kolmogorov–Smirnov Z tests to compare each degree frequency distribution P(k) to a Poisson distribution. The degree frequency distribution of the scale-free networks reconstructed from all sample sizes differed significantly from a Poisson distribution (Table 7), which is in line with scale-free graph theory.
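The comparison with a Poisson distribution can be sketched as follows. This is an illustration under stated assumptions, not the test as run for Table 7: the Poisson mean is taken to be the sample mean degree, Z is computed as sqrt(n) times the largest gap D between the two cumulative distributions, no P value is computed, and the function names and degree sequence are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(k + 1))

def ks_z_vs_poisson(degrees):
    """Return (D, Z): D is the largest gap between the empirical CDF of the
    degrees and a Poisson CDF with the same mean; Z = sqrt(n) * D."""
    n = len(degrees)
    mean = sum(degrees) / n
    d = 0.0
    for k in range(max(degrees) + 1):
        empirical = sum(1 for x in degrees if x <= k) / n
        d = max(d, abs(empirical - poisson_cdf(k, mean)))
    return d, math.sqrt(n) * d

# Hypothetical hub-dominated sample: 20 nodes with k = 1 and one hub with k = 20.
degrees = [1] * 20 + [20]
d, z = ks_z_vs_poisson(degrees)
print(round(d, 3), round(z, 3))
```

A single hub pulls the mean well above the typical degree, so the empirical distribution departs strongly from a Poisson curve with that mean.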
However, all the distributions obtained from sampling the small-world networks and most of the random ones also differed significantly from a Poisson distribution. The only exceptions were the frequency distributions tallied from the e/E = 0.4 and e/E = 1.0 samples taken from the random K = 10 network. These results suggest that it is possible to incorrectly reject the small-world model (false negative), because its degree distribution is not Poisson distributed, and to claim that a network is scale-free (false positive), because a power-law trend line can be fitted to its tail. Overall, simulations suggest that the P(k) distribution of animal


social networks, because they are typically small, can be fitted by different competing models. We should be careful before making strong scientific claims based solely on a poor fit between the P(k) distribution and a Poisson model, or on the fit of a power-law line to a portion of the P(k) distribution.

DISCUSSION

I have described the impact of edge sample size on many different measures used to analyse animal social networks. I have shown that when an incomplete sample is analysed, network properties can be considerably misrepresented. However, because edge sample size affects the various network measures in different ways, it is not possible to define a minimal threshold at which all network measures become reliable. For example, a network constructed from 30% of its actual edges, while displaying a mean betweenness centrality value that is very close to the true mean betweenness centrality, will underestimate the mean clustering coefficient by about 70%, underestimate the density of the network by 60%, overestimate the number of components 10-fold, and overestimate the mean path length by 200%. Finally, I have also shown that because animal networks are typically small (the networks simulated here contained 125 individuals), simple curve fitting to the degree distribution P(k) is a perilous exercise because many different models can be fitted significantly to the same data. Most degree distributions, including those of the small-world and random networks, differed statistically from a Poisson distribution. If the observations of individuals’ associations or interactions are accumulated over a long period, the effect of increasing e/E could be mistaken for a temporal change in the social network, or vice versa. The edge sample size effect could also muddy the comparison of network structure between populations and between species (see Croft et al. 2008, Chapter 7, for a good discussion of these issues).
For example, it could potentially make two populations look similar while in fact they have a different network structure, or make them appear different while in reality they are not. Overall, the results suggest that strong claims about animal social networks should not be made unless considerable effort has been made to collect an exhaustive number of association/interaction data points. While this study is primarily aimed at biologists studying animal social networks, the conclusions drawn

Table 5 Significance (P) and R2 values of different trend lines fitted to the Pccdf(k) of the small-world K = 10 network when e/E = 0.1, 0.5 and 1

Small-world, K = 10
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.105  0.8       0.000  0.953     0.013  0.67
Exponential    0.017  0.966     0.006  0.809     0.032  0.563
S-shaped       0.151  0.721     0.149  0.368     0.307  0.172
Power-law      0.072  0.862     0.043  0.592     0.126  0.344

Tail of small-world, K = 10
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.256  0.847     0.068  0.869     –      –
Exponential    0.082  0.984     0.041  0.92      –      –
S-shaped       0.203  0.902     0.097  0.816     –      –
Power-law      0.144  0.95      0.066  0.872     –      –


Table 6 Significance (P) and R2 values of different trend lines fitted to the Pccdf(k) of the scale-free network when e/E = 0.1, 0.5 and 1

Scale-free
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.022  0.257     0.000  0.191     0.000  0.163
Exponential    0.000  0.836     0.000  0.573     0.000  0.39
S-shaped       0.000  0.761     0.000  0.692     0.000  0.589
Power-law      0.000  0.913     0.000  0.865     0.000  0.766

Tail of scale-free
               e/E = 0.1        e/E = 0.5        e/E = 1
               P      R2        P      R2        P      R2
Linear         0.012  0.32      0.000  0.186     0.000  0.391
Exponential    0.000  0.868     0.000  0.61      0.000  0.391
S-shaped       0.000  0.754     0.000  0.844     0.000  0.712
Power-law      0.000  0.863     0.000  0.786     0.000  0.621

Table 7 Kolmogorov–Smirnov Z scores and significances (P) when testing for a difference between the P(k) frequency distribution and a Poisson distribution

        Random, K = 6    Random, K = 10   Small-world, K = 6   Small-world, K = 10   Scale-free
e/E     Z      P         Z      P         Z      P             Z      P              Z      P
0.1     6.33   0.000     6.05   0.000     6.93   0.000         6.38   0.000          4.96   0.000
0.2     5.67   0.000     3.48   0.000     6.00   0.000         3.35   0.000          4.53   0.000
0.3     4.06   0.000     1.6    0.012     4.34   0.000         1.92   0.001          5.46   0.000
0.4     2.53   0.000     0.89   0.410     2.83   0.000         2.34   0.000          6.53   0.000
0.5     1.73   0.005     1.53   0.018     3.26   0.000         2.71   0.000          6.3    0.000
0.6     1.75   0.004     2.32   0.000     4.39   0.000         3.80   0.000          8.16   0.000
0.7     2.40   0.000     1.88   0.002     4.60   0.000         4.04   0.000          9.96   0.000
0.8     2.31   0.000     2.15   0.000     5.90   0.000         5.88   0.000          11.17  0.000
0.9     2.82   0.000     2.52   0.000     7.79   0.000         7.17   0.000          14.99  0.000
1.0     1.064  0.000     0.97   0.345     3.09   0.000         3.28   0.000          5.88   0.000

here are also relevant for those of us who are interested in the study of the social networks of small human groups and organizations. For example, network analysis methods have been applied to human prehistoric social networks (e.g. Peregrine 1991; Jenkins 2001; Mizoguchi 2009). Given that information about prehistoric social networks is likely to be highly fragmentary (and uneven between time periods and regions), this paper indicates that reconstructions of past social networks are likely to be greatly distorted.

It is difficult to provide definite guidelines for estimating the proportion of edges that have been sampled from a given study population. Plotting the different network measures as a function of observation time can give some indication of the relative size of the edge sample. As discussed previously, a fragmented network containing many isolated components can indicate poor edge sampling. If, as new edges are added to the network, the number of nodes keeps increasing, then the proportion of edges sampled could be small, since in all the networks studied here, the number of nodes as a function of edge sample size reached an asymptote around e/E = 0.5 (Fig. 3a). The mean clustering coefficient C (Fig. 3d) and the mean degree K (Fig. 5b), both of which increase linearly with edge sample size, could be used in the same way to indicate the completeness of the edge sample.

The problem of sampling networks is an important one because animal social networks are often built by aggregation of multiple independent interaction or association observations. It is, however, a complex issue. In this study a number of simplifying assumptions have been made. I assumed that networks did not evolve over time: all the edges were taken from the same fixed networks. There were no errors in the sampling process: when an association was observed (i.e. an edge was sampled), it truly existed in the network. Finally, edges were also undirected and unweighted, and all had the same probability of being sampled. The assumption of unweighted edges is perhaps the most important one to recognize, since empirical evidence suggests that animal social networks are typically weighted. Weighted networks will affect the results presented

here because, for any given amount of research effort, strong relationships will be observed more frequently than weak ones, inducing a systematic and nonrandom bias in the sampling process. The next step in furthering our understanding of the impact of sample size on the reconstruction of networks will require relaxing these assumptions.

Acknowledgments

I thank Joan B. Silk, P. Jeffrey Brantingham, Irene Godoy, Bailey House, Michelle Kline, Christina Larson, Ming Xue and Hannah Reiss as well as the two anonymous referees for their constructive comments on the manuscript. This work was supported by the Social Sciences and Humanities Research Council of Canada (7522006-2301).

References

Albert, R. & Barabási, A.-L. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47.
Albert, R., Jeong, H. & Barabási, A.-L. 2000. Error and attack tolerance of complex networks. Nature, 406, 378–382.
Amaral, L. A. N., Scala, A., Barthélémy, M. & Stanley, H. E. 2000. Classes of small-world networks. Proceedings of the National Academy of Sciences, U.S.A., 97, 11149–11152.
Barabási, A.-L. & Albert, R. 1999. Emergence of scaling in random networks. Science, 286, 509–512.
Barrat, A. & Weigt, M. 2000. On the properties of small-world network models. European Physical Journal B, Condensed Matter and Complex Systems, 13, 547–560.
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D. 2006. Complex networks: structure and dynamics. Physics Reports, 424, 175–308.
Bollobás, B. 1985. Random Graphs. London: Academic Press.
Borgatti, S. P. 2002. Netdraw: Graph Visualization Software. Harvard, Massachusetts: Analytic Technologies.
Borgatti, S. P., Everett, M. G. & Freeman, L. C. 2002. UCINET for Windows: Software for Social Network Analysis. Harvard, Massachusetts: Analytic Technologies.
Borgatti, S. P., Carley, K. M. & Krackhardt, D. 2006. On the robustness of centrality measures under conditions of imperfect data. Social Networks, 28, 124–136.
Callaway, D. S., Newman, M. E. J., Strogatz, S. H. & Watts, D. J. 2000. Network robustness and fragility: percolation on random graphs. Physical Review Letters, 85, 5468.
Costenbader, E. & Valente, T. W. 2003. The stability of centrality measures when networks are sampled. Social Networks, 25, 283–307.
Croft, D. P., Krause, J. & James, R. 2004. Social networks in the guppy (Poecilia reticulata). Proceedings of the Royal Society B, Supplement, 271, S516–S519.
Croft, D. P., James, R., Ward, A. J. W., Botham, M. S., Mawdsley, D. & Krause, J. 2005. Assortative interactions and social networks in fish. Oecologia, 143, 211–219.
Croft, D. P., James, R. & Krause, J. 2008. Exploring Animal Social Networks. Princeton, New Jersey: Princeton University Press.
Ebel, H., Mielsch, L.-I. & Bornholdt, S. 2002. Scale-free topology of e-mail networks. Physical Review E, 66, 035103.
Erdős, P. & Rényi, A. 1959. On random graphs. Publicationes Mathematicae (Debrecen), 6, 290–297.
Faloutsos, M., Faloutsos, P. & Faloutsos, C. 1999. On power-law relationships of the Internet topology. In: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication. Cambridge, Massachusetts: Association for Computing Machinery.
Flack, J. C., Girvan, M., de Waal, F. B. M. & Krakauer, D. C. 2006. Policing stabilizes construction of social niches in primates. Nature, 439, 426–429.
Fowler, J. H. & Christakis, N. A. 2008. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the Framingham heart study. British Medical Journal, 337, a2338.
Franks, D., James, R., Noble, J. & Ruxton, G. 2009. A foundation for developing a methodology for social network sampling. Behavioral Ecology and Sociobiology, 63, 1079–1088.
Godfrey, S., Bull, C., James, R. & Murray, K. 2009. Network structure and parasite transmission in a group living lizard, the gidgee skink, Egernia stokesii. Behavioral Ecology and Sociobiology, 63, 1045–1056.
Henzi, S., Lusseau, D., Weingrill, T., van Schaik, C. & Barrett, L. 2009. Cyclicity in the structure of female baboon social networks. Behavioral Ecology and Sociobiology, 63, 1015–1021.
Humphries, M. D., Gurney, K. & Prescott, T. J. 2006. The brainstem reticular formation is a small-world, not scale-free, network. Proceedings of the Royal Society B, 273, 503–511.
James, R., Croft, D. & Krause, J. 2009. Potential banana skins in animal social network analysis. Behavioral Ecology and Sociobiology, 63, 989–997.
Jenkins, D. 2001. A network analysis of Inka roads, administrative centers, and storage facilities. Ethnohistory, 48, 655–687.
Kossinets, G. 2006. Effects of missing data in social networks. Social Networks, 28, 247–268.
Krause, J., Croft, D. & James, R. 2007. Social network theory in the behavioural sciences: potential applications. Behavioral Ecology and Sociobiology, 62, 15–27.
Latora, V. & Marchiori, M. 2001. Efficient behavior of small-world networks. Physical Review Letters, 87, 198701.
Lee, S. H., Kim, P.-J. & Jeong, H. 2006. Statistical properties of sampled networks. Physical Review E, 73, 016102.
Lusseau, D. 2003. The emergent properties of a dolphin social network. Proceedings of the Royal Society B, Supplement, 270, S186–S188.
Lusseau, D. & Newman, M. E. J. 2004. Identifying the role that animals play in their social networks. Proceedings of the Royal Society B, Supplement, 271, S477–S481.
Lusseau, D., Wilson, B., Hammond, P. S., Grellier, K., Durban, J. W., Parsons, K. M., Barton, T. R. & Thompson, P. M. 2006. Quantifying the influence of sociality on population structure in bottlenose dolphins. Journal of Animal Ecology, 75, 14–24.
Lusseau, D., Whitehead, H. & Gero, S. 2008. Incorporating uncertainty into the study of animal social networks. Animal Behaviour, 75, 1809–1815.
McDonald, D. B. 2007. Predicting fate from early connectivity in a social network. Proceedings of the National Academy of Sciences, U.S.A., 104, 10910–10914.
McDonald, D. B. 2009. Young-boy networks without kin clusters in a lek-mating manakin. Behavioral Ecology and Sociobiology, 63, 1029–1034.
Marsden, P. V. 1990. Network data and measurement. Annual Review of Sociology, 16, 435–463.
Mizoguchi, K. 2009. Nodes and edges: a network approach to hierarchisation and state formation in Japan. Journal of Anthropological Archaeology, 28, 14–26.
Moore, C. & Newman, M. E. J. 2000. Epidemics and percolation in small-world networks. Physical Review E, 61, 5678.
Naug, D. 2009. Structure and resilience of the social network in an insect colony as a function of colony size. Behavioral Ecology and Sociobiology, 63, 1023–1028.
Newman, M. E. J. 2003. The structure and function of complex networks. SIAM Review, 45, 167–256.
Newman, M. E. J. & Watts, D. J. 1999. Scaling and percolation in the small-world network model. Physical Review E, 60, 7332.
Pastor-Satorras, R. & Vespignani, A. 2001. Epidemic spreading in scale-free networks. Physical Review Letters, 86, 3200.
Peregrine, P. 1991. A graph-theoretic approach to the evolution of Cahokia. American Antiquity, 56, 66–75.
Ramos-Fernández, G., Boyer, D., Aureli, F. & Vick, L. 2009. Association networks in spider monkeys (Ateles geoffroyi). Behavioral Ecology and Sociobiology, 63, 999–1013.
Ravasz, E. & Barabási, A.-L. 2003. Hierarchical organization in complex networks. Physical Review E, 67, 026112.
Schneeberger, A., Mercer, C. H., Gregson, S. A. J., Ferguson, N. M., Nyamukapa, C. A., Anderson, R. M., Johnson, A. M. & Garnett, G. P. 2004. Scale-free networks and sexually transmitted diseases: a description of observed patterns of sexual contacts in Britain and Zimbabwe. Sexually Transmitted Diseases, 31, 380–387.
Stumpf, M. P. H., Wiuf, C. & May, R. M. 2005. Subnets of scale-free networks are not scale-free: sampling properties of networks. Proceedings of the National Academy of Sciences, U.S.A., 102, 4221–4224.
Watts, D. J. 1999. Small Worlds: the Dynamics of Networks Between Order and Randomness. Princeton, New Jersey: Princeton University Press.
Watts, D. J. 2004. The ‘new’ science of networks. Annual Review of Sociology, 30, 243–270.
Watts, D. J. & Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442.
Wey, T., Blumstein, D. T., Shen, W. & Jordán, F. 2008. Social network analysis of animal behaviour: a promising tool for the study of sociality. Animal Behaviour, 75, 333–344.
Whitehead, H. 2008a. Analysing Animal Societies: Quantitative Methods for Vertebrate Social Analysis. Chicago: University of Chicago Press.
Whitehead, H. 2008b. Precision and power in the analysis of social structure using associations. Animal Behaviour, 75, 1093–1099.
Willinger, W., Alderson, D. & Doyle, J. C. 2009. Mathematics and the internet: a source of enormous confusion and great potential. Notices of the American Mathematical Society, 56, 586–599.
Yoon, S., Lee, S. H., Yook, S. & Kim, Y. 2007. Statistical properties of sampled networks by random walks. Physical Review E, 75, 046114.


APPENDIX

Table A1 Mean and standard deviation (mean ± SD) of number of nodes (n) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    56.2 ± 2.62      80 ± 2.62         58.5 ± 2.46           83.1 ± 3.18            57.2 ± 1.62
0.2    88.2 ± 2.57      110.7 ± 2.83      92.8 ± 2.82           111.2 ± 2.49           90 ± 2
0.3    108.1 ± 2.28     120.4 ± 1.9       110.9 ± 3.38          122.1 ± 0.88           108.8 ± 2.2
0.4    115.5 ± 2.8      123.3 ± 0.95      119.8 ± 1.81          124.5 ± 0.97           117.4 ± 3.34
0.5    120.8 ± 1.81     124.5 ± 0.97      123.3 ± 1.25          124.9 ± 0.32           123 ± 1.49
0.6    123.4 ± 1.17     125 ± 0           124.4 ± 0.7           125 ± 0                124.3 ± 0.67
0.7    124.2 ± 0.79     125 ± 0           125 ± 0               125 ± 0                124.7 ± 0.48
0.8    124.7 ± 0.48     125 ± 0           125 ± 0               125 ± 0                124.9 ± 0.32
0.9    124.9 ± 0.33     125 ± 0           125 ± 0               125 ± 0                125 ± 0
1.0    125 ± 0          125 ± 0           125 ± 0               125 ± 0                125 ± 0
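The kind of calculation summarized in Table A1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code behind the table: the network is a uniform random graph with a fixed number of edges, edges are sampled without replacement, the sample standard deviation is used, and the function names and seed are hypothetical.

```python
import itertools
import random
import statistics

def random_network(n_nodes, n_edges, rng):
    """Pick n_edges distinct undirected edges uniformly at random."""
    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    return rng.sample(all_pairs, n_edges)

def observed_nodes(edges, fraction, rng):
    """Sample a fraction of the edges without replacement and count
    the nodes that appear in at least one sampled edge."""
    sample = rng.sample(edges, round(fraction * len(edges)))
    return len({node for edge in sample for node in edge})

rng = random.Random(1)                 # fixed seed, for repeatability only
edges = random_network(125, 375, rng)  # 125 nodes, mean degree 2 * 375 / 125 = 6

summary = {}
for fraction in (0.1, 0.5, 1.0):
    counts = [observed_nodes(edges, fraction, rng) for _ in range(10)]
    summary[fraction] = (statistics.mean(counts), statistics.stdev(counts))

for fraction, (mean, sd) in summary.items():
    print(fraction, mean, round(sd, 2))
```

Nodes that happen to have no sampled edge are simply absent from the reconstructed network, which is why the node count rises towards its full value only as e/E increases.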

Table A2 Mean and standard deviation (mean ± SD) of number of components (G) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    19.2 ± 2.62      17.5 ± 2.42       21.2 ± 2.35           21.7 ± 2.59            13.5 ± 2.07
0.2    15.8 ± 2.62      3.1 ± 1.2         20.1 ± 2.77           5.8 ± 1.62             10.6 ± 1.84
0.3    5.6 ± 1.71       1.3 ± 0.48        9.8 ± 3.46            1.7 ± 0.68             5.1 ± 1.66
0.4    2.2 ± 0.92       1.1 ± 0.32        3.5 ± 1.35            1 ± 0                  2.1 ± 0.745
0.5    1.2 ± 0.42       1 ± 0             1.5 ± 0.71            1 ± 0                  1.3 ± 0.48
0.6    1.2 ± 0          1 ± 0             1 ± 0                 1 ± 0                  1.2 ± 0.42
0.7    1 ± 0            1 ± 0             1 ± 0                 1 ± 0                  1 ± 0
0.8    1 ± 0            1 ± 0             1 ± 0                 1 ± 0                  1 ± 0
0.9    1 ± 0            1 ± 0             1 ± 0                 1 ± 0                  1 ± 0
1.0    1 ± 0            1 ± 0             1 ± 0                 1 ± 0                  1 ± 0

Table A3 Mean and standard deviation (mean ± SD) of path length (L) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    1.81 ± 0.23      3.65 ± 0.74       1.65 ± 0.24           2.79 ± 0.56            2.54 ± 0.26
0.2    5.05 ± 1.54      6.57 ± 0.49       3.48 ± 1.03           7.55 ± 0.98            3.83 ± 0.57
0.3    7.46 ± 0.57      4.41 ± 0.10       7.44 ± 2.44           5.81 ± 0.45            3.74 ± 0.22
0.4    5.29 ± 0.24      3.57 ± 0.05       7.72 ± 0.77           4.53 ± 0.14            3.35 ± 0.17
0.5    4.37 ± 0.11      3.15 ± 0.02       6.44 ± 0.39           4.06 ± 0.22            3.1 ± 0.15
0.6    3.87 ± 0.04      2.87 ± 0.01       5.58 ± 0.22           3.58 ± 0.06            2.69 ± 0.10
0.7    3.48 ± 0.03      2.66 ± 0.01       4.92 ± 0.19           3.34 ± 0.04            2.48 ± 0.07
0.8    3.22 ± 0.02      2.52 ± 0          4.51 ± 0.12           3.15 ± 0.05            2.27 ± 0.06
0.9    3.01 ± 0.02      2.41 ± 0          4.15 ± 0.05           3.02 ± 0.05            2.12 ± 0.03
1.0    2.86 ± 0         2.32 ± 0          3.95 ± 0              2.87 ± 0               1.94 ± 0

Table A4 Mean and standard deviation (mean ± SD) of clustering coefficient (C) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    0 ± 0            0.02 ± 0.03       0.11 ± 0.1            0.06 ± 0.07            0.12 ± 0.09
0.2    0 ± 0            0.02 ± 0.02       0.05 ± 0.06           0.09 ± 0.05            0.19 ± 0.05
0.3    0.01 ± 0.01      0.01 ± 0.01       0.13 ± 0.06           0.14 ± 0.03            0.29 ± 0.04
0.4    0.02 ± 0.01      0.03 ± 0.01       0.15 ± 0.03           0.20 ± 0.02            0.36 ± 0.05
0.5    0.02 ± 0.02      0.03 ± 0.01       0.23 ± 0.04           0.24 ± 0.02            0.47 ± 0.04
0.6    0.03 ± 0.01      0.04 ± 0.01       0.26 ± 0.03           0.29 ± 0.01            0.56 ± 0.03
0.7    0.03 ± 0.01      0.04 ± 0          0.32 ± 0.02           0.35 ± 0.01            0.69 ± 0.02
0.8    0.03 ± 0.01      0.05 ± 0          0.37 ± 0.02           0.39 ± 0.02            0.78 ± 0.02
0.9    0.04 ± 0         0.06 ± 0          0.4 ± 0.01            0.44 ± 0.02            0.87 ± 0.01
1.0    0.04 ± 0         0.07 ± 0          0.45 ± 0              0.49 ± 0               0.96 ± 0


Table A5 Mean and standard deviation (mean ± SD) of network density (r) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    0.02 ± 0         0.02 ± 0          0.02 ± 0              0.02 ± 0               0.03 ± 0
0.2    0.02 ± 0         0.02 ± 0          0.02 ± 0              0.02 ± 0               0.02 ± 0
0.3    0.02 ± 0         0.03 ± 0          0.02 ± 0              0.03 ± 0               0.02 ± 0
0.4    0.02 ± 0         0.03 ± 0          0.02 ± 0              0.03 ± 0               0.03 ± 0
0.5    0.03 ± 0         0.04 ± 0          0.02 ± 0              0.04 ± 0               0.03 ± 0
0.6    0.03 ± 0         0.05 ± 0          0.03 ± 0              0.05 ± 0               0.04 ± 0
0.7    0.03 ± 0         0.06 ± 0          0.03 ± 0              0.06 ± 0               0.04 ± 0
0.8    0.04 ± 0         0.06 ± 0          0.04 ± 0              0.06 ± 0               0.05 ± 0
0.9    0.04 ± 0         0.07 ± 0          0.04 ± 0              0.07 ± 0               0.05 ± 0
1.0    0.05 ± 0         0.08 ± 0          0.05 ± 0              0.08 ± 0               0.06 ± 0

Table A6 Mean and standard deviation (mean ± SD) of mean degree (K) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    1.32 ± 0.58      1.59 ± 0.82       1.3 ± 0.54            1.51 ± 0.72            1.57 ± 1.81
0.2    1.66 ± 0.85      2.26 ± 1.2        1.62 ± 0.77           2.25 ± 1.15            2 ± 2.63
0.3    2.09 ± 1.07      3.12 ± 1.45       2.04 ± 0.92           3.08 ± 1.39            2.48 ± 3.66
0.4    2.6 ± 1.31       4.06 ± 1.78       2.5 ± 1.15            4.02 ± 1.54            3.07 ± 4.7
0.5    3.11 ± 1.46      5.03 ± 1.94       3.05 ± 1.23           4.91 ± 1.65            3.67 ± 5.7
0.6    3.65 ± 1.61      6 ± 2.01          3.62 ± 1.25           6 ± 1.62               4.36 ± 6.86
0.7    4.24 ± 1.67      7.01 ± 2.19       4.21 ± 1.22           6.96 ± 1.6             5.07 ± 7.93
0.8    4.81 ± 1.72      8 ± 2.22          4.80 ± 1.15           8 ± 1.46               5.78 ± 9.06
0.9    5.41 ± 1.75      9.01 ± 2.24       5.41 ± 0.94           8.91 ± 1.27            6.5 ± 9.92
1.0    6 ± 1.81         10.05 ± 2.24      6 ± 0.65              10 ± 0.91              7.22 ± 11.09
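As a rough consistency check on the mean degree and density values in the appendix, both quantities follow directly from the number of sampled edges e and the number of observed nodes n. The worked numbers below are a sketch under assumptions not stated in the tables: the random K = 6 network is taken to have E = NK/2 = 375 edges (N = 125), about 37 of which are sampled at e/E = 0.1, and the mean number of observed nodes (56.2) is used in place of the per-sample values.

```latex
\begin{align*}
K_{\text{obs}} &= \frac{2e}{n}, &
r_{\text{obs}} &= \frac{2e}{n(n-1)} \\[4pt]
K_{\text{obs}} &\approx \frac{2 \times 37}{56.2} \approx 1.32, &
r_{\text{obs}} &\approx \frac{2 \times 37}{56.2 \times 55.2} \approx 0.02
\end{align*}
```

Because nodes without any sampled edge are not part of the reconstructed network, n is much smaller than N when e/E is small, so the observed mean degree at e/E = 0.1 is above 0.1 × 6 = 0.6. With the full edge set, 2 × 375 / 125 recovers K = 6.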

Table A7 Mean and standard deviation (mean ± SD) of mean node betweenness centrality (B) calculated over the 10 independent samples of size e/E

e/E    Random, K = 6    Random, K = 10    Small-world, K = 6    Small-world, K = 10    Scale-free
0.1    0.07 ± 0.18      0.49 ± 1.16       0.05 ± 0.15           0.14 ± 0.31            0.56 ± 2.38
0.2    1.1 ± 2.91       4.58 ± 5.75       0.25 ± 0.67           3.66 ± 5.7             1.68 ± 5.45
0.3    4.6 ± 6.12       2.85 ± 2.68       2.42 ± 5.28           3.91 ± 4.44            2.06 ± 7.38
0.4    3.59 ± 3.92      2.12 ± 1.75       5 ± 6.08              2.88 ± 2.72            1.93 ± 8.08
0.5    2.81 ± 2.61      1.75 ± 1.33       4.33 ± 4.73           2.51 ± 2.35            1.72 ± 8.08
0.6    2.35 ± 2.06      1.52 ± 1.01       3.74 ± 3.36           2.1 ± 1.82             1.37 ± 7.96
0.7    2.04 ± 1.55      1.35 ± 0.85       3.16 ± 2.86           1.92 ± 1.56            1.2 ± 7.97
0.8    1.81 ± 1.27      1.24 ± 0.68       2.85 ± 2.39           1.75 ± 1.37            1.04 ± 8.01
0.9    1.64 ± 1.03      1.15 ± 0.58       2.56 ± 1.93           1.64 ± 1.22            0.91 ± 8
1.0    1.51 ± 0         1.08 ± 0          2.4 ± 0               1.52 ± 0               0.77 ± 0
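To help interpret why mean betweenness peaks at intermediate sample sizes, the sketch below compares a five-node chain with the same five nodes closed into a ring by one extra edge. It is an illustration only: betweenness is computed with Brandes' algorithm and normalized by (n − 1)(n − 2)/2, the maximum for an undirected network of n nodes, which may differ from the scaling used for Table A7; the function names and toy graphs are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph
    (Brandes' algorithm). adj maps each node to a list of neighbours."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {v: b / 2 for v, b in score.items()}

def normalized(adj):
    """Divide by (n - 1)(n - 2) / 2, the largest value possible for n nodes."""
    n = len(adj)
    top = (n - 1) * (n - 2) / 2
    return {v: b / top for v, b in betweenness(adj).items()}

chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

mean_chain = sum(normalized(chain).values()) / 5
mean_ring = sum(normalized(ring).values()) / 5
print(round(mean_chain, 3), round(mean_ring, 3))
```

In the chain, the central node lies on every shortest path that crosses it, so the mean is high; adding the single edge that closes the ring opens alternative routes and the mean drops, in line with the chain-like versus web-like contrast described in the text.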