Effect of microsatellite outliers on the genetic structure of eight Italian goat breeds

Effect of microsatellite outliers on the genetic structure of eight Italian goat breeds

Small Ruminant Research 103 (2012) 99–107 Contents lists available at ScienceDirect Small Ruminant Research journal homepage: www.elsevier.com/locat...

669KB Sizes 48 Downloads 60 Views

Small Ruminant Research 103 (2012) 99–107

Contents lists available at ScienceDirect

Small Ruminant Research journal homepage: www.elsevier.com/locate/smallrumres

Effect of microsatellite outliers on the genetic structure of eight Italian goat breeds Riccardo Negrini a , Mariasilvia D’Andrea b,∗ , Paola Crepaldi c , Licia Colli a , Letizia Nicoloso c , Anna Maria Guastella d , Tiziana Sechi e , Salvatore Bordonaro d , Paolo Ajmone-Marsan a , Fabio Pilla b , the Econogene Consortium1 a b c d e

Università Cattolica del Sacro Cuore, Istituto di Zootecnica, Via Emilia Parmense 84, 29122 Piacenza, Italy Università degli Studi del Molise, Dipartimento Scienze Animali Vegetali e dell’Ambiente, Via De Sanctis, 86100 Campobasso, Italy Università degli Studi di Milano, Dipartimento di Scienze Animali, Sezione di Zootecnica Agraria, Via Celoria 2, 20133 Milano, Italy Università degli Studi di Catania, DACPA, sez. di Scienze delle Produzioni Animali, Via Valdisavoia 5, 95125 Catania, Italy Research Unit, Genetics and Biotechnology, DiRPA – AGRIS, Sardegna, Olmedo (SS), Italy

a r t i c l e

i n f o

Article history: Received 16 March 2011 Received in revised form 12 July 2011 Accepted 10 August 2011 Available online 6 September 2011 Keywords: Goat Selection signature Outlier Microsatellites

a b s t r a c t The identification of outlier loci in multilocus datasets is a critical step for reliably inferring population demographic histories and for detecting signatures of adaptive selection, since the presence of just a few of them can lead to a biased estimation of genetic parameters. Here, we show the effect of four outliers out of 30 microsatellites on phylogeography and relationship between eight native Italian goat breeds. The presence of the outliers influenced the relationships between individuals as estimated by factorial correspondence analysis, and between breeds as estimated by Reynolds’ distances. The outliers also affected the neighbour-joining (NJ) tree topology and the computation of the putative genetic barriers separating the populations. The complete dataset supported the existence of two significant barriers: the first isolates Orobica from the surrounding northern Alpine populations and the second separates Girgentana from the southern and Sarda breeds. After removing the outliers, a single and different barrier was significant. It separated the Sarda breed from the continental populations. Our results give new insights into the genetic structure of several native goat breeds across Italy and provide a clear example of the importance of testing for the occurrence of non-neutral loci in population genetics studies. Four outlier loci demonstrated the ability to blur the effect of geography, hiding the natural genetic barrier that separates the Sarda breed from the continental populations, while highlighting the peculiar demographic and selection history of Orobica and Girgentana. © 2011 Elsevier B.V. All rights reserved.

1. Introduction The ability of goats to adapt to very different environments is the result of the high genetic variability harboured within the 1159 breeds presently recognised worldwide, of which almost 50 are native to Italy (FAO, 2009). With a

∗ Corresponding author. Tel.: +39 0874 404818; fax: +39 0874 404855. E-mail address: [email protected] (M. D’Andrea). 1 www.econogene.eu. 0921-4488/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.smallrumres.2011.08.006

few exceptions, Italian goat breeds are mainly reared in the Alps and in Mediterranean scrub landscapes. In both environments the scarce employment of intensive farming and selective breeding have, on the one hand, limited the diffusion of cosmopolitan breeds while on the other hand they have challenged the viability of several breeds, dwarfing their population size. Knowledge about the extent and distribution of genetic variability is of pivotal importance for the design of proper conservation strategies, which also require detailed information about the processes and patterns of

100

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

gene flow and the role of the landscape in structuring populations. The identification of genetic spatial patterns needs an exact record of sampling locations, while population genetics requires surveys of a sufficient number of informative loci across the genome. These requirements were met within the framework of the ECONOGENE project (www.econogene.eu). A large number of goat samples were collected, the GPS coordinates were recorded, and the samples were genotyped with the complete panel of microsatellites, as suggested by the FAO (FAO, 2009). The combined analysis of geographical and molecular data using appropriate statistical tools allows population diversity and evolution to be better understood, especially at fine spatial scales (Manel et al., 2003). However, as pointed out by Luikart et al. (2003), in order to reliably infer demographic histories and phylogenetic relationships and to identify adaptive molecular variations, the identification of non-neutral loci is also extremely important. Although they might be under selection and thus informative from a completely different perspectives, the inclusion of a few outliers among neutral loci can significantly bias the estimates of population genetics indices. For example Landry et al. (2002) found that the presence of a single locus within a panel of 17 microsatellite markers showing an extremely high inter-population variance in allele lengths seriously affected phylogeny branching patterns and caused an overestimation of phylogenetic tree stability. The effect of outlier loci was also assessed in the estimates of FST in animal populations using seven allozyme datasets from Peromyscus mice. The populations of this small terrestrial mammal apparently displayed high levels of gene flow and adaptive geographic differences in physiology, morphology and behaviour. However, the Fdist test of Beaumont and Nichols (1996) detected several outlier loci occurring in five out of seven datasets and showed that the removal of outliers substantially reduced FST values by 18–62% (Storz and Nachman, 2003). The aim of this paper was to investigate the effect of outlier loci on the description of patterns of genetic variation in the main eight native Italian breeds of goat reared in Alpine and Mediterranean areas. The purpose of this was to disentangle different microevolutionary processes, such as gene flow, genetic drift and selection, and to correctly identify spatial patterns of genetic diversity, genetic barriers and population structure. Understanding the underlying patterns of gene flow and local adaptation is, in fact, crucial for proper management of the genetic diversity of endangered populations (Manel et al., 2003). 2. Materials and methods 2.1. Sample collection and DNA extraction A total of 246 biological samples were collected from eight native Italian breeds: Camosciata delle Alpi (ChItCAM = 30); Valdostana (ChItVAL = 31), Bionda dell’Adamello (ChItBIO = 31); Orobica (ChItORO = 31); Grigia Molisana (ChItGRI = 31), Girgentana (ChItGIR = 30); Argentata dell’Etna (ChItARG = 31); Sarda (ChItSAR = 31). For each breed, a maximum of three unrelated goats were sampled from at least 10 different flocks. The geographical location of each sampling site was recorded using a GPS

device. When this was not possible, the location of farms was derived from local or regional maps. The DNA was extracted from 200 ␮l of frozen whole blood using a commercial kit and following the manufacturer’s instructions, and checked for quality and yields. 2.2. Microsatellite genotyping Information on the FAO panel of 30 microsatellite markers, including primer sequences, annealing temperatures, number of alleles, range of allele sizes and expected and observed heterozygosities in 45 European and Middle Eastern goat breeds (including those analysed in this study) ˜ et al. (2006). The chromosome position of the is reported in Canón microsatellite markers studied is reported at http://www.econogene. and http://www.fao.org/ag/againfo/ eu/list of msmarkers.html programmes/en/genetics/angrvent-docs.html (Document Code CGRFA/ WG-AnGR-6/10/Inf.7). 2.3. Statistical analysis The possible occurrence of genotyping errors due to nonamplified alleles (null alleles), large allele dropout, stutter peaks or potential typographic errors was checked with MICRO–CHECKER v.2.3.2 (Van Oosterhout et al., 2004; freely downloaded from http://www.microchecker.hull.ac.uk/). The detection of outlier loci was performed following the FST outlier approach developed by Beaumont and Nichols (1996). This method evaluates the relationship between FST and He (expected heterozygosity) by calculating the expected distribution of Wright’s coefficient FST vs. He under an island model of migration with neutral markers. This distribution is then used to identify outlier loci that have excessively high or low FST values compared to neutral expectations. Such outlier loci are candidates for being under directional selection. The initial average FST value is often non-neutral because the still undetected loci under selection are included in the computation. Indeed, a preliminary candidate subset of loci under selection was identified and removed by running Dfdist once with 50,000 iterations and a threshold value of 95%, and the FST was estimated again. This procedure lowers the bias on the estimation of the mean neutral FST by removing the loci with the most extreme behaviour. A second and final run of Dfdist, considering all markers and 100,000 iterations, was then run using the unbiased FST mean estimate. The loci above the 95% threshold in this second run were treated as putatively subject to selection in the subsequent analyses. The required computations were carried out using the LOSITAN software online facilities (Antao et al., 2008). All exploratory statistics were performed using the FSTAT v. 2.9.3 program (Goudet, 2001) unless otherwise stated. The expected and unbiased observed heterozygosity (He and Ho, respectively) for each locus were estimated from the allele frequencies observed within a population and corrected for small population size. The computation of allelic richness was based on the rarefaction method devised by Hurlbert (1971). Weir and Cockerham’s (1984) inbreeding coefficient, FIS , was calculated per locus and across loci per breed and its significance was tested by P-values calculated over 2300 randomisations and adjusted for multiple testing using a Bonferroni correction (Rice, 1989). Deviations from Hardy–Weinberg equilibrium in each population and at each locus, as well as over all loci, were estimated from FIS . The pair-wise differentiation between populations was quantified by estimating Reynolds’ distances (Reynolds et al., 1983) from the observed allele frequencies and assigning a frequency of 0.01 to missing data to avoid null values. The statistical significance of the Reynolds’ distances was tested by permutations and applying Bonferroni corrections. Two Reynolds’ distance matrices were calculated to consider either the complete set of loci or to exclude those putatively under selection. The correlation between matrices was estimated by the Mantel test (Mantel, 1967). Neighbour-joining (NJ) trees were built using Mesquite program for both matrices (Maddison and Maddison, 2010; Saitou and Nei, 1987) and the topology compared through the Robinson-Foulds metric, also known as “bipartition distance” (Robinson and Foulds, 1981). Genetic relationships between individuals were investigated using the factorial correspondence analysis (FCA) implemented in the Genetix software using both the entire dataset or excluding the outliers (Belkhir et al., 1996) The projection of the contributions onto the same set of factorial axes enables two-dimensional graphs to be drawn. The comparison

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

101

Table 1 Allele richness per locus, per population and overall. Locus

ChItCAM

ChItVAL

ChItBIO

ChItORO

ChItGMO

ChItGIR

ChItARG

ChItSAR

Overall

CSRD247 DRBP1 ILSTS011 ILSTS087 INRA023 INRA063 InraBern172 MAF65 McM527 OarAE54 OarFCB20 OarFCB48 SPS113 SRCRSP09 SRCRSP23 SRCRSP3 MAF70 SRCRSP5 ILSTS005 ETH10 TGLA53 SRCRSP8 BM6444 P19 MAF209 SRCRSP7 ILSTS029 SRCRSP15 TCRVB6 INRABERN185 Overall

8.0 8.9 5.0 6.0 7.9 6.0 5.0 8.9 6.0 5.9 5.0 8.0 6.0 9.0 11.9 5.0 7.0 6.9 2.0 3.0 3.0 4.0 8.9 6.9 3.0 3.0 3.9 5.0 9.9 3.9 6.1

3.9 5.9 5.9 4.9 7.7 4.9 5.9 6.9 5.9 4.0 6.0 7.9 5.9 4.9 7.9 4.9 7.8 6.9 4.0 3.0 5.0 6.9 14.0 8.7 3.0 4.0 2.9 5.9 7.9 3.0 5.9

6.0 6.9 6.0 5.9 6.0 4.9 8.0 9.9 7.9 6.0 4.9 6.0 7.9 8.9 12.7 3.9 6.9 6.9 3.9 2.9 5.0 8.9 11.7 8.0 3.0 4.0 3.0 5.9 7.9 5.9 6.5

5.9 7.0 5.9 6.0 4.0 4.9 4.9 5.9 4.9 4.9 4.0 3.9 4.9 5.9 9.0 3.0 4.9 5.9 4.0 2.9 3.0 3.9 14.8 5.9 2.0 4.0 3.0 4.9 8.9 2.0 5.2

7.0 8.9 8.0 6.9 8.9 3.0 7.9 10.9 7.0 8.9 5.9 7.9 9.9 8.9 10.7 6.0 5.0 6.0 4.9 3.0 9.8 8.8 14.7 9.0 3.0 4.0 5.8 6.0 9.9 4.9 7.4

6.0 8.0 6.0 6.0 6.0 4.0 8.0 7.9 5.0 5.0 5.0 7.0 7.9 5.0 6.0 5.0 7.0 7.0 4.0 3.0 4.0 4.9 11.0 6.0 3.0 4.0 4.0 4.0 7.0 4.0 5.7

9.9 7.9 7.0 9.8 10.9 4.0 8.9 12.9 7.9 10.9 6.9 8.9 9.9 8.9 12.9 7.0 6.9 6.9 6.0 4.0 5.9 9.9 16.7 10.9 3.0 4.0 3.9 6.9 8.9 4.9 8.1

7.9 6.9 7.9 7.8 8.0 4.0 6.9 14.6 7.9 7.8 7.8 8.9 7.9 6.9 9.0 6.0 8.9 8.9 6.0 3.0 6.9 7.8 11.0 7.9 3.0 3.0 4.9 6.0 7.0 7.8 7.3

6.8 7.6 6.5 6.7 7.4 4.5 6.9 9.7 6.6 6.7 5.7 7.3 7.5 7.3 10.0 5.1 6.8 6.9 4.4 3.1 5.3 6.9 12.9 7.9 2.9 3.7 3.9 5.6 8.4 4.6 6.5

between the two different FCA configurations was done by calculating the pair-wise Euclidian distances between individuals based on the first two principal axes and by testing the matrices by the Mantel test (Mantel, 1967). The population structure was further investigated using STRUCTURE 2.2 (Pritchard and Wen, 2004). Ten independent runs for K = 1–8 with a burn-in step of 50,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations were carried out to estimate the most likely number of partitions, independent of breed affiliation. The program was run using the admixture model and considering the correlated allele frequencies (Falush et al., 2003). Interpreting the results of recursive clustering procedures is not always straightforward; therefore, in order to identify the effective number of genetic clusters, i.e. the value of K best, highlighting the actual population structure, we followed: (i) the approach by Evanno et al. (2005) using the modal value of the distribution of the K estimator as an indicator of the strength of the genetic structure detected by the software; (ii) the permutation Greedy algorithm implemented in CLUMPP (Jakobsson and Rosenberg, 2007), considering the permuted position of a randomly chosen cluster (among eight) in a randomly chosen run (among 100). Bottleneck software version 1.2 (http://www.ensam.inra.fr/URLB/ bottleneck/bottleneck.html) was used to test for evidence of a reduction in the population size by computing, for each population and for each locus, the distribution of the values of heterozygosity expected from the observed number of alleles (K), given the sample size (n) and under the assumption of mutation-drift equilibrium. This distribution was obtained by simulating the coalescent process of a set of genes under the stepwise mutation model (SMM), suitable for microsatellites (Cornuet and Luikart, 1997). Monmonier’s (1973) maximum difference algorithm for identifying boundaries, as implemented in the Barrier software (Manni et al., 2004), was used to estimate three potential barriers considering three geographical areas: northern Italy, southern Italy and the island of Sardinia, whereas in the map only the barriers deemed to be significant were drawn. The

significance of the barriers was tested by 100 bootstrap replicates. None of the external or internal edges of the triangulation were deleted.

3. Results 3.1. Genetic variability All microsatellites were successfully amplified in all samples and were polymorphic in all populations. A total of 334 alleles were detected. The number of alleles per locus ranged from 31 for BM6444 to 3 for MAF209. Allele richness ranged from 5.179 in Orobica to 8.120 in the Argentata dell’Etna breeds while across loci it ranged from 12.865 in BM6444 to 2.869 in MAF 209 (Table 1). Four out of thirty loci, ILST29, SRCRSP8, SPS113 and P19 (Fig. 1), had high FST (genetic divergence) values (>0.1) compared to the mean observed FST value (0.07) and to the range of “neutral” FST values (0.0 < FST < 0.63) estimated using computer simulations. Either directional selection or genotyping errors can generate outlier effects; therefore, loci showing such behaviour were removed from the following estimation of population genetics parameters. Excluding the four outlier loci (Dfdist Prob < 0.05), Ho ranged from 0.55 to 0.68 (mean Ho = 0.61, standard deviation = 0.028), with the lowest value found in the Orobica and the highest in the Argentata dell’Etna breeds. No significant difference between breeds was detected at the level of heterozygosity (Mann–Whitney U = 44.6, P = 0.23) (Table 2).

102

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

Fig. 1. Outlier detection using the Lositan workbench at the 95% threshold. Red = confidence area for candidate loci under positive selection; grey = confidence area for neutral loci; yellow = confidence area for loci potentially under balancing selection. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

P19 and BM6444 significantly deviated from the Hardy–Weinberg expectation in all populations, with the exception of Valdostana and Grigia Molisana. The expected frequency of null alleles in BM6444, estimated according to Van Oosterhout et al. (2004), was not negligible (combined probability for all classes of allele size: P ≤ 0.001; estimated null allele frequency: 0.28). All breeds retained a significant positive FIS value, ranging from 0.08 in Argentata dell’Etna to 0.13 in Girgentana as a result of slightly lower than expected heterozygosities. The global FST estimated following Weir and Cockerham was 0.062 (95% confidence interval 0.054–0.070). Twenty-six out of thirty loci displayed private alleles in one or more breeds, usually occurring at a low frequency, with the exception of the alleles 142 at locus MAF70 and 196 at locus P19 (already identified as outliers) that were found with high frequency in Orobica (0.19 and 0.47, respectively). The allele 196 at locus P19 was also found in a single individual from Camosciata delle Alpi.

for the whole dataset ranged from 0.20 to 0.40, whereas those calculated excluding the four outliers ranged from 0.10 to 0.32 (Grigia Molisana/Argentata dell’Etna and Orobica/Girgentana, respectively, in both matrices). All pair-wise comparisons were significant after Bonferroni correction (P ≤ 0.05). Conversely, the outliers affect the relationships between individuals that were estimated using factorial correspondence analysis (Supplementary Material, Fig. S1). The Mantel test between the pair-wise Euclidian distance matrices based on the two principal axes was not significant (Mantel test 0.91; P ≤ 0.08). The first two axes together explained 43.4% of the total variance when the entire dataset was used and 40.6% when the outliers were excluded. In both configurations, Orobica individuals formed a clearly distinct group, whereas the other breeds clustered together and displayed an approximate south to north geographical cline. The southern (Argentata dell’Etna, Girgentana) and Sarda breeds had negative coordinates along the first axis and positive along the second, whereas the north-western alpine breeds (Valdostana, Camosciata delle Alpi) had positive coordinates along the first axis and negative along the second. Grigia Molisana (central Italy) and Bionda dell’Adamello, (central Eastern Alps) had an intermediate position around the origin of the two axes. The exclusion of the outlier loci impacted the coordinates of Camosciata delle Alpi and Valdostana

3.2. Genetic structure and genetic relationships Reynolds’ distances estimated with and without outliers are reported in Table 3. The exclusion of outlier loci decreased the pair-wise distances by about 40%, but did not bias the overall relationships between the breeds (Mantel test 0.94; P ≤ 0.01). Pair-wise distance values calculated

Table 2 Sample size, expected heterozygosity (He), observed heterozygosity (Ho) and mean number of alleles per breed and relative standard deviations. Population

Sample size

He (±S.D.)

ChItCAM ChItVAL ChItBIO ChItORO ChItGMO ChItGIR ChItARG ChItSAR

30 31 31 31 31 30 31 31

0.66 0.65 0.68 0.59 0.72 0.66 0.74 0.70

± ± ± ± ± ± ± ±

(0.03) (0.03) (0.03) (0.04) (0.03) (0.03) (0.02) (0.03)

Ho (±S.D.)

No alleles (±S.D.)

0.59 (±0.02) 0.58 (±0.02) 0.61 (±0.02) 0.53 (±0.02) 0.63 (±0.02) 0.57 (±0.02) 0.68 (±0.02) 0.61 (±0.02)

6.13 (±2.43) 5.97 (±2.30) 6.60 (±2.44) 5.23 (±2.50) 7.47 (±2.74) 5.70 (±1.80) 8.20 (±3.16) 7.37 (±2.40)

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

103

Table 3 Reynolds’ distances excluding (above the diagonal) or including (below the diagonal) non-neutral loci.

ChItCAM ChItVAL ChItBIO ChItORO ChItGMO ChItGIR ChItARG ChItSAR

ChItCAM

ChItVAL

ChItBIO

ChItORO

ChItGMO

ChItGIR

ChItARG

ChItSAR

0 0.31 0.23 0.32 0.27 0.33 0.27 0.28

0.20 0 0.25 0.35 0.30 0.36 0.28 0.31

0.12 0.13 0 0.30 0.23 0.32 0.23 0.27

0.19 0.19 0.15 0 0.34 0.40 0.34 0.35

0.16 0.21 0.11 0.21 0 0.29 0.20 0.23

0.22 0.28 0.22 0.32 0.20 0 0.26 0.30

0.15 0.17 0.11 0.23 0.10 0.16 0 0.20

0.16 0.21 0.15 0.21 0.14 0.21 0.11 0

individuals along the second axis, so that their relative positions switched. In our dataset, the STRUCTURE software, used as an unsupervised method, was insensitive to outliers. Independently of the set of markers (either 30 or 26 loci), the best description of the genetic structure was

obtained at K = 2 (likelihood Ln values declined starting from K = 3, two tailed t test P < 0.034). In detail, K = 2 indicated a geographic subdivision between the Alpine breeds and the other breeds, with Bionda dell’Adamello and Grigia Molisana showing some degree of admixture;

Fig. 2. Population partitioning suggested by STRUCTURE software. From top to bottom: K = 2, K = 3, K = 4, K = 5, K = 6, K = 8.

104

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

4. Discussion 4.1. Outlier loci

Fig. 3. Neighbour-joining trees based on Reynolds’ distances: (a) whole dataset and (b) with outliers removed. Data was constructed using Mesquite program and bootstrapped with 1000 replicates (bootstrap values shown on nodes).

K = 3 evidenced Orobica as a single cluster; K = 4 split Valdostana. At K = 8, the second highest likelihood value, each breed was allocated to a single cluster (Fig. 2). The outliers heavily affected the NJ tree topology (Fig. 3), with the “bipartition distances” between the two trees being equal to 8. Using all 30 loci, it was found that the Sarda breed lies within the cluster of central/southern Italian breeds, Camosciata delle Alpi and Orobica did not cluster with the other northern breeds and the Orobica breed appears as the most distinct breed. Following the removal of outliers, two clusters were formed. One grouped central/southern Italian breeds and the second grouped northern alpine breeds. Sarda remained excluded from the two clusters, replacing Orobica in the most outlier position. Tree topologies showed an average bootstrap value of 59% across the six nodes including the outliers, and an average bootstrap value of 57.2% removed them. A strong effect of outlier loci was also observed when computing the putative genetic discontinuities using Monmonier’s algorithm combining Reynolds’ genetic distances with geographical distances. The complete dataset (30 microsatellites) supported the existence of two significant barriers: the first isolates Orobica from the geographically close Alpine populations and the second separates Girgentana from the southern and Sarda breeds. After removing the outliers, a single and different barrier was found to be significant. It separates the Sarda breed from the continental populations (Supplementary Material, Fig. S2).

In livestock, divergent selection towards productive traits is a significant force inducing the genetic differentiation of populations via two main mechanisms: (i) by acting on specific loci and those physically linked to them (Barton, 2000), and (ii) by enhancing breed reproductive isolation that causes barriers to gene flow and boosts the stochastic effects of genetic drift (Rundle and Nosil, 2005; Nosil et al., 2009). Therefore, individuals from different populations living in different environments or managed differently can show genome-wide variations due to genetic drift and variations in a few key sites in their genome that respond to selection or adaptation to different local conditions. These latter sites can be identified by tracing markers that show outlier behaviour. Therefore, outlier markers may mark genomic regions, influencing economic or adaptive traits of particular interest for further characterisation. But detecting outliers is also important for excluding them from datasets used in molecular phylogenetic studies, since the latter require the use of inherited neutral loci, according to Mendelian laws, in Hardy–Weinberg proportions. A number of examples exist in which the deleterious effect of even a few outlier loci has biased estimates of population genetic parameters, leading to inaccurate evolutionary histories and phylogeographical reconstructions (Allendorf and Seeb, 2000; Luikart et al., 2003; Bonin et al., 2006; Allendorf and Luikart, 2007). A number of methods and software programs exist for outlier detection (e.g. Beaumont and Balding, 2004; Beaumont, 2005; Vasemagi et al., 2005; Hedrick, 2006; Bonin et al., 2007; Foll and Gaggiotti, 2008; Riebler et al., 2008). In this paper we used the method developed by Beaumont and Nichols (1996), based on a concept introduced about 40 years ago by Lewontin and Krakauer (1973). The rationale is that loci influenced by directional selection or subject to balancing selection have FST coefficients that are statistically different from those expected under the neutral theory. Applying this method, we identified four microsatellites (ILST29, SRCRSP8, SP113 and P19) that showed genetic differentiation across the breeds that was significantly larger than that of the neutral loci. None of the outliers revealed the presence of a null allele and their departure from neutrality can most likely be attributed to hitchhiking effects with nearby genes subjected to positive directional selection. Indeed, the markers SP113 and P19 have already been associated with nematode resistance (Obexer-Ruff et al., 2003). Particular attention should therefore be devoted to the use of these markers in goat population genetics investigations, including or excluding them depending on the research goal.

4.2. Effect of outliers in breed description In order to investigate the effect of the four outlier loci on the genetic structure and relationships between the breeds, we either included or excluded them in FCA, NJ

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

tree construction, Bayesian clustering analysis and genetic barrier computations. The relative position of the individuals and the variance explained by the first two axes of the multivariate FCA plot were affected by the outliers (Supplementary Material, Fig. S1). In both graphs, Orobica individuals formed a separate cluster, confirming the distinctiveness of this breed as already observed by Crepaldi et al. (2001) using AFLP markers and by Pariset et al. (2006) using a panel of SNPs, whereas the relative positions of Camosciata and Valdostana were marker-set dependent. The other breeds were grouped within a single dispersed cluster in which a rough south to north geographical cline was detectable. The Bayesian clustering method was insensitive to the outliers and only the bar plot resulting from the analysis of all markers is shown (Fig. 2). The first partition segregated the Alpine from the Mediterranean breeds, reinforcing at a finer geographic scale the strong genetic differentiation between northern and southern goat genomes already ˜ observed in a larger dataset (Canón et al., 2006; Pariset et al., 2006). Orobica clustered as a single breed at K = 3, while at the modal value for the distribution of K (K = 4) Valdostana was also separated. The Alpine breeds and Girgentana were differentiated at K = 5, suggesting the existence of a stronger genetic structure in northern breeds compared to the rest of Italy, probably as a result of stronger demographic effects (e.g. bottlenecks) in the Alps compared to elsewhere. At K = 8 all breeds clustered separately. The clustering of Orobica was consistently independent of the presence of outliers, indicating that the originality of the breeds was pervasive in its genome and not solely linked to private alleles. Such a widespread effect points to a different origin rather than divergent selection or occasional introgression from wild or feral goats. Conversely, the presence of outliers severely affected the number and position of putative genetic barriers separating the breeds analysed. Considering the whole dataset, two significant genetic discontinuities were identified: one separating the Orobica from the neighbouring Alpine breeds and the second separating Girgentana from Mediterranean populations. On removing the outliers, a single geographical barrier separating the Sarda breed, which is native to Sardinia, was found to be significant. Therefore, the outliers seemed to blur the real geographical pattern separating the Sarda breeds from the continent (Sechi et al., 2007) by stressing the effect of demography (Orobica and Girgentana showed results positive for recent bottlenecks) or environmental adaptation (loci SP113 and P19) present in Orobica were found linked to nematode resistance (Obexer-Ruff et al., 2003), or by suggesting an original population history (Orobica showed a surprisingly original genetic makeup). The neighbour joining topology including the outliers confirmed the originality of Orobica and, in general, reflected the population dynamics. Girgentana showed a long-branch effect that was probably due to reduced genetic diversity, whereas Sarda was grouped with the southern breeds. Removal of the outliers reshaped the branch topology; however, in accordance with the scenario revealed by Barrier program taking the outliers out, two major clusters

105

linked to geography were well supported by bootstrapping and Sarda was clearly separated.

4.3. Unbiased description of genetic parameters Only genome-wide effects carry reliable information about the demography and genetic history of populations; thus, following Allendorf and Luikart (2007), the outlier loci were removed from the dataset prior to the estimation of demographic parameters and statistics. Allele richness was estimated based on a minimum sample size of 29 diploid individuals. The lowest and highest values were found in Orobica and Girgentana and in Argentata dell’Etna and Grigia Molisana, respectively. The low population size of Girgentana and its phenotypic homogeneity (Marletta et al., 2004) might be effects of the reduced genetic variability that was observed, while, as expected, Grigia Molisana, a primitive breed only recently subjected to a breeding program, showed a high level of genetic variation. With the exception of Girgentana, a fair decline in allelic richness in the northern breeds with respect to the southern breeds was observed, marking different levels of gene flow or differences in breed management and intensity of selection. Conversely, the expected and observed heterozygosities did not differ among the breeds. Bottlenecks are typically characterised by severe drops in allelic richness and a concomitant and slight diminution of heterozygosity, especially if population size rebounds rapidly (Nei et al., 1975; Ellstrand and Elam, 1993). Indeed, our results indicate that some of the analysed breeds might have experienced a severe contraction in numbers followed by a sudden recovery in the recent past. Two out of eight breeds, namely Orobica and Girgentana, exhibited significant heterozygosity losses (Wilcoxon’s one tailed test, P < 0.001) and therefore, according to Luikart and Cornuet (1998), evidence of a recent bottleneck. This observation was also supported by the large number of rare alleles and the general excess of homozygotes for most allele size classes found in BM6444. Having excluded large allelic dropout and scoring errors due to stuttering, these effects can be attributed to past demographic events such as a sudden population expansion after a bottleneck. For the conservation of genetic resources, identifying populations that have experienced a size reduction in the past is crucial because bottlenecks can increase demographic stochasticity, rates of inbreeding, loss of genetic variation, and fixation of deleterious alleles, thereby reducing the evolutionary potential and increasing the probability of population extinction (Lande, 1988, 1994; Mills and Smouse, 1994; Nei, 2005). Identifying recently bottlenecked populations is even more important, because such populations may not have recovered their former population size yet and, therefore, may have a higher risk of extinction. However, the more recent a bottleneck, the greater the probability that its deleterious effects can be mitigated by appropriate management procedures able to minimize loss of heterozygosity and quantitative genetic variation.

106

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107

The survey of 26 neutral microsatellite markers in eight Italian goat breeds revealed that 94% of the variation is retained within rather than among breeds. This value is ˜ only moderately lower than that found in goats by Canón et al. (2006) in a larger dataset comprising breeds of several European and Middle-Eastern countries, and is in agreement with the GST value (0.073) previously observed in southern Italian breeds (Iamartino et al., 2005). The weak genetic differentiation among goat populations has been already observed by Luikart et al. (2001), looking at the worldwide dispersal of the predominant mtDNA haplogroup A that was documented for African and Asian goats (Meng-Hua et al., 2002). Conversely, a stronger genetic differentiation (GST = 0.17) and a reduced gene flow ascribed to the landscape orography has been reported in Swiss breeds (Saitbekova et al., 1999; Glowatzki-Mullis et al., 2008). 5. Conclusions In conclusion, our data give new insights into the genetic structure of several native goat breeds across Italy and confirm the importance of testing for the occurrence of non-neutral loci because, although fairly rare within datasets, their impact on population genetics parameters can be significant. Ignoring the effect of natural selection can lead to wrong estimates of the demographic history and relationships between breeds. Therefore, separating the effect of neutral drift and adaptive genetic differentiation is a necessary preliminary step in most analyses. In our dataset, four outlier loci affected the genetic distances between individuals as well as the tree topologies and demonstrated the ability to blur the effect of geography, hiding the natural genetic barrier that separates the Sarda breed from the continental populations, while stressing the probable effect of selection and highlighting the peculiar demographic history of Orobica and Girgentana. The Orobica breed was found to be highly original with respect to the national genetic background and the Girgentana breed was found to have recently recovered after a severe bottleneck. The unbiased understanding of the structure of genetic variation at both population and individual levels is crucial, not only for improving the knowledge on genetic diversity of farm animals, but also for properly managing threatened and endangered populations. Acknowledgments The authors acknowledge ASSONAPA (Italian Sheep and Goats Breeder Association) for assisting with the biological sampling. This work was funded by the EU Econogene project (contract number QLK5–CT2001–02461). Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.smallrumres.2011.08.006.

References Allendorf, F.W., Seeb, L.W., 2000. Concordance of genetic divergence among sockeye salmon populations at allozyme, nuclear DNA, and mitochondrial DNA markers. Evolution 54, 640–651. Allendorf, F.W., Luikart, G., 2007. Conservation and the Genetics of Populations. Blackwell Publishing, Oxford, UK. Antao, T., Lopes, A., Lopes, R.J., Beja-Pereira, A., Luikart, G., 2008. LOSITAN: a workbench to detect molecular adaptation based on a FST -outlier method. BMC Bioinformatics 28, 323. Barton, N.H., 2000. Genetic hitchhiking. Philos. Trans. Roy. Soc. Lond. B: Biol. Sci. 355, 1553–1562. Beaumont, M.A., Nichols, R.A., 1996. Evaluating loci for use in the genetic analysis of population structure. Proc. Roy. Soc. Lond. B: Biol. Character. Royal Society (Great Britain) 263, 1619–1626. Beaumont, M.A., Balding, D.J., 2004. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13, 969–980. Beaumont, M.A., 2005. Adaptation and speciation: what can FST tell us? Trends Ecol. Evol. 20, 435–440. Belkhir, K., Borsa, P., Chikhi, L., Raufaste, N., Bonhomme, F., 1996–2004. GENETIX logiciel sous Windows TM pour la génétique des populations. Laboratoire Gènome, Populations, Interactions, CNRS, Université de Montpellier II, Montpellier, France. Bonin, A., Taberlet, P., Miaud, C., Pompanon, F., 2006. Explorative genome scan to detect candidate loci for adaptation along a gradient of altitude in the common frog (Rana temporaria). Mol. Biol. Evol. 23, 773–783. Bonin, A., Nicole, F., Pompanon, F., Miaud, C., Taberlet, P., 2007. Population adaptive index: a new method to help measure intraspecific genetic diversity and prioritize populations for conservation. Conserv. Biol. 21, 697–708. ˜ Canón, J., García, D., García-Atance, M.A., Obexer-Ruff, G., Lenstra, J.A., Ajmone-Marsan, P., Dunner, S., ECONOGENE Consortium, 2006. Geographical partitioning of goat diversity in Europe and the Middle East. Anim. Genet. 37, 327–334. Cornuet, J.M., Luikart, G., 1997. Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data. Genetics 144, 2001–2014. Crepaldi, P., Negrini, R., Milanesi, E., Gorni, C., Cicogna, M., AjmoneMarsan, P., 2001. Diversity in five goat populations of the Lombardy Alps: comparison of estimates obtained from morphometric traits and molecular markers. J. Anim. Breed. Genet. 118, 173–180. Ellstrand, N.C., Elam, D.R., 1993. Population genetic consequences of small population size: implications for plant conservation. Annu. Rev. Ecol. Syst. 24, 217–242. Evanno, G., Regnaut, S., Goudet, J., 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620. Falush, D., Stephens, M., Pritchard, J.K., 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587. FAO (2009). http://dad.fao.org/ (accessed on 08.03.2010). 6th Session of the Intergovernmental Technical Working Group on Animal Genetic Resources for Food an Agriculture (2010). Draft guidelines on molecular fenetic characterization. http://www.fao.org/ag/ againfo/programmes/en/genetics/angrvent-docs.html. Foll, M., Gaggiotti, O., 2008. A genome scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977–993. Glowatzki-Mullis, M.L., Muntwyler, J., Bäumle, E., Gaillard, C., 2008. Genetic diversity measures of Swiss goat breeds as decision-making support for conservation policy. Small Ruminant Res. 74, 202–211. Goudet, J., 2001. FSTAT, A Program to Estimate and Test Gene Diversities and Fixation Indices (Version 2. 9. 3). Lausanne University, Lausanne, Switzerland. Hedrick, P.W., 2006. Genetic polymorphism in heterogeneous environments: the age of genomics. Annu. Rev. Ecol. Syst. 37, 67–93. Hurlbert, S.H., 1971. The nonconcept of species diversity: a critique and alternative parameters. Ecology 52, 577–586. Iamartino, D., Bruzzone, A., Lanza, A., Blasi, M., Pilla, F., 2005. Genetic diversity of Southern Italian goat populations assessed by microsatellite markers. Small Ruminant Res. 57, 249–255. Jakobsson, M., Rosenberg, N.A., 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801– 1806. Lande, R., 1988. Genetics and demography in biological conservation. Science 241, 1455–1459. Lande, R., 1994. Risk of population extinction from fixation of new deleterious mutations. Evolution 48, 1460–1469.

R. Negrini et al. / Small Ruminant Research 103 (2012) 99–107 Landry, P.A., Koskinen, M.T., Primmer, C.R., 2002. Deriving evolutionary relationships among populations using microsatellites and (␦-␮)2 : all loci are equal, but some are more equal than others. Genetics 161, 1339–1347. Lewontin, R., Krakauer, J., 1973. Distribution of gene frequency as a test of theory of selective neutrality of polymorphisms. Genetics 74, 175–195. Luikart, G., Cornuet, J.M., 1998. Empirical evaluation of a test for identifying recently bottlenecked populations from allele frequency data. Conserv. Biol. 12, 228–237. Luikart, G., Gielly, L., Excoffier, L., Vigne, J.D., Bouvet, J., Taberlet, P., 2001. Multiple maternal origins and weak phylogeographic structure in domestic goats. PNAS 98, 5927–5932. Luikart, G., England, P.R., Tallmon, D., Jordan, S., Taberlet, P., 2003. The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet. 4, 981–994. Maddison, W.P., Maddison, D.R., 2010. Mesquite: A Modular System for Evolutionary Analysis. Version 2. 73., http://mesquiteproject.org. Manel, S., Schwartz, M., Luikart, G., Taberlet, P., 2003. Landscape genetics: combining landscape ecology and population genetics. Trends Ecol. Evolut. 18, 189–197. Manni, F., Guérard, E., Heyer, E., 2004. Geographic patterns of (genetic, morphologic, linguistic) variation: how barriers can be detected by “Monmonier’s algorithm”. Hum. Biol. 76, 173–190. Mantel, N.A., 1967. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220. Marletta, D., Bordonaro, S., Guastella, A.M., D’Urso, G., 2004. Genetic polymorphism at CSN1S2 locus in two endangered Sicilian goat breeds. J. Anim. Breed. Genet. 121, 52–56. Meng-Hua, L., Shu-Hong, Z., Ci, B., Hai-Sheng, W., Hong, W., Bong, L., Mei, Y., Bin, F., Shin-Lin, C., Meng-Jin, Z., Shi-Jun, L., Tong-An, X., Kui, L., 2002. Genetic relationship among twelve indigenous Chinese goat populations based on microsatellite analysis. Genet. Select. Evol. 34, 729–744. Mills, S.L., Smouse, P.E., 1994. Demographic consequences of inbreeding in remnant populations. Am. Nat. 144, 412–431. Monmonier, M., 1973. Maximum-difference barriers: an alternative numerical regionalization method. Geogr. Anal. 3, 245–261. Nei, M., Maryama, T., Chakraborty, R., 1975. The bottleneck effect and genetic variability in populations. Evolution 29, 1–10. Nei, M., 2005. Bottlenecks, genetic polymorphism and speciation. Genetics 170, 1–4. Nosil, P., Funk, D.J., Ortiz-Barrientos, D., 2009. Divergent selection and heterogeneous genomic divergence. Mol. Ecol. 18, 375–402.

107

Obexer-Ruff, G., Sattler, U., Martinez, D., Maillard, J.C., Chartier, C., Saitbekova, N., Glowatzki, M.L., Gaillard, C., 2003. Association studies using random and “candidate” microsatellite loci in two infectious goat diseases. Genet Select Evol. 35, S113–S119. Pariset, L., Cappuccio, I., Ajmone-Marsan, P., Dunner, S., Luikart, G., England, P.R., Obexer-Ruff, G., Peter, C., Marletta, D., Pilla, F., Valentini, A., Econogene Consortium, 2006. Assessment of population structure by single nucleotide polymorphisms (SNPs) in goat breeds. J. Chromatogr B: Anal. Technol. Biomed. Life Sci. 833, 117–120. Pritchard, J.K., Wen, W., 2004. Documentation for the STRUCTURE software Version 2 (accessed on 04.02.2011) http://pritch. bsd.uchicago.edu/structure.html. Reynolds, J.B., Weir, B.S., Cockerham, C.C., 1983. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105, 767–779. Rice, W.R., 1989. Analyzing tables of statistical tests. Evolution 43, 223–225. Riebler, A., Held, L., Stephan, W., 2008. Bayesian variable selection for detecting adaptive genomic differences among populations. Genetics 178, 1817–1829. Robinson, D.F., Foulds, L.R., 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147. Rundle, H., Nosil, P., 2005. Ecological speciation. Ecol. Lett. 8, 336–3352. Saitbekova, N., Gaillard, C., Obexer-Ruff, G., Dolf, G., 1999. Genetic diversity in Swiss goat breeds based on microsatellite analysis. Anim. Genet. 30, 36–41. Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425. Sechi, T., Usai, M.G., Miari, S., Mura, L., Casu, S., Carta, A., 2007. Identifying native animals in crossbred populations: the case of the Sardinian goat population. Anim. Genet. 38, 614–620. Storz, J.F., Nachman, M.W., 2003. Natural selection on protein polymorphism in the rodent genus Peromyscus: evidence from interlocus contrasts. Evolution 57, 2628–2635. Van Oosterhout, C., William, F., Hutchinson, D., Wills, P.M., Shipley, P., 2004. Program note micro-checker: software for identifying and correcting genotyping errors in microsatellite data. Mol. Ecol. Notes 4, 535–538. Vasemagi, A., Nilsson, J., Primmer, C.R., 2005. Expressed sequence taglinked microsatellites as a source of gene-associated polymorphisms for detecting signatures of divergent selection in Atlantic salmon (Salmo salar L.). Mol. Biol. Evol. 22, 1067–1076. Weir, B.S., Cockerham, C.C., 1984. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370.