Strong genetic structure revealed by multilocus patterns of variation in Giardia duodenalis isolates of patients from Galicia (NW-Iberian Peninsula)

Strong genetic structure revealed by multilocus patterns of variation in Giardia duodenalis isolates of patients from Galicia (NW-Iberian Peninsula)

Infection, Genetics and Evolution 48 (2017) 131–141 Contents lists available at ScienceDirect Infection, Genetics and Evolution journal homepage: ww...

2MB Sizes 0 Downloads 53 Views

Infection, Genetics and Evolution 48 (2017) 131–141

Contents lists available at ScienceDirect

Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Research paper

Strong genetic structure revealed by multilocus patterns of variation in Giardia duodenalis isolates of patients from Galicia (NW-Iberian Peninsula) Luis B. Gabín-García a,b,⁎, Carolina Bartolomé a,b, José L. Abal-Fabeiro a,b, Santiago Méndez b,c, José Llovo b,c, Xulio Maside a,b a Grupo de Medicina Xenómica, Centro de Investigación en Medicina Molecular e Enfermidades Crónicas da Universidade de Santiago de Compostela (CIMUS), Avda. Barcelona s/n, 15782 Santiago, Galicia, Spain b Xenómica Comparada de Parasitos Humanos, Instituto de Investigacións Sanitarias de Santiago (IDIS), Travesia da Choupana s/n, 15706 Santiago, Galicia, Spain c Servizo de Microbioloxía e Parasitoloxía, Complexo Hospitalario Universitario de Santiago, Travesia da Choupana s/n, 15706 Santiago, Galicia, Spain

a r t i c l e

i n f o

Article history: Received 20 September 2016 Received in revised form 28 November 2016 Accepted 13 December 2016 Available online 16 December 2016 Keywords: Giardia duodenalis Population structure PCR cloning Genetic diversity Recombination

a b s t r a c t We report a survey of genetic variation at three coding loci in Giardia duodenalis of assemblages A and B obtained from stool samples of patients from Santiago de Compostela (Galicia, NW-Iberian Peninsula). The mean pooled synonymous diversity for assemblage A was nearly five times lower than for assemblage B (0.77% ± 0.30% and 4.14% ± 1.65%, respectively). Synonymous variation in both assemblages was in mutation-drift equilibrium and an excess of low-frequency nonsynonymous variants suggested the action of purifying selection at the three loci. Differences between isolates contributed to 40% and 60% of total genetic variance in assemblages A and B, respectively, which revealed a significant genetic structure. These results, together with the lack of evidence for recombination, support that (i) Giardia assemblages A and B are in demographic equilibrium and behave as two genetically isolated populations, (ii) infections are initiated by a reduced number of individuals, which may be genetically diverse and even belong to different assemblages, and (iii) parasites reproduce clonally within the host. However, the observation of invariant loci in some isolates means that mechanisms for the homogenization of the genetic content of the two diploid nuclei in each individual must exist. © 2016 Elsevier B.V. All rights reserved.

1. Introduction Giardia duodenalis (syn. G. intestinalis, G. lamblia; Eukaryota, Excavata, Diplomonadida) is a binucleated protozoan that infects mammals causing giardiasis (Adam, 2001), one of the most common intestinal parasitic diseases in humans in both developed and developing countries (Feng and Xiao, 2011). To date, eight assemblages (A–H) and numerous sub-assemblages (e.g. AI, AII, BIII, BIV) have been described based on the analysis of genetic variation (Andrews et al., 1989; Ey et al., 1997; Homan et al., 1992; Lasek-Nesselquist et al., 2010; Mayrhofer et al., 1995; Monis et al., 1999, 2003; Monis et al., 1998; Nash and Keister, 1985; Nash et al., 1985). Only assemblages A and B infect humans, but they also infect other mammals, having zoonotic potential (Ryan and Caccio, 2013). The taxonomical status of ⁎ Corresponding author at: Centro de Investigación en Medicina Molecular e Enfermidades Crónicas da Universidade de Santiago de Compostela (CIMUS), Avda. Barcelona s/n, 15782 Santiago, Galicia, Spain. E-mail address: [email protected] (L.B. Gabín-García).

http://dx.doi.org/10.1016/j.meegid.2016.12.014 1567-1348/© 2016 Elsevier B.V. All rights reserved.

assemblages is still a matter of debate, as they have been proposed to correspond to different species (Monis et al., 2009; Ryan and Caccio, 2013; Xu et al., 2012), but also to sub-species entities such as “nearclades”, owing to their observed genetic integrity over space and time (Tibayrenc and Ayala, 2012, 2014). The efficient colonization of the upper intestinal tract by trophozoites depends on both the strain of the infecting pathogen and the host (Nash et al., 1987), which means that our understanding of the epidemiology and pathogenic properties of Giardia strongly relies on a detailed knowledge of the patterns of genetic diversity and population structure of the parasite. In recent years, an increasing effort has been devoted to describe the patterns of diversity in Giardia populations with a special focus on human-derived samples, usually by direct sequencing of PCR products. This wealth of data uncovered a very complex scenario: (i) nucleotide variation exists both within assemblages and within isolates –revealed by the presence of double peaks in the electropherograms– (Ankarklev et al., 2012; Caccio et al., 2008; Choy et al., 2015; de Lucio et al., 2015; Durigan et al., 2014; Huey et al., 2013; Lalle et al., 2005; Lebbad et al., 2008; Lebbad et al., 2011; Minetti et al., 2015; Robertson

132

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

et al., 2007), (ii) mixed-assemblage infections have been described in humans and other hosts (Almeida et al., 2010; Sprong et al., 2009) and (iii) multilocus analyses have shown inconsistent assemblage assignment and incongruent phylogenies within assemblages (Sprong et al., 2009). All these results have put into question even the classical limits of the assemblage-host associations (Durigan et al., 2014; Feng and Xiao, 2011; Sprong et al., 2009). But a detailed quantitative description of the patterns of genetic diversity of Giardia populations has been hampered by the limited resolution of the direct sequencing methodology. In fact, only in a handful of studies PCR products were cloned prior to sequencing (Hussein et al., 2009; Kosuwin et al., 2010; Lasek-Nesselquist et al., 2009; Siripattanapipong et al., 2011; Teodorovic et al., 2007). The data thus obtained allowed an incipient description of the patterns of diversity in Giardia samples and raised new questions that are central to the understanding of giardiasis at the biological, clinical and epidemiological levels. For example, the low heterozygosity usually observed seemingly contradicts the high allelic divergence expected for long-term asexual polyploids (Birky, 2010; Mark Welch and Meselson, 2000; Mark Welch et al., 2004). This is particularly so in Giardia where differences should accumulate between alleles within the same nucleus but also between the two sister diploid nuclei, which segregate together during the entire life cycle of the parasite (Carpenter et al., 2012; Jirakova et al., 2012; Sagolla et al., 2006; Yu et al., 2002). These observations suggest a role in genome homogenization for mitotic recombination, gene conversion, other mechanisms of asexual genetic exchange between nuclei (Carpenter et al., 2012; Poxleitner et al., 2008), but also for meiotic recombination (Cooper et al., 2007; Ramesh et al., 2005). A detailed description of the genetic structure of the parasite populations (i.e. how much of the extant variation corresponds to differences between isolates, between individuals and within individuals) would contribute to delimit the role of sex and to understand the demographic history of the parasite. Additional population genetics data are needed to boost our still limited understanding of these fundamental aspects of the biology and population dynamics of this organism. Here, we report a multilocus population genetics survey of Giardia isolated from human patients of Galicia (NW-Spain), by PCR, cloning and sequencing three single copy loci. 2. Materials and methods 2.1. Giardia duodenalis isolates The G. duodenalis isolates used in this study were selected from a collection of isolates from the Complexo Hospitalario Universitario de Santiago (CHUS, Santiago de Compostela, Spain) during years 2000–2010. Symptomatic patients were diagnosed with giardiasis by examination of fresh faecal samples by microscopy. Total genomic DNA was extracted from stool samples using the QIAamp DNA Stool Mini kit (QIAgen). 2.2. Gene selection and primers design Primers were designed to amplify coding regions from four single copy loci: the widely used triose phosphate isomerase (tpi, Genbank: GL50803_93938), glutamate dehydrogenase (gdh, GL50803_21942), and beta-giardin (bg, GL50803_4812), as well as caltractin (calt, GL50803_104685), a marker for which only the genomic sequences of assemblages A (isolates WB and DH2), E (isolate P15) and B (isolate GS) were available at the time (Adam et al., 2013; Franzen et al., 2009; Jerlstrom-Hultqvist et al., 2010; Morrison et al., 2007) (Supplementary Table 1). The first marker was used for assemblage typing of isolates prior to sequencing due to its high divergence between assemblages, while the other three were used in the sequence analysis. Sequences from the reference genome assemblages A and B from GiardiaDB: http://giardiadb.org/giardiadb (Aurrecoechea et al., 2009) and additional sequences from GenBank were aligned to select conserved and

divergent regions between assemblages to design conserved and assemblage specific primers at the target loci. The Primer-BLAST tool (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) was used to design the primers in the selected regions. 2.3. Preliminary genotyping DNA isolates were initially typed for assemblage A and/or B by nested-PCR with assemblage-specific primers for the tpi locus (Supplementary Table 1). The external PCR was carried out in a volume of 10 μL containing 1–2 μL of extracted DNA, 1 μL of 10x DreamTaq Buffer, 200 μM of each dNTP, 0.6 μM of each primer (Tpi F/A_tpiR, and Tpi F/ B_tpiR, for assemblages A and B, respectively) and 0.25 U of DreamTaq DNA Polymerase (Thermo Scientific). The internal PCR, with primers A_tpiF2/A_tpiR2, and B_tpiF2/B_tpiR2 for assemblages A and B, respectively, was carried out in the same conditions using 1 μL of PCR product from de external PCR as template. Cycling conditions were as follows: initial denaturation 2 min at 95 °C; 35 cycles for the first and 25 for the second PCR of 30 s at 95 °C, 30 s at 55 °C (Supplementary Table 1) and 1 min at 72 °C; and final extension of 5 min at 72 °C. 2.4. Amplification, cloning and sequencing Initially, amplifications of gdh, bg and calt were performed using primers designed in conserved regions of assemblages A and B. If the PCRs did not yield the expected products, assemblage specific primers were used (details of the primer pairs used to amplify each isolate and amplicon lengths are given in Supplementary Table 2). Primer sequences and annealing temperatures are shown in Supplementary Table 1. Eight extra sequences of assemblage A were obtained using assemblage B specific primers in gdh in isolates 147 and 209. After that, a restriction enzyme analysis with AluI and BglI was performed to further confirm the assemblage assignation prior to sequencing. AluI was used in gdh and calt, while BglI was used in calt. Briefly, AluI cuts gdh amplicons at four positions in assemblage B (450, 634, 741 and 750, with respect to the reference sequence) and at three in assemblage A (450, 741 and 750). AluI also cuts calt amplicons at one position in B (165) and at three in A (165, 249, 405); BglI cuts bg amplicons only at position 431 in assemblage A and does not cut assemblage B amplicons. In order to minimize PCR errors, a high fidelity Taq DNA polymerase was used (FastStart High Fidelity PCR System; Roche Diagnostics). The external PCR was carried out in a volume of 10 μL containing 1–2 μL of extracted DNA, 1 μL of 10x FastStart High Fidelity Reaction Buffer, 0.4 μL of DMSO, 200 μM of each dNTP, 0.4 μM of each primer and 1 U of the FastStart High Fidelity Enzyme Blend. The internal PCR was carried out in a volume of 20 μL maintaining final concentrations but using 1 μL of 1:20 diluted PCR product from the external PCR or 1 μL of undiluted PCR product when the amplification was not achieved using the 1:20 dilution. Cycling conditions were as follows: initial denaturation 2 min at 95 °C; 25–35 cycles of 30 s at 95 °C, 30 s at 50–60 °C (according to the annealing temperature of the primers used) and 1 min at 72 °C; and final extension of 5 min at 72 °C. PCR products were visualized by means of agarose gel electrophoresis. Fragments of the expected size were gel-extracted with QIAquick Gel Extraction Kit (QIAgen). PCR products were cloned using either TOPO TA Cloning Kit for Sequencing (Invitrogen), TA Cloning Kit (Invitrogen) or CloneJET PCR Cloning Kit (Thermo Scientific), and transfected into One Shot TOP10 Escherichia coli competent cells (Invitrogen). Eight positive clones from each PCR reaction were selected and inserts amplified with FastStart High Fidelity PCR System (Roche Diagnostics) using universal M13 or pJET1.2 primers. PCR products were purified and sequenced on both strands at the Universidade de Santiago de Compostela (USC) sequencing services on an ABI3730xI automatic sequencing machine using Big Dye Terminatorv3.1 (Applied Biosystems). By sequencing eight or more clones per isolate the probability of missing alleles segregating at frequency ≥ 0.25 in the pool is

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

≤ 0.05. Sequences were trimmed to exclude primer sequences and corrected for accurate base calling by comparison of both strands using CodonCode Aligner (CodonCode Corporation). Sequence alignments were performed with ClustalW (Thompson et al., 1994) and manually edited with BioEdit (Hall, 1999). Sequences obtained were submitted to the European Nucleotide Archive (ENA) with the exception of two of them, which contained unexpected stop codons (accession numbers LT603736-LT604071; accession numbers per locus and isolate are specified in Supplementary Table 3). 2.5. Molecular evolutionary analyses DNA polymorphism and divergence analyses were performed with DnaSP v5.10.01 (Librado and Rozas, 2009) and MEGA version 5 (Tamura et al., 2011). Nucleotide positions with gaps in the alignments were excluded from the analysis. Haplotype diversity (Hd) was calculated as the inverse of the probability that any two sequences selected from an alignment were identical (Nei, 1987). Nucleotide diversity was estimated at synonymous and nonsynonymous sites by means of the π (Nei, 1987) and θW (Watterson, 1975) statistics, applying the Jukes-Cantor (JC) correction (Jukes and Cantor, 1969; Lynch and Crease, 1990). The former represents the average proportion of nucleotide differences between two sequences and the latter measures the levels of variability taking into account the total number of segregating sites independently of their frequency. The Tajima's D statistic was used to test the neutrality of mutations by comparing the values π and θW (Tajima, 1989). Under the neutral equilibrium hypothesis, both parameters should be equal and D close to zero. A negative value indicates an excess of low frequency variants, as compared with neutral expectations, and a positive one indicates an excess of intermediate frequency variants. Nucleotide divergence between assemblages was estimated at synonymous (KS) and nonsynonymous (KA) sites as the average number of nucleotide substitutions per site, using the Nei-Gojobori method (Nei and Gojobori, 1986) and the JC correction (Jukes and Cantor, 1969). Synonymous sites were used as a proxy for neutral positions in the Giardia genome. Phylogenetic analyses were conducted using MEGA version 5 (Tamura et al., 2011). The evolutionary relationships among alleles were inferred using the Neighbor-Joining (NJ) method (Saitou and Nei, 1987), and the distances among them were computed using the JC method (Jukes and Cantor, 1969). Independent phylogenetic trees were constructed for each locus as allele phasing was unknown. Bootstrap values (Felsenstein, 1985) were calculated using 500 replicates. Sequences of Giardia ardeae (AF069060.2), Giardia muris (EF455599.1) and Spironucleus salmonicida (KI546059.1) were used as outgroups for the gdh, bg and calt loci, respectively. Sequences previously used to describe intra-assemblage variation in Wielinga and Thompson (2007) and Lalle et al. (2005) were included in gdh and bg phylogenies. Sequences for the gdh and bg genes obtained from isolates of human origin were downloaded from GenBank and those with degenerate bases were discarded. Overlapping regions were used to construct four alignments with the maximum number of sequences including those from this work: gdhA (137 sequences, nucleotide positions: 457–634), gdhB (264 sequences, nucleotide positions: 459–641), bgA (105 sequences, nucleotide positions: 217–539) and bgB (174 sequences, nucleotide positions: 156–539). Haplotype networks were constructed using the Median-Joining method (Bandelt, Forster et al. 1999) with the aid of the NETWORK v5.0.0.0 program (fluxusengineering.com), using default parameters. The Connection Host criterion was used for distance calculation. The Reduced Median algorithm (with r = 2) was applied to obtained a simplified network containing all shortest trees. All networks were redrawn manually. The scaled population recombination rate ρ was estimated by applying the composite-likelihood method (Hudson, 2001) using a finite site model as implemented in LDhat version 2.2 package (McVean et al., 2002). Given that the most common cellular form –the trophozoite–

133

harbours two diploid nuclei, it was assumed that most sampled individuals are tetraploids, so that ρ equals two times 4Ne multiplied by the recombination rate (8Ner). This program performs a likelihood permutation-based test for the null hypothesis of no recombination: ρ = 0 (Lkmax), and other permutation tests that explore the decay of linkage disequilibrium with distance –corr(r2, d) and corr(|D'|,d)– and the sum of distances between all pairs of sites that have all four possible haplotypes (G4). The program RecMin (Myers and Griffiths, 2003) was used to calculate lower bounds on the number of recombination events as a function of the difference between the number of types in the sample and the number of segregating sites (Rh) under the assumption that each segregating site has mutated only once since the most recent common ancestor. An analysis of molecular variance (AMOVA) was performed with Arlequin 3.5 (Excoffier and Lischer, 2010). This method combines information on the variance of haplotype frequencies and the genetic distance between haplotypes for the estimation of the covariance components corresponding to three levels of genetic structure (i.e. within-isolates, among isolates within groups, and among groups of isolates). The significance was checked by non-parametric permutations, as implemented in Arlequin 3.5. 3. Results 3.1. Assemblage genotyping, PCR amplification, cloning and sequencing After assemblage genotyping of 119 isolates, eight assemblage A isolates were detected (122, 147, 152, 209, 251, 263, 321, 839) with assemblage-specific primers for the tpi locus, two of which were also positive for assemblage B (122, 321). Given the reduced number of assemblage A positive isolates, they were all included in the analysis along with four additional isolates (407, 704, 1221, 1343) randomly selected among the assemblage B positives. PCR amplification of gdh, bg, and calt with degenerate primers produced some unexpected results: (i) sequences from organisms other than Giardia, mainly Bifidobacterium, were obtained in some isolates when using primers for the gdh locus, (ii) assemblage A was preferentially amplified in the mixed isolates, and (iii) two chimeric assemblage A/B sequences for calt were obtained in isolate 321, which had been typed as a mixed A/B infection. Considering that the putative parental sequences were also detected in this PCR reaction and that chimeras can be generated during PCR amplification of mixed templates (Kanagawa, 2003; Speksnijder et al., 2001), these chimeric sequences were removed from further analyses (see the Discussion). In an effort to obtain a minimum of eight sequences per locus per isolate, different combinations of new primers were used, as detailed in the Materials and methods section. Finally, a total of 339 sequences were obtained from the 12 isolates: 125 for gdh, 108 for bg and 106 for calt. No assemblage B sequences were obtained for bg and gdh from isolate 122, which was typed as a mixed infection. 3.2. Nucleotide diversity Estimates of the pairwise diversity at synonymous sites (pooled πS; estimated by pooling the sequences from all the isolates) varied substantially across assemblages and loci (Table 1). Assemblage B sequences displayed higher diversity at the three loci, although differences between assemblages were only statistically significant at gdh. This locus was the least variable for assemblage A and the most variable for assemblage B (pooled πS = 0.25% ± 0.22% and 7.36% ± 1.93%, respectively; average ± SE; Table 1); bg reproduced the opposite pattern, although no significant difference between the pooled diversities for both assemblages was detected (1.29% ± 0.75% vs 1.88% ± 0.98%, for A and B, respectively). Overall, the mean pooled diversity across the three loci was nearly five times lower in assemblage A than in assemblage B, although the difference was not statistically significant due to the large variance across loci in assemblage B samples (0.77% ± 0.30% and 4.14% ± 1.65%, respectively).

134

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

Table 1 Genetic diversity. Synonymous sites

Non-synonymous sites

Assemblage

Locus

Isolate

N

H

Hd

LS

SS

πS

SE

θS

A

gdh

122 147 152 209 251 263 321 839 Pooled Average SE 122 147 152 209 251 263 321 839 ooled Average SE 122 147 152 209 251 263 321 839 Pooled Average SE 122 321 407 704 1221 1343 Pooled Average SE 122 321 407 704 1221 1343 Pooled Average SE 122 321 407 704 1221 1343 Pooled Average SE

8 16 8 16 8 8 8 8 80

1 5 2 2 2 4 1 2 12

81,8 81,8 81,7 81,9 81,8 81,8 81,8 81,8 81,8

0 0 0 0 0 1 0 0 2

2 1 6 2 4 5 6 4 21

67,4 67,3 67,5 66,6 67,1 67,3 67,5 67,3 67,3

0 0 2 0 2 3 1 3 7

8 8 8 8 8 8 6 8 62

1 1 1 3 2 2 2 2 6

64,8 64,8 64,8 63,6 64,8 64,9 64,8 64,8 63,5

0 0 0 1 1 0 1 1 3

– 13 8 8 8 8 45

– 6 3 2 1 1 13

– 81,5 81,8 81,8 81,8 82,0 81,8

– 1 6 0 0 0 17

– 6 8 6 13 8 41

– 2 4 3 7 6 18

– 67,2 67,4 67,3 67,5 67,4 67,4

– 0 2 1 2 1 6

5 7 8 8 8 8 44

2 4 5 1 4 2 15

64,9 64,8 64,8 64,8 64,8 64,9 64,9

0 2 5 0 5 0 10

0,00 0,00 0,00 0,00 0,00 0,31 0,00 0,00 0,25 0,04 0,04 0,00 0,00 0,75 0,00 1,64 1,13 0,26 1,89 1,29 0,71 0,27 0,00 0,00 0,00 0,40 0,84 0,00 0,52 0,39 0,77 0,27 0,11 – 0,19 3,97 0,00 0,00 0,00 7,36 0,83 0,78 – 0,00 1,29 0,50 1,39 0,37 1,88 0,71 0,27 0,00 1,19 3,34 0,00 3,49 0,00 3,19 1,34 0,68

0,00 0,00 0,00 0,00 0,00 0,32 0,00 0,00 0,22

8 8 8 8 8 8 12 7 67

0,00 0,45 0,43 0,13 0,25 0,64 0,00 0,25 0,50 0,27 0,08 0,25 0,00 0,89 0,25 0,75 0,79 0,68 0,81 0,69 0,55 0,12 0,00 0,00 0,00 0,46 0,54 0,25 0,33 0,25 0,48 0,23 0,08 – 0,64 0,68 0,25 0,00 0,00 0,89 0,31 0,15 – 0,33 0,75 0,60 0,83 0,89 0,89 0,68 0,10 0,40 0,71 0,86 0,00 0,75 0,25 0,91 0,50 0,14

0,00 0,00 0,00 0,00 0,00 0,47 0,00 0,00 0,49 0,06 0,06 0,00 0,00 1,14 0,00 1,15 1,72 0,49 1,82 2,18 0,79 0,27 0,00 0,00 0,00 0,61 0,60 0,00 0,68 0,60 1,01 0,31 0,12 – 0,40 2,83 0,00 0,00 0,00 4,76 0,64 0,55 – 0,00 1,14 0,65 0,96 0,57 2,08 0,66 0,20 0,00 1,26 2,98 0,00 2,97 0,00 3,55 1,20 0,60

bg

calt

B

gdh

bg

calt

0,00 0,00 0,54 0,00 1,18 0,65 0,26 1,12 0,75

0,00 0,00 0,00 0,42 0,84 0,00 0,54 0,41 0,68

– 0,18 1,64 0,00 0,00 0,00 1,93

– 0,00 0,98 0,51 0,97 0,37 0,98

0,00 0,78 1,59 0,00 1,71 0,00 1,31

LA

SA

πA

SE

θA

263,2 263,2 263,3 263,2 263,2 263,2 263,2 263,3 263,2

0 5 1 1 1 4 0 1 12

0,00 0,10 0,16 0,05 0,10 0,19 0,00 0,08 0,06

253,6 253,7 253,5 254,4 253,9 253,8 253,5 253,7 253,8

1 0 3 1 2 2 5 1 15

250,2 250,2 250,2 245,4 250,2 250,1 250,2 250,2 245,5

0 0 0 1 0 1 0 0 2

1,36

– 263,5 263,2 263,2 263,2 263,0 263,2

– 5 1 1 0 0 8

0,41 −0,93 1,21 −1,05 −0,30

– 253,8 253,6 253,7 253,6 253,6 253,6

– 1 1 1 4 5 12

250,1 250,2 250,2 250,2 250,2 250,1 250,2

1 2 1 0 0 1 5

0,00 0,24 0,16 0,05 0,10 0,38 0,00 0,10 0,17 0,13 0,05 0,10 0,00 0,28 0,10 0,10 0,20 0,33 0,11 0,17 0,15 0,04 0,00 0,00 0,00 0,10 0,00 0,10 0,00 0,00 0,03 0,03 0,02 – 0,29 0,22 0,10 0,00 0,00 0,33 0,12 0,06 – 0,13 0,10 0,13 0,24 0,50 0,23 0,22 0,07 0,16 0,23 0,00 0,00 0,00 0,10 0,07 0,08 0,04

0,00 0,57 0,15 0,12 0,15 0,59 0,00 0,15 0,92 0,21 0,08 0,15 0,00 0,46 0,15 0,30 0,30 0,65 0,16 1,24 0,27 0,07 0,00 0,00 0,00 0,16 0,00 0,15 0,00 0,00 0,17 0,04 0,03 – 0,61 0,15 0,15 0,00 0,00 0,70 0,14 0,11 – 0,17 0,15 0,17 0,51 0,76 1,11 0,29 0,12 0,19 0,33 0,15 0,00 0,00 0,15 0,46 0,15 0,05

DS

−1,05

−0,78

−1,31 1,45 −1,45 −1,14 0,05 −1,07

−1,05 1,17 −0,93 −1,05 −0,48

−1,15 1,60

−0,27 0,42 0,59 −0,33

p

0,10 0,00 0,21 0,10 0,10 0,14 0,17 0,11 0,05

0,00 0,00 0,00 0,98 0,00 0,10 0,00 0,00 0,02

– 0,14 0,21 0,10 0,00 0,00 0,19

– 0,12 0,10 0,13 0,12 0,22 0,06

0,16 0,16 0,00 0,00 0,00 0,11 0,04

DA

p

−1,93 0,33 −1,16 −1,05 −1,53



−1,05 −2,25

⁎⁎

−1,05 −1,57 −1,05 −1,31 −1,31 −1,83 −1,01 −2,51

⁎ ⁎⁎⁎

−1,05 −1,05

−1,44

−1,58 1,44 −1,05

−1,42

−0,93 −1,05 −0,93 −1,77 −1,60 −2,43



−0,82 −1,24 −1,05

−1,05 −2,00



N: number of sequences; L: number of sites; S: number of polymorphic sites; πS and πA: pairwise nucleotide diversity at synonymous and non-synonymous sites expressed as percentage, respectively; θS and θA: nucleotide site variability based on the number of synonymous and non-synonymous segregating sites expressed as percentage, respectively; DS and DA: Tajima's D at synonymous and non-synonymous sites, respectively; SE: standard error; p: statistical significance. ⁎ p b 0.05. ⁎⁎ p b 0.01. ⁎⁎⁎ p b 0.001.

The pooled πS estimates at each locus were consistently larger than the corresponding averages across isolates (Table 1). This effect was greatest at gdh, as the diversity estimated from the pooled isolates was over six times larger in both assemblages, which reflected that a

significant fraction of the variation corresponded to differences between rather than within isolates (see below). A third remarkable feature was the large variation in the estimates of synonymous diversity across isolates. Some of them were

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141 Table 2 Gdh haplotypes from G. duodenalis assemblage A.

135

Table 4 Calt haplotypes from G. duodenalis assemblage A.

Sequences from the reference genome of assemblage A (isolate WB) was used as reference for calt (XM_001705435.1). Dots indicate identical sites, and “-” indicate gaps. Sequence from the draft genome of sub-assemblage AII (isolate DH) was also included (AHGT01000042.1). The number of clones from each isolate is shown. Non-synonymous changes are typed in bold. Polymorphic sites indicate the nucleotide positions from the start codon. Sequence from the reference genome of assemblage A (isolate WB) was used as reference (XM_001707205.1). Dots indicate identical sites, and “–” indicate gaps. Shorter sequences contain initial and/or final gaps. Sequence from the draft genome of sub-assemblage AII (isolate DH) was also included (AHGT01000014.1). The number of clones from each isolate is shown. Non-synonymous changes are typed in bold. Polymorphic sites indicate the nucleotide positions from the starting codon.

monomorphic at two or even the three loci (e.g. isolates 122, 147, 209, 704 and 1343, Table 1), but others, particularly 407 and 1221 of assemblage B, displayed very high synonymous diversity at some of the loci.

Table 3 Bg haplotypes from G. duodenalis assemblage A.

The Tajima's D test suggested no significant deviation of the frequency spectra of the synonymous variants with respect to neutral expectations neither within individual isolates nor in the pooled datasets. Nonsynonymous diversity across loci was much lower in both assemblages (mean pooled π A = 0.12% ± 0.05% and 0.21% ± 0.07%, for assemblages A and B, respectively) and did not reproduce the patterns described for silent sites. For example, the pooled and average estimates of π A were very similar. In addition, nearly all nonsynonymous changes were singletons (were found only once in the sample), except for three mutations in gdh nucleotide positions 532 in assemblage A and 461 and 463 in assemblage B (Tables 2 and 5). This meant that θ A was consistently higher than π A at all loci in both assemblages, reflecting a surplus of low-frequency variants relative to mutation-drift equilibrium expectations at these sites. This led to negative DA values for both assemblages at the three loci, which were significant for gdh and bg in assemblage A, and bg and calt in assemblage B (Table 1). In fact, the deviation

Table 5 Gdh haplotypes from G. duodenalis assemblage B.

Sequence from the reference genome of assemblage A (isolate WB) was used as reference for bg (XM_001705373.1). Dots indicate identical sites, and “-” indicate gaps. Shorter sequences contain initial and final gaps. Sequence from the draft genome of sub-assemblage AII (isolate DH) was also included (AHGT01000121.1). The number of clones from each isolate is shown. Non-synonymous changes are typed in bold and one nonsense change is typed in italic bold. Polymorphic sites indicate the nucleotide positions from the start codon.

Sequences from the reference genome of assemblage B (isolate GS) was used as reference (GL50581_4496). Dots indicate identical sites, and “–” indicate gaps. Shorter sequences contain initial and final gaps. The number of clones for each isolate is shown. Non–synonymous changes are typed in bold and one nonsense change is typed in italic/bold. Polymorphic sites indicate the nucleotide positions from the start codon.

136

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

of the pooled DA across loci was statistically significant for both assemblages (DA = − 3,60; p b 0.001and DA = − 3,42; p b 0,001 for A and B, respectively). An analysis of the covariance components revealed that differences between and within isolates contributed to the total genetic variance in approximately similar proportions (i.e. 46% ± 8.3% and 59% ± 15.1% of the variance across loci corresponded to variation between isolates in assemblage A and in assemblage B, respectively; Supplementary Table 4). Although assemblage B displayed higher levels of variance in all comparisons, bg and calt presented a similar genetic structure. Contrastingly, gdh showed much higher between isolate variance in assemblage B (86.6%) than in assemblage A (45.8%), where within isolate variance was larger.

Table 7 Calt haplotypes from G. duodenalis assemblage B.

3.3. Haplotype diversity The higher nucleotide diversity in assemblage B also meant higher haplotype diversity across loci (Table 1). Haplotype variation among isolates varied widely between assemblages. Assemblage A isolates usually presented one common haplotype at each locus which was shared across isolates (i.e. hA2 at gdh –hA3 and hA4 are probably the same haplotype amplified from an internal pair of specific primers –, hA1 at bg and hA1 at calt; Tables 2, 3 and 4, respectively), and some other closely related haplotypes which were found in a reduced number isolates (i.e. hA1 and hA10 at gdh, hA11 at bg and hA2 at calt; Tables 2, 3 and 4, respectively). The remaining haplotypes carried singletons (i.e. single nucleotide variants which were found only once in the sample; Fig. 1). Contrastingly, assemblage B isolates displayed higher haplotype differentiation and seldom shared haplotypes (Tables 5, 6 and 7). For instance, gdh

Table 6 Bg haplotypes from G. duodenalis assemblage B.

Sequences from de reference genome of assemblage B (isolate GS) was used as reference (GL50581_3112). Dots indicate identical sites, and “–” indicate gaps. Shorter sequences contain initial and final gaps. The number of clones for each sample is shown. Non–synonymous changes are typed in bold. Polymorphic sites indicate the nucleotide positions from the start codon.

haplotypes differed from those from other isolates by a minimum of three changes (Table 5). In order to check whether the Giardia population from Santiago de Compostela (Galicia, NW-Iberian Peninsula) was genetically distinct from those from other world areas, a network analysis was conducted to compare the nucleotide sequences of the haplotypes for gdh and bg with those available in GenBank. The haplotype networks showed that most haplotypes are identical or nearly identical to others found at very distant locations, including Africa, America and East Asia (Supplementary Figs. 1 and 2).

3.4. Divergence Mean synonymous divergence between assemblage A and B was 0.72 ± 0.22, although there was significant variation across loci. Calt presented the largest synonymous differentiation, with K S values greater than unity, which means that some positions might have mutated more than once since the split of the two assemblages (Table 8). Divergence at non-synonymous positions was over an order of magnitude lower, and KA/KS was close to zero at all loci, consistent with a predominant effect of purifying selection in the evolution of these genes.

Table 8 Genetic divergence between assemblage A and B loci. Sequence from the reference genome of assemblage B (isolate GS) was used as reference (GL50581_2741). Dots indicate identical sites, and “–” indicate gaps. Shorter sequences contain initial and final gaps. The number of clones for each sample is shown. Non–synonymous changes are typed in bold. Polymorphic sites indicate the nucleotide positions from the start codon. Haplotypes highlighted with * correspond with sequences of different lengths.

Locus

KS(JC)

KA(JC)

KS/KA

gdh bg calt

0.59 0.42 1.15

0.02 0.00 0.00

0.0380 0.0049 0.0005

KS(JC) and KA(JC): nucleotide divergence at synonymous and nonsynonymous positions corrected by Jukes and Cantor correction.

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

137

Fig. 1. Phylogenetic trees of haplotypes from three G. duodenalis loci: gdh (A); bg (B) and calt (C). Taxon names identify the haplotype (as in Tables 2–7) and the isolate(s) they were found in. Sequences from complete genomes and reference sequences are also indicated.

3.5. Phylogenetic analysis Assemblage A and B sequences clustered into two well defined separate groups (Fig. 1). Within-assemblage structure was poorly defined, with many subgroups with low bootstrap support. Gdh and calt sequences of assemblage A usually clustered with the sub-assemblage AII and apart from the AI reference sequences. The lack of variation in the sequenced fragment of bg prevented the separation of AI and AII sequences (Fig. 1B). Higher diversity in assemblage B meant that haplotypes clustered in more subgroups and that larger genetic distances separated them. In the gdh phylogeny there were two major clusters which, according to the reference sequences, could be assigned to the classical sub-assemblages BIII and BIV (Fig. 1A). However, this division was only weakly supported by bootstrap and it was not observed at the other loci. Another interesting feature was that the haplotypes of some isolates grouped together (e.g. sequences from isolates 321 or 1343), whereas allelic variants from isolates 407 and 1221 (particularly of bg and calt in 1221; Fig. 1B, C) could be found scattered across different branches of assemblage B.

3.6. Recombination Specific tests for recombination were run with the sequences of each assemblage and in the entire dataset. None of the five different tests used produced significant evidence for recombination neither within

nor between assemblages (Supplementary Table 5). Estimates of the population-scale rate of recombination were very low and frequently zero. The highest value was ρ = 12 for calt in assemblage A, but a permutation test suggested that none of them were significantly different from zero (Supplementary Table 4). The Rh statistic, which estimates the lower bound on the number of recombination events in the sample, was zero for assemblage A and also showed consistently low values for assemblage B (Rh ≤ 2) and the entire dataset (Rh ≤ 6). 4. Discussion Our analysis unveiled substantial levels of within-isolate diversity at the three loci analyzed, which justifies the cloning of the PCR products prior to sequencing instead of the more common practice of direct sequencing of the mixed PCR products. This latter procedure likely leads to the systematic underestimation of the true diversity values, as the signal of the most common allele in the chromatograms likely masks the variants in alleles present at low frequency in the PCR product mixture. G. duodenalis assemblages A and B displayed distinct patterns of neutral diversity. The mean synonymous variation across loci in assemblage A was below 1%; it was structured around one or two frequent haplotypes present in most isolates and other low-frequency haplotypes, which differed from the common ones by a reduced number of singleton nucleotide variants. Contrastingly, the mean pooled synonymous diversity was nearly five times higher for

138

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

assemblage B, which also displayed higher levels of between-isolate variation and greater differentiation between haplotypes (b3% of them were found in two or more isolates). This scenario was in good agreement with previous evidence for lower diversity in assemblage A (Hussein et al., 2009; Kosuwin et al., 2010; Siripattanapipong et al., 2011) and suggested that assemblages A and B have experienced different recent evolutionary histories. Considering that variation at neutral sites is a direct function of the population size and the mutation rate (Charlesworth and Charlesworth, 2010), and assuming that the latter is constant across the genus and that it does not vary significantly over time, the above patterns suggested that assemblage A has a smaller effective population size. As the two assemblages are able to infect multiple hosts (Feng and Xiao, 2011; Ryan and Caccio, 2013), this observation could be attributed to different ecological properties such as a comparatively limited proliferation ability or a more recent adaptation of assemblage A to the human host. This raises the question of a turnover of Giardia lineages adapted to infect humans over time. The zoonotic potential of G. duodenalis relies on the possibility that other mammals may serve as reservoirs for human adapted assemblages (A and B), as illustrated by the common finding of pet cats and dogs infected by assemblages C and D, but also by A and B (revised in Ryan and Caccio, 2013). The interspecific exchange of parasites between humans and other mammals involves human but also non-human specific lineages, as suggested by the presence of assemblage C in human clinical isolates recently reported in Brazil and Thailand (Choy et al., 2015; Feng and Xiao, 2011), and thus provides an excellent opportunity of adaptation of new parasite lineages to the human gut environment. Eighty-six out of the 129 (67%) segregating sites observed in our data were singletons, and this fraction was higher among assemblage A sequences (85%). The use of nested-PCR protocols, which imply a large number of PCR cycles and thus a higher chance of nucleotide misincorporations, meant that the possibility of some of them being the result of polymerase errors cannot be discarded. However, their authenticity was supported by the use of a high fidelity polymerase with proofreading activity and by the observation that some of them had been described in previous studies (e.g. gdh606C N T; bg198C N T, and bg496G N A), even at polymorphic frequencies (e.g. bg273A N G) (Ankarklev et al., 2012; Kosuwin et al., 2010; Siripattanapipong et al., 2011). Unfortunately, the criteria used to determine what is considered a “true” variant varies among studies. For example, new sequence variants were taken as true alleles if: “the variants occurred two or more times from more than one PCR amplicon library or if the single sequence variant comprised the entire clonal library” (Lasek-Nesselquist et al., 2009); “it was redetermined using PCR products from two independent amplifications from the same DNA template” (Kosuwin et al., 2010); or “the substitution at that position occurred two or more times” (Siripattanapipong et al., 2011); and only those considered “true” variants are usually reported. Therefore, comparisons between sets of data must be taken with caution. Here we decided to include all the singletons in the analysis, because (i) real variants at low frequencies could not be distinguished from PCR misincorporations, (ii) their inclusion did not have a meaningful impact in the estimate of nucleotide diversity measured as π, as this statistic gives low weight to low frequency polymorphisms, and (iii) they should be made available for comparison with other datasets. Although the inclusion of all the singletons reduced the mean frequency of the variants in the sample, the frequency spectra of synonymous variants did not deviate from neutral expectations, as inferred by the Tajima's D values, for the pooled sample as well as for the individual isolates. These results could be taken as evidence for both assemblages being in mutation-drift equilibrium at these sites, and this would be in contrast with a proposed population expansion of assemblage A, based on evidence for significant negative values of Tajima's D

among variants at all sites of the tpi locus (i.e. making no distinction between synonymous and nonsynonymous sites; Choy et al., 2015). In fact, this latter result probably reflected the effect of purifying selection against nonsynonymous changes that keeps them at low frequencies in the populations, as evidenced by the significantly negative Tajima's D estimates at nonsynonymous sites in our dataset. The hierarchical analyses revealed a significant structure of the genetic variance in both assemblages, such that differences between isolates contributed to up to 40% and 60% of the total variance in assemblages A and B, respectively. This scenario complicated the interpretation of the Tajima's D test on the pooled samples because the standard neutral model assumptions might have not been met. The existence of between-isolate variation means that estimates of Tajima's D on the pooled sample might have been overestimated, as fixed differences between isolates would be found at intermediate frequencies in the pooled samples. Indeed, this seemed to be the case for some of the changes at gdh in assemblage B (Table 5), which were apparently fixed in different samples and segregated at intermediate frequencies in the pooled dataset. Consistently, the pooled Tajima's D at this locus was slightly positive, although it did not reach statistical significance. However, this effect was not obvious at the other loci nor in assemblage A sequences, and therefore it was difficult to determine the extent to which the structure of the population might have affected the variant's frequency spectra. Thus, Tajima's D values should be interpreted with caution. A second implication of the relatively large between-isolate variation was that each isolate was likely to represent only a small fraction of the diversity in the parasite population. Considering that the isolates were collected within a restricted geographic location –the health district of Santiago de Compostela covers an area of 3400 km2 and includes 440,000 inhabitants–, it seems reasonable to think that our sample might only partially represent the total genetic diversity of G. duodenalis assemblages A and B in this region. Alternatively, the fact that the isolates were collected over a ten-year period means that this variation could be partially attributed to genetic differentiation of the Giardia population over time. But the time elapsed between the sampling dates displayed no positive association with the levels of between-isolate differentiation at any of the three loci (data not shown), which argues against this possibility. At any rate, a comparison with sequence data from GenBank suggested no differentiation between the Galician population (NW-Iberian Peninsula) from those of different geographic origins. In the absence of available population datasets from other parts of the world (in most cases only distinct haplotypes are reported without estimates of their frequencies), the presence of identical or very similar haplotypes in different continents suggests that there is little worldwide geographic differentiation. The levels of neutral genetic diversity varied widely across isolates: e.g. 251 and 839 (assemblage A), 407 and 1221 (assemblage B) displayed levels of mean synonymous diversity across loci well above that of their respective assemblages due to the presence of two or more divergent haplotypes, whereas isolates 122 and 147 (assemblage A) and 704 and 1343 (assemblage B) were almost monomorphic, with one frequent haplotype and a few others which only differed from it by a single nucleotide change. The high levels of variation observed within isolates might occur: between individuals –mixed infections (Almeida et al., 2010)–, within each individual –allelic sequence heterozygosity (Ankarklev et al., 2012) – or by a combination of both. Under the null hypothesis of asexual reproduction and equational partition of nuclei during cellular division (i.e. sister nuclei segregate together) (Carpenter et al., 2012; Jirakova et al., 2012; Sagolla et al., 2006; Yu et al., 2002) mutations would be expected to accumulate over the generations, building up the level of differentiation between the two nuclei and between individuals. But this scenario is hard to reconcile with the observed low variation in some isolates even at highly variable loci like gdh (see Section 3.2). Allelic dropout –i.e. the failure to detect alleles segregating at low frequencies in the isolates– is unlikely to explain it, as by

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

sequencing eight clones per isolate there is a 0.90 likelihood of detecting alleles at frequencies N 0.25, which is the lowest frequency of a variant expected in an organism with two diploid nuclei. Mitotic recombination –by gene conversion or crossing-over– and ploidy cycles –loss of one chromosome by nondisjunction and subsequent duplication of the other– might reduce the differentiation between homologous chromosomes in the same nucleus, but they are not expected to occur between nuclei and cannot be invoked to explain this observation (Birky, 2010). However, exchange of genetic material between nuclei in the cyst could explain the unexpected levels of homozygosity (Carpenter et al., 2012; Jirakova et al., 2012; Poxleitner et al., 2008). Also, the incidental co-segregation of parental and derived nuclei during trophozoite proliferation or excystation (instead of the reported equational partitions) would eliminate the between-nuclei variation, but to the best of our knowledge there is no evidence for this phenomenon in Giardia. The sample analyzed here included two isolates with sequences attributable to assemblages A and B. Mixed A/B infections represent to up to 10% of all the human cases, although they are more common in non-industrialized communities (Feng and Xiao, 2011; Ryan and Caccio, 2013). However, their abundance might have been underestimated by the frequent use of generic primers, with different affinities for alleles from the different assemblages. This biased amplification might explain inconsistent assemblage assignments across loci described in previous multilocus studies (Caccio et al., 2008; Takumi et al., 2012; Yang et al., 2010), as well as the results reported here for isolate 122, which was successfully amplified with assemblage A and B specific primers for two loci (tpi and calt), but not with specific primers for assemblage B gdh nor bg. In line with previous results (Kosuwin et al., 2010; Takumi et al., 2012), there was no significant evidence for genetic recombination neither within nor between assemblages in the present dataset. The large genetic differentiation between A and B (KS N 70%) (our data and Franzen et al., 2009) argues against the hypothesis that they do recombine or that they do so at a meaningful rate in nature, as it has been suggested (Lasek-Nesselquist et al., 2009; Teodorovic et al., 2007). This large divergence also means that the pairs of sites with four gametic types and the Rh values greater than zero observed in our sample are better explained by recurrent mutation events, which generate patterns that mimic recombination. In this context, the finding of two chimeric assemblage A/B sequences of the calt locus in sample 321 must be taken with caution. Although similar chimeric molecules had been previously reported and presented as true recombinants (Lasek-Nesselquist et al., 2009; Takumi et al., 2012; Teodorovic et al., 2007), the fact that the isolate where it occurred corresponded to a mixed A/B infection means that the possibility of the chimeric sequence being a PCR artefact could not be discarded. Such molecules might result from template switching during extension in one or more consecutive PCR cycles (Kanagawa, 2003). Another possibility is that at the end of a mixed-template PCR the hybridization of heterologous sequences might lead to the formation of heteroduplexes, which can generate mosaic sequences when repaired by the mismatch repair system of the transformed cell (Speksnijder et al., 2001). Such artefacts can be easily detected if the template sequences involved are highly divergent –e.g. alleles of assemblages A and B–, but when they differ by a reduced number of changes, it is difficult to ascertain whether a new sequence combining them was originated by a recurrent mutation or by recombination. The development of a protocol that minimizes this type of artefacts would help estimate the recombination rate in G. duodenalis populations (Ramirez and Llewellyn, 2014). G. duodenalis sub-assemblages were initially defined on the basis of allozyme studies of axenic cultures and the sequences later obtained from those cultures were used as references for AI, AII, BIII and BIV (Andrews et al., 1989; Mayrhofer et al., 1995; Monis et al., 1996). The increasing amount of sequence data from different loci support the split of assemblage A into sub-assemblages AI and AII,

139

but the genetic integrity of sub-assemblages BIII and BIV is not backed by the currently available sequence data (Wielinga and Thompson, 2007). In our dataset gdh alleles formed two main groups with haplotypes from isolates 704, 1221 and 321 in one branch and isolates 1343 and 407 in another, that cluster with reference sequences of sub-assemblages BIII and BIV, respectively. But this clustering was not reflected at the other loci, as haplotypes from the same isolates displayed different phylogenetic relationships, which questions the use of this criterion for the classification of isolates. 5. Conclusions The results presented here draw an overall scenario where human giardiasis is caused by the ingestion of a reduced number of individuals, which represent a small fraction of the diversity of the parasite population. Differences in nucleotide diversity suggest that assemblages A and B have recently experienced different demographic histories –assemblage A has a smaller estimated effective population size–, and both are apparently in demographic equilibrium. There is no evidence for genetic recombination nor for genetic exchange between assemblages, but absence of variation at some loci in some isolates suggests that there must be a mechanism for the homogenization of the genetic content of the two diploid nuclei present in each individual. Also, lack of consistency of lower-level classifications (i.e. sub-assemblages) across loci indicates that they may only represent the patterns of variation at limited genomic locations. There is no evidence for genetic differentiation of the G. duodenalis population in Galicia (NW-Iberian Peninsula) from other isolates retrieved worldwide. Financial support This study was funded with grant 10CSA208038PR from Xunta de Galicia (Spain), co-financed by European Regional Development Fund, to XM. LG was supported by a Predoctoral Fellowship by Xunta de Galicia (Spain). Aknowledgements We would like to thank Dr. A. Campos (Servizo de Hematoloxía, CHUS, Santiago de Compostela, Spain) and Dr. A. Vidal and Dra. C. Carneiro (CIMUS, USC) for the use of their laboratory facilities for some procedures. Appendix A. Supplementary data Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.meegid.2016.12.014. References Adam, R.D., 2001. Biology of Giardia lamblia. Clin. Microbiol. Rev. 14, 447–475. Adam, R.D., Dahlstrom, E.W., Martens, C.A., Bruno, D.P., Barbian, K.D., Ricklefs, S.M., Hernandez, M.M., Narla, N.P., Patel, R.B., Porcella, S.F., Nash, T.E., 2013. Genome sequencing of Giardia lamblia genotypes A2 and B isolates (DH and GS) and comparative analysis with the genomes of genotypes A1 and E (WB and Pig). Genome Biol. Evol. 5, 2498–2511. Almeida, A., Pozio, E., Caccio, S.M., 2010. Genotyping of Giardia duodenalis cysts by new real-time PCR assays for detection of mixed infections in human samples. Appl. Environ. Microbiol. 76, 1895–1901. Andrews, R.H., Adams, M., Boreham, P.F., Mayrhofer, G., Meloni, B.P., 1989. Giardia intestinalis: electrophoretic evidence for a species complex. Int. J. Parasitol. 19, 183–190. Ankarklev, J., Svard, S.G., Lebbad, M., 2012. Allelic sequence heterozygosity in single Giardia parasites. BMC Microbiol. 12, 65. Aurrecoechea, C., Brestelli, J., Brunk, B.P., Carlton, J.M., Dommer, J., Fischer, S., Gajria, B., Gao, X., Gingle, A., Grant, G., Harb, O.S., Heiges, M., Innamorato, F., Iodice, J., Kissinger, J.C., Kraemer, E., Li, W., Miller, J.A., Morrison, H.G., Nayak, V., Pennington, C., Pinney, D.F., Roos, D.S., Ross, C., Stoeckert Jr., C.J., Sullivan, S., Treatman, C., Wang, H., 2009. GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res. 37, D526–D530.

140

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141

Birky Jr., C.W., 2010. Giardia Sex? Yes, but how and how much? Trends Parasitol. 26, 70–74. Caccio, S.M., Beck, R., Lalle, M., Marinculic, A., Pozio, E., 2008. Multilocus genotyping of Giardia duodenalis reveals striking differences between assemblages A and B. Int. J. Parasitol. 38, 1523–1531. Carpenter, M.L., Assaf, Z.J., Gourguechon, S., Cande, W.Z., 2012. Nuclear inheritance and genetic exchange without meiosis in the binucleate parasite Giardia intestinalis. J. Cell Sci. 125, 2523–2532. Cooper, M.A., Adam, R.D., Worobey, M., Sterling, C.R., 2007. Population genetics provides evidence for recombination in Giardia. Curr. Biol. 17, 1984–1988. Charlesworth, B., Charlesworth, D., 2010. Elements Of Evolutionary Genetics. Roberts and Company Publishers, Greenwood Village, CO. Choy, S.H., Mahdy, M.A., Al-Mekhlafi, H.M., Low, V.L., Surin, J., 2015. Population expansion and gene flow in Giardia duodenalis as revealed by triosephosphate isomerase gene. Parasit. Vectors 8, 454. de Lucio, A., Martinez-Ruiz, R., Merino, F.J., Bailo, B., Aguilera, M., Fuentes, I., Carmena, D., 2015. Molecular genotyping of Giardia duodenalis isolates from symptomatic individuals attending two major public hospitals in Madrid, Spain. PLoS One 10, e0143981. Durigan, M., Abreu, A.G., Zucchi, M.I., Franco, R.M., de Souza, A.P., 2014. Genetic diversity of Giardia duodenalis: multilocus genotyping reveals zoonotic potential between clinical and environmental sources in a metropolitan region of Brazil. PLoS One 9, e115489. Excoffier, L., Lischer, H.E., 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567. Ey, P.L., Mansouri, M., Kulda, J., Nohynkova, E., Monis, P.T., Andrews, R.H., Mayrhofer, G., 1997. Genetic analysis of Giardia from hoofed farm animals reveals artiodactyl-specific and potentially zoonotic genotypes. J. Eukaryot. Microbiol. 44, 626–635. Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 783–791. Feng, Y., Xiao, L., 2011. Zoonotic potential and molecular epidemiology of Giardia species and giardiasis. Clin. Microbiol. Rev. 24, 110–140. Franzen, O., Jerlstrom-Hultqvist, J., Castro, E., Sherwood, E., Ankarklev, J., Reiner, D.S., Palm, D., Andersson, J.O., Andersson, B., Svard, S.G., 2009. Draft genome sequencing of Giardia intestinalis assemblage B isolate GS: is human giardiasis caused by two different species? PLoS Pathog. 5, e1000560. Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 95–98. Homan, W.L., van Enckevort, F.H., Limper, L., van Eys, G.J., Schoone, G.J., Kasprzak, W., Majewska, A.C., van Knapen, F., 1992. Comparison of Giardia isolates from different laboratories by isoenzyme analysis and recombinant DNA probes. Parasitol. Res. 78, 316–323. Hudson, R.R., 2001. Two-locus sampling distributions and their application. Genetics 159, 1805–1817. Huey, C.S., Mahdy, M.A., Al-Mekhlafi, H.M., Nasr, N.A., Lim, Y.A., Mahmud, R., Surin, J., 2013. Multilocus genotyping of Giardia duodenalis in Malaysia. Infect. Genet. Evol. 17, 269–276. Hussein, A.I., Yamaguchi, T., Nakamoto, K., Iseki, M., Tokoro, M., 2009. Multiplesubgenotype infections of Giardia intestinalis detected in Palestinian clinical cases using a subcloning approach. Parasitol. Int. 58, 258–262. Jerlstrom-Hultqvist, J., Franzen, O., Ankarklev, J., Xu, F., Nohynkova, E., Andersson, J.O., Svard, S.G., Andersson, B., 2010. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate. BMC Genomics 11, 543. Jirakova, K., Kulda, J., Nohynkova, E., 2012. How nuclei of Giardia pass through cell differentiation: semi-open mitosis followed by nuclear interconnection. Protist 163, 465–479. Jukes, T.H., Cantor, C.R., 1969. Evolution of protein molecules. In: Munro, H.N. (Ed.), Mammalian Protein Metabolism. Academic Press, New York, pp. 21–132. Kanagawa, T., 2003. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J. Biosci. Bioeng. 96, 317–323. Kosuwin, R., Putaporntip, C., Pattanawong, U., Jongwutiwes, S., 2010. Clonal diversity in Giardia duodenalis isolates from Thailand: evidences for intragenic recombination and purifying selection at the beta giardin locus. Gene 449, 1–8. Lalle, M., Pozio, E., Capelli, G., Bruschi, F., Crotti, D., Caccio, S.M., 2005. Genetic heterogeneity at the beta-giardin locus among human and animal isolates of Giardia duodenalis and identification of potentially zoonotic subgenotypes. Int. J. Parasitol. 35, 207–213. Lasek-Nesselquist, E., Welch, D.M., Sogin, M.L., 2010. The identification of a new Giardia duodenalis assemblage in marine vertebrates and a preliminary analysis of G. duodenalis population biology in marine systems. Int. J. Parasitol. 40, 1063–1074. Lasek-Nesselquist, E., Welch, D.M., Thompson, R.C., Steuart, R.F., Sogin, M.L., 2009. Genetic exchange within and between assemblages of Giardia duodenalis. J. Eukaryot. Microbiol. 56, 504–518. Lebbad, M., Ankarklev, J., Tellez, A., Leiva, B., Andersson, J.O., Svard, S., 2008. Dominance of Giardia assemblage B in Leon, Nicaragua. Acta Trop. 106, 44–53. Lebbad, M., Petersson, I., Karlsson, L., Botero-Kleiven, S., Andersson, J.O., Svenungsson, B., Svard, S.G., 2011. Multilocus genotyping of human Giardia isolates suggests limited zoonotic transmission and association between assemblage B and flatulence in children. PLoS Negl. Trop. Dis. 5, e1262. Librado, P., Rozas, J., 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452. Lynch, M., Crease, T.J., 1990. The analysis of population survey data on DNA sequence variation. Mol. Biol. Evol. 7, 377–394. Mark Welch, D., Meselson, M., 2000. Evidence for the evolution of bdelloid rotifers without sexual reproduction or genetic exchange. Science 288, 1211–1215.

Mark Welch, J.L., Mark Welch, D.B., Meselson, M., 2004. Cytogenetic evidence for asexual evolution of bdelloid rotifers. Proc. Natl. Acad. Sci. U. S. A. 101, 1618–1621. Mayrhofer, G., Andrews, R.H., Ey, P.L., Chilton, N.B., 1995. Division of Giardia isolates from humans into two genetically distinct assemblages by electrophoretic analysis of enzymes encoded at 27 loci and comparison with Giardia muris. Parasitology 111 (Pt 1), 11–17. McVean, G., Awadalla, P., Fearnhead, P., 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160, 1231–1241. Minetti, C., Lamden, K., Durband, C., Cheesbrough, J., Fox, A., Wastling, J.M., 2015. Determination of Giardia duodenalis assemblages and multi-locus genotypes in patients with sporadic giardiasis from England. Parasit. Vectors 8, 444. Monis, P.T., Andrews, R.H., Mayrhofer, G., Ey, P.L., 1999. Molecular systematics of the parasitic protozoan Giardia intestinalis. Mol. Biol. Evol. 16, 1135–1144. Monis, P.T., Andrews, R.H., Mayrhofer, G., Ey, P.L., 2003. Genetic diversity within the morphological species Giardia intestinalis and its relationship to host origin. Infect. Genet. Evol. 3, 29–38. Monis, P.T., Andrews, R.H., Mayrhofer, G., Mackrill, J., Kulda, J., Isaac-Renton, J.L., Ey, P.L., 1998. Novel lineages of Giardia intestinalis identified by genetic analysis of organisms isolated from dogs in Australia. Parasitology 116 (Pt 1), 7–19. Monis, P.T., Caccio, S.M., Thompson, R.C., 2009. Variation in Giardia: towards a taxonomic revision of the genus. Trends Parasitol. 25, 93–100. Monis, P.T., Mayrhofer, G., Andrews, R.H., Homan, W.L., Limper, L., Ey, P.L., 1996. Molecular genetic analysis of Giardia intestinalis isolates at the glutamate dehydrogenase locus. Parasitology 112 (Pt 1), 1–12. Morrison, H.G., McArthur, A.G., Gillin, F.D., Aley, S.B., Adam, R.D., Olsen, G.J., Best, A.A., Cande, W.Z., Chen, F., Cipriano, M.J., Davids, B.J., Dawson, S.C., Elmendorf, H.G., Hehl, A.B., Holder, M.E., Huse, S.M., Kim, U.U., Lasek-Nesselquist, E., Manning, G., Nigam, A., Nixon, J.E., Palm, D., Passamaneck, N.E., Prabhu, A., Reich, C.I., Reiner, D.S., Samuelson, J., Svard, S.G., Sogin, M.L., 2007. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science 317, 1921–1926. Myers, S.R., Griffiths, R.C., 2003. Bounds on the minimum number of recombination events in a sample history. Genetics 163, 375–394. Nash, T.E., Herrington, D.A., Losonsky, G.A., Levine, M.M., 1987. Experimental human infections with Giardia lamblia. J. Infect. Dis. 156, 974–984. Nash, T.E., Keister, D.B., 1985. Differences in excretory-secretory products and surface antigens among 19 isolates of Giardia. J. Infect. Dis. 152, 1166–1171. Nash, T.E., McCutchan, T., Keister, D., Dame, J.B., Conrad, J.D., Gillin, F.D., 1985. Restrictionendonuclease analysis of DNA from 15 Giardia isolates obtained from humans and animals. J. Infect. Dis. 152, 64–73. Nei, M., 1987. Molecular Evolutionary Genetics. Columbia University Press, New York. Nei, M., Gojobori, T., 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426. Poxleitner, M.K., Carpenter, M.L., Mancuso, J.J., Wang, C.J., Dawson, S.C., Cande, W.Z., 2008. Evidence for karyogamy and exchange of genetic material in the binucleate intestinal parasite Giardia intestinalis. Science 319, 1530–1533. Ramesh, M.A., Malik, S.B., Logsdon Jr., J.M., 2005. A phylogenomic inventory of meiotic genes; evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr. Biol. 15, 185–191. Ramirez, J.D., Llewellyn, M.S., 2014. Reproductive clonality in protozoan pathogens–truth or artefact? Mol. Ecol. 23, 4195–4202. Robertson, L.J., Forberg, T., Hermansen, L., Gjerde, B.K., Langeland, N., 2007. Molecular characterisation of Giardia isolates from clinical infections following a waterborne outbreak. J. Infect. 55, 79–88. Ryan, U., Caccio, S.M., 2013. Zoonotic potential of Giardia. Int. J. Parasitol. Sagolla, M.S., Dawson, S.C., Mancuso, J.J., Cande, W.Z., 2006. Three-dimensional analysis of mitosis and cytokinesis in the binucleate parasite Giardia intestinalis. J. Cell Sci. 119, 4889–4900. Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425. Siripattanapipong, S., Leelayoova, S., Mungthin, M., Thompson, R.C., Boontanom, P., Saksirisampant, W., Tan-Ariya, P., 2011. Clonal diversity of the glutamate dehydrogenase gene in Giardia duodenalis from Thai isolates: evidence of genetic exchange or mixed infections? BMC Microbiol. 11, 206. Speksnijder, A.G., Kowalchuk, G.A., De Jong, S., Kline, E., Stephen, J.R., Laanbroek, H.J., 2001. Microvariation artifacts introduced by PCR and cloning of closely related 16S rRNA gene sequences. Appl. Environ. Microbiol. 67, 469–472. Sprong, H., Caccio, S.M., van der Giessen, J.W., Network Z. partners, 2009. Identification of zoonotic genotypes of Giardia duodenalis. PLoS Negl. Trop. Dis. 3, e558. Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595. Takumi, K., Swart, A., Mank, T., Lasek-Nesselquist, E., Lebbad, M., Caccio, S.M., Sprong, H., 2012. Population-based analyses of Giardia duodenalis is consistent with the clonal assemblage structure. Parasit. Vectors 5, 168. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739. Teodorovic, S., Braverman, J.M., Elmendorf, H.G., 2007. Unusually low levels of genetic variation among Giardia lamblia isolates. Eukaryot. Cell 6, 1421–1430. Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680. Tibayrenc, M., Ayala, F.J., 2012. Reproductive clonality of pathogens: a perspective on pathogenic viruses, bacteria, fungi, and parasitic protozoa. Proc. Natl. Acad. Sci. U. S. A. 109, E3305–E3313. Tibayrenc, M., Ayala, F.J., 2014. Cryptosporidium,Giardia, Cryptococcus, Pneumocystis genetic variability: cryptic biological species or clonal near-clades? PLoS Pathog. 10, e1003908.

L.B. Gabín-García et al. / Infection, Genetics and Evolution 48 (2017) 131–141 Watterson, G.A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276. Wielinga, C.M., Thompson, R.C., 2007. Comparative evaluation of Giardia duodenalis sequence data. Parasitology 134, 1795–1821. Xu, F., Jerlstrom-Hultqvist, J., Andersson, J.O., 2012. Genome-wide analyses of recombination suggest that Giardia intestinalis assemblages represent different species. Mol. Biol. Evol. 29, 2895–2898.

141

Yang, R., Lee, J., Ng, J., Ryan, U., 2010. High prevalence Giardia duodenalis assemblage B and potentially zoonotic subtypes in sporadic human cases in Western Australia. Int. J. Parasitol. 40, 293–297. Yu, L.Z., Birky Jr., C.W., Adam, R.D., 2002. The two nuclei of Giardia each have complete copies of the genome and are partitioned equationally at cytokinesis. Eukaryot. Cell 1, 191–199.