Infection, Genetics and Evolution 15 (2013) 18–24
Contents lists available at SciVerse ScienceDirect
Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid
Extensive intra-host genetic diversity uncovered in Cryptosporidium parvum using Next Generation Sequencing A. Grinberg ⇑, P.J. Biggs, V.S.R. Dukkipati, T.T. George Massey University, Institute of Veterinary, Animal and Biomedical Sciences, Private Bag 11 222, Palmerston North 4442, New Zealand
a r t i c l e
i n f o
Article history: Available online 6 September 2012 Keywords: Cryptosporidium PCR Next Generation Sequencing Gp60
a b s t r a c t The theory about the Cryptosporidium life cycle predicts genetic diversity of sporozoites within the host. Nevertheless, the Cryptosporidium intra-host genetic diversity is difficult to study using conventional Sanger sequencing or electrophoretic resolution of amplicons, due to the methods’ inability to resolve mixtures of templates. We analysed the within-isolate genetic diversity of two Cryptosporidium parvum isolates sharing common descent, by combining the use of Next Generation Sequencing and cloning of PCR amplicons with database searches. The analysis focused on the single-copy 70 kDa heat shock protein (HSP70) and the 60 kDa surface glycoprotein (gp60) genes, which allowed any diversity to be ascribed to the presence of a heterogeneous population of sporozoites. The results indicated an unprecedented intrahost genetic diversity, with two HSP70 and 10 gp60 alleles in these isolates, in spite of the initial resolution of one allele per locus using Sanger sequencing. At both loci, the predominant alleles were those initially identified by Sanger sequencing. A significant (p < 0.01) overrepresentation of gp60 alleles previously reported in New Zealand was observed. These results further our understanding of the genetic structure of C. parvum populations, and expose the limitations of the use of non-axenic isolates as operational taxonomic units of genetic studies of cryptosporidiosis. Ó 2012 Elsevier B.V. All rights reserved.
1. Introduction Protozoa belonging to the genus Cryptosporidium, in particular Cryptosporidium parvum and Cryptosporidium hominis, are cosmopolitan diarrhoeagenic parasites of humans and animals transmitted through the faecal-oral route. The life cycle of C. parvum and C. hominis involves excystation of the sporozoites in the intestinal tract of the host, followed by successive rounds of conservative cell divisions, gametogony and sexual mating of gametes, and a meiosis-like division. All the cell divisions occur in the intestinal cells, and the cycle culminates with the formation of environmentallyresistant oocysts containing four haploid sporozoites. In immunocompetent hosts, cryptosporidiosis is typically a self-limiting illness lasting a number of days, characterised by the excretion of tens of millions of oocysts with the faeces. Numerous genotyping methods have been developed to overcome the lack of phenotypic differences between Cryptosporidium taxa (Caccio et al., 2005). In addition, single and multilocus subtyping schemes have been used for intra-taxon discrimination of subtypes. In particular, several studies have applied sequencing of the polymorphic gp60 gene (Cevallos et al., 2000; Strong et al., 2000) for subtyping of C. parvum and C. hominis isolates, leading to the ⇑ Corresponding author. Tel.: +64 6 3569099; fax: +64 6 3505635. E-mail address:
[email protected] (A. Grinberg). 1567-1348/$ - see front matter Ó 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.meegid.2012.08.017
currently accepted classification of these species into gp60 allelic ‘families’ and ‘subtypes’ (Sulaiman et al., 2005). Yet, the population genetic structure of C. parvum and C. hominis remains poorly understood. One of the reasons for the lack of progress in the understanding of the genetic structure of Cryptosporidium populations is the lack of methods for axenic culture of sporozoites, which are the basic units carrying the parasites’ genome. This limitation imposes the need to define non-axenic isolates composed of oocysts originating from single hosts, as operational taxonomic units (OTUs) of genetic studies. Such a conceptualisation of the OTU conflicts with the body of theory predicting genetic heterogeneity of sporozoites within the isolates, even in the case of infections with single Cryptosporidium taxa. In fact, intra- and inter-oocyst genetic heterogeneity of sporozoites may occur as a consequence of infections with heterogeneous parasites, or mutations and recombinations occurring in the course of the infection. Evidence for the occurrence of within isolate genetic diversity (WIGD) has been generated in a number of studies. For instance, variation in the C. parvum b-tubulin gene has been reported more than a decade ago by different groups (Widmer et al., 1998; Rochelle et al., 2000). Subsequently, Tanrinverdi et al. (2008) reported a significant rate of ‘mixed’ C. parvum infections in Uganda (2008); and more recently, three gp60 alleles were found in a C. parvum isolate using dual fluorescent terminal-restriction fragment length
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24
polymorphism analysis (Waldron and Power, 2011). Nevertheless, the WIGD is seldom discussed or taken into account in molecular epidemiological studies of cryptosporidiosis. Theory and evidence indicate that the natural history of an infectious disease may be influenced by the interactions among different co-infecting lineages, and among these lineages and the immune system (Nowak et al., 1990; Domingo et al., 1998, 2006; Alizon and van Baalen, 2008). For example, mutants produced in the course of infections with human immunodeficiency virus contribute to the evasion from the immune response and the development of resistance to chemotherapy (Domingo et al., 1998, 2006; McMichael and Phillips, 1997). Thus, many key aspects of cryptosporidiosis could be studied more effectively by analysing the WIGD of the parasites. Yet, the Cryptosporidium WIGD is difficult to be studied by the conventional means of PCR followed by Sanger sequencing, due to the method’s inability to resolve complex mixtures of templates. Unlike Sanger sequencing methods, Next Generation Sequencing (NGS) platforms report individual sequences of an enormous number of DNA fragments in a single sequencing reaction, thus, are powerful platforms for the study of the WIGD of microorganisms. The method has been used recently for the fast sequencing of Cryptosporidium isolates (Widmer et al., 2012). Stimulated by this technological development, we analysed the WIGD of two C. parvum isolates originating from a point-source outbreak, by combining the use of NGS, molecular cloning of PCR amplicons, and searches in DNA sequence databases. We concentrated on an analysis of the polymorphic HSP70 and gp60 genes. The first contains an imperfect repeat composed of a variable number of 12-basepairs repeat units (Khramtsov et al., 1995). Conversely, the gp60 contains an imperfect TCA/TCG (polyserine) trinucleotide repeat (Strong et al., 2000). Both genes display natural polymorphism in the number of repeat units, and are present in single copies in the C. parvum genome (http://eupathdb.org/eupathdb/), which allowed any diversity identified within the analysed isolates to be ascribed to a heterogeneous population of sporozoites. Here, we present the results of the study and discuss their implications on research and on the ongoing discussion on the population genetic structure of Cryptosporidium parasites. 2. Materials and methods 2.1. Source of isolates and preliminary molecular analysis Two Cryptosporidium-positive faecal specimens (‘Is8’; ‘Is9’) collected in 2006 in New Zealand were used in this study. The specimens originated from a point-source outbreak of cryptosporidiosis caused by a rare C. parvum gp60 subtype. The parasites were identified as C. parvum by sequencing of the 18S rRNA gene. Subtyping by PCR-sequencing of the gp60 locus identified the rare gp60 IIaA21G4R1 allele, and sequencing of the HSP70 gene found identical alleles composed of 11 imperfect repeats, in both specimens. The HSP70 gene of Is8 exhibited a thymine insertion, which interrupted the reading frame and was attributed to PCR error (Grinberg et al., 2011). No additional tests were required at that junction and genomic DNA from Is8 and Is9 was stored at 20 °C. In 2010, a new PCR failed to reveal the thymine insertion in Is8. However, single stranded DNA conformational polymorphism analysis (SSCPA) of a new PCRproduct suggested the presence of heterogeneous HSP70 alleles (Fig. 1), which prompted the molecular analyses described below. 2.2. Next Generation Sequencing of a gp60 amplicon of Is8 Stored DNA from Is8 was used to PCR-amplify the polymorphic region of the gp60 gene, and the amplicon was submitted for NGS.
19
The region amplified included the polymorphic compound TCA/ TCG repeat sequence, which is commonly used to classify C. parvum isolates into gp60 families and types, as proposed by Sulaiman et al. (2005). The primers were F: 50 -ATGAGATTGTCGCTCATTATC30 , and R: 50 -TTACAACACGAATAAGGCTGC-30 (Pangasa et al., 2010). Each 20 ll PCR solutions contained 5 nmol of each dNTP, 37.5–45.0 nmol MgSO4, 0.01 nmol each of forward and reverse primers, 1.25 units of PlatinumÒ Taq DNA Polymerase High Fidelity (Invitrogen, Carlsbad, CA, USA), 2 ll of 10 PCR buffer and 50– 100 ng of template DNA. Amplification was performed in a Gene Amp PCR system 9600 (Applied Biosystems, Foster City, CA, USA), with an initial hold at 95 °C for 5 min, followed by 30 cycles (94 °C for 30 s, 57 °C for 30 s, 72 °C for 1 min), and a final extension at 72 °C for 7 min. The presence of amplicon was assessed using agarose gel electrophoresis. The PCR product was purified using a column (Roche Diagnostics, Mannheim, GmbH), and the purified product was concatenated by isothermal whole genome amplification (WGA) by multiple strand displacement (Repli-g mini kit, Qiagen, Hilden, GmbH). This step was necessary in order to generate high molecular weight DNA suitable for fragmentation before DNA library preparation for the NGS. The final product was submitted to a NGS service provider. Fragmentation was performed by nebulisation and the DNA library was prepared using TruSeQ reagents and software (http://www.illumina.com/truseq/about_ truseq/truseq). Finally, the library was sequenced on an Illumina HiSeq instrument (San Diego, CA, USA), with 100-bp paired-end reads, using Version 3 chemistry. The sequencing was planned to run as a small fraction of a lane, but it was eventually run using a full lane. 2.3. Cloning of HSP70 and gp60 amplicons of Is8 and Is9 Stored DNA of Is8 and Is9 was used to PCR-amplify the variable repeat regions of the HSP70 and gp60 genes for amplicon-cloning. Two cloning experiments (using Taq and proofreading polymerase) were performed for each locus, for a total of four experiments per isolate. The primers for marker HSP70 were 50 -CACCATCCAAGAACCAAAGG-30 (forward) and 50 -GCCTAAAGGTAGAGTG TGCTTTTC-30 (reverse) (Grinberg et al., 2008). The PCR for the first cloning of the HSP70 gene used a 20 ll solution containing 2 mM of each dNTP, 0.6 ll MgCl2 at a concentration of 50 mM/ml, 0.1– 0.2 nmol each of forward and reverse primers, 0.2 ll of PlatinumÒ Taq DNA Polymerase (Invitrogen), 2 ll of 10 PCR buffer, and 2 ll of template DNA. Amplification was performed with an initial hold at 96 °C for 2 min and 40 cycles of 96 °C for 30 s, 57 °C for 30 s, and 72 °C for 30 s. For the second PCR-cloning, each 20-ll solution contained 5 nmol of each dNTP, 37.5–45.0 nmol MgSO4, 0.1 nmol each of forward and reverse primers, 1.25 units of PlatinumÒ Taq DNA Polymerase High Fidelity (Invitrogen Corporation, Carlsbad CA, USA), 2 ll of 10 PCR buffer, and 50–100 ng of template DNA. The PCR was performed in a Gene Amp PCR system 9600 (Applied Biosystems, Foster City CA, USA), with an initial hold at 95 °C for 5 min, followed by 30 cycles (94 °C for 30 s, 57 °C for 30 s, 72 °C for 1 min), and a final extension at 72 °C for 7 min. For both PCRs, the presence of HSP70 amplicons was assessed by means of agarose gel electrophoresis. The PCR for the first gp60 cloning experiment was a nested procedure and used Taq polymerase. The primers were F: 50 -ATAGTCTCCGCTGTATTC-30 , and R: 50 -GAAGGAACGATGTATCT-30 (externals); and F: 50 -TCCGCTGTATTCTCAGCC-30 , and R: 50 -GCAG AGGAACCAGCATC-30 (internals). The PCR solution was similar to that used for the first amplification of the HSP70 gene, with the addition of 2 ll of nonacetylated bovine serum albumin (2 mg/ ml) (Biolabs, Ipswich, MA) for the external reaction. The PCR was performed with an initial hold at 96 °C for 2 min, 40 cycles of 96 °C for 30 s, 54 °C for 30 s, and 72 °C for 30 s, and a final
20
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24
Fig. 1. Single stranded conformational polymorphism analysis of HSP70 amplicons of Is8 and Is9 on polyacrylamide gel electrophoresis. Note the presence of a third visible band in Is8. Two subsequent cloning experiments revealed the presence of an additional allele longer than the most abundant allele by an entire 12-basepairs repeat unit. The two sequences of the imperfect repeat subsequently found by PCR-cloning are indicated. Different colours indicate different repeat-unit types in the repeat.
elongation at 72 °C for 2 min. External PCR products were diluted 1/10 in water prior to the internal reaction. The PCR for the second gp60 cloning reaction using proofreading enzyme was performed exactly as described above for the NGS. Amplicons were purified using columns (Purelink™, Invitrogen), and inserted into pCRÒ4-TOPOÒ vector (TOPO TA CloningÒ kit for Sequencing, Invitrogen). Recombinant vectors were used to transform One ShotÒ TOP10 chemically-competent Escherichia coli cells that were plated on to Luria–Bertani agar containing kanamycin. After incubation at 37 °C for 24 h, individual colonies were picked and cultured into LB broth for overnight incubation at 37 °C in a shaking incubator. E. coli cells were harvested by spinning and the pellets screened by PCR using primers and conditions identical to those described above (screening by PCR was not performed for the second cloning of the gp60 marker). Positive clones were sent for forward-sequenced using M13 forward primers provided in the kit. 2.4. Next Generation Sequencing data handling Given the large amount of data received from the NGS provider, a MySQL database (http://www.mysql.com/, accessed April 2012) was used to store sequences and associated information about the short reads. There were multiple sequence files (called ‘‘bins’’), each comprising of 4 million reads, and these were processed separately before analysis could proceed on the final dataset of unique sequences. On a per-bin basis, sequences were analysed with the ‘fastx_collapser’ program within the fastx_toolkit suite (http://hannonlab.cshl.edu/fastx_toolkit/, accessed April 2012), to generate a list of sequences, and the number of times each sequence was represented in the bin. The ‘fasta_formatter’ program was used to convert the data into a tabular format that was subsequently loaded into a MySQL database version 5.1.54 (running on Ubuntu; http:// www.ubuntu.com/, accessed April 2012). All the sequences with more than 10 ‘Ns’ in a row (indicating ambiguous basecalls and therefore poor quality sequence) were removed from the table, and the remaining sequences were combined in a NGS working file. 2.5. Search of gp60 TCA/TCG repeats in the NGS working file and gp60 allele definition Initially, the NGS working file was mined for the presence of gp60 alleles by calling all the 100-bp reads starting with the sequence immediately upstream the TCA/TCG repeat in the IIa gp60 allelic family – which is 50 -TGTTGAGGGC0 -30 . These reads were collated in a .txt file (the ‘TCA/TCG file’), which also contained the data on the number of occurrences of each sequence type in the file. The presence of non-IIa alleles in the NGS working file was ruled out by mapping the full length of representative IIc; IId; IIi; IIj;
and IIe alleles to the working file using the mapper BWA (Li and Durbin, 2009). These non-IIa sequences were previously used by Widmer and Lee (2010) and were kindly provided by the authors. Two external sources of sequence data were then used for the identification of gp60 allelic types in the TCA/TCG file. The first source was a pre-existing database containing the gp60 Sanger-sequences of >350 C. parvum isolates isolated from human and animals in New Zealand since 2003 (the ‘NZDB’). The second source was a quasi-complete list of global gp60 IIa alleles deposited in the GenBank. This list was generated through successive alignments with referent alleles using the basic local alignment search tool (BLAST; http://blast.ncbi.nlm.nih.gov/), until three consecutive iterations failed to identify new alleles. All the gp60 sequences types found by cloning and in the NZDB or globally were trimmed, to include the 50 -GGGC’-30 sequence upstream the TCA/TCG repeat and the repeat itself, followed by the 50 -ACATCAACC’-30 sequence downstream the repeat. These trimmed sequence types were mapped to the TCA/TCG file using the ‘find’ function. Finally, each trimmed sequence found by cloning and/or in the NZDB, which was present in the TCA/TCG file, was considered an ‘allele’ and was tabulated (Table 1). 2.6. Statistical analysis of data The following statistical analyses were performed in order to corroborate and explain the results: Q1. Theory predicts that true alleles occur at higher frequency than PCR or sequencing artifacts in a PCR product, simply because biological sequences account for most of the template available for primer annealing during each PCR cycle. Thus, we compared the abundances of the most common 100-bp reads containing the gp60 ‘alleles’ with these of all the reads in the working NGS file. Q2. An initial screening of the TCA/TCG file suggested an overrepresentation of allelic types previously reported in New Zealand in the file. Two null hypotheses were formulated in order to test the statistical significance of this trend. The first null hypothesis stated that the proportion of gp60 allelic types present in the NZDB (PNZDB), which were found in the TCA/TCG file, did not differ from the proportion of allelic types circulating abroad (Pabroad), which were present in the same file. This hypothesis was tested using a two-tailed Fisher’s exact test (P0: PNZDB/Pabroad = 1). The second null hypothesis stated that the proportion of sequence types present in the NZDB which were found in the TCA/ TCG file did not differ from the proportion which is expected to be found in the same file by chance alone. This hypothesis was tested using an cumulative binomial function of (n, x, and p), where ‘n’ (sample size) was the total number of gp60 sequence types present in the NZDB, ‘x’ (number of successes) was the count of sequence types present in the TCA/TCG file and also in the NZDB;
NZDB, the New Zealand Cryptosporidium sequence database; Taq, Taq polymerase; PR, proofreading polymerase; P, sequence present in the specified source; NP, sequence not present in the specified source; Is8 and Is9, the two isolates analysed in this study.
Genbank
P P P P P NP P P P P
NZDB
1 6 212 88 NP NP 13 1 7 3 56 11 3 2 2 1 0 0 0 0
Total, cloning Cloning, PR (Is8/Is9)
9/14 2/1 2/0 0/0 1/0 0/0 0/0 0/0 0/0 0/0 17/16 5/3 1/0 0/2 1/0 1/0 0/0 0/0 0/0 0/0
Cloning, Taq (Is8/Is9) TCA/TCG file
1539 241 217 18 14 14 4 3 1 1 GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGCtcatcatcgtcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGctcatcatcgtcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC GGGctcatcatcgtcatcgtcatcatcatcatcatcatcatcatcatcatcatcatcatcatcaACATCAACC A21G4R1 A20G4R1 A18G3R1 A19G4R1 A17G3R1 A22G4R1 A20G5R1 A18G4R1 A19G3R1 A18G2R1
Number of occurrences of sequence in Trinucleotide TCA/TCG repeat sequence Allelic type
Table 1 The frequency of occurrence of the gp60 alleles found in the two analysed isolates by Next Generation Sequencing, cloning, and/or in the New Zealand Cryptosporidium sequence database. The allelic type designation follows the scheme suggested by Sulaiman et al. (2005). The occurrence of each allele in NGS is represented by the number of times the most abundant 100-bp read containing the sequence was seen in the TCA/TCG file (see manuscript). The number of GenBank reports is not reported. Note the high correlation between the allelic abundances in the TCA/TCG file and cloning.
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24
21
and ‘p’ (baseline proportion) was the number of sequence types present in the NZDB divided by the number of gp60 sequence types found globally. Q3. The correlation between the abundance of each gp60 sequence type present both in the TCA/TCG file and cloning was assessed visually. No statistical testing was required. All calculations were performed using NCSS package (NCSS version 2001; Kaysville, Utah, USA).
3. Results 3.1. Results of Next Generation Sequencing A total of 253,972,758 paired end sequences (or 507,945,516 sequences) were present in the files provided by the sequencing service provider. There were 63 pairs of bins, each with 4,000,000 sequences, and a single pair with 1,972,758 sequences. Of these, a total of 41,076,716 paired sequences (16.17%) failed the Illumina proximity filter and were no longer considered, leaving 212,896,042 paired sequences (or 425,792,084 sequences) for analysis. Of these remaining sequences, 106,766,145 (50.15%) from read 1 and 153,123,651 (71.92%) from read 2 only occurred once per bin. There were 2,086,492 sequence types in the 165,902,288 sequences that occurred more than once per bin. Some head-to-head, head-totail and tail-to-head concatemer reads were revealed during manual alignment of individual reads, indicating successful WGA (not shown). The TCA/TCG file contained 1589 sequence types. Numerous reads in this file contained TCA/TCG repeats showing non-synonymous single nucleotide polymorphisms (SNPs), which were found only in this file and were not further analysed. Three read types found in the TCA/TCG file contained non-synonymous SNPs in the repeat and were also found by cloning. However, these sequences were contained in reads which were only found once and are of an uncertain significance. Interestingly, the six most abundant TCA/TCG repeat types found in the TCA/TCG file were also found by cloning (Table 1). With the exception of single nucleotide polymorphisms (SNPs) which were not further analysed, only the gp60 IIa sequence initially identified by Sanger sequencing was found by mapping downstream and upstream the TCA/TCG repeat using BWA. 3.2. Results of cloning 3.2.1. Cloning of HSP70 gene amplicons Twelve transformants per isolate from the first PCR-cloning reaction of the HSP70 gene were submitted for sequencing. Conversely, nine transformants from Is8 and five from Is9 were sequenced from the second PCR-cloning reaction. Two HSP70 sequence types were found in each PCR-cloning reaction. The predominant type was the same sequence containing 11 repeat units initially resolved in Is8 and Is9 using Sanger sequencing and was found in Is8 and Is9, in both cloning reactions. The second sequence differed from the first by the presence of 12 repeat units (Fig. 1), and was found in one Is8 transformant, in both cloning reactions. Both HSP70 sequences identified by cloning had been previously identified in C. parvum isolates in New Zealand (not shown). 3.2.2. Cloning of gp60 gene amplicons Sixty five transformants were isolated from the first gp60 PCRcloning reaction of Is8. A total of 50 (77%) were PCR-positive, and 44 were sequenced. A total of 40 transformants were isolated from the first cloning reaction of Is9, and 29 (72.5%) were PCR-positive and submitted for sequencing. Conversely, 15 and 21
22
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24 Table 2 Forty one gp60 IIa allelic types found in the GenBank and in the New Zealand Cryptosporidium sequence database. The last 12 allelic types in the list are those present in the New Zealand Cryptosporidium database, and are in bold. The allelic types found in Is8 by Next Generation Sequencing are underlined. The GenBank accession numbers and trimmed sequences of the allelic types found in the GenBank are in Supplementary files. A20G3R1; A16G1R1; A17G1R1; A17G1⁄R1; A17G2R1; A19G2R1; A17G3R1; A16G2R1; A13G2R1; A13G2⁄R1; A15G1R1; A15G2R2; A14G2R1; A16G2R2; A16G0R1; A18G1R1; A17G4R1; A20G2R1; A20G2⁄R1; A18G0R1; A19G1R1; A21G2R1; A18G5R1; A21G1A1; A22G3R1; A15G2R1; A11G2R1; A13G2R2; A13G0R1; A21G4R1; A20G5R1; A20G4R1; A18G4R1; A15G3R1; A19G3R1; A18G3R1; A18G2R1; A16G3R1; A15G2R1; A19G4R1; A14G1R1
transformants were submitted for sequencing from the second PCR-cloning reaction of Is8 and Is9, respectively. With the exception of a few foreign sequences, most transformants contained gp60 sequences. Overall, 16 gp60 TCA/TCG repeat types were found in Is8 and Is9 by cloning. Ten types were found in the first cloning reaction, two were found in the second reaction, and four types were shared and found in both reactions. The two most abundant TCA/TCG sequence types in both isolates were the A21G4R1 and A20G4R1. Six types found by cloning were also found in the TCA/TCG file, and are tabulated as ‘alleles’ (Table 1).
3.3. Database search results Overall, 41 IIa allelic types were found in the GenBank. There were 12 IIa allelic types in the NZDB, representing 29,26% (12/41) of the global number of IIa allelic types (Supplementary material). Fourteen out of 41 allelic types found in the Genbank were present in the TCA/TCG file. A total of 8/12 (66.6%) allelic types present in the NZDB were found in the TCA/TCG file and were tabulated as ‘alleles’. These allelic types accounted for >90% of the gp60 sequences of New Zealand C. parvum isolates for which sequences are stored in the NZDB. Conversely, only 6/29 (16.6%) allelic types found abroad but not in the NZDB were present in the TCA/TCG file (Table 2).
3.4. Statistical results Q1. The six most common tabulated alleles were found by cloning, and their most common 100-basepair reads were present >13 times in the TCA/TCG file (Table 1). These six alleles occupied the upper decile in the rank abundance distribution of reads. This decile was composed of reads present 10 or more times in the NGS working file (not shown). Q2. Assuming a baseline probability of 0.292 (29,2% of NZDB sequences were found in the TCA/TCG file, see above), the cumulative binomial probability of observing eight or more alleles in the NZDB by chance alone was p = 0.0079. The statistical significance persisted (p = 0.01) even after dropping the initially found A21G4R1 allele from the calculation. The Fisher’s exact test supported this finding, with a two-tailed probability of observing the proportions 8/12 and 6/29 by chance alone of p = 0.0097. Also in this case the statistical significance persisted after dropping the A21G4R1 allele (p = 0.02). Together, these results indicated a significant overrepresentation of New Zealand alleles in the TCA/ TCG file. Q3. Based on their abundances, each of the six most abundant alleles occupied the same rank in the TCA/TCG file and the cloning reactions. Consistent with this finding, the four low-abundance alleles in the TCA/TCG file were not detected by cloning (see Table 1). No statistical testing was required to prove this pattern.
4. Discussion We analysed the WIGD of two C. parvum isolates sharing common descent, by combining the use of NGS and molecular cloning of amplicons, with sequence database searches. The results indicated the presence of two HSP70 and 10 gp60 alleles in these isolates, in spite of the initial resolution of one allele per locus by Sanger sequencing. These findings are consistent with previous population-based studies indicating the HSP70 is less polymorphic than the gp60 gene (Mallon et al., 2003; Leoni et al., 2007; Zintl et al., 2011). It is relevant to highlight here the possible presence of artifactual sequences in the data, which could have determined overestimation of the genetic diversity of Is8 and Is9. NGS methods report the individual sequences of an enormous number of DNA fragments, and are amenable for the resolution of complex mixtures of templates. Yet, these platforms are unable to discern biological sequences from in vitro artifacts, which normally remain undetected by Sanger-based methods due to their low frequency. In this study, artifacts could have been generated during the PCR, cloning, or sequencing, in particular as a consequence of polymerase slippage by one or more trinucleotide repeats during the PCR of the gp60 locus. Nevertheless, several lines of evidence indicate that the extensive WIGD observed in Is8 and Is9 cannot be discounted on the basis of in vitro error. Firstly, considering the polymorphism observed in the HSP70 gene, the finding of multiple gp60 alleles was in line with the expectation. In order to minimise error, we performed a single-step PCR using a proofreading enzyme for the NGS, and two cloning experiments using different primers and polymerases (Taq and proofreading enzyme), with consistent results. Furthermore, slippage cannot explain the presence of many of the gp60 alleles seen in Is8 and Is9. For instance, an in vitro generation of the A18G3R1 or an A17G3R1 would have required the occurrence of similar slippages by multiple trinucleotides in the TCA/TCG region, in the three PCR reactions. Such an event would have been unlikely, given the stability of imperfect repeat regions (Petes et al., 1997; Bacon et al., 2000; Rolfsmeier and Lahue, 2000; Walsh et al., 1996; Klintschar and Wiegand, 2003). Lastly, the strong correlation observed between the allelic abundances in NGS and cloning, and the overrepresentation of New Zealand alleles in the isolates are simply incompatible with error. Therefore, the possible presence of some artifactual alleles does not alter the conclusions of this study. This study contributes to the discussion on the population genetic structure of C. parvum in several ways. The results allude to C. parvum isolates as assemblages of genetically heterogeneous sporozoites, rather than single gp60 ‘subtypes’. However, the fact that only gp60 alleles belonging to the IIa ‘family’ were identified (despite the presence of other allelic families in New Zealand) could indicate that the allelic ‘families’ are reproductivelysegregated entities. The results may also point toward potential error from the use of multilocus subtyping schemes, when
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24
23
Fig. 2. Simplistic example of lack of differentiation between spurious and by-descent multilocus genotypes by Sanger sequencing. The infecting doses are represented by three oocysts. The genetic makeup of the parasites is shown at two non-co-segregating loci. Three levels of genetic diversity are shown: 1, within-oocyst; 2, between-oocysts; and 3, between-dose diversity. The ‘isolates’ obtainable from each host are represented in the containers. Note that, in spite of their different genetic makeup, the isolates’ bilocus genotype definition does not differ, and is ‘AS’. The isolate from Patient 1 is genetically homogenous and Sanger sequencing of amplicons reveals its genotype-by descent. Isolates from Patients 2 and 3 are heterogeneous, and Sanger sequencing resolved the predominant alleles at each locus. For simplicity, the model assumes lack of selection, mutation, or PCR bias.
permutations of abundant alleles, rather than genotypes by descent, are defined as OTUs (see Fig. 2). However, the most striking finding was the overrepresentation of New Zealand gp60 alleles in Is8 and Is9. Our results do not allow for conclusions on clonality or panmixia to be drawn, as the mating patterns of the sporozoites remain obscure. The rate of mutation of the gp60 gene also remains unknown. The high environmental resilience of the C. parvum sporozoites inside the oocyst (Fayer, 2008) predicts a lifestyle consisting of long periods of reproductive inertia in the environment, followed by a short infection period, during which the parasite reaches a population size exceeding tens of millions of oocysts (Grinberg et al., 2002). It could be hypothesised that the short generation time needed to reach such a huge population size favours genetic diversification in every infection. Thus, some of the gp60 alleles observed may have arisen ex novo via mutation. Conversely, the most abundant alleles present in both isolates seem to represent pre-existing ‘founders’. The gp60 gene codes a zoite surface protein assumed to be exposed to the immune system. Many of the gp60 variants found in this study differed in the length of the translated polyserine chain, with possible implications on the protein’s antigenicity. As hypothesised for the multi-copy gene coding the PfEMP1 protein in Plasmodium falciparum (Kyes et al., 2007), intra-host antigenic variation of the gp60 could facilitate the parasite’s evasion from the immune response in the course of the infection. In a similar vein, antigenic variation could hinder the development of vaccines based on the gp60. We used faecal DNA, which may contain genomes of free sporozoites and merozoites. An analysis of purified oocysts would have allowed us to assess the genetic diversity of sporozoites residing inside the oocysts, thus likely to infect and propagate in space and time. Unfortunately, Is8 and Is9 were collected in 2006, and when these analyses were performed the oocysts were presumed to be empty. In summary, the isolates analysed in this study exhibited a complex genetic makeup, representing a microcosm of their local population of origin. The prevalence of this phenomenon is unknown. Nevertheless, this result per se exposes the Achilles’ heel of Sanger
sequencing of amplicons as a genotyping tool for C. parvum, and points out new, stimulating research directions aimed at exploring the intra-host genetic diversity of this parasite. Acknowledgments We are grateful to Errol Kwan (Protozoa Research Unit, Massey University) for searching the New Zealand Cryptosporidium database and Giovanni Widmer (Tufts University) for providing non-IIa gp60 sequences used for mapping the NGS file. Fig. 2 was designed by Peter Parkinson. This study was funded by McGeorge Research Fund, New Zealand.
Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.meegid.2012. 08.017. References Alizon, S., van Baalen, M., 2008. Acute or chronic? Within-host models with immune dynamics, infection outcome, and parasite evolution. The American Naturalist 172, E244–E256. Bacon, A.L., Farrington, S.M., Dunlop, M.G., 2000. Sequence interruptions confer differential stability at microsatellite alleles in mismatch repair-deficient cells. Human Molecular Genetics 9, 2707–2713. Caccio, S.M., Thompson, R.C.A., McLauchlin, J., Smith, H.V., 2005. Unravelling Cryptosporidium and Giardia epidemiology. Trends in Parasitology 21, 430–437. Cevallos, A.M., Zhang, X., Waldor, M.K., Jaison, S., Zhou, X., Tzipori, S., Neutra, M.R., Ward, H.D., 2000. Molecular cloning and expression of a gene encoding Cryptosporidium parvum glycoproteins GP40 and gp15. Infection and Immunity 68, 4108–4116. Domingo, E., Baranowski, E., Ruiz-Jarabo, C.M., Martin-Hernandez, A.M., Saiz, J.C., et al., 1998. Quasispecies structure and persistence of RNA viruses. Emerging Infectious Diseases 4, 521–527. Domingo, E., Martin, V., Perales, C., Grande-Perez, A., Garcia-Arriaza, J., et al., 2006. Viruses as quasispecies: biological implications. Current Topics in Microbiology and Immunology 299, 51–82. Fayer, R., 2008. Biology. In: Fayer, R., Xiao, L. (Eds.), Cryptosporidium and Cryptosporidiosis, Second ed. CRC Press and IWA Publishing, Boca Raton, FL, pp. 1–42.
24
A. Grinberg et al. / Infection, Genetics and Evolution 15 (2013) 18–24
Grinberg, A., Learmonth, J., Kwan, E., Pomroy, W.E., Lopez Villalobos, N., Gibson, I., Widmer, G., 2008. Genetic diversity and zoonotic potential of Cryptosporidium parvum causing foal diarrhea. Journal of Clinical Microbiology 46, 2396–2398. Grinberg, A., Markovics, A., Galindez, J., Lopez-Villalobos, N., Kosak, A., Tranquillo, V.M., 2002. Controlling the onset of natural cryptosporidiosis in calves with paromomycin sulphate. Veterinary Record 151, 606–608. Grinberg, A., Pomroy, W.E., Squires, R.A., Scuffham, A., Pita, A., Kwan, E., 2011. Retrospective cohort study of an outbreak of cryptosporidiosis caused by a rare Cryptosporidium parvum subgenotype. Epidemiology and Infection 139, 1542– 1550. Khramtsov, N., Tilley, N., Blunt, D.N., Montelone, B.A., Upton, S.J., 1995. Cloning and analysis of a Cryptosporidium parvum gene encoding a protein with homology to cytoplasmic form Hsp70. Journal of Eukariotic Microbiology 42, 416–422. Klintschar, M., Wiegand, P., 2003. Polymerase slippage in relation to the uniformity of tetrameric repeat stretches. Forensic Science International 135, 163–166. Kyes, S.A., Kraemer, S.M., Smith, J.D., 2007. Antigenic variation in Plasmodium falciparum: gene organization and regulation of the var multigene family. Eukaryotic Cell 6, 1511–1520. Leoni, F., Mallon, M.E., Smith, H.V., Tait, A., McLauchlin, J., 2007. Multilocus analysis of Cryptosporidium hominis and Cryptosporidium parvum isolates from sporadic and outbreak-related human cases and C. parvum isolates from sporadic livestock cases in the United Kingdom. Journal of Clinical Microbiology 45, 3286–3294. Li, H., Durbin, R., 2009. Fast and accurate short read alignment with burrows– wheeler transform. Bioinformatics 25, 1754–1760. Mallon, M.A., MacLeod, J., Wastling, H., Smith, H., Reilly, B., Tait, A., 2003. Population structures and the role of genetic exchange in the zoonotic pathogen Cryptosporidium parvum. Molecular Genetics and Evolution 56, 407–417. McMichael, A.J., Phillips, R.E., 1997. Escape of human immunodeficiency virus from immune control. Annual Review of Immunology 15, 271–296. Nowak, M.A., May, R.M., Anderson, R.M., 1990. The evolutionary dynamics of HIV-1 quasispecies and the development of immunodeficiency disease. AIDS 4, 1095– 1103. Pangasa, A., Jex, A.R., Nolan, M.J., Campbell, B.E., Haydon, S.R., Stevens, M.A., Gasser, R.B., 2010. Highly sensitive non-isotopic restriction endonuclease fingerprinting of nucleotide variability in the gp60 gene within Cryptosporidium species, genotypes and subgenotypes infective to humans, and its implications. Electrophoresis 31, 1637–1647. Petes, T.D., Greenwell, P.W., Dominska, M., 1997. Stabilization of microsatellite sequences by variant repeats in the yeast Saccharomyces cerevisiae. Genetics 146, 491–498.
Rochelle, P.A., De Leon, R., Atwill, E.R., 2000. Intra-isolate heterogeneity and reproducibility of PCR-based genotyping of Cryptosporidium parvum using the – tubulin gene. Quantitative Microbiology 2, 87–101. Rolfsmeier, M.L., Lahue, R.S., 2000. Stabilizing effects of interruptions on trinucleotide repeat expansions in Saccharomyces cerevisiae. Molecular and Cellular Biology 20, 173–180. Strong, W.B., Gut, J., Nelson, R.G., 2000. Cloning and sequence analysis of a highly polymorphic Cryptosporidium parvum gene encoding a 60-kilodalton glycoprotein and characterization of its 15- and 45-kilodalton zoite surface antigen products. Infection and Immunity 68, 4117–4134. Sulaiman, I.M., Hira, P.R., Zhou, L., Al-Ali, F.M., Al-Shelahi, F.A., Shweiki, H.M., Iqbal, J., Khalid, N., Xiao, L., 2005. Unique endemicity of cryptosporidiosis in children in Kuwait. Journal of Clinical Microbiology 43, 2805–2809. Tanrıverdi, S., Grinberg, A., Chalmers, R.M., Hunter, P.R., Petrovic, Z., Akiyoshi, D.E., London, E., Zhang, L., Tzipori, S., Tumwine, J.K., Widmer, G., 2008. Inferences about the global population structure of Cryptosporidium parvum and Cryptosporidium hominis. Applied and Environmental Microbiology 27, 7227– 7234. Waldron, L.S., Power, M.L., 2011. Fluorescence analysis detects gp60 subtype diversity in Cryptosporidium infections. Infection Genetics and Evolution 11, 1388–1395. Walsh, S., Fildes, N.J., Reynolds, R., 1996. Sequence analysis and characterisation of stutter products at the tetranucleotide repeat locus vVWA. Nucleic Acids Research 24, 2807–2812. Widmer, G., Lee, Y., 2010. Comparison of single- and multilocus genetic diversity in the protozoan parasites Cryptosporidium parvum and C. hominis. Applied and Environmental Microbiology 76, 6639–6644. Widmer, G., Tchack, L., Chappell, C.L., Tzipori, S., 1998. Sequence polymorphism in the beta-tubulin gene reveals heterogeneous and variable population structures in Cryptosporidium parvum. Applied and Environmental Microbiology 64, 4477– 4481. Widmer, G., Lee, Y., Hunt, P., Martinelli, A., Tolkoff, M., Bodie, K., 2012. Comparative genome analysis of two Cryptosporidium parvum isolates with different host range. Infection, Genetics and Evolution 12 (6), 1213–1221. Zintl, A., Ezzaty-Mirashemi, M., Chalmers, R., Elwin, K., Mulcahy, G., Lucy, F.E., de Waal, T., 2011. Longitudinal and spatial distribution of GP60 subtypes in human cryptosporidiosis cases in Ireland. Epidemiology and Infection 139, 1945–1955.