Infection, Genetics and Evolution 10 (2010) 1101–1109
Complete genome characterization of Genogroup II norovirus strains from India: Evidence of recombination in ORF2/3 overlap
Preeti Chhabra a, Atul M. Walimbe b, Shobha D. Chitambar a,*
a Enteric Viruses Group, National Institute of Virology, 20-A, Dr. Ambedkar Road, Pune 411001, India
b Bioinformatics Group, National Institute of Virology, Pune, India
ARTICLE INFO
Article history: Received 7 April 2010; Received in revised form 3 July 2010; Accepted 5 July 2010; Available online 13 July 2010
ABSTRACT
Noroviruses (NoVs) are considered important causative agents of non-bacterial acute gastroenteritis worldwide. Data on NoV genomes, their diversity and evolution from the Indian subcontinent are not available to date. The present study describes, for the first time, the characterization of full-length genomes of Indian NoV strains to establish their phylogenetic and evolutionary relationships with those circulating worldwide. Full-length genomes of three NoV strains (PC15, PC51 and PC52) were amplified using nine overlapping sets of forward and reverse primers. The full-length genomes of all three strains were characterized by phylogenetic, SimPlot, selection pressure and hydrophilicity analyses. The strain PC15 was placed in the GII.4-Hunter subcluster. An intragenotype recombination event between ORF2 (new GII.4 variant) and ORF3 (Den Haag subcluster) of the strain PC51 was detected for the first time in this study. The strain PC52 showed the commonly detected intergenotype recombination, GII.b/GII.3. A 16 amino-acid signature code (TDVVYYAGASQPRDDI) was identified in the ORF2 of recombinant GII.3 specificity strains, which may serve as a genetic marker for differentiating these strains from non-recombinant GII.3 strains. The amino-acid substitutions in the ORF2 of the PC51 and PC52 strains, in comparison with the reference strains (Toyama1 and TV24), increased the hydrophilicity, suggesting alterations in the antigenic regions of Indian NoV strains. A unique pattern of amino-acid substitutions was observed within seven subclusters of GII.4 at 19 sites (including 13 sites under positive selection pressure) spanning the entire ORF2. The study indicates adaptation of NoVs in the environment to escape the host immune response and to persist in the population. It also provides in-depth analyses of NoV genomes from India and determines the extent of conserved and variable features of the Indian NoV strains.
© 2010 Elsevier B.V. All rights reserved.
Keywords: Norovirus; GII.4; GII.b/GII.3; New GII.4 variant; Intergenotype recombination; Intragenotype recombination; Complete genome; India
* Corresponding author. Tel.: +91 020 26127301; fax: +91 020 26122669. E-mail address: [email protected] (S.D. Chitambar).
1567-1348/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2010.07.007
1. Introduction
Noroviruses (NoVs) are widely recognized as important causative agents of outbreaks of non-bacterial acute gastroenteritis; however, they also cause sporadic infections (Buesa et al., 2002; Lindell et al., 2005). These viruses belong to the family Caliciviridae and are non-enveloped, icosahedral viruses with a positive-sense, single-stranded RNA genome of 7.5 kb (Green et al., 2001). Three open reading frames (ORFs) have been identified in NoV genomes. ORF1 encodes a polyprotein that is cleaved into six non-structural (NS) proteins, which carry amino acid sequence motifs conserved in NTPase, protease and RNA-dependent RNA polymerase (RdRp) (Lee et al., 1977; Rueckert and Wimmer, 1984). ORF2 encodes a major structural protein, viral protein 1 (VP1), consisting of two domains: the shell domain (S) and the protruding arm (P), which is further divided into two subdomains, P1 and P2. The S domain is highly conserved while the P domain is variable. The P2 subdomain is hypervariable and carries immune and cellular recognition sites (Tan et al., 2003; Nilsson et al., 2003; Lochridge et al., 2005). ORF3 encodes a minor capsid protein, VP2, which is rich in basic amino acids and is proposed to have a role in viral stability (Glass et al., 2000; Bertolotti-Ciarlet et al., 2003).
NoVs are divided into five major groups, Genogroup I (GI) to Genogroup V (GV), according to the amino acid sequence diversity of the VP1 gene (Zheng et al., 2006). GI, GII and GIV primarily infect humans, while GIII and GV infect bovine and murine species, respectively. GI, GII and GIII are further subdivided into 8, 17 and 2 genetic clusters/genotypes, respectively, whereas GIV and GV have 1 genotype each (Zheng et al., 2006). Genotyping based solely on capsid sequences is not sufficient to characterize recombinant NoVs with different capsid and polymerase specificities, including those with unclassified polymerases (GII.a, GII.b, GII.c and GII.d) in combination with known capsids (Bull et al., 2007).
Molecular epidemiology of NoVs suggests that GII is the most common genogroup in circulation, with variants of genotype 4 causing the majority of outbreaks and sporadic infections worldwide (Noel et al., 1999; Castilho et al., 2006). However, a recombinant NoV, GII.b/GII.3, has emerged recently as the main causative agent of many outbreaks across Europe, Australia and Asia (Ambert-Balay et al., 2005; Bull et al., 2005; Phan et al., 2006, 2007).
Complete genome studies play an important role in establishing the phylogenetic and evolutionary relationships of NoVs with other members of the genogroups/genotypes circulating worldwide. Full-length human NoV genome sequencing has been done for more than 100 strains from around the world (Thackray et al., 2007). However, data on NoV genomes, their diversity and evolution from the Indian subcontinent are not available to date. In the present study, characterization of full-length genomes of norovirus GII.4 (Hunter subcluster), its intragenotype recombinant {new GII.4 variant (ORF2)/Den Haag subcluster (ORF3)} and an intergenotype recombinant {GII.b (RdRp)/GII.3 (Capsid)} was carried out to determine the extent of conserved and variable features of the Indian NoV strains in relation to strains from other geographic regions.
2. Materials and methods
2.1. Specimens
In a study conducted on NoV surveillance in patients with acute gastroenteritis from western India during 2005–2007, GII.4 and recombinant GII.b/GII.3 strains predominated in the years 2005–2006 and 2006–2007, respectively (Chhabra et al., 2009). The study also indicated the occurrence of “novel GII.4 variants” in the year 2007 on the basis of a 300 bp region at the 5′ end of ORF2 (Chhabra et al., 2009). One strain each from the GII.4 (PC15), GII.b/GII.3 (PC52) and “new GII.4 variants” (PC51) specificities was selected for complete-genome sequencing and analysis.
Two of these strains were recovered from patients presenting with severe disease (PC51, PC52) while the remaining one (PC15) was obtained from a patient suffering from moderate disease as per the Vesikari scoring system (Ruuska and Vesikari, 1990).
2.2. RNA extraction and RT-PCR
The viral RNA was extracted from 30% fecal suspensions using Trizol LS reagent (Invitrogen, USA) according to the manufacturer’s instructions. The complete genomes of the NoV strains were amplified from single cDNAs prepared using oligo(dT) (Invitrogen) and Superscript II RT (Invitrogen) at 45 °C for 1 h. A total of 10 overlapping sets of primers spanning the entire genome were designed using sequence information available in GenBank (www.ncbi.nlm.nih.gov) (Table 1). PCR amplification was carried out using the BD Advantage 2 PCR kit (BD Clontech, USA). Briefly, initial denaturation was carried out at 94 °C for 5 min, followed by 40 cycles of 94 °C for 1 min, 50 °C for 1 min and 68 °C for 3 min, with a final extension at 70 °C for 7 min. The amplified products were analyzed on 2% agarose gels stained with 0.5 µg/ml ethidium bromide.
2.3. Nucleotide sequencing
All RT-PCR products were excised from the gel and purified using the QIAquick gel extraction kit (QIAGEN, UK). This was followed by cycle sequencing using the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems, USA). The nucleotide sequences were determined in an ABI 3130 sequencer (Applied Biosystems). The nucleotide sequences reported in this study for the PC15, PC51 and PC52 NoV strains are deposited in GenBank under the accession numbers EU921344, EU921388 and EU921389, respectively.
2.4. Phylogenetic analysis
The multiple sequence alignment was carried out in the CLUSTAL W program (Thompson et al., 1994) and the phylogenetic analysis of the aligned sequences was carried out using MEGA 4 (Tamura et al., 2007). The phylogenetic tree was generated using the neighbor-joining algorithm and the Kimura 2-parameter distance model.
2.5. Selection pressure analysis
The deduced amino acid sequences of ORF2 of the strains PC15 (GII.4-Hunter subcluster) and PC51 (new GII.4 variant) were aligned with those of strains selected from each of the 6 known subclusters (Camberwell, Grimsby, Farmington Hills, Hunter, Sakai and Den Haag) and the 7th, new (GII.4 variant) subcluster.
The alignment of deduced amino acid sequences of ORF2 of the GII.3 and GII.b/GII.3 strains was carried out separately. Selection pressure analysis was carried out using HyPhy software (Kosakovsky Pond et al., 2005).
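As an illustration of the Kimura 2-parameter model named in Section 2.4, the pairwise distance between two aligned nucleotide sequences can be sketched as below. This is a minimal sketch and not the MEGA 4 implementation: the function name `k2p_distance` is hypothetical, and skipping any site with a gap or ambiguity code is an assumption made here for simplicity. With P and Q the proportions of transitions and transversions, the distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites containing a gap or an ambiguity code in either sequence are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    # Treat RNA input (U) like DNA (T) so both alphabets are accepted.
    pairs = zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T"))
    for a, b in pairs:
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the model to give a finite distance.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances; that step is not shown here.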
Table 1
List of primers used in the study.

S. No.  Name    Primer sequences          Position a  Ref.
1       NVF1    GTGAATGAAGATGGCGTCTA      1–20        This study
2       NVR1    AAACTCCAAAGAGCTCTGCCA     912–932     This study
3       NVF2    GGACTTTCGCAGGCATAGTGGAGT  876–899     This study
4       NVR2    TCCTGTGAGGGAGGCTGCGAT     1538–1558   This study
5       NVF3    GAGAATCGCTGCTGCACGTT      1402–1421   This study
6       NVR3    CGTCAGACTCAAGTGTGTAGGT    2291–2312   This study
7       NVF4    GATCGCAACAAAGTGCTTGCCT    2120–2141   This study
8       NVR4    CAACCTGAACCAAAGTTGACTA    3054–3075   This study
9       NVF5    GCGACCGAGGAAGACTTCTGTGA   2813–2835   This study
10      NVR5    TGGGCTCTGTAAATGGTTTCA     3770–3791   This study
11      NVF6    AGCACCAAGACGAAATTCTGGAG   3638–3660   This study
12      NVR6    ATGGAGTTCCATTGGGAGGTGCA   4481–4503   This study
13      NV4611  CWGCAGCMCTDGAAATCATGG     4338–4358   Yuen et al. (2001)
14      Mon383  CAAGAGACTGTGAAGACATCATC   5660–5682   Noel et al. (1997)
15      NVF7    AATGCTGTACACACCACTTAG     5627–5647   This study
16      NVR7    GAAGTGCTGCACCCATTCCT      6475–6495   This study
17      NVF8    GGTGAGCAACTTCTTTTCTT      6394–6413   This study
18      NVR8    TTTGTCATGGGGGCGTTGATT     7033–7053   This study
19      NVF9    GCACAAATTGAGGCCACTAA      6929–6948   This study
20      NVR9    AAAGACACTAAAGAAAAGGAAAGA  7563–7585   This study

a Genomic locations of primers are given as per Lordsdale virus genome (X86557).
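The overlap between adjacent amplicons implied by Table 1 can be checked with a short calculation. The sketch below assumes that consecutive rows of the table form forward/reverse pairs (NVF1/NVR1, NVF2/NVR2, and so on, with NV4611/Mon383 as the seventh pair) and that each amplicon spans from the first base of the forward primer to the last base of the reverse primer in Lordsdale virus coordinates. The pairing and the helper name `adjacent_overlaps` are assumptions made for illustration.

```python
# (primer pair, amplicon start, amplicon end) in Lordsdale virus (X86557)
# coordinates, read from the Position column of Table 1.
AMPLICONS = [
    ("NVF1/NVR1", 1, 932),
    ("NVF2/NVR2", 876, 1558),
    ("NVF3/NVR3", 1402, 2312),
    ("NVF4/NVR4", 2120, 3075),
    ("NVF5/NVR5", 2813, 3791),
    ("NVF6/NVR6", 3638, 4503),
    ("NV4611/Mon383", 4338, 5682),
    ("NVF7/NVR7", 5627, 6495),
    ("NVF8/NVR8", 6394, 7053),
    ("NVF9/NVR9", 6929, 7585),
]


def adjacent_overlaps(amplicons):
    """Return (pair_a, pair_b, overlap in nt) for each consecutive pair.

    A positive value means the two amplicons share that many bases;
    zero or a negative value would indicate a gap in coverage.
    """
    overlaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(amplicons, amplicons[1:]):
        overlaps.append((name_a, name_b, end_a - start_b + 1))
    return overlaps


if __name__ == "__main__":
    for name_a, name_b, overlap in adjacent_overlaps(AMPLICONS):
        print(f"{name_a} -> {name_b}: {overlap} nt overlap")
```

Under these assumptions every adjacent pair of amplicons overlaps, so the ten fragments tile the genome from position 1 to 7585 of the reference numbering.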
The maximum likelihood (ML) models of codon substitution that allow the dN/dS ratio to vary among sites were used (Yang et al., 2000). A dN/dS ratio > 1 indicates positive/adaptive selection. The likelihood ratio test (LRT) was used to compare the “Neutral” models with the “Selection” models. Sites with a Bayesian posterior probability above 0.9 were considered to be under positive selection pressure.
2.6. Hydropathy index
The Hopp–Woods scale (a hydrophilic index, with nonpolar residues assigned negative values) was used to compare the hydrophilic indices of the NoV GII.4 variant (PC51) and the recombinant NoV GII.b/GII.3 (PC52) strain in the antigenic region with those of the reference strains, Toyama1 (GII.4-Den Haag subcluster) and TV24 (GII.3), respectively. At each position, the mean hydrophilic index of the amino acids within the window was calculated and plotted at the midpoint of the window. A window size of 9 was used to determine the regions of maximal hydrophilicity.
2.7. SimPlot analysis
The genomes of all three strains under study (PC15, PC51 and PC52) were subjected to SimPlot analysis to look for evidence of recombination. The program SimPlot 3.5 was used to carry out the recombination analysis (Lole et al., 1999). The SimPlot was constructed using a window size of 600 and a step size of 3. BootScan analysis and the maximum χ² method were used to identify the putative breakpoint (Smith, 1992; Salminen et al., 1995). The p-value for each breakpoint was calculated using Fisher’s exact test based on the phylogenetically informative sites supporting alternative tree topologies.
3. Results
3.1. Genome
The strains PC15 (accession no. EU921344) and PC51 (accession no. EU921388) had a genome length of 7559 nucleotides excluding the 3′ poly(A) tail. ORF1, 2 and 3 comprised 5104 (1700), 1622 (540) and 852 (268) nucleotides (amino acids), respectively. In comparison, the size of the PC52 (accession no.
EU921389) genome was 7547 nucleotides and included three ORFs (1, 2 and 3) of 5100 (1699), 1647 (548) and 765 (254) nucleotides (amino acids) in length, respectively.
3.2. Phylogenetic analysis
3.2.1. ORF1 (non-structural proteins)
The phylogenetic analysis of the deduced amino acid sequences of the complete ORF1 region (1700 aa) placed the PC15 and PC51 strains in the GII.4 genetic cluster, with 96–98% and 95–96% amino acid identities, respectively, with the reference strains Lordsdale virus (X86557), MD145 (AY032605), Farmington Hills virus (AY502023), Sakai Virus (AB220922) and Toyama1/2006 (AB447443) (Fig. 1a). However, with the other GII strains included in the study, the amino acid identities of PC15 and PC51 ranged from 86–96% and 69–96%, respectively. The pairwise analysis of the deduced amino acid sequence of the complete ORF1 region (1699 aa) of PC52 showed 94% identity with the strain MD145 (AY032605) of GII.4 specificity. The ORF1 region of the strain PC52 also showed 98% amino acid identity with the NoV strain TCH04-577 (AB365435), of unknown genotype specificity, available in GenBank. With the other GII strains included in the phylogenetic analysis (Fig. 1a), it showed 71–93% amino acid identities. Comparative analysis of the amino acid sequences of
the RdRp gene of the strain PC52 indicated 99% amino acid identity with the Paris Island strain (AY652979), confirming the presence of the GII.b polymerase.
3.2.2. ORF2 (VP1 gene)
The deduced amino acid sequence of the full-length ORF2 region (VP1 gene) of the strain PC15 showed the highest (99%) identity with the strain Katori/041008 (AB294783) from the Hunter subcluster and 91–96% identities with GII.4 strains from the other five subclusters (Fig. 1b). The genetic marker of the Hunter subcluster, ‘TQN’, described earlier (Tu et al., 2008), was also identified in PC15 at positions 296–298 of the hypervariable region (P2 subdomain). The entire ORF2 region (VP1 gene) of the strain PC51 showed only 93.3–94.6% identities with the reference strains (Lordsdale virus (X86557), MD145 (AY032605), Farmington Hills virus (AY502023), Hunter virus (DQ078794), Sakai Virus (AB220922) and Toyama1 (AB447443)) of the six known GII.4 subclusters (Fig. 1b). Thus, the strain PC51 represented the ‘new GII.4 variant’ type according to the criteria described earlier (Tu et al., 2008). Complete VP1 gene sequences of new variant types from Japan, Egypt and USA are available to date in GenBank for six strains (OC07138 (AB434770), Cairo 2 (EU876882), Cairo 4 (EU876884), Cairo 8 (EU876888), SSCS (FJ411171) and RIS (FJ411171)). The strain PC51 showed 97–99% amino acid identities with all six strains. The comparative analysis of the deduced amino acid sequences of GII.4 strains placed in 7 subclusters (including the new GII.4 variants cluster) showed the presence of a unique amino acid motif, ‘LTGSADFA’, at positions 352–359 in the hypervariable region of the new variant types. The full-length ORF2 region of the strain PC52 showed 96–99% amino acid identities with the capsids of the Arg320 (AF190817), Paris Island (AY652979), Saga/5424 (AB242256), Oberhausen 455 (AF539440), TCH04-577 (AB365435), Bitburg/289 (AF427112) and Beijing/375 (EU850827) strains carrying GII.b/GII.3 specificity (Fig. 1b).
However, its amino acid identities with single (non-recombinant) GII.3 specificity strains were lower, at 94–96%. A pairwise analysis of the deduced amino acid sequences of the complete ORF2 region of randomly selected strains of single GII.3 and GII.b/GII.3 specificities indicated 16 informative sites (9, 30, 154, 289, 304, 311, 312, 333, 368, 381, 389, 392, 394, 407, 415 and 456) with different amino acid residues specific to the capsids of GII.b/GII.3 strains (Fig. 2).
3.2.3. ORF3 (VP2 gene)
The pairwise analysis of the deduced amino acid sequences of the entire ORF3 region (VP2 gene) of PC15 indicated 86–94% identities with the Lordsdale virus (X86557), MD145 (AY032605), Farmington Hills virus (AY502023), Sakai Virus (AB220922) and Toyama1/2006 (AB447443) strains of GII.4 specificity (Fig. 1c). The pairwise analysis of the complete ORF3 region (VP2 gene) of PC51 showed 99.3% amino acid identity with the Toyama1/2006 (AB447443) strain of the Den Haag subcluster but only 89–89.6% identities with those of the ‘new GII.4 variants’ (Cairo 2, Cairo 4 and Cairo 8), indicating a possibility of recombination between the ORF2 and ORF3 regions of the strain PC51 (Fig. 1c). The pairwise analysis of the complete ORF3 region of the strain PC52 revealed 96% identity with the NoV strain Paris Island (AY652979) of GII.b/GII.3 specificity. However, with the single GII.3 specificity strains included in the analysis (Fig. 1c), it showed 90.6–93.2% identity.
3.3. SimPlot analysis
The phylogenetic analysis of ORF2 (new GII.4 variant) and ORF3 (Den Haag subcluster) of the strain PC51 showed different specificities, indicating the possibility of intragenotype recombination between the two ORFs. To authenticate the recombination event, a continuous stretch of 2429 bp covering complete VP1
Fig. 1. Phylogenetic dendrogram of deduced amino acid sequences of complete (a) ORF1, (b) ORF2 and (c) ORF3 regions of NoV GII strains. The Indian strains are in boldface and underlined.
(ORF2, 1622 bp) and VP2 (ORF3, 807 bp) genes was subjected to SimPlot analysis. The estimated recombination breakpoint was at position 6706 (χ²sum = 27, p < 0.01), i.e. at the ORF2/3 overlap according to the genome of the PC51 strain (Fig. 3). No evidence of recombination was identified in the strain PC15 in SimPlot analysis. The strain PC52 carried a recombination of the known type GII.b/GII.3 and showed 97.9% nucleotide identity with the strain Sydney C14 of the same specificity in a continuous stretch of 3232 bp of the RdRp and capsid genes.
3.4. Selection pressure analysis
The comparative analysis of deduced amino acid sequences of the complete ORF2 region of selected strains (inclusive of PC15 of
the present study) from all six known subclusters of GII.4 and seven strains (PC51 of the present study along with OC07138, Cairo 2, Cairo 4, Cairo 8, SSCS and RIS) of the new GII.4 variant type (7th subcluster) indicated 13 sites (174, 196, 297, 333, 340, 352, 357, 368, 372, 382, 393, 394 (except Camberwell and Grimsby clusters) and 407) under selection pressure (Fig. 4). A pairwise comparative analysis of these sites identified a constant amino acid change in every new epidemic variant. For example, position 174 carried S in the strains from the Camberwell and Grimsby subclusters (1987–2002); S was replaced by P in the Farmington Hills and Hunter subclusters (2002–2004); the Sakai subcluster (2002–2006) again showed S at the same position; and in the new GII.4 variants (2007 cluster) S was again replaced by P (Fig. 4). A similar pattern of amino acid changes
Fig. 2. Comparative analysis of deduced amino acid sequences of ORF2 region of single and recombinant GII.3 specificity strains indicating 16 informative sites. *Site under positive pressure.
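One way the 16 informative sites could be used as a marker is to read the residues at those positions from a VP1 sequence and compare them with the signature code TDVVYYAGASQPRDDI. The sketch below assumes that the letters of the code follow the order in which the sites are listed (9, 30, ..., 456), which is consistent with the substitutions reported at positions 289, 333, 389 and 394, and that positions refer to ungapped VP1 numbering. The function names and the toy sequence are hypothetical; no real strain sequence is reproduced here.

```python
# 1-based ORF2 (VP1) positions of the 16 informative sites and the
# signature reported for GII.b/GII.3 capsids.
SIGNATURE_SITES = (9, 30, 154, 289, 304, 311, 312, 333,
                   368, 381, 389, 392, 394, 407, 415, 456)
RECOMBINANT_SIGNATURE = "TDVVYYAGASQPRDDI"


def signature_at_sites(vp1: str, sites=SIGNATURE_SITES) -> str:
    """Concatenate the residues found at the given 1-based positions."""
    return "".join(vp1[pos - 1] for pos in sites)


def has_recombinant_signature(vp1: str) -> bool:
    """True if all 16 sites carry the GII.b/GII.3 signature residues."""
    return signature_at_sites(vp1) == RECOMBINANT_SIGNATURE


# Toy 548-residue sequence: 'X' everywhere except the 16 informative
# sites, which are filled with the signature residues (illustration only).
residues = ["X"] * 548
for pos, aa in zip(SIGNATURE_SITES, RECOMBINANT_SIGNATURE):
    residues[pos - 1] = aa
toy_recombinant = "".join(residues)

# Same toy sequence with T instead of V at position 289.
residues[289 - 1] = "T"
toy_other = "".join(residues)
```

A strict all-sites match is a design choice of this sketch; a tolerant version could instead count how many of the 16 sites agree.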
was noted at six other sites (193, 255, 298, 356, 412 and 534) not identified as being under pressure by the criteria employed in the study (Fig. 4). The selection pressure analysis of ORF2 of the strains with single GII.3 and GII.b/GII.3 specificities revealed positive pressure at only one site (381).
3.5. Hydrophilicity index
As compared to the Toyama1 strain of the Den Haag subcluster, PC51 (new GII.4 variant) showed amino-acid substitutions at positions 306 (L-Q), 352 (Y-L), 357 (P-D), 259 (T-A) and 364 (S-R) in the hypervariable region (ORF2-P2 subdomain), indicating an increase in the hydrophilicity of the protein at pH 7.0 (Fig. 5a). Similarly, there was an increase in the hydrophilicity of the GII.3 capsid protein of the strain PC52 (GII.b/GII.3) as compared to the strain TV24 of single GII.3 specificity due to amino-acid substitutions at positions 289 (T-V), 333 (A-G), 389 (P-Q), 391 (Q-K) and 394 (K-R) in the hypervariable region (Fig. 5b).
4. Discussion
The emergence of NoVs as a major cause of epidemic gastroenteritis has increased the interest in the molecular epidemiology of these viruses (Kustera et al., 2008). It has been documented that human NoVs diverge by 45% in full-length genomes and by 57% in the VP1 gene (Thackray et al., 2007). Although information on complete genomes of many common and uncommon NoV strains circulating in different countries is available, no such data have been
Fig. 3. SimPlot analysis of nucleotide sequences of ORF2 and ORF3 of the strain PC51 identified in western India. Window size: 600 bp; step size: 3 bp. Vertical line indicates the putative recombination breakpoint.
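The sliding-window idea behind a similarity plot such as Fig. 3 can be sketched as follows. This is a simplified illustration that computes plain percent identity per window, not a reimplementation of SimPlot 3.5 (which offers distance models and BootScan analysis); the function name `windowed_identity` and the gap handling are assumptions. The default window and step mirror the values stated in the caption.

```python
def windowed_identity(query: str, reference: str, window: int = 600, step: int = 3):
    """Percent identity of query vs. reference in sliding windows.

    Both sequences must come from the same alignment (equal length).
    Returns a list of (window midpoint, percent identity) tuples, with
    the midpoint given as a 0-based alignment coordinate. Columns with
    a gap ('-') in either sequence are ignored.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")

    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        compared = sum(1 for a, b in zip(q, r) if a != "-" and b != "-")
        if compared == 0:
            continue  # window contains only gapped columns
        matches = sum(1 for a, b in zip(q, r) if a == b and a != "-")
        points.append((start + window // 2, 100.0 * matches / compared))
    return points
```

Plotting such a profile for the query against each putative parent and locating where the two curves cross gives a rough indication of a breakpoint; the statistical support reported in the study comes from the maximum χ² method, which is not reproduced here.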
documented to date from India. It is essential to acquire a complete understanding of locally circulating Indian NoV strains, firstly for the diagnosis of NoV infection and secondly for its prevention. The study presented here reports the analysis of three complete genomes of NoV strains from India: PC15 with GII.4 (Hunter subcluster) specificity, PC51 with intragenotype recombinant specificity {new GII.4 variant (ORF2)/Den Haag subcluster (ORF3)} and PC52 with intergenotype recombinant specificity (GII.b polymerase and GII.3 ORF2).
Molecular epidemiology data from different parts of the world have revealed that GII.4 is the most common genotype in circulation and that it is a major causative agent in most outbreaks and sporadic infections (Noel et al., 1999; Castilho et al., 2006). Its pandemic spread was first recognized in the mid-1990s (Noel et al., 1999). Since then, it has been continuously evolving and persisting in the population (Lindesmith et al., 2008). Six subclusters within the GII.4 cluster have been described on the basis of their capsid diversities as per the period of their predominance (Lindesmith et al., 2008). These include the Camberwell cluster (1987–1995), Grimsby cluster (1995–2002), Farmington Hills cluster (2002–2004), Hunter cluster (2002–2004), Sakai cluster (2004–2006) and Den Haag cluster (2006b variants) (Lindesmith et al., 2008). The Hunter subcluster of GII.4 predominated in the years 2002–2004. Although the origin of the Hunter virus is unknown, it first appeared in New South Wales in February 2004 and subsequently in the Netherlands, Taiwan and Japan (Kroneman et al., 2004; Bull et al., 2006). The capsid diversity in NoV strains from the Hunter subcluster is well studied (Bull et al., 2006); however, analysis of the complete genome has not been done for any of the strains to date.
Ninety-nine percent amino acid homology of the strain PC15 with the Katori/041008 strain from the Hunter subcluster and the presence of the genetic marker ‘TQN’ at positions 296–298 in ORF2 (Tu et al., 2008) confirmed its relatedness to the Hunter subcluster. Thus, the strain PC15 represents the first NoV strain in the Hunter subcluster analyzed phylogenetically for all three ORFs. The strain PC51 represented a “novel GII.4 variant” type with >5% amino acid variation in ORF2 (VP1 gene) as compared to the prototype strains of the six GII.4 subclusters. A molecular epidemiology study conducted on NoV strains from western India has described the circulation of the Den Haag subcluster in the years 2006–2007, just before the appearance of the ‘new variant type’ in 2007 (Chhabra et al., 2009). These data support the concept of continuous replacement of GII.4 norovirus strains by the introduction of new variants that promote their persistence in the human population (Lindesmith et al., 2008). The presence of the amino acid motif ‘SRN’ (genetic marker of the Den Haag cluster) at positions
Fig. 4. Nineteen informative sites identified in ORF2 of GII.4 strains indicating a constant amino acid change in every new epidemic variant. *Sites not under selection pressure.
296–298 in ORF2 of the strain PC51 is indicative of its emergence from the strains of the Den Haag subcluster. Strains with the new variant type specificity have also been reported to cause infections in greater Cairo, Egypt in 2006 and in Eastern India in 2007 (Kamel et al., 2009; Nayak et al., 2009). Sequence data deposited in GenBank suggest circulation of the new variants in Japan and USA. Interestingly, a unique amino acid motif, ‘LTGSADFA’, at positions 352–359 in the hypervariable region is recognized in all of the strains (PC51, SSCS, RIS, Cairo 2, Cairo 4, Cairo 8 and OC07138) of the new variant type available in GenBank.
The capsid protein of NoVs has two major domains, the Shell (S) (1–225 aa) and the protruding arm (P), which is divided into the P1 (226–278 and 406–520 aa) and P2 (279–405 aa) subdomains. The P2 subdomain in ORF2 carries the highest sequence variability and is considered an important site in conferring the antigenic and receptor binding specificity of NoVs (Lochridge et al., 2005; Nilsson et al., 2003; Tan et al., 2003). It is noteworthy that most of the unique amino-acid substitutions in the capsid protein of the new variant PC51 were identified in the hypervariable region (Fig. 5a). It is known that amino acids vary in their hydrophilic or hydrophobic character depending on the polarity of the side chain. In the present study, unique amino-acid substitutions in the NoV strain PC51 increased the hydrophilicity of the hypervariable region as compared to that of the Toyama1 strain (Den Haag subcluster). It has been suggested that the P2 subdomain is evolving by positive selection in response to herd immunity (Pal et al., 2006; Siebenga et al., 2007). Selection pressure analysis carried out in the present study showed positive selection at 13 sites across the entire ORF2 (capsid gene). The majority of these sites were in the P2 subdomain
(Fig. 4). These observations are in agreement with an earlier study that identified positive selection at 10 sites (Lindesmith et al., 2008). Interestingly, all 13 sites, along with six additional sites identified in the present study, revealed a particular pattern of amino-acid substitutions with every new epidemic variant (Fig. 4). Five of these 19 sites have been described earlier as showing a similar pattern of amino-acid substitutions in GII.4 strains from the Netherlands (Siebenga et al., 2007). These findings indicate the evolutionary means adopted by GII.4 strains to escape the immune response and persist in the population.
Recombinant NoV strains play a major role in causing outbreaks and sporadic infections worldwide (Ambert-Balay et al., 2005; Bull et al., 2005). A recombinant NoV displays separate genetic specificities when different regions of its genome are subjected to phylogenetic analysis (Bull et al., 2005). Among the three types of recombination reported for NoVs, viz. intergenogroup, intergenotype and intragenotype, intergenotype recombination is the most frequently detected (Bull et al., 2007; Phan et al., 2007). Recombinant NoVs are described to occur naturally within GI, GII and GIII (Katayama et al., 2002; Han et al., 2004; Phan et al., 2007). While only one intergenogroup (GI.3/GII.4) recombination is reported (Nayak et al., 2008), nine intragenotype NoV GII recombinants have been published to date (Rohayem et al., 2005; Etherington et al., 2006; Phan et al., 2006a). All the recombination events reported to date are either between ORF1 and ORF2 or within the RdRp and ORF2 regions (Bull et al., 2007; Waters et al., 2007; Rohayem et al., 2005). This study documents for the first time evidence of an intragenotype recombination event between ORFs 2 and 3, in the strain PC51. The strain represented ‘new GII.4 variant’ specificities in the ORF1 and ORF2 regions; however, it was closer to the Den Haag subcluster in the ORF3
Fig. 5. The hydrophilic indices of ORF2 of (a) GII.4 variant (PC51) compared with the Toyama1 strain (GII.4-Den Haag subcluster) and (b) recombinant GII.b/GII.3 (PC52) compared with the TV24 (GII.3) strain. The circles indicate an increase in hydrophilicity of the strains PC51 and PC52 in the antigenic region.
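The windowed hydrophilicity calculation behind profiles such as those in Fig. 5 can be sketched as follows, using the published Hopp–Woods values and the window size of 9 given in Section 2.6. The function name `hydrophilicity_profile` and the toy peptides are hypothetical and serve only to show how a single substitution (for example L to Q, as reported at position 306 of PC51) shifts the window mean; they are not sequences from the study.

```python
# Hopp-Woods hydrophilicity values (Hopp and Woods, 1981); nonpolar
# residues carry negative values.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein: str, window: int = 9):
    """Mean Hopp-Woods index per sliding window.

    Returns (midpoint position, mean index) tuples, with positions
    1-based so they can be compared with residue numbering.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")

    half = window // 2
    profile = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(HOPP_WOODS[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


# Toy 9-residue peptides differing by one L -> Q substitution.
before = hydrophilicity_profile("GGGGLGGGG")
after = hydrophilicity_profile("GGGGQGGGG")
```

In this toy case the window mean rises after the substitution, which is the kind of local increase marked by the circles in the figure.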
region. It is possible that the recombination event between the two ORFs (2 and 3) occurred when two parent NoV strains (new GII.4 variant and Den Haag subcluster) coinfecting one cell came in physical contact with each other. It may be noted that the new GII.4 variants and the strains with Den Haag subcluster specificity were detected co-circulating in the study region during the year 2007 (Chhabra et al., 2009). Since ORF3 (VP2 gene) is responsible for virus stability, it is possible that the ORF3 of Den Haag specificity may have provided stability to the new GII.4 variant, allowing it to persist in the population. The study of recombination events between ORF2 and ORF3 needs to be extended to identify the emergence/existence of more stable NoV strains in the environment.
With the identification of NoV strains carrying unclassified polymerases (GII.a, GII.b, GII.c and GII.d) with increasing capacity to switch their capsid coat, the number of reports on recombination in NoVs has increased in recent years (Bull et al., 2007). The polymerase type GII.b emerged in Europe in 2000–2001 and is known to be associated with six different capsid genotypes (GII.1, GII.2, GII.3, GII.4, GII.7 and GII.18), of which GII.b/GII.3 is the most common type of recombination found (Buesa et al., 2002; Reuter et al., 2006; Chhabra et al., 2009; Nayak et al., 2009). This recombinant type is responsible for several outbreaks of gastroenteritis across Europe, Australia and Asia (Bull et al., 2005; Phan et al., 2006a,b). The first full-length sequence of the GII.b polymerase was reported for the strain Sydney C14 (AY845056) (Bull et al., 2005). However, complete genome analysis has not
been carried out for any of the GII.b/GII.3 strains to date. The NoV strain PC52, with GII.b/GII.3 specificity, is the first such strain to be analyzed phylogenetically across its complete genome. In the study presented here, the amino acid sequence of the ORF1 region in the PC52 strain was found to be closer to its counterpart in strains with GII.4 genotype specificity, and thus indicated its origin in GII.4 strains. The predominance of GII.4 strains was recognized worldwide prior to the emergence of the GII.b polymerase in the years 2000–2001. It has been suggested that, in order to escape a population bottleneck, the viruses tend to switch their capsid coats carrying the antigenic determinants (Bull et al., 2007). It is likely that the ORF1 of GII.4 strains gradually varied to increase its capacity to switch the capsid coat efficiently in order to escape the immune response and persist in the population for a longer time.
The capsid sequences of GII.b/GII.3 strains are well characterized. However, their comparative analysis with GII.3 single specificity strains has not been detailed. The identification in the present study of 16 informative sites in ORF2 of GII.b/GII.3 strains with an amino acid composition different from that of GII.3 strains is indicative of changes that might have occurred in the capsid during the recombination event (Fig. 2). These informative sites may serve as a genetic marker/signature code (TDVVYYAGASQPRDDI) for differentiating the capsids of single and recombinant GII.3 specificity strains. The amino-acid substitutions in the ORF2 of recombinant strains that increased the hydrophilicity of the protein in the hypervariable region are indicative of a change in the antigenic determinants of
capsid (Fig. 5b). The selection pressure analysis identified positive selection among these strains at one site (381). Such alterations probably help the recombinant strains to escape the immune response and spread easily in the population. Overall, to increase their fitness and virulence, NoV strains tend to use the replicative potential of ORF1 and the antigenic diversity of ORF2 to escape environmental selection pressure and avoid extinction.
To conclude, the present study documents in-depth analyses of the complete genomes of three NoV strains circulating in the Indian population. The study has generated baseline data on Indian NoV strains and highlighted the genetic drift in NoV strains that may be linked to the evolution, sustenance and spread of NoVs in the population.
Acknowledgements
We are grateful to Dr. A.C. Mishra, Director, National Institute of Virology, Pune for his constant support. We also acknowledge the cooperation extended by Drs. R. Dhongade, V. Kalrao and A.R. Bavdekar for clinical specimens.
References
Ambert-Balay, K., Bon, F., Le Guyader, F., Pothier, P., Kohli, E., 2005. Characterization of new recombinant noroviruses. J. Clin. Microbiol. 43, 5179–5186.
Bertolotti-Ciarlet, A., Crawford, S.E., Hutson, A.M., Estes, M.K., 2003. The 3′ end of Norwalk virus mRNA contains determinants that regulate the expression and stability of the viral capsid protein VP1: a novel function for the VP2 protein. J. Virol. 77, 11603–11615.
Buesa, J., Collado, B., López-Andújar, P., Abu-Mullouh, R., Díaz, J.R., Díaz, A.G., Part, J., Guix, S., Llovet, T., Prats, G., Bosch, A., 2002. Molecular epidemiology of caliciviruses causing outbreaks and sporadic cases of acute gastroenteritis in Spain. J. Clin. Microbiol. 40, 2854–2859.
Bull, R.A., Hansman, G.S., Clancy, L.E., Tanaka, M.M., Rawlinson, W.D., White, P.A., 2005. Norovirus recombination in ORF1/ORF2 overlap. Emerg. Infect. Dis. 11, 1079–1085.
Bull, R.A., Tu, E.T.V., McIver, C.J., Rawlinson, W.D., White, P.A., 2006. Emergence of a new norovirus genotype II.4 variant associated with global outbreaks of gastroenteritis. J. Clin. Microbiol. 44, 327–333.
Bull, R.A., Tanaka, M.M., White, P.A., 2007. Norovirus recombination. J. Gen. Virol. 88, 3347–3359.
Castilho, J.G., Munford, V., Resque, H.R., Fagundes-Neto, U., Vinjé, J., Rácz, M.L., 2006. Genetic diversity of norovirus among children with gastroenteritis in São Paulo state, Brazil. J. Clin. Microbiol. 44, 3947–3953.
Chhabra, P., Dhongade, R.K., Kalrao, V.R., Bavdekar, A.R., Chitambar, S.D., 2009. Epidemiological, clinical and molecular features of norovirus infections in western India. J. Med. Virol. 81, 922–932.
Etherington, G.J., Dicks, J., Roberts, I.N., 2006. High throughput sequence analysis reveals hitherto unreported recombination in the genus Norovirus. Virology 345, 88–95.
Glass, P.J., White, L.J., Bull, J.M., Leparc-Goffart, I., Hardy, M.E., Estes, M.K., 2000. Norwalk virus open reading frame 3 encodes a minor structural protein. J. Virol. 74, 6581–6591.
Green, K.Y., Chanock, R.M., Kapikian, A.Z., 2001. Human caliciviruses. In: Knipe, D.M., Howley, P.M. (Eds.), Fields Virology. Lippincott Williams and Wilkins, Philadelphia, PA, pp. 841–874.
Han, M.G., Smiley, J.R., Thomas, C., Saif, L.J., 2004. Genetic recombination between two genotypes of genogroup III bovine noroviruses (BoNVs) and capsid sequence diversity among BoNVs and Nebraska-like bovine enteric caliciviruses. J. Clin. Microbiol. 42, 5214–5224.
Kamel, A.H., Ali, M.A., El-Nady, H.G., de Rougemont, A., Pothier, P., Belliot, G., 2009. Predominance and circulation of enteric viruses in the region of greater Cairo, Egypt. J. Clin. Microbiol. 47, 1037–1045.
Katayama, K., Horikoshi-Shirato, H., Kojima, S., Kageyama, T., Oka, T., Hoshino, F., Fukushi, S., Shinohara, M., Uchida, K., Suzuki, Y., Gojobori, T., Takeda, N., 2002. Phylogenetic analysis of the complete genome of 18 Norwalk-like viruses. Virology 299, 225–239.
Kosakovsky Pond, S.L., Frost, S.D.W., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679.
Kroneman, A., Vennema, H., van Duijnhoven, Y., Duizer, E., Koopmans, M., 2004. High number of norovirus outbreaks associated with a GGII.4 variant in The Netherlands and elsewhere: does this herald a worldwide increase? Eurosurveill. Wkly. 8, 51–52.
Kustera, J., Ayers, M., Nishikawa, J., McIntyre, L., Petric, M., Tellier, R., 2008. Amplification of long RT-PCR of near full-length norovirus genomes. J. Virol. Methods 149, 226–230.
Lee, Y.F., Nomoto, A., Detjen, B.M., Wimmer, E., 1977. A protein covalently linked to poliovirus genome RNA. Proc. Natl. Acad. Sci. U.S.A. 74, 59–63.
Lindell, A.T., Grillner, L., Svensson, L., Wirgart, B.Z., 2005. Molecular epidemiology of norovirus infections in Stockholm, Sweden, during the years 2000 to 2003: association of the GGIIb genetic cluster with infection in children. J. Clin. Microbiol. 43, 1086–1092.
Lindesmith, L.C., Donaldson, E.F., LoBue, A.D., Cannon, J.L., Zheng, D.-P., Vinjé, J., Baric, R.S., 2008. Mechanism of GII.4 norovirus persistence in human populations. PLoS Med. 5, e31.
Lochridge, V.P., Jutila, K.I., Graff, J.W., Hardy, M.E., 2005. Epitopes in the P2 domain of norovirus VP1 recognized by monoclonal antibodies that block cell interactions. J. Gen. Virol. 86, 2799–2806.
Lole, K.S., Bollinger, R.C., Paranjape, R.S., Gadkari, D., Kulkarni, S.S., Novak, N.G., Ingersoll, R., Sheppard, H.W., Ray, S.C., 1999. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152–160.
Nayak, M.K., Balasubramanian, G., Sahoo, G.C., Bhattacharya, R., Vinjé, J., Kobayashi, N., Sarkar, M.C., Bhattacharya, M.K., Krishnan, T., 2008. Detection of a novel intergenogroup recombinant norovirus from Kolkata, India. Virology 377, 117–123.
Nayak, M.K., Chatterjee, D., Nataraju, S.M., Pativada, M., Mitra, U., Chatterjee, M.K., Saha, T.K., Sarkar, U., Krishnan, T., 2009. A new variant of norovirus GII.4/2007 and inter-genotype recombinant strains of NVGII causing acute watery diarrhoea among children in Kolkata, India. J. Clin. Virol. 45, 223–229.
Nilsson, M.K., Hedlund, O., Thorhagen, M., Larson, G., Johansen, K., Ekspong, A., Svensson, L., 2003. Evolution of human calicivirus RNA in vivo: accumulation of mutations in the protruding P2 domain of the capsid leads to structural changes and possibly a new phenotype. J. Virol. 77, 13117–13124.
Noel, J.S., Ando, T., Leite, J.P., Green, K.Y., Dingle, K.E., Estes, M.K., Seto, Y., Monroe, S.S., Glass, R.I., 1997. Correlation of patient immune responses with genetically characterized small round-structured viruses involved in outbreaks of nonbacterial acute gastroenteritis in the United States, 1990 to 1995. J. Med. Virol. 53, 372–383.
Noel, J.S., Fankhauser, R.L., Ando, T., Monroe, S.S., Glass, R.I., 1999. Identification of a distinct common strain of “Norwalk-like viruses” having global distribution. J. Infect. Dis. 179, 1334–1378.
Reuter, G., Vennema, H., Koopmans, M., Szűcs, G., 2006. Epidemic spread of recombinant noroviruses with four capsid types in Hungary. J. Clin. Virol. 35, 84–88.
Rohayem, J., Munch, J., Rethwilm, A., 2005. Evidence of recombination in the norovirus capsid gene. J. Virol. 79, 4977–4990.
Rueckert, R.R., Wimmer, E., 1984. Systematic nomenclature of picornavirus proteins. J. Virol. 50, 957–959.
Pal, C., Papp, B., Lercher, M.J., 2006. An integrated view of protein evolution. Nat. Rev. Genet. 7, 337–348.
Phan, T.G., Kuroiwa, T., Kaneshi, K., Ueda, Y., Nakaya, S., Nishimura, S., Yamamoto, A., Sugita, K., Nishimura, T., Yagyu, F., Okitsu, S., Müller, W.E., Maneekarn, N., Ushijima, H., 2006a. Changing distribution of norovirus genotypes and genetic analysis of recombinant GIIb among infants and children with diarrhea in Japan. J. Med. Virol. 78, 971–978.
Phan, T.G., Takanashi, S., Kaneshi, K., Ueda, Y., Nakaya, S., Nishimura, S., Sugita, K., Nishimura, T., Yamamoto, A., Yagyu, F., Okitsu, S., Maneekarn, N., Ushijima, H., 2006b. Detection and genetic characterization of norovirus strains circulating among infants and children with acute gastroenteritis in Japan during 2004–2005. Clin. Lab. 52, 519–525.
Phan, T.G., Kaneshi, K., Ueda, Y., Nakaya, S., Nishimura, S., Yamamoto, A., Sugita, K., Takanashi, S., Okitsu, S., Ushijima, H., 2007. Genetic heterogeneity, evolution and recombination in noroviruses. J. Med. Virol. 79, 1388–1400.
Ruuska, T., Vesikari, T., 1990. Rotavirus disease in Finnish children: use of numerical scores for clinical severity of diarrhoeal episodes. Scand. J. Infect. Dis. 22, 259–267.
Salminen, M.O., Carr, J.K., Burke, D.S., McCutchan, F.E., 1995. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res. Hum. Retrov. 11, 1423–1425.
Siebenga, J.J., Vennema, H., Renckens, B., de Bruin, E., van der Veer, B., Siezen, R.J., Koopmans, M., 2007. Epochal evolution of GII.4 norovirus capsid proteins from 1995 to 2006. J. Virol. 81, 9932–9941.
Smith, J.M., 1992. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129.
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.
Tan, M., Huang, P., Meller, J., Zhong, W., Farkas, T., Jiang, X., 2003. Mutations within the P2 domain of norovirus capsid affect binding to human histo-blood group antigens: evidence for a binding pocket. J. Virol. 77, 12562–12571.
Thackray, L.B., Wobus, C.E., Chachu, K.A., Liu, B., Alegre, E.R., Henderson, K.S., Kelley, S.T., Virgin, H.W., 2007. Murine noroviruses comprising a single genogroup exhibit biological diversity despite limited sequence divergence. J. Virol. 81, 10460–10473.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Tu, E.T., Bull, R.A., Greening, G.E., Hewitt, J., Lyon, M.J., Marshall, J.A., McIver, C.J., Rawlinson, W.D., White, P.A., 2008. Epidemics of gastroenteritis during 2006 were associated with the spread of norovirus GII.4 variants 2006a and 2006b. Clin. Infect. Dis. 46, 413–420.
Waters, A., Coughlan, S., Hall, W.W., 2007. Characterization of a novel recombination event in the norovirus polymerase gene. Virology 363, 11–14.
Yang, Z., Swanson, W.J., Vacquier, V.D., 2000. Maximum-likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol. Biol. Evol. 17, 1446–1455.
Yuen, L.K.W., Catton, M.G., Cox, J.B., Wright, P.J., Marshall, J.A., 2001. Heminested multiplex reverse transcription-PCR for detection and differentiation of Norwalk-like virus genogroups 1 and 2 in fecal samples. J. Clin. Microbiol. 39, 2690–2694.
Zheng, D.-P., Ando, T., Fankhauser, R.L., Beard, R.S., Glass, R.I., Monroe, S.S., 2006. Norovirus classification and proposed strain nomenclature. Virology 346, 312–323.