Identification of Indian sub-continent as hotspot for HCV genotype 3a origin by Bayesian evolutionary reconstruction

Infection, Genetics and Evolution 28 (2014) 87–94

Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Manish Chandra Choudhary a,b, Vidhya Natarajan a, Priyanka Pandey c, Ekta Gupta c, Shvetank Sharma a, Rachana Tripathi d, M. Shesheer Kumar d, Syed N. Kazim b, Shiv K. Sarin a,*

a Department of Research, Institute of Liver & Biliary Sciences (ILBS), Vasant Kunj, New Delhi 110070, India
b Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi 110025, India
c Department of Virology, Institute of Liver & Biliary Sciences (ILBS), Vasant Kunj, New Delhi 110070, India
d Molecular Diagnostics, RAS Lifesciences Pvt. Ltd., Nacharam, Hyderabad, Andhra Pradesh 500076, India

Article info

Article history:
Received 3 March 2014
Received in revised form 4 September 2014
Accepted 6 September 2014
Available online 16 September 2014

Keywords: HCV; Genotype 3a; India; Molecular evolution; Bayesian skyline; Hepatitis C

Abstract

Background: Recent work in hepatitis C virus (HCV) evolutionary biology has focused on analyses of the Core, E1/E2 and/or NS5b regions, with limited attention to the full-length genome. While HCV genotypes have been described as endemic in the Indian subcontinent, this has not been confirmed at the molecular evolutionary level. We have attempted here to determine the status of Indian HCV genotype 3a sequences in relation to similar genotypes from other parts of the world.

Methods: Cloning, sequencing and molecular characterization were performed on 9 Indian sequences, and comparative analyses were performed with 46 sequences from other countries. Evolutionary rates and the molecular-clock hypothesis were assessed by Bayesian MCMC.

Results: Genetic analysis of the full-length genome revealed two hypervariable regions (HVR) in the E2 region, HVR496 and HVR576, with a variable 5–8 amino-acid insertion sequence and a putative N-glycosylation site. Phylogenetic analysis revealed a divergence into 2 distinct clades: clade-1, represented by the HCV 3a subtype, and clade-2, represented by other genotype 3 subtype genomes. Clade-2 shows earlier divergence than clade-1. Within clade-1, genotype 3a genomes from India root out first (99 years ago). Bayesian skyline plot analysis revealed an increase in the effective number of infections from the 1940s to the 1990s, followed by a gradual decrease after 2000.

Conclusions: Genotype 3a sequences appear to have originated in India and later dispersed to the United Kingdom around the mid-1940s, most likely around the time of Indian independence and World War II.

© 2014 Elsevier B.V. All rights reserved.

Financial support: Department of Biotechnology, India, vide Grant Nos. BT/BIPP/03/07/0322/10 and BT/PR7404/MED/29/658/2012. Data presented anywhere: No.
* Corresponding author at: Department of Hepatology, Institute of Liver & Biliary Sciences (ILBS), New Delhi 110070, India. Tel.: +91 11 46300000x6007; fax: +91 11 26123504. E-mail address: [email protected] (S.K. Sarin).
http://dx.doi.org/10.1016/j.meegid.2014.09.009
1567-1348/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Hepatitis C virus (HCV) infection is a global health problem, and around 170–180 million people carry the infection worldwide (Global Burden of Hepatitis C Working Group, 2004). It is a major cause of chronic liver disease, ultimately leading to liver cirrhosis and hepatocellular carcinoma, which underlines its importance in public health (de Oliveria Andrade, 2009). HCV is a positive-sense, single-stranded RNA virus with a genome of approximately 9.6 kb. It belongs to the genus Hepacivirus in the family Flaviviridae and encodes an error-prone RNA-dependent RNA polymerase that lacks proofreading activity, thereby generating high genetic diversity (Shepard et al., 2005).

Based on genetic relatedness, HCV has been classified into seven genotypes and a number of subtypes. Among these, genotypes 1–3 have a worldwide distribution, with subtype 1a being the most common in the United States and Europe (Takada et al., 1993). Subtype 1b predominates in Japan, and subtypes 2a and 2b are relatively common in North America, Europe and Japan, whereas subtype 2c is common in northern Italy (Takada et al., 1993). Within genotype 3, subtypes 3a and 3b have a worldwide distribution (Sievert et al., 2011), while other subtypes are mainly distributed in and around South Asia, indicating their possible long-term circulation in the region (Narahari et al., 2009; Kumar et al., 2007; Chakravarti et al., 2013). HCV genotype 3a is also the most common genotype circulating in the Indian subcontinent, including the northern part of India (Sood et al., 2012; Singh et al., 2004; Sievert et al., 2011) and Pakistan (Idrees and Riazuddin, 2008). HCV genotype 4 is prevalent in North and Central Africa and the Middle East (Iles et al., 2013). Genotype 5a is commonly found in the northern parts of South Africa (Murphy et al., 1996), central France (Henquell et al., 2004) and Belgium, and is found sporadically in other regions of Europe and Latin America (Chamberlain et al., 1997; Davidson et al., 1995; Jover et al., 2001; Levi et al., 2002), whereas genotype 6 is so far known to be confined to Hong Kong and a few other South-Asian countries (Mondelli and Silini, 1999; Centers for Disease Control and Prevention, 1998). Genotype 7, the latest entrant in the family, is said to have its origin in Central Africa (Smith et al., 2014).

Since HCV genotypes and subtypes respond differently to antiviral therapy, it is important to know the genotype and subtype before determining the type and duration of therapy (Pang et al., 2009). Pang et al. (2009) related the response to therapy to the evolutionary branching order of the major HCV genotypes. Although HCV genotype 3a responds well to the standard treatment regimen of Peg-interferon alfa-2a, it is also associated with significant hepatic steatosis and fibrosis (Singh et al., 2010; Hissar et al., 2009, 2006). Recent trials of Telaprevir and Boceprevir in HCV genotype 3a have reported a negligible to modest decrease in RNA level (Foster et al., 2011; Silva et al., 2013; Gottwein et al., 2011). Daclatasvir and Sofosbuvir trials have also reported a lower sustained virological response (SVR) rate in genotype 3a than in genotype 2 (Scheel et al., 2011). In addition, HCV genotype 3a is highly prevalent among intravenous drug users and is showing increased incidence in various parts of the world (Samimi-Rad et al., 2012; Paintsil et al., 2009; Aitken et al., 2004).
Recent efforts to analyze HCV infection rates by genotype/phenotype suggest that genotype 3a infections comprise almost 60–70% of HCV-related infections in India (Sievert et al., 2011; Narahari et al., 2009). Multiple recent investigations across the globe have attempted to trace the evolutionary history of various HCV genotypes (Verbeeck et al., 2006; Pybus et al., 2007; Gededzha et al., 2013). Nearly all of these studies analyzed sub-genomic regions (Core, E1E2, NS5a/5b). The evolutionary history of HCV in the context of its whole genome is not well investigated. In light of this, we characterized the evolutionary history of HCV genotype 3a using smaller genomic fragments, i.e., E2p7NS2 and NS5b, and compared the results with those from the full-genome dataset.

2. Materials and methods

2.1. Patient selection

The study protocol was approved by the Institutional Review Board at the Institute of Liver and Biliary Sciences (ILBS), New Delhi, India, and patients were enrolled after giving informed consent. Only treatment-naïve patients with HCV genotype 3a (n = 9) were included; patients co-infected with either hepatitis B or HIV were excluded. Liver biopsy was performed and staging was done according to Ishak’s scoring system.

2.2. Viral RNA extraction, sequencing data analysis and N-glycosylation site prediction

HCV RNA was extracted from 140 µL of serum using the QIAamp Viral RNA Mini Kit (QIAGEN GmbH, Germany), and cDNA was synthesized as per the manufacturer’s protocol. 3–5 µL of cDNA was used for PCR amplification of the full HCV genome using Phusion Hot Start II DNA polymerase (Thermo Fisher Scientific Inc., Waltham, MA). The PCR products were cloned using the CloneJET PCR cloning kit (Thermo Fisher Scientific Inc., Waltham, MA), and 3–5 colonies were picked for bidirectional sequencing using vector sequencing primers and additional internal primers (Table 2). Sequence alignments were visualized, compared and edited using ClustalX in the BioEdit program v7.2.0. The HCV genotype and subtype were confirmed by BLAST (NCBI, USA). N-glycosylation sites were predicted with the N-glycosylation tool (http://hcv.lanl.gov).

2.3. Compilation of HCV dataset

For phylogenetic analysis, a dataset of 55 sequences was prepared: HCV genotype 3 full-length or near full-length genomes retrieved from NCBI (n = 46) together with the full genomes reported in this paper (n = 9). Only sequences with a known year and country of isolation were selected for analysis. Recombinant sequences and sequences isolated from a single outbreak or the same cohort were excluded. E2p7NS2 and NS5b sequences from the above dataset were additionally compared.

2.4. Determination of nucleotide substitution model and likelihood analysis

The most appropriate nucleotide substitution model for the phylogenetic analyses was determined with jModelTest v2.1.4; likelihood scores were compared using hierarchical likelihood ratio tests (hLRT) and the Akaike Information Criterion (AIC) (Sanchez et al., 2011). Likelihood-mapping analysis was performed on quartets using TreePuzzle to obtain an overall impression of the phylogenetic signal present in the HCV genotype 3 full-genome sequences (Schmidt et al., 2002). For each analysis, 10,000 quartets of the full-genome sequences were evaluated.

2.5. Time-based phylogeny and evolutionary reconstruction by Bayesian MCMC

Time-measured phylogeny and evolutionary rates were inferred using a Bayesian Markov Chain Monte Carlo (MCMC) method implemented in the BEAST package version 1.7.5 (http://beast.bio.ed.ac.uk/) (Drummond and Rambaut, 2007). The MCMC analysis was run for a chain length of 50 million in 4 independent runs, with sampling every 1000 steps. Chain convergence was assessed using Tracer version 1.5 (beast.bio.ed.ac.uk). The effective sample size (ESS) was calculated for each parameter, and ESS values > 200 were considered sufficient for sampling and chain mixing. The maximum clade credibility tree was selected from the posterior tree distribution after discarding a 10% burn-in using TreeAnnotator v1.7.5. The final tree was examined and edited in FigTree version 1.4.0 (tree.bio.ed.ac.uk) for display purposes.

2.6. Bayesian coalescent analysis

Bayesian skyline plot (BSP) analysis was carried out under each of the molecular clock models (relaxed uncorrelated lognormal, relaxed uncorrelated exponential and strict) using coalescent analysis (non-parametric piecewise-constant model) with an external substitution rate of 1.3 × 10⁻³ per site per year (Gray et al., 2011). The E2p7NS2 dataset was also analyzed using the same external rate, while the NS5b dataset was analyzed using the rate obtained from the E2p7NS2 analysis.

2.7. GenBank accession numbers

The GenBank accession numbers of the HCV genotype 3a sequences isolated from Indian patients reported in this paper
are HQ738645, JN714194 and JQ717254–JQ717260. Details of all other sequences used in this study are available upon request.

3. Results

3.1. Patient characteristics

Detailed characteristics of the nine patients included in the study are summarized in Table 1. Baseline plasma samples were collected from HCV genotype 3a-infected patients before the start of treatment. All patients underwent liver biopsy; mild liver disease was defined as Ishak’s fibrosis stage 0–1 and severe liver disease as Ishak’s fibrosis stage 5–6. AST and ALT levels were raised, while the AFP level was normal, in all 9 patients studied.

3.2. Sequence diversity in HCV GT3a vis-à-vis other genotypes and subtypes

The sequences reported here have been submitted to the GenBank database under accession numbers HQ738645, JN714194 and JQ717254–JQ717260. The average nucleotide composition of the nine HCV genotype 3a sequences, calculated with MEGA 5.2.1, was 21.1% A, 23.1% T, 27.6% G and 28.3% C. Full-genome amino-acid sequence alignment of HCV genotype 3a against other genotypes and subtypes using ClustalX version 2.0 revealed two additional regions of hypervariability, named HVR496 and HVR576 after their amino-acid positions in the full genome. HVR496 spans amino acids 496 to 502 and is seven amino acids long, while HVR576 spans amino acids 576 to 584 and is nine amino acids long, with a unique 5–6 amino-acid insertion site (Fig. 1). Interestingly, multiple alignment of the HCV genotype 3a sequences available in the database showed that the number of amino acids inserted at this site was variable. Next, N-linked glycosylation sites, represented by Nx[ST] patterns where x can be any amino acid, were predicted using the N-glycosylation tool. A putative N-glycosylation site was found within HVR576 and was highly conserved in HCV genotype 3a, as is evident from the alignment (Fig. 1).

3.3. Best-fit nucleotide substitution model and phylogenetic signal analysis

The best-fit nucleotide substitution model was determined with jModelTest by comparing likelihood scores for the full-length genomes using hLRTs and the AIC. The General Time-Reversible (GTR) substitution model with invariant sites (I) and gamma-distributed rate heterogeneity across sites (Γ) was returned as the best-fit model and was subsequently employed. Likelihood mapping of the HCV genotype 3 dataset showed that dots falling in the central region accounted for 0.1% of the total, dots falling on the axes for 0.4% (0.1% + 0.2% + 0.1%), and dots falling in the corners of the triangle for 99.5% (32.9% + 32.8% + 33.8%), thus indicating a fully resolved phylogenetic tree (Fig. 2).

3.4. Evolutionary rate estimate and phylogeographic tree analysis of HCV 3a

The mean evolutionary rate of the whole-genome sequences of HCV genotype 3 was estimated based on the alignment of dated

Table 1
Demographic profile of subjects enrolled in the study.

GenBank accession No.  Age (years)  Sex  HCV genotype  HCV RNA (IU/mL)  Ishak’s fibrosis score  AST (U/L)  ALT (U/L)  AFP (ng/mL)
HQ738645               52           M    3a            1.69 × 10⁶       0.1                     117        74         8.62
JN714194               43           F    3a            7.53 × 10⁶       2                       219        73         5.1
JQ717254               35           F    3a            8.27 × 10⁴       2                       133        133        4.7
JQ717255               39           M    3a            4.19 × 10⁵       0.1                     97         204        4.1
JQ717256               46           M    3a            6.67 × 10⁶       5.6                     69         79         7.2
JQ717257               50           M    3a            9.01 × 10⁴       1.2                     45         87         6.9
JQ717258               41           F    3a            5.87 × 10⁵       5.6                     214        77         4.2
JQ717259               38           M    3a            1.32 × 10⁶       5.6                     139        73         3.8
JQ717260               47           F    3a            5.93 × 10⁵       0.1                     151        81         6.1

AST, aspartate transaminase; ALT, alanine transaminase; AFP, alpha-fetoprotein. All patients were infected with HCV genotype 3a and had a high viral load, liver disease (indicated by the fibrosis score), raised AST and ALT, and normal AFP values. All values are given in the units indicated in the table.

Table 2
List of primers used for amplifying HCV genotype 3a full genome.

S. No.  Product No.  Primer name  Position in whole genome  Primer sequence
1       1            UF           1–25                      ACCTGCCTCTTACGAGGCGACACTC
2       1            UR1          339–315                   GTTGCACGGTCTACGAGACCTCCCG
3       2            CF2          339–363                   ATGAGCACACTTCCTAAACCCCAAA
4       2            NS1R2        2566–2545                 GCCAAAGAGCAACGCACACGCG
5       3            NS2F3        2545–2566                 CGCGTGTGCGTTGCTCTTTGGC
6       3            NS3R3        5328–5307                 GGTGGTTACTTCCAGATCAGCT
7       4            NS4aF4       5329–5348                 AGCACCTGGGTGTTGCTTGG
8       4            NS4bR4       6273–6250                 ACAAGGGCTTGGGTAGTCTTCATT
9       5            NS5aF5       6239–6264                 ACCAGTGGATCAATGAAGACTACCCA
10      5            NS5aR5       7629–7608                 GCAGCAGACCACGCTCTGCTCC
11      6            NS5b1F6      7600–8585                 GACAGCGAGGAGCAGAGCGTGGTCT
12      6            NS5b1R6      8585–8561                 TCATCTCCGCAGACAAGAAAGTCCG
13      7            NS5B2F6      8388–8413                 CTCCTCCCTCACGGAGCGGCTTTACT
14      7            NS5B2        9425–8098                 TGGAGTGTTATCCTACCAGCTCACC
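Two primers in Table 2 are listed at the same genome coordinates in opposite orientations (NS1R2 at 2566–2545 and NS2F3 at 2545–2566), which suggests that adjacent amplicons share a junction. A minimal Python sketch for checking that the two listed sequences are reverse complements of each other follows; the helper name `reverse_complement` is illustrative and not part of the published protocol.

```python
# Complement lookup for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Primer sequences copied from Table 2.
NS1R2 = "GCCAAAGAGCAACGCACACGCG"  # positions 2566-2545 (reverse primer)
NS2F3 = "CGCGTGTGCGTTGCTCTTTGGC"  # positions 2545-2566 (forward primer)

primers_overlap = reverse_complement(NS1R2) == NS2F3
print(primers_overlap)
```

The same check can be applied to any other primer pair whose listed coordinates coincide.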

[Figure 1: amino-acid alignment of the E2 region with panels HVR496 and HVR576. Annotations in the image: Indian HCV 3a sequence; reference sequences; unique insertional domain with putative N-glycosylation site; variable region in HVR496 as observed in HCV genotype 3a.]

Fig. 1. Amino acid sequence alignment showing HVR496 & HVR576 using full genome sequence of different genotypes. Amino acid alignment between different genotypes and subtypes along with Indian HCV 3a sequence (highlighted in yellow box) is shown for the E2 region containing HVR496 & HVR576. Amino-acid variability in HVR496 in HCV genotype 3a sequences can be ascertained by the presence of different amino acids between amino-acid positions 496–502, enclosed in a red box. Further, the 5–6 amino-acid insertion is shown with a red box encompassing the HVR576 region, with the N-glycosylation site encircled in blue. The consensus sequence is shown at the extreme bottom, with the height of the letters indicating the percentage conservation of the amino acid at that particular position. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
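The Nx[ST] pattern used for N-glycosylation site prediction can be illustrated with a short Python sketch. This is not the LANL tool used in the study; it simply scans a protein string for the motif as the text describes it (x being any residue), and the peptide below is a made-up example rather than a sequence from this paper. Many predictors additionally exclude proline at position x, which this sketch does not do.

```python
import re

# N-x-[S/T] sequon as described in the text; x may be any residue here.
# The lookahead lets overlapping motifs (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=N[A-Z][ST])")

def find_sequons(protein: str, offset: int = 1):
    """Return positions of the Asn in each N-x-[S/T] motif.

    `offset` is the coordinate assigned to the first residue of `protein`,
    so a fragment starting at residue 576 can be scanned with offset=576.
    """
    return [m.start() + offset for m in SEQUON.finditer(protein.upper())]

# Hypothetical peptide for illustration only; not a sequence from this study.
toy_peptide = "GANCTWMNSTGF"
positions = find_sequons(toy_peptide)
print(positions)
```

Passing `offset=576` would report motif positions in full-genome coordinates for a fragment that begins at the first residue of HVR576.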

[Figure 2: likelihood-mapping triangles, panels A–C. Panel B occupancies: 34.0%, 33.0%, 33.0%. Panel C occupancies: corners 32.9%, 32.8%, 33.8%; sides 0.1%, 0.2%, 0.1%; centre 0.1%.]

Fig. 2. Likelihood mapping of HCV sequences. (A) Distribution pattern of the quartets in the dataset. (B) Occupancies (in percent) of the three areas of attraction, shown as an equilateral triangle. (C) The same equilateral triangle divided into the seven areas of attraction, with the values for the sides included. The interpretation of the values in each area is described in the results section of this paper. Each dot represents the likelihoods of the three possible unrooted trees for a set of four sequences (quartet) selected randomly from the data set: dots close to the corners or the sides represent, respectively, tree-like or network-like phylogenetic signal in the data. The central area of the likelihood map represents star-like signal.
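The occupancy figures reported for the likelihood map can be cross-checked with a few lines of Python. The sketch below only re-adds the published percentages and counts how many distinct quartets 55 sequences allow; it does not reproduce the TreePuzzle computation, and the variable names are illustrative.

```python
from math import comb

# Occupancy percentages reported for the seven regions of the likelihood map.
corners = [32.9, 32.8, 33.8]   # tree-like signal
sides = [0.1, 0.2, 0.1]        # network-like signal
centre = 0.1                   # star-like signal

resolved = round(sum(corners), 1)
partly_resolved = round(sum(sides), 1)
total = round(resolved + partly_resolved + centre, 1)

# Number of distinct quartets that can be drawn from 55 sequences.
n_quartets = comb(55, 4)

print(resolved, partly_resolved, total, n_quartets)
```

Since 55 sequences allow 341,055 distinct quartets, an analysis of 10,000 quartets corresponds to a random subsample, which is consistent with the wording of the figure legend.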

sequences using an external substitution rate of 1.3 × 10⁻³ per site per year and a relaxed uncorrelated lognormal clock, which was the best-fitting molecular clock when compared with the other clock models (data not shown). Under this condition, the estimate was 1.07 × 10⁻³ (95% HPD 6.79 × 10⁻⁴–1.48 × 10⁻³) substitutions/site/year. The tree branched with high posterior probabilities of 0.99–1.0, indicative of a fully resolved tree with a high degree of confidence. The age of the entire tree, calculated from the genotype 3 full-length genomes, was estimated to be about 366 years (95% HPD; 226.64–528.05) (Fig. 3).

Phylogenetic analysis of the Indian HCV genotype 3a and reference HCV genotype 3 full genomes by the Bayesian MCMC algorithm revealed a bifurcation into 2 distinct clades: clade-1, represented by genotype 3a genomes, and clade-2, represented by other HCV genotype 3 sequences. While clade-2 appears to have diverged earlier, around 366 years ago, within clade-1 the HCV genotype 3a genomes from India rooted out earliest. Phylogenetic analysis using the relaxed molecular clock approach revealed that the most probable origin of HCV genotype 3a was in India about 99 years ago (95% HPD; 63.23–137.05). Among clade-2, sequences from the United Kingdom branched very closely together with Indian sequences (Fig. 3), with sequences branching out as early as 85 years ago.

Phylogenetic reconstruction using E2p7NS2 yielded similar results (Fig. 4a and b), with Indian sequences rooting out earliest, about 105 years ago (95% HPD; 66.73–153.50), as observed in the phylogenetic tree from the full-genome dataset, and an estimated substitution rate of 1.17 × 10⁻³ (95% HPD; 6.88 × 10⁻⁴–1.47 × 10⁻³) substitutions/site/year, while in the NS5b dataset the earliest sequence of HCV genotype 3a branched about 126 years ago (95% HPD;

[Figure 3: time-scaled phylogenetic tree. Clade 1 and Clade 2 are boxed; node labels are posterior probabilities; tip labels combine accession number, subtype and a two-digit suffix (e.g. D49374.3b94) with the country of isolation; the time axis is in years and ends at 0.0.]

Fig. 3. Bayesian based phylogeographical tree of 55 HCV GT3 sequences. Phylogenetic tree is constructed using BEAST software using GTR + Γ + I model from 55 full genome sequences comprising all HCV genotype 3 sequences with final tree edited in FigTree v1.4.0. Each sample is denoted by its accession number followed by genotype and subtype details followed by country of isolation. Clades are represented by red encircled box. Nodes indicate posterior probabilities. The scale at the bottom of the tree represents the time in years with 0 indicating present year. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
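Tip labels in Fig. 3 such as D49374.3b94 combine an accession number and a subtype with a two-digit suffix. The Python sketch below reads that suffix as the sampling year, which is an assumption (the legend does not say so), and then converts an age of 99 years into a calendar year by anchoring the time axis at the most recent assumed sampling year. The function name `parse_tip` and the `pivot` parameter are illustrative.

```python
import re

# Assumed layout: accession '.' subtype, then an optional '.' and two-digit year.
LABEL = re.compile(r"^(?P<acc>[A-Z]+\d+)\.(?P<subtype>\d[a-z])\.?(?P<yy>\d{2})?$")

def parse_tip(label: str, pivot: int = 30):
    """Split a tip label into (accession, subtype, four-digit year or None).

    Two-digit years below `pivot` are read as 20xx, the rest as 19xx.
    """
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised tip label: {label!r}")
    yy = m.group("yy")
    year = None if yy is None else (2000 if int(yy) < pivot else 1900) + int(yy)
    return m.group("acc"), m.group("subtype"), year

# A few labels as they appear in the figure (one has no suffix).
tips = ["D49374.3b94", "JF735123.3g13", "FJ407092.3i.02", "HQ738645.3a"]
parsed = [parse_tip(t) for t in tips]

# Assumption: year 0 on the time axis is the most recent sampling year.
latest = max(year for _, _, year in parsed if year is not None)
origin_year = latest - 99   # 99 years before the assumed anchor
print(parsed, latest, origin_year)
```

Under these assumptions an age of 99 years corresponds to roughly 1914; a different anchor year would shift the result accordingly.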

62.24–214.68) with a rate of 6.23 × 10⁻⁴ (95% HPD; 1.56 × 10⁻⁴–1.08 × 10⁻³) substitutions/site/year.

3.5. Evolutionary demography of entire HCV-3 dataset

The Bayesian skyline-plot analysis of the entire HCV genotype 3 dataset revealed that the number of infections remained relatively constant until the early 1940s, after which there was a one-log increase in the effective number of infections until the early 1990s (pre-antiviral introduction period).

4. Discussion

Full-length genome molecular characterization revealed that, apart from the commonly reported HVR1 located at the 5′ end of the E2 protein, HCV genotype 3a also contains 2 additional regions of hypervariability, termed HVR496 and HVR576, as also reported by Humphreys et al. (2009). HCV envelope proteins undergo glycosylation (Helle et al., 2011, 2010). The E2 protein of HCV genotype 3a contains eight potential N-glycosylation sites (Helle et al., 2011). Sequence comparison by the ClustalX method revealed that HVR576 contains an additional putative glycosylation site. Various studies have indicated a role for N-linked glycosylation in virion assembly and infectivity (Goffard et al., 2007). Further, mutation of these glycosylation sites has been reported to lead to envelope protein instability and virion assembly defects (Helle et al., 2010). These results indicate that N-glycosylation sites are critically important for E1E2 folding and heterodimerization and thus for virion assembly.

Analysis of more datasets may be required to ascertain the functional relevance of the putative N-glycosylation site found in HVR576 in HCV genotype 3a viral sequences.

HCV epidemiology is at present described by isolated studies and blood bank data (Chowdhury et al., 2003; Kumar et al., 2007). Among the different HCV genotypes, genotype 3a is the most common genotype circulating in the Indian subcontinent, including the northern part of India (Sood et al., 2012; Singh et al., 2004; Sievert et al., 2011; Thakur et al., 2000), as well as in Pakistan (Idrees and Riazuddin, 2008), Nepal (Tokita et al., 1994) and the United Kingdom. Narahari et al. (2009) performed HCV genotyping in 2118 patients from different geographic regions of India and reported that HCV3 (3a/3b) was present in 62% and HCV1 (1a/1b) in 31% of the patients, with predominance of HCV3 in the northern (p = 0.01) and eastern (p = 0.008) regions of India. A recent study by Chakravarti et al. (2013) highlighted the changing trends in the prevalence and genotypic distribution of hepatitis C virus among high-risk groups in North India. The authors report that genotype 3 is the predominant genotype, though the subtype distribution within genotype 3 may be changing; this is well supported by previous studies (Tokita et al., 1994; Valliammai et al., 1995). The high degree of genetic variability (as reported in this manuscript and discussed below), together with the coexistence of different genotype 3 subtypes in the northern part of India, suggests that this is the most probable site of origin of HCV genotype 3, as also discussed by Zehender et al. (2013). Analyses from various countries have placed the origin of HCV in and around the Indian subcontinent (Pakistan). The present study was carried out to investigate this, considering that India and Pakistan share a fairly large

[Figure 4: time-scaled phylogenetic trees, panels A and B. Node labels are posterior probabilities; tip labels give accession number and subtype (e.g. D49374_3b) with the country of isolation; the time axis is in years and ends at 0.0.]

Fig. 4. A time-scaled phylogenetic tree estimated using 55 sequences of E2p7NS2 (a) and NS5b region (b). Each branch indicates an isolate and is denoted in the following format: Genbank accession number followed by subtype, followed by country of isolation. Nodes indicate posterior probabilities. The scale at the bottom of the tree represents the time in years with 0 indicating present year.
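The three datasets give different point estimates for the age of genotype 3a (about 99, 105 and 126 years), so it helps to check whether their 95% HPD intervals overlap. The Python sketch below does this with the values reported in Section 3.4; overlap of intervals is only a rough consistency check, not a formal test, and the helper `overlap` is illustrative.

```python
from itertools import combinations

# Age of genotype 3a in years before present: (point estimate, HPD low, HPD high),
# values as reported in Section 3.4.
tmrca = {
    "full genome": (99.0, 63.23, 137.05),
    "E2p7NS2": (105.0, 66.73, 153.50),
    "NS5b": (126.0, 62.24, 214.68),
}

def overlap(a, b):
    """Return the shared part of two closed intervals, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

# Compare every pair of datasets on their HPD intervals only.
shared = {
    (x, y): overlap(tmrca[x][1:], tmrca[y][1:])
    for x, y in combinations(tmrca, 2)
}
for pair, interval in shared.items():
    print(pair, interval)
```

All three pairs share a wide range of ages, which is in line with the statement that the sub-genomic reconstructions yielded results similar to the full-genome analysis.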

human ancestry. As the results suggest, the origin of HCV genotype 3a is in the Indian subcontinent, whether analyzed at the whole-genome level or using the E2p7NS2 region (Fig. 4a). The diversity of HCV in the Indian sub-continent has also not been thoroughly examined in this regard. Sequences from India tend to cluster together, forming a distinct India-specific genotype 3a cluster.

[Figure 5: Bayesian skyline plot. Y-axis: effective population size (log scale, 1.E1 to 1.E5); x-axis: time (1800 to 2000).]

Fig. 5. Bayesian skyline plot representing the estimates of the effective number of HCV genotype 3 infections in the studied population. The y-axis measures the effective number of infections in log10 scale while the x-axis represents time in years. The solid line represents the median estimate and the credibility interval based on 95% highest posterior density (HPD) interval is represented by filled area.
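The piecewise-constant idea behind a skyline plot can be sketched with the classic (non-Bayesian) skyline estimator, in which an interval of duration w with k lineages gives the estimate w × k(k − 1)/2. This is a simpler relative of the Bayesian skyline and not the BEAST implementation used in the study: it takes a single fixed tree, assumes all tips were sampled at the same time (unlike the dated sequences here), and the node ages below are invented for illustration.

```python
def classic_skyline(coalescent_times):
    """Piecewise-constant estimates of N_e * tau from coalescent event times.

    `coalescent_times` are ages (years before the sampling date) of the n-1
    internal nodes of a tree whose n tips were all sampled at time 0.
    Returns a list of (interval_start, interval_end, estimate) tuples.
    """
    times = sorted(coalescent_times)
    k = len(times) + 1            # lineages present at time 0
    start, steps = 0.0, []
    for t in times:
        width = t - start
        steps.append((start, t, width * k * (k - 1) / 2))
        start, k = t, k - 1       # each coalescence removes one lineage
    return steps

# Hypothetical node ages for a four-tip tree; not values from this study.
skyline = classic_skyline([2.0, 5.0, 20.0])
for step in skyline:
    print(step)
```

Plotting the estimates against the interval boundaries on a log scale gives the stepped curve that a skyline plot displays; the Bayesian version additionally groups intervals and averages over the posterior distribution of trees.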

The tMRCA of HCV genotype 3a was calculated at about 99 years ago. It is interesting to note that sequences from India rooted out at the same time and later diverged in the subcontinent. The analysis further showed that other HCV genotype 3a sequences from the Indian sub-continent (including Pakistan) rooted closely followed by sequences of United Kingdom, leading us to speculate that the viral strain would have disseminated to United Kingdom and later to other European countries. The Bayesian skyline-plot analysis revealed that the number of infections remained relatively constant until the early 1940s, after which there was one log increase in effective number of infections till early 1990s (pre-antiviral introduction period), a period also coincided with rapid industrialization and blood transfusions together with unsafe use of blood products (Fig. 5). The period also coincided with the time of Indian independence in 1940s and 1950s and end of British colonization as well as World war, which involved massive migration of human population and possibly have led to spread of HCV to other parts of the world. The estimation of evolutionary history of HCV genotype 3a infection in India and Indian sub-continent will not only will be beneficial for epidemiological and clinical analyses, but it will also help in a long way in devising strategy for effective disease management and surveillance program to facilitate better public health service. References Aitken, C.K., McCaw, R.F., Bowden, D.S., et al., 2004. Molecular epidemiology of hepatitis C virus in a social network of injection drug users. J. Infect. Dis. 190, 1586–1595. Centers for Disease Control and Prevention, 1998. Recommendations for prevention and control of hepatitis C virus (HCV) infection and HCV-related chronic disease. MMWR Recomm. Rep. 47, 1–39. Chakravarti, A., Ashraf, A., Malik, S., 2013. 
A study of changing trends of prevalence and genotypic distribution of hepatitis C virus among high risk groups in North India. Indian J. Med. Microbiol. 31, 354–359. Chamberlain, R.W., Adams, N.J., Taylor, L.A., Simmonds, P., Elliott, R.M., 1997. The complete coding sequence of hepatitis C virus genotype 5a, the predominant genotype in South Africa. Biochem. Biophys. Res. Commun. 236, 44–49. Chowdhury, A., Santra, A., Chaudhuri, S., et al., 2003. Hepatitis C virus infection in the general population: a community-based study in West Bengal, India. Hepatology 37, 802–809. Davidson, F., Simmonds, P., Ferguson, J.C., et al., 1995. Survey of major genotypes and subtypes of hepatitis C virus using RFLP of sequences amplified from the 5′ non-coding region. J. Gen. Virol. 76 (Pt. 5), 1197–1204. de Oliveria Andrade, L.J., D’Oliveira, A., Melo, R.C., De Souza, E.C., Costa Silva, C.A., Parana, R., 2009. Association between hepatitis C and hepatocellular carcinoma. J. Glob. Infect. Dis. 1, 33–37.

Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. Foster, G.R., Hezode, C., Bronowicki, J.P., et al., 2011. Telaprevir alone or with peginterferon and ribavirin reduces HCV RNA in patients with chronic genotype 2 but not genotype 3 infections. Gastroenterology 141, 881–889.e1. Gededzha, M.P., Selabe, S.G., Blackard, J.T., Kyaw, T., Mphahlele, M.J., 2013. Near full-length genome analysis of HCV genotype 5 strains from South Africa. Infect. Genet. Evol. 21C, 118–123. Global Burden of Hepatitis C Working Group, 2004. Global burden of disease (GBD) for hepatitis C. J. Clin. Pharmacol. 44, 20–29. Goffard, A., Lazrek, M., Bocket, L., Dewilde, A., Hober, D., 2007. Role of N-linked glycans in the functions of hepatitis C virus envelope glycoproteins. Ann. Biol. Clin. (Paris) 65, 237–246. Gottwein, J.M., Scheel, T.K., Jensen, T.B., Ghanem, L., Bukh, J., 2011. Differential efficacy of protease inhibitors against HCV genotypes 2a, 3a, 5a, and 6a NS3/4A protease recombinant viruses. Gastroenterology 141, 1067–1079. Gray, R.R., Parker, J., Lemey, P., Salemi, M., Katzourakis, A., Pybus, O.G., 2011. The mode and tempo of hepatitis C virus evolution within and among hosts. BMC Evol. Biol. 11, 131. Helle, F., Vieyres, G., Elkrief, L., et al., 2010. Role of N-linked glycans in the functions of hepatitis C virus envelope proteins incorporated into infectious virions. J. Virol. 84, 11905–11915. Helle, F., Duverlie, G., Dubuisson, J., 2011. The hepatitis C virus glycan shield and evasion of the humoral immune response. Viruses 3, 1909–1932. Henquell, C., Cartau, C., Abergel, A., et al., 2004. High prevalence of hepatitis C virus type 5 in central France evidenced by a prospective study from 1996 to 2002. J. Clin. Microbiol. 42, 3030–3035. Hissar, S.S., Goyal, A., Kumar, M., et al., 2006. Hepatitis C virus genotype 3 predominates in North and Central India and is associated with significant histopathologic liver disease.
J. Med. Virol. 78, 452–458. Hissar, S.S., Kumar, M., Tyagi, P., et al., 2009. Natural history of hepatic fibrosis progression in chronic hepatitis C virus infection in India. J. Gastroenterol. Hepatol. 24, 581–587. Humphreys, I., Fleming, V., Fabris, P., et al., 2009. Full-length characterization of hepatitis C virus subtype 3a reveals novel hypervariable regions under positive selection during acute infection. J. Virol. 83, 11456–11466. Idrees, M., Riazuddin, S., 2008. Frequency distribution of hepatitis C virus genotypes in different geographical regions of Pakistan and their possible routes of transmission. BMC Infect. Dis. 8, 69. Iles, J.C., Abby Harrison, G.L., Lyons, S., et al., 2013. Hepatitis C virus infections in the Democratic Republic of Congo exhibit a cohort effect. Infect. Genet. Evol. 19, 386–394. Jover, R., Perez-Serra, J., de Vera, F., et al., 2001. Infection by genotype 5a of HCV in a district of southeast Spain. Am. J. Gastroenterol. 96, 3042–3043. Kumar, A., Sharma, K.A., Gupta, R.K., Kar, P., Chakravarti, A., 2007. Prevalence & risk factors for hepatitis C virus among pregnant women. Indian J. Med. Res. 126, 211–215. Levi, J.E., Takaoka, D.T., Garrini, R.H., et al., 2002. Three cases of infection with hepatitis C virus genotype 5 among Brazilian hepatitis patients. J. Clin. Microbiol. 40, 2645–2647. Mondelli, M.U., Silini, E., 1999. Clinical significance of hepatitis C virus genotypes. J. Hepatol. 31 (Suppl. 1), 65–70. Murphy, D.G., Willems, B., Vincelette, J., Bernier, L., Cote, J., Delage, G., 1996. Biological and clinicopathological features associated with hepatitis C virus type 5 infections. J. Hepatol. 24, 109–113.


Narahari, S., Juwle, A., Basak, S., Saranath, D., 2009. Prevalence and geographic distribution of Hepatitis C Virus genotypes in Indian patient cohort. Infect. Genet. Evol. 9, 643–645. Paintsil, E., Verevochkin, S.V., Dukhovlinova, E., et al., 2009. Hepatitis C virus infection among drug injectors in St Petersburg, Russia: social and molecular epidemiology of an endemic infection. Addiction 104, 1881–1890. Pang, P.S., Planet, P.J., Glenn, J.S., 2009. The evolution of the major hepatitis C genotypes correlates with clinical response to interferon therapy. PLoS One 4, e6579. Pybus, O.G., Markov, P.V., Wu, A., Tatem, A.J., 2007. Investigating the endemic transmission of the hepatitis C virus. Int. J. Parasitol. 37, 839–849. Samimi-Rad, K., Nasiri Toosi, M., Masoudi-Nejad, A., et al., 2012. Molecular epidemiology of hepatitis C virus among injection drug users in Iran: a slight change in prevalence of HCV genotypes over time. Arch. Virol. 157, 1959–1965. Sanchez, R., Serra, F., Tarraga, J., et al., 2011. Phylemon 2.0: a suite of web-tools for molecular evolution, phylogenetics, phylogenomics and hypotheses testing. Nucleic Acids Res. 39, W470–W474. Scheel, T.K., Gottwein, J.M., Mikkelsen, L.S., Jensen, T.B., Bukh, J., 2011. Recombinant HCV variants with NS5A from genotypes 1–7 have different sensitivities to an NS5A inhibitor but not interferon-alpha. Gastroenterology 140, 1032–1042. Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504. Shepard, C.W., Finelli, L., Alter, M.J., 2005. Global epidemiology of hepatitis C virus infection. Lancet Infect. Dis. 5, 558–567. Sievert, W., Altraif, I., Razavi, H.A., et al., 2011. A systematic review of hepatitis C virus epidemiology in Asia, Australia and Egypt. Liver Int. 31 (Suppl. 2), 61–80. Silva, M.O., Treitel, M., Graham, D.J., et al., 2013.
Antiviral activity of boceprevir monotherapy in treatment-naive subjects with chronic hepatitis C genotype 2/3. J. Hepatol. 59, 31–37.

Singh, S., Malhotra, V., Sarin, S.K., 2004. Distribution of hepatitis C virus genotypes in patients with chronic hepatitis C infection in India. Indian J. Med. Res. 119, 145–148. Singh, S., Gupta, R., Malhotra, V., Sarin, S.K., 2010. Predictors of histological activity and fibrosis in chronic Hepatitis C infection: a study from North India. Indian J. Pathol. Microbiol. 53, 238–243. Smith, D.B., Bukh, J., Kuiken, C., et al., 2014. Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: updated criteria and genotype assignment web resource. Hepatology 59, 318–327. Sood, A., Sarin, S.K., Midha, V., et al., 2012. Prevalence of hepatitis C virus in a selected geographical area of northern India: a population based survey. Indian J. Gastroenterol. 31, 232–236. Takada, N., Takase, S., Takada, A., Date, T., 1993. Differences in the hepatitis C virus genotypes in different countries. J. Hepatol. 17, 277–283. Thakur, V., Guptan, R.C., Sarin, S.K., 2000. Prevalence of hepatitis GB virus C/hepatitis G virus infection in blood donors in India. J. Assoc. Physicians India 48, 818–819. Tokita, H., Shrestha, S.M., Okamoto, H., et al., 1994. Hepatitis C virus variants from Nepal with novel genotypes and their classification into the third major group. J. Gen. Virol. 75 (Pt. 4), 931–936. Valliammai, T., Thyagarajan, S.P., Zuckerman, A.J., Harrison, T.J., 1995. Diversity of genotypes of hepatitis C virus in southern India. J. Gen. Virol. 76 (Pt. 3), 711–716. Verbeeck, J., Maes, P., Lemey, P., et al., 2006. Investigating the origin and spread of hepatitis C virus genotype 5a. J. Virol. 80, 4220–4226. Zehender, G., Sorrentino, C., Lai, A., et al., 2013. Reconstruction of the evolutionary dynamics of hepatitis C virus subtypes in Montenegro and the Balkan region. Infect. Genet. Evol. 17, 223–230.