Analysing the evolutionary history of HCV: Puzzle of ancient phylogenetic discordance


Infection, Genetics and Evolution 7 (2007) 354–360 www.elsevier.com/locate/meegid

G. Magiorkinis 1, F. Ntziora 1, D. Paraskevis *, E. Magiorkinis, A. Hatzakis

National Retrovirus Reference Center, Department of Hygiene and Epidemiology, University of Athens, School of Medicine, Alexandroupoleos 25, 11527 Goudi, Athens, Greece

Received 14 February 2006; received in revised form 4 April 2006; accepted 6 April 2006. Available online 23 May 2006

Abstract

Although recombination is an important evolutionary strategy in RNA viruses, only two cases of HCV recombinant strains have been reported. Our objective was to analyze the evolutionary history of the HCV genotypes in order to obtain evidence of significant phylogenetic discordance due either to recombination or to selective forces leading to convergent or divergent evolution. The data support an evolutionary preservation of the genomic region related to interferon resistance (interferon sensitivity determining region, ISDR) for genotypes 1 and 4. On the other hand, there was no evidence that recombination has occurred in the past, with the possible exception of genotype 4. Moreover, the data indicate that genotypes 3 and 10 split more recently than genotypes 6–9 and 11. This analysis reflects a pattern commonly found in rapidly evolving viruses, namely a strongly disturbed evolutionary history that distorts the uniform distribution of the phylogenetic relationships across the genome, and it introduces a conservative inference framework for approaching this kind of data.

© 2006 Elsevier B.V. All rights reserved.

Keywords: HCV; Recombination; Phylogenetic discordance; Interferon resistance

* Corresponding author. Tel.: +30 210 7486382; fax: +30 210 7486382. E-mail address: [email protected] (D. Paraskevis).
1 These authors have equally contributed to the study.
1567-1348/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2006.04.003

1. Introduction

Hepatitis C virus (HCV) is a positive-sense single-stranded RNA virus (approximately 10^4 nucleotides) that is most similar to viruses belonging to the Pestivirus and Flavivirus genera. As in other flaviviruses, the three N-terminal proteins of HCV (core, envelope 1 and envelope 2) are probably structural and the four C-terminal proteins (nonstructural 2–5) are thought to function in viral replication (Nolte, 2001). Phylogenetic analysis of the E1 region has suggested that HCV is grouped into six major genotypes and 12 subtypes (Bukh et al., 1993). This classification scheme has been confirmed by the sequence of 573 nt of the core region (Bukh et al., 1994). Sequence analysis of the NS5B region has also classified HCV isolates into the same six major genotypes and numerous subtypes (Simmonds et al., 1993). The genotypes have been numbered 1–6 and the subtypes a, b, c, etc., in order of discovery (Simmonds, 2001). The genotypes 7–11 that have since been reported are distributed among the clades of HCV such that genotypes 3 and 10 are members of clade 3, and genotypes 6–9 and 11 are all members of clade 6 (Fig. 1) (Robertson et al., 1998). The main genotypes are equally diverged from one another, differing at 31–34% of nucleotide positions on pairwise comparison of complete genomic sequences, which leads to approximately 30% amino acid sequence divergence between the encoded polyproteins (Simmonds, 2001).

The different genotypes of HCV are thought to be associated with regional distribution, clinical manifestation and response to treatment of HCV infection (Huang et al., 1999). Genotypes 1a and 1b are the most common genotypes in the United States and in Europe (Zein et al., 1996; Dusheiko et al., 1994; McOmish et al., 1994; Nousbaum et al., 1995). Genotype 1b is also predominant in Japan (Takada et al., 1993). Genotypes 2a and 2b are relatively common in North America, Europe and Japan, while genotype 2c is commonly found in northern Italy. Genotype 3a is particularly prevalent in intravenous drug users in Europe and the United States (Pawlotsky et al., 1995). Genotype 4 is common in North Africa and the Middle East (Abdulkarim et al., 1998; Chamberlain et al., 1997), whereas genotypes 5 and 6 are prevalent in South Africa and Hong Kong, respectively (Cha et al., 1992; Simmonds et al., 1993).

Fig. 1. ML tree inferred by means of the Tree-Puzzle program including full-length sequences of all the genotypes (1 through 11).

Genotypes 7–9 have been isolated only from Vietnamese patients (Tokita et al., 1994), while genotypes 10 and 11 are found in Indonesia (Tokita et al., 1996; Zein, 2000). To date, one case of recombination between different HCV genotypes (1b and 2k) has been identified, in St. Petersburg (Kalinina et al., 2002), and intratypic recombination has also been reported (Colina et al., 2004). In view of these findings, the purpose of our study was to explore the evolutionary history of HCV by investigating the possibility of recombination in the available full-length sequences.

2. Materials and methods

2.1. Datasets

All the full-length HCV sequences (120) available at the beginning of the study (October 2003) were retrieved from GenBank, aligned by means of the CLUSTAL W program (v 1.81) (Thompson et al., 1994) and manually corrected. We did not include more recent sequences, on the assumption that, since recombination was identified in HCV in 2002, every full-length sequence submitted to GenBank after that time has been explored for recombination. The reference sequences were as follows: Genotype 1 (1a: AF009606, AF387806, AF290978; 1b: D50483, AB049093, D85516), Genotype 2 (2a: AF169005, AB047645, D00944; 2b: AB030907; 2c: D50409), Genotype 3 (3a: D28917, D17763; 3b: D49374), Genotype 4 (Y11604), Genotype 5 (AF064490, Y13184), Genotype 6 (6a: Y12083; 6b: D84262), Genotype 7 (6d) (D84263), Genotype 8 (6k) (D84264), Genotype 9 (6h) (D84265), Genotype 10 (3k) (D63821) and Genotype 11 (6g) (D63822). Two datasets were formed using different combinations of randomly selected sequences from the previously mentioned genotype reference panel. These datasets were subsequently blinded to the analyst and treated as being the same.

2.2. Exploratory analysis

Bayesian scanning plots were constructed using the General Time Reversible substitution model with Γ-distributed rate heterogeneity among sites (GTR + Γ), as described previously (Paraskevis et al., 2005). Four Metropolis-coupled Markov chain Monte Carlo (MC3) chains were run for 50,000 generations, sampling a tree every 100 generations. Each Bayesian scanning plot was performed 10 times independently in order to discriminate randomly assigned sub-genomic monophyletic relationships. For several loci of the alignment, four MC3 chains were run for up to 5,000,000 generations in order to confirm that stationarity had been reached by 10,000 generations and to determine rationally a global burn-in of 100 sampled steps. Finally, the effective sample size (ESS) was calculated for a random sample of windows of the Bayesian scanning plots by means of the program Tracer (http://evolve.zoo.ox.ac.uk/software.html?id=tracer) in order to determine whether appropriate mixing of the MC3 sampler was achieved in the posterior target distribution (ESS > 100).

Genomic regions showing constant clustering for more than 300 nt (>5 adjacent sliding windows) in the Bayesian scanning plots were selected as more likely to reflect the true evolutionary relationship. Subsequently, phylogenetic analysis was performed on the chosen fragments using multiple approaches. First, trees were inferred by the maximum likelihood (ML) method with the Tamura-Nei substitution model (Tamura and Nei, 1993) and Γ-distributed rate heterogeneity among sites (TNr + Γ), as implemented in Tree-Puzzle 5.0 (Schmidt et al., 2002). Secondly, ML estimators of the topologies were obtained by applying the GTR + Γ model through a heuristic search of the likelihood as programmed in PAUP* (Swofford, 2003). To keep the computation time within reasonable limits, the statistical confidence for each cluster was evaluated by bootstrapping the alignments and reconstructing neighbor-joining (NJ) trees from the ML-estimated distance matrices.
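The region-selection rule described above (constant clustering across more than five adjacent sliding windows) can be illustrated with a minimal Python sketch. It assumes that each window of a scanning plot has already been reduced to the label of the genotype with which the query clusters, or None where no cluster is supported. The function name, the input format and the example labels are hypothetical and are not taken from the SlidingBayes software or from the study data.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def constant_clustering_runs(
    partners: List[Optional[str]], min_windows: int = 6
) -> List[Tuple[str, int, int]]:
    """Find runs of adjacent windows that cluster with the same genotype.

    partners[i] is the genotype label the query clusters with in window i,
    or None when no cluster is supported (hypothetical input format).
    Returns (genotype, first_window_index, last_window_index) per run.
    """
    runs = []
    start = 0
    for partner, group in groupby(partners):
        length = sum(1 for _ in group)
        # Keep only supported clusters that persist for enough windows.
        if partner is not None and length >= min_windows:
            runs.append((partner, start, start + length - 1))
        start += length
    return runs


# Hypothetical scan: the labels are invented for illustration only.
scan = ["4"] * 7 + [None] * 2 + ["6"] * 3 + ["2"] * 6 + ["4"]
print(constant_clustering_runs(scan))  # [('4', 0, 6), ('2', 12, 17)]
```

In this invented example the three-window run with genotype 6 and the single trailing window are discarded, while the two runs of at least six windows would be carried forward as candidate regions.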
Moreover, phylogenies were estimated within a Bayesian framework, using evolutionary models of gradually increasing complexity, by means of the MrBayes program (Huelsenbeck and Ronquist, 2001). The basic model was the GTR substitution model; we subsequently added parameters accounting for rate heterogeneity among sites, assuming first a common Γ-distribution over all sites and second three distinct Γ-distributions for sites grouped according to their codon position (1st, 2nd or 3rd). We also added a parameter accounting for across-tree rate variation using a covarion-based model (Tuffley and Steel, 1998). Each of these topologies was estimated by running at least 1,000,000 generations, and stationarity was examined for every tree in order to define a rational burn-in.

2.3. Reducing trees estimation

In order to determine whether a genotype-specific evolutionary process significantly affects the estimation of the topology, we introduced a new method called "reducing trees estimation". This method is a jack-knife-like algorithm that applies resampling to the taxa: six GTR + Γ trees were constructed under the Bayesian framework (one for each of the reference genotypes), each omitting in turn the reference sequences of exactly one of the six genotypes from the alignment.

2.4. Significance of phylogenetic discordance

To determine the statistical significance of the topological discordance among loci, we performed approximately unbiased (AU) tests as implemented in the Consel program (Shimodaira and Hasegawa, 2001). The optimal tree inferred from each partial region (sub-genomic optimal) was compared to the topology inferred from the full-length genomic sequence. This form of null hypothesis was chosen because the full-genomic tree is less susceptible to sampling error than the sub-genomic optimal trees. However, under the alternative hypothesis (that recombination is present), the full-genomic tree is biased, which makes a possible recombination event less likely to be identified. Thus, failure to reject the null hypothesis is not evidence for the absence of recombination. On the other hand, rejection of the null hypothesis is more likely to be due to true phylogenetic discordance, in keeping with our choice of more stringent criteria.

2.5. Phylogenetic information and substitution saturation

Likelihood mapping analysis was performed as implemented in the Tree-Puzzle program to determine whether the analysed regions showed star-like evolution. Substitution saturation tests were performed using the DAMBE program (Xia and Xie, 2001) for the most divergent regions.

Fig. 2. Bayesian scanning plot for genotype 1. Regions that supported partial constant clustering for at least 300 nt (grey shaded) were selected for further analysis.

2.6. Analysis algorithm

Our objective was to determine the phylogenetic relationships between the different HCV genotypes throughout the genome, bearing in mind that recombination is a possible event that has affected the evolutionary history of HCV. To define candidate regions that might support discordance of the evolutionary history, we implemented the following workflow:

(1) A full-length genomic tree was inferred using the Tree-Puzzle program, including all genotypes.
(2) Exploratory analysis was performed by constructing Bayesian scanning plots as described previously.
(3) Putative phylogenetic discordance was taken into account if the Bayesian scanning plot failed to show a uniformly distributed relationship across the genome between the query sequence and the HCV genotypes. Subsequent phylogenetic analysis was carried out on loci for which the Bayesian scanning plots suggested partial constant clustering for at least 300 nt (Fig. 2).

(4) An AU test was performed to determine the significance of the phylogenetic discordance for the regions selected at step 3.
(5) Reducing trees estimation was performed on the regions selected at step 3 so as to reveal genotype-specific evolution.
(6) Likelihood mapping analysis was performed to determine whether the assumption of tree-like evolution is significantly disturbed.

3. Results

Our objective was to define the phylogenetic relationships first among the main genotypes (1 through 6) and second among the "secondary" genotypes (7 through 11). Results are reported for every genotype and each specific region separately, according to the predefined analysis algorithm.

3.1. Genotypes 1 through 6

Concerning genotype 1, the ML trees (Fig. 3) inferred with the Tree-Puzzle program from the full-length sequences supported a closer, though old, relationship with genotype 4. The Bayesian scanning plot suggested four regions (Fig. 2) for subsequent phylogenetic analysis, two of which were indicative of a closer relationship with genotype 4, whilst the remaining two indicated genotypes 6 and 2 (Suppl. Table 1). The AU test evidenced discordance only for the region supporting the cluster with genotype 6 against the global topology, and this relationship was concordant across all the analyses performed (Suppl. Tables 1 and 2). These data suggest that genotypes 1 and 4 may share a more recent ancestor in most parts of the genome, which is also reflected in the monophyletic relationship observed in the global topology. This relationship is especially supported in the regions spanning nt 2644–2994 (P7-NS2) and 7502–7922 (NS5A-NS5B) (positions relative to the HCV-H genome). On the other hand, in region 6545–6895 (NS5A), where genotype 1 clusters with genotype 2, there is no strong evidence for discordance, since this region cannot significantly reject the global topology (AU test). In addition, as shown by removing the genotypes one by one from the alignment, the cluster with genotype 2 is not constant (Table 2), demonstrating that the model estimations do not hold uniformly along all branches in such a small region.

Fig. 3. ML tree inferred by means of the Tree-Puzzle program including full-length sequences of the main genotypes (1 through 6).

For genotype 2, the ML trees based on the full-length sequences did not support a closer relationship with any of the remaining genotypes. The Bayesian scanning plot suggested four regions for subsequent phylogenetic analysis, two of which supported clustering with genotype 5 and the remaining ones with genotypes 1, 3 and 4 (Suppl. Table 1). Concordance between topologies was obtained only for the region supporting clustering with genotype 3 (4394–4794) (NS3), and the AU test supported this estimate against the global topology. On the other hand, the reduced trees were not concordant for the region spanning nt 6545–6895 (NS5A), as also mentioned earlier. Moreover, for the region spanning nt 4394–4794 (NS3), although five of the reduced trees were concordant, when genotype 3 was removed genotype 1 clustered with genotype 2, indicating either that the estimation was strongly affected by evolution specific to branch 3 or that it reflects long branch attraction (Fig. 4). Consequently, the monophyly of this edge is not trustworthy either.

Fig. 4. Full taxon tree and reduced trees for the region spanning nt 4394–4794 showing the stability of the phylogenetic relationships among genotype 2 and the rest of the genotypes.

As for genotype 3, the ML trees based on the full-length sequences supported a closer, though old, relationship with genotype 5. The Bayesian scanning plot suggested six regions for further phylogenetic analysis. The inferred topologies confirmed that different parts of the genome cluster with genotypes 2 and 4–6 (Suppl. Table 1). The most important relationship was found in the region spanning nt 8522–8925 (NS5B), where all the estimations coincided in supporting a closer relationship with genotype 6. Although genotype 3 forms a significant cluster with genotype 4 when genotype 6 is removed, this finding should not be considered discordant with the rest of the reduced trees, since genotypes 3, 4 and 6 form a supercluster (Fig. 5). Consequently, this topology is reliable and significantly different from the global topology.

Fig. 5. Full-taxon and genotype-6-reduced trees for the region spanning nt 8522–8925 showing the stability of the supercluster of genotypes 3, 4 and 6.

As for genotype 4, the ML trees based on the full-length sequences supported a closer relationship with genotype 1, as mentioned previously. The Bayesian scanning plot suggested seven regions for further phylogenetic analysis (Suppl. Table 1). In addition to the fragments discussed previously, the region spanning nt 4195–4645 (NS3) was found to consistently support a closer relationship with genotype 6 and to significantly reject the global topology. The reduced trees estimation gave no evidence for disturbance of this relationship. For genotypes 5 and 6 no additional significant partial relationships were found.

3.2. Genotypes 7 through 11 (subtypes 6d, 6k, 6h, 3k and 6g, respectively)

Phylogenetic analysis of the whole genome of genotype 7 showed a close relationship with genotype 6, thus confirming the clustering of these two genotypes in clade 6. Even though the Bayesian scanning analysis suggested partial clustering with genotypes 3 and 5 (Table 1), the estimations did not coincide, and so these relationships are considered unreliable. Regarding genotype 8, the analysis supported the hypothesis that genotype 8 forms a cluster with genotype 6, apart from the regions spanning nt 1719–2213 (E2) and 3894–4294 (NS3), for which there is evidence that genotype 8 groups with genotypes 5 and 3, respectively. Interestingly, the reduced trees for the first region suggested a branch-specific disturbance of the statistical significance of the cluster of genotypes 5 and 8, but not of the existence of the cluster itself. On the other hand, no significant disturbance of the estimation was observed for the second fragment. The AU tests did not indicate a significant difference in likelihood between these estimations and the overall tree.

As for genotype 9, the most important relationship according to the full-length genome was again found to be with genotype 6. Still, as indicated by the Bayesian scanning plots, in the fragments spanning nt 1890–2278 (E2) and 3894–4344 (NS3) closer associations with genotypes 5 and 3, respectively, were found, but these were not consistently supported (Suppl. Tables 1 and 2).

Genotype 10 stands as an exception to the phylogenetic discordance discussed above: its Bayesian scanning plot indicated a uniform relationship with genotype 3. The constructed phylogenetic trees indicated that genotype 10 clusters with genotype 3, thus confirming that these two genotypes clearly form together the genotypic branch 3. Regarding genotype 11, the overall tree supported a closer relationship with genotype 6, but the exploratory analysis suggested phylogenetic discordance in two regions. However, neither of them gave a consistent signal across all the methods used, so the significance is weak.

For all the regions analyzed, star-like evolution was not supported by likelihood mapping analysis, showing that there is no strong degradation of the evolutionary information (>70% fully resolved quartets). Moreover, we failed to detect substitution saturation for the most divergent regions of the HCV genome (E2-P7).

3.3. Full length sequences

Additional full-length sequences were downloaded and examined through Bayesian scanning analysis for possible intergenotypic recombination events. Ninety-eight Bayesian scanning plots were built, but none indicated any discordant phylogeny, while subsequent phylogenetic analysis showed that all the sequences clustered with subtype 1b.

4. Discussion

Our phylogenetic analysis was in general divided into two parts: first we analyzed the evolutionary relationships among the six main HCV genotypes (1–6), and then we phylogenetically analyzed these genotypes together with each of the remaining genotypes (7–11). The exploratory analysis based on Bayesian inference suggested a complex pattern of partial differential clustering. As an initial screening for candidate loci that may have distinct evolutionary histories, we chose a method that is sensitive in detecting phylogenetic discordance, namely the sliding window approach under Bayesian inference. This gain in sensitivity comes with a significant increase in the probability of erroneously inferring that a locus has evolved differentially (type I error) due to multiple testing. However, we deliberately adopted a permissive criterion for the recombination hypothesis at the beginning of the analysis for the following reason: by splitting the alignment into distinct parts according to the Bayesian scanning plots and analysing each fragment separately, we remain strongly conservative with respect to the tree-like (not network-like) evolutionary assumption. In order to balance the type I error, we predetermined a selection criterion: to be included as a candidate, a region had to show a constant, significant closer relationship for at least six windows in a row. The probability that six windows show the same clustering pattern by chance is lower, since the first and the last window have less than 50% of their sites in common. Nevertheless, we used the Bayesian scanning plots as an indicative (exploratory) analysis and we implemented the AU tests as strong evidence for rejecting the hypothesis of a common evolutionary history. Still, since the AU test is strongly conservative, failure to reject is definitely not evidence for accepting the null hypothesis.

As already described, we performed additional analyses under various evolutionary models and different reference datasets. The use of different models and phylogenetic approaches limits the methodological problems that may arise from simplifying assumptions. Apart from that, we used different reference datasets in order to identify whether a sampling error in the choice of reference sequences could systematically affect the topology estimators. However, we did not include all the reference sequences in a single topology, so as not to degrade the statistical power of the estimation. Moreover, over-representation of specific genotypes for which more information is available (e.g. genotype 1) would have biased the model estimations towards an equilibrium that tips the scales towards the over-sampled genotype. We assume that concordant estimations under different assumptions and datasets are fair evidence that the observed clustering is not biased (Suppl. Table 1).
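The argument above about the first and the last of six adjacent windows can be made concrete with a short calculation. The Python sketch below computes the fraction of sites that the last window of a run shares with the first one. The window length and step used in the example are assumptions chosen purely for illustration, since the text does not state the values used in the study, and the function name is hypothetical.

```python
def shared_fraction(window_len: int, step: int, n_windows: int) -> float:
    """Fraction of the first window's sites also covered by the last window
    in a run of n_windows adjacent sliding windows."""
    if window_len <= 0 or step <= 0 or n_windows < 1:
        raise ValueError("window_len and step must be positive, n_windows >= 1")
    # The last window starts this many sites after the first one.
    offset = step * (n_windows - 1)
    overlap = max(0, window_len - offset)
    return overlap / window_len


# Assumed values for illustration only: 400 nt windows moved in 50 nt steps.
print(shared_fraction(400, 50, 6))  # 0.375
```

Under these assumed values the first and sixth windows share 150 of 400 sites, which satisfies the "less than 50%" condition; in general the condition holds whenever step × (n_windows − 1) exceeds half the window length.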
According to our analysis, there is evidence of a very strong relationship between genotypes 1 and 4, most likely indicating that genotype 4 originated from a sub-branch of genotype 1. Since the distribution of genotype 1 is global and genotype 4 is geographically restricted, the reverse scenario is not equally probable. Besides, our analysis showed that in specific parts of the genome an accurate estimation of the overall topology could be made, as long as concordance between the phylogenetic estimations was consistently documented by strong monophyletic relationships. This suggests that evolutionary forces may have restricted the additive mutational noise, making it possible to recover more ancient evolutionary relationships in small sub-genomic fragments that have less statistical power (strongly informative sub-genomic fragments). It is interesting that one of these regions (7502–7922) is documented to play a significant role in interferon resistance (Interferon Sensitivity Determining Region, ISDR) (Saiz et al., 1998), while it is established that genotype 1 and genotype 4 infections are similarly resistant to interferon therapies (Abdo and Lee, 2004). Since interferon is a nonspecific antiviral factor, our hypothesis is that this particular region has probably evolved under strong selective pressure to maintain the specific advantageous sequence space. Moreover, there is strong evidence for phylogenetic discordance in the origin of genotype 4, since at least one region (4195–4645) consistently inferred a topology different from the one proposed from the full-length sequence, suggesting a strong monophyletic relationship with genotype 6. Since the data do not favour genotype-specific evolution that could result in either convergent or divergent evolution, the most probable reason for this observation is an ancient recombination event between ancestral strains of genotypes 1 and 6.

The exploratory analysis of genotypes 7–11 indicated possibly discordant phylogenies for genotypes 7–9 and 11 and a strictly clonal history for genotype 10. While genotype 10 is clearly closer to genotype 3 along the genome, for the remaining genotypes several regions were suggested to have evolved differentially. If we assume that no recombination has taken place in the evolution of these branches and that accumulation of mutations was the only evolutionary force, then the clear consistency of the Bayesian scanning plot of genotype 3, in contrast to the noisy signal of the rest of the genotypes, should be a reliable indication that genotypes 10 and 3 separated more recently than genotypes 7–9, 11 and 6. Interestingly, genotypes 8 and 9 were found to have very similar patterns in the exploratory analysis and subsequent phylogenetic estimations. Still, for genotype 8 our findings were consistent across all the analyses undertaken, while for genotype 9 discrepancies between different topologies were observed. Yet these two genotypes very probably have a more recent common ancestor, as suggested by their common geographical distribution (Simmonds, 2001; Tokita et al., 1994) and the closer link in the full genomic topology. It is very likely that genotype 9 has undergone a noisier evolutionary process, which resulted in a weak signal in small genomic parts, while the observed discrepancy between the overall tree and the partial estimators is the result of evolutionary events that took place before the separation of these two genotypes.

Finally, we have introduced a new methodology called "reducing trees estimation" for testing whether a branch-specific evolutionary process disturbs the model estimations. This method shows whether the cluster of concern in the estimated topology is biased by the assumption that the base-frequency and base-transition matrices are similar for all branches. Consequently, this could be a rough test for convergent and/or divergent evolution. However, simulation studies should be undertaken to establish whether this test can discriminate between ancient recombination events and these evolutionary processes, both of which cause significant phylogenetic discordance.

Acknowledgments

We wish to acknowledge Andrew Rambaut for his supportive comments and suggestions regarding this study. G.M., F.N. and E.M. were supported by the Hellenic Scientific Society of AIDS and Sexually Transmitted Diseases. D.P. was supported by the Hellenic Center for Infectious Diseases Control (HCIDC) of the Ministry of Health & Welfare.


Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2006.04.003.

References

Abdo, A.A., Lee, S.S., 2004. Management of hepatitis C virus genotype 4. J. Gastroenterol. Hepatol. 19, 1233–1239.
Abdulkarim, A.S., Zein, N.N., Germer, J.J., Kolbert, C.P., Kabbani, L., Krajnik, K.L., Hola, A., Agha, M.N., Tourogman, M., Persing, D.H., 1998. Hepatitis C virus genotypes and hepatitis G virus in hemodialysis patients from Syria: identification of two novel hepatitis C virus subtypes. Am. J. Trop. Med. Hyg. 59, 571–576.
Bukh, J., Purcell, R.H., Miller, R.H., 1993. At least 12 genotypes of hepatitis C virus predicted by sequence analysis of the putative E1 gene of isolates collected worldwide. Proc. Natl. Acad. Sci. USA 90, 8234–8238.
Bukh, J., Purcell, R.H., Miller, R.H., 1994. Sequence analysis of the core gene of 14 hepatitis C virus genotypes. Proc. Natl. Acad. Sci. USA 91, 8239–8243.
Cha, T.A., Beall, E., Irvine, B., Kolberg, J., Chien, D., Kuo, G., Urdea, M.S., 1992. At least five related, but distinct, hepatitis C viral genotypes exist. Proc. Natl. Acad. Sci. USA 89, 7144–7148.
Chamberlain, R.W., Adams, N., Saeed, A.A., Simmonds, P., Elliott, R.M., 1997. Complete nucleotide sequence of a type 4 hepatitis C virus variant, the predominant genotype in the Middle East. J. Gen. Virol. 78 (Pt 6), 1341–1347.
Colina, R., Casane, D., Vasquez, S., Garcia-Aguirre, L., Chunga, A., Romero, H., Khan, B., Cristina, J., 2004. Evidence of intratypic recombination in natural populations of hepatitis C virus. J. Gen. Virol. 85, 31–37.
Dusheiko, G., Schmilovitz-Weiss, H., Brown, D., McOmish, F., Yap, P.L., Sherlock, S., McIntyre, N., Simmonds, P., 1994. Hepatitis C virus genotypes: an investigation of type-specific differences in geographic origin and disease. Hepatology 19, 13–18.
Huang, F., Zhao, G.Z., Li, Y., 1999. HCV genotypes in hepatitis C patients and their clinical significances. World J. Gastroenterol. 5, 547–549.
Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755.
Kalinina, O., Norder, H., Mukomolov, S., Magnius, L.O., 2002. A natural intergenotypic recombinant of hepatitis C virus identified in St. Petersburg. J. Virol. 76, 4034–4043.
McOmish, F., Yap, P.L., Dow, B.C., Follett, E.A., Seed, C., Keller, A.J., Cobain, T.J., Krusius, T., Kolho, E., Naukkarinen, R., et al., 1994. Geographical distribution of hepatitis C virus genotypes in blood donors: an international collaborative survey. J. Clin. Microbiol. 32, 884–892.
Nolte, F.S., 2001. Hepatitis C virus genotyping: clinical implications and methods. Mol. Diagn. 6, 265–277.
Nousbaum, J.B., Pol, S., Nalpas, B., Landais, P., Berthelot, P., Brechot, C., 1995. Hepatitis C virus type 1b (II) infection in France and Italy. Collaborative Study Group. Ann. Int. Med. 122, 161–168.
Paraskevis, D., Deforche, K., Lemey, P., Magiorkinis, G., Hatzakis, A., Vandamme, A.M., 2005. SlidingBayes: exploring recombination using a sliding window approach based on Bayesian phylogenetic inference. Bioinformatics 21, 1274–1275.
Pawlotsky, J.M., Tsakiris, L., Roudot-Thoraval, F., Pellet, C., Stuyver, L., Duval, J., Dhumeaux, D., 1995. Relationship between hepatitis C virus genotypes and sources of infection in patients with chronic hepatitis C. J. Infect. Dis. 171, 1607–1610.
Robertson, B., Myers, G., Howard, C., Brettin, T., Bukh, J., Gaschen, B., Gojobori, T., Maertens, G., Mizokami, M., Nainan, O., Netesov, S., Nishioka, K., Shin-i, T., Simmonds, P., Smith, D., Stuyver, L., Weiner, A., 1998. Classification, nomenclature, and database development for hepatitis C virus (HCV) and related viruses: proposals for standardization. International Committee on Virus Taxonomy. Arch. Virol. 143, 2493–2503.
Saiz, J.C., Lopez-Labrador, F.X., Ampurdanes, S., Dopazo, J., Forns, X., Sanchez-Tapias, J.M., Rodes, J., 1998. The prognostic relevance of the nonstructural 5A gene interferon sensitivity determining region is different in infections with genotype 1b and 3a isolates of hepatitis C virus. J. Infect. Dis. 177, 839–847.
Schmidt, H.A., Strimmer, K., Vingron, M., von Haeseler, A., 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.
Simmonds, P., 2001. The origin and evolution of hepatitis viruses in humans. J. Gen. Virol. 82, 693–712.
Simmonds, P., Holmes, E.C., Cha, T.A., Chan, S.W., McOmish, F., Irvine, B., Beall, E., Yap, P.L., Kolberg, J., Urdea, M.S., 1993. Classification of hepatitis C virus into six major genotypes and a series of subtypes by phylogenetic analysis of the NS-5 region. J. Gen. Virol. 74 (Pt 11), 2391–2399.
Swofford, D., 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.
Takada, N., Takase, S., Takada, A., Date, T., 1993. Differences in the hepatitis C virus genotypes in different countries. J. Hepatol. 17, 277–283.
Tamura, K., Nei, M., 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673–4680.
Tokita, H., Okamoto, H., Iizuka, H., Kishimoto, J., Tsuda, F., Lesmana, L.A., Miyakawa, Y., Mayumi, M., 1996. Hepatitis C virus variants from Jakarta, Indonesia classifiable into novel genotypes in the second (2e and 2f), tenth (10a) and eleventh (11a) genetic groups. J. Gen. Virol. 77 (Pt 2), 293–301.
Tokita, H., Okamoto, H., Tsuda, F., Song, P., Nakata, S., Chosa, T., Iizuka, H., Mishiro, S., Miyakawa, Y., Mayumi, M., 1994. Hepatitis C virus variants from Vietnam are classifiable into the seventh, eighth, and ninth major genetic groups. Proc. Natl. Acad. Sci. USA 91, 11022–11026.
Tuffley, C., Steel, M., 1998. Modeling the covarion hypothesis of nucleotide substitution. Math Biosci. 147, 63–91.
Xia, X., Xie, Z., 2001. DAMBE: data analysis in molecular biology and evolution. J. Heredity 92, 371–373.
Zein, N.N., 2000. Clinical significance of hepatitis C virus genotypes. Clin. Microbiol. Rev. 13, 223–235.
Zein, N.N., Rakela, J., Krawitt, E.L., Reddy, K.R., Tominaga, T., Persing, D.H., 1996. Hepatitis C virus genotypes in the United States: epidemiology, pathogenicity, and response to interferon therapy. Collaborative Study Group. Ann. Int. Med. 125, 634–639.