COMMENT
Footprint of selection
code, we must construct our own Rosetta stone based on models of evolutionary processes. Two recent papers1,2 provide us with tools to extract more information about the form of selection from genetic data. The test developed by Fay and Wu1, for example, could be used to infer whether genetic differences between humans and chimpanzees represent chance substitutions of neutral mutations or are instead the products of directional selection. The interpretation of our evolutionary past will not, however, be simple. We need to investigate the robustness of our conclusions to changes in assumptions. For example, do the results of the Fay and Wu test1 depend on the assumption that each mutation generates a unique allele? Furthermore, we need to account explicitly for the various sources of uncertainty in our conclusions. For example, how certain are we that an allele is ancestral or derived? Finally, we need to investigate the statistical power of the various tests. How can we best estimate a number of parameters, including the strength of background selection, the strength of directional selection and the timing of hitchhiking events? How can we distinguish signals left by selection from those left by demographic events, such as the population expansion and widespread migration that have characterized human history13,14? Over the next few years we will learn much about the strengths and weaknesses of different approaches to inferring evolutionary events from DNA sequence data. It will be fascinating to see what clues to our evolutionary past are revealed.

Acknowledgements
I gratefully acknowledge helpful comments on the manuscript from P. Awadalla, A. Poon, M. Whitlock and two anonymous reviewers. Funding was provided by the Natural Sciences and Engineering Research Council (Canada).

References
1 Fay, J.C. and Wu, C.I. (2000) Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413
2 Kim, Y. and Stephan, W. (2000) Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155, 1415–1427
3 Haldane, J.B.S. (1927) A mathematical theory of natural and artificial selection, Part V: Selection and Mutation. Proc. Cambridge Philos. Soc. 23, 838–844
4 Haldane, J.B.S. (1935) The rate of spontaneous mutation of a human gene. J. Genetics 31, 317–326
5 Cavalli-Sforza, L.L. and Bodmer, W.F. (1971) The Genetics of Human Populations, W.H. Freeman
6 Kimura, M. (1983) The Neutral Theory of Molecular Evolution, Cambridge University Press
7 McDonald, J.H. and Kreitman, M. (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654
8 Maynard Smith, J. and Haigh, J. (1974) The hitch-hiking effect of a favourable gene. Genet. Res. Camb. 23, 23–35
9 Charlesworth, B. et al. (1993) The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303
10 Tajima, F. (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460
11 Fu, Y.X. (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147, 915–925
12 Fu, Y.X. (1995) Statistical properties of segregating sites. Theor. Popul. Biol. 48, 172–197
13 Excoffier, L. and Schneider, S. (1999) Why hunter-gatherer populations do not show signs of Pleistocene demographic expansions. Proc. Natl. Acad. Sci. U. S. A. 96, 10597–10602
14 Fay, J.C. and Wu, C.I. (1999) A human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear DNA variation. Mol. Biol. Evol. 16, 1003–1005
15 Hudson, R.R. and Kaplan, N.L. (1995) The coalescent process and background selection. Philos. Trans. R. Soc. London Ser. B 349, 19–23
16 Kaplan, N.L. et al. (1989) The ‘hitchhiking effect’ revisited. Genetics 123, 887–899
17 Watterson, G.A. (1975) On the number of segregating sites. Theor. Popul. Biol. 7, 256–276
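To make the Fay and Wu test1 discussed above more concrete, the statistic at its core can be computed from an unfolded site frequency spectrum, that is, counts of sites by the number of sampled sequences carrying the derived allele. What follows is a minimal sketch, not the authors' software. It returns the unnormalised H = θπ − θH and does not assess significance, which would require simulations under a neutral model. The function name and input layout are illustrative choices.

```python
from typing import Sequence


def fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Unnormalised Fay and Wu's H = theta_pi - theta_H.

    derived_counts[i - 1] is S_i, the number of segregating sites at which the
    derived allele is carried by exactly i of the n sampled sequences
    (i = 1 .. n - 1). Ancestral and derived states must already be assigned,
    for example with an outgroup sequence.
    """
    if n < 2 or len(derived_counts) != n - 1:
        raise ValueError("expected n >= 2 and exactly n - 1 frequency classes")
    pairs = n * (n - 1)  # twice the number of sequence pairs, 2 * C(n, 2)
    # theta_pi weights each class by i(n - i): pairwise differences.
    theta_pi = sum(2 * s * i * (n - i)
                   for i, s in enumerate(derived_counts, start=1)) / pairs
    # theta_H weights each class by i^2: sensitive to high-frequency derived alleles.
    theta_h = sum(2 * s * i * i
                  for i, s in enumerate(derived_counts, start=1)) / pairs
    return theta_pi - theta_h


if __name__ == "__main__":
    # Three sites whose derived allele is found in 3 of 4 sequences:
    # an excess of high-frequency derived variants, as expected after hitchhiking.
    print(fay_wu_h([0, 0, 3], 4))  # -3.0
    # The same three sites with ancestral and derived states swapped.
    print(fay_wu_h([3, 0, 0], 4))  # 1.0
```

The weight i(n − i) in θπ is symmetric under i → n − i, but the weight i² in θH is not. Misassigning which allele is ancestral therefore leaves θπ unchanged (1.5 in both calls) while changing H, from −3.0 to 1.0 here. This illustrates why the question of how certain we are that an allele is ancestral or derived matters for this test.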
GENOME ANALYSIS
Outlook
The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome
The comparative analysis of the growing number of complete genome sequences has helped to reveal the important role of horizontal gene transfer (HGT) in prokaryotic evolution1–5. Complete genome sequences not only allow the identification of the set of transferred genes present in an organism, but also help to reconstruct the fate of these genes once they are incorporated into the receiving genome. For example, if homologues are already present in the host, transferred genes can replace the host genes via recombination, or they can simply be integrated at another site in the host
genome, leading to the presence of multiple copies. This is especially interesting for genes belonging to conserved operons, because it raises the question of whether the selective advantage that the transferred genes provide to the host can compensate for the disadvantage intrinsic to operon disruption (which hinders the coordinated expression of all the genes in the operon). As a model, we have analysed the spectinomycin (spc) operon, a conserved operon that groups the genes coding for the ribosomal proteins RpL14, RpL24, RpL5, RpS14, RpS8, RpL6, RpL18, RpS5, RpL30 and RpL15, and the secretion pathway protein SecY (Ref. 6). We were particularly interested in protein RpS14, because it is required for the assembly of ribosomal 30S subunits and is part of the peptide environment of the peptidyl transferase center, which is involved in the essential process of peptide elongation7. According to common assumptions, no successful HGT events should be expected for this protein, given its important role and its large number of physical interactions with other components of the ribosome. This expectation agrees with the recently proposed complexity hypothesis, which states that HGT of genes whose products are involved in large, complex systems (such as the ribosome) is very unlikely8.

TIG December 2000, volume 16, No. 12

Céline Brochier Celine.Brochier@snv.jussieu.fr
Hervé Philippe Herve.Philippe@snv.jussieu.fr
David Moreira*
Equipe Phylogénie, BioInformatique et Génome, Bâtiment B, 6ème étage, Université Pierre et Marie Curie, 9 quai Saint Bernard, 75252 Paris, Cedex 05, France.
*Universidad Miguel Hernández, Facultad de Medicina, División de Microbiología, 03550 San Juan, Alicante, Spain.

FIGURE 1. Bacterial phylogeny based on rps14 sequences. Bacterial rps14 sequences cluster within three main groups (I, II and III) in the phylogeny constructed using the neighbour-joining algorithm and rooted on the archaeal and eukaryotic sequences. Several monophyletic groups are displayed as solid triangles for clarity. Names of groups that are considered monophyletic on the basis of other phylogenetic markers but are split in the RpS14 phylogeny are in bold, as are species with two rps14 copies. The rps14 sequences mapping within the canonical spc operon are indicated by †, and those mapping within rearranged operons by a separate symbol. Bootstrap proportions estimated using 1000 replicates are shown for the three main groups. The scale bar represents the number of substitutions per 100 sites for a unit branch length. The alignment, sequence accession numbers and the complete phylogenetic tree are available at http://sorex.snv.jussieu.fr.

FIGURE 2. Archaeal and bacterial canonical spc operons, and genetic environments of the rps14 genes acquired by horizontal gene transfer (HGT) in several species. Boxes corresponding to genes for ribosomal proteins are in colour. tRNA genes are indicated by stars. Arrows indicate the direction of transcription. Species with two rps14 copies, one within a canonical operon and the other within a rearranged operon, are in bold. Abbreviations: HP, hypothetical protein; RNPA, RNase P protein A; SPHX, periplasmic phosphate-binding protein; USF, putative carboxymethylenebutenolidase.
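Fig. 1 was built with the neighbour-joining algorithm. The function below is a minimal sketch of that method, using the textbook Q-criterion formulation of Saitou and Nei rather than the authors' own software. It builds an unrooted tree in Newick format from a precomputed distance matrix. Estimating distances from the RpS14 alignment, rooting on the archaeal and eukaryotic outgroup, and the 1000 bootstrap replicates are all outside its scope. The function and taxon names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def neighbour_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Return an unrooted Newick tree built by neighbour joining.

    names: taxon labels; dist: symmetric matrix of pairwise distances
    given in the same order as names.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Dict-of-dicts so that new internal nodes can be added as we go.
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes: List[str] = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    # Toy additive distances for the tree ((A:1,B:2):3,(C:1,D:2)).
    m = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 3], [6, 7, 3, 0]]
    print(neighbour_joining(["A", "B", "C", "D"], m))  # (C:1,D:2,(A:1,B:2):3);
```

On these exactly additive toy distances the sketch recovers the generating tree. Distances estimated from real sequences are not additive, so neighbour joining then returns an estimate. Resampling alignment columns and rebuilding the tree (the bootstrap) is what gives support values such as those shown for groups I, II and III.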
Phylogenetic analysis
We have carried out a phylogenetic analysis of all available RpS14 sequences, including several from ongoing genome projects. Despite the short length of this protein (~100 positions), the monophyly of most bacterial phyla is recovered. In the phylogenetic tree, sequences fall into three main groups (I, II and III; Fig. 1), each supported by high bootstrap proportions. Group I is composed of low- and high-GC Gram positives, Spirochaetes, Thermus spp., Aquifex aeolicus, Thermotoga maritima and δ- and ε-proteobacteria. Group II clusters several low-GC Gram positives, Deinococcus radiodurans, Chlorobium tepidum and Porphyromonas gingivalis. Group III encompasses cyanobacteria, chloroplasts, α-, β- and γ-proteobacteria, mitochondria, Chlamydiales and some high-GC Gram positives. Moreover, sequences from each group are distinguishable by characteristic insertions/deletions (indels) near the N-terminus. Group III sequences exhibit an insertion of ~38 amino acids, whereas group II sequences possess an insertion of ~23 amino acids that is homologous to the initial part of the group III insertion (alignments are available at http://sorex.snv.jussieu.fr). Group I sequences lack these insertions. Because the outgroup (archaea and eukaryotes) also lacks them, the ancestral bacterial rps14 sequence was most probably group I-like.
In some cases the RpS14 tree is at odds with our knowledge of bacterial phylogeny. For instance, this tree supports polyphyly of proteobacteria and of the Thermus/Deinococcus group, whereas both are monophyletic for most other phylogenetic markers. In addition, some species such as Mycobacterium tuberculosis and Bacillus subtilis have two RpS14 copies, each belonging to a different group (Fig. 1). This situation could be explained by a complex series of gene duplications and losses, but a more parsimonious interpretation, supported by the analysis of operon structure (see below), involves several ancient HGT events between different bacterial groups.
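The N-terminal indel described above gives a check on group membership that does not depend on the tree. A minimal sketch follows. It assumes a hypothetical alignment in which the columns spanning the insertion are already known (passed as start and end). The size cut-offs are illustrative midpoints between the reported insertion sizes, not thresholds used in the study.

```python
from typing import Tuple

# Hypothetical cut-offs (residues), placed roughly between the insertion sizes
# reported in the text: none (group I), ~23 aa (group II), ~38 aa (group III).
CUTOFFS: Tuple[int, int] = (10, 30)


def insertion_length(aligned: str, start: int, end: int) -> int:
    """Count residues (non-gap characters) in alignment columns [start, end)."""
    return sum(1 for ch in aligned[start:end] if ch != "-")


def rps14_group(aligned: str, start: int, end: int) -> str:
    """Assign an aligned RpS14 sequence to group I, II or III by insertion size."""
    n = insertion_length(aligned, start, end)
    low, high = CUTOFFS
    if n < low:
        return "I"
    if n < high:
        return "II"
    return "III"


if __name__ == "__main__":
    # Toy sequences with a 40-column insertion region at columns 5-44.
    head, tail = "MAKKS", "GKRVR"
    print(rps14_group(head + "-" * 40 + tail, 5, 45))             # I
    print(rps14_group(head + "A" * 23 + "-" * 17 + tail, 5, 45))  # II
    print(rps14_group(head + "A" * 38 + "-" * 2 + tail, 5, 45))   # III
```

Because the group II insertion is homologous to the start of the group III insertion, a real alignment would place group II residues in the first part of the region, as in the toy strings. Agreement between this indel-based assignment and the tree-based groups is what makes the indel a useful corroborating character.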
Operon structure
One of the most remarkable large-scale HGTs of rps14 genes seems to be the acquisition of proteobacterial group III-type genes by Chlamydiales, Cyanobacteria and several high-GC Gram-positive species (Fig. 1). In all cases, these acquired genes do not map within canonical spc operons, which further strengthens the hypothesis of their HGT origin. In the high-GC Gram-positive species, the transferred rps14 sequences are located near other ribosomal protein genes that are more closely related to proteobacterial homologues than to Gram-positive ones. In addition, these species possess typical Gram-positive, group I-type sequences that map within canonical spc operons (the exception is Corynebacterium diphtheriae, whose genome sequence is not yet complete). Other species, such as Chlamydiales and Cyanobacteria, have retained only the transferred copy and have lost the rps14 gene from their spc operon. The single-copy rps14 gene of Chlamydiales seems to be co-transcribed with rpl36 (Fig. 2), whereas the cyanobacterial rps14 gene appears isolated from other ribosomal protein genes. The situation in Synechocystis sp. is particularly interesting, because its rps14 gene maps very close to an Arg-tRNA gene and an oxetanocin A-resistance gene (Fig. 2). The latter gene is frequently found in plasmids and pathogenicity islands, which are regularly associated with tRNA genes and transferred horizontally between species9–11. These data strongly suggest that Cyanobacteria and Chlamydiales acquired their rps14 genes horizontally from proteobacterial species, followed by deletion of the copy originally present in the spc operon. These represent ancient HGT events because, for example, the acquisition of the rps14 gene by Cyanobacteria pre-dated the divergence of chloroplasts. A transfer in the opposite direction is exemplified by the proteobacterial species Thiobacillus ferrooxidans, whose original group III-type sequence seems to have been replaced through homologous recombination by a group I-type sequence, without disruption of its spc operon (Fig. 1).
Sequences from group II are especially interesting. Although they are found mostly in low-GC Gram-positive species, they are more akin to group III-type sequences than to the main Gram-positive group I cluster, a relationship supported by the N-terminal indel present in these sequences. All but one of these Gram-positive species have a second, typical group I sequence mapping within canonical spc operons, whereas their group II homologues map at diverse sites in the genome, unlinked to other ribosomal protein genes. It can thus be hypothesised that the ancestor of these Gram positives acquired a second rps14 gene by HGT, which then diverged and moved to different positions in the genome. Streptococcus pneumoniae is the exception: its only rps14 copy, despite belonging to group II, maps within the canonical spc operon. This suggests that its original group I-type rps14 gene was replaced through homologous recombination by the transferred group II-type gene. The unusually long intergenic region found in this species between rps14 and rps8 (287 base pairs, with a long inverted repeat) could be a result of this recombination event.
HGT has clearly occurred in D. radiodurans, whose spc operon is disrupted precisely near the rps14 gene. A Leu-tRNA gene maps just upstream, together with genes for a hypothetical protein, an ABC transporter ATP-binding protein and an ABC transporter protein (Fig. 2). BLAST searches12 with these three proteins yield best matches with high-GC Gram-positive homologues, especially Streptomyces sequences, suggesting that they were acquired together with rps14 by HGT from a Gram positive (not shown).
In this case, a recombination event could have replaced the original rps14 copy with the transferred copy, disrupting the spc operon. This possibility is supported by the phylogenetic position of D. radiodurans in the RpS14 tree, surprisingly far from the Thermus spp. sequences. The presence of a tRNA gene at the point of disruption of the D. radiodurans spc operon might be significant, because tRNA genes are frequently involved in recombination events and in the integration of foreign DNA acquired by HGT (Ref. 10). In addition, a tRNA gene (Gly-tRNA) is present close to the group II-type rps14 gene of B. subtilis (Fig. 2), suggesting that tRNA-mediated recombination was common in the acquisition of these rps14 genes.
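The operon-based reasoning in this section amounts to asking whether an rps14 copy sits among its canonical spc neighbours. The function below is a minimal sketch of that check. It assumes each locus has already been reduced to an ordered list of gene labels, read in the direction of rps14 transcription. This input format and the example neighbourhoods are hypothetical, and the examples are not taken from Fig. 2. The sketch tests only local gene order. It says nothing about the phylogenetic origin of the copy, which in the text comes from the RpS14 tree.

```python
from typing import Sequence

# Bacterial spc operon gene order as listed in the text (Ref. 6); "S14" is rps14.
CANONICAL_SPC = ("L14", "L24", "L5", "S14", "S8", "L6",
                 "L18", "S5", "L30", "L15", "SecY")


def rps14_context(genes: Sequence[str], window: int = 2) -> str:
    """Classify the chromosomal context of a single rps14 ("S14") copy.

    Returns "canonical" if the `window` genes on each side of S14 match the
    canonical spc order, "rearranged" if any of them differ, and
    "undetermined" if S14 lies too close to the end of the sequence.
    """
    ref = CANONICAL_SPC.index("S14")
    if not 0 < window <= ref:
        raise ValueError(f"window must be between 1 and {ref}")
    pos = list(genes).index("S14")
    if pos - window < 0 or pos + window >= len(genes):
        return "undetermined"
    for offset in range(-window, window + 1):
        if genes[pos + offset] != CANONICAL_SPC[ref + offset]:
            return "rearranged"
    return "canonical"


if __name__ == "__main__":
    print(rps14_context(["L14", "L24", "L5", "S14", "S8", "L6", "L18"]))  # canonical
    print(rps14_context(["RNPA", "L34", "L36", "S14", "HP", "HP"]))       # rearranged
    print(rps14_context(["S14", "S8"]))                                   # undetermined
```

A "rearranged" call separates copies outside the canonical operon from those inside it. A transferred copy that replaced the original by homologous recombination without disrupting the operon, as proposed for S. pneumoniae and T. ferrooxidans, would still be called "canonical". This is why the text combines operon context with phylogeny.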
Conclusion
The multiple HGT events that seem to have affected the rps14 gene are difficult to explain unless predominant selective pressures favoured them. An appealing possibility is that the transferred sequences, coming from resistant species, confer antibiotic resistance. This is made all the more plausible because rps14 is known to be involved in antibiotic resistance13,14. One possible example is D. radiodurans, which seems to have acquired a DNA stretch from a Streptomyces species containing, among other genes, an rps14 gene. Because the streptomycetes are the most important group of bacterial antibiotic producers15, it is not surprising that other species acquire their naturally resistant genes by HGT. By contrast, high-GC Gram positives seem to have acquired an rps14 gene from proteobacteria. These species could therefore have a mixed population of ribosomes with different antibiotic sensitivities. It would be worth analysing whether the regulation of expression of the two rps14 copies in these species depends on antibiotic pressure.
An important conclusion of our work is that the recurrent transfer of rps14 genes among different bacterial groups is at odds with the recently proposed complexity hypothesis8. It seems that important selective factors, such as antibiotic resistance, can overcome the restrictions that macromolecular interactions in complex structures could impose on HGT. In addition, these results bear on the hypothesis that HGT might mediate the formation of operons in bacteria16, because, for the spc operon, HGT has been an important agent of operon disruption in diverse bacterial species. Finally, using a combined approach of molecular phylogeny and operon analysis, we have shown how different types of HGT (with or without host gene replacement, and with or without operon disruption) can be discriminated.

Acknowledgements
We thank P. Lopez for critical reading of the manuscript. We acknowledge the University of Chicago, The Institute for Genomic Research, the Joint Genome Institute, the Sanger Center and Washington University for making sequences available prior to publication.

References
1 Aravind, L. et al. (1998) Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet. 14, 442–444
2 Lawrence, J.G. and Ochman, H. (1998) Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. U. S. A. 95, 9413–9417
3 Doolittle, W.F. (1999) Phylogenetic classification and the universal tree. Science 284, 2124–2129
4 Martin, W. (1999) Mosaic bacterial chromosomes: a challenge en route to a tree of genomes. Bioessays 21, 99–104
5 Wolf, Y.I. et al. (1999) Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange. Trends Genet. 15, 173–175
6 Nomura, M. (1984) The control of ribosome synthesis. Sci. Am. 250, 102–114
7 Bischof, O. et al. (1995) Peptide environment of the peptidyl transferase center from Escherichia coli 70 S ribosomes as determined by thermoaffinity labeling with dihydrospiramycin. J. Biol. Chem. 270, 23060–23064
8 Jain, R. et al. (1999) Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. U. S. A. 96, 3801–3806
9 Wood, M.W. et al. (1998) Identification of a pathogenicity island required for Salmonella enteropathogenicity. Mol. Microbiol. 29, 883–891
10 Hou, Y.M. (1999) Transfer RNAs and pathogenicity islands. Trends Biochem. Sci. 24, 295–298
11 Morita, M. et al. (1999) Cloning of oxetanocin A biosynthetic and resistance genes that reside on a plasmid of Bacillus megaterium strain NK84-0128. Biosci. Biotechnol. Biochem. 63, 563–566
12 Altschul, S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410
13 Buck, M.A. and Cooperman, B.S. (1990) Single protein omission reconstitution studies of tetracycline binding to the 30S subunit of Escherichia coli ribosomes. Biochemistry 29, 5374–5379
14 Wittmann-Liebold, B. et al. (1995) Structural and functional implications in the eubacterial ribosome as revealed by protein-rRNA and antibiotic contact sites. Biochem. Cell Biol. 73, 1187–1197
15 Crandall, L.W. and Hamill, R.L. (1986) Antibiotics produced by Streptomyces: major structural classes. In The Bacteria: Antibiotic-producing Streptomyces (Vol. 9) (Queener, S.W. and Day, L.E., eds), pp. 355–401, Academic Press
16 Lawrence, J.G. and Roth, J.R. (1996) Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143, 1843–1860
Mining archaeal proteomes for eukaryotic proteins with novel functions: the PACE case
The complete sequencing of the human genome is identifying numerous putative proteins of unknown function. It is increasingly important to design rational strategies that focus future analyses on the most promising of these putative proteins. Here we have developed such a strategy, using comparative genomics to identify proteins from Archaea without assigned function that are conserved in Eukarya (PACEs). Because they have remained highly conserved over several billion years of divergence, these proteins could have important functions. We first focused on PACEs that are not present in Bacteria, because many previously studied proteins that are present only in Archaea and Eukarya are involved in information transfer (translation, transcription and replication)1,2. Thus, some of the putative proteins specific to Archaea and Eukarya might be novel informational proteins with important functions in eukaryotic cell biology. Several human informational proteins, such as type I and II DNA topoisomerases and other DNA metabolic enzymes, are already targets for drug design3,4. We first identified putative PACEs using a BLAST search of the genome of the archaeon Pyrococcus abyssi (available online at http://www.genoscope.cns.fr/cgi-bin/Pab.cgi).
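The BLAST-based screen described here, using the three criteria listed below, can be expressed as a simple filter over the hits obtained for each archaeal protein. What follows is a minimal sketch that assumes the hits have already been parsed into records. The Hit and Query types, their fields and the example protein names are hypothetical and are not part of any BLAST output format. The manual exclusions (large size differences, similarity due only to coiled-coil regions) and the phylogenetic treatment of archaeal gene duplications are not modelled.

```python
from dataclasses import dataclass, field
from typing import List

E_CUTOFF = 1e-10  # Expect-value threshold given in the PACE criteria


@dataclass
class Hit:
    """One BLAST hit of an archaeal query (hypothetical record layout)."""
    domain: str                     # "Eukarya", "Bacteria" or "Archaea"
    evalue: float
    hyperthermophile: bool = False  # only meaningful for bacterial hits


@dataclass
class Query:
    """An archaeal protein with its annotation and parsed hits (hypothetical)."""
    name: str
    annotation: str
    hits: List[Hit] = field(default_factory=list)


def is_pace_candidate(q: Query, cutoff: float = E_CUTOFF) -> bool:
    # Criterion 1: annotated as a hypothetical protein.
    if q.annotation != "hypothetical protein":
        return False
    # Criterion 2: at least one eukaryotic homologue below the cut-off.
    euk = any(h.domain == "Eukarya" and h.evalue < cutoff for h in q.hits)
    # Criterion 3: no bacterial homologue below the cut-off, except in
    # hyperthermophilic bacteria (transfers with archaea are well documented).
    bact = any(h.domain == "Bacteria" and not h.hyperthermophile
               and h.evalue < cutoff for h in q.hits)
    return euk and not bact


if __name__ == "__main__":
    examples = [
        Query("orf1", "hypothetical protein", [Hit("Eukarya", 1e-30)]),
        Query("orf2", "hypothetical protein",
              [Hit("Eukarya", 1e-30), Hit("Bacteria", 1e-20)]),
        Query("orf3", "hypothetical protein",
              [Hit("Eukarya", 1e-30), Hit("Bacteria", 1e-20, hyperthermophile=True)]),
    ]
    print([q.name for q in examples if is_pace_candidate(q)])  # ['orf1', 'orf3']
```

Keeping the cut-off as a parameter makes it easy to see how the candidate list changes with a stricter or looser threshold. Requiring any eukaryotic hit below the cut-off is equivalent to requiring that the best eukaryotic hit falls below it.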
A protein was considered a PACE candidate in this work if:
• it was annotated as a hypothetical protein;
• its best match with at least one eukaryotic homologue had an Expect value (E-value) in the BLAST search of less than 1 × 10⁻¹⁰;
• it had no bacterial homologue (except in hyperthermophilic bacteria) with an Expect value of less than 1 × 10⁻¹⁰.
We made an exception for proteins with homologues in hyperthermophilic bacteria because gene transfers between archaea and hyperthermophilic bacteria are well documented5,6. We excluded proteins whose size differed greatly from that of their eukaryotic homologues (suggesting modular evolution with a change in function), as well as proteins that are homologous only because of the presence of a coiled-coil region. All proteins homologous to these PACEs were identified by protein- and DNA-level BLAST searches of public protein and expressed sequence tag (EST) databases, and were subsequently aligned using the ClustalW program. We used phylogenetic analyses to identify gene duplications in Archaea, so that each eukaryotic protein corresponds to a single PACE. We found an additional PACE using the ‘phylogenetic
Oriane Matte-Tailliez matte@igmors.u-psud.fr
Yvan Zivanovic yvanz@igmors.u-psud.fr
Patrick Forterre forterre@igmors.u-psud.fr
Institut de Génétique et Microbiologie, UMR C8621, Université Paris-Sud, 91405 Orsay Cedex, France.