Organisation of the S10, spc and alpha ribosomal protein gene clusters in prokaryotic genomes


FEMS Microbiology Letters 242 (2005) 117–126 www.fems-microbiology.org

Tom Coenye *, Peter Vandamme
Laboratorium voor Microbiologie, Universiteit Gent, K.L. Ledeganckstraat 35, B-9000 Gent, Belgium
Received 13 September 2004; received in revised form 27 October 2004; accepted 27 October 2004
First published online 11 November 2004
Edited by M.R. Soria

Abstract

Although it is well known that there is no long-range colinearity of gene order in bacterial genomes, several regions are thought to be under strong structural constraints during evolution, so that their gene order is extremely conserved. One such region is the str locus, which contains the S10–spc–alpha operons. These operons contain genes coding for ribosomal proteins and a number of housekeeping genes. We compared the organisation of these gene clusters in 111 sequenced prokaryotic genomes (99 bacterial and 12 archaeal genomes). We also compared this organisation with the phylogeny based on 16S ribosomal RNA gene sequences and on the sequences of the ribosomal proteins L22, L16 and S14. Our data indicate that there is much variation in gene order and content in these gene clusters, in both bacterial and archaeal genomes. They also indicate that differential gene loss has occurred on multiple occasions during evolution. We also noted several discrepancies between phylogenetic trees based on 16S rRNA gene sequences and those based on sequences of ribosomal proteins L16, L22 and S14. This suggests that horizontal gene transfer played a significant role in the evolution of the S10–spc–alpha gene clusters.

© 2004 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.

Keywords: Ribosomal protein; S10; Phylogeny

* Corresponding author. Tel.: +32 9 2645128/5114; fax: +32 9 2645092. E-mail address: [email protected] (T. Coenye).

0378-1097/$22.00 © 2004 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.femsle.2004.10.050

1. Introduction

Shortly after the completion of the first two prokaryotic genome sequences (those of Haemophilus influenzae and Escherichia coli), it was noted that there is no long-range colinearity of gene order in bacterial genomes [1,2]. Subsequent studies on more taxa confirmed this initial finding. Dynamic rearrangements have apparently occurred often enough to break up operon structures [3–5]. Although gene order is extremely conserved in closely related taxa, it rapidly becomes less conserved with increasing evolutionary distance [6,7]. However, even in distantly related genomes, several highly conserved regions can be found. These are probably regions that are under strong structural constraints during evolution [5,7]. Systematic genome comparisons have revealed that functionally related genes are neighbours more often than unrelated genes [8]. This strongly supports the concept that conserved gene order could be correlated with physical interactions between the encoded proteins [9]. One region in which gene order generally appears to be conserved is the str locus. This locus contains the S10–spc–alpha operons, which encode ribosomal proteins and a number of housekeeping genes [5,10].

In E. coli, 53 ribosomal proteins have been identified [11,12]. Approximately half of these are encoded by genes located at the str locus, while the rest are scattered around the genome in clusters of 1–4 genes. The genetic organisation of the ribosomal protein clusters is complex, with many operons containing genes for non-ribosomal proteins. In addition, the organisation of many ribosomal protein operons does not follow the promoter–structural gene–terminator paradigm [12].

Fig. 1. Dendrogram derived from the unweighted pair group average linkage of categorical coefficients between the organisation of S10, spc and alpha gene clusters in sequenced bacterial genomes.
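The "unweighted pair group average linkage" used for Fig. 1 is UPGMA. The Python sketch below is for illustration only and shows how UPGMA merges taxa from a pairwise similarity matrix. At each step it joins the most similar pair of clusters. It then recomputes the similarity of the new cluster to every other cluster as the size-weighted mean of the old values. This is not the Bionumerics implementation. The taxon labels and similarity values in the usage example are hypothetical.

```python
from itertools import combinations


def upgma(labels, sim):
    """Cluster taxa by UPGMA (average linkage) on a similarity matrix.

    labels: unique taxon names
    sim: dict mapping frozenset({a, b}) to the similarity of taxa a and b
    Returns (tree, merges): tree is a nested tuple of labels, and merges lists
    (cluster_a, cluster_b, similarity_at_merge) in the order the joins happen.
    """
    size = {name: 1 for name in labels}  # current cluster -> number of taxa in it
    s = dict(sim)                         # similarities between current clusters
    merges = []
    while len(size) > 1:
        # Join the pair of clusters with the highest average similarity.
        a, b = max(combinations(size, 2), key=lambda pair: s[frozenset(pair)])
        merges.append((a, b, s[frozenset((a, b))]))
        na, nb = size.pop(a), size.pop(b)
        new = (a, b)
        for c in size:
            # Average linkage: size-weighted mean of the two old similarities.
            s[frozenset((new, c))] = (
                na * s[frozenset((a, c))] + nb * s[frozenset((b, c))]
            ) / (na + nb)
        size[new] = na + nb
    (tree,) = size
    return tree, merges


# Hypothetical similarities (%) between four taxa.
pairs = {
    ("Ec", "Hi"): 90, ("Ec", "Bs"): 50, ("Ec", "Mg"): 40,
    ("Hi", "Bs"): 52, ("Hi", "Mg"): 42, ("Bs", "Mg"): 70,
}
tree, merges = upgma(["Ec", "Hi", "Bs", "Mg"], {frozenset(p): v for p, v in pairs.items()})
print(tree)                    # (('Ec', 'Hi'), ('Bs', 'Mg'))
print([m[2] for m in merges])  # [90, 70, 46.0]
```

The merge similarities play the role of the node positions on a dendrogram scale such as that of Fig. 1.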


The physiological relevance of this complex organisation is not yet entirely clear. In E. coli, the S10 operon contains the genes coding for ribosomal proteins S10, L3, L4, L23, L2, S19, L22, S3, L16, L29 and S17. The spc operon contains the genes coding for ribosomal proteins L14, L24, L5, S14, S8, L6, L18, S5, L15 and L36. In addition, the secY gene, coding for a preprotein translocase, is found between the genes coding for L15 and L36. The alpha operon contains the genes for ribosomal proteins S13, S11, S4 and L17, with rpoA (coding for the α-subunit of RNA polymerase) inserted between S4 and L17. Subsequently, the organisation of the S10, spc and alpha gene clusters was determined for a number of other bacterial taxa, including Mycoplasma capricolum [13], Chlamydia trachomatis [14], Bacillus subtilis [15], Synechococcus sp. [16] and Sinorhizobium meliloti [15]. It was also determined for a number of archaeal species, including Sulfolobus solfataricus [17] and Halobacterium halobium [18]. In several organisms the organisation of the S10, spc and alpha gene clusters was very similar to that seen in E. coli. However, deletions or insertions of additional genes and/or translocations of genes were often noticed. For example, in contrast to those of E. coli and H. influenzae, the spc gene clusters of B. subtilis and Mycoplasma genitalium contain three additional genes coding for non-ribosomal proteins: adk (coding for adenylate kinase), map (coding for methionine aminopeptidase) and infA (coding for translation initiation factor IF1) [5].

Tamames [7] compared gene order conservation in 35 sequenced prokaryotic genomes. For members of these gene clusters, he found levels of conservation ranging from 15% to 88%. Conservation was expressed as the ratio between the number of times the gene is conserved in the run and the total number of times the gene is present. It has been hypothesised that genes coding for proteins involved in multiple interactions, including ribosomal proteins, are less likely to be horizontally transferred (the complexity hypothesis [19]). Nevertheless, horizontal gene transfer has been described for some ribosomal protein genes, including S14 and L27 [20–23]. The case of the S14 gene is especially intriguing, as this gene seems to have been transferred repeatedly between various bacterial groups [20]. Other studies have demonstrated the importance of ribosomal protein gene duplications and lineage-specific gene loss [21,24]. This suggests that many evolutionary forces are involved in shaping the organisation of ribosomal protein gene clusters.

Now that bacterial genome sequences are published almost weekly, it is possible to compare the organisation of the S10, spc and alpha gene clusters in a wide range of taxa. In the present study we compared the organisation of the S10, spc and alpha gene clusters in 99 sequenced bacterial genomes. Twelve archaeal genomes were included for comparison. We also compared the organisation of the S10, spc and alpha gene clusters with groupings obtained by comparing 16S ribosomal RNA gene sequences and the sequences of multiple ribosomal proteins.

Fig. 2. Schematic overview of the organisation of some S10, spc and alpha gene clusters in 99 bacterial genomes. The three lateral arrows below the gene names represent the operon organisation in Escherichia coli. +, presence; –, absence; x, found in another position in the genome. Numbers in the first column refer to the organisation of the cluster as indicated in Table S1, which also shows which taxa belong to each group.
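Tamames' conservation level, cited in the Introduction, is a simple ratio: the number of times a gene is conserved in the run divided by the number of times it is present. The Python sketch below computes such a ratio for one gene, so that a value of 0.15 to 0.88 corresponds to the quoted 15–88%. It relies on a simplifying assumption that is not taken from [7]. A gene is counted as "conserved in the run" in a genome when at least one of its neighbours in a reference order is still directly adjacent to it there. The reference order used is the E. coli alpha operon described above. The example gene orders, including the placeholder genes x and y, are hypothetical.

```python
def neighbours(order, gene):
    """Genes directly adjacent to `gene` in an ordered gene list."""
    i = order.index(gene)
    return set(order[max(i - 1, 0):i] + order[i + 1:i + 2])


def conservation_ratio(gene, reference, genomes):
    """Times `gene` is conserved in the run divided by times `gene` is present.

    Simplifying assumption (not necessarily the definition used in [7]):
    the gene is 'conserved in the run' in a genome when at least one of its
    neighbours in `reference` is still directly adjacent to it there.
    """
    ref_nb = neighbours(reference, gene)
    present = conserved = 0
    for order in genomes:
        if gene not in order:
            continue  # genomes lacking the gene are not counted at all
        present += 1
        if neighbours(order, gene) & ref_nb:
            conserved += 1
    return conserved / present if present else float("nan")


# E. coli alpha operon order used as the reference run.
alpha = ["S13", "S11", "S4", "rpoA", "L17"]
genomes = [
    ["S13", "S11", "S4", "rpoA", "L17"],            # same order as in E. coli
    ["S13", "S11", "rpoA", "L17", "x", "S4", "y"],  # S4 moved next to hypothetical genes x, y
    ["S13", "S11", "rpoA", "L17"],                  # S4 absent
]
print(conservation_ratio("S4", alpha, genomes))   # 0.5
print(conservation_ratio("S13", alpha, genomes))  # 1.0
```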

2. Materials and methods

2.1. Genome sequences

We downloaded 99 bacterial and 12 archaeal genome sequences from the GenBank database. If several strains of a single species had been sequenced, we included only one. Tables S1 and S2 give an overview of all taxa included, with strain numbers and GenBank accession numbers. These tables are available as supplementary data online at http://allserv.ugent.be/~tcoenye/cepacia/page40.html.

2.2. Sequence alignment and numerical analysis

16S ribosomal RNA gene sequences and the amino acid sequences of ribosomal proteins L16, L22 and S14 were extracted from the whole-genome sequences. Sequences were aligned using the emma interface (EMBOSS). Tree construction and bootstrap analyses (100 replicates) were performed using the Bionumerics 3.5 (Applied Maths) and Treecon [25] software packages. Phylogenetic trees were constructed using the neighbour-joining method [26]; no specific substitution model was applied.

For the genetic organisation of the S10, spc and alpha gene clusters, all individual genes were treated as multistate characters. Each gene was assigned to one of the following categories:
(i) present in the genome in the same place and order as in the E. coli and/or B. subtilis genomes;
(ii) present in the genome but in a different place and/or order than in the E. coli and/or B. subtilis genomes;
(iii) absent from the genome.
Trees were constructed with the Bionumerics 3.5 software package using the categorical coefficient. Absence of a gene from a genome was confirmed by a BLASTP analysis [27], using the ribosomal protein sequence of the closest relative as the query sequence.

Fig. 3. Phylogenetic tree based on 16S rRNA gene sequences. The scale bar represents 10% sequence dissimilarity.

Table 1
Overview of non-ribosomal proteins inserted in the S10, spc and alpha gene clusters in sequenced bacterial genomes

Organism: Bacillus halodurans C125; Bifidobacterium longum NCC2705; Corynebacterium diphtheriae NCTC13129; Corynebacterium efficiens YS-314; Corynebacterium glutamicum ATCC13032; Coxiella burnetii RSA493; Haemophilus ducreyi 35000HP; Helicobacter hepaticus ATCC51449; Lactococcus lactis IL1403; Mesorhizobium loti; Mycobacterium bovis AF2122/97; Mycobacterium leprae TN; Mycobacterium tuberculosis H37Rv; Neisseria meningitidis Z2491; Pirellula sp. strain 1; Thermoanaerobacter tengcongensis MB4T

Inserted non-ribosomal genes: Hypothetical protein; Hypothetical protein; Putative secreted protein, putative ABC transport system ATP-binding protein and putative ABC transport system integral membrane protein; Serine transporter, L-serine dehydratase and putative secreted amino acid hydrolase; Putative transport protein, putative sugar binding secreted protein, putative sugar ABC transport system membrane protein and putative ABC transport system membrane protein; Putative sialidase precursor and putative secreted protein; 2 hypothetical proteins and putative glucose-6-phosphate dehydrogenase; Hypothetical protein; 2 hypothetical proteins; Uncharacterised protein; Hypothetical protein; InsA and InsB; Conserved hypothetical protein; Unknown protein; Hypothetical protein; yvfC, IS1077F transposase, hypothetical protein yvfD and IS904H transposase; 2 unknown proteins; Possible arylsulfatases ATSa and ATSb, conserved hypothetical protein and conserved transmembrane protein; Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein; Arylsulfatase pseudogene and 2 hypothetical proteins; Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein; Arylsulfatase and 2 hypothetical proteins; Possible protease IV sppA, possible D-xylulose kinase B and conserved hypothetical protein; 3 hypothetical proteins; Hypothetical protein; Hypothetical protein

Location: Between map and infA; Between S13 and rpoA; Between S17 and L14; Between L5 and S8; Between L15 and secY; Between map and infA; Between L5 and S8; Between map and infA; Between L5 and S8; Between map and infA; Between secY and S13; Between S17 and L14; Between L5 and S8; Between S14 and S8; Between adk and infA; Between L18 and L15; Between S17 and L14; Between L15 and secY; Between S17 and L14; Between L15 and secY; Between S17 and L14; Between L15 and secY; Between S10 and L3; Between S10 and L3; Between map and infA
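The character coding described in Section 2.2 can be made concrete with a short example. The Python sketch below scores each gene with one of three states, "+", "x" or "-", mirroring categories (i)–(iii) and the symbols of Fig. 2. It then computes pairwise similarities as the percentage of genes scored in identical states. Treating the categorical coefficient this way is an assumption, not a documented Bionumerics formula. The taxon names and profiles are hypothetical. A matrix of this kind is the input to an average-linkage clustering such as the one shown in Fig. 1. Counting distinct profiles corresponds to counting distinct organisations, as done in Section 3.1.

```python
from itertools import combinations

# Character states for one gene, following categories (i)-(iii):
#   "+" present in the same place and order as in E. coli and/or B. subtilis
#   "x" present in the genome, but in a different place and/or order
#   "-" absent from the genome
STATES = {"+", "x", "-"}


def categorical_similarity(a, b):
    """Percentage of characters (genes) scored in identical states.

    Assumed definition of the categorical coefficient; the Bionumerics
    documentation should be checked before relying on it.
    """
    if len(a) != len(b):
        raise ValueError("profiles must score the same genes in the same order")
    if not set(a) <= STATES or not set(b) <= STATES:
        raise ValueError("unknown character state")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def similarity_matrix(profiles):
    """Pairwise categorical similarities, keyed by frozenset of taxon names."""
    return {
        frozenset((p, q)): categorical_similarity(profiles[p], profiles[q])
        for p, q in combinations(profiles, 2)
    }


# Hypothetical profiles over six genes (for example S10, L3, L4, L30, S4, rpoA).
profiles = {
    "taxon_A": "++++++",
    "taxon_B": "++++++",
    "taxon_C": "+++-++",
    "taxon_D": "x++-x+",
}
sim = similarity_matrix(profiles)
print(len(set(profiles.values())))             # 3 distinct organisations
print(sim[frozenset(("taxon_A", "taxon_D"))])  # 50.0
```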

3. Results and discussion

3.1. Organisation of S10, spc and alpha gene clusters in bacterial genomes

We compared the organisation of the S10, spc and alpha gene clusters in 99 sequenced bacterial genomes and observed 42 different organisations (see Table S1). Based on these organisations we constructed a dendrogram using the categorical coefficient (Fig. 1). A schematic overview of the organisations observed is given in Fig. 2. Most variation occurs in the 3′ half of the spc gene cluster and in the alpha gene cluster, while the S10 gene cluster appears to be more conserved. Notably, only 14 organisms showed the same organisation as E. coli.

Many bacterial genomes do not encode all the ribosomal proteins found in the S10–spc–alpha operons of E. coli (Table S1 and Fig. 2). Some of these gene losses appear to be specific to one or more lineages. For example, L30 is absent from the genomes of members of the Chlamydiae, the ε-Proteobacteria, the Cyanobacteria and the mycoplasmas. Other losses are restricted to one or a few members of a lineage. For example, L2 is present in all bacterial genomes except that of Streptococcus mutans. The most extensive gene loss is seen in Clostridium tetani. Its genome appears to lack the genes coding for 10 of the ribosomal proteins found in the S10–spc–alpha operons of E. coli.

In a number of genomes, additional genes coding for non-ribosomal proteins are found in the S10, spc and alpha gene clusters. In several cases the inserted genes encode unknown and/or hypothetical proteins (Table 1). Some of these are very short, and it remains to be determined whether they are true protein-coding genes or open reading frames that occur by chance [28]. However, in several genomes there is evidence for the insertion of true protein-coding genes in the S10, spc and alpha gene clusters (Table 1).

In many of the bacterial genomes investigated, several genes found in the S10–spc–alpha operons of E. coli are located outside these gene clusters (Table S1 and Fig. 2). Two different classes can be distinguished. In several genomes, genes coding for ribosomal proteins found in the S10–spc–alpha operons of E. coli lie outside these clusters. They are not grouped together but are found at different positions, scattered throughout the genome. This is the case for Agrobacterium tumefaciens, Rhodopseudomonas palustris, Rickettsia conorii, Rickettsia prowazekii, S. meliloti, Neisseria meningitidis, Ralstonia solanacearum, Staphylococcus aureus, Pasteurella multocida, Photorhabdus luminescens, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas syringae, Salmonella enterica, Shewanella oneidensis, Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Xanthomonas axonopodis, Xanthomonas campestris, Xylella fastidiosa, Prochlorococcus marinus, Synechococcus sp., Synechocystis sp., Thermosynechococcus elongatus, Pirellula sp. and Treponema pallidum. In other genomes, however, multiple genes found in the S10–spc–alpha operons of E. coli have formed novel, separate ribosomal gene clusters (data not shown). For example, in the Campylobacter jejuni genome the genes infA, L36, S13, S11, S4, rpoA and L17 form a separate gene cluster located elsewhere in the genome. It is at present not clear how these genes or gene clusters came to lie outside the S10, spc and alpha operons. One possibility is horizontal gene transfer followed by deletion of the original genes or gene clusters in the S10, spc and alpha operons. Another is one or more rearrangements within the genome.

In several bacterial genomes, genes located in the S10–spc–alpha operons in E. coli are found in other ribosomal gene clusters (data not shown). For example, the S10 gene is located in the S12 ribosomal gene cluster in the genomes of all species of the Chlamydiae and the Cyanobacteria. Similarly, the S4 gene of Mycoplasma gallisepticum is located in the S12 cluster. There are also several examples of changes in gene order within the S10, spc and alpha gene clusters. For example, in Thermotoga maritima, infA is located at the 3′ end of the L17 gene, while in Mycoplasma penetrans, S3 is located between S17 and L29. The genomes of several organisms included in this study consist of multiple replicons (A. tumefaciens, Brucella melitensis, Brucella suis, R. solanacearum, V. cholerae, V. vulnificus, V. parahaemolyticus, Leptospira interrogans and Deinococcus radiodurans).
In all these organisms the S10, spc and alpha gene clusters were located on the largest replicon.

3.2. Organisation of S10, spc and alpha gene clusters in archaeal genomes

We compared the organisation of the S10, spc and alpha gene clusters in 12 sequenced archaeal genomes and observed 10 different organisations (see Table S2). In seven of these genomes the organisation is somewhat similar to that in bacterial genomes: Methanosarcina acetivorans, Pyrococcus furiosus, Archaeoglobus fulgidus, S. solfataricus, Halobacterium sp., Thermoplasma acidophilum and Methanothermobacter thermoautotrophicum. As in bacteria, most variation is localised in the 3′ half of the spc gene cluster and in the alpha gene cluster. However, the organisation of the S10, spc and alpha gene clusters is entirely different in Aeropyrum pernix, Methanocaldococcus jannaschii, Methanopyrus kandleri, Nanoarchaeum equitans and Pyrobaculum aerophilum (Table S2). In P. aerophilum and N. equitans these gene clusters actually appear not to exist. We also noted the insertion of several genes coding for other ribosomal proteins between genes located in the S10, spc and alpha gene clusters (data not shown). For example, the genes coding for ribosomal proteins L32 and L19 were inserted between the genes coding for L6 and L18 in all archaeal genomes except those of A. pernix, N. equitans and P. aerophilum. L7 was inserted between S5 and L15 in T. acidophilum. The S10 gene is colocalised with the S12 gene cluster in all archaeal genomes except those of P. furiosus, N. equitans and P. aerophilum. The S4 gene is located between L24 and L5 in all archaeal genomes except those of A. pernix, M. thermoautotrophicum, N. equitans and P. aerophilum. There are also several examples of genes normally found in the S10, spc and alpha gene clusters that form a separate gene cluster at another location in the genome. This is the case, for example, for the S13, S4 and S11 genes in all archaeal genomes investigated, and for the L3, L4 and L23 genes in A. pernix.

Table 2
Pearson product moment correlation coefficients between the similarity matrices of the different data sets

                               Organisational   16S rRNA gene   L16     L22     S14
Organisational similarity      100
16S rRNA gene similarity       86.7             100
L16 similarity                 77.4             82.3            100
L22 similarity                 76.9             80.7            79.5    100
S14 similarity                 88.0             72.7            73.9    74.5    100

Fig. 4. Concordance between the phylogeny derived from the organisation of the S10, spc and alpha gene clusters and the phylogenies derived from the 16S rRNA gene sequences and the sequences of ribosomal proteins L16, L22 and S14.

Fig. 5. Phylogenetic tree based on S14 sequences. The scale bar represents 10% sequence dissimilarity.

3.3. Phylogenies based on amino acid sequences of ribosomal proteins L22, L16 and S14 and comparison with phylogenies based on 16S rRNA gene sequences and on organisation of S10, spc and alpha gene clusters

The 16S rRNA gene has been widely used to infer phylogenetic relationships among prokaryotes. There is, however, considerable concern that single-gene trees may not adequately reflect phylogenetic relationships, because of the possibility of horizontal gene transfer.
For this reason, the sequences of protein-coding genes, including genes coding for ribosomal proteins, have been used to deduce phylogenetic relationships between organisms [29,30]. Our data indicate that L22 and L16 are the most "stable" of the ribosomal proteins encoded in the S10, spc and alpha gene clusters. They are present in all bacterial genomes, in the same location within these clusters. As there is some evidence that the S14 gene might be horizontally transferred [20], we also included the S14 protein in our phylogenetic analysis. A phylogenetic tree based on 16S rRNA gene sequences is shown in Fig. 3. Overall, the phylogenies derived from L16 and L22 sequences were similar to each other and to the phylogeny derived from the 16S rRNA gene sequences (Table 2, Fig. 4).

The main discrepancies between the 16S rRNA gene sequence-based tree and the tree based on L16 sequences were:
(i) the close relationship between the Actinobacteria and the Firmicutes;
(ii) the separate positions of the mycoplasmas and members of the genus Clostridium;
(iii) the β-Proteobacteria appearing as a subgroup of the γ-Proteobacteria;
(iv) the δ-proteobacterium Geobacter sulfurreducens not grouping with the other Proteobacteria;
(v) the separate position of the spirochaete L. interrogans.

The main discrepancies between the 16S rRNA gene sequence-based tree and the tree based on L22 sequences were:
(i) the β-Proteobacteria appearing as a subgroup of the γ-Proteobacteria;
(ii) the δ- and ε-Proteobacteria appearing unrelated to each other and to the other Proteobacteria;
(iii) the separate position of L. interrogans;
(iv) the close relationship between the Cyanobacteria and the Actinobacteria.

The correlation between the groupings obtained from 16S rRNA gene sequence similarity and from S14 protein sequence similarity was lower, and several differences between the two trees can be observed (Figs. 3 and 5). The main discrepancies between the 16S rRNA gene sequence-based tree and the tree based on S14 sequences were:
(i) the subdivision of the Actinobacteria;
(ii) the separate positions of Clostridium perfringens and Clostridium acetobutylicum;
(iii) the separate position of Streptococcus pneumoniae;
(iv) the separate positions of the δ- and ε-Proteobacteria.

When the sequence-based trees were compared with the tree based on the organisation of the S10, spc and alpha gene clusters (Fig. 1), several differences were noted. Most remarkable in the organisation-based tree were the diversity of the ε-Proteobacteria and the positions of the clostridia, the spirochaetes, S. mutans, Gloeobacter violaceus, M. gallisepticum and M. penetrans.
The overall Pearson product moment correlation coefficients between organisational similarity and 16S rRNA gene, L16, L22 and S14 sequence similarity were high (86.7%, 77.4%, 76.9% and 88.0%, respectively) (Table 2, Fig. 4).
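The values in Table 2 are Pearson correlations between similarity matrices, expressed as percentages. The Python sketch below shows one way to compute such a value. It pairs up the entries of two matrices for every pair of distinct taxa and then computes the ordinary Pearson product moment coefficient. Two points are assumptions, since the source does not say how Bionumerics handles them: the diagonal (self-similarity) is excluded, and the result is scaled by 100. The taxa and similarity values are hypothetical. Matrix entries are not independent observations, so this coefficient describes concordance only. Assessing significance would need a permutation approach such as a Mantel test.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def matrix_correlation(taxa, sim_a, sim_b):
    """Correlate two similarity matrices over every pair of distinct taxa.

    sim_a, sim_b: dicts keyed by frozenset({taxon1, taxon2}); the diagonal
    (self-similarity) is excluded, which is an assumption of this sketch.
    """
    pairs = [frozenset(p) for p in combinations(taxa, 2)]
    return pearson([sim_a[p] for p in pairs], [sim_b[p] for p in pairs])


# Hypothetical organisational and sequence similarities (%) for three taxa.
taxa = ["A", "B", "C"]
organisation = {frozenset(("A", "B")): 90, frozenset(("A", "C")): 60, frozenset(("B", "C")): 50}
sequence = {frozenset(("A", "B")): 85, frozenset(("A", "C")): 70, frozenset(("B", "C")): 40}
print(round(100 * matrix_correlation(taxa, organisation, sequence), 1))  # 89.1
```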

4. Conclusions

It was previously reported that the organisation of the S10–spc–alpha operon, which encodes ribosomal proteins and a number of housekeeping genes, was similar in all bacterial genomes [5,9]. Data from the present study, however, clearly indicate that there is much variation in gene order and content in these gene clusters. It is at present not clear whether these differences are partially or entirely due to:
(i) genomic rearrangements;
(ii) lineage-specific gene loss, whether or not preceded by gene duplication; and/or
(iii) horizontal gene transfer.
Evidence for the role of horizontal gene transfer in the evolution of ribosomal proteins has, however, been presented before [20–23]. The discrepancies we observed between phylogenetic trees based on 16S rRNA gene sequences and those based on ribosomal proteins L16, L22 and S14 also suggest that horizontal gene transfer may have played a significant role in the evolution of the S10–spc–alpha operon. More detailed studies will be required to confirm this. Our data also indicate that differential gene loss has occurred on multiple occasions during evolution. In addition, the organisation of the S10, spc and alpha gene clusters can provide additional, sequence-independent information that can be used to deduce phylogenetic relationships between prokaryotes.

Acknowledgements

T.C. and P.V. are indebted to the Fund for Scientific Research – Flanders (Belgium) for a position as postdoctoral fellow and for research grants, respectively. T.C. also acknowledges support from the Belgian Federal Government (Federal Office for Scientific, Technical and Cultural Affairs).

References

[1] Tatusov, R.L., Mushegian, A.R., Bork, P., Brown, N.P., Hayes, W.S., Borodovsky, M., Rudd, K.E. and Koonin, E.V. (1996) Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr. Biol. 6, 279–291.
[2] Mushegian, A.R. and Koonin, E.V. (1996) Gene order is not conserved in bacterial evolution. Trends Genet. 12, 289–290.
[3] Kolsto, A.B. (1997) Dynamic bacterial genome organisation. Mol. Microbiol. 24, 241–248.
[4] Siefert, J.L., Martin, K.A., Abdi, F., Widger, W.R. and Fox, G.E. (1997) Conserved gene clusters in bacterial genomes provide further support for the primacy of RNA. J. Mol. Evol. 45, 467–472.
[5] Watanabe, H., Mori, H., Itoh, T. and Gojobori, T. (1997) Genome plasticity as a paradigm of eubacterial evolution. J. Mol. Evol. 44, S57–S64.
[6] Suyama, M. and Bork, P. (2001) Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends Genet. 17, 10–13.
[7] Tamames, J. (2001) Evolution of gene order conservation in prokaryotes. Genome Biol. 2, 0020.1–0020.11.
[8] Tamames, J., Casari, G., Ouzounis, C. and Valencia, A. (1997) Conserved gene clusters of functionally related genes in two bacterial genomes. J. Mol. Evol. 44, 66–73.
[9] Dandekar, T., Snel, B., Huynen, M. and Bork, P. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328.
[10] Itoh, T., Takemoto, K., Mori, H. and Gojobori, T. (1999) Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol. Biol. Evol. 16, 332–346.
[11] Nomura, M., Gourse, R. and Baughman, G. (1984) Regulation of the synthesis of ribosomes and ribosomal components. Annu. Rev. Biochem. 53, 75–117.
[12] Lindahl, L. and Zengel, J.M. (1986) Ribosomal genes in Escherichia coli. Annu. Rev. Genet. 20, 297–326.
[13] Ohkubo, S., Muto, A., Kawauchi, Y., Yamao, F. and Osawa, S. (1987) The ribosomal protein gene cluster of Mycoplasma capricolum. Mol. Gen. Genet. 210, 314–322.

[14] Kaul, R., Gray, G.J., Koehncke, N.R. and Gu, L. (1992) Cloning and sequence analysis of the Chlamydia trachomatis spc ribosomal protein gene cluster. J. Bacteriol. 174, 1205–1212.
[15] Barloy-Hubler, F., Lelaure, V. and Galibert, F. (2001) Ribosomal protein gene cluster analysis in eubacterium genomics: homology between Sinorhizobium meliloti strain 1021 and Bacillus subtilis. Nucleic Acids Res. 29, 2747–2756.
[16] Sugita, M., Sugishita, H., Fujishiro, T., Tsuboi, M., Sugita, C., Endo, T. and Sugiura, M. (1997) Organisation of a large gene cluster encoding ribosomal proteins in the cyanobacterium Synechococcus sp. strain PCC6301: comparison of gene clusters among Cyanobacteria, Eubacteria and Chloroplast genomes. Gene 195, 73–79.
[17] Ianniciello, G., Gallo, M., Arcari, P. and Bocchini, V. (1994) Organisation of a Sulfolobus solfataricus gene cluster homologous to the Escherichia coli str operon. Biochem. Mol. Biol. Int. 33, 927–937.
[18] Fujita, T. and Itoh, T. (1995) Organisation and nucleotide sequence of a gene cluster comprising the translation elongation factor 1 alpha, ribosomal protein S10 and tRNA(Ala) from Halobacterium halobium. Biochem. Mol. Biol. Int. 37, 107–115.
[19] Jain, R., Rivera, M.C. and Lake, J.A. (1999) Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96, 3801–3806.
[20] Brochier, C., Philippe, H. and Moreira, D. (2000) The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 16, 529–533.
[21] Makarova, K.S., Ponomarev, V.A. and Koonin, E.V. (2001) Two C or not two C: recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins. Genome Biol. 2, 0033.1–0033.14.
[22] Garcia-Vallvé, S., Simo, F.X., Montero, M.A., Arola, L. and Romeu, A. (2002) Simultaneous horizontal gene transfer of a gene coding for ribosomal protein L27 and operational genes in Arthrobacter sp. J. Mol. Evol. 55, 632–637.
[23] Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. (2002) Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631–639.
[24] Lecompte, O., Ripp, R., Thierry, J.C., Moras, D. and Poch, O. (2002) Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res. 30, 5382–5390.
[25] Van de Peer, Y. and De Wachter, R. (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Appl. Biosci. 10, 569–570.
[26] Saitou, N. and Nei, M. (1987) The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
[27] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
[28] Skovgaard, M., Jensen, L.J., Brunak, S., Ussery, D. and Krogh, A. (2001) On the total number of genes and their length distribution in complete microbial genomes. Trends Genet. 17, 425–428.
[29] Brown, J.R., Douady, C.J., Italia, M.J., Marshall, W.E. and Stanhope, M.J. (2001) Universal trees based on large combined protein sequence data sets. Nat. Genet. 28, 281–285.
[30] Wolf, Y.I., Rogozin, I.B., Grishin, N.V., Tatusov, R.L. and Koonin, E.V. (2001) Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1, 8.