Gene 238 (1999) 241–252 www.elsevier.com/locate/gene
The role of recombination and mutation in 16S–23S rDNA spacer rearrangements
V. Gürtler *
Department of Microbiology, Austin & Repatriation Medical Centre (Austin Campus), Studley Road, Heidelberg, Vic. 3084, Australia
Received 5 February 1999; received in revised form 11 May 1999; accepted 1 June 1999; Received by G. Bernardi
Abstract The intragenomic heterogeneity of the bacterial intergenic (16S–23S rDNA) spacer region (ISR) was analysed from the following species in which sequences for the complete rRNA operon (rrn) set have been determined (rrn number): Enterococcus faecalis (6) and E. faecium (6), Bacillus subtilis (10), Staphylococcus aureus (9), Vibrio cholerae (4), Haemophilus influenzae (6) and Escherichia coli (7). It was found that some spacer sequence blocks were highly conserved between operons of a genome, whereas the presence of others was variable. When these variations were analysed using the program PLATO and partial likelihood phylogenies determined by DNAml for each operon set, three regions showed significant (Z>3.3) spatial variation [Region I was 78–184 nt long (2.1
4.4) possibly due to recombination or selection. Within Region I, there was sequence block variation in all operon sets [some operons contained tRNA genes (tRNAala, tRNAile or tRNAglu), whereas others had sequence blocks such as VS2 (S. aureus) or rsl (E. coli)]. Analysis of the ISR sequences from E. faecalis and E. faecium showed that there was more interspecies than intraspecies variation (both in DNA sequence and in the presence or absence of blocks). Dot matrix analysis of the sequence blocks in the nine rrn ISRs from S. aureus showed significant homology between VS2 and VS5/VS6. Furthermore, repeat motifs with only A or T were present in higher copy numbers in VS5/VS6 than in VS2. Since these sequence blocks (VS2 and VS5/VS6) are related, intragenic evolution resulting in AT expansion may have occurred between these two regions. A model is proposed that postulates a role for recombination and AT-expansion in intra-genomic ISR variation. This process may represent a general mechanism of concerted evolution for bacterial ISR rearrangements. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Concerted evolution; Homologous recombination; Maximum likelihood; Phylogenomic analysis; rrn operon
Abbreviations: ala, alanine; dsPS, double-stranded processing stem; glu, glutamate; ile, isoleucine; ISR, intergenic (16S–23S rDNA) spacer region; nt, nucleotide; rrn, rRNA operon; rRNA, ribosomal RNA; tRNA, transfer RNA. * Tel.: +61-3-9496-3136; fax: +61-3-9459-1674. E-mail address: [email protected] (V. Gürtler)
0378-1119/99/$ – see front matter. PII: S0378-1119(99)00224-3
1. Introduction
The ribosomal RNA (rRNA) genes (16S, 23S and 5S) and transfer RNA (tRNA) genes are highly conserved in the bacterial and archaebacterial kingdoms, which makes them ideal candidates for evolutionary studies (Woese, 1987). The number of rRNA operons varies (Table 1) from one in Mycoplasma species (Sawada et al., 1981; Fraser et al., 1995; Himmelreich et al., 1996) and Mycobacterium species (Cole et al., 1998) to ten in Bacillus subtilis (Kunst et al., 1997). The gene organization of the rRNA operon is 16S–23S–5S in most bacteria (Gürtler and Stanisich, 1996). Exceptions are Borrelia burgdorferi, which has two tandem copies of 23S–5S and one copy of the 16S rRNA gene, separated from the first 23S gene by 2 kb containing tRNAile and tRNAala genes (Ojaimi et al., 1994), and Helicobacter pylori, which has two separate sets of 23S–5S and 16S rRNA genes along with one orphan 5S gene (Tomb et al., 1997). In most bacteria, rRNA genes are part of a multigene family, making variation between operons possible. Single-nucleotide differences between multiple genomic copies of the 16S gene have been detected (Gürtler et al., 1991; Mylvaganam and Dennis, 1992; Cilia et al., 1996; Ninet et al., 1996; Wang et al., 1997; Reischl et al., 1998). However, even more variation has been detected in the intergenic (16S–23S rDNA) spacer region (ISR) between multiple genomic copies [for a review, see Gürtler and Stanisich (1996)]. This variation has been used for bacterial identification (e.g. Jensen et al., 1993), typing (e.g. Gürtler, 1993; Cartwright et al., 1995) and evolutionary studies (e.g. Antón et al., 1998; Lan and Reeves, 1998; Gürtler et al., 1999).
Phylogenomic analysis (Eisen, 1998) of multigene and orthologous gene families has been made possible by the availability of whole-genome sequences from the eubacteria and archaea listed in Table 1. From these sequences, it has been possible to determine the sequence variation in the ISR across the complete rrn operon set of each genome. This review analyses the sequence variability in the spacer sets from three complete genomes (H. influenzae, B. subtilis and E. coli) and from four species for which the complete set of rrn operons has been determined and sequenced (V. cholerae, Enterococcus faecalis, E. faecium and S. aureus). The ISR sets from other completed genomes that have fewer than two rrn operons (B. burgdorferi, M. genitalium, M. pneumoniae, Mycobacterium tuberculosis and A. fulgidus), no sequence variation between operons (T. pallidum and M. jannaschii) or an unusual gene organization resulting in the lack of an ISR (H. pylori) are not analysed further.
V. Gürtler / Gene 238 (1999) 241–252
Table 1
Complete 16S–23S rDNA spacer sets from eubacteria and archaea
Species | Complete genome sequence | Number of rrn | tRNAala (a) | tRNAile (a) | tRNAglu (a) | Spacer length (bp) | References
Eubacteria
Vibrio cholerae | No | 7–9 | 3 | 1 | 3 | 431, 509, 607, 711, 750 | Lan and Reeves (1998); Chun et al. (1999) (b)
Haemophilus influenzae | Yes | 6 | 1 | 1 | 1 | 478, 723 | Fleischmann et al. (1995)
Enterococcus faecium | No | 6 | 2 | 0 | 0 | 367, 384, 387, 389, 484, 487 | Naïmi et al. (1997)
Enterococcus faecalis | No | 6 | 1 | 0 | 0 | 267, 369 | Hall (1994); Naïmi et al. (1997)
Bacillus subtilis | Yes | 10 | 1 | 1 | 0 | 165, 168, 345 | Loughney et al. (1982); Kunst et al. (1997)
Escherichia coli | Yes | 7 | 3 | 3 | 4 | 345, 354, 355, 431, 437, 440, 446 | Blattner et al. (1997) (c)
Staphylococcus aureus | No | 9–10 | 2 | 5 | 0 | 302, 319, 336, 382, 458, 460, 469, 473, 546, 551 | Gürtler and Barrie (1995) (d)
Borrelia burgdorferi | Yes | 1.5 | 1 | 1 | 0 | 2000 | Fraser et al. (1997)
Mycoplasma genitalium | Yes | 1 | 0 | 0 | 0 | 203 | Fraser et al. (1995)
Mycoplasma pneumoniae | Yes | 1 | 0 | 0 | 0 | 224 | Himmelreich et al. (1996)
Mycobacterium tuberculosis | Yes | 1 | 0 | 0 | 0 | 276 | Cole et al. (1998)
Helicobacter pylori | Yes | 2 (e) | 0 | 0 | 0 | N/A (f) | Tomb et al. (1997)
Chlamydia trachomatis | Yes | 2 | 0 | 0 | 0 | 326 | Stephens et al. (1998)
Treponema pallidum | Yes | 2 | 0 | 0 | 0 | 294 | Fraser et al. (1998)
Synechocystis sp. | Yes | 2 | 0 | 0 | 0 | | Kaneko et al. (1996)
Archaea
Methanococcus jannaschii | Yes | 2 | 2 | 0 | 0 | 346 | Bult et al. (1996)
Archaeoglobus fulgidus | Yes | 1 | 1 | 0 | 0 | 236 | Klenk et al. (1997)
Methanobacterium thermoautotrophicum | Yes | 2 | 2 | 0 | 0 | 328 | Smith et al. (1997)
(a) Number of operons per genome with tRNAala, tRNAile or tRNAglu. At most one copy of each tRNA type, and at most two different tRNAs, were found per 16S–23S spacer.
(b) This recent report found that one ISR in V. cholerae (strain RC2T) contained tRNAlys and tRNAval as well as tRNAglu.
(c) References with partial operon sets for E. coli are listed in Gürtler and Stanisich (1996) and Antón et al. (1998). The first studies that identified genes for tRNAile and tRNAglu (Wu and Davidson, 1975; Lund et al., 1976) and tRNAala (Ikemura and Nomura, 1977; Morgan et al., 1977) were from E. coli.
(d) References with partial operon sets for S. aureus are Forsman et al. (1997) and GenBank A48073, A48074.
(e) Unusual organization with two 23S–5S, two 16S and one orphan 5S rRNA genes.
(f) Not applicable because there is no 16S–23S spacer in H. pylori.
Recombination has been proposed to explain the rearrangement of rrn operons observed under laboratory conditions in E. coli (Harvey et al., 1988), in 16S rRNA sequences from Aeromonas (Sneath, 1993) and, more recently, in natural isolates of Salmonella enterica serovars Typhi and Paratyphi (Liu and Sanderson, 1998), Vibrio cholerae (Lan and Reeves, 1998) and Haemophilus parainfluenzae (Privitera et al., 1998). Rearrangements of whole rrn operons have been detected by specific cleavage of the 23S rRNA gene using I-CeuI (Lan and Reeves, 1998; Liu and Sanderson, 1998) and by variation in a BglI site in the rrn operon (Lan and Reeves, 1998). Furthermore, three types of rearrangement have been detected in the ISR: (1) loss of rsl in E. coli (Harvey
et al., 1988), (2) the exchange of ISRs between rrn operons (Lan and Reeves, 1998) and (3) the rearrangement of 20–50 nt sequence blocks within the ISR (Privitera et al., 1998). These authors suggest that homologous recombination explains the rearrangement of whole rrn operons and the three types of spacer rearrangement, but no statistical analysis of the likelihood of recombination has been carried out. In this review, a phylogenomic analysis of the ISR of the rrn gene family was carried out using maximum likelihood to estimate the phylogenetic hypothesis (Felsenstein, 1981), and partial likelihoods assessed through optimisation were used to test statistically for evidence of recombination or selection (Grassly and Holmes, 1997). In eukaryotes, slipped-strand mispairing of simple repetitive DNA (including tandem repeats, palindromes, non-contiguous repeats and cryptic simplicity) has been proposed as a major mechanism of DNA sequence evolution (Tautz et al., 1986; Levinson and Gutman, 1987). Expansion segments in eukaryotic rRNA genes, which consist of hypervariable runs of trinucleotide motifs, are susceptible to slipped-strand mispairing (Hancock and Dover, 1990). There appears to be no precise equivalent within the rrn operon of prokaryotes (Clark, 1987). Furthermore, compensatory mutations within the eukaryotic expansion segments conserve RNA secondary structure (Hancock and Vogler, 1998). An example of intra-genomic sequence homology within the ISR that shows some of these characteristics is presented here. Based on the statistical analysis of the ISR sequence block rearrangements and the possible compensatory slippage of one of these sequence blocks, a model of homologous recombination that may result in concerted evolution of the rrn gene family is proposed.
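The idea behind testing partial likelihoods for spatial variation can be illustrated with a simplified sketch. It assumes that per-site log-likelihoods under the fitted maximum-likelihood tree are already available (the input values below are hypothetical, and no DNAml or PLATO file format is assumed). Each window of consecutive alignment sites is scored by how far its mean log-likelihood falls below the alignment-wide mean, in standard-error units, and windows above the Z>3.3 threshold quoted in the Abstract are flagged. This is not PLATO's algorithm, which considers windows of many sizes and positions and has its own significance procedure; it only shows the general form of such a statistic.

```python
import math
from statistics import mean, pstdev


def window_z_scores(site_lnl, window):
    """Score every run of `window` consecutive sites against the whole alignment.

    site_lnl: per-site log-likelihoods under a fitted tree (hypothetical input).
    Returns a list of (start_index, z). A positive z means the window fits the
    tree worse than an average site (lower mean log-likelihood).
    """
    if not 0 < window <= len(site_lnl):
        raise ValueError("window must be between 1 and the number of sites")
    mu = mean(site_lnl)
    sigma = pstdev(site_lnl)
    if sigma == 0:
        raise ValueError("all sites have the same log-likelihood")
    se = sigma / math.sqrt(window)  # standard error of a window mean
    scores = []
    for start in range(len(site_lnl) - window + 1):
        w_mean = mean(site_lnl[start:start + window])
        scores.append((start, (mu - w_mean) / se))
    return scores


def flag_windows(site_lnl, window, threshold=3.3):
    """Keep only the windows whose z exceeds the threshold."""
    return [(s, z) for s, z in window_z_scores(site_lnl, window) if z > threshold]


# Hypothetical example: 100 sites, with a poorly fitting run at sites 80-89.
site_lnl = [-2.0, -2.2] * 40 + [-6.0] * 10 + [-2.0, -2.2] * 5
flagged = flag_windows(site_lnl, window=10)
print([s for s, _ in flagged])  # [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85]
```

Windows that only partly overlap the poorly fitting run are still flagged once enough of their sites fall inside it, which is why the flagged starts extend a few sites either side of position 80.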
2. Source and analysis of DNA sequences
Table 1 lists the 16S–23S rDNA sequence sources; these were aligned using Clustal X (Thompson et al., 1997) and MacClade (Maddison and Maddison, 1992). For each multiple sequence alignment and the corresponding maximum-likelihood phylogenetic hypothesis determined by DNAml (Felsenstein, 1993), regions showing significant (Z>3.3) spatial variation due to recombination or selection were identified using PLATO v2.11 (Grassly and Holmes, 1997). For each alignment, the HKY model of PLATO was used, and 16 different Z-values were obtained by varying the shape parameter of the gamma rate heterogeneity (α=0.1, 1, 10, 100 or 1000) and the number of categories for the discrete gamma rate heterogeneity model (g=2, 4, 16 or 32). The alignment files and the complete data from the PLATO analyses are available on request from the author. MacClade was used to determine the transition-to-transversion (Ti:Tv) ratio and to draw phylograms with branch lengths proportional to the maximum number of nucleotide substitutions. The following GCG programs on the Australian National Genomic Information Service (ANGIS) were used: COMPARE to create a file of points of similarity between two ISR sequences (with a window 21 symbols wide and a stringency of 14) and DOTPLOT to produce a ‘dot matrix’ plot of the points; MFOLD for secondary structure analysis; REPEAT to search for repeat sequences; PALINDROME to search for palindromic sequences; and ECOMPOSITION to determine the AT, di- and trinucleotide content.
3. Two types of sequence blocks in ISRs of rrn operon families
A diagrammatic representation of the ISR sequence alignments from the rrn operon sets of six bacterial species is shown in Fig. 1. Table 1 lists data for each rrn operon set, including the number of rrn operons, tRNA content, spacer length and whether the operon set was obtained from a whole genome. Sequence data were obtained from eight isolates of E. faecium, seven of E. faecalis, four of E. coli, six of S. aureus and one each of Vibrio cholerae, Bacillus subtilis and Haemophilus influenzae. A striking feature of all the alignments is that the rate of evolution (or substitution rate) varies considerably in different parts of the spacer region. These regions can be divided into conserved and variable sequence blocks (a statistical analysis suggesting recombination of variable sequence blocks between operons of a genome is presented in Section 4).
3.1. Conserved sequence blocks
The first sequence block type shows very little (if any) variation between operons of a genome and is present in the spacers of all rrn operons within a species (shown as black lines in Fig. 1). The number and length (sizes in nt, in 5′ to 3′ order) of the constant sequence blocks vary between rrn operon sets: V. cholerae (50, 264, 12), E. faecium/faecalis (73, 67, 133), E. coli (41, 85, 20), H. influenzae (54, 242), B. subtilis (79, 81) and S. aureus (88, 142, 10). These sequences diverge between species, and the divergence increases with phylogenetic distance (Gürtler and Stanisich, 1996). The comparison of E. faecalis and E. faecium (Fig. 1d), with 27 interspecies nt differences and no intraspecies (i.e. intragenomic) differences within the three constant sequence blocks, demonstrates this point. Furthermore, the sequence alignments of Forsman et al. (1997) in Staphylococcus species and Naïmi et al. (1997) in Enterococcus species also show interspecies sequence
similarities in spacer regions equivalent to the constant sequence blocks defined in this paper. These regions are thought to form double-stranded processing stems (dsPS1 and dsPS2) involved in the maturation of 16S and 23S rRNA (Chiaruttini and Milet, 1993; Naïmi et al., 1997; Nour, 1998). In addition to these conserved sequence blocks, the tRNAala, tRNAile and tRNAglu genes form an intermediate group of three sequence blocks, each highly conserved in bacteria. However, they are not present in all operons of an rrn operon set, and their presence varies between species (Table 1). They share this variable presence between operons with the other sequence blocks described in the next section.
3.2. A mosaic organization of variable sequence blocks
All of the filled boxes in the maps shown in Fig. 1 (coloured boxes correspond to sequences of unknown function and black boxes to tRNA genes) vary between the operons of a species in sequence and/or in their presence or absence (different shading corresponds to sequence differences). The tRNA genes (shown in three shades to indicate a tRNAile, tRNAala or tRNAglu gene family relationship) are present in only some operons of a species and are 73–76 nt long. The other variable sequence blocks (coloured) vary from 7 to 120 nt in length (see Fig. 1 legend). In five of the seven rrn operon sets described here, at least one variable block is at least 100 nt long. In contrast, a mosaic-like organization of only 20–50 nt sequence blocks has been described for H. parainfluenzae ISRs (Privitera et al., 1998).
Fig. 1. Maps (left) and phylograms (right) of ISRs from the complete rrn operon sets of (a) Haemophilus influenzae, (b) Bacillus subtilis, (c) Vibrio cholerae, (d) Enterococcus faecium and E. faecalis, (e) Escherichia coli and (f) Staphylococcus aureus genomes. The maximum numbers of nucleotide substitutions are marked on branches of the phylograms. In all maps (a–f) the scale is shown; solid black horizontal lines correspond to conserved sequence blocks: &, tRNAile; b, tRNAala; %, tRNAglu; coloured boxes correspond to other variable blocks, with each colour corresponding to a different DNA sequence and the following lengths (nt): (a) H. influenzae (25, 85, 163 and 28), (b) B. subtilis (7, 11 and 10), (c) V. cholerae (38, 64, 105 and 15), (d) E. faecium (12, 6, 25 and 120) and E. faecalis (2, 12, 6 and 25), (e) E. coli (24, 27, 108 [rsl], 20, 42, 61, 17 and 8) and (f) S. aureus (5, 31, 3, 15, 104 [VS2], 16, 21, 19, 11, 6, 16 [VS5], 94 [VS6], 20 and 9). The triangle in (d) depicts the deletion of the Tsp509I site present in E. faecalis. In maps (b) and (d), the position of variable nucleotides is shown by one of two coloured vertical lines, with spacers that share nucleotides intersected by the same coloured line. In B. subtilis (b), the differences are: Region I, nt 10 ‘T’ in rrnAO and JW and ‘A’ in rrnED, HGI and B; nt 27–29 ‘CAA’ in rrnB and ‘TTG’ in rrnAO, ED, HGI and JW; nt 80–83 insertion of ‘GTT’ in rrnAO; and nt 87–88 ‘GA’ in rrnAO and ‘CG’ in rrnED, HGI, B and JW; and Region II, nt 307–309 insertion of ‘ATT’ in rrnAO, ED and B; and nt 15 ‘C’ in rrnAO and B and ‘T’ in rrnED, HGI and JW. In E. faecalis/E. faecium (d): 1, for nt 214–224, Z=6.4 (α=100, g=4) and †, Z<1. In the map for E. coli (e), the positions of secondary structures I, II, III and IV (Antón et al., 1998) are boxed, and the variable sequence blocks rsl and 20-mer (Harvey et al., 1988; Antón et al., 1998) are shown by light and dark green boxes, respectively. A possible tRNA-like secondary structure and possible transposition sites have been postulated for the rsl sequence (Brosius et al., 1981). In the map for S. aureus (f), VS2 is depicted by a pink box, VS5′ by a light-blue box and VS6 by a purple box. The branch lengths of the phylograms in (b–f) are proportional to the maximum number of nucleotide differences, as marked on the respective branches. The boxed areas adjoining the vertical lines, with arrows to the Z-values deduced from PLATO (see Section 2), show the boundaries of the regions of the alignment where there is significant spatial variation due to recombination or selection. The nucleotide sequence sources for the six species are listed in Table 1; for (c), (d), (e) and (f), GenBank Accession Nos (except for those starting with rrn) corresponding to sequences from other isolates are listed as follows: (c) Lan and Reeves (1998); (d) I (AF070677, AF0003922, EF16S23S, AF028836), II (AF003921, AF070676), III (AF070678), long [617, 805, 775-1 (Hall, 1994), X87186, rrnA and C (Gürtler et al., 1999)], short (X87182); (e) [rrnA40 (operon rrnA, isolate ECOR40), rrnB35 (isolate ECOR35, operon rrnB) from Antón et al. (1998)], rrnD1 and rrnX (Young et al., 1979); (f) SAU39769 (Forsman et al., 1997), A48073 and A48074.
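Counting differences between aligned spacer sequences, as in the 27 interspecies differences within the E. faecalis and E. faecium constant blocks, can be sketched as follows. The sketch assumes two sequences already aligned to equal length with '-' marking gaps, and it skips gap columns; that gap convention is an assumption, since other conventions count indels as differences. The same counts give a percentage identity.

```python
def compare_aligned(a, b, gap="-"):
    """Return (identical, different) site counts for two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = different = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        if x == y:
            identical += 1
        else:
            different += 1
    return identical, different


def percent_identity(identical, length):
    """Identical sites as a percentage of a chosen reference length."""
    return 100.0 * identical / length


same, diff = compare_aligned("ACGT-ACGTA", "ACGTTACCTA")
print(same, diff)  # 8 1
```

The choice of denominator matters. The 75 identical positions given in the Fig. 3 legend amount to about 77% of a 97-nt sequence and about 81% of a 93-nt one, which is consistent with the 77–81% homology reported in Section 5 if sequence length is the denominator; the text does not state how those percentages were computed.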
4. Phylogenomic analysis of ISR sequences within variable and conserved sequence blocks
The maximum-likelihood model was used to analyse the ISR sequence alignments phylogenomically, based on the number of nucleotide changes between spacers. The transition-to-transversion ratios (Ti:Tv) were calculated for each species and used as a parameter in DNAml and PLATO [V. cholerae (0.41), E. faecium/faecalis (1.3), E. coli (0.59), H. influenzae (0.5), B. subtilis (0.5) and S. aureus (0.66)]. Regions not evolving as predicted by the maximum-likelihood model were identified statistically using PLATO. These regions involved sequence blocks (coloured blocks and tRNA genes) that show significantly more nucleotide changes between two or more sequence blocks than expected. Sequence block insertions and deletions, in which one rrn operon has a sequence block that is absent from all the other sequences of the aligned rrn operon set, cannot be analysed phylogenetically (DNAml) or statistically (PLATO). The following is a summary of the regions contributing to
possible recombination in the six rrn operon sets shown in Fig. 1.
4.1. Evidence of recombination contributed by single nt mutations
Differences at single nt positions between operons were found in B. subtilis, E. faecium/E. faecalis, E. coli and S. aureus. For B. subtilis (Fig. 1b), the tRNAile and tRNAala genes were present in rrnA and O and absent from the remaining eight operons. Evidence of recombination was found in Region I (four operon-specific differences) and Region II (two operon-specific differences) (see Fig. 1 legend for the positions of the nt differences). For E. faecalis and E. faecium (Fig. 1d), the conserved sequence blocks show no intraspecies variation. However, between species, the conserved sequence blocks vary at 27 nt positions in two regions (Regions I and II; Fig. 1d, vertical lines). Statistical evidence of recombination was found in part of Region I (Fig. 1d). The block compositions of the long spacer of E. faecalis and spacer ‘III’ of E. faecium are identical except for the loss of the Tsp509I site before the tRNAile gene. Furthermore, a study of this Tsp509I site has demonstrated intra-genomic sequence differences between three copies of the long spacer in E. faecalis isolates, indicating significant intra-isolate heterogeneity consistent with recombination of rrn alleles (Gürtler et al., 1999). A comprehensive study of intra-operon differences in the ISR sequences from 12 ECOR isolates revealed 30 single nt variable sites (Antón et al., 1998). The study of ISR sequences from some of the operons of three S. aureus isolates showed five single nt variable sites in rrnJ, two in rrnH and five in rrnC (Gürtler and Barrie, 1995). The contribution of these variable sites to recombination was not analysed here with PLATO. In both studies, much larger intra-operon differences in the ISR sequences were detected as block substitutions and insertions/deletions; these differences are analysed next.
4.2. Statistical evidence of recombination contributed by variable sequence blocks
Statistical evidence of recombination was detected by PLATO due to the presence or absence of variable sequence blocks in H. influenzae, V. cholerae, E. coli and S. aureus. For H. influenzae (Fig. 1a), rrnA, C and D are long and identical, with tRNAile and tRNAala, whereas rrnB, E and F are short and identical, with tRNAglu; evidence of recombination was detected between blocks in one region of rrnA, C, D (tRNAala, green) and rrnE, F, B (tRNAglu, blue, pink). For V. cholerae (Fig. 1c), recombination was detected in two regions: Region I, 75 nt differences between M1 (tRNAile) and S, M2, L
(tRNAglu), and Region II, differences between S, L, M2 (red) and M1 (green). For E. coli (Fig. 1e), evidence of recombination was found in three regions: Region I, nt 15–21; Region II, 196 nt differences between rrnA, D, H (tRNAile and tRNAala), rrnG and B (rsl [light green] and tRNAglu), rrnB, E, C, A [20-mer (dark green)] and rrnG, B, A40, E, C [variable sequence with conserved secondary structure (I, II, III and IV) in the orange block (Antón et al., 1998)]; and Region III, due to the rearrangement of two sequence blocks (17-mer [purple] and 8-mer [pink]) with 9 nt deleted in conjunction with the 8-mer. For S. aureus (Fig. 1f), evidence of recombination was found in three regions: Region I, between rrnA, F, J, C, G (tRNAile and tRNAala) and rrnL, K, A48073, E, H [VS1–4 (red, pink, blue)], with 67 nt differences in rrnL and K (tRNAala replaced with VS3 and VS4), 130 nt differences in A48073 (VS1–4 and tRNAile and tRNAala) and 76 nt differences in rrnE (VS2 [pink]); Region II, due to the rearrangement of two sequence blocks (VS5′ [light blue] and VS6 [purple]); and Region III, due to the rearrangement of two sequence blocks (VS7 [brown] and VS8 [blue]). For E. faecalis and E. faecium (Fig. 1d), two insertion or deletion events are consistent with recombination: (1) three sequence blocks in Region I (red, tRNAala and blue) are not always present; and (2) between Regions I and II, there is one variable E. faecium-specific sequence block (pink) in alleles I and III. For variable sequence blocks that are present in one ISR but correspond to gaps in all the other sequences of an alignment, the statistical significance of recombination cannot be determined for any of the six rrn operon sets shown in Fig. 1, even though insertion or deletion of these sequences is consistent with recombination. Deletion of rsl (rrnA40 and EC) and its replacement with the 20-mer have previously been attributed to recombination (Harvey et al., 1988), but PLATO did not detect recombination.
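The Ti:Tv ratios used as DNAml and PLATO parameters were obtained with MacClade, which reconstructs changes on a tree. A much cruder pairwise count, shown here only to make the transition/transversion distinction concrete, classifies each aligned difference directly. It is not the MacClade procedure, and pairwise counts will generally differ from tree-based estimates; the pooling function is a hypothetical convenience.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_counts(a, b):
    """Count transitions and transversions between two aligned DNA sequences.

    Transitions are purine<->purine or pyrimidine<->pyrimidine changes; any
    other change between A, C, G and T is a transversion. Gaps and ambiguity
    codes are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    ti = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti, tv


def ti_tv_ratio(pairs):
    """Pool counts over several aligned sequence pairs and return Ti/Tv."""
    ti = tv = 0
    for a, b in pairs:
        t, v = ti_tv_counts(a, b)
        ti += t
        tv += v
    return ti / tv if tv else float("inf")


print(ti_tv_counts("AGCTAG", "GACTTC"))  # (2, 2)
```

Pooling over pairs before dividing avoids undefined ratios for individual pairs that happen to contain no transversions.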
5. Anti-parallel homology of sequence blocks between spacer pairs from S. aureus
To determine whether there was any sequence homology between S. aureus ISRs other than that found by parallel sequence alignment (Fig. 1), a pairwise dot matrix analysis was performed using COMPARE and DOTPLOT (Section 2). The dots and lines in Fig. 2 represent regions of homology between sequence pairs. There were 45 possible heterologous spacer combinations, and all except those shown in Fig. 2 produced a single broken (gaps correspond to deletions or regions of non-homology) or unbroken line of homology. In the four spacer combinations shown in Fig. 2 (rrnE/A48073, rrnE/rrnA, rrnA/A48073 and rrnF/rrnE), an extra anti-parallel region of homology was found between VS2 (pink block; 5′ end of the spacer from rrnE and A48073) and VS5/VS6 (purple block; 3′ end of the spacer from rrnA, F
and A48073). There were also a number of smaller regions of homology, shown as dots or short lines in all four dot matrices in Fig. 2, corresponding to the small repeat motifs listed in Table 2. To define the precise region and extent of homology, pairwise sequence alignments were performed. Once the region of homology had been defined, a multiple sequence alignment was carried out with all the available sequences (Fig. 3). This alignment demonstrated three main points: (1) the homology between VS2 and VS5/VS6 ranged from 77 to 81%; (2) there were 22 inter-block (1) differences; and (3) there were an additional seven intra-VS5/VS6 block nt differences (§), only one (†) of which also differed between VS2 block sequences. The inter-block differences were at different nt positions from the intra-block differences. The greater number of intra-block differences in VS5/VS6 than in VS2 suggests a higher mutation rate in VS5/VS6. To further characterize the mutations detected, a secondary structure analysis was performed on VS2 and VS5/VS6 (Fig. 4). The overall secondary structure (two stems with a shorter stem-loop in between) is conserved between VS2 and VS5/VS6. Many of the inter-block mutations are compensatory or semi-compensatory, especially those at the base of each stem. The compensatory mutations at the base of each stem result from a change in AT-rich repeat copy number between VS2 and VS5/VS6 (Table 2): there are more AT-rich repeats in VS5/VS6 than in VS2, and more repeats with all four nucleotides in VS2 than in VS5/VS6. This is consistent with a higher mutation rate in VS5/VS6 in conjunction with expansion of AT-rich repeat motifs. These forces may have led to the conservation of secondary structure between VS2 and VS5/VS6.
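The COMPARE/DOTPLOT comparison described above (window 21, stringency 14; Section 2) can be approximated in a few lines. This is an illustrative re-implementation, not GCG's code: a point (i, j) is recorded when at least 14 of the 21 paired positions starting at position i in one spacer and position j in the other are identical. A forward window comparison only finds same-orientation similarity, so to look for anti-parallel homology one would compare against a reverse-complemented (or, as an alternative assumption, simply reversed) copy of the second sequence; the text does not state which of these corresponds to Fig. 2.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_matrix(a, b, window=21, stringency=14):
    """Return (i, j) window starts where a[i:i+window] and b[j:j+window]
    share at least `stringency` identical positions."""
    a, b = a.upper(), b.upper()
    points = []
    for i in range(len(a) - window + 1):
        seg_a = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(seg_a, b[j:j + window]))
            if matches >= stringency:
                points.append((i, j))
    return points


print(dot_matrix("A" * 21, "A" * 20 + "C"))  # [(0, 0)]
print(dot_matrix("A" * 21, "C" * 21))        # []
```

Points found with dot_matrix(a, reverse_complement(b)) can be mapped back to the original coordinates of b as len(b) - window - j, where they fall on a line of negative slope, the anti-parallel pattern seen in a dot plot. The quadratic scan is adequate for spacers of a few hundred nucleotides.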
Non-contiguous repeat sequences have been implicated in slipped-strand mispairing, which may lead to significant tracts of DNA becoming single-stranded during replication (Levinson and Gutman, 1987). It is postulated that the expansion of AT-repeat sequences in VS5/VS6 and VS2 may increase the likelihood of DNA becoming single-stranded during replication, resulting in the observed intragenic rearrangement of sequence blocks.
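The repeat and composition analyses behind Table 2 (REPEAT and ECOMPOSITION in Section 2) reduce to counting. The sketch below counts overlapping occurrences of a motif, tests whether a motif is made only of A and T (footnote a of Table 2) and computes the AT fraction of a sequence. Whether Table 2 counted overlapping or non-overlapping occurrences is not stated, so the overlap convention here is an assumption, and the input is a made-up fragment, not VS2 or VS5/VS6.

```python
def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def at_fraction(seq):
    """Fraction of A or T among the unambiguous bases A, C, G, T."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    if not bases:
        raise ValueError("no A, C, G or T in sequence")
    return sum(c in "AT" for c in bases) / len(bases)


def is_at_only(motif):
    """True for motifs made only of A and T."""
    return set(motif.upper()) <= {"A", "T"}


fragment = "TTTAAATTTTAGCATT"  # hypothetical, not a real ISR sequence
for motif in ["TTTA", "TAAA", "GCATT", "AAGC"]:
    print(motif, count_motif(fragment, motif), is_at_only(motif))
print(at_fraction(fragment))
```

Running the same counts on two blocks and comparing them motif by motif gives a table of the same shape as Table 2.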
6. Conclusions: a model for homologous recombination of ISRs
6.1. Summary: evidence suggesting homologous recombination of ISRs
The following features are common to the ISRs from the six rrn operon sets reviewed in this paper:
• Conserved sequence blocks may allow homologous pairing within spacers;
• tRNA genes are not always present in single spacers or in rrn operon sets;
• Variable sequence blocks are present within rrn operon sets;
• Within regions of alignments containing variable sequence blocks, regions of variable nucleotide exchange can be detected, corresponding to recombination or selection;
• Deletions/insertions of variable sequence blocks from only one spacer of a sequence type in an rrn operon set are also consistent with recombination, even though they are statistically undetectable;
• Mutations common to a subset of an operon set can define a recombination region;
• Intra-genomic heterogeneity of a mutation at a Tsp509I site in ISRs from E. faecalis also suggests recombination;
• There is anti-parallel homology and secondary structure conservation between two variable regions (VS2 and VS5/VS6) of S. aureus;
• There is a higher mutation rate and an expansion of AT-rich repeats in VS5/VS6.
In addition to the evidence presented in Section 1, the above features comprise a large body of evidence suggesting that homologous recombination is responsible for the ISR rearrangements observed in bacterial rrn operons.
Fig. 2. Dot matrix analysis of Staphylococcus aureus ISR sequences from four rrn operon pairs. Dot matrix analysis was performed on all 45 possible pairwise combinations (Section 2), and only the four combinations with additional anti-parallel homology are shown: (a) rrnE and A48073; (b) rrnE and rrnA; (c) rrnA and A48073; and (d) rrnF and rrnE. The scale on each axis shows the spacer nucleotide position (nt), and alongside each axis is the alignment map from the corresponding operon (see Fig. 1f for the colours of VS2, VS5 and VS6).
Fig. 3. Sequence alignment showing anti-parallel sequence homology between VS2 and VS5/VS6. The first three allele sequences shown are for VS2 (all of VS2 is shown except for 16 nt at the 5′ end) and the last four allele sequences are for VS5/VS6 (all of VS5 is shown between the arrows, and the remainder is VS6 except for the last 15 bp at the 3′ end). The block name and allele are listed to the left of each sequence [overall block structures are shared by (rrnE and A48074) and (rrnF and A48073)]. The symbols refer to: 1, identical (75) in both VS2 and VS5/VS6; §, nucleotide differences (7) between VS6 only; †, nucleotide differences (1) between VS2 and VS5/VS6 alleles. The lengths of the sequences are 93–97 nt.
Fig. 4. Secondary structures of VS2 and VS5/VS6. The arrows show the beginning of the region of homology shown in Fig. 3. The filled circles show nt differences between VS2 and VS5/VS6 (unmarked nucleotides in Fig. 3), and the open circles show intra-specific nt differences within VS2 (§ in Fig. 3) and VS5/VS6 († in Fig. 3). Insertions and deletions are marked.
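The pair counts quoted in Section 5 and in Fig. 5 (45 heterologous spacer pairs; five anti-parallel pairs) follow from simple enumeration. The sketch assumes that the ten S. aureus spacers named in Section 4.2 (rrnA, F, J, C, G, L, K, A48073, E and H) are the set that was compared, with VS2 in rrnE and A48073 and VS5/VS6 in rrnA, rrnF and A48073, as stated in Section 5.

```python
from itertools import combinations

# Assumed spacer set: the ten S. aureus spacers named in Section 4.2.
SPACERS = ["rrnA", "rrnF", "rrnJ", "rrnC", "rrnG",
           "rrnL", "rrnK", "A48073", "rrnE", "rrnH"]
HAS_VS2 = {"rrnE", "A48073"}              # 5' VS2 block (Section 5)
HAS_VS5_VS6 = {"rrnA", "rrnF", "A48073"}  # 3' VS5/VS6 block (Section 5)

# Every unordered pair of distinct spacers: C(10, 2).
all_pairs = list(combinations(SPACERS, 2))

# Pairs in which one partner carries VS2 and the other VS5/VS6; frozenset
# makes {x, y} and {y, x} the same pair, and x != y excludes self-pairing.
anti_parallel = {frozenset((x, y)) for x in HAS_VS2 for y in HAS_VS5_VS6 if x != y}

print(len(all_pairs))      # 45
print(len(anti_parallel))  # 5
```

Comparing the enumerated pairs with Fig. 2, which shows rrnE/A48073, rrnE/rrnA, rrnA/A48073 and rrnF/rrnE, leaves rrnF/A48073 as the fifth anti-parallel combination.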
6.2. A model for homologous recombination of ISRs
To explain the above features, a model for homologous recombination of spacers is proposed, using the rrn operons of S. aureus as an example (Fig. 5). Two possibilities are presented. Both depend on the flanking regions, which are much longer than the ISR (1550 bp of the 16S rRNA gene and 2900 bp of the 23S rRNA gene) and are homologous between the rrn operons of a genome; pairing of these regions would stabilise the association between non-homologous rrn operons. There are 45 possible combinations of parallel recombination (Fig. 5a) between non-homologous rrn operons, in which conserved spacer sequence blocks pair and some variable spacer sequence blocks are looped out because they lack homology. Similarly, there are five possible combinations of anti-parallel recombination (Fig. 5b), in which CS1 from both rrn operons pairs and VS2 (pink) pairs with VS5/VS6 (light blue/purple), again producing a looped-out region. During replication, the looped-out sequences could give rise to an insertion or a deletion. Of the two possibilities, anti-parallel recombination is the less probable because of the lower sequence homology between VS2 and VS5/VS6. However, the VS5/VS6 sequence may originally have been more similar to VS2, with the present VS5/VS6 sequence evolving only through replication slippage, AT-expansion and a higher mutation rate at the VS5/VS6 locus (assuming that evolution proceeded from VS2 to VS5/VS6). These recombination models could explain how mutations or sequence blocks become fixed throughout the rrn gene family of a genome, a process associated with concerted evolution.

Fig. 5. Two possible models of homologous recombination for the ISR from S. aureus. (a) Parallel recombination, showing one (rrnE and rrnA) of 45 heterologous combinations (pairing between the constant regions CS1, CS2 and CS3); (b) anti-parallel recombination, showing one (rrnE and rrnA) of five heterologous combinations (with additional pairing between VS2 and VS5/VS6). In both models, homologous pairing occurs between the 16S rRNA gene and the 23S rRNA gene of both alleles.

Table 2
Presence of repeat motifs in VS2 and VS5/VS6

Repeat motif    VS2 (rrnE/A073)    VS5/VS6 (rrnA&F)
More in VS2 than VS5/VS6:
AAAATᵃ          2                  1
ATTAAᵃ          2                  0
TTTTᵃ           3                  2
AAAGCGG         2                  0
AAGAAA          2                  1
GCATT           2                  1
TTAAAG          2                  1
TTGT            2                  0
TTTG            3                  0
TTTGA           2                  0
TTTTG           2                  0
AAGC            2                  1
More in VS5/VS6 than VS2:
TAAAᵃ           3                  6
TAAATᵃ          1                  3
TTATᵃ           0                  3
TTTAᵃ           2                  6
TTTAAAᵃ         1                  3
TTTTAᵃ          1                  2
AATGA           1                  3
GTTTA           1                  3
ᵃ Repeat motifs with only A or T.

An experimental system has been devised for introducing plasmid-borne rrn operons into Mycobacterium smegmatis via homologous recombination (Sander et al., 1996). Using this system, an rRNA mutation was found to confer antibiotic resistance in vivo (Sander et al., 1996). More recently, all the genomic rrn operons of an E. coli strain were deleted and replaced with
plasmid-borne rRNA genes (Asai et al., 1999; Nomura, 1999). These systems have two important features: (1) rrn operons can be made non-functional by replacement with a plasmid-borne rDNA segment, so that the effect of this loss can be studied; and (2) the effect of single- and block-substitutions introduced via a plasmid into regions of an rrn operon, such as the ISR variable sequence blocks, can be studied. These experimental systems could be used to test the mechanism postulated by the ISR recombination model presented above. Furthermore, the model provides a framework for designing experiments to test the functional significance of the ISR variable sequence blocks.
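Table 2 tallies repeat motifs in VS2 and VS5/VS6 and flags the motifs made only of A or T. The sketch below shows one way such tallies could be computed and compared. It is an illustration, not the procedure used in this study: the function names are hypothetical, overlapping occurrences are counted (an assumption, since the counting rule behind Table 2 is not stated here), and the example sequences are invented rather than the S. aureus spacer blocks.

```python
"""Repeat-motif tally for two sequence blocks, in the spirit of Table 2 (sketch).

Function names are hypothetical; overlapping matches are counted by assumption.
"""


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps (TTTT occurs twice in TTTTT)."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Restart one base later so that overlapping copies are also found.
        start = seq.find(motif, start + 1)
    return count


def is_at_only(motif: str) -> bool:
    """True if the motif contains only A and T (the footnoted motifs in Table 2)."""
    return set(motif) <= {"A", "T"}


def compare_blocks(block_1: str, block_2: str,
                   motifs: list[str]) -> list[tuple[str, int, int, str]]:
    """Return (motif, count in block_1, count in block_2, richer block) for each motif."""
    rows = []
    for motif in motifs:
        n1 = count_overlapping(block_1, motif)
        n2 = count_overlapping(block_2, motif)
        if n1 > n2:
            richer = "block_1"
        elif n2 > n1:
            richer = "block_2"
        else:
            richer = "equal"
        rows.append((motif, n1, n2, richer))
    return rows


if __name__ == "__main__":
    # Invented sequences, not the S. aureus VS2 or VS5/VS6 spacer blocks.
    for motif, n1, n2, richer in compare_blocks("TAAATAAA", "GCATTTTT",
                                                ["TAAA", "GCATT", "TTTT"]):
        flag = " (A/T only)" if is_at_only(motif) else ""
        print(f"{motif}{flag}: {n1} vs {n2} ({richer})")
```

Counting overlaps matters for homopolymer-like motifs such as TTTT: a single run of five T residues gives two matches here, whereas Python's non-overlapping `str.count` would give one.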
Acknowledgements
Thanks to Drs Tim Littlejohn and Bruno Gaeta from ANGIS for help with DNA sequence data analysis. I am grateful to Drs Barrie Mayall and Nick Grassly for helpful advice.
References
Antón, A.I., Martínez-Murcia, A.J., Rodríguez-Valera, F., 1998. Sequence diversity in the 16S–23S intergenic spacer region (ISR) of the rRNA operons in representatives of the Escherichia coli ECOR collection. J. Mol. Evol. 47, 62–72.
Asai, T., Zaporojets, D., Squires, C., Squires, C.L., 1999. An Escherichia coli strain with all chromosomal rRNA operons inactivated: complete exchange of rRNA genes between bacteria. Proc. Natl. Acad. Sci. USA 96, 1971–1976.
Blattner, F.R., Plunkett 3rd, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., Gregor, J., Davis, N.W., Kirkpatrick, H.A., Goeden, M.A., Rose, D.J., Mau, B., Shao, Y., 1997. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.
Brosius, J., Dull, T.J., Sleeter, D.D., Noller, H.F., 1981. Gene organization and primary structure of a ribosomal RNA operon from Escherichia coli. J. Mol. Biol. 148, 107–127.
Bult, C.J., White, O., Olsen, G.J., Zhou, L., Fleischmann, R.D., Sutton, G.G., Blake, J.A., FitzGerald, L.M., Clayton, R.A., Gocayne, J.D., Kerlavage, A.R., Dougherty, B.A., Tomb, J.F., Adams, M.D., Reich, C.I., Overbeek, R., Kirkness, E.F., Weinstock, K.G., Merrick, J.M., Glodek, A., Scott, J.L., Geoghagen, N.S.M., Venter, J.C., 1996. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073.
Cartwright, C.P., Stock, F., Beekmann, S.E., Williams, E.C., Gill, V.J., 1995. PCR amplification of rRNA intergenic spacer regions as a method for epidemiologic typing of Clostridium difficile. J. Clin. Microbiol. 33, 184–187.
Chiaruttini, C., Milet, M., 1993. Gene organization, primary structure and RNA processing analysis of a ribosomal RNA operon in Lactococcus lactis. J. Mol. Biol. 230, 57–76.
Chun, J., Huq, A., Colwell, R.R., 1999. Analysis of 16S–23S rRNA intergenic spacer regions of Vibrio cholerae and Vibrio mimicus. Appl. Environ. Microbiol. 65, 2202–2208.
Cilia, V., Lafay, B., Christen, R., 1996. Sequence heterogeneities among 16S ribosomal RNA sequences, and their effect on phylogenetic analyses at the species level. Mol. Biol. Evol. 13, 451–461.
Clark, C.G., 1987. On the evolution of ribosomal RNA. J. Mol. Evol. 25, 343–350.
Cole, S.T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Gordon, S.V., Eiglmeier, K., Gas, S., Barry 3rd, C.E., Tekaia, F., Badcock, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R., Devlin, K., Feltwell, T., Gentles, S., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Barrell, B.G., 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.
Eisen, J.A., 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–167.
Felsenstein, J., 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376.
Felsenstein, J., 1993. PHYLIP (Phylogeny Inference Package) Version 3.5c. University of Washington, Seattle, WA.
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., et al., 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.
Forsman, P., Tilsala-Timisjarvi, A., Alatossava, T., 1997. Identification of staphylococcal and streptococcal causes of bovine mastitis using 16S–23S rRNA spacer regions. Microbiology 143, 3491–3500.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley, J.M., et al., 1995. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403.
Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R., Lathigra, R., White, O., Ketchum, K.A., Dodson, R., Hickey, E.K., Gwinn, M., Dougherty, B., Tomb, J.F., Fleischmann, R.D., Richardson, D., Peterson, J., Kerlavage, A.R., Quackenbush, J., Salzberg, S., Hanson, M., van Vugt, R., Palmer, N., Adams, M.D., Gocayne, J., Venter, J.C., et al., 1997. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586.
Fraser, C.M., Norris, S.J., Weinstock, G.M., White, O., Sutton, G.G., Dodson, R., Gwinn, M., Hickey, E.K., Clayton, R., Ketchum, K.A., Sodergren, E., Hardham, J.M., McLeod, M.P., Salzberg, S., Peterson, J., Khalak, H., Richardson, D., Howell, J.K., Chidambaram, M., Utterback, T., McDonald, L., Artiach, P., Bowman, C., Cotton, M.D., Venter, J.C., et al., 1998. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281, 375–388.
Grassly, N.C., Holmes, E.C., 1997. A likelihood method for the detection of selection and recombination using nucleotide sequences. Mol. Biol. Evol. 14, 239–247.
Gürtler, V., Wilson, V.A., Mayall, B.C., 1991. Classification of medically important clostridia using restriction endonuclease site differences of PCR-amplified 16S rDNA. J. Gen. Microbiol. 137, 2673–2679.
Gürtler, V., 1993. Typing of Clostridium difficile strains by PCR-amplification of variable length 16S–23S rDNA spacer regions. J. Gen. Microbiol. 139, 3089–3097.
Gürtler, V., Barrie, H.D., 1995. Typing of Staphylococcus aureus strains by PCR-amplification of variable-length 16S–23S rDNA spacer regions: characterization of spacer sequences. Microbiology 141, 1255–1265.
Gürtler, V., Stanisich, V.A., 1996. New approaches to typing and identification of bacteria using the 16S–23S rDNA spacer region. Microbiology 142, 3–16.
Gürtler, V., Rao, Y., Pearson, S.R., Bates, S.M., Mayall, B., 1999. DNA sequence heterogeneity in the three copies of the long 16S–23S rDNA spacer of Enterococcus faecalis isolates. Microbiology 145, 1785–1796.
Hall, L., 1994. Are point mutations or DNA rearrangements responsible for the restriction fragment length polymorphisms that are used to type bacteria? Microbiology 140, 197–204.
Hancock, J.M., Dover, G.A., 1990. ‘Compensatory slippage’ in the evolution of ribosomal RNA genes. Nucleic Acids Res. 18, 5949–5954.
Hancock, J.M., Vogler, A.P., 1998. Modelling the secondary structures of slippage-prone hypervariable RNA regions: the example of the tiger beetle 18S rRNA variable region V4. Nucleic Acids Res. 26, 1689–1699.
Harvey, S., Hill, C.W., Squires, C., Squires, C.L., 1988. Loss of the spacer loop sequence from the rrnB operon in the Escherichia coli K-12 subline that bears the relA1 mutation. J. Bacteriol. 170, 1235–1238.
Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C., Herrmann, R., 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449.
Ikemura, T., Nomura, M., 1977. Expression of spacer tRNA genes in ribosomal RNA transcription units carried by hybrid Col E1 plasmids in E. coli. Cell 11, 779–793.
Jensen, M.A., Webster, J.A., Straus, N., 1993. Rapid identification of bacteria on the basis of polymerase chain reaction-amplified ribosomal DNA spacer polymorphisms. Appl. Environ. Microbiol. 59, 945–952.
Kaneko, T., Sato, S., Kotani, H., Tanaka, A., Asamizu, E., Nakamura, Y., Miyajima, N., Hirosawa, M., Sugiura, M., Sasamoto, S., Kimura, T., Hosouchi, T., Matsuno, A., Muraki, A., Nakazaki, N., Naruo, K., Okumura, S., Shimpo, S., Takeuchi, C., Wada, T., Watanabe, A., Yamada, M., Yasuda, M., Tabata, S., 1996. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions (supplement). DNA Res. 3, 185–209.
Klenk, H.P., Clayton, R.A., Tomb, J.F., White, O., Nelson, K.E., Ketchum, K.A., Dodson, R.J., Gwinn, M., Hickey, E.K., Peterson, J.D., Richardson, D.L., Kerlavage, A.R., Graham, D.E., Kyrpides, N.C., Fleischmann, R.D., Quackenbush, J., Lee, N.H., Sutton, G.G., Gill, S., Kirkness, E.F., Dougherty, B.A., McKenney, K., Adams, M.D., Loftus, B., Venter, J.C., et al., 1997. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364–370.
Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G., Azevedo, V., Bertero, M.G., Bessieres, P., Bolotin, A., Borchert, S., Borriss, R., Boursier, L., Brans, A., Braun, M., Brignell, S.C., Bron, S., Brouillet, S., Bruschi, C.V., Caldwell, B., Capuano, V., Carter, N.M., Choi, S.K., Codani, J.J., Connerton, I.F., Danchin, A., et al., 1997. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256.
Lan, R., Reeves, P.R., 1998. Recombination between rRNA operons created most of the ribotype variation observed in the seventh pandemic clone of Vibrio cholerae. Microbiology 144, 1213–1221.
Levinson, G., Gutman, G.A., 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4, 203–221.
Liu, S.L., Sanderson, K.E., 1998. Homologous recombination between rrn operons rearranges the chromosome in host-specialized species of Salmonella. FEMS Microbiol. Lett. 164, 275–281.
Loughney, K., Lund, E., Dahlberg, J.E., 1982. tRNA genes are found between 16S and 23S rRNA genes in Bacillus subtilis. Nucleic Acids Res. 10, 1607–1624.
Lund, E., Dahlberg, J.E., Lindahl, L., Jaskunas, S.R., Dennis, P.P., Nomura, M., 1976. Transfer RNA genes between 16S and 23S rRNA genes in rRNA transcription units of E. coli. Cell 7, 165–177.
Maddison, W.P., Maddison, D.R., 1992. MacClade: Analysis of Phylogeny and Character Evolution. Sinauer Associates, Sunderland, MA.
Morgan, E.A., Ikemura, T., Nomura, M., 1977. Identification of spacer tRNA genes in individual ribosomal RNA transcription units of Escherichia coli. Proc. Natl. Acad. Sci. USA 74, 2710–2714.
Mylvaganam, S., Dennis, P.P., 1992. Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics 130, 399–410.
Naïmi, A., Beck, G., Branlant, C., 1997. Primary and secondary structures of rRNA spacer regions in enterococci. Microbiology 143, 823–834.
Ninet, B., Monod, M., Emler, S., Pawlowski, J., Metral, C., Rohner, P., Auckenthaler, R., Hirschel, B., 1996. Two different 16S rRNA genes in a mycobacterial strain. J. Clin. Microbiol. 34, 2531–2536.
Nomura, M., 1999. Engineering of bacterial ribosomes: replacement of all seven Escherichia coli rRNA operons by a single plasmid-encoded operon. Proc. Natl. Acad. Sci. USA 96, 1820–1822.
Nour, M., 1998. Studies on the large subunit rRNA genes and their flanking regions of Leuconostocs. Can. J. Microbiol. 44, 807–818.
Ojaimi, C., Davidson, B.E., Saint Girons, I., Old, I.G., 1994. Conservation of gene arrangement and an unusual organization of rRNA genes in the linear chromosomes of the Lyme disease spirochaetes Borrelia burgdorferi, B. garinii and B. afzelii. Microbiology 140, 2931–2940.
Privitera, A., Rappazzo, G., Sangari, P., Giannino, V., Licciardello, L., Stefani, S., 1998. Cloning and sequencing of a 16S/23S ribosomal spacer from Haemophilus parainfluenzae reveals an invariant, mosaic-like organisation of sequence blocks. FEMS Microbiol. Lett. 164, 289–294.
Reischl, U., Feldmann, K., Naumann, L., Gaugler, B.J., Ninet, B., Hirschel, B., Emler, S., 1998. 16S rRNA sequence diversity in Mycobacterium celatum strains caused by presence of two different copies of 16S rRNA gene. J. Clin. Microbiol. 36, 1761–1764.
Sander, P., Prammananan, T., Bottger, E.C., 1996. Introducing mutations into a chromosomal rRNA gene using a genetically modified eubacterial host with a single rRNA operon. Mol. Microbiol. 22, 841–848.
Sawada, M., Osawa, S., Kobayashi, H., Hori, H., Muto, A., 1981. The number of ribosomal RNA genes in Mycoplasma capricolum. Mol. Gen. Genet. 182, 502–504.
Smith, D.R., et al., 1997. Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155.
Sneath, P.H., 1993. Evidence from Aeromonas for genetic crossing-over in ribosomal sequences. Int. J. Syst. Bacteriol. 43, 626–629.
Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R.L., Zhao, Q., Koonin, E.V., Davis, R.W., 1998. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282, 754–759.
Tautz, D., Trick, M., Dover, G.A., 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature 322, 652–656.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Tomb, J.F., White, O., Kerlavage, A.R., Clayton, R.A., Sutton, G.G., Fleischmann, R.D., Ketchum, K.A., Klenk, H.P., Gill, S., Dougherty, B.A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E.F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H.G., Glodek, A., McKenney, K., Fitzegerald, L.M., Lee, N., Adams, M.D., Venter, J.C., et al., 1997. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547.
Wang, Y., Zhang, Z., Ramanan, N., 1997. The actinomycete Thermobispora bispora contains two distinct types of transcriptionally active 16S rRNA genes. J. Bacteriol. 179, 3270–3276.
Woese, C.R., 1987. Bacterial evolution. Microbiol. Rev. 51, 221–271.
Wu, M., Davidson, N., 1975. Use of gene 32 protein staining of single-strand polynucleotides for gene mapping by electron microscopy: application to the phi80d3ilvsu+7 system. Proc. Natl. Acad. Sci. USA 72, 4506–4510.
Young, R.A., Macklis, R., Steitz, J.A., 1979. Sequence of the 16S–23S spacer region in two ribosomal RNA operons of Escherichia coli. J. Biol. Chem. 254, 3264–3271.