Gene 528 (2013) 295–303
The distribution of recombination repair genes is linked to information content in bacteria
A. Garcia-Gonzalez 1, L. Vicens 1, M. Alicea 1, S.E. Massey ⁎
Department of Biology, PO Box 23360, University of Puerto Rico - Rio Piedras, San Juan 00931, Puerto Rico
Article info
Article history: Accepted 28 May 2013; Available online 21 June 2013
Keywords: Proteomic constraint; Proteome size; Recombination repair; GC content; DNA repair; Information content

Abstract
The concept of a ‘proteomic constraint’ proposes that the information content of the proteome exerts a selective pressure to reduce mutation rates, implying that larger proteomes produce a greater selective pressure to evolve or maintain DNA repair, resulting in a decrease in mutational load. Here, the distribution of 21 recombination repair genes was characterized across 900 bacterial genomes. Consistent with prediction, the presence of 17 genes correlated with proteome size. Intracellular bacteria were marked by a pervasive absence of recombination repair genes, consistent with their small proteome sizes, but also consistent with alternative explanations that reduced effective population size or lack of recombination may decrease selection pressure. However, when only non-intracellular bacteria were examined, the relationship between proteome size and gene presence was maintained. In addition, the more widely distributed (i.e. conserved) a gene, the smaller the average size of the proteomes from which it was absent. Together, these observations are consistent with the operation of a proteomic constraint on DNA repair. Lastly, a correlation between gene absence and genome AT content was shown, indicating a link between absence of DNA repair and elevated genome AT content. © 2013 Elsevier B.V. All rights reserved.
Abbreviations: UV, ultraviolet; GC, guanine/cytosine; AT, adenine/thymine; NHEJ, non-homologous end joining; IMG, Integrated Microbial Genomes; NCBI, National Center for Biotechnology Information; LGT, lateral gene transfer.
⁎ Corresponding author (S.E. Massey). Tel.: +1 787 764000x7798.
1 Tel.: +1 787 764000x7798.
0378-1119/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.gene.2013.05.082

1. Introduction
DNA repair is a critical process for the maintenance of genetic fidelity, and a wide variety of repair pathways and mechanisms have evolved in all three domains of life. However, little is known about the factors affecting the distribution and relative complexity of DNA repair pathways in different organisms. One proposed explanation for the presence or absence of DNA repair is the ‘proteomic constraint’ theory (Garcia-Gonzalez et al., 2012; Massey, 2008, 2013; Massey and Garey, 2007). This proposes that the information content of a genome, which approximates to the size of the proteome (the total number of amino acids/codons in a genome), exerts a selective pressure on genetic fidelity proportional to its size. This is because the mutational load will be greater in larger proteomes. The selective pressure acts to reduce mutation rates, and provides a simple explanation for the negative power law relationship with exponent −1 observed between mutation rates and proteome size, ranging from DNA viruses to vertebrates (Massey, 2008). This analysis is a development of Drake's observation of a correlation between the genome size of DNA-based microbes and mutation rates (Drake, 1991; Drake et al., 1998).

The potential effect of information content on genetic fidelity can be understood in the following way. If a genome possesses a larger number of nucleotides that carry genetic information, then the size of the mutational target will be larger. This means that the mutational load will be higher, given the random occurrence of mutations. Thus, as most mutations are deleterious, a greater selective pressure will be exerted to reduce the occurrence of mutations. This may be achieved by the evolution (positive selection) and maintenance (negative selection) of modulators of mutation rates, such as DNA repair genes or DNA polymerase proofreading activities. This in turn implies that the size of the proteome exerts a selection pressure on DNA repair, with genomes encoding larger proteomes possessing more repair genes and more elaborate repair pathways. Consistent with this prediction, in a preliminary study the presence of four DNA repair genes, mutM, mutY, mutL and mutS, was found to be positively correlated with proteome size in bacteria (Garcia-Gonzalez et al., 2012).

The argument might be made that a large proteome is statistically more likely to have a particular gene. However, this is more likely to apply to operational genes, with larger proteomes possessing more genes which reflect greater organismal phenotypic complexity in terms of behavior, metabolism and morphology. In contrast, informational genes involved in essential core functions, such as replication or repair, are not directly tied to increased organismal phenotypic complexity. Thus in bacteria, informational genes are usually assumed to
A. Garcia-Gonzalez et al. / Gene 528 (2013) 295–303
be widely distributed and not as variable as operational genes. The proteomic constraint theory provides a refinement of this view: while many functions of informational genes are universal, it is proposed that the efficiency and complexity of some may vary with the amount of information present in the genome, depending on whether they are involved in the maintenance of genetic fidelity.

There are a variety of alternative factors that might affect the distribution and complexity of DNA repair. These include the oxygen content of the environment (oxygen leading to oxidative damage), association with a host (leading to exposure to oxidative stress or an environment that promotes hypermutation; Jolivet-Gougeon et al., 2011), or exposure to sunlight due to photosynthesis (which may lead to UV and free radical damage). How these factors might affect the evolution of DNA repair in bacteria has not been elucidated. Other fundamental features such as the mutational robustness of the genome, effective population size and recombination rate might influence the overall strength of selection and so could also influence the presence of DNA repair genes, but presently there is little information available regarding these factors for sequenced genomes.

Given a variation in the distribution of DNA repair genes, a fundamental feature of genomes that may be affected by the level and type of DNA repair present is that of GC/AT content. Sueoka (1961) first suggested that DNA repair may affect overall genome GC content, while King and Jukes (1969) more explicitly proposed that differences in GC content were the result of changes in mutation bias due to alterations in DNA repair, and that this is a largely neutral process.
However, recent work has implied a genome wide selective pressure for increased GC content (Hershberg and Petrov, 2010; Hildebrand et al., 2010; Raghavan et al., 2012; van Leuven and McCutcheon, 2012), but neither the nature of that selective pressure nor the cause of differences in GC content between genomes is clear. An additional problem is that such a selective force in favor of increased GC content would be expected to favor a change in the underlying mutation bias toward GC; however, the underlying mutation bias is actually toward AT (Lee et al., 2012; Lind and Andersson, 2008; Sung et al., 2012a, 2012b).

An interesting case study is presented by the intracellular bacteria. Their genomes are mostly AT rich, they have small proteomes compared to non-intracellular bacteria, and they have also lost some of their DNA repair genes (McCutcheon and Moran, 2011). The loss of DNA repair genes in intracellular bacteria may be attributed to a reduction in selection pressure, which could be due variously to reduced population size (Wernegreen and Moran, 1999), Muller's ratchet (Lynch, 1996; Moran, 1996) or a reduced proteomic constraint (Garcia-Gonzalez et al., 2012; Massey, 2008). A link between the loss of repair genes and an elevation in AT content may be proposed. Consistent with this idea, there is a correlation between AT bias and the absence of the mutM and mutY genes, which correct AT→GC mutations (Garcia-Gonzalez et al., 2012).

Recombination repair is a major mechanism of DNA repair in bacteria and so serves as a useful model to examine the evolution of DNA repair. In bacteria, there are two main pathways, non-homologous end joining (NHEJ) and homologous recombination, both involved in the repair of double stranded breaks. NHEJ involves the ligation of break ends without a homologous template to guide repair and is error prone (Shuman and Glickman, 2007).
Homologous recombination in contrast utilizes a homologous sequence and is less error prone (Ayora et al., 2011). There are a variety of genes involved in both processes, the mechanisms of which have not been fully elucidated. While some recombination repair genes are known to be widely distributed, such as recA, the distributions of many other genes involved in recombination are not well characterized. Here, the relationship between the presence of recombination repair genes and factors such as proteome size, pathogenicity, aerobiosis and photosynthesis was examined using 900 bacterial genomes. Recombination repair genes were more likely to be absent in bacteria with smaller
proteomes, including and excluding intracellular bacteria in the analysis, consistent with prediction of the proteomic constraint theory. The other factors examined were not as important in affecting gene distribution. Lastly, genome AT bias was correlated with absence of genes involved in recombination repair and may reflect a change in mutational bias.
2. Methods
2.1. Genome and gene data
All 900 completed eubacterial genomes present in the Integrated Microbial Genomes (IMG) database (Joint Genome Institute) on 4th February 2010 were used for analysis. Recombinational repair genes chosen for the analysis were recA (involved in all recombination pathways) and recX, its regulator; recB, recC and recD (involved in end repair); ruvA, ruvB, ruvC, recG and recU (involved in Holliday junction resolution); recJ, recF, recO and recR (involved in gap repair); addA, addB, recE, recN and priA (repair of double stranded breaks); ligD (non-homologous end joining); and recQ (initiates recombination).

The presence or absence of a gene in a complete genome was initially determined from the genomic annotation, using the IMG compare genomes tool at the IMG website (img.jgi.doe.gov). In order to verify the absence of a gene in genomes that lacked an annotated copy, these genomes were searched with BLAST (tblastn), using the respective gene sequence from a related bacterium, chosen according to the phylogenetic relationships displayed at the NCBI microbial Blast website (www.ncbi.nlm.nih.gov/sutils/genom_table.cgi). Hits with an Expect value larger than e−15 were discounted. For each gene this process was conducted separately by at least two workers. Proteome sizes were calculated from the respective GenBank genome entries using a Perl script and included plasmids. Total genome GC contents were obtained from the NCBI microbial genomes website (www.ncbi.nlm.nih.gov/genomes/lproks.cgi?view=1).

The definitions of the different bacterial lifestyles utilized in the study are as follows. ‘Intracellular’ bacteria are those bacteria with an obligate intracellular existence in a host cell. ‘Non-intracellular’ are all the remaining bacteria; these may live outside a host cell but inside a host tissue (‘extracellular’), or in the environment without a close association with another organism (‘free living’).
‘Pathogenic’ bacteria may act as a pathogen and can be ‘host associated’ or ‘opportunistic’. Host associated pathogenic bacteria reside within or on a host for any part of their lifecycle and typically cause disease, while opportunistic pathogens do not necessarily reside in or on a host. If they do, they do not typically cause disease. ‘Aerobic’ bacteria have an obligate aerobic existence, while ‘anaerobic’ bacteria are obligately anaerobic.

2.2. Hierarchical cluster analysis of gene distribution across genomes
The relative distribution of recombination repair genes was analyzed using the enhanced heatmap function (heatmap.2) of the R statistical project (R development core team, 2011). The complete linkage methodology was used to cluster the columns and rows, using Euclidean distances. Data were binary, with genes present (1) or absent (0).

2.3. Co-distribution of genes
The co-distribution of the 21 recombination genes within the 900 genomes was examined using the Φ coefficient, as described previously (Garcia-Gonzalez et al., 2012). The Φ coefficient was calculated for each possible pair of genes, and the results were plotted as a heatmap with color indicating whether the genes were co-distributed (red) or anti-distributed (blue).
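The co-distribution measure can be illustrated with a minimal sketch. It assumes the standard definition of the Φ coefficient for two binary vectors (Pearson correlation on a 2×2 contingency table); the function name and the toy presence/absence vectors are hypothetical and are not taken from the study's data or scripts.

```python
import math

def phi_coefficient(gene_a, gene_b):
    """Phi coefficient for two equal-length presence (1) / absence (0) vectors."""
    n11 = sum(1 for a, b in zip(gene_a, gene_b) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(gene_a, gene_b) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(gene_a, gene_b) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(gene_a, gene_b) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Undefined when a gene is present in all genomes or in none.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical toy data: one entry per genome.
gene_x = [1, 1, 0, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 0]   # always found together with gene_x
gene_z = [0, 0, 1, 1, 0, 1]   # never found together with gene_x

print(phi_coefficient(gene_x, gene_y))  # perfectly co-distributed
print(phi_coefficient(gene_x, gene_z))  # perfectly anti-distributed
```

Values near +1 would correspond to co-distributed gene pairs and values near −1 to anti-distributed pairs in the heatmap described above.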
3. Results and discussion
3.1. Relationship between proteome size and genes involved in recombinational repair
For 17 of the 21 genes examined, the average proteome size is larger if the gene is present than if the gene is absent (Table 1). The exceptions are recF, recJ and addB, for which the average sizes are not significantly different, and recU, for which the average proteome size is larger if the gene is absent (average proteome size if gene absent = 1,129,673 amino acids, average proteome size if gene present = 961,449 amino acids). When the analysis is repeated, excluding intracellular bacteria from the dataset, which generally have reduced proteome sizes (the average size in this analysis is 393,200 amino acids, compared to 1,502,700 amino acids for non-intracellular bacteria; Table 2a), a similar relationship is observed. Again, for most of the genes examined (15), the average proteome size is larger if the gene is present than if the gene is absent. The average proteome sizes for recF, recJ, addA and recA are not significantly different if the genes are present or absent, while the genomes from which recU and addB are absent have a larger average proteome size. The data indicate that recU and addB are under different evolutionary pressures from the majority of recombination repair genes, in that the genomes from which they are absent have larger proteomes on average. The data indicate that mostly there is a positive relationship between proteome size and the presence of recombination repair genes, including when only non-intracellular bacteria are considered, consistent with the operation of the proposed proteomic constraint.

The analysis does not incorporate the factor of phylogenetic non-independence. This refers to the observation that each datapoint is not necessarily independent, given that the common ancestor of related lineages may acquire or lose DNA repair genes. This is difficult to incorporate into the analysis as it requires the construction of a phylogenetic tree for the 900 bacterial species, which is problematic computationally if more accurate likelihood or Bayesian methods are used. Addition of more genomes to the analysis would exacerbate this problem. Even using these approaches, a substantial proportion of nodes in the tree would likely not be statistically supported. Until an accurate phylogeny of large numbers of bacterial species can be constructed, it will be hard to incorporate this factor.

The argument might be made that the smaller the proteome, the more likely any gene is to be absent by chance. However, this argument ignores the role of selection pressure in maintaining the genes; if a gene has an essential role then selection will maintain it
Table 1 Relationship of the presence/absence of recombination genes with proteome size and GC content. 900 bacterial genomes were examined (total dataset), of which 782 were non-intracellular bacteria. Proteome sizes (amino acids) were calculated as described in Methods, and genome GC contents were obtained from the NCBI. The number of genomes that lack the respective gene, out of 900 total genomes or 782 genomes of non-intracellular bacteria, is given in the third column. P values were generated using a two-tailed Mann–Whitney test. ‘n.s.’ denotes ‘not significant’.

| Gene | Dataset | Genomes lacking gene | Mean proteome size if gene absent | Mean proteome size if gene present | p value | Mean genome GC content if gene absent | Mean genome GC content if gene present | p value |
|---|---|---|---|---|---|---|---|---|
| recA | Total dataset | 20 | 480,766 | 1,097,968 | <0.05 | 0.34 | 0.49 | <0.05 |
| recA | Non-intracellular | 6 | 1,199,127 | 1,179,685 | n.s. | 0.47 | 0.50 | n.s. |
| recB | Total dataset | 480 | 996,885 | 1,184,100 | <0.05 | 0.47 | 0.49 | <0.05 |
| recB | Non-intracellular | 411 | 1,081,325 | 1,288,965 | <0.05 | 0.49 | 0.51 | <0.05 |
| recC | Total dataset | 594 | 1,024,972 | 1,199,326 | <0.05 | 0.47 | 0.50 | <0.05 |
| recC | Non-intracellular | 511 | 1,109,822 | 1,211,851 | <0.05 | 0.49 | 0.53 | <0.05 |
| recD | Total dataset | 421 | 952,877 | 1,199,719 | <0.05 | 0.47 | 0.50 | <0.05 |
| recD | Non-intracellular | 353 | 1,062,190 | 1,276,638 | <0.05 | 0.49 | 0.50 | <0.05 |
| recE | Total dataset | 845 | 1,061,036 | 1,440,940 | <0.05 | 0.48 | 0.52 | <0.05 |
| recE | Non-intracellular | 729 | 1,160,260 | 1,449,076 | <0.05 | 0.50 | 0.52 | n.s. |
| recF | Total dataset | 749 | 1,084,128 | 1,084,868 | n.s. | 0.49 | 0.46 | <0.05 |
| recF | Non-intracellular | 659 | 1,170,442 | 1,230,160 | n.s. | 0.51 | 0.48 | <0.05 |
| recG | Total dataset | 83 | 481,160 | 1,145,521 | <0.05 | 0.35 | 0.50 | <0.05 |
| recG | Non-intracellular | 38 | 716,232 | 1,203,513 | <0.05 | 0.40 | 0.51 | <0.05 |
| recJ | Total dataset | 114 | 1,164,689 | 1,072,586 | n.s. | 0.51 | 0.48 | <0.05 |
| recJ | Non-intracellular | 101 | 1,271,609 | 1,166,223 | n.s. | 0.53 | 0.50 | <0.05 |
| recN | Total dataset | 259 | 875,282 | 1,168,688 | <0.05 | 0.45 | 0.50 | <0.05 |
| recN | Non-intracellular | 185 | 1,097,198 | 1,205,442 | <0.05 | 0.49 | 0.51 | n.s. |
| recO | Total dataset | 72 | 618,949 | 1,124,713 | <0.05 | 0.40 | 0.49 | <0.05 |
| recO | Non-intracellular | 36 | 971,601 | 1,189,883 | n.s. | 0.52 | 0.50 | n.s. |
| recQ | Total dataset | 280 | 614,033 | 1,296,609 | <0.05 | 0.41 | 0.52 | <0.05 |
| recQ | Non-intracellular | 197 | 739,524 | 1,328,110 | <0.05 | 0.44 | 0.52 | <0.05 |
| recR | Total dataset | 133 | 829,690 | 1,128,394 | <0.05 | 0.43 | 0.49 | <0.05 |
| recR | Non-intracellular | 100 | 997,916 | 1,206,509 | <0.05 | 0.47 | 0.51 | <0.05 |
| recU | Total dataset | 657 | 1,129,673 | 961,449 | <0.05 | 0.50 | 0.44 | <0.05 |
| recU | Non-intracellular | 562 | 1,242,044 | 1,020,919 | <0.05 | 0.52 | 0.45 | <0.05 |
| recX | Total dataset | 451 | 946,270 | 1,222,849 | <0.05 | 0.46 | 0.51 | <0.05 |
| recX | Non-intracellular | 349 | 1,116,362 | 1,230,994 | <0.05 | 0.50 | 0.51 | n.s. |
| ruvA | Total dataset | 50 | 619,041 | 1,111,617 | <0.05 | 0.37 | 0.49 | <0.05 |
| ruvA | Non-intracellular | 28 | 905,742 | 1,190,013 | <0.05 | 0.43 | 0.50 | <0.05 |
| ruvB | Total dataset | 22 | 308,673 | 1,103,686 | <0.05 | 0.31 | 0.49 | <0.05 |
| ruvB | Non-intracellular | 6 | 700,294 | 1,183,542 | <0.05 | 0.41 | 0.50 | n.s. |
| ruvC | Total dataset | 205 | 750,582 | 1,182,673 | <0.05 | 0.36 | 0.52 | <0.05 |
| ruvC | Non-intracellular | 165 | 863,279 | 1,264,489 | <0.05 | 0.38 | 0.53 | <0.05 |
| addA | Total dataset | 463 | 1,023,541 | 1,148,576 | <0.05 | 0.48 | 0.48 | n.s. |
| addA | Non-intracellular | 389 | 1,140,389 | 1,218,878 | n.s. | 0.51 | 0.50 | n.s. |
| addB | Total dataset | 649 | 1,076,045 | 1,105,474 | n.s. | 0.49 | 0.47 | <0.05 |
| addB | Non-intracellular | 542 | 1,210,901 | 1,109,677 | <0.05 | 0.52 | 0.47 | <0.05 |
| priA | Total dataset | 63 | 700,776 | 1,113,116 | <0.05 | 0.42 | 0.49 | <0.05 |
| priA | Non-intracellular | 38 | 995,242 | 1,189,263 | <0.05 | 0.50 | 0.50 | n.s. |
| ligD | Total dataset | 698 | 930,262 | 1,616,358 | <0.05 | 0.45 | 0.61 | <0.05 |
| ligD | Non-intracellular | 592 | 1,031,289 | 1,642,670 | <0.05 | 0.46 | 0.62 | <0.05 |
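The two-tailed Mann–Whitney comparison named in the Table 1 caption can be sketched as follows. This is a simplified illustration, assuming the large-sample normal approximation without tie or continuity corrections; the function name and the proteome sizes are hypothetical, since the per-genome values are not given in the text.

```python
import math

def mann_whitney_two_tailed(x, y):
    """Mann-Whitney U with a two-tailed p value from the normal approximation.

    Simplification: no tie correction of the variance, no continuity correction.
    """
    combined = sorted((value, group) for group, sample in enumerate((x, y))
                      for value in sample)
    rank_sum_x = 0.0
    i, n = 0, len(combined)
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j)
                                         if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Hypothetical proteome sizes (amino acids) for genomes lacking / possessing a gene.
absent = [300_000, 450_000, 500_000, 620_000, 700_000]
present = [900_000, 1_100_000, 1_200_000, 1_500_000, 1_800_000, 2_000_000]

u, p = mann_whitney_two_tailed(absent, present)
print(u, p < 0.05)
```

With samples as small as in this toy example an exact test would normally be preferred; the approximation is more appropriate for group sizes like those in Table 1.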
Table 2 Recombination repair gene contents of different categories of bacteria. Different categories of bacterial genomes were examined for their average proteome sizes, average GC contents and gene contents. a) Intracellular bacteria were compared with non-intracellular bacteria; b) pathogenic (host associated and opportunistic) non-intracellular bacteria were compared with non-pathogenic non-intracellular bacteria; c) host associated non-intracellular pathogenic bacteria were compared with all other non-intracellular bacteria (including opportunistic pathogens); d) cyanobacteria were compared with non-cyanobacteria; e) aerobic bacteria were compared with anaerobic bacteria. For each gene, the category of bacteria that has a higher number of genomes that possess the gene is highlighted in gray in the original table. The different categories of bacteria are defined in Methods. The difference in means for proteome size and GC content was tested using a two-tailed Mann–Whitney test. The difference in percentages was tested using a two-proportion Z test. ‘n.s.’ denotes not significant.

a

| | Intracellular bacteria (118 genomes) | Non-intracellular bacteria (782 genomes) | p value |
|---|---|---|---|
| Mean proteome size (amino acids) | 393200 | 1502700 | p < 0.05 |
| Mean GC content | 0.37 | 0.50 | p < 0.05 |
| Percentage that lack recA | 11.9 | 0.8 | p < 0.05 |
| Percentage that lack recB | 58.5 | 52.3 | n.s. |
| Percentage that lack recC | 70.3 | 65.3 | n.s. |
| Percentage that lack recD | 57.6 | 45.1 | p < 0.05 |
| Percentage that lack recE | 98.3 | 93.2 | p < 0.05 |
| Percentage that lack recF | 76.2 | 84.3 | p < 0.05 |
| Percentage that lack recG | 38.1 | 4.9 | p < 0.05 |
| Percentage that lack recJ | 11.0 | 12.9 | n.s. |
| Percentage that lack recN | 62.7 | 23.7 | p < 0.05 |
| Percentage that lack recO | 30.5 | 4.6 | p < 0.05 |
| Percentage that lack recQ | 70.3 | 25.2 | p < 0.05 |
| Percentage that lack recR | 27.9 | 12.8 | p < 0.05 |
| Percentage that lack recU | 80.5 | 71.9 | p < 0.05 |
| Percentage that lack recX | 86.4 | 44.6 | p < 0.05 |
| Percentage that lack ruvA | 18.6 | 3.6 | p < 0.05 |
| Percentage that lack ruvB | 13.6 | 0.8 | p < 0.05 |
| Percentage that lack ruvC | 33.9 | 21.1 | p < 0.05 |
| Percentage that lack addA | 62.3 | 49.7 | p < 0.05 |
| Percentage that lack addB | 90.7 | 69.3 | p < 0.05 |
| Percentage that lack priA | 21.2 | 4.9 | p < 0.05 |
| Percentage that lack ligD | 89.8 | 75.7 | p < 0.05 |

b

| | Non-intracellular pathogens (334 genomes) | Non-intracellular nonpathogens (448 genomes) | p value |
|---|---|---|---|
| Mean proteome size (amino acids) | 1105439 | 1235300 | p < 0.05 |
| Mean GC content | 0.47 | 0.52 | p < 0.05 |
| Percentage that lack recA | 0.5 | 0.9 | n.s. |
| Percentage that lack recB | 47.0 | 56.6 | p < 0.05 |
| Percentage that lack recC | 55.4 | 72.8 | p < 0.05 |
| Percentage that lack recD | 46.7 | 44 | n.s. |
| Percentage that lack recE | 89.8 | 95.8 | p < 0.05 |
| Percentage that lack recF | 80.5 | 87.1 | p < 0.05 |
| Percentage that lack recG | 6.2 | 3.8 | n.s. |
| Percentage that lack recJ | 9.3 | 15.6 | p < 0.05 |
| Percentage that lack recN | 23 | 24.1 | n.s. |
| Percentage that lack recO | 1.7 | 6.7 | p < 0.05 |
| Percentage that lack recQ | 30.8 | 21 | p < 0.05 |
| Percentage that lack recR | 10.2 | 14.7 | n.s. |
| Percentage that lack recU | 65.6 | 76.6 | p < 0.05 |
| Percentage that lack recX | 27.5 | 57.4 | p < 0.05 |
| Percentage that lack ruvA | 3.3 | 3.8 | n.s. |
| Percentage that lack ruvB | 0 | 1.1 | p < 0.05 |
| Percentage that lack ruvC | 28.7 | 15.4 | p < 0.05 |
| Percentage that lack addA | 49.1 | 50.2 | n.s. |
| Percentage that lack addB | 66.8 | 71.2 | n.s. |
| Percentage that lack priA | 1.8 | 7.1 | p < 0.05 |
| Percentage that lack ligD | 80.2 | 72.3 | p < 0.05 |

c

| | Host associated non-intracellular pathogens (96 genomes) | All other non-intracellular bacteria (686 genomes) | p value |
|---|---|---|---|
| Mean proteome size (amino acids) | 1117947 | 1188496 | n.s. |
| Mean GC content | 0.50 | 0.50 | n.s. |
| Percentage that lack recA | 0 | 0.8 | p < 0.05 |
| Percentage that lack recB | 33.0 | 55.0 | p < 0.05 |
| Percentage that lack recC | 46.9 | 67.9 | p < 0.05 |
| Percentage that lack recD | 46.9 | 44.9 | n.s. |
| Percentage that lack recE | 86.5 | 94.2 | p < 0.05 |
| Percentage that lack recF | 78.1 | 85.1 | n.s. |
| Percentage that lack recG | 7.3 | 4.5 | n.s. |
| Percentage that lack recJ | 8.3 | 13.6 | n.s. |
| Percentage that lack recN | 26.0 | 23.3 | n.s. |
| Percentage that lack recO | 0 | 5.2 | p < 0.05 |
| Percentage that lack recQ | 28.1 | 24.8 | n.s. |
| Percentage that lack recR | 9.4 | 13.3 | n.s. |
| Percentage that lack recU | 88.5 | 69.5 | p < 0.05 |
| Percentage that lack recX | 37.5 | 45.6 | n.s. |
| Percentage that lack ruvA | 3.1 | 3.6 | n.s. |
| Percentage that lack ruvB | 1.0 | 0.7 | n.s. |
| Percentage that lack ruvC | 5.2 | 23.3 | p < 0.05 |
| Percentage that lack addA | 61.5 | 48.1 | p < 0.05 |
| Percentage that lack addB | 79.2 | 67.9 | p < 0.05 |
| Percentage that lack priA | 0 | 5.5 | p < 0.05 |
| Percentage that lack ligD | 88.5 | 73.9 | p < 0.05 |

d

| | Cyanobacteria (36 genomes) | Non-cyanobacteria (864 genomes) | p value |
|---|---|---|---|
| Mean proteome size (aa) | 1010200 | 1087300 | n.s. |
| Mean GC content | 0.4711 | 0.4845 | n.s. |
| Percentage that lack recA | 0 | 2.3 | p < 0.05 |
| Percentage that lack recB | 44.4 | 53.7 | n.s. |
| Percentage that lack recC | 50.0 | 66.7 | p < 0.05 |
| Percentage that lack recD | 33.3 | 47.3 | n.s. |
| Percentage that lack recE | 100 | 93.6 | p < 0.05 |
| Percentage that lack recF | 88.9 | 83.0 | n.s. |
| Percentage that lack recG | 0 | 9.6 | p < 0.05 |
| Percentage that lack recJ | 5.6 | 13.0 | n.s. |
| Percentage that lack recN | 22.2 | 29.0 | n.s. |
| Percentage that lack recO | 2.8 | 8.2 | n.s. |
| Percentage that lack recQ | 27.8 | 31.2 | n.s. |
| Percentage that lack recR | 2.8 | 15.3 | p < 0.05 |
| Percentage that lack recU | 100 | 71.9 | p < 0.05 |
| Percentage that lack recX | 91.7 | 48.4 | p < 0.05 |
| Percentage that lack ruvA | 0 | 5.8 | p < 0.05 |
| Percentage that lack ruvB | 0 | 2.5 | p < 0.05 |
| Percentage that lack ruvC | 2.8 | 23.6 | p < 0.05 |
| Percentage that lack addA | 36.1 | 52.1 | n.s. |
| Percentage that lack addB | 97.2 | 71.1 | p < 0.05 |
| Percentage that lack priA | 8.3 | 6.9 | n.s. |
| Percentage that lack ligD | 97.2 | 76.7 | p < 0.05 |

e

| | Aerobic bacteria (358 genomes) | Anaerobic bacteria (155 genomes) | p value |
|---|---|---|---|
| Mean proteome size (aa) | 1190052 | 1043206 | n.s. |
| Mean GC content | 0.52 | 0.47 | n.s. |
| Percentage that lack recA | 1.1 | 5.2 | p < 0.05 |
| Percentage that lack recB | 52.5 | 63.2 | p < 0.05 |
| Percentage that lack recC | 63.7 | 78.7 | p < 0.05 |
| Percentage that lack recD | 45.3 | 47.8 | n.s. |
| Percentage that lack recE | 97.5 | 98.7 | n.s. |
| Percentage that lack recF | 79.9 | 93.6 | p < 0.05 |
| Percentage that lack recG | 7.3 | 10.3 | n.s. |
| Percentage that lack recJ | 14.8 | 15.5 | n.s. |
| Percentage that lack recN | 26.8 | 30.3 | n.s. |
| Percentage that lack recO | 5.9 | 11.0 | n.s. |
| Percentage that lack recQ | 36.0 | 31.0 | n.s. |
| Percentage that lack recR | 10.3 | 27.7 | p < 0.05 |
| Percentage that lack recU | 82.7 | 85.8 | n.s. |
| Percentage that lack recX | 52.0 | 53.6 | n.s. |
| Percentage that lack ruvA | 4.5 | 6.11 | n.s. |
| Percentage that lack ruvB | 1.4 | 3.0 | n.s. |
| Percentage that lack ruvC | 7.0 | 33.2 | p < 0.05 |
| Percentage that lack addA | 50.0 | 51.6 | n.s. |
| Percentage that lack addB | 78.5 | 72.9 | n.s. |
| Percentage that lack priA | 3.1 | 8.4 | p < 0.05 |
| Percentage that lack ligD | 63.7 | 83.2 | p < 0.05 |
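The two-proportion Z test used for the percentages in Table 2 can be sketched as below. The sketch assumes the common pooled-variance form of the test, which the text does not specify, and the function name is hypothetical. The counts are taken from Table 1: recA is absent from 20 of 900 genomes in total and from 6 of 782 non-intracellular genomes, leaving 14 of 118 intracellular genomes; recB is absent from 480 in total and 411 non-intracellular genomes, leaving 69 intracellular genomes.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-tailed two-proportion Z test with a pooled standard error.

    k1/n1 and k2/n2 are the counts and group sizes, e.g. genomes lacking a
    gene out of all genomes in each lifestyle category.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p value
    return z, p_value

# recA: intracellular (14 of 118) versus non-intracellular (6 of 782).
z_recA, p_recA = two_proportion_z_test(14, 118, 6, 782)
# recB: intracellular (69 of 118) versus non-intracellular (411 of 782).
z_recB, p_recB = two_proportion_z_test(69, 118, 411, 782)

print(p_recA < 0.05, p_recB < 0.05)
```

Under these assumptions the recA difference is significant at the 0.05 level while the recB difference is not, in line with the entries in Table 2a.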
Fig. 1. Relationship between the total number of recombination genes in a genome and proteome size. The presence/absence of a gene in a genome, and the proteome size of that genome, were calculated as described in Methods. Intracellular bacteria are highlighted in red.
in the genome, and when the selection pressure is reduced the gene is more likely to be lost, an evolutionary version of ‘if you don't use it you lose it’. This evolutionary perspective implies that when genes are absent there is insufficient selection pressure to evolve or maintain that particular gene function, and that the absence is not simply a stochastic effect. Thus an evolutionary explanation is required when the presence of a gene scales with the size of the proteome.

The positive relationship between the total number of recombination repair genes in a genome and proteome size can be observed in Fig. 1. Intracellular bacteria form a distinct group (in red) with reduced proteome sizes and lower numbers of recombination genes, discussed further below. The negative power law relationship between proteome size and gene presence is consistent with the action of the proteomic constraint and Eq. (1) (below). This equation results from the proposal that the selection pressure to maintain or evolve DNA repair is proportional to the size of the proteome, so that the mutation rate is inversely proportional to it, leading to a negative power law relationship with exponent −1.

Fig. 2 shows that there is an inverse relationship between the level of gene conservation (indicated by the number of genomes where the gene is absent; a low number indicates a highly conserved gene) and the average proteome size of genomes where the gene is absent. Hence, more widely distributed (conserved) genes are absent from bacteria with smaller proteomes. The analysis was conducted for both the total dataset (Fig. 2a) and non-intracellular bacteria only (Fig. 2b), and the results are consistent with the action of a proteomic constraint. The correlation may be understood by considering that a highly conserved gene requires an extreme reduction in the selective pressure maintaining it (which is proportional to proteome size) before it can be lost.
Hence, highly conserved genes will only be lost from very small proteomes where the selective pressure (proportional to proteome size) is substantially reduced.

The data constitute a preliminary test of the proteomic constraint hypothesis, and while the data are consistent with prediction, further work with a greater range of DNA repair genes in a greater diversity of organisms is required to confirm that its operation is general. The proteomic constraint is not proposed to be the only factor affecting the distribution of DNA repair genes, but one of several potential factors. These might include the amount of mutational robustness and genetic redundancy in a genome, the rate of recombination and the population size, in addition to lifestyle and habitat variables discussed below. These factors are included in the ‘constraint factor’, C, in the following equation, which was derived to explain the relationship between mutation rate and proteome size in DNA viruses, microbes and multicellular organisms:

\[ \mu = C P^{-1} \tag{1} \]

(Massey, 2008, 2013), where μ is the mutation rate per site per cell division and P is the proteome size (total number of codons).
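As a brief numerical illustration of Eq. (1), the sketch below evaluates μ = C/P at two proteome sizes and confirms that the relationship has slope −1 on a log-log scale. The value of C is purely hypothetical and chosen only for illustration; the two proteome sizes are the group means reported in Table 2a.

```python
import math

def mutation_rate(constraint_factor, proteome_size):
    """Eq. (1): mu = C * P**-1, with P the proteome size in codons."""
    return constraint_factor / proteome_size

C = 0.003  # hypothetical constraint factor, not a value from the study
proteome_sizes = [393_200, 1_502_700]  # mean sizes from Table 2a
rates = [mutation_rate(C, p) for p in proteome_sizes]

# Slope of log10(mu) against log10(P); Eq. (1) predicts -1 for any C.
slope = (math.log10(rates[1]) - math.log10(rates[0])) / (
    math.log10(proteome_sizes[1]) - math.log10(proteome_sizes[0])
)
print(rates, slope)
```

Because C cancels out of the slope, any constant constraint factor gives the same exponent; variation in C between organisms would instead appear as scatter around the line.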
3.2. Relationship of recombination repair genes to lifestyle and habitat
The distribution of the recombination repair genes was examined with regard to lifestyle and habitat, in order to identify additional factors that may affect gene distribution. 17 genes were found to be
Fig. 2. Plot of gene conservation versus average proteome size when gene is absent from a genome a) total dataset; b) non-intracellular bacteria. Gene conservation is represented by the number of genomes where the gene is absent; a small number means that the gene is highly conserved. If the average proteome size was larger when the gene was absent from the genome, the gene was excluded from the analysis.
more common in non-intracellular bacteria, 1 was more common in intracellular bacteria and 3 were not significantly different (Table 2a). Thus, there appears to be a pervasive loss of recombination repair genes in intracellular bacteria, consistent with earlier observations (Dale et al., 2003). Intracellular bacteria have smaller proteomes on average, mostly resulting from the loss of genes that become redundant when the bacteria adopt an intracellular lifestyle. For example, amino acid biosynthesis genes are often lost in intracellular bacteria, as the host cell is able to provide a supply of amino acids (Yu et al., 2009). However, it is not clear why genes involved in recombination repair should become redundant when bacteria adopt an intracellular habitat. There are three potential evolutionary explanations for the absence of recombination repair genes in intracellular bacteria. Firstly, Muller's ratchet may be operating in intracellular bacteria, as they are effectively asexual (Lynch, 1996; Moran, 1996). This is expected to lead to a genome wide reduction in selection pressure and the concomitant loss of those genes that as a result are only weakly maintained. Secondly, increased drift as a result of a decrease in effective population size has been proposed to lead to a reduction in genome wide selection pressure (Wernegreen and Moran, 1999). This factor might be expected to affect the strength of selection on DNA repair, and has been proposed to affect selection on DNA repair in eukaryotes as well (Sung et al., 2012a, 2012b). Thirdly, when proteomes are reduced in size, there may be less selective pressure to maintain genes involved in DNA repair as the information content is reduced thus reducing the mutational load i.e. a reduced proteomic constraint. 
The first two explanations apply particularly to the absence of DNA repair genes in intracellular bacteria, while the proteomic constraint can also explain the absence of recombination repair genes in non-intracellular bacteria with smaller proteomes (Table 1). These include Prochlorococcus strains (Dufresne et al., 2005; Partensky and Garczarek, 2010) and Candidatus Pelagibacter ubique (Viklund et al., 2012). The proteomic constraint theory can also explain why highly conserved genes are absent from the genomes of non-intracellular bacteria with smaller proteome sizes (Fig. 2b).
Recombination rates are often linked to pathogenicity in the literature, one mechanism mediated by recombination being that of antigenic variation (Vink et al., 2011). In addition, pathogens often encounter oxidative stress due to the host immune response, which causes DNA damage (O'Rourke et al., 2003), and it has been proposed that some pathogens may elevate their mutation rates in order to adapt to stressful conditions, such as competition with the host immune system, in a process known as adaptive mutagenesis (Cairns et al., 1988). There is an apparent difference in the distribution of recombination repair genes between pathogenic (both host-associated and opportunistic) and non-pathogenic non-intracellular bacteria (Table 2b). Ten genes were more common in the former and 3 genes were more common in the latter, while 8 genes were not significantly different. However, this is not likely to be the result of adaptive differences, given that a smaller difference was detected between host-associated non-intracellular pathogenic bacteria (which are adapted to survival in the host) and other non-intracellular bacteria, including opportunistic pathogens (Table 2c). Here, 7 genes were more common in the former, 4 genes were more common in the latter, while 10 were not significantly different. These observations indicate that if the pathogenic lifestyle exerts an evolutionary pressure on recombination, it is subtle.
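Comparisons such as those in Table 2b and 2c reduce, for each gene, to a 2×2 table of presence/absence counts in two groups of genomes. The test actually used is given in the Methods and is not restated here, so the following is only a sketch that assumes a two-sided Fisher's exact test; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: group-1 genomes with the gene,  b: group-1 genomes without it
    c: group-2 genomes with the gene,  d: group-2 genomes without it
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x gene-carrying genomes in group 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every possible table that is no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, not taken from Table 2: 40 of 50 genomes in one group
# carry the gene versus 20 of 50 in the other.
p = fisher_exact_two_sided(40, 10, 20, 30)
print(f"p = {p:.3g}")
```

A gene would then be scored as "more common" in one group only when the p-value falls below a chosen threshold; with 21 genes tested per comparison, some correction for multiple testing would also be a reasonable addition.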
Photosynthesis implies an exposure to UV radiation, a cause of DNA damage, while photosynthesis itself results in the production of singlet oxygen, which can cause DNA damage (Agnez-Lima et al., 2012, a review). Likewise, aerobiosis results in the production of singlet oxygen (Agnez-Lima et al., 2012). Thus, there may be an enhanced selective pressure for DNA repair in photosynthetic and aerobic organisms. When the major group of photosynthetic bacteria, the cyanobacteria, were examined, no strong difference was observed with non-cyanobacteria, with 7 genes being more common in cyanobacteria, 5 genes being more common in non-cyanobacteria and 9 showing no significant difference (Table 2d). This indicates that exposure to UV radiation and the photosynthetic process do not require enhanced recombination repair in the cyanobacteria. This is consistent with the observation that UV-sensitive nucleotides are not biased in light-exposed bacterial genomes (Palmeira et al., 2006). When aerobic bacteria were compared to anaerobic bacteria, there was some evidence that recombination repair genes are more frequent in aerobes; 8 genes were more common in aerobes, none were more common in anaerobes and 13 did not show any significant difference (Table 2e). This may be consistent with elevated selection pressure to repair the effects of free radical damage associated with exposure to oxygen.
Genes involved in recombination repair may also be involved in other functions in the bacterial cell in addition to DNA repair. For example, recombination is activated in order to integrate foreign DNA taken up via transformation or from conjugation, involving the ‘repair’ of double-stranded breaks. However, a correlation was not observed between the amount of LGT in the genome and the distribution of individual recombination repair genes (Supplementary Table 1).
In summary, there is a marked absence of recombination repair genes in intracellular bacteria. This may be related to their reduced proteome size, and hence reduced proteomic constraint, or to reduced selection due to reduced population size or an asexual mode of reproduction. There is little evidence for the influence of other lifestyle differences on the distribution of recombination repair genes, with the possible exception of aerobiosis, which may promote the acquisition, evolution and retention of genes associated with recombination.
3.3. Relationship between genome GC content and absence of recombination genes
In principle, if a DNA repair pathway is biased in the types of mutations it corrects, then it may be able to alter genome GC content over time. Consistent with this, a potential link was demonstrated between two genes involved in base excision repair, mutM and mutY, and bacterial genome GC content, with genomes lacking the genes being more AT biased, in line with their function in correcting AT→GC mutations (Garcia-Gonzalez et al., 2012). Therefore, we investigated whether recombination repair genes might be linked to genome GC content. In general, genomes that lack recombination genes are more AT rich than those that possess the genes (Table 1), with the exception of recF, recJ, recU and addB, for which genomes lacking the gene were more GC rich on average, and addA (no significant difference). However, the differences were small (< 5% GC), with the exception of recG, ruvA, ruvB, ruvC and priA. When intracellular bacteria are excluded from the analysis, genomes that lack recJ, recF, recU and addB were again more GC rich on average, as with the total dataset. However, this time recA, recE, recN, recO, recX, ruvB, addA and priA did not show a significant difference in AT content between non-intracellular genomes that possess the genes and those that do not. This indicates that the relationship of these genes to reduced AT content in the total dataset is likely due to the inclusion of intracellular bacteria in the analysis, which are more AT biased (average GC content of intracellular bacteria = 37%, average GC content of non-intracellular bacteria = 50%; Table 2a). Thus, these genes are not likely to be responsible for differences in genome AT content. Intracellular bacterial genomes that lack recA and priA have been shown to have strong asymmetric mutational bias (Klasson and Andersson, 2006).
In the light of our results, this appears to be the result of the co-correlation of higher AT content with the smaller proteome sizes of intracellular bacteria for both recA and priA. Of the remaining genes, the differences in genome GC content are small, again with the exception of recG, ruvA, ruvB, ruvC and ligD. The absence of these five genes is linked with increased AT content both in the total dataset and in non-intracellular bacteria alone. There is some experimental evidence that both ruvABC and recG are involved in error-prone repair of double-stranded breaks (Harris et al., 1996; He et al., 2006), and possibly biased mutations could be introduced as a result. Likewise, ligD is involved in mutagenic double-stranded break repair (Gong et al., 2005; Stephanou et al., 2007). These genes therefore could be affecting underlying genome GC content by the introduction of biased mutations during the repair process. An alternative explanation is that the link with AT bias may be the result of correlation with proteome size, which is itself correlated with AT bias. Evidence in support of this co-correlation is the observation that those genes that do not show a positive relationship with proteome size for both datasets (recF, recJ, and recU) show the opposite relationship with GC content to those genes that have a positive relationship with proteome size. Lastly, if individual genes do not have substantial effects on genome GC content, it may be that AT bias in a genome arises from the combined effects of the cumulative absence of DNA repair genes, each one having a small effect on genome composition. This idea is consistent with the apparently universal underlying mutation bias towards AT.
3.4. Co-distribution of recombination genes
A hierarchical cluster analysis was conducted in order to visualize the distribution of recombination repair genes in the genomes examined (Fig. 3). Groups of genomes with similar complements of
recombination repair genes cluster together on the left-hand vertical axis, while genes with similar distributions across the genomes cluster on the top horizontal axis. Of particular interest is the bottom cluster of bacteria (indicated) that have a marked tendency to lack recombination genes. These include intracellular species of the genera Buchnera, Mycoplasma, Ehrlichia and Ureaplasma, which is consistent with the data in Fig. 1 and Table 2a. The Buchneras are gammaproteobacteria, the Ehrlichias alphaproteobacteria and the Mycoplasmas/Ureaplasmas are mollicutes; thus the patterns of gene absence in these three groups appear to be an example of convergent evolution, with similar evolutionary pressures functioning in each genus. Interestingly, a number of non-intracellular species were also found in the same cluster, including Myxococcus xanthus, Mesorhizobium loti, Opitutus terrae, Desulfohalobium retbaense, Aquifex aeolicus, Hyphomonas neptunium and Flavobacteriaceae bacterium 3519–10. M. xanthus, O. terrae and H. neptunium are characterized by a degree of cellular complexity and differentiation. D. retbaense is a halophile while A. aeolicus is a thermophile belonging to a deep-branching bacterial lineage. M. loti is a symbiont of lotus, and F. bacterium 3519–10 is a free-living chemo-organotroph.
recA, ruvA and ruvB are highly conserved throughout the majority of the genomes examined, indicating functions of fundamental importance to bacterial survival, i.e. housekeeping functions. Of interest is the small number of genomes that lack these genes. A previous study reported that a number of bacteria have lost these genes (Rocha et al., 2005). These bacteria either have a unique habitat and lifestyle, or the loss of the genes may be explained by a strong reduction in proteome size and a concomitant reduction in the proteomic constraint selection pressure responsible for maintaining them. The genes recE and recF are poorly conserved, indicating that their role is often superfluous to bacterial survival, and that their presence may be related to a particular lifestyle or habitat characteristics.
Fig. 3. Cluster analysis of recombination repair gene distributions across genomes. Rows represent genomes, and columns represent genes. Black denotes the absence of a gene in a genome, and gray indicates presence. The clustering of columns uses the complete linkage method. The right-hand side of the figure shows whether a genome belongs to an intracellular bacterium.
Pairs of genes that were co-distributed or anti-distributed were identified using the Φ coefficient and a heatmap approach (Fig. 4). The majority of gene pairs examined were not strongly correlated, but some pairs were co-distributed and others were anti-distributed, as follows.
i) recB and recC
The gene products of recB, recC and recD form the recBCD complex in E. coli (Amundsen et al., 1986). In our data, recB and recC are co-distributed, while recD is not (Fig. 4). This is consistent with the physical interaction of the protein products of recB with recC, essential for function, and the non-essential interaction of the protein products of recD with recBC in E. coli (Pavankumar et al., 2010). The results indicate that the recD gene is a non-essential member of the complex in other bacteria as well. We show that the recBCD genes are more widely distributed than recF, in contrast with Rocha et al. (2005), probably because of the greater size of the dataset.
ii) addA and addB
The products of the addA and addB genes form the addAB helicase-nuclease (Chedin et al., 1998), and the two genes are co-distributed, consistent with their physical interaction and function in double-stranded break repair.
iii) ruvA, ruvB and recA
The gene products of ruvA, ruvB and ruvC form the ruvABC complex in E. coli, which functions in the processing of Holliday junctions (West, 1996, a review). Our data show that ruvA and ruvB are co-distributed, while ruvC is not. ruvA and ruvB are also highly conserved in terms of their wide distribution, while ruvC is not (Table 1). Both these observations imply that ruvC is a non-essential part of the complex in some bacteria, consistent with experimental evidence that the ruvC protein product acts independently of the ruvA and ruvB protein products in E. coli (West and Connolly, 1992, a review). recA is co-distributed with ruvA and ruvB, consistent with the role of all three proteins in homologous recombination (Seigneur et al., 2000).
iv) Negative co-distributions: addA/addB with recA/recB/ruvC and recU with ruvC
Negative co-distribution of two genes may be explained by functional redundancy between the genes, or antagonism between the gene functions. Fig. 4 shows two examples where recombination repair genes have a negative co-distribution. The genes addA and addB were both negatively co-distributed with the genes recA, recB and ruvC. The known functional redundancy of the addAB complex with the recBCD complex (Chedin et al., 1998) may explain the negative co-distribution of addA and addB with recBC. recU is negatively co-distributed with ruvC. The two protein products both act to resolve Holliday junctions, and appear to be functionally interchangeable (McGregor et al., 2005); thus their negative co-distribution may also be explained by functional redundancy.
4. Conclusion
The data presented here are consistent with the idea that the size of the proteome exerts a selective pressure to evolve and maintain DNA repair genes. This does not exclude other factors from affecting the distribution of DNA repair genes. The relationship between the absence of recombination repair genes and smaller proteome size is observed for both intracellular and non-intracellular bacteria, indicating that the alternative explanations of Muller's ratchet and increased drift for the absence of DNA repair genes in intracellular bacteria are not generally applicable.
No clear link was observed between the presence/absence of recombination genes and pathogenicity or photosynthesis, while there is a potential link with aerobiosis. A relationship was observed between the absence of recombination repair genes and AT bias; however, it is unclear whether individual genes have a substantial effect. An alternative explanation is that a cumulative effect of loss of DNA repair genes results in the underlying AT mutation bias becoming exposed.
A basic expectation is that many gene families undergo expansions in size in larger genomes, as organisms have more complex phenotypic characters; however, there have been few quantitative studies. There are several questions to be answered, such as which gene families expand the quickest, which gene families do not correlate with proteome size and what evolutionary forces influence the rate of scaling with proteome size. This study shows the relationship of one gene family, recombination repair genes, with proteome size. In this particular case we propose that increasing information content appears to exert a greater selective pressure to evolve DNA repair, as other aspects of lifestyle and habitat do not seem to be as important.
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2013.05.082.
Conflict of interest statement
The authors declare that they have no competing interests.
Acknowledgments
Fig. 4. Heatmap showing co-distribution of recombination repair genes. A pairwise matrix was computed to examine co-distribution of gene pairs across genomes. The Φ coefficient was used to calculate the gene distribution correlations (see Methods); red indicates co-distribution, while blue indicates anti-distribution.
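For readers unfamiliar with the Φ coefficient named in the caption of Fig. 4, a minimal sketch follows. It uses the standard definition of Φ for a 2×2 presence/absence table; the exact procedure used in the study is the one described in the Methods, and the function name and example vectors here are hypothetical.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length 0/1 presence vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same genomes")
    n11 = sum(1 for a, b in zip(x, y) if a and b)          # both genes present
    n10 = sum(1 for a, b in zip(x, y) if a and not b)      # only first present
    n01 = sum(1 for a, b in zip(x, y) if not a and b)      # only second present
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)  # both genes absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A gene present in all genomes or in none: phi is undefined.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence (1) / absence (0) calls for six genomes,
# not data from the study.
gene_a = [1, 1, 1, 0, 0, 0]
gene_b = [1, 1, 1, 0, 0, 0]
gene_c = [0, 0, 0, 1, 1, 1]
print(phi_coefficient(gene_a, gene_b))  # 1.0, identical patterns
print(phi_coefficient(gene_a, gene_c))  # -1.0, mirror-image patterns
```

Values near +1 correspond to co-distribution (red in the heatmap) and values near −1 to anti-distribution (blue); computing Φ for every pair of genes gives the pairwise matrix that the heatmap displays.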
This work was funded by the Faculty of Natural Sciences, University of Puerto Rico–Rio Piedras. A significant proportion of gene cataloging was undertaken by graduate and undergraduate students in the
Biology Department's ‘Introduction to Bioinformatics’ course, taught by SEM in Spring and Fall 2010, and the UPR Bioinformatics Lab. They were Bianca Rivera, Elmo Rodgar, Ivan Adames, Catalina Davila, Lace Calderon, Miriam Torres, Ida Pantoja, Annet Rosado, Jessenia Laguna, Yara Sanchez, Elmer Mendez, Lourdes Gonzalez, Yadira Nogueras, Jeannely Arias, Tasha Santiago, Keven LeBoy, Arisnel Soto, Judith Perez Valle, Gwendolyn Arguello, Yacid Rodriguez, Edwin Portalatin, Andrea Gonzalez, Jean Ruiz (Spring 2010), Samir Bello, Maria Pagan, Yarelis Reyes, Jose Luis Ortiz, Mariela Rivera Sanchez (Fall 2010), Miguelina Carela, Johnathan Cordero, Amy Reyes, Franklin Colon and Javier Gonzalez (UPR Bioinformatics Lab).
References
Agnez-Lima, L.F., et al., 2012. DNA damage by singlet oxygen and cellular protective mechanisms. Mutat. Res./Rev. Mutat. Res. 751, 15–28.
Amundsen, S.K., Taylor, A.F., Chaudhury, A.M., Smith, G.R., 1986. recD: the gene for an essential third subunit of exonuclease V. Proc. Natl. Acad. Sci. U. S. A. 83, 5558–5562.
Ayora, S., et al., 2011. Double-strand break repair in bacteria: a view from Bacillus subtilis. FEMS Microbiol. Rev. 35, 1055–1081.
Cairns, J., Overbaugh, J., Miller, S., 1988. The origin of mutants. Nature 335, 142–145.
Chedin, F., Noirot, P., Blaudet, V., Ehrlich, S.D., 1998. A five-nucleotide sequence protects DNA from exonucleolytic degradation by AddAB, the RecBCD analogue of Bacillus subtilis. Mol. Microbiol. 29, 1369–1377.
Dale, C., Wang, B., Moran, N., Ochman, H., 2003. Loss of DNA recombinational repair enzymes in the initial stages of genome degeneration. Mol. Biol. Evol. 20, 1188–1194.
Drake, J.W., 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. U. S. A. 88, 7160–7164.
Drake, J.W., Charlesworth, B., Charlesworth, D., Crow, J.F., 1998. Rates of spontaneous mutation. Genetics 148, 1667–1686.
Dufresne, A., Garczarek, L., Partensky, F., 2005. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 6, R14.
Garcia-Gonzalez, A., Rivera-Rivera, R., Massey, S.E., 2012. The presence of the DNA repair genes mutM, mutY, mutL and mutS is related to proteome size in bacterial genomes. Front. Evol. Genomic Microbiol. 3, 3.
Gong, C., et al., 2005. Mechanism of nonhomologous end-joining in mycobacteria: a low-fidelity repair system driven by Ku, ligase D and ligase C. Nat. Struct. Mol. Biol. 12, 304–312.
Harris, R.S., Ross, K.J., Rosenberg, S.M., 1996. Opposing roles of the Holliday junction processing system of Escherichia coli in recombination-dependent adaptive mutation. Genetics 142, 681–691.
He, A.S., Rohatgi, P.R., Hersh, M.N., Rosenberg, S.M., 2006. Roles of E. coli double-strand-break-repair proteins in stress-induced mutation. DNA Repair 5, 258–273.
Hershberg, R., Petrov, D.A., 2010. Evidence that mutation is universally biased toward AT in bacteria. PLoS Genet. 6, e1001115.
Hildebrand, F., Meyer, A., Eyre-Walker, A., 2010. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 6, e1001107.
Jolivet-Gougeon, A., et al., 2011. Bacterial hypermutation: clinical implications. J. Med. Microbiol. 60, 563–573.
King, J.L., Jukes, T.H., 1969. Non-Darwinian evolution. Science 164, 788–798.
Klasson, L., Andersson, S.G.E., 2006. Strong asymmetric mutation bias in endosymbiont genomes coincide with loss of genes for replication restart pathways. Mol. Biol. Evol. 23, 1031–1039.
Lee, H., Popodi, E., Tang, H., Foster, P.L., 2012. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. U. S. A. 109, E2774–E2783.
Lind, P.A., Andersson, D.A., 2008. Whole-genome mutational biases in bacteria. Proc. Natl. Acad. Sci. U. S. A. 105, 17878–17883.
Lynch, M., 1996. Mutation accumulation in transfer RNAs: molecular evidence for Muller's ratchet in mitochondrial genomes. Mol. Biol. Evol. 13, 209–220.
Massey, S.E., 2008. The proteomic constraint and its role in molecular evolution. Mol. Biol. Evol. 25, 2557–2565.
Massey, S.E., 2013. Proteome size as the major factor determining mutation rates. Proc. Natl. Acad. Sci. U. S. A. 110, E858–E859.
Massey, S.E., Garey, J.R., 2007. A comparative genomics analysis of codon reassignments reveals a link with mitochondrial proteome size and a mechanism of genetic code change via suppressor tRNAs. J. Mol. Evol. 64, 399–410.
McCutcheon, J.P., Moran, N.A., 2011. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10, 13–26.
McGregor, N., et al., 2005. The structure of Bacillus subtilis RecU Holliday junction resolvase and its role in substrate selection and sequence-specific cleavage. Structure 13, 1341–1351.
Moran, N.A., 1996. Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. U. S. A. 93, 2873–2878.
O'Rourke, E.J., et al., 2003. Pathogen DNA as target for host-generated oxidative stress: role for repair of bacterial DNA damage in Helicobacter pylori colonization. Proc. Natl. Acad. Sci. U. S. A. 100, 2789–2794.
Palmeira, L., Gueguen, L., Lobry, J.R., 2006. UV-targeted dinucleotides are not depleted in light-exposed prokaryotic genomes. Mol. Biol. Evol. 23, 2214–2219.
Partensky, F., Garczarek, L., 2010. Prochlorococcus: advantages and limits of minimalism. Ann. Rev. Mar. Sci. 2, 305–331.
Pavankumar, T.L., Sinha, A.K., Ray, M.K., 2010. All three subunits of RecBCD enzyme are essential for DNA repair and low-temperature growth in the Antarctic Pseudomonas syringae Lz4W. PLoS One 5, e9412.
R Development Core Team, 2011. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0 (http://www.R-project.org/).
Raghavan, R., Kelkar, Y.D., Ochman, H., 2012. A selective force favoring increased G + C content in bacterial genes. Proc. Natl. Acad. Sci. U. S. A. 109, 14504–14507.
Rocha, E.P.C., Cornet, E., Michel, B., 2005. Comparative and evolutionary analysis of the bacterial homologous recombination systems. PLoS Genet. 1, e15.
Seigneur, M., Ehrlich, S.D., Michel, B., 2000. RuvABC-dependent double-strand breaks in dnaBts mutants require recA. Mol. Microbiol. 38, 565–574.
Shuman, S., Glickman, M.S., 2007. Bacterial DNA repair by non-homologous end joining. Nat. Rev. Microbiol. 5, 852–861.
Stephanou, N.C., et al., 2007. Mycobacterial nonhomologous end joining mediates mutagenic repair of chromosomal double-strand DNA breaks. J. Bacteriol. 189, 5237–5246.
Sueoka, N., 1961. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. Proc. Natl. Acad. Sci. U. S. A. 47, 1141–1149.
Sung, W., Ackerman, M.S., Miller, S.F., Doak, T.G., Lynch, M., 2012a. Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. U. S. A. 109, 18488–18492.
Sung, W., Tucker, A.E., Doak, T.G., Choi, E., Thomas, W.K., Lynch, M., 2012b. Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. U. S. A. 109, 19339–19344.
van Leuven, J.T., McCutcheon, J.P., 2012. An AT mutational bias in the tiny GC-rich endosymbiont genome of Hodgkinia. Genome Biol. Evol. 4, 24–27.
Viklund, J., Ettema, T.J.G., Andersson, S.G.E., 2012. Independent genome reduction and phylogenetic reclassification of the oceanic SAR11 clade. Mol. Biol. Evol. 29, 599–615.
Vink, C., Rudenko, G., Seifert, H.S., 2011. Microbial antigenic variation mediated by homologous DNA recombination. FEMS Microbiol. Rev. 36, 917–948.
Wernegreen, J.J., Moran, N.A., 1999. Evidence for genetic drift in endosymbionts (Buchnera): analyses of protein-coding genes. Mol. Biol. Evol. 16, 83–97.
West, S.C., 1996. The RuvABC proteins and Holliday junction processing in Escherichia coli. J. Bacteriol. 178, 1237–1241.
West, S.C., Connolly, B., 1992. Biological roles of the Escherichia coli RuvA, RuvB and RuvC proteins revealed. Mol. Microbiol. 6, 2755–2759.
Yu, X.-J., Walker, D.H., Liu, Y., Zhang, L., 2009. Amino acid biosynthesis deficiency in bacteria associated with human and animal hosts. Infect. Genet. Evol. 9, 514–517.