Update
TRENDS in Genetics Vol.20 No.6 June 2004
Evidence against the selfish operon theory
Csaba Pál1,2 and Laurence D. Hurst2
1 MTA, Theoretical Biology Research Group, Eötvös Loránd University, Pázmány Péter Sétány 1/C, Budapest, H-1117, Hungary
2 Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.

There is an increasing realization that gene order is not random and that linked genes tend to share expression characteristics. Nowhere is this more striking than in bacterial operons, in which functionally related genes often cluster. Although it is tempting to suppose that operon evolution came about by the selection for coexpression, this logic has been challenged [1,2]. Coregulation of genes in operons can provide selection for the maintenance of operon structure; however, it is not clear how it can explain the assembly of gene clusters by gradual steps because no benefit is expected to be derived from proximity until co-transcription is possible [1].

The selfish operon hypothesis posits an alternative set of selective conditions that can potentially address this concern. The hypothesis postulates that the linkage of two or more functionally related genes is favored because it increases the probability that genes will be co-transferred during horizontal gene transfer. The model posits [2] that operons generally consist of genes that can only together fulfill a given function.
If the function is under weak selection or required only under specific conditions, the genes in the operon can be lost easily by a combination of mutation pressure and genetic drift but can be regained by horizontal gene transfer at a later stage. Given that an upper limit on the length of the donor DNA segment exists [3], the closer together the genes are, the higher the probability that they can be regained in one step. The theory therefore predicts that ‘genes for essential processes should not cluster’ [2].

At first sight the theory is not about operons but about the clustering of genes. However, the majority of the data applied in support of the theory relates to operons [2] and, as is clear from the title of the hypothesis, the developers posit that the model is relevant to the evolution of operons. It predicts, they argue, that bacterial genomes should be ‘interspersed with novel, horizontally transferred operons providing peripheral metabolic functions’ [2]. (The authors define peripheral as meaning non-essential [2].)

Here we report the results of a whole-genome analysis to find out if this hypothesis has the power to explain both the patterns of clustering and the patterns of gene content of operons in Escherichia coli. First, we ask whether non-essential genes are relatively enriched within operons compared with essential genes. This is an important issue because the authors of this hypothesis have stated that ‘by contrast [to the selfish operon model], the co-regulation model predicts that essential genes whose co-regulation is most critical are those most likely to be found in operons’ [2]. Given that it has already been established that essential genes cluster in the E. coli genome [4], we next ask whether the extent of gene clustering, in a given functional category, is more pronounced for non-essential genes than for essential genes.

Corresponding author: Laurence D. Hurst ([email protected]).
www.sciencedirect.com

Testing the hypothesis
The classification of E. coli K12 genes as either essential or non-essential was taken from a recent systematic study [4]. The list of essential genes was augmented with a collection of gene deletion studies that was retrieved from the profiling of E. coli chromosome (PEC) database (http://www.shigen.nig.ac.jp/ecoli/pec/index.jsp). Information on the operons was retrieved from RegulonDB [5] (http://www.cifn.unam.mx/Computational_Genomics/regulondb/), a database that was compiled from the literature on the regulation of transcription in E. coli K12. Consistent with the co-regulation model, essential genes occur in operons more often than non-essential genes. Of the 3445 genes with appropriate data on gene dispensability, 602 genes were designated as essential by at least one study.
Approximately 28% of these are known to be in operons, whereas this figure is reduced to 23% among the non-essential genes (χ² = 6.73, df = 1, P = 0.009). To allow for possible bias in the operon dataset, we repeated the analysis including the operons that were predicted computationally. The trend remains the same (χ² = 6.57, df = 1, P = 0.01), suggesting that essential genes have a slightly higher tendency to reside in operons.

Although the results described here provide no obvious support for the selfish operon theory, one could argue that operon formation is only one possible, and not a necessary, outcome of the clustering of functionally related genes. There is strong evidence for the physical clustering of functionally related genes that are unrelated to operons [6,7].
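The operon-membership comparison reported above (28% of 602 essential genes versus 23% of the remaining non-essential genes) is a 2 × 2 contingency test with one degree of freedom. The following is a minimal sketch of such a test, not the authors' code. It assumes a Pearson chi-square without continuity correction, and the cell counts are reconstructed from the rounded percentages in the text, so the statistic it prints will differ somewhat from the reported 6.73. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its upper-tail probability for df = 1.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Gene totals are taken from the text; the in-operon counts are an
# assumption, rebuilt from the rounded percentages (28% and 23%).
n_essential = 602
n_nonessential = 3445 - 602
ess_in_operon = round(0.28 * n_essential)
non_in_operon = round(0.23 * n_nonessential)

chi2, p = chi_square_2x2(
    ess_in_operon, n_essential - ess_in_operon,
    non_in_operon, n_nonessential - non_in_operon,
)
print(f"chi2 = {chi2:.2f}, df = 1, P = {p:.4f}")
```

With the exact counts from RegulonDB in place of the reconstructed ones, the same function would be expected to reproduce the published statistic.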
However, we found that the clustering of functionally related genes is particularly pronounced for essential genes. Using the functional classification of E. coli genes that was derived from the clusters of orthologous groups of proteins (COG) database [8] (http://www.ncbi.nlm.nih.gov/COG/), we calculated the number of essential gene pairs located ‘y’ genes away from each other, and calculated the frequency of pairs that are in the same broad functional category. We repeated the same procedure for all non-essential gene pairs. We found that the relative increase in the number of essential gene pairs with related functions at a given physical distance is always higher than the relative increase in the number of non-essential, functionally related pairs (Figure 1). Using a different, more detailed functional classification [9] does not alter this finding (Mantel–Haenszel test: χ² = 41.2, df = 1, P ≪ 10⁻⁷).

[Figure 1 plot: relative increase in the number of functionally related pairs (y-axis, −2 to 16) against physical distance measured in number of genes, y (x-axis, 1 to 19); series: essential pairs and non-essential pairs.]

Figure 1. The relative increase in the number of essential and non-essential gene pairs in the same functional category y genes away from each other along the chromosome. The relative increase is defined as (O − E)/E, where O and E are the observed and expected numbers of functionally related gene pairs, respectively. The expected numbers were derived from the average of 1000 sets with randomized gene order. Overlapping neighboring gene pairs, genes with unknown dispensability or function and tandem duplicates were excluded from the analysis. The Mantel–Haenszel procedure [17] was employed to calculate an overall probability for departures from equal frequency of gene pairs within the same functional category among essential and non-essential gene pairs across contingency tables from different physical distances. The frequencies of gene pairs within the same functional categories were compared in 2 × 2 contingency tables. The Mantel–Haenszel test provides a summary chi-square test for the stratified data. Overall, essential gene pairs are more likely to encode proteins within the same functional category than non-essential gene pairs (Mantel–Haenszel test: χ² = 133.1, df = 1, P ≪ 10⁻⁷).

We have good reason to believe that this result is independent of operon structure. First, the size of the clusters appears to be larger than the usual size of operons. More importantly, the trend remains even when gene pairs of the same operon are excluded from the analysis (Mantel–Haenszel test: χ² = 81.03, df = 1, P ≪ 10⁻⁷). It has been noted previously that ribosomal proteins tend to be essential and cluster in bacterial genomes [2]. Importantly, the difference in the tendency for functional clustering is not a
peculiarity of ribosomal genes. After excluding all of the gene pairs that were involved in translation, the difference in the clustering tendency between essential and non-essential genes remains (Mantel–Haenszel test: χ² = 12.3, df = 1, P < 0.0005). Unfortunately, it is difficult to investigate the relative contribution of the different functional categories to the patterns observed because the observed number of essential gene pairs is too low to be statistically meaningful (data not shown).

The analyses described here cannot be considered definitive. One might question, for example, whether we can really define operons. Similarly, one might conjecture that some essential genes were not always essential but might have been transferred horizontally and only subsequently became essential. However, unless these problems are substantial, the analyses presented here strongly suggest that the selfish operon theory does not, for the most part, explain the evolution of operon structure and gene clustering on a larger scale in the E. coli genome. Our results are further supported by the finding that, when more than one horizontally transferred gene is found in a given operon, they are often the result of independent transfer events [10].

Alternative hypotheses
Nonetheless, there remains the issue of how the clustering of genes originates, how the clusters evolve gradually and why operons are most prevalent in bacteria. One established idea is that chromosome organization (and possibly chromatin formation) has an important role in the timing of gene expression and gene dosage [11]. Another possibility relates to the peculiarity of transcription and translation in prokaryotes. In this group, translation often occurs while the 3′ end of the mRNA is still being synthesized. This means that the protein product is manufactured in the vicinity of the gene.
We have shown previously that an imbalance in the concentrations of proteins involved in complexes has a major effect on fitness in yeast [12]. One might then imagine that the genes for such proteins might be under selection to be linked, so as to minimize the time during which the proteins are not bound together in the complex [13]. Such selection would provide a gradual advantage to linkage and explain why operons are much more prevalent in prokaryotes than in eukaryotes. This model makes two predictions: (i) genes whose protein products form complexes should be more tightly linked than expected under the null hypothesis (there is some evidence that this is the case [14]); and (ii) if such a process is to account for operon formation, then such genes should also be more prevalent in operons than expected by chance. Currently, no prokaryotic species has both a large quantity of protein-complex data and experimentally resolved operon structures available. However, in Helicobacter pylori there exists a large body of yeast two-hybrid protein-interaction data [15] in addition to operon structures that were computationally predicted [16]. We found that the genes that encode interacting proteins reside next to each other in the genome more often than expected by chance (P < 0.001). Furthermore, of the 22 pairs of such genes, 18 of them are contained in the
same putative operon. These data are certainly suggestive but the definitive test will require more reliable sources of protein-interaction data (the yeast two-hybrid method has a high false-positive rate) and experimentally confirmed operons.

Acknowledgements
We thank Andrea Navratil and Balázs Papp for discussions, and an anonymous referee for the helpful suggestions.
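The Helicobacter pylori result above (interacting gene pairs lying next to each other more often than expected by chance) is the kind of claim that can be checked against randomized gene orders, in the same spirit as the 1000 randomized gene orders used for the expected values in Figure 1. The sketch below is one possible way to do this and is not the authors' procedure, because the text does not state how their P value was obtained. The gene names, the pair list, the circular-chromosome assumption and both function names are hypothetical and serve only as illustration.

```python
import random


def count_adjacent_pairs(order, pairs):
    """Count pairs whose two genes are immediate neighbours.

    The gene order is treated as circular (assumption: one circular
    chromosome), so the first and last genes are also neighbours.
    """
    position = {gene: i for i, gene in enumerate(order)}
    n = len(order)
    hits = 0
    for a, b in pairs:
        d = abs(position[a] - position[b])
        if min(d, n - d) == 1:
            hits += 1
    return hits


def adjacency_permutation_test(order, pairs, n_perm=1000, seed=0):
    """Compare the observed adjacent-pair count with shuffled gene orders.

    Returns (observed O, expected E, relative increase (O - E)/E, one-sided P).
    """
    rng = random.Random(seed)
    observed = count_adjacent_pairs(order, pairs)
    shuffled = list(order)
    total = 0
    at_least_observed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        count = count_adjacent_pairs(shuffled, pairs)
        total += count
        if count >= observed:
            at_least_observed += 1
    expected = total / n_perm
    relative_increase = (
        (observed - expected) / expected if expected > 0 else float("inf")
    )
    # Add-one estimate so that the P value is never exactly zero.
    p_value = (at_least_observed + 1) / (n_perm + 1)
    return observed, expected, relative_increase, p_value


# Hypothetical toy genome of 50 genes with four "interacting" pairs,
# three of which are neighbours on the chromosome.
toy_order = [f"g{i}" for i in range(50)]
toy_pairs = [("g0", "g1"), ("g10", "g11"), ("g20", "g21"), ("g30", "g40")]
obs, exp, rel, p_value = adjacency_permutation_test(toy_order, toy_pairs)
print(obs, round(exp, 3), round(rel, 2), round(p_value, 4))
```

Fixing the random seed keeps the sketch reproducible. Applying it to real data would require the interaction pairs [15] and the gene order of the genome in question.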
References
1 Lawrence, J. (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev. 9, 642–648
2 Lawrence, J.G. and Roth, J.R. (1996) Selfish operons – horizontal transfer may drive the evolution of gene clusters. Genetics 143, 1843–1860
3 Syvanen, M. and Kado, C.I. (1998) Horizontal Gene Transfer, Kluwer Academic Publisher
4 Gerdes, S.Y. et al. (2003) Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185, 5673–5684
5 Salgado, H. et al. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res. 29, 72–74
6 Rogozin, I.B. et al. (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 30, 2212–2223
7 Lathe, W.C. et al. (2000) Gene context conservation of a higher order than operons. Trends Biochem. Sci. 25, 474–479
8 Tatusov, R.L. et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41
9 Serres, M.H. et al. (2004) GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins. Nucleic Acids Res. 32, D300–D302
10 Omelchenko, M.V. et al. (2003) Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol. 4, R55
11 Ussery, D. et al. (2001) Genome organisation and chromatin structure in Escherichia coli. Biochimie 83, 201–212
12 Papp, B. et al. (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197
13 Shapiro, L. and Losick, R. (1997) Protein localization and cell fate in bacteria. Science 276, 712–718
14 Dandekar, T. et al. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328
15 Rain, J.C. et al. (2001) The protein–protein interaction map of Helicobacter pylori. Nature 409, 211–215
16 Moreno-Hagelsieb, G. and Collado-Vides, J. (2002) A powerful nonhomology method for the prediction of operons in prokaryotes. Bioinformatics 18 (Suppl. 1), S329–S336
17 Sokal, R. and Rohlf, M. (1995) Biometry, 3rd edn, Freeman, San Francisco

0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2004.04.001