Conserved patterns in bacterial genomes: A conundrum physically tailored by evolutionary tinkering


G Model CBAC-6354; No. of Pages 9

ARTICLE IN PRESS Computational Biology and Chemistry xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/compbiolchem

Research Article

Conserved patterns in bacterial genomes: A conundrum physically tailored by evolutionary tinkering

Ivan Junier a,b,∗

a Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain
b Universitat Pompeu Fabra (UPF), Barcelona, Spain

Article info

Article history: Accepted 11 July 2014; available online xxx
Keywords: Organization of bacterial genomes; Evolutionary genomics; Co-evolution of amino acids; Gene clustering; Chromosome structuring; Integrated functioning

Abstract

The proper functioning of bacteria is encoded in their genome at multiple levels or scales, each of which is constrained by specific physical forces. At the smallest spatial scales, interatomic forces dictate the folding and function of proteins and nucleic acids. On longer length scales, stochastic forces emerging from the thermal jiggling of proteins and RNAs impose strong constraints on the organization of genes along chromosomes, particularly in the context of the assembly of nucleoprotein complexes and the operational mode of regulatory agents. At the cellular level, transcription, replication and cell division activities generate forces that act on both the internal structure and the cellular location of chromosomes. The overall result is a complex multi-scale organization of genomes that reflects the evolutionary tinkering of bacteria. The goal of this review is to highlight avenues for deciphering this complexity by focusing on patterns that are conserved among evolutionarily distant bacteria. To this end, I discuss three different organizational scales: protein structure, the chromosomal organization of genes and the global structure of chromosomes. © 2014 Elsevier Ltd. All rights reserved.

“What distinguishes a butterfly from a lion, a hen from a fly, or a worm from a whale is much less a difference in chemical constituents than in the organization and the distribution of these constituents” François Jacob, Evolution and tinkering, 1977

“there cannot be any general law of evolution that accounts for increasing complexity at all levels. [...] The rules of the game differ at each level” Ibid.

1. Introduction

The breadth of bacterial evolution can be appreciated simply by considering the diversity of genome sizes, which range across species from 500 to more than 10,000 genes (Lynch, 2007; Abby and Daubin, 2007; Koonin and Wolf, 2008; Rocha, 2008). It can also be observed within an individual bacterium through the diversity of the genetic functions it encodes. This diversity reflects both the random nature of mutations and genome modifications and the degeneracy

∗ Correspondence to: Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain. E-mail address: [email protected]

of possible evolutionary solutions. In contrast, conserved patterns reflect the irreversibility and convergence properties of evolution. These patterns can be seen both at the level of genomic content (conserved orthologous genes) and in the genomic organization of chromosomes, e.g. in groups of genes that have remained proximal along the chromosomes throughout billions of years of evolution (Lathe et al., 2000). The tension between pattern randomization and pattern conservation occurs at all physical scales of genome organization, from the internal structure of proteins to the cellular structure of chromosomes. In this regard, the conservation of a protein domain, a gene, a group of genes, or a global chromosomal structure may result from different evolutionary constraints on fitness, meaning that conserved patterns may reflect very distinct aspects of evolution (Jacob, 1977). Combined with the absence of a one-to-one relationship between the scale of a pattern and the scale of the structures it can affect (Fig. 2), this has made reductionism a particularly challenging issue in biology. In this review, I thus aim to highlight avenues for deciphering the complex organization of genomes in relation to the integrated functioning of bacteria (Fig. 1). To this end, I discuss physical mechanisms that occur at different scales of cellular organization and provide insight into the evolutionary pressures associated with these mechanisms. Specifically, I present physical aspects of three types of conserved patterns. In the first section, I discuss the physics

http://dx.doi.org/10.1016/j.compbiolchem.2014.08.017 1476-9271/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Junier, I., Conserved patterns in bacterial genomes: A conundrum physically tailored by evolutionary tinkering. Comput. Biol. Chem. (2014), http://dx.doi.org/10.1016/j.compbiolchem.2014.08.017


of protein structuring by focusing on the co-evolution of amino acids, both within and between genes. In the second section, I present several physical and fitness constraints associated with the tendency of genes to cluster in specific groups along chromosomes. In the last section, I describe the large-scale structuring of chromosomes in the context of the coordination of DNA replication, gene expression and cell division.

2. The structural basis of protein function: evolution of amino acid networks

Recent work shows that structural and functional properties of proteins can be inferred from the evolution of their sequences (Marks et al., 2012). At the most basic level, this evolution is characterized by substantial heterogeneity, both in the average mutation rates of different genes and in the mutation rates of amino acids within proteins. Understanding the origin of these heterogeneities requires consideration of the physical principles of protein folding and, perhaps less obviously, of both the constraints imposed by gene and protein interactions (Pazos and Valencia, 2008) and the mechanisms by which the physicochemical properties of environments are encoded in genomes (Denamur and Matic, 2006). In this context, I first review a few aspects of gene evolution, focusing in particular on the relationship between the evolution rate of a gene and its adaptive capacity. Following this line, I next present the relationships that have been highlighted between the structural organization of proteins and their adaptive capabilities. From a physical point of view, I discuss the possibility of deciphering the structural basis of protein functions by investigating the co-evolution of amino acids. I also discuss the role of local concentration effects in the co-evolution of interacting proteins.

Fig. 1. Schematic representation of the multi-scale organization of conserved patterns in bacterial genomes. At the largest scale, the chromosomes of E. coli and B. subtilis are organized into macrodomains inside which genes make frequent contacts in space (Niki et al., 2000; Valens et al., 2004). Remarkably, these phylogenetically distant bacteria display similar macrodomains around the origin of replication (Ori, in green) and around the terminus of replication (Ter, in blue); see Fig. 5 for the precise location of Ori and Ter in E. coli. Highly expressed genes are usually found close to the origin (Abby and Daubin, 2007; Rocha, 2008), whereas stationary-phase responsive genes tend, in Enterobacteria, to be located close to the terminus (Sobetzko et al., 2012). At a smaller scale, operons are fundamental transcriptional units containing several related genes that are transcribed into a single mRNA. Operons can also contain genes that have different functions. A striking example concerns the primase and the sigma factor σ70, which are found in the same operon across most bacteria. Remarkably, genes and operons have been shown to remain clustered together across various bacteria (Lathe et al., 2000). This is all the more remarkable given that bacterial genomes are highly dynamic, with a neutral rate of 10⁻² to 10⁻⁴ recombination events per genome per generation (Rocha, 2008). In this context, units of synteny, called syntons, can be defined for various degrees of conservation (Junier and Rivoire, 2013) and may harbor complex regulatory relationships (Fischbach and Voigt, 2010). At the smallest scales, genes are not the smallest functional units of genomes (besides non-coding RNAs). Indeed, single-domain proteins can be further partitioned into “sectors” on the basis of the co-evolution of their amino acids (indicated in blue, red, and green on the left protein). Sectors are networks of physically connected amino acids and correspond to independent functional features of proteins (Halabi et al., 2009). From an evolutionary perspective, they transcend the notion of protein domains, as shown for the two-domain Hsp70 molecular chaperone (Smock et al., 2010): on the right protein, the sector in green extends through the two domains of Hsp70. Finally, small fundamental groups of co-evolving amino acids, named “sectons” (Rivoire, 2013), have been defined on the basis of the direct coupling between amino acids (Weigt et al., 2009). These refine sectors by identifying conserved sets of “indivisible” residues. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.1. Heterogeneities of evolutionary rates: in search of causal relationships

The evolution rate of individual genes, i.e. the average rate of amino acid substitution, is negatively correlated with their expression level and, quite remarkably, only weakly correlated with their essentiality (Rocha and Danchin, 2004; Charlesworth and Eyre-Walker, 2006); see Pál et al. (2006) for yeast. Correlations between evolutionary rates are thus expected for genes that are co-expressed, such as genes within the same operon (Tenaillon et al., 2012), and for genes that belong to a common biological pathway, as has been shown explicitly for yeast species (Clark et al., 2012). In this regard, the strong correlations between gene expression, genome organization, and biological pathways (Bork et al., 1998) hinder our ability to determine precisely the dominant constraints driving this heterogeneity. A particularly informative example concerns amino acid substitutions that stem from physical collisions between the transcription and replication machineries (Mirkin and Mirkin, 2005). These collisions are more frequent for lagging-strand genes because of the head-on configuration of the two machineries (Rocha and Danchin, 2003). In this context, Paul et al. (2013) have recently shown in Bacillus subtilis that, for genes with an equal rate of neutral mutation (as measured by the rate of synonymous mutation), amino acid substitutions in core genes, i.e. in the ∼800 genes that are conserved in very divergent strains of B. subtilis (Paul et al., 2013), are more frequent on the lagging strand. In parallel, they have shown that the few core genes on the lagging strand are mostly stress-responsive. As proposed by the authors, this suggests that B. subtilis exploits the high mutation rate of the lagging strand to generate, at the population level, a broad spectrum of responses to environmental changes. In light of these results, it is tempting to hypothesize that a major determinant of mutation rate heterogeneity between genes is the extent to which genes participate in adaptation.
Although it may not capture the whole story, this hypothesis has the advantage of being experimentally testable, for instance by following the adaptive trajectories of individual genes within a population during a shift to new selective conditions (see e.g. the morbidostat experiments of Toprak et al. (2013), which quantitatively follow the evolutionary trajectories of genes within a bacterial population as it develops drug resistance). Note also that the routes of adaptation may be as diverse as the possible perturbations (Touchon et al., 2009). An interesting case in this respect concerns the mutational response to antibiotics that divert metabolic fluxes toward protective pathways (Lee et al., 2010). Indeed, it has been shown that the response in Escherichia coli strongly depends on the mode of action of the antibiotics (Toprak et al., 2012). Specifically, ribosome-targeting


Fig. 2. Schematic representation of the hierarchy of phenomena studied in physics (left) versus the complex relationships that exist between various length and time scales in biology (right). The grey areas indicate the domains of influence between entities as a function of their size. For example, in physics, radiation is studied at the atomic scale, whereas in biology a base-pair mutation caused by radiation may have strong effects from the gene level up to the population level.

antibiotics were shown to induce an indirect response with mutations affecting genes mainly involved in translation, transcription and transport. In contrast, metabolic-targeting antibiotics were shown to induce a quasi-deterministic response confined to the targeted enzyme.
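The distinction between lagging-strand (head-on) and leading-strand (co-directional) genes used in the studies above can be made explicit from a genome annotation. The Python sketch below assumes a circular chromosome replicated bidirectionally from a single origin to a single terminus, with each gene described by a start coordinate and a strand (+1 or −1). The function names and the coordinates in the example are hypothetical, and real analyses must also handle genes that straddle the origin or the terminus.

```python
def replichore(position: int, ori: int, ter: int, genome_length: int) -> str:
    """Return 'right' if the position lies on the arc travelled clockwise
    (towards increasing coordinates) from ori to ter, otherwise 'left'."""
    dist_from_ori = (position - ori) % genome_length
    dist_to_ter = (ter - ori) % genome_length
    return "right" if dist_from_ori < dist_to_ter else "left"


def strand_orientation(position: int, strand: int, ori: int, ter: int,
                       genome_length: int) -> str:
    """Classify a gene as 'co-directional' (leading strand) or 'head-on'
    (lagging strand) relative to the replication fork that copies it.

    strand is +1 if the gene is transcribed towards increasing coordinates,
    -1 otherwise.
    """
    # On the right replichore the fork moves towards increasing coordinates;
    # on the left replichore it moves towards decreasing coordinates.
    if replichore(position, ori, ter, genome_length) == "right":
        fork_direction = 1
    else:
        fork_direction = -1
    return "co-directional" if strand == fork_direction else "head-on"


# Hypothetical 4.6 Mbp chromosome with ori at 3.9 Mbp and ter at 1.6 Mbp.
GENOME, ORI, TER = 4_600_000, 3_900_000, 1_600_000
for pos, strand in [(4_000_000, +1), (3_000_000, +1), (100_000, -1)]:
    print(pos, strand, strand_orientation(pos, strand, ORI, TER, GENOME))
```

Once genes are split into these two classes, substitution rates can be compared between classes at matched synonymous rates. In outline, this is the kind of comparison reported for B. subtilis above.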

2.2. Co-evolution of amino acids: structural organization and adaptive capacities of proteins

In addition to heterogeneity in mutation rates across genes, the rate of amino acid substitution varies substantially among residue positions within individual genes (an active site is much less likely to mutate than a site belonging to a non-functional region). In this context, one must keep in mind that one or several other substitutions may compensate for, or suppress, a detrimental mutation. These compensations most often occur through direct physical contact with the mutated residue (Göbel et al., 1994). More generally, co-evolving residues have been shown to form physically connected networks within the protein tertiary structure, which are associated with allosteric communication and long-range functional coupling (Lockless and Ranganathan, 1999; Süel et al., 2003; Reynolds et al., 2011). For these reasons, amino acid co-evolution has been used as a proxy for functional interactions between residues, and to dissect the functional organization of proteins. This has been demonstrated in a series of seminal works in which Ranganathan and co-workers combined careful statistical analysis of co-evolution with molecular biology experiments (see Reynolds et al., 2013 and references therein). The analytical basis of these works is an appropriate spectral decomposition of the covariance matrix of amino acid substitutions. Using methods akin to principal component analysis (Reynolds et al., 2013; Rivoire, 2013), this matrix can be decomposed into statistically independent components, which are now termed “sectors” (Halabi et al., 2009). A sector thus corresponds to a large set of co-evolving residues that form connected structures in space and, remarkably, encode specific aspects of protein function (Halabi et al., 2009). For instance, three main sectors associated with distinct biochemical functions (catalytic power, substrate specificity, and thermodynamic stability) were identified for the trypsin family, a class of enzymes that catalyze the proteolytic cleavage of other proteins (Halabi et al., 2009). In other words, a bona fide functional decomposition of protein structures can be proposed on the basis of the conservation properties of amino acid evolution.
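The spectral decomposition described above can be illustrated on a toy alignment. The Python sketch below is a deliberately simplified stand-in and not the statistical coupling analysis of Ranganathan and co-workers, which additionally weights positions by conservation and applies independent component analysis to several eigenvectors. It works in four steps:

- binarize each alignment column (1 if a sequence carries the most frequent residue, 0 otherwise);
- compute the positional covariance matrix C_ij = f_ij − f_i f_j;
- extract the leading eigenvector by power iteration;
- report positions with large loadings as a candidate sector.

The alignment, the function names and the 0.3 loading threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def consensus_indicators(msa):
    """Binary matrix x[s][i] = 1.0 if sequence s carries the most frequent
    residue at position i, else 0.0."""
    length = len(msa[0])
    consensus = [Counter(seq[i] for seq in msa).most_common(1)[0][0]
                 for i in range(length)]
    return [[1.0 if seq[i] == consensus[i] else 0.0 for i in range(length)]
            for seq in msa]


def covariance(x):
    """Positional covariance C_ij = f_ij - f_i * f_j of the binary indicators."""
    n, length = len(x), len(x[0])
    f = [sum(row[i] for row in x) / n for i in range(length)]
    return [[sum(row[i] * row[j] for row in x) / n - f[i] * f[j]
             for j in range(length)] for i in range(length)]


def top_eigenvector(c, iterations=200):
    """Leading eigenvector of a symmetric positive semi-definite matrix,
    obtained by power iteration from an all-ones start vector."""
    v = [1.0] * len(c)
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    return v


def candidate_sector(msa, threshold=0.3):
    """Positions whose loading on the leading eigenvector exceeds threshold."""
    loadings = top_eigenvector(covariance(consensus_indicators(msa)))
    return [i for i, a in enumerate(loadings) if abs(a) > threshold]


# Hypothetical alignment: positions 0-2 co-vary (AAA vs CCC); 3 and 4 do not.
toy_msa = ["AAAKL", "AAAKV", "AAARL", "CCCKV", "CCCRL", "AAARV", "CCCKL"]
print(candidate_sector(toy_msa))  # → [0, 1, 2]
```

In this toy case, the three co-varying columns dominate the leading eigenvector (loadings ≈ 0.57 versus ≈ 0.13 in magnitude for the other two). Real sector analyses use much larger alignments and correct for sequence redundancy.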

The intra-gene heterogeneity of mutation rates is thus expected to reflect the heterogeneity of functional constraints within proteins (e.g. their partition into sectors). Along this line, McLaughlin et al. tested the idea that mutations at sector positions have a greater functional effect than those at non-sector positions by carrying out a comprehensive mutagenesis study of the PDZ domain, a structural domain found in signaling proteins of most living organisms (McLaughlin et al., 2012). They showed that sites within sectors were less tolerant to substitutions than non-sector positions. More strikingly, they showed that adaptation to a new functional challenge (or environmental pressure) was always initiated by a mutation inside the sector. In the same spirit, it was shown that mutating a subset of co-evolving amino acids is sufficient to completely switch the substrate specificity of a histidine kinase (Skerker et al., 2008). Altogether, these results show that adaptation at the gene level is shaped by the structural organization of proteins. From an engineering point of view, they suggest that information on sequence evolution can be used to design synthetic proteins de novo (Socolich et al., 2005; Reynolds et al., 2013).

2.3. Co-evolution of amino acids between proteins: the importance of enhanced local concentration

Intra-gene heterogeneity of mutation rates does not solely reflect the internal properties of a protein. It can also reflect the relationship of the protein with its functional partners (Pazos and Valencia, 2008), as seen in the co-evolutionary signal between interface residues of physically interacting proteins (Kass and Horovitz, 2002; Hatley et al., 2003; Skerker et al., 2008). To understand the evolution of these interfaces, and in particular the conservation of their specificity, it is useful to consider compensatory mutations in the context of protein co-localization and increased effective concentration, as proposed by Kuriyan and Eisenberg (2007). Protein co-localization can occur through cellular compartmentalization, the binding of proteins to DNA (Dröge and Müller-Hill, 2001), or the coordinated expression of operonic genes. An elevated local concentration of proteins can yield an appreciable fraction of bound complex even at low binding affinities, which can then facilitate the evolution of protein interfaces through stepwise mutations (Fig. 3). An extreme situation concerns the fusion of two proteins, a frequent event in many protein families (Kuriyan and Eisenberg, 2007). In this case, the resulting local concentration of the (hetero-)dimer is necessarily high. As a consequence, the dimer undergoes a strong


Fig. 3. Schematic representation of the effect of an elevated local concentration on the ability of interacting proteins to co-evolve (inspired by Kuriyan and Eisenberg, 2007). Left panel: the fusion of two genes strongly enhances the co-evolution of amino acids at the interface, because any mutation in one protein can strongly affect the fitness of the bacterium and can also be compensated by a mutation in the other protein. Right panel: an elevated concentration of proteins can result from the expression of an operon that encodes a macro-complex. In particular, the slow diffusion of protein complexes in the crowded cytoplasm (Klumpp et al., 2013; Parry et al., 2014) is expected to enhance the formation of larger macro-complexes if their components are initially co-localized in space (Morelli et al., 2011).

selective pressure at almost any concentration of the fused protein, even if the two proteins did not interact before the fusion. Fused genes are therefore well-poised to tinker with new functions or refine existing interactions through multiple compensatory mutations (Fig. 3). This local concentration effect may thus be viewed as an “evolutionary catalyst” for protein-protein interactions, and may even be critical for the proper function and adaptive capacity of cellular systems (see the discussion of gene clustering below). An interesting example concerns the classic bacterial two-component signaling system, characterized by sensor histidine kinase (HK) and response regulator (RR) domains that interact in trans to transduce environmental signals into gene activation (Laub and Goulian, 2007). Within a genome, multiple pairs of paralogous HK-RR genes occur; each cognate pair is consistently found within the same operon and shows a strong co-evolutionary pattern at the amino acid level (Skerker et al., 2008).

3. The organization of genes along the chromosome: optimizing the coordination of gene expression

Just as the residues of a protein behave collectively, genes are not randomly organized along bacterial chromosomes. Genomes are indeed characterized by a strong functional organization (Rocha, 2008; Koonin and Wolf, 2008); see Hurst et al. (2004) and Michalak (2008) for the case of eukaryotes, and Dios et al. (2014) in this volume for a discussion of the clustering of DNA elements in the human genome. In particular, genes tend to form localized clusters that are conserved across many bacteria and that often extend beyond operons (Lathe et al., 2000; Tamames, 2001; Audit and Ouzounis, 2003; Rocha, 2006; Fang et al., 2008; Junier et al., 2012; Junier and Rivoire, 2013). In this section, I first review and discuss the constraints that have been proposed to explain the existence of operons.
Going beyond the operon level, I then discuss the problem of gene clustering more generally.

3.1. The multiple facets of operons

Operons are the canonical example of gene clustering in bacteria. They are ubiquitous structures containing several genes transcribed as a single mRNA (Jacob et al., 1960) (Fig. 1). Operons are used primarily to facilitate the co-expression of genes (Pál and Hurst, 2004; Price et al., 2005; Rocha, 2008; Yin et al., 2010), and their block structure can facilitate their spreading among bacteria via horizontal transfer (Lawrence and Roth, 1996; Lawrence, 2003; Ballouz et al., 2010), particularly in the case of autonomous pathways (Fischbach et al., 2008). Why would a bacterium convert a cluster of transcriptionally independent genes into an operon (Price et al., 2006)? Several factors come into play. First, interacting proteins co-fold more efficiently when they are translated from the same transcript (Fribourg et al., 2001), which may explain why the components of large macro-complexes (e.g. ribosome, flagellum) are always clustered in one or several operons. Moreover, proteins often remain located close to their translation site (Montero Llopis et al., 2010; Nevo-Dinur et al., 2011), i.e. close to their mRNA. Operons can thus ensure a high local concentration of proteins. In that sense, operons may be considered as “evolutionary catalysts” for protein-protein interactions (see above) (Fig. 3). Proteins within the same operon do not always belong to the same physical complex. Operons have thus also been proposed as devices to reduce the information cost of separate gene regulation (Price et al., 2006), and to coordinate expression in a cell-cycle dependent fashion (Zaslaver et al., 2006; Kovács et al., 2009), especially in the case of low-abundance regulatory proteins (Bartl et al., 2013). The importance of temporal control of protein abundance has also been invoked to explain the correspondence, in certain operons, between the order of genes and the order of enzymatic activities in the metabolic pathway (Kovács et al., 2009); this correspondence could also reflect some historical contingency due to the gradual building of pathways (Fischbach et al., 2008). In all cases, operons provide an efficient mechanism to ensure the proper relative proportions of proteins (Lim et al., 2011).
Given the highly integrated nature of cellular systems, it is not surprising that genes encoding apparently different aspects of cell functioning are sometimes found within the same operon (Price et al., 2006; Rocha, 2008; de Lorenzo and Danchin, 2008) (Fig. 4). A striking example concerns the genes that encode the initiation of replication (DNA primase) and the initiation of transcription (σ70), which are found in the same operon across most bacteria (Bryant et al., 2010). Even more strikingly, in Gram-negative bacteria, these genes are found together with a gene related to the initiation of translation (Versalovic et al., 1993). This suggests that the transcription, translation and replication processes are linked through the relative amounts of their initiating proteins. In this regard, note that the expression level of constitutive genes depends directly on the abundance of RNA polymerases and ribosomes (Klumpp et al., 2009), and that variation in both quantities is the main source of gene expression variation between cells (Taniguchi et al., 2010). Maintaining a proper ratio between these macro-molecules is therefore crucial for proper cell functioning. The role of operons as “protein ratio keepers” is also reflected in their internal structure. For instance, two-component signal transduction operons have a highly conserved structure in which the stop codon of the first gene (sensor protein) overlaps the start codon of the second gene (response regulator) (Laub and Goulian, 2007). This arrangement is thought to be responsible for maintaining the ∼1:35 ratio between the two proteins, a ratio likely maintained to prevent undesired cross-talk between the numerous two-component systems (Siryaporn and Goulian, 2008). Accordingly, overlapping genes are primarily found in operons, and their patterns are strongly conserved between phylogenetically distant bacteria (Johnson and Chisholm, 2004). More strikingly, operons can also have internal regulatory elements (e.g. alternative transcription start sites; Cho et al., 2009; Güell et al., 2009) that likely override the intrinsic ratio set by basal expression. This allows genes within these operons to be differentially expressed, as observed recently in Mycoplasma pneumoniae (Güell et al., 2009).
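The local-concentration argument of Section 2.3 also underlies the view of operons as “evolutionary catalysts”. It can be made concrete with the textbook single-site binding isotherm. In that model, when the partner is present in excess at concentration c, the fraction of a protein bound to it is c/(c + K_d). The sketch below assumes this simple equilibrium model. The dissociation constant and the two concentrations are illustrative values, not measurements from the cited studies.

```python
def fraction_bound(partner_conc_uM: float, kd_uM: float) -> float:
    """Fraction of a protein bound to its partner at equilibrium, assuming a
    single binding site and a partner in excess at concentration partner_conc_uM."""
    return partner_conc_uM / (partner_conc_uM + kd_uM)


# Illustrative (hypothetical) weak interaction with Kd = 100 uM.
KD = 100.0
dilute = fraction_bound(1.0, KD)     # partner freely diffusing at ~1 uM
local = fraction_bound(1000.0, KD)   # high effective concentration, e.g. after fusion
print(f"dilute: {dilute:.3f}, local: {local:.3f}")  # → dilute: 0.010, local: 0.909
```

With these assumed values, a thousandfold increase in the effective partner concentration turns a negligible fraction of complex (about 1%) into a large majority (about 91%). In the latter regime, stepwise compensatory mutations at the interface can plausibly be seen by selection.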


Fig. 4. Example in E. coli of a cluster of 15 genes (partitioned into 4 operons) that remain co-localized in many different bacteria. About half of these genes encode the fundamental units of the flagellum machinery. The other half encode proteins involved in chemotaxis, a different aspect of bacterial motility. The rightmost operon (insAB) encodes a transposase, suggesting that this cluster can be transferred horizontally between bacteria (Zarei et al., 2013). Several scenarios can explain the need for these genes and operons to remain clustered along the DNA (see main text). In this regard, an interesting proposal concerns the possible existence of a “hyper-structure” (Norris et al., 1999) that would contain the whole set of proteins and would coordinate the motility process of bacteria (Cabin-Flaman et al., 2005).

3.2. Gene clustering: co-regulation and co-expression in a crowded environment

While the organization of operons can be quickly shuffled by several mechanisms (chromosomal rearrangement, gene loss, and gene transpositions, including horizontal transfer), the conservation of gene clusters extends beyond the scale of operons (Lathe et al., 2000; Tamames, 2001; Audit and Ouzounis, 2003; Rocha, 2006; Fang et al., 2008; Junier et al., 2012; Junier and Rivoire, 2013). In other words, operon formation is not necessary to maintain gene co-localization on the chromosome. For example, genes that are regulated by the same transcription factor (TF) often cluster together with the gene that encodes the TF (Képès, 2004; Kolesov et al., 2007). Gene clustering is thus believed to facilitate co-regulation, as suggested by the tendency of clustered genes to be co-transcribed even when they do not belong to the same operon (Jeong et al., 2004; Carpentier et al., 2005). This has important consequences for synthetic biology (Képès et al., 2012), particularly for the optimization of complex functions (Fischbach and Voigt, 2010; Temme et al., 2012). From a mechanistic point of view, clustering has been proposed to increase the speed at which TFs find their targets (Kolesov et al., 2007; Pulkkinen and Metzler, 2013). While recent experimental studies have challenged this effect in vivo (Block et al., 2012; Liang et al., 2013), it has been shown that TFs localize in space (Taniguchi et al., 2010). More strikingly, recent studies revealed not only that the distance between a TF and its regulated gene affects the efficiency of regulation, but also that the position of the regulated gene itself shapes the spatial distribution of the TF (Kuhlman and Cox, 2012).
These findings suggest a multi-factorial relationship between chromosomal location and the cellular implementation of regulatory networks, which might explain why the effect of the chromosomal positioning of genes on their ability to be regulated is not always obvious (Liang et al., 2013). Not all genes and operons are regulated by a TF: only half of the genes in E. coli are known to be regulated by a dedicated TF (Gama-Castro et al., 2011). Yet, genes without TF regulation tend to be just as clustered (Junier and Rivoire, 2013), indicating that co-localization has other advantages besides TF regulation. An interesting possibility concerns the presence of “chromatin states” that would control the activity of genes beyond operon structure. While such states have remained elusive in bacteria, clustering properties in yeast, whose chromosomes are also very dense in genes, have been shown to relate to chromatin remodeling (Batada et al., 2007; Batada and Hurst, 2007). Moreover, in E. coli, dense clusters of genes were shown to correlate with several structural properties of the chromosome (Zarei et al., 2013), in particular with extended protein occupancy domains enriched in nucleoid-associated proteins (Vora et al., 2009), i.e. in proteins that can act both as transcription factors and as chromosome modelers (Dillon and Dorman, 2010). Such a scenario would be consistent, at least in E. coli, with the control of gene expression by DNA supercoiling (Travers and Muskhelishvili, 2005; Blot et al., 2006) through independent supercoiled domains (Postow et al., 2004). Note also that dense clustering of evolutionarily correlated genes has been shown to be limited to about 20 genes in Enterobacteria (Junier et al., 2012), a size reminiscent of that of supercoiled domains.
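Whether the targets of a TF cluster around the TF's own gene can be quantified with distances measured along the circular chromosome. The sketch below defines a clustering score: the mean circular distance between the TF gene and its targets, divided by L/4. Here L/4 is the expected distance from a fixed point to a uniformly random position on a circle of length L, so values well below 1 indicate clustering. This normalization, the function names and the coordinates in the example are assumptions made for illustration. They are not the statistics used in the cited studies, which rely on more careful null models.

```python
def circular_distance(a: int, b: int, genome_length: int) -> int:
    """Shortest distance (in bp) between two positions on a circular chromosome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def clustering_score(tf_position: int, target_positions: list[int],
                     genome_length: int) -> float:
    """Mean TF-to-target circular distance divided by genome_length / 4,
    the expected distance for targets placed uniformly at random."""
    mean_distance = sum(circular_distance(tf_position, t, genome_length)
                        for t in target_positions) / len(target_positions)
    return mean_distance / (genome_length / 4)


# Hypothetical 4.6 Mbp chromosome: a TF gene at 1 Mbp with three nearby targets.
score = clustering_score(1_000_000, [1_005_000, 990_000, 1_020_000], 4_600_000)
print(round(score, 4))  # → 0.0101
```

Using the circular distance rather than the raw coordinate difference matters for genes on either side of the coordinate origin. For example, positions 10 and 990 on a 1,000 bp circle are only 20 bp apart.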

Finally, let us mention that operons mainly composed of constitutive genes, i.e. of genes that are expressed continuously, are not generally regulated by specific transcription factors. Therefore, the chromosomal clustering of genes encoding fundamental macro-complexes (Junier and Rivoire, 2013), such as the translation/transcription machinery, the ATP-producing respiratory complex, the cell division complex and the cell envelope biogenesis machinery, might truly represent some chromatin state whose structural properties reflect the metabolic state of the cell (Travers and Muskhelishvili, 2005). These clusters might in turn favor local concentration effects (see above), which would strongly enhance interactions between small macro-complexes whose diffusion properties are strongly affected by the growth rate of the cell owing to the crowding of the cytoplasm (Morelli et al., 2011; Klumpp et al., 2013; Parry et al., 2014) (Fig. 3). Some of these clusters, such as those encoding the flagellum, the respiratory complex and the cell division complex, might also coordinate the building of these complexes across the membrane. Precise positioning of the corresponding chromosomal loci close to the membrane, as has been observed for genes encoding membrane proteins (Libby et al., 2012), would then favor the coupling between transcription, translation and membrane insertion of proteins (Woldringh et al., 1995; Norris et al., 1996).

4. Large-scale organization of chromosomes: coordinating major processes

Given that genome organization diverges more quickly than gene sequence, it is intriguing that we observe, across phylogenetically distant bacteria, conserved patterns in the global structuring of chromosomes. In this section, I first review general properties of chromosomal structural organization in well-studied bacteria. Then, I discuss possible roles for these long-range structural features.

4.1. Dynamic yet highly structured chromosomes

Bacterial chromosomes whose length is comparable to that of E. coli or B. subtilis (4–5 Mbp) are organized into a compact nucleoid composed of DNA and DNA-bound proteins. In both E. coli and B. subtilis, this nucleoid occupies the centre of the cell, while ribosomes are mostly found at the poles (Lewis et al., 2000; Bakshi et al., 2012) (Fig. 5). Its formation can be explained by repulsive interactions between DNA and cytoplasmic proteins: under the action of supercoiling and nucleoid-associated proteins, the condensed DNA is separated from the cytoplasm by the osmotic pressure exerted by jiggling proteins (Odijk, 1998; De Vries, 2010). By contrast, smaller chromosomes such as that of M. pneumoniae (∼820 kbp) occupy the whole (smaller) cell (Waites and Talkington, 2004), probably owing to the low density of proteins (Kühner et al., 2009). Chromosomes within the nucleoid are continuously shaped by transcription and replication activities (Benza et al., 2012). Together with the action of topoisomerases, these activities generate DNA supercoiling, which is overall negative (in vivo DNA is mostly underwound). This supercoiling causes DNA molecules to adopt branched and plectonemic structures (Boles et al., 1990) (Fig. 5), whose sizes

Please cite this article in press as: Junier, I., Conserved patterns in bacterial genomes: A conundrum physically tailored by evolutionary tinkering. Comput. Biol. Chem. (2014), http://dx.doi.org/10.1016/j.compbiolchem.2014.08.017

G Model CBAC-6354; No. of Pages 9 6

ARTICLE IN PRESS I. Junier / Computational Biology and Chemistry xxx (2014) xxx–xxx

Fig. 5. Impact of macrodomains on the segregation of bacterial chromosomes. (A) The E. coli chromosome can be divided into four macrodomains (Ori, Left, Ter and Right) plus two non-structured domains. (B) Middle: cellular organization of a slowly growing E. coli at the beginning of the cell cycle. The brown area indicates the most likely localization of ribosomes. The blue area indicates the nucleoid with the different disks showing the most likely localization of each macrodomain. Top: a typical conformation of supercoiled DNA bound by nucleoid associated proteins (spheres) and transcribed by an RNA polymerase (blue ellipse). Bottom: typical nucleoid organization obtained from a numerical simulation of the chromosome based on a thick “chromatin fiber” modeling, which provides a coarse-grained description of the supercoiled DNA bound by proteins (Wiggins et al., 2010; Junier et al., 2014). The modeling further includes the formation of macrodomains (globules in blue, red, green and black) and the specific targeting of the origin and terminus of replication (green and blue stars, respectively) (Junier et al., 2014). (C) Top: symmetric replication of the circular chromosome with respect to the origin-terminus axis. The origin (oriC) is indicated in green, the terminus in blue and the replisomes in yellow. Middle: typical cellular organization of a slowgrowing E. coli at the onset of cell division. Bottom: same as in (B) but considering two replicated chromosomes that are bound to each other at the terminus of replication. Altogether, this figure highlights the possibility of reproducing experimental patterns for the cellular disposition of the E. coli chromosome by considering only the formation of macrodomains and the targeting of the origin and terminus regions to specific positions (Junier et al., 2014). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

go from tens of kbp (Postow et al., 2004) to several hundreds of kbp (Le et al., 2013). The whole process is further cross-regulated by nucleoid-associated proteins. The overall result is a dynamic chromosome that is, nevertheless, highly structured (Thanbichler et al., 2005; Espeli et al., 2008; Hadizadeh Yazdi et al., 2012; Fisher et al., 2013; Le et al., 2013). 4.2. The importance of large-scale structuring of chromosomes for the segregation of replicated DNA The E. coli chromosome is partitioned into four macrodomains plus two non-structured domains (Valens et al., 2004). Two macrodomains, Ori and Ter, correspond to genomic regions surrounding the origin and terminus of replication, respectively (Fig. 5). Remarkably, two similar macrodomains have been determined in B. subtilis (Niki et al., 2000). More remarkably, binding sites of MatP, a protein that is responsible for the condensation of Ter (Dupaigne et al., 2012), are strongly conserved in Enterobacteria (Mercier et al., 2008; Dame et al., 2011; Dupaigne et al., 2012). Macrodomain organization has also been shown to correlate with the insertion of horizontally transferred genes (Zarei et al., 2013). Altogether, this suggests that the integrity of macrodomains is crucial for proper cell functioning. One possible role for this specific structuring of bacterial chromosomes is to ensure proper chromosomal organization during DNA replication (Junier et al., 2014). Indeed, faithful transmission of genetic information to daughter cells at each generation requires a timely and spatially controlled segregation process that specifically shapes the nucleoid by positioning the chromosome in the cellular space (Possoz et al., 2012). While a spindle-like apparatus has been found in several bacteria (Fogel and Waldor, 2006; Ptacin et al., 2010; Minnen et al., 2011), it is absent from Enterobacteria, including E. coli. 
Different hypotheses have therefore been proposed, with polymer physics playing an important role in the modeling (Benza et al., 2012; Possoz et al., 2012). In particular, Jun and coworkers have shown that entropic forces could drive the process, because replicated chromosomes tend to repel each other provided that they are sufficiently structured (Jun and Mulder, 2006; Jun and Wright, 2010). In this context, we have recently shown that the folding of the chromosome into macrodomains confers good segregation properties on the E. coli chromosome (Junier et al., 2014). Specifically, the high mobility of the unstructured regions was shown to generate depletion forces on the condensed regions, which thus behave as macroscopic objects. Comparison of in silico simulations with in vivo fluorescence imaging of chromosomes then showed that, in the presence of macrodomains, targeting the origin and terminus regions to specific cellular positions is sufficient to generate a segregation pattern indistinguishable from the experimentally observed one (Fig. 5). Other structural constraints, beyond macrodomain folding and the specific targeting of loci, are expected to be involved in chromosome structuring and segregation (Possoz et al., 2012). It is also important to note that the physical properties of nucleoids may differ strongly between bacteria (Rocha, 2008). A proper discussion of the evolutionary pressures acting on these mechanisms therefore requires comparative structural data. In this regard, the possibility of generating contact maps of chromosomal loci in a high-throughput manner is very promising (De Wit and De Laat, 2012).
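Contact maps record how often pairs of loci are found close together in space. To illustrate what such a map contains, here is a minimal sketch that computes contact maps, and the contact probability P(s) as a function of the separation s along the chain, from 3D conformations. As an assumption made purely for illustration, the conformations are ideal random walks generated by a hypothetical `random_walk` helper; they are not the macrodomain polymer model of Junier et al. (2014), and the contact cutoff is arbitrary.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def random_walk(n_beads: int, bond: float = 1.0, seed: int = 0) -> List[Point]:
    """Toy conformation: an ideal 3D random walk with fixed bond length (no excluded volume)."""
    rng = random.Random(seed)
    coords: List[Point] = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Uniform random direction on the unit sphere (uniform z, uniform azimuth).
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + bond * r * math.cos(phi), y0 + bond * r * math.sin(phi), z0 + bond * z))
    return coords


def contact_map(coords: List[Point], cutoff: float) -> List[List[int]]:
    """Binary contact matrix: entry (i, j) is 1 when beads i and j are closer than `cutoff`."""
    n = len(coords)
    return [[int(math.dist(coords[i], coords[j]) < cutoff) for j in range(n)] for i in range(n)]


def contact_probability(maps: List[List[List[int]]], s: int) -> float:
    """Fraction of bead pairs separated by s along the chain that are in contact, over all maps."""
    hits, pairs = 0, 0
    for m in maps:
        for i in range(len(m) - s):
            hits += m[i][i + s]
            pairs += 1
    return hits / pairs


# An ensemble of 50 toy conformations of 100 beads each.
maps = [contact_map(random_walk(100, seed=k), cutoff=1.5) for k in range(50)]
for s in (1, 2, 5, 10, 20):
    print(s, round(contact_probability(maps, s), 3))
```

Here P(1) equals 1 by construction, because the bond length is below the cutoff. For an ideal chain, P(s) is expected to decay roughly as s^(-3/2), and structured chromosomes would be expected to depart from this baseline.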

4.3. Conservation of long-range genomic patterns: coordinating the gene expression program with respect to growth rate

One of the most conserved organizational features of genes along the chromosome is the tendency of highly expressed genes to lie close to the origin of replication; in order of decreasing tendency, these are genes encoding RNA polymerase, rRNA, ribosomal proteins and the most abundant tRNAs (Couturier and Rocha, 2006). This replication bias has been related to replication-associated gene dosage (Abby and Daubin, 2007; Rocha, 2008), that is, to the effective number of copies of a gene present in a cell. This number is approximately 2^n, where n is the number of replication rounds that have “passed” the gene; rapidly growing bacteria can initiate several rounds of replication before the first round has terminated (Rocha, 2008). In particular, the localization of highly expressed genes close to the origin of replication is most prevalent in fast-growing bacteria, where up to eight replication forks may be present simultaneously (Couturier and Rocha, 2006).
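The 2^n rule can be made concrete with the classical Cooper–Helmstetter picture of steady-state exponential growth, a modeling assumption swapped in here rather than a result of this review. If a replication fork needs a time C to travel from oriC to the terminus and the population doubles every τ minutes, a locus at relative position m along its replichore (m = 0 at oriC, m = 1 at the terminus) is present, on average, in about 2^(C(1 − m)/τ) copies per terminus copy, the exponent playing the role of n. The function name and the C and τ values below are illustrative assumptions.

```python
def relative_dosage(m: float, c_period: float, doubling_time: float) -> float:
    """Average copy number of a locus relative to the terminus.

    m: relative position along the replichore (0 = oriC, 1 = terminus).
    c_period: time (min) a replication fork needs to go from oriC to the terminus.
    doubling_time: population doubling time (min).
    Assumes steady-state exponential growth and a constant fork speed.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 (oriC) and 1 (terminus)")
    n = c_period * (1.0 - m) / doubling_time  # the exponent n in 2**n
    return 2.0 ** n


# Illustrative values: C = 40 min, fast (20 min) versus slow (60 min) growth.
for tau in (20.0, 60.0):
    print(tau, [round(relative_dosage(m, 40.0, tau), 2) for m in (0.0, 0.5, 1.0)])
```

With these values, an origin-proximal gene has four times the dosage of a terminus-proximal gene at a 20-min doubling time but only about 1.6 times at 60 min, which is in line with the stronger positional bias of highly expressed genes in fast-growing bacteria.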


Gene dosage allows expression levels to increase even when the transcription rate is saturated. Remarkably, these dosage effects have been shown to constrain the positioning (close to the origin of replication) only of genes involved in transcription and translation, and not of other highly expressed genes (Couturier and Rocha, 2006). Thus, just as transcription, translation and replication are strongly coupled by operon organization (Versalovic et al., 1993; Bryant et al., 2010) (see above), gene dosage might be a fundamental strategy for bacteria to coordinate transcription, translation and replication at a given growth rate. Within this scope, it is tempting to re-analyze previous results on the relative positioning of evolutionarily correlated genes in E. coli, i.e., of pairs of co-occurring genes that tend to remain proximal along chromosomes (Wright et al., 2007). Indeed, it has been shown that groups containing fewer than ∼20 genes are strongly clustered along the chromosome (Junier et al., 2012), whereas larger groups are dispersed along the whole chromosome. Strikingly, some of the large groups were shown to adopt a very specific positioning; in particular, a group of 500 genes enriched in macromolecular synthesis functions (including transcription, translation and replication) showed a periodic positioning in all bacterial phyla and a symmetrical disposition along the replichores of E. coli (Junier et al., 2012). Thus, while periodicity has been argued to facilitate the gathering of genes in space (Képès, 2004; Wright et al., 2007; Junier et al., 2010), these periodicities could also reflect the minimal separation between gene clusters needed to mitigate interference between the replication fork and the coordinated transcription of this large set of genes. The existence of symmetry in the properties of replichores has been reported several times (Abby and Daubin, 2007; Rocha, 2008); see also Foster et al.
(2013) for a recent study of the mutational landscape of E. coli over more than 15,000 generations. In this regard, a particularly interesting study by Sobetzko et al. (2012) revealed both a symmetric trans-replichore organization of similar functions and an ordering of many genes along the replichores that corresponds to their temporal order of expression during the shift from exponential to stationary phase. Remarkably, this order was shown to be strongly conserved across all phyla. It was also shown that, in Enterobacteria, stationary-phase responsive genes tend to be located within the Ter macrodomain, whereas genes that are particularly active during exponential phase tend to spread symmetrically around the origin of replication, including Ori and the non-structured regions (Sobetzko et al., 2012). Altogether, these observations support the idea that the interplay between transcription, translation, replication and cell division is engraved in the layout of genomes.
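Periodic and symmetric positioning can both be phrased in terms of simple coordinates. The sketch below, under stated assumptions, (i) scores how well a set of gene positions is phase-locked to a candidate period using the mean resultant length from circular statistics, a generic choice and not the statistic used by Képès (2004) or Junier et al. (2012), and (ii) maps genome coordinates onto replichore coordinates so that genes on opposite replichores can be compared at equal relative distance from oriC. The helper names, the tolerance and the toy genome are hypothetical.

```python
import cmath
import math
from typing import Iterable, List, Tuple


def phase_coherence(positions: List[float], period: float) -> float:
    """Mean resultant length (0..1) of positions folded modulo `period`.

    Close to 1 when all positions share the same phase of the period,
    close to 0 when their phases are spread out.
    """
    vectors = [cmath.exp(2j * math.pi * x / period) for x in positions]
    return abs(sum(vectors)) / len(vectors)


def best_period(positions: List[float], periods: Iterable[float]) -> Tuple[float, float]:
    """Return (period, score) maximizing phase_coherence over candidate periods."""
    score, period = max((phase_coherence(positions, p), p) for p in periods)
    return period, score


def replichore_coordinate(pos: int, ori: int, ter: int, genome_length: int) -> Tuple[str, float]:
    """Map a coordinate to (replichore, relative distance from oriC in [0, 1]).

    By convention here, 'right' runs from oriC to ter in increasing coordinates
    and 'left' is the other arm.
    """
    right_len = (ter - ori) % genome_length
    if right_len == 0:
        raise ValueError("ori and ter must differ")
    d = (pos - ori) % genome_length
    if d <= right_len:
        return "right", d / right_len
    return "left", (genome_length - d) / (genome_length - right_len)


def are_mirrored(pos_a: int, pos_b: int, ori: int, ter: int, genome_length: int, tol: float = 0.02) -> bool:
    """True when two loci sit on opposite replichores at similar relative distance from oriC."""
    arm_a, x_a = replichore_coordinate(pos_a, ori, ter, genome_length)
    arm_b, x_b = replichore_coordinate(pos_b, ori, ter, genome_length)
    return arm_a != arm_b and abs(x_a - x_b) <= tol


# Toy data: 10 loci spaced exactly 100 kbp apart score close to 1 at that period.
loci = [k * 100_000 + 500 for k in range(10)]
print(best_period(loci, [70_000, 100_000, 130_000]))
# Toy 4-Mbp genome with oriC at 0 and ter at 2 Mbp.
print(are_mirrored(500_000, 3_500_000, ori=0, ter=2_000_000, genome_length=4_000_000))  # True
```

One caveat of this score is that integer fractions of a true period (e.g. 50 kbp here) also score close to 1, so candidate periods need to be interpreted with harmonics in mind.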

5. Discussion and conclusion

While, in theory, a plethora of genome sequences could be synthesized, natural selection has strongly constrained the outcomes. In this context, conserved patterns reflect both the irreversibility and the convergence of evolution. Interestingly, conservation properties, from individual genes to the whole chromosome, often reflect the need to integrate or coordinate multiple cellular functions. A paradigmatic example is the strong coupling between transcription, translation, replication and cell division. At the gene level, a defect in any one of these machineries induces multiple mutations in the others. At the genome level, the genes that initiate these processes are co-located within operons. At the chromosome level, related genes tend to be arranged symmetrically and periodically along the replichores, in a precise order that is conserved across all phyla. Finally, at the cellular level, this interplay is reflected in the physical structure of chromosomes.


From a causal point of view, both the emergence and the divergence of patterns result from physical forces acting at different levels of the organization of biological matter. At the smallest spatial scales, co-evolution patterns between amino acids reflect long-range dynamics and allostery within proteins. At larger scales, diffusion in the cytoplasm is slow. This makes it advantageous to cluster along the chromosome both genes encoding physical complexes and co-regulated genes, thereby enhancing the speed and coordination of their regulation. Finally, at the largest scales, strong organizational forces that dictate the cellular disposition of chromosomes emerge from the specific structuring of chromosomes. Overall, many patterns reflect the adaptive properties of organisms. In this regard, we should keep in mind that genomic patterns provide information that goes beyond the concept of “organism”. In particular, just as horizontal transfer can be viewed as a metagenomic gene duplication (Grassi et al., 2012), the evolution of genomic patterns should be considered within the scope of pan-genomes (Tettelin et al., 2008), i.e. ensembles of genes and genomes that transcend the notion of species (Lapierre and Gogarten, 2009).
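The argument that slow diffusion favors clustering can be made quantitative with the textbook relation for free diffusion in three dimensions, ⟨r²⟩ = 6Dt, so that the typical time to travel a distance r scales as r²/(6D). A minimal sketch, assuming free diffusion (crowding enters only through a smaller D, and binding is ignored) and a hypothetical diffusion coefficient chosen only for illustration:

```python
def diffusion_time(distance_um: float, diffusion_coeff_um2_s: float) -> float:
    """Typical time (s) to diffuse a distance r in 3D, from <r^2> = 6 D t."""
    return distance_um ** 2 / (6.0 * diffusion_coeff_um2_s)


D = 0.5  # hypothetical diffusion coefficient (um^2/s), for illustration only
for r in (0.1, 0.5, 1.0):  # distances in micrometres
    print(f"r = {r} um -> t ~ {diffusion_time(r, D):.4f} s")
```

Whatever the actual value of D, the quadratic dependence means that bringing two partners ten times closer shortens the typical diffusion time between them by a factor of about one hundred, while crowding, by lowering D, lengthens all of these times proportionally.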

Acknowledgements

I am very grateful to Roderic Guigó for giving me the opportunity to express my point of view. I am also extremely grateful to Kim Reynolds both for her sound comments on the manuscript and for her tremendous editing work. I thank Rama Ranganathan, Kim Reynolds, Olivier Rivoire and Luis Serrano, who have been actively feeding my reflection, as well as Luis Serrano’s group for illuminating discussions about the functioning of bacteria. I also thank the CRG in Barcelona, more particularly Miguel Beato, Roderic Guigó and Luis Serrano, for both scientific and financial support.

References

Abby, S., Daubin, V., 2007. Comparative genomics and the evolution of prokaryotes. Trends Microbiol. 15 (3), 135–141. Audit, B., Ouzounis, C.A., 2003. From genes to genomes: universal scale-invariant properties of microbial chromosome organisation. J. Mol. Biol. 332 (3), 617–633. Bakshi, S., Siryaporn, A., Goulian, M., Weisshaar, J.C., 2012. Superresolution imaging of ribosomes and RNA polymerase in live Escherichia coli cells. Mol. Microbiol. 85 (1), 21–38. Ballouz, S., Francis, A.R., Lan, R., Tanaka, M.M., 2010. Conditions for the evolution of gene clusters in bacterial genomes. PLoS Comput. Biol. 6 (2), e1000672. Bartl, M., Kötzing, M., Schuster, S., Li, P., Kaleta, C., 2013. Dynamic optimization identifies optimal programmes for pathway regulation in prokaryotes. Nat. Commun. 4, 2243. Batada, N.N., Hurst, L.D., 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat. Genet. 39 (8), 945–949. Batada, N.N., Urrutia, A.O., Hurst, L.D., 2007. Chromatin remodelling is a major source of coexpression of linked genes in yeast. Trends Genet. 23 (10), 480–484. Benza, V.G., Bassetti, B., Dorfman, K.D., Scolari, V.F., Bromek, K., Cicuta, P., Lagomarsino, M.C., 2012. Physical descriptions of the bacterial nucleoid at large scales and their biological implications. Rep. Prog. Phys. 75 (7), 076602. Block, D.H.S., Hussein, R., Liang, L.W., Lim, H.N., 2012. Regulatory consequences of gene translocation in bacteria. Nucleic Acids Res. 40 (18), 8979–8992. Blot, N., Mavathur, R., Geertz, M., Travers, A., Muskhelishvili, G., 2006. Homeostatic regulation of supercoiling sensitivity coordinates transcription of the bacterial genome. EMBO Rep. 7 (7), 710–715. Boles, T.C., White, J.H., Cozzarelli, N.R., 1990. Structure of plectonemically supercoiled DNA. J. Mol. Biol. 213 (4), 931–951. Bork, P., Dandekar, T., Diaz-Lazcoz, Y., Eisenhaber, F., Huynen, M., Yuan, Y., 1998.
Predicting function: from genes to genomes and back. J. Mol. Biol. 283 (4), 707–725. Bryant, K.A., Kinkead, L.C., Larson, M.A., Hinrichs, S.H., Fey, P.D., 2010. Genetic analysis of the Staphylococcus epidermidis macromolecular synthesis operon: Serp1129 is an ATP binding protein and sigA transcription is regulated by both sigma(A)- and sigma(B)-dependent promoters. BMC Microbiol. 10, 8. Cabin-Flaman, A., Ripoll, C., Saier, M.H., Norris, V., 2005. Hypothesis: chemotaxis in Escherichia coli results from hyper-structure dynamics. J. Mol. Microbiol. Biotechnol. 10 (1), 1–14.


Carpentier, A.-S., Torrésani, B., Grossmann, A., Hénaut, A., 2005. Decoding the nucleoid organisation of Bacillus subtilis and Escherichia coli through gene expression data. BMC Genomics 6, 84. Charlesworth, J., Eyre-Walker, A., 2006. The rate of adaptive evolution in enteric bacteria. Mol. Biol. Evol. 23 (7), 1348–1356. Cho, B.-K., Zengler, K., Qiu, Y., Park, Y.S., Knight, E.M., Barrett, C.L., Gao, Y., Palsson, B.Ø., 2009. The transcription unit architecture of the Escherichia coli genome. Nat. Biotechnol. 27 (11), 1043–1049. Clark, N.L., Alani, E., Aquadro, C.F., 2012. Evolutionary rate covariation reveals shared functionality and coexpression of genes. Genome Res. 22 (4), 714–720. Couturier, E., Rocha, E.P.C., 2006. Replication-associated gene dosage effects shape the genomes of fast-growing bacteria but only for transcription and translation genes. Mol. Microbiol. 59 (5), 1506–1518. Dame, R.T., Kalmykowa, O.J., Grainger, D.C., 2011. Chromosomal macrodomains and associated proteins: implications for DNA organization and replication in gram negative bacteria. PLoS Genet. 7 (6), e1002123. de Lorenzo, V., Danchin, A., 2008. Synthetic biology: discovering new worlds and new words. EMBO Rep. 9 (9), 822–827. De Vries, R., 2010. DNA condensation in bacteria: interplay between macromolecular crowding and nucleoid proteins. Biochimie 92 (12), 1715–1721. De Wit, E., De Laat, W., 2012. A decade of 3C technologies: insights into nuclear organization. Genes Dev. 26 (1), 11–24. Denamur, E., Matic, I., 2006. Evolution of mutation rates in bacteria. Mol. Microbiol. 60 (4), 820–827. Dillon, S.C., Dorman, C.J., 2010. Bacterial nucleoid-associated proteins, nucleoid structure and gene expression. Nat. Rev. Microbiol. 8 (3), 185–195. Dios, F., Barturen, G., Lebrón, R., Rueda, A., Hackenberg, M., Oliver, J.L., 2014. DNA clustering and genome complexity. Comput. Biol. Chem., http://dx.doi.org/10.1016/j.compbiolchem.2014.08.011. Dröge, P., Müller-Hill, B., 2001.
High local protein concentrations at promoters: strategies in prokaryotic and eukaryotic cells. BioEssays 23 (2), 179–183. Dupaigne, P., Tonthat, N.K., Espeli, O., Whitfill, T., Boccard, F., Schumacher, M.A., 2012. Molecular basis for a protein-mediated DNA-bridging mechanism that functions in condensation of the E. coli chromosome. Mol. Cell 48 (4), 560–571. Espeli, O., Mercier, R., Boccard, F., 2008. DNA dynamics vary according to macrodomain topography in the E. coli chromosome. Mol. Microbiol. 68 (6), 1418–1427. Fang, G., Rocha, E.P.C., Danchin, A., 2008. Persistence drives gene clustering in bacterial genomes. BMC Genomics 9, 4. Fischbach, M., Voigt, C.A., 2010. Prokaryotic gene clusters: a rich toolbox for synthetic biology. Biotechnol. J. 5 (12), 1277–1296. Fischbach, M.A., Walsh, C.T., Clardy, J., 2008. The evolution of gene collectives: how natural selection drives chemical innovation. Proc. Natl. Acad. Sci. U. S. A. 105 (12), 4601–4608. Fisher, J.K., Bourniquel, A., Witz, G., Weiner, B., Prentiss, M., Kleckner, N., 2013. Four-dimensional imaging of E. coli nucleoid organization and dynamics in living cells. Cell 153 (4), 882–895. Fogel, M.A., Waldor, M.K., 2006. A dynamic mitotic-like mechanism for bacterial chromosome segregation. Genes Dev. 20 (23), 3269–3282. Foster, P.L., Hanson, A.J., Lee, H., Popodi, E.M., Tang, H., 2013. On the mutational topology of the bacterial genome. G3 (Bethesda Md.) 3 (3), 399–407. Fribourg, S., Romier, C., Werten, S., Gangloff, Y.G., Poterszman, A., Moras, D., 2001. Dissecting the interaction network of multiprotein complexes by pairwise coexpression of subunits in E. coli. J. Mol. Biol. 306 (2), 363–373. Göbel, U., Sander, C., Schneider, R., Valencia, A., 1994. Correlated mutations and residue contacts in proteins. Proteins 18 (4), 309–317.
Güell, M., van Noort, V., Yus, E., Chen, W.-H., Leigh-Bell, J., Michalodimitrakis, K., Yamada, T., Arumugam, M., Doerks, T., Kühner, S., Rode, M., Suyama, M., Schmidt, S., Gavin, A.-C., Bork, P., Serrano, L., 2009. Transcriptome complexity in a genome-reduced bacterium. Science 326 (5957), 1268–1271. Gama-Castro, S., Salgado, H., Peralta-Gil, M., Santos-Zavaleta, A., Muniz-Rascado, L., Solano-Lira, H., Jimenez-Jacinto, V., Weiss, V., García-Sotelo, J.S., López-Fuentes, A., Porrón-Sotelo, L., Alquicira-Hernández, S., Medina-Rivera, A., Martinez-Flores, I., Alquicira-Hernández, K., Martínez-Adame, R., Bonavides-Martinez, C., Miranda-Rios, J., Huerta, A.M., Mendoza-Vargas, A., Collado-Torres, L., Taboada, B., Vega-Alvarado, L., Olvera, M., Olvera, L., Grande, R., Morett, E., Collado-Vides, J., 2011. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39, D98–D105. Grassi, L., Caselle, M., Lercher, M.J., Lagomarsino, M.C., 2012. Horizontal gene transfers as metagenomic gene duplications. Mol. BioSyst. 8 (3), 790–795. Hadizadeh Yazdi, N., Guet, C.C., Johnson, R.C., Marko, J.F., 2012. Variation of the folding and dynamics of the Escherichia coli chromosome with growth conditions. Mol. Microbiol. 86 (6), 1318–1333. Halabi, N., Rivoire, O., Leibler, S., Ranganathan, R., 2009. Protein sectors: evolutionary units of three-dimensional structure. Cell 138 (4), 774–786. Hatley, M.E., Lockless, S.W., Gibson, S.K., Gilman, A.G., Ranganathan, R., 2003. Allosteric determinants in guanine nucleotide-binding proteins. Proc. Natl. Acad. Sci. U. S. A. 100 (24), 14445–14450. Hurst, L.D., Pál, C., Lercher, M.J., 2004. The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5 (4), 299–310. Jacob, F., Perrin, D., Sánchez, C., Monod, J., 1960. L’opéron: groupe de gènes à expression coordonnée par un opérateur. CR Acad. Sci. Paris 250, 1727–1729. Jacob, F., 1977. Evolution and tinkering. Science 196 (4295), 1161–1166.

Jeong, K.S., Ahn, J., Khodursky, A.B., 2004. Spatial patterns of transcriptional activity in the chromosome of Escherichia coli. Genome Biol. 5 (11), R86. Johnson, Z.I., Chisholm, S.W., 2004. Properties of overlapping genes are conserved across microbial genomes. Genome Res. 14 (11), 2268–2272. Jun, S., Mulder, B., 2006. Entropy-driven spatial organization of highly confined polymers: lessons for the bacterial chromosome. Proc. Natl. Acad. Sci. U. S. A. 103 (33), 12388–12393. Jun, S., Wright, A., 2010. Entropy as the driver of chromosome segregation. Nat. Rev. Microbiol. 8 (8), 600–607. Junier, I., Rivoire, O., 2013. Synteny in bacterial genomes: inference, organization and evolution. arXiv:1307.4291. Junier, I., Martin, O., Képès, F., 2010. Spatial and topological organization of DNA chains induced by gene co-localization. PLoS Comput. Biol. 6 (2), e1000678. Junier, I., Hérisson, J., Képès, F., 2012. Genomic organization of evolutionarily correlated genes in bacteria: limits and strategies. J. Mol. Biol. 419 (5), 369–386. Junier, I., Boccard, F., Espeli, O., 2014. Polymer modeling of the E. coli genome reveals the involvement of locus positioning and macrodomain structuring for the control of chromosome conformation and segregation. Nucleic Acids Res. 42 (3), 1461–1473. Kühner, S., van Noort, V., Betts, M.J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., Castaño-Díez, D., Chen, W.-H., Devos, D., Güell, M., Norambuena, T., Racke, I., Rybin, V., Schmidt, A., Yus, E., Aebersold, R., Herrmann, R., Böttcher, B., Frangakis, A.S., Russell, R.B., Serrano, L., Bork, P., Gavin, A.-C., 2009. Proteome organization in a genome-reduced bacterium. Science 326 (5957), 1235–1240. Képès, F., Jester, B.C., Lepage, T., Raff, B., Rosu, I., Junier, I., 2012. The layout of a bacterial genome. FEBS Lett. 586 (15), 2043–2048. Képès, F., 2004. Periodic transcriptional organization of the E. coli genome. J. Mol. Biol. 340 (5), 957–964.
Kass, I., Horovitz, A., 2002. Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations. Proteins: Struct. Funct. Bioinf. 48 (4), 611–617. Klumpp, S., Zhang, Z., Hwa, T., 2009. Growth rate-dependent global effects on gene expression in bacteria. Cell 139 (7), 1366–1375. Klumpp, S., Scott, M., Pedersen, S., Hwa, T., 2013. Molecular crowding limits translation and cell growth. Proc. Natl. Acad. Sci. 110 (42), 16754–16759. Kolesov, G., Wunderlich, Z., Laikova, O., Gelfand, M., Mirny, L., 2007. How gene order is influenced by the biophysics of transcription regulation. Proc. Natl. Acad. Sci. 104 (35), 13948. Koonin, E.V., Wolf, Y.I., 2008. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36 (21), 6688–6719. Kovács, K., Hurst, L.D., Papp, B., 2009. Stochasticity in protein levels drives colinearity of gene order in metabolic operons of Escherichia coli. PLoS Biol. 7 (5), e1000115. Kuhlman, T.E., Cox, E.C., 2012. Gene location and DNA density determine transcription factor distributions in Escherichia coli. Mol. Syst. Biol. 8, 610, http://dx.doi.org/10.1038/msb.2012.42. Kuriyan, J., Eisenberg, D., 2007. The origin of protein interactions and allostery in colocalization. Nature 450 (7172), 983–990. Lapierre, P., Gogarten, J.P., 2009. Estimating the size of the bacterial pan-genome. Trends Genet. 25 (3), 107–110. Lathe, W.C., Snel, B., Bork, P., 2000. Gene context conservation of a higher order than operons. Trends Biochem. Sci. 25 (10), 474–479. Laub, M.T., Goulian, M., 2007. Specificity in two-component signal transduction pathways. Annu. Rev. Genet. 41, 121–145. Lawrence, J.G., Roth, J.R., 1996. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143 (4), 1843–1860. Lawrence, J.G., 2003. Gene organization: selection, selfishness, and serendipity. Annu. Rev. Microbiol. 57, 419–440. Le, T.B.K., Imakaev, M.V., Mirny, L.A., Laub, M.T., 2013.
High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342 (6159), 731–734. Lee, H.H., Molla, M.N., Cantor, C.R., Collins, J.J., 2010. Bacterial charity work leads to population-wide resistance. Nature 467 (7311), 82–85. Lewis, P.J., Thaker, S.D., Errington, J., 2000. Compartmentalization of transcription and translation in Bacillus subtilis. EMBO J. 19 (4), 710–718. Liang, L.W., Hussein, R., Block, D.H.S., Lim, H.N., 2013. Minimal effect of gene clustering on expression in Escherichia coli. Genetics 193 (2), 453–465. Libby, E.A., Roggiani, M., Goulian, M., 2012. Membrane protein expression triggers chromosomal locus repositioning in bacteria. Proc. Natl. Acad. Sci. 109 (19), 7445–7450. Lim, H.N., Lee, Y., Hussein, R., 2011. Fundamental relationship between operon organization and gene expression. Proc. Natl. Acad. Sci. U. S. A. 108 (26), 10626–10631. Lockless, S.W., Ranganathan, R., 1999. Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286 (5438), 295–299. Lynch, M., 2007. The Origins of Genome Architecture. Sinauer Associates Inc. Marks, D.S., Hopf, T.A., Sander, C., 2012. Protein structure prediction from sequence variation. Nat. Biotechnol. 30 (11), 1072–1080. McLaughlin, R.N., Poelwijk, F.J., Raman, A., Gosal, W.S., Ranganathan, R., 2012. The spatial architecture of protein function and adaptation. Nature 491 (7422), 138–142. Mercier, R., Petit, M., Schbath, S., Robin, S., El Karoui, M., Boccard, F., Espéli, O., 2008. The MatP/matS site-specific system organizes the terminus region of the E. coli chromosome into a macrodomain. Cell 135 (3), 475–485. Michalak, P., 2008. Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 91 (3), 243–248.


Minnen, A., Attaiech, L., Thon, M., Gruber, S., Veening, J.-W., 2011. SMC is recruited to oriC by ParB and promotes chromosome segregation in Streptococcus pneumoniae. Mol. Microbiol. 81 (3), 676–688. Mirkin, E.V., Mirkin, S.M., 2005. Mechanisms of transcription-replication collisions in bacteria. Mol. Cell. Biol. 25 (3), 888–895. Montero Llopis, P., Jackson, A.F., Sliusarenko, O., Surovtsev, I., Heinritz, J., Emonet, T., Jacobs-Wagner, C., 2010. Spatial organization of the flow of genetic information in bacteria. Nature 466 (7302), 77–81. Morelli, M.J., Allen, R.J., ten Wolde, P.R., 2011. Effects of macromolecular crowding on genetic networks. Biophys. J. 101 (12), 2882–2891. Nevo-Dinur, K., Nussbaum-Shochat, A., Ben-Yehuda, S., Amster-Choder, O., 2011. Translation-independent localization of mRNA in E. coli. Science 331 (6020), 1081–1084. Niki, H., Yamaichi, Y., Hiraga, S., 2000. Dynamic organization of chromosomal DNA in Escherichia coli. Genes Dev. 14 (2), 212–223. Norris, V., Turnock, G., Sigee, D., 1996. The Escherichia coli enzoskeleton. Mol. Microbiol. 19 (2), 197–204. Norris, V., Alexandre, S., Bouligand, Y., Cellier, D., Demarty, M., Grehan, G., Gouesbet, G., Guespin, J., Insinna, E., Le Sceller, L., Maheu, B., Monnier, C., Grant, N., Onoda, T., Orange, N., Oshima, A., Picton, L., Polaert, H., Ripoll, C., Thellier, M., Valleton, J.M., Verdus, M.C., Vincent, J.C., White, G., Wiggins, P., 1999. Hypothesis: hyperstructures regulate bacterial structure and the cell cycle. Biochimie 81 (8–9), 915–920. Odijk, T., 1998. Osmotic compaction of supercoiled DNA into a bacterial nucleoid. Biophys. Chem. 73 (1–2), 23–29. Pál, C., Hurst, L.D., 2004. Evidence against the selfish operon theory. Trends Genet. 20 (6), 232–234. Pál, C., Papp, B., Lercher, M.J., 2006. An integrated view of protein evolution. Nat. Rev. Genet. 7 (5), 337–348. Parry, B.R., Surovtsev, I.V., Cabeen, M.T., O’Hern, C.S., Dufresne, E.R., Jacobs-Wagner, C., 2014.
The bacterial cytoplasm has glass-like properties and is fluidized by metabolic activity. Cell 156 (1–2), 183–194. Paul, S., Million-Weaver, S., Chattopadhyay, S., Sokurenko, E., Merrikh, H., 2013. Accelerated gene evolution through replication–transcription conflicts. Nature 495 (7442), 512–515. Pazos, F., Valencia, A., 2008. Protein co-evolution, co-adaptation and interactions. EMBO J. 27 (20), 2648–2655. Possoz, C., Junier, I., Espeli, O., 2012. Bacterial chromosome segregation. Front. Biosci. 17, 1020–1034. Postow, L., Hardy, C.D., Arsuaga, J., Cozzarelli, N.R., 2004. Topological domain structure of the Escherichia coli chromosome. Genes Dev. 18 (14), 1766–1779. Price, M.N., Huang, K.H., Arkin, A.P., Alm, E.J., 2005. Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 15 (6), 809–819. Price, M.N., Arkin, A.P., Alm, E.J., 2006. The life-cycle of operons. PLoS Genet. 2 (6), e96. Ptacin, J.L., Lee, S.F., Garner, E.C., Toro, E., Eckart, M., Comolli, L.R., Moerner, W.E., Shapiro, L., 2010. A spindle-like apparatus guides bacterial chromosome segregation. Nat. Cell Biol. 12 (8), 791–798. Pulkkinen, O., Metzler, R., 2013. Distance matters: the impact of gene proximity in bacterial gene regulation. Phys. Rev. Lett. 110 (19), 198101. Reynolds, K.A., McLaughlin, R.N., Ranganathan, R., 2011. Hot spots for allosteric regulation on protein surfaces. Cell 147 (7), 1564–1575. Reynolds, K.A., Russ, W.P., Socolich, M., Ranganathan, R., 2013. Evolution-based design of proteins. Methods Enzymol. 523, 213–235. Rivoire, O., 2013. Elements of coevolution in biological sequences. Phys. Rev. Lett. 110 (17), 178102. Rocha, E.P.C., Danchin, A., 2003. Gene essentiality determines chromosome organisation in bacteria. Nucleic Acids Res. 31 (22), 6570–6577. Rocha, E.P.C., Danchin, A., 2004. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol. Biol. Evol. 21 (1), 108–116. Rocha, E.P.C., 2006.
Inference and analysis of the relative stability of bacterial chromosomes. Mol. Biol. Evol. 23 (3), 513–522. Rocha, E.P.C., 2008. The organization of the bacterial genome. Annu. Rev. Genet. 42, 211–233. Süel, G.M., Lockless, S.W., Wall, M.A., Ranganathan, R., 2003. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat. Struct. Biol. 10 (1), 59–69. Siryaporn, A., Goulian, M., 2008. Cross-talk suppression between the CpxA-CpxR and EnvZ-OmpR two-component systems in E. coli. Mol. Microbiol. 70 (2), 494–506.
Skerker, J.M., Perchuk, B.S., Siryaporn, A., Lubin, E.A., Ashenberg, O., Goulian, M., Laub, M.T., 2008. Rewiring the specificity of two-component signal transduction systems. Cell 133 (6), 1043–1054.
Smock, R.G., Rivoire, O., Russ, W.P., Swain, J.F., Leibler, S., Ranganathan, R., Gierasch, L.M., 2010. An interdomain sector mediating allostery in Hsp70 molecular chaperones. Mol. Syst. Biol. 6, 414.
Sobetzko, P., Travers, A., Muskhelishvili, G., 2012. Gene order and chromosome dynamics coordinate spatiotemporal gene expression during the bacterial growth cycle. Proc. Natl. Acad. Sci. U. S. A. 109 (2), E42–50.
Socolich, M., Lockless, S.W., Russ, W.P., Lee, H., Gardner, K.H., Ranganathan, R., 2005. Evolutionary information for specifying a protein fold. Nature 437 (7058), 512–518.
Tamames, J., 2001. Evolution of gene order conservation in prokaryotes. Genome Biol. 2 (6).
Taniguchi, Y., Choi, P.J., Li, G.-W., Chen, H., Babu, M., Hearn, J., Emili, A., Xie, X.S., 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329 (5991), 533–538.
Temme, K., Zhao, D., Voigt, C.A., 2012. Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc. Natl. Acad. Sci. U. S. A. 109 (18), 7085–7090.
Tenaillon, O., Rodriguez-Verdugo, A., Gaut, R.L., et al., 2012. The molecular diversity of adaptive convergence. Science 335 (6067), 457–461.
Tettelin, H., Riley, D., Cattuto, C., Medini, D., 2008. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11 (5), 472–477.
Thanbichler, M., Wang, S.C., Shapiro, L., 2005. The bacterial nucleoid: a highly organized and dynamic structure. J. Cell. Biochem. 96 (3), 506–521.
Toprak, E., Veres, A., Michel, J.-B., Chait, R., Hartl, D.L., Kishony, R., 2012. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat. Genet. 44 (1), 101–105.
Toprak, E., Veres, A., Yildiz, S., Pedraza, J.M., Chait, R., Paulsson, J., Kishony, R., 2013. Building a morbidostat: an automated continuous-culture device for studying bacterial drug resistance under dynamically sustained drug inhibition. Nat. Protoc. 8 (3), 555–567.
Touchon, M., Hoede, C., Tenaillon, O., Barbe, V., Baeriswyl, S., Bidet, P., Bingen, E., Bonacorsi, S., Bouchier, C., Bouvet, O., Calteau, A., Chiapello, H., Clermont, O., Cruveiller, S., Danchin, A., Diard, M., Dossat, C., El Karoui, M., Frapy, E., Garry, L., Ghigo, J.M., Gilles, A.M., Johnson, J., Le Bouguénec, C., Lescat, M., Mangenot, S., Martinez-Jéhanne, V., Matic, I., Nassif, X., Oztas, S., Petit, M.A., Pichon, C., Rouy, Z., Saint Ruf, C., Schneider, D., Tourret, J., Vacherie, B., Vallenet, D., Médigue, C., Rocha, E.P.C., Denamur, E., 2009. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5 (1), e1000344.
Travers, A., Muskhelishvili, G., 2005. DNA supercoiling - a global transcriptional regulator for enterobacterial growth? Nat. Rev. Microbiol. 3 (2), 157–169.
Valens, M., Penaud, S., Rossignol, M., Cornet, F., Boccard, F., 2004. Macrodomain organization of the Escherichia coli chromosome. EMBO J. 23, 4330–4341.
Versalovic, J., Koeuth, T., Britton, R., Geszvain, K., Lupski, J.R., 1993. Conservation and evolution of the rpsU-dnaG-rpoD macromolecular synthesis operon in bacteria. Mol. Microbiol. 8 (2), 343–355.
Vora, T., Hottes, A.K., Tavazoie, S., 2009. Protein occupancy landscape of a bacterial genome. Mol. Cell 35 (2), 247–253.
Waites, K.B., Talkington, D.F., 2004. Mycoplasma pneumoniae and its role as a human pathogen. Clin. Microbiol. Rev. 17 (4), 697–728.
Weigt, M., White, R.A., Szurmant, H., Hoch, J.A., Hwa, T., 2009. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl. Acad. Sci. U. S. A. 106 (1), 67–72.
Wiggins, P.A., Cheveralls, K.C., Martin, J.S., Lintner, R., Kondev, J., 2010. Strong intranucleoid interactions organize the Escherichia coli chromosome into a nucleoid filament. Proc. Natl. Acad. Sci. U. S. A. 107 (11), 4991–4995.
Woldringh, C.L., Jensen, P.R., Westerhoff, H.V., 1995. Structure and partitioning of bacterial DNA: determined by a balance of compaction and expansion forces? FEMS Microbiol. Lett. 131 (3), 235–242.
Wright, M., Kharchenko, P., Church, G., Segrè, D., 2007. Chromosomal periodicity of evolutionarily conserved gene pairs. Proc. Natl. Acad. Sci. U. S. A. 104 (25), 10559.
Yin, Y., Zhang, H., Olman, V., Xu, Y., 2010. Genomic arrangement of bacterial operons is constrained by biological pathways encoded in the genome. Proc. Natl. Acad. Sci. U. S. A. 107 (14), 6310–6315.
Zarei, M., Sclavi, B., Cosentino Lagomarsino, M., 2013. Gene silencing and large-scale domain structure of the E. coli genome. Mol. BioSyst. 9 (4), 758–767.
Zaslaver, A., Mayo, A., Ronen, M., Alon, U., 2006. Optimal gene partition into operons correlates with gene functional order. Phys. Biol. 3 (3), 183–189.

Please cite this article in press as: Junier, I., Conserved patterns in bacterial genomes: A conundrum physically tailored by evolutionary tinkering. Comput. Biol. Chem. (2014), http://dx.doi.org/10.1016/j.compbiolchem.2014.08.017