Update
TRENDS in Genetics Vol.21 No.1 January 2005
Genome Analysis
Energy biogenesis: one key for coordinating two genomes
Marco Sardiello1,2, Gaetano Tripoli1, Antonio Romito1, Crescenzio Minervini1, Luigi Viggiano1, Corrado Caggese1 and Graziano Pesole3
1 Dipartimento di Anatomia Patologica e di Genetica, Sezione di Genetica, Università di Bari, via Amendola 165/A, 70126 Bari, Italy
2 Telethon Institute of Genetics and Medicine (TIGEM), via Pietro Castellino 111, 80131, Napoli, Italy
3 Dipartimento di Scienze Biomolecolari e Biotecnologie, Università di Milano, via Celoria 26, 20133 Milano, Italy
In metazoan organisms, energy production is the only example of a process that is under dual genetic control: nuclear and mitochondrial. We used a genomic approach to examine how energy genes of both the nuclear and mitochondrial genomes are coordinated, and discovered a novel genetic regulatory circuit in Drosophila melanogaster that is surprisingly simple and parsimonious. This circuit is based on a single DNA regulatory element and can explain both intra- and intergenomic coordinated expression of genes involved in energy production, including the full complement of mitochondrial and nuclear oxidative phosphorylation genes, and the genes involved in the Krebs cycle.

Mitochondria have an essential role in the eukaryotic cell, supplying cellular energy through oxidative phosphorylation (OXPHOS) and producing essential metabolites for many biochemical pathways [1]. The OXPHOS system is composed of dozens of gene products from both nuclear and mitochondrial genomes. In most higher eukaryotes, the mitochondrial genome (mtDNA) encodes 13 of the ~80 subunits of respiratory complexes I–V [2], which are the core of the OXPHOS system. The remaining ~65 respiratory subunits, in addition to the rest of the mitochondrial proteome, are nuclear-encoded. Mitochondria can autonomously transcribe and translate the mtDNA-encoded respiratory genes with their own transcriptional machinery and ribosomes, which are remnants of their bacterial origin.

How does the cell coordinate expression of its mitochondrial and nuclear sets of energy genes to properly assemble the OXPHOS apparatus and satisfy cell- and tissue-specific energy demands? In higher eukaryotes, the amount of mtDNA-encoded subunits has been reported to depend on the number of mtDNA copies per cell and on other mechanisms, including the rate of mtDNA transcription and RNA stability [3,4]. However, the amount of nuclear-encoded OXPHOS subunits appears to be controlled mainly at the transcriptional level [5].
Corresponding authors: Sardiello, M. ([email protected]), Pesole, G. ([email protected]). Available online 23 November 2004
www.sciencedirect.com

Various experimental observations [6–8] suggest fine-tuned communication between mitochondrial and nuclear genetic systems that results in the
interdependent expression of mtDNA- and nuclear-encoded OXPHOS genes. A comprehensive model that explains OXPHOS coordination and intergenomic crosstalk has not yet been produced for a complex organism. In this study, we used a genomic approach to address the issue in Drosophila melanogaster.

Drosophila nuclear energy genes share a common regulatory DNA motif

Functionally related genes often share common cis-regulatory modules that directly mediate the control of their coordinated expression [9,10]. The coordination of energy genes, however, requires a higher order of complexity, considering that respiratory genes are distributed in two different genomes and rely on different types of expression control. We began our analysis by examining the sets of nuclear respiratory genes in Drosophila melanogaster and Drosophila pseudoobscura, two fruitfly species that diverged 40–60 million years ago (Mya), to search for potential transcriptional-regulatory elements that coordinate their expression.

The 65 D. melanogaster gene sequences were downloaded from the MitoDrome database (http://bighost.ba.itb.cnr.it/BIG/MitoDrome/), a collection of nuclear genes encoding mitochondrial proteins that we previously compiled [11]. The D. pseudoobscura counterparts were identified as potential orthologs of the D. melanogaster respiratory genes†. The peptide and nucleotide alignments of the pairs of respiratory genes are available on the following website: http://nrg.tigem.it.

Pattern discovery analysis and inter-species sequence comparison (Box 1) resulted in the identification of a palindromic 10-bp RTTAYRTAAY motif shared by the respiratory genes in both Drosophila species; we designated this motif the nuclear respiratory gene (NRG) element (Figure 1a,b). An extensive search of the Transfac database (http://www.gene-regulation.com) [12] and published literature indicated that the NRG motif had not been previously described in Drosophila, whereas similar elements [e.g.
Factor Name (Matrix ID): CRE-BP1 (M00179); E4BP4 (M00054); VBP (M00228)] had been characterized in vertebrates as targets of the PAR

† Genomic sequences of D. pseudoobscura are available at the Human Genome Sequencing Center at the Baylor College of Medicine (http://www.hgsc.bcm.tmc.edu/projects/drosophila/). The D. pseudoobscura putative orthologs were identified using the 65 D. melanogaster respiratory gene sequences as probes in BLASTN analyses.
Box 1. Sequence analysis

Pattern discovery
DNA pattern discovery programs use either enumerative algorithms to examine all oligomers of a given length, reporting those that occur more often than expected as output, or alignment methods to identify unknown signals by local multiple alignment of submitted sequences. We used both approaches to analyze the non-coding sequences of the 65 known Drosophila melanogaster nuclear respiratory genes, using the programs Consensus [24] and Weeder [25,26]. The analyses were performed on the 800-bp upstream and downstream of the transcription-start site, and resulted in similar patterns that can be suitably described by the following position weight matrix (PWM):

Position j    Aj    Cj    Gj    Tj    Consensus
 1            56    10    48    16    R
 2             0     0     0   130    T
 3             0     0     7   123    T
 4           125     0     5     0    A
 5             0    75     0    55    Y
 6            55     0    75     0    R
 7             0     5     0   125    T
 8           123     7     0     0    A
 9           130     0     0     0    A
10            16    48    10    56    Y

This pattern defines a palindromic 10-bp consensus whose half-sequence RTTAY presents minor variations in the first, third and fourth positions (pyrimidine instead of purine and G instead of T or A, respectively).

Pattern search
DNA pattern search programs are based on a PWM description of the pattern to be searched. The weight score associated with each examined DNA segment represents a measure of its similarity to the collection of sequences that constitute the PWM: the more a given DNA segment matches the PWM, the higher its weight score. We used the PatSearch [27] program to scan the genomic sequences of interest with the NRG PWM. Analyses were performed with a weight score threshold of 5.81, established as the lower value that is associated with a conserved NRG element in the Drosophila respiratory orthologs.
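To illustrate how a PWM-based pattern search of the kind described in Box 1 can work, here is a minimal sketch in Python. It uses the NRG count matrix from Box 1, but the scoring scheme (a log-odds score with a pseudocount of 1 and a uniform 0.25 background) is an assumption made for illustration. The weight-score formula used by PatSearch is not given in the text, so scores from this sketch are not on the same scale as the 5.81 threshold, and the function names and the test sequence are hypothetical.

```python
import math

BASES = "ACGT"

# Counts per motif position in the order (A, C, G, T), copied from the Box 1 PWM.
NRG_COUNTS = [
    (56, 10, 48, 16), (0, 0, 0, 130), (0, 0, 7, 123), (125, 0, 5, 0), (0, 75, 0, 55),
    (55, 0, 75, 0), (0, 5, 0, 125), (123, 7, 0, 0), (130, 0, 0, 0), (16, 48, 10, 56),
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into per-position log2 odds scores.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append({
            base: math.log2(((count + pseudocount) / total) / background)
            for base, count in zip(BASES, row)
        })
    return matrix


def score(segment, matrix):
    """Sum the per-position scores of a segment as long as the matrix."""
    return sum(column[base] for column, base in zip(matrix, segment))


def scan(sequence, matrix, threshold):
    """Return (offset, segment, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):  # skip windows with ambiguous symbols
            window_score = score(window, matrix)
            if window_score >= threshold:
                hits.append((start, window, window_score))
    return hits


matrix = log_odds_matrix(NRG_COUNTS)
# Hypothetical test sequence containing one perfect match to RTTAYRTAAY.
for offset, segment, value in scan("GGGATTACGTAATCCC", matrix, threshold=10.0):
    print(offset, segment, round(value, 2))
```

Because the consensus is palindromic, a real scan would find the same site on both strands; this sketch scans the forward strand only.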
subfamily of basic leucine zipper transcription factors [13]. Pattern search analysis (Box 1) on D. melanogaster respiratory genes showed that one or more NRG sites are present in all 65 examined genes. NRG sites are
[Figure 1 panels: (a) and (b) motif representations, y-axis: bits, x-axis: motif positions 1–10; (c) plot of single sites and multiple sites, y-axis: number of NRG elements, x-axis: distance from TSS (bp), 1–800. TRENDS in Genetics]
Figure 1. Consensus sequence and distribution of the nuclear respiratory gene (NRG) element. (a) Four-letter and (b) two-letter representations are used to better visualize the information content (expressed in bits) of the NRG motif. The relative sizes of the letters are a measure of the relative frequencies of the nucleotides in the given positions. R and Y indicate purine (A or G) and pyrimidine (C or T), respectively. (c) The position analysis of the NRG sites conserved in the two Drosophila species shows that single and multiple sites are distributed differently. Single NRG sites have a strong positional bias for the region from 160 to 280 bp downstream of the transcription-start site (TSS), whereas multiple NRG sites are distributed with no apparent preference within the examined region. Distance refers to Drosophila melanogaster genes.
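The information content shown in Figure 1a,b can be approximated directly from the Box 1 count matrix. The sketch below is illustrative only: it assumes the standard definition (maximum entropy minus observed entropy, 2 bits for four letters and 1 bit for the purine/pyrimidine alphabet) with no small-sample correction, which may differ from the exact procedure used to draw the figure.

```python
import math

# Counts per motif position in the order (A, C, G, T), copied from the Box 1 PWM.
NRG_COUNTS = [
    (56, 10, 48, 16), (0, 0, 0, 130), (0, 0, 7, 123), (125, 0, 5, 0), (0, 75, 0, 55),
    (55, 0, 75, 0), (0, 5, 0, 125), (123, 7, 0, 0), (130, 0, 0, 0), (16, 48, 10, 56),
]


def information_content(counts, max_bits):
    """Information content in bits: max_bits minus the Shannon entropy."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return max_bits - entropy


# Four-letter alphabet (A, C, G, T): at most 2 bits per position.
four_letter = [information_content(row, 2.0) for row in NRG_COUNTS]

# Two-letter alphabet: purines R = A + G, pyrimidines Y = C + T, at most 1 bit.
two_letter = [information_content((a + g, c + t), 1.0) for a, c, g, t in NRG_COUNTS]

for position, (bits4, bits2) in enumerate(zip(four_letter, two_letter), start=1):
    print(position, round(bits4, 3), round(bits2, 3))
```

Positions 5 and 6 show why the two-letter view is useful: in the four-letter alphabet they look only moderately informative because C/T and A/G alternate, whereas in the R/Y alphabet they are fully conserved.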
Interspecies sequence comparison
Comparing the non-coding sequences of two or more orthologs (usually referred to as phylogenetic footprinting) is an efficient approach for revealing potential regulatory elements. It is based on the high degree of inter-species conservation of regulatory elements, as a result of their tendency to evolve much more slowly than the surrounding sequence, which is less constrained by purifying selection. Comparing the 65 pairs of D. melanogaster and D. pseudoobscura respiratory gene sequences (http://nrg.tigem.it) resulted in the identification of NRG sites conserved in both species and subsequently in the determination of their lower weight score.
located within 800 bp downstream of the transcription-start site of genes, and are preferentially distributed in the first (65%) or second (19%) intron; the remaining sites are distributed between the 5′ untranslated region (UTR) (13%) and 3′ UTR (3%) of the gene. The sequence alignments (http://nrg.tigem.it) show that 90% of D. melanogaster NRG sites found in respiratory genes are conserved in their D. pseudoobscura counterparts in the same subgenic position, whereas the surrounding noncoding sequences show remarkable divergence at the sequence level. The conserved single NRG sites have a strong bias for location within the region 160–280 bp downstream of the transcription-start site (Figure 1c). This result is particularly interesting because of its consistency and completeness.

To evaluate the distribution and significance of the NRG element, we downloaded the complete D. melanogaster gene set (http://www.ebi.ac.uk/genomes/eukaryota.html) and performed a pattern-search analysis of the NRG motif across the 800 bp downstream of the transcription-start site. We also performed a control analysis with the reverse NRG (R-NRG) motif (consensus YAATRYATTR). Assuming that the majority of the R-NRG sequences had no biological significance, and because the NRG and R-NRG motifs share the same nucleotide composition, we used the results of the reverse analysis as a measure of the expected background noise. The results showed that NRG sites are overrepresented in the respiratory chain gene set (100%) compared with the complete gene set (15.5%) of D. melanogaster (P < 0.0001). In addition, NRG sites are more abundant than R-NRG sites in the respiratory chain gene set (100% and 15.4%, respectively) (Fisher's exact test P-value < 0.0001). Conversely, NRG sites are slightly less represented than R-NRG sites in the complete gene set (15.5% and 19.2%, respectively), possibly because of a counter-selection that occurred in NRG-like sequences during evolution.
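The comparison of NRG and R-NRG sites in the respiratory gene set can be checked with a small Fisher's exact test. The following is a minimal sketch under stated assumptions: the count of 10 R-NRG-containing genes is back-calculated from the reported 15.4% of 65 genes (raw counts are not given in the text), the two motif scans are treated as two independent groups, and the function name is illustrative. It is not necessarily the exact procedure the authors followed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Probability of observing x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    return sum(
        p for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Rows: motif (NRG, R-NRG); columns: genes with a site, genes without a site.
# 65 of 65 respiratory genes contain NRG sites (100%);
# 10 of 65 is assumed for R-NRG, derived from the reported 15.4%.
p_value = fisher_exact_two_sided(65, 0, 10, 55)
print(p_value < 0.0001)
```

With these assumed counts the p-value is many orders of magnitude below the 0.0001 bound quoted in the text, so the conclusion does not hinge on the exact rounding of 15.4%.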
[Figure 2 diagram labels: Nucleus; Mitochondrion; NRG-mediated control of expression; 65 respiratory genes (complexes I–V); >100 genes with a mitochondrial function; other OXPHOS proteins; citric acid cycle (Krebs cycle); mitochondrial metabolism; mt rib pp, mtEF-Ts, mtEF-Tu, mt RNA pol, mtTFA, mtSSB, DNA pol γ-β; mtDNA, replication, transcription, mtRNAs, translation. TRENDS in Genetics]
Figure 2. A proposed mechanism for coordination of nuclear and mitochondrial energy genes in Drosophila. The nuclear respiratory gene (NRG) element is shared by 65 nuclear respiratory genes and by many genes encoding mitochondrial proteins that are involved in energy pathways (blue arrows), connecting these genes in a common regulatory circuit. The amount of the 13 mitochondrial-encoded respiratory subunits depends on: (i) the number of mitochondrial DNA (mtDNA) molecules per mitochondrion (i.e. on the rate of mtDNA replication); (ii) the rate of mtDNA transcription; and (iii) the efficiency of translation of the mitochondrial transcripts (red arrows). All of these processes involve, as catalytic or enhancing factors, the products of genes containing NRG sites (green arrows), which constitutes a regulatory link between the nuclear and mitochondrial sets of energy genes. Abbreviations: mt rib pp, mitochondrial ribosomal proteins; mtEF-Ts, mitochondrial elongation factor Ts; mtEF-Tu, mitochondrial elongation factor Tu; mt RNA pol, mitochondrial RNA polymerase; mtTFA, mitochondrial transcription factor A; mtSSB, mitochondrial single strand binding protein; DNA pol γ-β, accessory subunit of mitochondrial DNA polymerase.
Taken together, these data demonstrate that the NRG element is greatly enriched in respiratory genes. The distribution, conservation and positioning of NRG sites strongly suggest that nuclear respiratory genes are connected in a common regulatory circuit based on the NRG element, which might be the key through which coordination of nuclear respiratory genes is accomplished.

Nuclear and mitochondrial energy genes are connected in a genetic regulatory network

The comparison of the results from the complete gene set analysis with the MitoDrome [11] gene collection revealed that 187 (54%) genes with a mitochondrial function contain NRG sites, with a bias for energy-related genes (Table 1; supplementary material online). Remarkably, all Krebs cycle genes and OXPHOS genes that integrate the respiratory chain function contain one or more NRG sites. NRG sites are also present in genes involved in pyruvate metabolism and in many genes that encode carriers and enzyme activities (such as amino acid and lipid degradation) that are directly involved in mitochondrial energy pathways.

Importantly, all of the steps that are required to produce the 13 mitochondrial-encoded respiratory subunits (replication of the mitochondrial genome, transcription of mitochondrial mRNAs and their translation by mitochondrial ribosomes) involve the products of genes that contain NRG sites as catalytic and/or enhancing factors. These include: (i) the accessory subunit of mitochondrial DNA polymerase and mitochondrial single strand binding
protein, both of which enhance the activity and processivity of mitochondrial DNA polymerase [14–16]; (ii) mitochondrial transcription factor A, a dual-role factor involved in transcriptional activation of mtDNA and in its replication, packaging and maintenance [17]; (iii) mitochondrial RNA polymerase; (iv) mitochondrial elongation factors, Ts and Tu, which have a key role in mitochondrial protein synthesis by promoting the elongation of the nascent peptide at the active site of the ribosome and enabling the recruitment of energy molecules required in the process [18]; and (v) mitochondrial ribosomal proteins.

Taken together, our results provide evidence of the existence of a regulatory network in which the NRG element constitutes a strong genetic link between the nuclear and mitochondrial sets of energy genes. The working model we propose (Figure 2) might explain, for the first time in a complex organism, both intra- and intergenomic coordinated expression of all known respiratory genes. Furthermore, the inclusion of other key energy genes in the NRG circuit suggests its general involvement in the genetic management of energy production, opening new avenues towards functional studies on mitochondrial biogenesis and bioenergetics.

Other analyses and concluding remarks

Preliminary analysis that focused on the genes encoding the F0F1-ATP synthase (complex V of the respiratory chain) showed that the NRG circuit is conserved in the malaria mosquito Anopheles gambiae (Figure 1; supplementary material online), despite the relatively large
evolutionary distance (~250 million years) that separates it from the Drosophila genus. In mammals, nuclear respiratory factor 1 and 2 (NRF1, NRF2), peroxisome proliferative activated receptor α (PPARα) and Sp1 transcription factor have been characterized as transcriptional regulators of many genes involved in mitochondrial energy pathways, including a subset of the genes that encode subunits of the respiratory chain [5,19–21]. The target genes of NRF1 include constituents of the mtDNA transcription and replication machinery [19], suggesting that NRF1 has an important role in nuclear-mitochondrial communication. Interestingly, none of the target sites of these transcription factors overlaps with the NRG motif, and Drosophila and Anopheles have no obvious homologs of NRF1, NRF2 or PPARα, suggesting that mammals and invertebrates could base the control of energy production on different genetic circuits. Further analysis will be required to investigate the possible occurrence of the NRG circuit in other species.

Life-cycle transcriptional profiles for approximately a third of D. melanogaster genes are publicly available from Yale University (Drosophila development gene expression timecourse; http://genome.med.yale.edu/Lifecycle/) [22]. We performed a statistical analysis to determine whether the presence of NRG sites correlates with gene expression patterns, examining the transcriptional profiles of the following three datasets (details are provided at http://nrg.tigem.it): (i) S1, NRG-containing genes that encode mitochondrial proteins, including OXPHOS genes (a total of 83 genes with expression data available); (ii) S2, a random sample of 100 NRG-containing genes that resulted from the genome-wide analysis and are not included in S1; and (iii) S3, a random sample of 100 genes that do not contain an NRG site in the analyzed region.
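For orientation, the within-dataset comparison applied to S1–S3 in the analysis that follows rests on an average pairwise distance between expression profiles. The sketch below is a minimal illustration, assuming the Pearson correlation distance is defined as 1 − r (the exact definition used by EPCLUST is not stated in the text). The expression profiles are made-up toy values, not data from the study, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def average_pairwise_distance(profiles):
    """Mean of (1 - r) over all unordered pairs of profiles.

    Assumes distance = 1 - r: 0 for perfectly correlated profiles,
    1 for uncorrelated ones, 2 for perfectly anti-correlated ones.
    """
    distances = [1.0 - pearson_r(p, q) for p, q in combinations(profiles, 2)]
    return sum(distances) / len(distances)


# Toy profiles (hypothetical values across four time points).
coordinated = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5]]
unrelated = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 1, 3]]

print(average_pairwise_distance(coordinated) < average_pairwise_distance(unrelated))
```

A lower average distance within a gene set means its members rise and fall together across the time course, which is the sense in which the datasets are compared.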
The clustering analysis was conducted using the EPCLUST software (available at the European Bioinformatics Institute; http://ep.ebi.ac.uk/EP/EPCLUST/). For each dataset the pairwise distances and the average distance were computed using the Pearson correlation distance function. The average distance among expression profiles of S1 (0.719) is significantly lower than the average distance of S3 (0.980) (two-sample t test P-value < 0.0001 assuming equal variance), showing that a significant correlation exists among expression profiles of the NRG-containing genes that encode mitochondrial proteins. This finding strongly supports a functional role of the NRG element in the transcriptional regulation of these genes, and provides experimental evidence for the model of gene coordination we propose. The average distance among expression profiles of S2 (0.963) is greater than S1, and only slightly lower than S3 (P = 0.016), indicating a weaker correlation among the expression patterns of S2 genes. This suggests that a subset of the NRG sites identified in the genome-wide analysis could be false positives (i.e. nonfunctional sequences, as commonly occurs in computational searches of regulatory signals [23]). In addition, some of the NRG-containing genes could belong to regulatory circuits based on the synergy between NRG sites and several other regulatory elements, which would result in different gene-specific expression profiles. Distinguishing functional NRG sites from neutrally evolving DNA, and characterizing new
regulatory elements that are involved with the NRG element in the synergistic control of gene expression, will be among the next challenges in dissecting the genetic circuitry of energy production.

Acknowledgements
This work was supported by Ministero Istruzione, Università e Ricerca (Italy) and by Telethon. We thank Graciana Diez Roux for critical reading of the manuscript.
Supplementary data
Supplementary data associated with this article can be found at doi:10.1016/j.tig.2004.11.009
References
1 Saraste, M. (1999) Oxidative phosphorylation at the fin de siècle. Science 283, 1488–1493
2 Attardi, G. and Schatz, G. (1988) Biogenesis of mitochondria. Annu. Rev. Cell Biol. 4, 289–333
3 Tullo, A. et al. (1994) Transcription of rat mitochondrial NADH-dehydrogenase subunits. Presence of antisense and precursor RNA species. FEBS Lett. 354, 30–36
4 Kaufmann, P. et al. (1996) Mitochondrial DNA and RNA processing in MELAS. Ann. Neurol. 40, 172–180
5 Scarpulla, R.C. (2002) Transcriptional activators and coactivators in the nuclear control of mitochondrial function in mammalian cells. Gene 286, 81–89
6 Duborjal, H. et al. (2002) Large functional range of steady-state levels of nuclear and mitochondrial transcripts coding for the subunits of the human mitochondrial OXPHOS system. Genome Res. 12, 1901–1909
7 Pena, P. et al. (1995) Analysis of the mitochondrial ATP synthase beta-subunit gene in Drosophilidae: structure, transcriptional regulatory features and developmental pattern of expression in Drosophila melanogaster. Biochem. J. 312, 887–897
8 Talamillo, A. et al. (1998) Expression of the nuclear gene encoding mitochondrial ATP synthase subunit alpha in early development of Drosophila and sea urchin. Mol. Biol. Rep. 25, 87–94
9 GuhaThakurta, D. et al. (2002) Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. 12, 701–712
10 Halfon, M.S. et al. (2002) Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12, 1019–1028
11 Sardiello, M. et al. (2003) MitoDrome: a database of Drosophila melanogaster nuclear genes encoding proteins targeted to the mitochondrion. Nucleic Acids Res. 31, 322–324
12 Wingender, E. et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29, 281–283
13 Haas, N.B. et al. (1995) DNA-binding specificity of the PAR basic leucine zipper protein VBP partially overlaps those of the C/EBP and CREB/ATF families and is influenced by domains that flank the core basic region. Mol. Cell. Biol. 15, 1923–1932
14 Thommes, P. et al. (1995) Mitochondrial single-stranded DNA-binding protein from Drosophila embryos. Physical and biochemical characterization. J. Biol. Chem. 270, 21137–21143
15 Wang, Y. and Kaguni, L.S. (1999) Baculovirus expression reconstitutes Drosophila mitochondrial DNA polymerase. J. Biol. Chem. 274, 28972–28977
16 Schultz, R.A. et al. (1998) Differential expression of mitochondrial DNA replication factors in mammalian tissues. J. Biol. Chem. 273, 3447–3451
17 Larsson, N.G. et al. (1998) Mitochondrial transcription factor A is necessary for mtDNA maintenance and embryogenesis in mice. Nat. Genet. 18, 231–236
18 Cai, Y.C. et al. (2000) Interaction of mitochondrial elongation factor Tu with aminoacyl-tRNA and elongation factor Ts. J. Biol. Chem. 275, 20308–20314
19 Kelly, D.P. and Scarpulla, R.C. (2004) Transcriptional regulatory circuits controlling mitochondrial biogenesis and function. Genes Dev. 18, 357–368
20 Li, R. et al. (1996) Expression of the human cytochrome c1 gene is controlled through multiple Sp1-binding sites and an initiator region. Eur. J. Biochem. 241, 649–656
21 Gulick, T. et al. (1994) The peroxisome proliferator-activated receptor regulates mitochondrial fatty acid oxidative enzyme gene expression. Proc. Natl. Acad. Sci. U. S. A. 91, 11012–11016
22 Arbeitman, M.N. et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297, 2270–2275
23 Jegga, A.G. et al. (2002) Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12, 1408–1417
24 Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577
25 Pavesi, G. et al. (2001) An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics 17(Suppl. 1), S207–S214
26 Pavesi, G. et al. (2004) Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 32, W199–W203
27 Grillo, G. et al. (2003) PatSearch: A program for the detection of patterns and structural motifs in nucleotide sequences. Nucleic Acids Res. 31, 3608–3612

0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tig.2004.11.009
Modular analysis of the transcriptional regulatory network of E. coli
Osbaldo Resendis-Antonio, Julio A. Freyre-González, Ricardo Menchaca-Méndez, Rosa M. Gutiérrez-Ríos, Agustino Martínez-Antonio, Cristhian Ávila-Sánchez and Julio Collado-Vides
Program of Computational Genomics, Nitrogen Fixation Research Center, Universidad Nacional Autónoma de México, Ave Universidad s/n, Col Chamilpa, Cuernavaca, Morelos 62100 México
The transcriptional network of Escherichia coli is currently the best-understood regulatory network of a single cell. Motivated by statistical evidence suggesting a hierarchical modular architecture in this network, we identified eight modules with well-defined physiological functions. These modules were identified by a clustering approach, using the shortest path to trace regulatory relationships across genes in the network. We report the type (feed-forward and bi-fan) and distribution of motifs between and within modules. Feed-forward motifs tend to be embedded within modules, whereas bi-fan motifs tend to link modules, supporting the notion of a hierarchical network with defined functional modules.

Corresponding author: Collado-Vides, J. ([email protected]). Available online 25 November 2004

There is experimental evidence suggesting that, at certain times, different fractions of the complete set of transcriptional factors (TFs) are used depending on the growth conditions within the cell [1,2]. The global topological analysis of transcriptional networks supports the notion of their organization into modules or large groups of genes that, in many cases, respond to external or internal stimuli [2,3]. A network with a scale-free property is expected to be robust to failure of individual components, and could contribute to the ability of the cell to respond to changes in environmental and evolutionary conditions [4–9]. At a finer topological scale, over-represented topological units called network motifs, consisting of three or four genes, contribute to the local dynamic behavior of
transcriptional regulation [10,11]. Despite studies that are focused on motifs and modules, little is known about how these motifs can integrate to construct modular structures and whether their components correspond to subsets of genes that regulate integrated cellular responses to external conditions.

It was recently reported that the transcriptional regulatory network of Escherichia coli is a scale-free network consisting of a hierarchy of modules [12]. In this article, we describe our reconstruction of eight modules (Figure 1a–d) with clearly defined physiological functions. In addition, using the fraction of feed-forward (FF) and bi-fan (BF) motifs [11] that are shared between two modules, we quantified the overlap between them (Figure 2a–f). We found that the largest module, which is involved in carbon sources, has the greatest number of connections via motifs with other modules.

We analyzed the network of known transcriptional interactions of E. coli K-12 that were organized in RegulonDB [13] (http://www.cifn.unam.mx/Computational_Genomics/regulondb/). Neglecting genes without experimental evidence indicating that they encode TFs (supplementary material online), and ignoring autoregulation, the total number of TFs that regulate the expression of other TFs is 55, all of which control the expression of 747 genes. Based on the number of genes in the genome, and the total number of estimated TFs (~320), we estimate that this fraction represents ~18% of the transcriptional network in E. coli. In our graphical representation, vertices represent genes and the transcriptional interactions between them are represented by edges.
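The graph representation described above, together with the shortest-path idea mentioned as the basis of the clustering approach, can be sketched as follows. This is a minimal illustration only: the gene and TF names are hypothetical placeholders rather than real E. coli interactions, the network is stored as a directed adjacency mapping from regulator to targets, and breadth-first search is used as one standard way to obtain shortest path lengths. The details of the authors' clustering procedure are not given in this section.

```python
from collections import deque

# Hypothetical toy network: each key is a regulator (vertex) and each value
# lists the genes it regulates (directed edges). Names are placeholders.
network = {
    "tfA": ["tfB", "gene1"],
    "tfB": ["gene2", "gene3"],
    "tfC": ["gene3"],
}


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest directed path, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


print(shortest_path_length(network, "tfA", "gene2"))
print(shortest_path_length(network, "tfC", "gene1"))
```

Pairwise path lengths of this kind could then serve as a distance matrix for a clustering step; whether edges are followed in one or both directions is a modelling choice left open here.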