
J. theor. Biol. (1996) 178, 183–204

Evolution of Transcriptional Regulation System through Promiscuous Coupling of Regulatory Proteins with Operons; Suggestion from Protein Sequence Similarities in Escherichia coli

J O†, H W‡ and K T. M†

† Department of Applied Biological Science, Faculty of Science and Technology, Science University of Tokyo, Noda 278 and ‡ Research Institute for Biosciences, Science University of Tokyo, Noda 278, Japan

(Received on 12 September 1995, Accepted in revised form on 31 August 1995)

As a molecular approach to the problems of the evolution of organisms, the transcriptional regulation system is studied by investigating the amino acid sequence similarities between the proteins in the regulation system of Escherichia coli, for which sequenced protein data as well as regulator–regulon relationships have been accumulated. The similarities between the proteins are calculated by the FASTA algorithm, and their homology is further evaluated in terms of statistical significance with the RDF2 program. This investigation reveals that similarity is hardly ever found between a regulatory protein and a regulated protein, whereas many similarities are found among regulatory proteins and among regulated proteins. These similarity relations are compared with the regulator–regulon relationships ascertained experimentally. The comparison shows that similar regulatory proteins rarely regulate the transcription of similar protein genes. As most of the highly similar proteins are considered to have diverged from a common ancestral protein, this finding strongly suggests that descendant regulatory proteins have been promiscuously coupled with descendant operons, independently of their ancestral regulator–regulon relationship, and that some of the couplings have been fixed by selection to form the present system of transcriptional regulation. The compatibility of such promiscuous coupling with regulatory organization is illustrated by the carbohydrate transport systems and the succeeding metabolic pathways, which are organized comprehensively to send nutrients to the central pathway of glycolysis under different environmental conditions. The benefit of flexibility in regulator–regulon relationships in evolutionary processes is also discussed in connection with the punctuational divergence of species in macroevolution and with cell differentiation in multicellular organisms.

© 1996 Academic Press Limited

‡ Present address: Association for Propagation of the Knowledge of Genetics, Yata 1171–195, Mishima 411, Japan.

Introduction

Studies of molecular evolution started from the enumeration of amino acid changes in proteins that are called by the same name but derived from different species. This produced a simple law, the rate constancy of amino acid changes in diverged species (Zuckerkandl & Pauling, 1965), and stimulated the construction of phylogenetic trees of species. The neutral hypothesis has argued that the rate constancy is reasonable if the amino acid changes mainly result from the fixation of selectively neutral mutants by random drift in individual species (Kimura, 1968; King & Jukes, 1969). As sequencing techniques have become widespread, the amount of sequence data is now large enough to make it possible to investigate protein evolution itself, opening the second stage of studies of molecular evolution. First of all, various methods have been devised to facilitate similarity searches among the sequence data stored in databases (Smith & Waterman, 1981; Lipman & Pearson, 1985; Pearson & Lipman, 1988; Myers & Miller, 1988). It is now generally accepted to cluster proteins into a “family” if they carry mutually similar amino acid sequences and show similar biological activities. The term “superfamily” has also been proposed for a set of proteins that share one or more consensus sequence motifs, presumably responsible for similar activities, even if they differ in total sequence length (Dayhoff et al., 1978). A more elaborate method of illustrating the divergence pattern and selective mode in proteins has also been proposed, discriminating between neutral amino acid changes and adaptive ones (Horimoto et al., 1990; Otsuka et al., 1993). In parallel with the identification of homologous proteins probably generated by gene duplication, the increasing sequence data have also stimulated proposals of other mechanisms for expanding the repertoire of protein functionality, e.g., exon shuffling (Gilbert, 1978) and the generation of a new gene from the antisense strand of a pre-existing gene (Kunisawa & Otsuka, 1987). Although these mechanisms of molecular evolution should themselves be investigated further, knowledge of molecular evolution is now becoming a powerful tool for gaining insight into biological phenomena. Among such phenomena, transcriptional regulation is noteworthy in the sense that a slight change in a regulatory protein should have a great influence on the expression of many protein genes in the regulon. This important role of transcriptional regulation in the evolutionary process has already been pointed out (Paigen, 1986), but that discussion was restricted to modes based on the regulatory polymorphisms within contemporary species.
In addition to the polymorphisms, there are many recent examples of sequence similarities between transcriptional regulatory proteins (Henikoff et al., 1988; Stock et al., 1989; Tobin & Schleif, 1990; Weickert & Adhya, 1992), and this is also the case for proteins regulated at the transcriptional level (Higgins et al., 1988; Saier, 1989). This suggests that the repertoire of regulatory proteins and that of regulated proteins have each been enlarged by the simple and well-known mechanism of gene duplication. In the present paper, we study the regulator–regulated protein relationships by investigating the amino acid sequence similarities between regulatory proteins and those between regulated proteins, as well as the similarities between regulatory proteins and regulated proteins. If most of the similar proteins are considered to be homologous, i.e., to have diverged from a common ancestral protein, a systematic investigation along this line may be expected to provide information about how the regulator–regulon relationships have developed during the enlargement of the repertoire of both regulatory proteins and regulated protein genes. For this purpose, we take up the case of Escherichia coli, as sequenced protein data as well as regulator–regulon relationships are well documented for this organism.

Although most experimental interest in transcriptional regulation is now directed to cell differentiation in multicellular organisms, studies in prokaryotes are also fruitful and still provide information for eukaryote studies. Even if we confine ourselves to transcriptional regulation in E. coli, we can expect to elucidate an evolutionary character of transcriptional regulation that may be common, in its essence, to both prokaryotes and eukaryotes. In fact, the increasing examples of transcriptional regulation in E. coli are revealing features common to those in eukaryotes. For example, the helix-turn-helix sequence motif for DNA binding has been found in many transcriptional regulators of both prokaryotes and eukaryotes (Steitz et al., 1982; Ronson et al., 1987; Kahn & Ditta, 1991; Weickert & Adhya, 1992; Irvine & Guest, 1993), and there are some operons, each under the transcriptional regulation of two or more kinds of regulatory proteins (Collado-Vides et al., 1991), a mode of regulation that is equally confirmed or suggested in eukaryotes. It is also a common feature that the nucleotide sequences recognized by a given regulatory protein are not tightly constrained but show considerable variety. Although consensus sequences have been proposed to cover this variety, they are often revised upon the discovery of new binding sites (Ames & Nikaido, 1985; Urbanowski & Stauffer, 1989; Goransson et al., 1989; Rampersaud et al., 1989; He et al., 1990; Rolfes & Zalkin, 1990). This feature of transcriptional regulation will also be discussed in connection with the results of the present study.

Method

The information about regulatory proteins and the protein genes regulated by them is obtained from a recent review of transcriptional regulation in E. coli (Collado-Vides et al., 1991), the SWISS-PROT Protein Sequence Database, Release 23 (1992), and other original papers published recently, each of which will be discussed in the following sections. The amino acid sequence data of these regulatory proteins and regulated proteins are compiled from original papers published recently, as well as the SWISS-PROT

Protein Sequence Database. The sequence data of E. coli proteins thus stored amount to about one-third of all the proteins encoded by the E. coli genome, and contain about 120 kinds of regulatory proteins and about 430 kinds of proteins under the control of regulatory proteins. The similarity scores between the proteins are calculated by the FASTA algorithm (Pearson & Lipman, 1988), and the proteins are clustered on the basis of their similarity scores by our method (Watanabe & Otsuka, 1995), which is essentially the same as the single-linkage clustering method (Romesburg, 1989) but was developed for the comprehensive representation of similarity relations among large numbers of proteins. Using this method, our preliminary investigation ascertained that most of the protein families and/or superfamilies reported so far are clustered as separate groups by similarity scores of more than 100. Thus, the clustering of proteins is carried out by setting the threshold of similarity score to 100, and more similar proteins are arranged at nearer positions. For every pair of clustered proteins, homology is further examined by evaluating the statistical significance with the RDF2 program (Pearson & Lipman, 1988) after shuffling the amino acid sequence of the counterpart 200 times. This investigation reveals that similarity sufficient to guarantee homology is hardly ever found between a regulatory protein and a regulated protein, except for the relation between the repressor RbsR and the regulated operon rbsDACBK, where RbsR has been reported to show considerable similarity to the ribose binding protein RbsB (Mauzy & Hermodson, 1992), and for some of the regulatory proteins that are known to be self-regulated at the level of transcription. Moreover, there is hardly any case where a transcriptional regulatory protein regulates the transcription of other transcriptional regulatory protein genes.
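The shuffling test described above can be illustrated with a short sketch. This is not the RDF2 program itself: `toy_score` is a hypothetical stand-in for the FASTA similarity score (it merely counts shared dipeptides), and the names `shuffle_z`, `n_shuffles` and `seed` are introduced here for illustration. The sketch only shows how an observed score may be expressed in S.D. units relative to the scores obtained against shuffled versions of the counterpart sequence.

```python
import random
import statistics


def toy_score(a: str, b: str, k: int = 2) -> int:
    """Stand-in similarity: number of k-tuples of `a` that also occur in `b`.

    This is NOT the FASTA score; it only makes the sketch runnable.
    """
    tuples_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return sum(a[i:i + k] in tuples_b for i in range(len(a) - k + 1))


def shuffle_z(query: str, target: str, score=toy_score,
              n_shuffles: int = 200, seed: int = 0) -> float:
    """Observed score minus the mean shuffled score, in S.D. units."""
    rng = random.Random(seed)
    observed = score(query, target)
    residues = list(target)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        null_scores.append(score(query, "".join(residues)))
    mean = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:  # degenerate case: every shuffle gave the same score
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Shuffling preserves the amino acid composition of the counterpart, so a large value indicates that the observed score depends on residue order rather than on composition alone.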
Thus, the result of clustering can be represented for the regulatory proteins and regulated proteins separately, although some of the regulatory proteins, if they are self-regulated, are also included in the latter category. Such a representation is convenient for the present purpose of investigating whether similar regulatory proteins regulate the transcription of genes for proteins with similar sequences and functions. The regulated proteins are collected in units of operons before their clustering. This collection is made by inference from the distance between the protein genes in E. coli genomic databases (Kunisawa et al., 1990; Rudd et al., 1991) as well as from the literature on transcriptional regulation mentioned in the first part of this section. The clustering of operons is carried out on the basis of the highest similarity score among the scores calculated for all the pairs of proteins between the operons. The clustering of regulated proteins in operon units thus obtained is then compared with the result of clustering the regulatory proteins, with an indication of the nature of regulation. Collecting the regulated proteins in operon units is convenient for a brief representation of this comparison.

Results

The result of clustering the regulatory proteins is shown in Fig. 1, where the regulatory proteins that show higher similarity scores are arranged at nearer positions, and the similarity score calculated between two proteins is denoted by the size of a closed square at the corresponding non-diagonal element. In this matrix representation, only one-half of the non-diagonal part is sufficient to show the similarity relations, and the regulatory proteins are denoted by gene names in the diagonal parts. As expected, 62 regulatory proteins out of the 123 treated in the present study are clustered into 12 groups. The regulatory proteins clustered into the same group have probably diverged from a common ancestral protein, retaining a similar structure and functional manner. In fact, most of the regulatory proteins clustered into the same group have amino acid sequences of similar length and carry similar functional domains, although the regulatory proteins in Group 1 can be divided into at least three subgroups (a), (b) and (c) by the pattern of clustering. To demonstrate their similarities, the amino acid sequences of the regulatory proteins clustered into the same group or subgroup are aligned in Fig. 2, together with an indication of functional domains.
Since its initial identification in the structural comparison between CRP and Cro (Steitz et al., 1982), the helix-turn-helix DNA-binding motif has been detected in many regulatory proteins: those of Group 1(a) (Ronson et al., 1987), Group 1(c) (Kahn & Ditta, 1991), Group 2 (Drummond et al., 1990; Gallegos et al., 1993), Group 3 (Thöny et al., 1991; Viale et al., 1991; Sung & Fuchs, 1992), Group 4 (Weickert & Adhya, 1992), Group 5 (von Bodman et al., 1992), Group 6 (Reizer et al., 1991), Group 8 (Lonetto et al., 1992), Group 9 (Irvine & Guest, 1993) and Group 12 (Willins et al., 1991). The region of such a DNA-binding motif is, however, only a small portion of the whole amino acid sequence of any regulatory protein, and the regulatory proteins clustered into the same group or subgroup are also similar over much longer regions than the DNA-binding motif. This probably means that the regulatory proteins in the same group or subgroup are similar to each other in their functional manner, including the protein–protein interaction as well as the DNA-binding domain and signal–receptor domain.

The regulatory proteins in Group 1 are mainly response regulators of two-component regulatory systems and their derivatives (Stock et al., 1989). Although the similarity of phosphorylation domains leads to the aggregation of these response regulators into Group 1, the differences in other regions cause their division into three subgroups (a), (b) and (c). The regulatory proteins in subgroup (a) share a relatively long putative domain for the interaction with sigma factor (Austin et al., 1991; Austin & Dixon, 1992). The regulatory proteins of subgroup (b), except for CadC, are response regulators coupled with proteins sensing environmental stimuli (Stock et al., 1989), and they are highly similar to each other even in the C-terminal region, which probably contains the DNA-binding site. NarL, FimZ, UhpA and RcsB of subgroup (c) also clearly fall into the category of response regulators, judging from the presence of the phosphorylation domain, whereas SdiA and MalT are similar to these four regulatory proteins only in the C-terminal region containing the helix-turn-helix motif.

The proteins in Group 2 belong to the XylS/AraC family (Tobin & Schleif, 1990) and carry the helix-turn-helix motif in the region about one-third from the C-terminus (Gallegos et al., 1993), although the upstream region is missing in SoxS. The mutual similarity of the regulatory proteins in Group 3 has led to the proposal of the LysR family (Henikoff et al., 1988). All the regulatory proteins in Group 4 are members of the LacI family (Weickert & Adhya, 1992). It is also seen in Groups 5–7 and 9–12 that the regulatory proteins clustered into the same group are similar to each other in their amino acid sequences. The proteins in Group 8 are sigma factors of RNA polymerase, and two of them, RpoS and RpoH, are highly similar to the C-terminal half of RpoD, where the recognition sites for the −10 and −35 promoter regions have been identified (Lonetto et al., 1992).

The clustering of the proteins whose genes are under transcriptional regulation is shown in Fig. 3 in units of operons. Although the proteins encoded within an operon are not necessarily similar to each other in their amino acid sequences but are mostly associated physiologically, similar sets of proteins are found between many operons, suggesting their generation by duplication of the operon unit. The similarity score denoted between operons in the figure is the highest among the similarity scores calculated for all the pairs of proteins between the operons. The pair of proteins giving rise to the highest similarity score, and which is thus responsible for the clustering of operons, is shown beside the

F. 1. Single linkage clustering of regulatory proteins on the basis of their amino acid sequence similarity. The proteins are clustered by setting the threshold to the similarity score of 100. The groups of clustered proteins are numbered, according to the group size that is denoted in brackets, e.g., Group 1 consists of 17 proteins, Group 2 of 12 proteins and so on. The similarity scores calculated between the proteins in each group are denoted by 10 ranks of closed square sizes when they are evaluated to be statistically significant with more than 6.0 : , 100–199; , 200–299; Q, 300–399; Q,400–499; Q, 500–599; Q, 600–699; Q, 700–799; Q, 800–899; Q, 900–999; Q, 1000 and higher. The similarity score denoted by an open square indicates that the statistical significance is calculated to be less than 6.0 . Q

   


F. 2. Homologous alignment of the amino acid sequences of regulatory proteins by every group. The amino acid sequence of each protein is represented by a horizontal thick line, with some gaps needed in the homologous alignment. The amino acid residue conserved by more than 60% within the same group or subgroup is indicated by a vertical line. The conservation is evaluated on the basis of the six categories of biochemical properties of amino acid residues (Otsuka et al., 1992), which are slightly modified from the ones proposed by Dickerson (1980). For reference, the total number of constituent amino acid residues is denoted in parenthesis with the protein gene name. The functional domains ascertained experimentally and/or suggested by sequence similarity are denoted by the following symbols: PD, phosphorylation domain; SFID, sigma factor interacting domain; and HTH, helix-turn-helix DNA-binding motif. For the references of these domains, see the text.


F. 3. Clustering of proteins regulated at the transcriptional level. For convenience, the protein is denoted by gene name together with the other protein genes encoded by the same operon. The similarity score denoted between the operons is the highest of the scores that are calculated for all the pairs of proteins between the operons. The pair of proteins giving rise to the highest similarity score, and thus responsible for the clustering of operons, are denoted by thin letters in the row outside the matrix. The similarity score and statistical significance are denoted by the same symbols as those in Fig. 1.

operon name. Such high similarity scores are mostly calculated between transport proteins of the same type, between proteins carrying the same coenzyme, or between enzyme proteins catalysing similar reactions. The largest group, Group 1, has some connections between operons encoding different types of proteins. Apart from these connections, which should be examined more carefully by a more elaborate method, the operons encoding proteins similar in both enzymatic function and amino acid sequence are aggregated as denser subclusters. Such subclusters are denoted as subgroups (a–h). Subgroup (a) consists of operons each encoding a transcriptional regulator that is included in the regulatory Group 4 of Fig. 1. Subgroup (b) is the subcluster of operons encoding the components of high affinity transport systems (HATSs). Most of these operons are clustered by the high similarity of their hydrophilic components, which have been shown to be members of the ATPase superfamily (Higgins et al., 1988), but they also encode other partner components such as hydrophobic components, periplasmic components and, further, outer membrane proteins. Considerable degrees of similarity are also seen between corresponding components other than the hydrophilic ones, and some operons such as btuB, cirA, fepA and fecB, each of which solely encodes hydrophobic components or outer membrane proteins, are clustered as subgroup (c). The ion-driven transporters, or permeases, are not as similar to one another in their amino acid sequences, but most of them are linked in series with similarity scores of more than 100. Thus, the operons encoding them are denoted as subgroup (d). Although the operon glpFK is also incorporated into subgroup (d) together with the ion-driven transporter operons, this is because the enzyme protein GlpK, which catalyses the reaction from glycerol to glycerol-3-phosphate, is similar to the enzyme protein FucK encoded by the ion-driven transport operon fucPIK. GlpF is a unique protein known only for the uptake of glycerol by simple diffusion, and it does not show similarity to any of the ion-driven transporters. Subgroup (e) is denoted for the operons encoding the transport proteins of phosphoenolpyruvate-dependent phosphotransferase systems (PTSs). This clustering is mainly due to the high similarity of Enzyme II-like domains and partly due to the similarity of shorter Enzyme III-like domains in some cases. Although PtsG and Crr also show similarities to the PTS enzymes of subgroup (e), the transcriptional regulation of the ptsG and crr genes is not known and they are thus not listed in Fig. 3. In this way, the transport proteins for the uptake of various substances across the cytoplasmic membrane are divided into subgroups in accordance with the classification proposed by Cronan et al. (1987), except for GlpF. Among the other proteins in Group 1, subgroup (f) is denoted for a mixture of lyases, ligases and transferases, subgroup (g) for the transcriptional regulatory proteins that correspond to those of regulatory Group 3 in Fig. 1, and subgroup (h) for oxidoreductases. In comparison with Group 1, the other groups are much smaller in size.
Groups 2, 3 and 17 contain the transcriptional regulators that are shown as Groups 1, 2 and 9, respectively, in Fig. 1. Remaining groups are also characterized mostly by the similarity in enzymatic function of member proteins: Group 4, pyridoxal phosphate dependent transferases; Group 5, acyltransferases and oxidoreductases with lipoyl or FAD cofactor; Group 6, porins; Group 7, lyases in aromatic amino acid synthesis; Group 8, oxidoreductases; Group 9, fimbrial proteins; Group 10, DNA repair proteins; Group 11, the subunits of ornithine carbamoyltransferases in arginine biosynthesis; Groups 12 and 13, lyases in tricarboxylic acid cycle; Group 14, lyases in branched chain amino acid biosynthesis; Group 15, lyases in methionine biosynthesis and Group 16, transferases in cysteine biosynthesis.

The similarity relations between the regulatory proteins and those between regulated operons thus obtained are compared with the regulator–regulon relationships ascertained experimentally. This comparison is shown in Table 1, where the sources of the regulatory relations are cited with upper and lower case letters, except for citations from the SWISS-PROT Protein Sequence Database. As seen in this table, the type of regulation is not intrinsic to an individual regulatory protein, but depends on the operon. For every regulatory protein, therefore, the type of regulation is denoted at the regulated operon by the symbol “+” or “−” according to activation or repression. The symbol “±” means that activation and repression alternate, depending on the presence or absence of inducer. An operon denoted by “±” probably carries binding sites that are chosen differently by the regulatory protein according to the presence or absence of the inducer. In Table 1, the regulatory proteins are arranged in the same order as in Fig. 1 and the operons in an order consistent with that in Fig. 3, although some regulatory proteins are excluded from the table when the protein genes regulated by them are not well identified or not yet sequenced. The sigma factor RpoD is also omitted, because this factor is associated with the transcription of all operons except those transcribed by RpoS and RpoH. The regulatory proteins or regulated operons not clustered by similarity scores higher than 100 are listed as “others” in the table when the counterpart operons or regulatory proteins are grouped. One of the most remarkable features shown in Table 1 is that similar regulatory proteins clustered into the same group do not necessarily regulate the transcription of similar operons clustered into the same group.
On the contrary, it seems probable that the regulator–regulon relationships are independent of the similarity relations between the regulatory proteins and of those between operons. This characteristic feature is clearly seen from the comparison of regulatory proteins that regulate a large group or subgroup of operons. For example, the regulatory proteins that are associated with the transcription of operons encoding affinity transport systems are scattered over the regulatory Groups 1–4, 8 and 9, and the regulatory proteins for ion-driven transport operons are scattered over the regulatory Groups 1–5 and 9. This tendency is also seen in smaller subgroups or groups. In practice, it is rather rare that the same group of regulatory proteins regulate the transcription of some specific group or subgroup of operons, except for the case of self-regulation.
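One way to make this comparison concrete is to count, among pairs of regulatory proteins that fall in the same similarity group, how many regulate operons from a common operon group. The sketch below is only an illustration of that bookkeeping and is not taken from the paper: the function name `shared_target_fraction`, the regulator and operon labels, and the group assignments are all hypothetical, and self-regulation is not treated separately.

```python
from itertools import combinations


def shared_target_fraction(reg_group, operon_group, regulates):
    """Fraction of same-group regulator pairs whose targets share an operon group.

    reg_group:    regulator -> regulator group label
    operon_group: operon -> operon group label (ungrouped operons are omitted)
    regulates:    regulator -> set of regulated operons
    """
    same_group_pairs = 0
    sharing_pairs = 0
    for r1, r2 in combinations(sorted(reg_group), 2):
        if reg_group[r1] != reg_group[r2]:
            continue
        same_group_pairs += 1
        groups_1 = {operon_group[o] for o in regulates[r1] if o in operon_group}
        groups_2 = {operon_group[o] for o in regulates[r2] if o in operon_group}
        if groups_1 & groups_2:
            sharing_pairs += 1
    return sharing_pairs / same_group_pairs if same_group_pairs else float("nan")


# Hypothetical data, for illustration only.
reg_group = {"regA": 1, "regB": 1, "regC": 1, "regD": 2}
operon_group = {"op1": "1b", "op2": "1d", "op3": "1b", "op4": 4}
regulates = {"regA": {"op1"}, "regB": {"op2"}, "regC": {"op3", "op4"}, "regD": {"op1"}}
fraction = shared_target_fraction(reg_group, operon_group, regulates)
```

A fraction close to 1 would correspond to regulators and regulons that diverged together, whereas a low fraction corresponds to the independence described in the text.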

According to current knowledge of molecular evolution, most of the regulatory proteins clustered into the same group or subgroup by the present threshold of similarity score are considered to be homologous, i.e., they have diverged from a common ancestral protein. The operons encoding similar

T 1 Comparison of clustering result with regulator-regulor relations ascertained experimentally

.   .

192

T 1 (continued)

   

T 1 (continued)

193

.   .

194 T 1 (continued)

+, activation; −, repression; ±, activation or repression; w, self-regulation; *, involved in transcriptional regulation but detailed property is unknown.

(a) Collado-Vides et al. (1991); (b) Andrews et al. (1991); (c) Silver & Walderhaug (1992); (d) Hendrickson et al. (1990); (e) Kredich (1992); (f) Geerse et al. (1989); (g) Lin (1976); (h) Stewart & Parales (1988); (i) Xiong et al. (1991); (j) Merkel et al. (1992); (k) Stock et al. (1989); (l) Spiro & Guest (1991); (m) Ganduri et al. (1993); (n) Iuchi & Lin (1993); (o) Byerly et al. (1991); (p) Neuhard & Nygaard (1987); (q) Weissbach & Brot (1991); (r) Newman et al. (1992); (s) Charlier et al. (1992); (t) Higashitani et al. (1993); (u) Iuchi et al. (1990); (v) Magnuson et al. (1993); (w) Dassa et al. (1991); (x) Nakamura & Ito (1993); (y) Bukau (1993); (z) Hanamura & Aiba (1991); (A) Reynolds et al. (1984); (B) Liu & Beacham (1990); (C) De Lorento et al. (1988); (D) Greenberg & Demple (1989); (E) Wilson & Turnbough (1990).

proteins are also considered to have diverged from a common ancestral operon by operon duplication. Thus, it would be expected that regulatory proteins in the same group concentrically regulate the transcription of homologous operons in some specific group, if the ancestral regulator–regulated protein relationship had been inherited by the descendant regulatory proteins and operons. However, such a pattern of coevolution between regulator and regulon is hardly seen in the regulator–regulon relationships projected on the similarity relations, as shown in Table 1. This result strongly suggests the possibility that regulatory proteins have been coupled with operons independently of their lineage, that is, promiscuously, and that some of the couplings have been fixed if they were selectively advantageous.

In order to confirm this possibility, we calculate the similarity scores for all the pairs of regulatory proteins and compare them with the similarity scores calculated between the proteins encoded by their regulons, including the case of similarity scores lower than 100. The result of this comparison is shown in Fig. 4, where all the sets of similarity scores $(x_{ij}, y_{ikjl})$ are plotted. Here, $x_{ij}$ is the similarity score calculated between the regulatory proteins $i$ and $j$, and $y_{ikjl}$ is the similarity score between the protein $k$ regulated by protein $i$ and the protein $l$ regulated by protein $j$. The score sets thus plotted amount to 176742, as far as the available sequence data of E. coli are concerned. If the divergence of regulatory proteins and that of operons had taken place while retaining the original regulator–regulon relationship, the plot of the sets $(x_{ij}, y_{ikjl})$ should lie on a line passing through the origin of the plot $(x_{ij}=10,\ y_{ikjl}=10)$, or should at least form an ellipse along this line. As is easily seen in Fig. 4, however, the plotted sets do not show such a pattern, at least in the range of high similarity scores that have been commonly accepted as the measure of homology. The level of similarity score that can be accepted as homology has been discussed by some

F. 4. Comparison of the similarity scores for every pair of regulatory proteins with the similarity scores of proteins encoded by the corresponding regulons. All the sets of similarity scores (xij , yikjl ) are plotted; xij is the similarity score calculated between the regulatory protein i and the regulatory protein j, and yikjl is the similarity score between the protein k regulated by protein i and the protein l regulated by protein j. The plotting of similarity scores is made in the logarithmic scale. The marginal distributions of the plotting are also shown in sub-diagrams.


authors from the aspect of statistical significance; e.g., a value greater than 3.0 S.D. is possibly significant, greater than 6.0 S.D. probably sufficient, and greater than 10.0 S.D. significant enough to establish homology (Lipman & Pearson, 1985). An S.D. of 8.0–10.0 can reflect a sufficient degree of sequence similarity to establish homology (Tam & Saier, 1993). Although the levels proposed for homology differ somewhat depending on the proteins investigated, most of the proteins that are clustered by similarity scores greater than 100 also show a statistical significance of more than 6.0 S.D., as noted in Figs 1 and 3. For such high similarity scores, probably sufficient to establish homology, the diagram plotted in Fig. 4 clearly shows that similar regulatory proteins rarely regulate the transcription of similar protein genes. In the case of lower similarity scores, this tendency cannot be seen from the diagram alone because of the complexity caused by the great number of plotted points. To resolve this complexity, the marginal distributions of the plotted diagram are also shown in the subdiagrams of Fig. 4; i.e., the number of regulatory protein pairs is plotted against the similarity score of proteins between the regulons in the subdiagram along the ordinate, and the number of regulated proteins for every pair of regulons is plotted against the similarity score between their regulatory proteins in the subdiagram along the abscissa. The marginal distribution along the ordinate seems to be a lognormal distribution with a mean similarity score of 44.2 and an S.D. of 22.5.
The lognormal distribution, whose variable is necessarily positive, has been widely applied in an empirical way for fitting the data, including biological ones such as the sizes of organisms and the numbers of species, but its theoretical derivation is still restricted to a few examples of processes; the law of proportionate effect (Gibrat, 1930; Kalecki, 1945) and the asymptotic result of successive breakage of a particle into randomly sized particles (Kolmogoroff, 1941). Although it is not easy to derive the lognormal distribution theoretically from the protein sequence data, this distribution reconfirms the presence of much more similar regulatory proteins than those expected from random arrangements of amino acid sequences. The similar distribution is also seen in the marginal distribution along the abscissa. Although it contains local peaks at several values of similarity score, such deviation is mainly caused by the difference in regulon size; i.e. some regulons contain much larger numbers of protein genes than the other regulons. In spite of such heterogeneity in regulon size, the

maximum number of regulated protein pairs is also counted at the similarity score of about 43, suggesting that the similarity score around 43–44 is the mean value expected for a collection of numerous proteins. Thus, it is not expected that similar regulatory proteins tend to regulate the transcription of similar protein genes, even if the case of similarity scores lower than 100 is included. In practice, this is ascertained by calculating the correlation coefficient r that is defined by the following formula.

$$
r = \frac{\displaystyle\sum_{i=1,\,j<i}^{N} (x_{ij}-\bar{x}) \sum_{p_k \in R_i,\; p_l \in R_j} (y_{ikjl}-\bar{y})}
{\sqrt{\displaystyle\sum_{i=1,\,j<i}^{N} n_{ij}\,(x_{ij}-\bar{x})^{2}}\;\sqrt{\displaystyle\sum_{i=1,\,j<i}^{N}\;\sum_{p_k \in R_i,\; p_l \in R_j} (y_{ikjl}-\bar{y})^{2}}}, \tag{1}
$$

where $R_i$ means the set of proteins under the control of regulatory protein i and $R_j$ the set of proteins under the control of regulatory protein j. Here, $\bar{x}$ is the mean value of the $x_{ij}$'s for all the pairs of regulatory proteins with the weight of the respective regulon sizes and $\bar{y}$ is the mean value of the $y_{ikjl}$'s for all the pairs of regulated proteins, that is,

$$
\bar{x} = \sum_{i=1,\,j<i}^{N} n_{ij}\,x_{ij} \Bigg/ \sum_{i=1,\,j<i}^{N} n_{ij}, \tag{2}
$$

$$
\bar{y} = \sum_{i=1,\,j<i}^{N}\;\sum_{p_k \in R_i,\; p_l \in R_j} y_{ikjl} \Bigg/ \sum_{i=1,\,j<i}^{N} n_{ij}, \tag{3}
$$

where $n_{ij}$ is the number of all the pairs of proteins, one chosen from $R_i$ and the other from $R_j$. The correlation coefficient r thus defined is calculated to be only 0.0111 for all the plots in Fig. 4. In this way, the individual relations of regulatory proteins with regulated genes or operons themselves seem to be the result of promiscuous couplings. But this does not imply the randomness of regulatory relations. On the contrary, the regulatory relations seem to be well organized if the biological functions of the proteins encoded by the regulated operons and the regulation control at the levels other than transcription are systematically considered. Although the functional blocks that can be sufficiently followed by such consideration are still limited at present, the carbohydrate transport systems and the succeeding metabolic pathways leading to the central pathway of glycolysis seem to be suitable for the illustration of the organization by regulation. These pathways amount to more than 20, and the regulatory proteins for

   

[Figure 5: pathway diagram of the carbohydrate transport systems and the succeeding metabolic pathways leading to the central pathway of glycolysis, annotated with the operons and their transcriptional regulators; the symbols are explained in the caption.]

F. 5. Illustration of how different origins of regulatory proteins are coupled with the operons encoding the transport proteins of carbohydrates and the enzyme proteins in the succeeding metabolic pathways. Extracellular carbohydrates are circled and their products in the central metabolic pathway are boxed. The transportation of an extracellular substance into the cell is denoted by a dashed arrow and the succeeding metabolic pathway by a solid arrow. The operon encoding the transport proteins and/or catabolic enzymes is shown beside the arrow, together with its group number and subgroup name in a square bracket. The operon in parentheses is cryptic in the wild type. The transcriptional regulators for the operons are denoted by the following symbols for simplicity: 1a, PhoB; 1b, UhpA; 1c, MalT; 2a, RhaR; 2b, RhaS; 2c, AraC; 2d, CelD; 2e, MelR; 4a, EbgR; 4b, RafR; 4c, LacI; 4d, GalR; 4e, GalS; 4f, RbsR; 4g, CytR; 4h, AscG; 4i, MalI; 4j, FruR; 5a, FucR; 5b, SrlR; 5c, DeoR; 5d, GlpR; 9a, CRP; 9b, Fnr. Here, the number is the group number denoted in Fig. 1, but the alphabet is used to designate each regulator independently of the subgroup name. The types of regulation: ‘‘+’’, activation; ‘‘−’’, repression; ‘‘2’’, activation or repression depending on the presence or absence of inducers. The enzyme II genes of the phosphoenolpyruvate dependent phosphotransferase systems (PTSs) are denoted by superscript II. The transport proteins known to be inhibited by the presence of PTS carbohydrates are indicated by an asterisk on the corresponding gene name. 
Abbreviations for substances: GLC, glucose; G1P, glucose-1-phosphate; G6P, glucose-6-phosphate; F6P, fructose-6-phosphate; FBP, fructose-1,6-bisphosphate; DHAP, dihydroxyacetone phosphate; GAP, glyceraldehyde-3-phosphate; BPG, 1,3-bisphosphoglycerate; 3PG, 3-phosphoglycerate; 2PG, 2-phosphoglycerate; PEP, phosphoenolpyruvate; PYR, pyruvate; G3P, glycerol-3-phosphate; Glycerol-3-P-OR, glycerophosphoryl diester; NdR, deoxyribonucleoside.
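The correlation coefficient r of eqns (1)–(3) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the program used in the study: the dictionary layout, the function name and the numerical scores below are hypothetical, and $n_{ij}$ is taken as the number of regulated-protein pairs listed for regulators i and j.

```python
import math

def coupling_correlation(x, y):
    """Weighted correlation r of eqn (1).

    x: {(i, j): x_ij}, similarity score of regulatory proteins i and j (j < i)
    y: {(i, j): [y_ikjl, ...]}, scores of all protein pairs drawn from R_i and R_j
    """
    pairs = [p for p in x if y.get(p)]          # regulator pairs with n_ij > 0
    n_total = sum(len(y[p]) for p in pairs)     # sum of n_ij
    x_bar = sum(len(y[p]) * x[p] for p in pairs) / n_total   # eqn (2)
    y_bar = sum(sum(y[p]) for p in pairs) / n_total          # eqn (3)
    numerator = sum((x[p] - x_bar) * sum(v - y_bar for v in y[p]) for p in pairs)
    ss_x = sum(len(y[p]) * (x[p] - x_bar) ** 2 for p in pairs)
    ss_y = sum((v - y_bar) ** 2 for p in pairs for v in y[p])
    # Undefined (division by zero) if all x_ij or all y_ikjl are identical.
    return numerator / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Hypothetical scores for three regulators; not data from the paper.
demo_x = {(2, 1): 150.0, (3, 1): 40.0, (3, 2): 60.0}
demo_y = {(2, 1): [30.0, 45.0], (3, 1): [50.0, 35.0, 40.0], (3, 2): [120.0]}
r_demo = coupling_correlation(demo_x, demo_y)
```

Because each $x_{ij}$ is weighted by $n_{ij}$, the result equals the ordinary Pearson coefficient over all plotted points ($x_{ij}$, $y_{ikjl}$), so large regulons contribute proportionally more points.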

the transcription of the operons encoding the transport proteins and catalytic enzyme proteins are almost completely listed in Table 1. Moreover, these transport systems and metabolic pathways are also comprehensive in the sense that they are organized for the maintenance of sending nutritious substances to the central path of glycolysis under different environmental conditions. As an example of illustrating the relationships between evolution, regulation and substrate flow, the operons encoding the transport proteins and catalytic enzyme proteins in this block are described along the metabolic pathways in Fig. 5, with an indication of their regulation. The

relations of regulatory proteins and regulated operons are extracted from those listed in Table 1, and the group names of operons and those of regulatory proteins are also denoted consistently with those denoted in this table. From this figure, the following characteristics emerge: (i) The multiple pathways leading to an intracellular substance in the central path themselves seem to have developed by chance assembly of independently evolved enzymes. If we focus on glucose-6-phosphate, for example, we see that this compound can be derived through any of at least eight routes: uptake of maltose, galactose, α-galactoside, lactose, raffinose, β-glucoside or


glucose as well as the direct uptake of glucose-6-phosphate itself. The first two substances are transported by HATSs, the third and fourth substances as well as glucose-6-phosphate by ion-driven transporters, and the fifth and sixth substances by PTSs. This is consistent with the previous indication that the metabolic pathways around the glycolytic pathway have evolved by chance assembly of the genes generated from the sense and antisense strands of pre-existing genes (Fukuchi & Otsuka, 1992). (ii) The different origins of transport proteins and enzyme proteins are organized extensively by the coupling of a component Crr in PTS-mediated regulation systems with a component CRP in the transcriptional regulation system. As is well known (Saier, 1989), the phosphoryl group of phosphoenolpyruvate is sequentially transferred to enzyme I, then to HPr, and finally to PTS sugars through the respective sugar-specific enzyme II complexes, if such sugars are present in the medium. These sugar-specific enzymes II are denoted with superscript II in Fig. 5. Although the transcriptional regulation of the glucose-specific enzyme II (PtsG) is not known, its phosphoryl group donor, Crr, which is also called enzyme III^Glc, plays a central role in coordinating the transport systems in connection with transcriptional regulation. In the presence of glucose, Crr is phosphorylated as a result of the transfer of the phosphoryl moiety from phospho-HPr and is then involved in glucose uptake and phosphorylation via PtsG. The free form of Crr binds to some transport proteins such as LacY as well as to the enzyme II-type proteins other than PtsG and reduces their transport efficiency (Saier, 1989).
In the absence of glucose and/or other PTS sugars, the phosphorylated form of Crr may be accumulated in the cell, and, in connection, it is suggested that the phosphorylated Crr activates adenylate cyclase to raise the synthesis rate of cAMP (Postma et al., 1993), which is needed for CRP to activate the transcription of many other transport protein genes. Although it is outstanding that many transport operons are under the transcriptional regulation of activation by CRP, such coupling of CRP with many transport protein operons may be the result of selection. In practice, it is plausible that CRP was originally coupled with sugar transport protein operons by chance, if we consider the case of Fnr which is highly similar to CRP. These regulatory proteins, CRP and Fnr, are mutually similar not only in their amino acid sequences but also in the nucleotide sequences of their binding sites on DNA. The consensus sequences proposed for CRP binding sites and Fnr binding sites respectively differ by only two nucleotides (Bell et al., 1989). In spite of such

high similarity, Fnr is associated not with the transport system but with the switching of respiration styles; i.e. under the anaerobic condition, Fnr activates the transcription of anaerobic enzyme genes, repressing the transcription of oxidase gene (Spiro & Guest, 1991), together with ArcA, one of the response regulators in Group 1, that represses the transcription of aerobic respiration enzyme genes (Iuchi et al., 1990; Iuchi & Lin, 1993). Thus, CRP and Fnr provide a representative example showing that the regulatory proteins, probably of the same origin, are separately associated with different blocks of metabolism. (iii) Most of the sugar transport operons, whose transcription is activated by CRP, are also under the control of repression by other transcriptional regulatory proteins. These regulatory proteins for repression are not so restricted but distributed over Groups 2, 4, 5, and ‘‘others’’ not clustered in the present study. These regulatory proteins seem not only to be provided for the repression that is removed by cAMP-CRP, but also to have abilities to control their own repression in response to the concentration of substrates. For example, LacI in Group 4 is well known to bind to the promoter region at the low concentration of some inducers, lactose or certain other galactosides, while it is released from DNA at the high concentration of the inducer even in the absence of cAMP-CRP. AraC, one of the members in Group 2, is suggested to bind to at least two sites around the promoter region and to form a double stranded DNA loop of more than 200 bp in the absence of arabinose, cAMP or CRP, while the loop could open in the presence of both arabinose and cAMP-CRP (Dunn et al., 1984). Some operons such as malEFG are under the transcriptional activation by the response regulators in Group 1 as well as by CRP, suggesting that they can be transcribed in response to the sensing of respective sugars regardless of the absence or presence of cAMP-CRP. 
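The interplay described in points (ii) and (iii) can be caricatured as a Boolean rule for a single operon such as lacZYA. The sketch below is a deliberately crude toy, assuming all-or-none states and ignoring basal transcription, concentrations and kinetics; the function and variable names are hypothetical and serve only to restate the verbal argument.

```python
def lac_operon_transcribed(glucose_present: bool, lactose_present: bool) -> bool:
    """Toy all-or-none model of lacZYA transcription under CRP and LacI control."""
    # Without glucose, phosphorylated Crr accumulates and is suggested to
    # activate adenylate cyclase, so that cAMP-CRP can activate transcription.
    camp_crp_active = not glucose_present
    # With glucose, the free form of Crr reduces transport by LacY,
    # so the inducer reaches the cell interior only when glucose is absent.
    inducer_inside = lactose_present and not glucose_present
    # LacI stays on the DNA at low inducer and is released at high inducer.
    laci_bound = not inducer_inside
    return camp_crp_active and not laci_bound

# Enumerate the four combinations of (glucose, lactose) in the medium.
truth_table = {
    (g, l): lac_operon_transcribed(g, l)
    for g in (False, True)
    for l in (False, True)
}
```

In this toy model the operon is transcribed only when lactose is present and glucose is absent, which is the combination where both the activation by CRP and the release of the repressor hold.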
This feature of multiple regulation by different origins of regulatory proteins strengthens the possibility that the transcriptional regulation system has been formed through the selection from promiscuous couplings of regulatory proteins with operons. The example of transport systems and the succeeding metabolic pathways mentioned above also implies the flexible nature of transcriptional regulation; it plays a role in coordinating the expression of transport operons to complement the PTS mediated regulation that is carried out throughout the central pathway of glycolysis. This nature of transcriptional regulation may be seen in the other

blocks, although their experimental data are still insufficient for the discussion from the aspect of organization.

Discussions and Conclusion

The regulation of genetic expression by a constitutively synthesized repressor protein was a central feature of the classical operon model proposed by Jacob & Monod (1961). As molecular biologists have realized that not all regulatory proteins are synthesized constitutively and that not all regulatory proteins are repressors, it becomes generally recognized that the Jacob-Monod paradigm does not suffice to explain all genetic regulation. This drives many scientists to despair of finding a unifying principle for the diversity of regulated systems on one hand, but on the other hand stimulates the endeavor to formulate the regulated systems by the nature of regulator (repressor or activator), the mode of regulatory circuit (classical or autogenous) and the type of operons (inducible or repressible) (Goldberger, 1974; Savageau, 1974; 1975). Such trials partly succeed in finding and/or predicting the correlation between the nature of regulator and the mode of regulatory circuit in the intact system, and the correlation between the nature of regulator and the demand for expression (Savageau, 1979; 1989). As an extension of formulation, the first mode of transcriptional regulation is speculated to have been autogenous regulation and the issue of regulatory proteins is also discussed in two possibilities; one being the protein possessing the binding site for a physiologically relevant signal molecule (Cove, 1974; Goldberger & Deeley, 1976) and the other being the protein already having DNA binding domain (Savageau, 1979).
Although the first possibility is argued for the regulatory proteins in catabolic and biosynthetic pathways, the latter possibility is proposed not only for the appearance of regulatory protein before the development of such pathways, but also for the regulatory proteins appeared at the later stage where dual specificity for signal molecule and nucleic acid is required, with the reasoning that there would have been many more DNA binding proteins in comparison to the number of specific effector binding proteins even at the later stage (Savageau, 1979). This reasoning might be consistent with the present result that many regulatory proteins are related to each other in sequence similarity, but that similarities between regulatory proteins and regulated proteins are rarely found. In practice, considerable similarity between the regulatory protein and the


regulated protein is only seen in the regulatory protein RbsR and a periplasmic component RbsB of HAT, which is encoded by the operon rbsDACBK under the repression by RbsR. Although the ribose binding protein RbsB also shows similarity with the signal–receptor domains of the regulatory proteins in the regulatory Group 4 such as MalI, PurR and CytR, they are not directly in a regulator–regulon relationship. The regulatory proteins, OxyR, CynR, IlvY and MetR, in the regulatory Group 3 also show similarity to a protein TdcA encoded by permease operon tdcABC, where the protein TdcA is an activator for the operon tdcABC itself. However, we cannot obtain a remarkable tendency that the regulatory proteins show considerable similarity to the other DNA binding proteins or to the proteins possessing the sites to bind some small molecules, even if all the sequenced proteins of E. coli are clustered apart from the regulatory relation (Watanabe & Otsuka, 1995). On the other hand, regulatory proteins clustered into the same group are mutually similar over a whole region of amino acid sequence including the small portions of DNA binding domain, or the helix-turn-helix region, and signal-receptor domain, as denoted in Fig. 2. Such a wide region of sequence similarity probably means that the similarity exists not only in the DNA binding and signal receptor domains but also in other functional manners. Although the knowledge of structures of regulatory proteins is still limited at the present time, the lac repressor is known to bind to DNA in a dimeric form and to aggregate into the tetrameric form, generating a stable looped DNA structure (Chakerian et al., 1992), and the hexamerization in tyrosine mediated repression of transcription by TyrR is proposed (Wilson et al., 1994).
These examples indicate that the interaction between the aggregated proteins, each anchored to DNA by the part showing the helix-turn-helix motif, plays an important role in deforming the DNA around the binding site. The wide region of sequence similarity in the same group of regulatory proteins probably implies similarity in the protein–protein interaction. Even such homologous regulatory proteins, which would have diverged more recently than the appearance of an ancestral regulator, seem to have coupled promiscuously with different origins of operons according to the results shown in this paper. Even if the ancestral regulator had been drawn from DNA binding proteins or the proteins possessing the binding sites for relevant signal molecules and the regulator first fell into the incestuous coupling with any of the pool protein genes, the trace of such coupling would have been erased by the


promiscuous couplings that have since occurred frequently. The promiscuous coupling of regulatory proteins and operons is also conceivable from the variation in nucleotide sequences at the sites that are experimentally identified to be bound by a definite regulatory protein. Although a consensus sequence has been proposed to cover the variation, its criterion becomes vague, especially when the nucleotide sequence of the binding site for self-regulation is added. This is reasonable because the binding of a regulatory protein to the site for self-regulation may be weaker than the other binding sites of the regulon, and there may be circumstantial evidence for the flexibility of regulator–regulon relationship in that regulatory proteins have changed their binding sites on the genome, sometimes alternating their binding sites in the evolutionary process. The flexibility of regulator– operon coupling can also be inferred from the amino acid sequence fragment responsible for DNA binding. In most of the regulatory proteins it is only a small portion, consisting of 20–40 amino acid residues, that interacts directly with the nucleotides of a binding site, as shown in Fig. 2. Thus, several amino acid substitutions in this portion may make it possible for the regulatory protein to bind to nucleotide sequences other than those of the original binding sites, without any essential changes in the tertiary structure as a whole. Amino acid changes in the DNA binding region, as well as the variation of nucleotide sequences recognized by a definite amino acid sequence fragment, would have enhanced the chance of generating new regulator–operon couplings. It is also understandable, by the changeability in binding site of regulatory proteins, that the genes of physiologically associated proteins are encoded in either an operon or a regulon. 
As seen in the example of a transport protein and the enzyme proteins catalysing the succeeding reactions of the transported substance, the proteins closely related with respect to their physiological functions are mostly encoded in an operon but they are not so similar to each other in their amino acid sequences as to be easily accepted as homologous ones. A situation similar to that of the operon is also seen in the regulon, where physiologically associated proteins are encoded by separate operons but are under the control of the same regulatory protein. In practice, a set of transport proteins and enzyme proteins in the succeeding metabolic pathways is not necessarily encoded in an operon but in the form of a regulon in some cases. As for the origin of such an operon and/or regulon, two possibilities come to mind. One possibility is that protein genes of different origins that are related to

each other by physiological functions have been collected into an operon. If this is the case, a strong force or selection may be required for this collection under the background of drastic changes of gene arrangement in the genome. Alternatively, it is also worth considering that the protein genes in an operon have diverged from a common ancestral gene by gene duplication and some of them have been separated into different operons as seen in the example of the regulon. At the present time there is no concrete evidence for judging which of the possibilities is correct. It should be noted, however, that the flexible change in the binding sites of regulatory proteins has been necessary for the organization of genes in the form of operons and/or regulons, even if the gene rearrangements have occurred in either way. In contrast to the transcriptional regulation by regulatory proteins, other styles of regulation and control seem to be restricted, e.g. the GC-rich region between the −10 Pribnow box and the initiating nucleotide is restricted to the stringent controlled promoters of ribosomal RNA genes (Lamond & Travers, 1985), the regulation at the level of translation, which depends on the hairpin loop structure of mRNA, is restricted to the syntheses of ribosomal proteins (Yates et al., 1980) and of RNA polymerase (Fukuda et al., 1978; Otsuka et al., 1988), special regulation by the coupling of transcriptional and translational processes in a leader sequence is only known on the transcription of some amino acid synthetic enzyme genes and nucleotide synthetic enzyme genes (Oxender et al., 1979), and the feedback inhibition by enzyme proteins is mainly known on the pathways of nucleotide synthesis and amino acid synthesis (Umbarger, 1969). These styles of regulation are autogenous or self-confined regulation, depending heavily on the specific sequence fragments, each of which is well defined and easily detectable. 
This is also the case in the PTS mediated regulation system that is restricted to the moiety accepting or releasing phosphorylation signal. Although these styles of regulation are also well investigated in E. coli, they may be ubiquitous or virtually ubiquitous in the corresponding sections of other organisms. In contrast, the transcriptional regulation by regulatory proteins is the flexible and extensive style of regulation in nature. The evolution of regulator–regulon relationships by promiscuous couplings might not be stimulating to those who tend to grasp molecular evolution by shortsighted causality. From the aspect of evolutionary strategies of organisms, however, the promiscuous coupling of regulatory proteins with operons is noticeable in enhancing the chance of generating

better organization for a given set of regulatory proteins and operons. We consider a simple case in which $n_i$ regulatory proteins have diverged from a regulatory protein i, and $m_i$ operons have diverged from the operon that was under the control of regulatory protein i. This situation is shown schematically in Fig. 6. If the regulatory relation is so conservative that it is passed on only between the descendant regulatory proteins and operons, the possible combinations of regulator–operon relations amount only to $\sum_i n_i m_i$ at most. On the contrary, many more combinations, $(\sum_i n_i)(\sum_i m_i)$, become possible if the couplings of regulatory proteins and operons are not conservative but changeable, regardless of their original issue. The selection from such a great number of combinations may promise much more advantageous couplings than those from the conservative combinations. Although only one step of coupling and selection is illustrated in Fig. 6 for simplicity, such steps would have been repeated in the evolution of organisms. The evolution of transcriptional regulation mentioned above may provide a clue to the elucidation of bifurcation phenomena such as the divergence of species and the cell differentiation in multicellular organisms, because two or more ways of couplings for survival could be generated among many combinations of regulatory proteins and operons. Since the

F. 6. Evolution of transcriptional regulation system inferred from the present study. Suppose the operon i (i=1, 2, 3, . . .) was under the control of regulatory protein i at an earlier stage. Then, the operon and the regulatory protein are duplicated to mi operons and ni regulatory proteins, respectively. If the regulatory relation in the ancestral regulatory protein and operon is strongly conserved, Si ni mi combinations are only expected for the coupling of regulatory proteins and operons in the descendants. On the contrary, much more combinations of (Si ni )(Si mi ) are expected, as shown by halftone lines, if the regulatory proteins can be coupled with the operons independently of their issue. Among so many possible combinations, some of the couplings have been selected as advantageous ones and are schematically shown by broken arrows.


publication of Origin of Species (Darwin, 1859), the divergence of species has been one of the most interesting but most difficult problems concerned with the evolution of organisms. From the beginning of chromosome mapping of mutant genes (Morgan et al., 1915), many hereditary variants have been detected in experiments but most are caused by the mutation of individual structural genes. Thus, the evolution of species has been mainly discussed within a framework of mathematical models in population genetics, which are formulated to calculate the probability that the mutant of a structural gene spreads in or is excluded from a population according to the degree of its selective advantage or disadvantage (Fisher, 1930; Wright, 1931). The accumulation of favorable mutants shown by this model is, however, gradual, and is not sufficient to persuade most evolutionists. In practice, this formulation focuses only on the struggle for the existence of individuals with respect to one characteristic, and cannot explain the divergence of species in a district, attributing the generation of different species to the geographical segregation of different populations. Even recently, the punctuational view of macroevolution has been proposed from the evolutionary pattern illustrated by fossil records (Stanley, 1981), opposing the gradualistic view. In contrast to the mutant on a structural gene that has been treated in population genetics, the mutants in the changes in regulator– operon couplings may be expected to give rise to punctuational divergence at the stage when regulatory proteins and operons are sufficiently accumulated to produce many possible ways of couplings. For simplicity, we consider a population of unicellular organisms reached at this stage. If some organisms, which happen to have one way of couplings, increase the number of their descendants, then other organisms would choose another way of couplings to avoid the struggle with the former. 
This concession at the level of regulator–operon couplings may ultimately bring about the difference in gene sets between the former and latter groups because the genes used in the coupling become more elaborate ones by selection in a group, while unused genes are exposed to random substitution and/or deletion. Apparently concerted evolution may also be expected at a stage when regulatory protein genes and operons are sufficiently accumulated to produce drastic changes in their couplings. The cell differentiation in a multicellular organism may be a representative example, showing the concession of cells in the same population. In fact, it is now generally accepted for cell differentiation that the morphological and functional difference arises between the cells carrying

.   .

202

the same set of genes, except for the gene rearrangement of immunoglobulin genes in some special cell lines. This probably indicates that a choice of one way of couplings in one group of cells directly influences the way of couplings in other cells. The phenomenon of metamorphosis, which is representatively seen in some multicellular organisms such as insects, may have arisen from the alteration in the sets of expressed genes, probably including the changes in the relations of transcriptional regulation between the cells. Including such differentiation between the cells in individual organisms, the divergence of species in multicellular organisms becomes a much more complex problem than that in prokaryotes if it is investigated from the molecular level. Recently, however, many examples indicating homologous proteins have been reported between different species of prokaryotes and eukaryotes (Maiden et al., 1987; Higgins et al., 1988; Henderson & Maiden, 1990; Bourne et al., 1991). Thus, it may become possible to inquire into this problem in the future when the data of regulatory proteins and operons will be much more accumulated in various species by investigating how the homologous proteins are differently regulated in different species. Throughout such an inquiry, the problems of cell differentiation and species divergence may be considerably resolved in terms of the bifurcation of the regulation network at the level of transcription. In the present paper, we have confined ourselves to the homologous relations of the proteins that would have been generated by a simple mechanism of gene duplication. This is because the homologous relations probably caused by domain shuffling are only seen in some proteins in the two-component regulatory system, for example some components in PTS and colicines (Watanabe & Otsuka, 1995), as far as the proteins in E. coli are concerned. 
Although it may also be interesting to follow the evolutionary relations suggested by other mechanisms such as the generation from antisense strand (Kunisawa & Otsuka, 1987), it is necessary to first develop a new method of detecting such evolutionary relations more confidently.

REFERENCES A, G. F.-L. & N, K. (1985). Nitrogen regulation in Salmonella typhimurium. Identification of an ntrC proteinbinding site and definition of a consensus binding sequence. EMBO J. 4, 539–547. A, A. E., L, B. & P, A. J. (1991). Mutational analysis of repression and activation of the tyrP gene in Escherichia coli. J. Bacteriol. 173, 5068–5078. A, S. & D, R. (1992). The prokaryotic enhancer binding protein NTRC has an ATPase activity which is phosphorylation and DNA dependent. EMBO J. 11, 2219–2228.

A, S., K, C. & D, R. (1991). Influence of a mutation in the putative nucleotide binding site of the nitrogen regulatory protein NTRC on its positive control function. Nucleic Acids Res. 19, 2281–2287. B, A. I., G, K. L., C, J. A. & B, S. J. W. (1989). Cloning of binding sequences for the Escherichia coli transcription activators, FNR and CRP: location of bases involved in discrimination between FNR and CRP. Nucleic Acids Res. 17, 3865–3874. B, H. R., S, D. A. & MC, F. (1991). The GTPase superfamily; conserved structure and molecular mechanism. Nature 349, 117–127. B, B. (1993). Regulation of the Escherichia coli heat-shock response. Mol. Microbiol. 9, 671–680. B, K. A., U, M. L. & S, G. V. (1991). The MetR binding site in the Salmonella typhimurium metH gene: DNA sequence constraints on activation. J. Bacteriol. 173, 3547–3553. C, A. E. & M, K. S. (1992). Effect of lac repressor oligomerization on regulatory outcome. Mol. Microbiol. 6, 963–968. C, D., R, M.,  V, F., B, A., C, R., N, Y., G, N. & P, A. (1992). Arginine regulon of Escherichia coli K-12. Study of repressor-operator interactions and of in vitro binding affinities versus in vivo repression. J. Mol. Biol. 226, 367–386. C-V, J., M, B. & G, J. D. (1991). Control site location and transcriptional regulation in Escherichia coli. Microbiol. Rev. 55, 371–394. C, D. J. (1974). Evolutionary significance of autogenous regulation. Nature 251, 256. C, J. E., J., G, R. B. & M, S. R. (1987). Cytoplasmic Membrane. In: Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology (Neidhardt, F. C., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M. & Umbarger, H. E., eds), pp. 31–55. Washington DC: American Society for Microbiology. D, C. (1859). O  O  S. L: J M. 
D, J., F, H., M, C., D, M., K-B, M. & B, P. L. (1991). A new oxygen-regulated operon in Escherichia coli comprises the genes for a putative third cytochrome oxidase and for pH 2.5 acid phosphatase (appA). Mol. Gen. Genet. 229, 341–352. D, M. O., B, W. C., H, L. T. & S, R. M. (1978). Protein superfamilies. In: Atlas of Protein Sequence and Structure 5, suppl. 3. pp. 9–24. The National Biomedical Research Foundation. D L, V., H, M., G, F. & N, J. B. (1988). Fur (ferric uptake regulation) protein and CAP (catabolite-activator protein) modulate transcription of fur gene in Escherichia coli. Eur. J. Biochem. 173, 537–546. D, R. E. (1980). Cytochrome c and the evolution of energy metabolism. Sci. Am. 242, 98–110. D, M. H., C, A. & M, L. A. (1990). The function of isolated domains and chimaeric proteins constructed from the transcriptional activators NifA and NtrC of Klebsiella pneumoniae. Mol. Microbiol. 4, 29–37. D, T. M., H, S., O, S. & S, R. F. (1984). An operator at −280 base pairs that is required for repression of araBAD operon promoter: Addition of DNA helical turns between the operator and promoter cyclically hinders repression. Proc. natl. Acad. Sci. U.S.A. 81, 5017–5020. F, R. A. (1930). The Genetical Theory of Natural Selection. Oxford: Clarendon Press. F, S. & O, J. (1992). Evolution of metabolic pathways by chance assembly of enzyme proteins generated from sense and antisense strands of pre-existing genes. J. theor. Biol. 158, 271–291. F, R., T, M. & I, A. (1978). Autogenous regulation of RNA polymerase b subunit synthesis in vitro. J. Biol. Chem. 253, 4501–4504.

    G, M.-T., M, C. & R, J. L. (1993). The XylS/AraC family of regulators. Nucleic Acids Res. 21, 807–810. G, Y. L., S, S. R., D, M. W., J, R. K. & D, P. (1993). TdcA, a transcriptional activator of the tdcABC operon of Escherichia coli, is a member of the LysR family of proteins. Mol. Gen. Genet. 240, 395–402. G, R. H.,   P, J. & P, P. W. (1989). The repressor of the PEP: Fructose phosphotransferase system is required for the transcription of the pps gene of Escherichia coli. Mol. Gen. Genet. 218, 348–352. G, R. (1930). Une, loi des reparitions economiques: L’effet proportionnel. Bull. Statist. Gen. Fr. 19, 469ff. G, W. (1978). Why genes in pieces? Nature 271, 501. G, R. F. (1974). Autogenous regulation of gene expression. Science 183, 810–816. G, R. F. & D, R. G. (1976). Autogenous regulation of gene expression. In: Regulatory Biology (Copeland, J. C. & Marzluf, G. A., eds), pp. 178–195. Columbus: Ohio State University Press. G, M., F, K., N, P. & U, B. E. (1989). Upstream activating sequences that are shared by two divergently transcribed operons mediate cAMP-CRP regulation of pilus-adhesin in Escherichia coli. Mol. Microbiol. 3, 1557–1565. G, J. T. & D, B. (1989). A global response induced in Escherichia coli by redox-cycling agents overlaps with that induced by peroxide stress. J. Bacteriol. 171, 3933–3939. H, A. & A, H. (1991). Molecular mechanism of negative autoregulation of Escherichia coli crp gene. Nucleic Acids Res. 19, 4413–4419. H, B., S, A., C, K. Y., Z, H. & S, J. M. (1990). Genes of the Escherichia coli pur regulon are negatively controlled by a repressor-operator interaction. J. Bacteriol. 175, 4555–4562. H, P. J. F. & M, M. C. J. (1990). 
Homologous sugar transport proteins in Escherichia coli and their relatives in both prokaryotes and eukaryotes. Phil. Trans. R. Soc. Lond. B326, 391–410. H, W., S, C. & S, R. (1990). Characterization of the Escherichia coli araFGH and araJ promoters. J. Mol. Biol. 215, 497–510. H, S., H, G. W., C, J. M. & W, J. C. (1988). A large family of bacterial activator proteins. Proc. natl. Acad. Sci. U.S.A. 85, 6602–6606. H, A., N, Y., H, H., A, H., M, T. & H, K. (1993). Osmoregulation of the fatty acid receptor gene fadL in Escherichia coli. Mol. Gen. Genet. 240, 339–347. H, C. F., G, M. P., M, M. L. & P, S. R. (1988). A family of closely related ATP-binding subunits from prokaryotic and eukaryotic cells. BioEssays 8, 111–116. H, K., S, H. & O, J. (1990). Discrimination between adaptive and neutral amino acid substitutions in vertebrate hemoglobins. J. Mol. Evol. 31, 302–324. I, A. S. & G, J. R. (1993). Lactobacillus casei contains a member of the CRP-FNR family. Nucleic Acids Res. 21, 753. I, S. & L, E. C. C. (1993). Adaptation of Escherichia coli to redox environments by gene expression. Mol. Microbiol. 9, 9–15. I, S., M, Z., F, T. & L, E. C. C. (1990). The arcB gene of Escherichia coli encodes a sensor-regulator protein for anaerobic repression of the arc modulon. Mol. Microbiol. 4, 715–727. J, F. & M, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356. K, D. & D, G. (1991). Modular structure of FixJ: homology of the transcriptional activator domain with the −35 binding domain of sigma factors. Mol. Microbiol. 5, 987–997. K, M. (1945). On the Gibrat distribution. Econometrica 13, 161–170. K, M. (1968). Evolutionary rate at the molecular level. Nature 217, 624–626. K, J. L. & J, T. H. (1969). Non-Darwinian evolution;

most evolutionary changes in proteins may be due to neutral mutations and genetic drift. Science 164, 788–798. K, A. N. (1941). Über das logarithmisch normale Verteilungsgesetz der Dimensionen der Teilchen bei Zerstückelung. C. R. Acad. Sci. XXXI, 99–101. K, N. M. (1992). The molecular basis for positive regulation of cys promoters in Salmonella typhimurium and Escherichia coli. Mol. Microbiol. 6, 2747–2753. K, T., N, M., W, H., O, J., T, A., Y, L. S., G, D. G. & B, W. C. (1990). Escherichia coli K12 genomic database. Protein Seq. Data Anal. 3, 157–162. K, T. & O, J. (1987). A possible mode of protein evolution. Protein Seq. Data Anal. 1, 117–121. L, A. I. & T, A. A. (1985). Stringent control of bacterial transcription. Cell 41, 6–8. L, E. C. C. (1976). Glycerol dissimilation and its regulation in bacteria. Ann. Rev. Microbiol. 30, 535–578. L, D. J. & P, W. R. (1985). Rapid and sensitive protein similarity searches. Science 227, 1435–1441. L, J. & B, I. R. (1990). Transcription and regulation of the cpdB gene in Escherichia coli K12 and Salmonella typhimurium LT2: Evidence for modulation of constitutive promoters by cyclic AMP-CRP complex. Mol. Gen. Genet. 222, 161–165. L, M., G, M. & G, C. A. (1992). The σ70 family: sequence conservation and evolutionary relationships. J. Bacteriol. 174, 3843–3849. M, K., J, S., R, C. O. & C, J. E., J. (1993). Regulation of fatty acid biosynthesis in Escherichia coli. Microbiol. Rev. 57, 522–542. M, M. C. J., D, E. O., B, S. A., M, D. C. M. & H, P. J. F. (1987). Mammalian and bacterial sugar transport proteins are homologous. Nature 325, 641–643. M, C. A. & H, M. A. (1992). Structural homology between rbs repressor and ribose binding protein implies functional similarity. Protein Science 1, 843–849. M, T. J., N, D. M., B, C. L. 
& K, R. J. (1992). Promoter elements required for positive control of transcription of the Escherichia coli uhpT gene. J. Bacteriol. 174, 2763–2770. M, T. H., S, A. H., M, H. J. & B, C. B. (1915). The Mechanism of Mendelian Heredity. New York: Holt, Rinehart & Winston. M, E. W. & M, W. (1988). Optimal alignments in linear space. Comput. Applic. Biosci. 4, 11–17. N, Y. & I, K. (1993). Control and function of lysyl-tRNA synthetases: Diversity and co-ordination. Mol. Microbiol. 10, 225–231. N, J. & N, P. (1987). Purines and Pyrimidines. In: Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology (Neidhardt, F. C., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M. & Umbarger, H. E., eds), pp. 445–473. Washington DC: American Society for Microbiology. N, E. B., D’A, R. & L, R. T. (1992). The leucine-Lrp regulon in E. coli: A global response in search of a raison d’etre. Cell 68, 617– 619. O, J., M, H. & H, K. (1992). Structure model of core proteins in photosystem I inferred from the comparison with those in photosystem II and bacteria. Biochim et Biophys. Acta 1118, 194–210. O, J., M, K. & H, K. (1993). Divergence pattern and selective mode in protein evolution: the example of vertebrate myoglobins and hemoglobin chains. J. Mol. Evol. 36, 153–181. O, J., S, H. & K, T. (1988). Regulation in the synthesis of Escherichia coli RNA polymerase proposed from sequence analysis. Protein Seq. Data Anal. 1, 355–361. O, D. L., Z, G. & Y, C. (1979). Attenuation in the Escherichia coli tryptophan operon: Role of RNA secondary structure involving the tryptophan codon region. Proc. natl. Acad. Sci. U.S.A. 76, 5524–5528.

P, K. (1986). Gene regulation and its role in evolutionary processes. In: Evolutionary Processes and Theory (Karlin, S. & Nevo, E., eds), pp. 3–36. Orlando, Florida: Academic Press, Inc. P, W. R. & L, D. J. (1988). Improved tools for biological sequence comparison. Proc. natl. Acad. Sci. U.S.A. 85, 2444–2448. P, P. W., L, J. W. & J, G. R. (1993). Phosphoenolpyruvate: carbohydrate phosphotransferase systems of bacteria. Microbiol. Rev. 57, 543–594. R, A., N, S. & I, M. (1989). Characterization of OmpR binding sequences in the upstream region of the ompF promoter essential for transcriptional activation. J. Biol. Chem. 264, 18693–18700. R, A., D, J. S, M. H., J. & R, J. (1991). Analysis of the gluconate (gnt) operon of Bacillus subtilis. Mol. Microbiol. 5, 1081–1089. R, A. E., M, S., LG, S. F. J. & W, A. (1984). Enhancement of bacterial gene expression by insertion elements or by mutation in a CAP-cAMP binding site. J. Mol. Biol. 191, 85–95. R, R. J. & Z, H. (1990). Autoregulation of Escherichia coli purR requires two control sites downstream of the promoter. J. Bacteriol. 172, 5758–5766. R, H. C. (1989). Cluster Analysis for Researchers. Malabar, Florida: Robert E. Krieger Publishing Company, Inc. R, C. W., A, P. M., N, B. T. & A, F. M. (1987). Deduced products of C4-dicarboxylate transport regulatory genes of Rhizobium leguminosarum are homologous to nitrogen regulatory gene products. Nucleic Acids Res. 15, 7921–7934. R, K. E., M, W., W, C., O, J., T, C. & S, S. G. (1991). Mapping sequenced E. coli genes by computer: Software, strategies and examples. Nucleic Acids Res. 19, 637–647. S, M. H., J. (1989). Protein phosphorylation and allosteric control of inducer exclusion and catabolite repression by the bacterial phosphoenolpyruvate: sugar phosphotransferase system. Microbiol. Rev. 
53, 109–120. S, M. A. (1974). Comparison of classical and autogenous systems of regulation in inducible operons. Nature 252, 546–549. S, M. A. (1975). Significance of autogenously regulated and constitutive synthesis of regulatory proteins in repressible biosynthetic systems. Nature 258, 208–214. S, M. A. (1979). Autogenous and classical regulation of gene expression: a general theory and experimental evidence. In: Biological Regulation and Development (Goldberger, R. F., ed.), pp. 57–108. New York: Plenum Press. S, M. A. (1989). Are there rules governing patterns of gene regulation? In: Theoretical Biology (Goodwin, B. & Saunders, P. eds), pp. 42–66. Edinburgh: Edinburgh University Press. S, S. & W, M. (1992). Gene regulation of plasmidand chromosome-determined inorganic ion transport in bacteria. Microbiol. Rev. 56, 195–228. S, T. F. & W, M. S. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197. S, S. & G, J. R. (1991). Adaptive responses to oxygen limitation in Escherichia coli. Trends Biochem. Sci. 16, 310–314. S, S. M. (1981). The New Evolutionary Timetable. New York: Basic Books, Inc., Publishers. S, T. A., O, D. H., MK, D. B., A, W. F. & M, B. W. (1982). Structural similarity in the DNA binding domains of catabolite gene activator and cro repressor proteins. Proc. natl. Acad. Sci. U.S.A. 179, 3097–3100. S, V. & P, J., J. (1988). Identification and expression

of genes narL and narX of the nar (nitrate reductase) locus in Escherichia coli K-12. J. Bacteriol. 170, 1589–1597. S, J. B., N, A. J. & S, A. M. (1989). Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol. Rev. 53, 450–490. S, Y.-C. & F, J. A. (1992). The Escherichia coli K-12 cyn operon is positively regulated by a member of the lysR family. J. Bacteriol. 174, 3645–3650. T, R. & S, M. H., J. (1993). Structural, functional, and evolutionary relationships among extracellular solute-binding receptors of bacteria. Microbiol. Rev. 57, 320–346. T¨, B., H, D. S., F, L. & K, A. (1991). iciA, an Escherichia coli gene encoding a specific inhibitor of chromosomal initiation of replication in vitro. Proc. natl. Acad. Sci. U.S.A. 88, 4066–4070. T, J. F. & S, R. F. (1990). Purification and properties of RhaR, the positive regulator of the L-rhamnose operons of Escherichia coli. J. Mol. Biol. 211, 75–89. U, H. E. (1969). Regulation of amino acid metabolism. Ann. Rev. Biochem. 38, 323–370. U, M. L. & S, G. V. (1989). Genetic and biochemical analysis of the MetR activator-binding site in the metE metR control region of Salmonella typhimurium. J. Bacteriol. 171, 5620–5629. V, A. M., K, H., A, T. & H, S. (1991). rcbR, a gene coding for a member of the LysR family of transcriptional regulators, is located upstream of the expressed set of ribulose 1,5-bisphosphate carboxylase/oxygenase genes in the photosynthetic bacterium Chromatium vinosum. J. Bacteriol. 173, 5224–5229.  B, S. B., H, G. T. & F, S. K. (1992). Opine catabolism and conjugal transfer of the nopaline Ti plasmid pTiC58 are coordinately regulated by a single repressor. Proc. natl. Acad. Sci. U.S.A. 89, 643–647. W, H. & O, J. (1995). A comprehensive representation of extensive similarity linkage between large number of proteins. Comput. Applic. Biosci. 
11, 159–166. W, M. J. & A, S. (1992). A family of bacterial regulators homologous to Gal and Lac repressors. J. Biol. Chem. 267, 15869–15874. W, H. & B, N. (1991). Regulation of methionine synthesis in Escherichia coli. Mol. Microbiol. 5, 1593–1597. W, D. A., R, C. W., P, J. V. & C, J. M. (1991). Characterization of Lrp, an Escherichia coli regulatory protein that mediates a global response to leucine. J. Biol. Chem. 266, 10768–10774. W, T. J., M, P., H, G. T. & D, B. E. (1994). Ligand-induced self-association of the Escherichia coli regulatory protein TyrR. J. Mol. Biol. 238, 309–318. W, H. R. & T, C. L., J. (1990). Role of the purine repressor in the regulation of pyrimidine gene expression in Escherichia coli K-12. J. Bacteriol. 172, 3208–3213. W, S. (1931). Evolution in Mendelian populations. Genetics 16, 97–159. X, X.,   C, N. & R, W. S. (1991). Downstream deletion analysis of the lac promoter. J. Bacteriol. 173, 4570–4577. Y, J. L., A, A. E. & N, M. (1980). In vitro expression of Escherichia coli ribosomal protein genes: Autogenous inhibition of translation. Proc. natl. Acad. Sci. U.S.A. 77, 1837–1841. Z, E. & P, L. (1965). Evolutionary divergence and convergence in proteins. In: Evolving Genes and Proteins (Bryson, V. & Vogel, H. J., eds), p. 97. New York: Academic Press.