Medical Hypotheses
5: 1063-1072, 1979
PROBABLE MECHANISM OF ENZYME EVOLUTION: HOW DID EBG OF E. COLI ORIGINATE?
Semih Erhan, 2101 Chestnut St, Philadelphia, PA 19103, USA.
ABSTRACT
A mechanism is proposed for the formation of Ebg (evolved beta galactosidase) of E. coli, based on the following assumptions: 1. In the presence of lactose, certain proteins being translated bind to their m-RNA-ribosome complexes; 2. This binding interferes with the release of m-RNA from the bacterial chromosome, marking the gene; 3. Thereupon a cytosine specific methylase and methyl cytosine deaminase pair modify (mutate) the marked gene; 4. The result, after five or so mutations, is a new gene capable of coding for a different protein which can split lactose; 5. I propose that this enzyme pair has evolved to produce mutations internally, when need arises, as is the case here; 6. This may be a general mechanism through which drug resistance and detoxification of a novel chemical could be achieved in bacteria; 7. All of these ideas are experimentally testable.
Key Words: Evolution, Enzymes, Mechanism, Evolved Beta Galactosidase.
INTRODUCTION
Recently, using an E. coli mutant lacking the lac Z gene of the lactose operon, which codes for the wild type β-galactosidase (β gal), the evolution, in the laboratory, of a gene whose product could split lactose was demonstrated independently by Campbell, Lengyel and Langridge (1) and Hartl and Hall (2), Hall and Hartl (3). When a mutant of E. coli K 12, with the deletion of the lac Z gene, was forced to grow on lactose agar, the ability to grow on lactose was acquired after about five mutations (1). The enzyme which was responsible for this activity (Ebg: evolved beta galactosidase) was shown to differ in its immunological, kinetic and sedimentation characteristics, in addition to mapping at a different position on the E. coli chromosome. The authors, justifiably, suggested that a gene only peripherally involved in lactose utilization had progressively changed into a form capable of specifying a β gal activity. This communication analyzes the events leading to these changes and proposes a model according to which such changes could take place.
ANALYSIS OF DATA AND SOME PREDICTIONS
There are certain aspects of these studies and the phenomena which underlie them that deserve special consideration:
1.
The mutation(s) took place on a gene which maps at a different part of the chromosome;
2.
The mutation(s) have taken place without the use of external mutagens, which means that a constituent of the normal metabolism of the organism, be it enzyme(s) or low molecular weight metabolite(s), caused the change(s) observed, under the influence of lactose, which was present in excess;
3.
It should be possible to delete the Ebg gene, then to subject the organism to the same conditions under which Ebg was formed, and thus to obtain a second evolved enzyme capable of splitting lactose, EbgI, which can be expected to do it less efficiently than Ebg, which itself is less active than β gal;
4.
Both Ebg and the progenitor protein from which Ebg had evolved should be capable of being isolated by affinity chromatography using isopropylthiogalactoside;
5.
The affinity of these proteins toward, say, lactose, should decrease in the order β gal > Ebg > Progenitor > EbgI, and the ease of their elution from the affinity column should decrease in the order EbgI > Progenitor > Ebg > β gal. This means that EbgI would elute at a lower ionic strength than Ebg, which itself would elute at a lower ionic strength than β gal.
Correspondence with Dr. H. Arraj (4) had confirmed that both Ebg and the progenitor protein were isolated through affinity chromatography, using isopropylthiogalactoside. Ebg and the progenitor protein were found to elute with lower ionic strength than β gal, and the progenitor protein required lower ionic strength than Ebg (5). The Ebg gene was deleted by Dr. Hartl (6) and attempts were being continued to evolve EbgI.
Encouraged by the success of these speculations I now make two further predictions:
1.
This system is eminently suitable for testing the validity of the idea that proteins we see today are descendants of early proteinoids, which themselves were formed by the condensation of a finite number of primordial peptides, in various combinations (7,8,9);
2.
The gene which underwent five or so mutations leading to Ebg is the one whose protein product has the closest three dimensional folding to β gal.
RATIONALE FOR THESE PREDICTIONS
During our own studies of the amino acid homologies found among the so-called "ancestrally unrelated" proteins, we have observed that a limited number of homologous peptides occurred repeatedly along the amino acid sequence of many proteins (7,8,10), and that the same homologous peptides, which we had named "subsequences", were shared by many proteins (7). We had suggested that these subsequences may be descendants of the primordial peptides from which all proteins might have been formed, in the primeval oceans. This may indeed be a universal way for the formation of macromolecules in general, because we have found similar subsequences also within t-RNA molecules (12).
The basic premise behind homology studies is that, during evolution, substitutions of only very similar amino acids were accepted by the biologically active proteins, so that the critical three dimensional folding of the proteins undergoing mutation is not altered drastically (13). This idea stems from the observation that the members of a protein family, which have significant homologies among their amino acid sequences, also fold essentially with similar three dimensional structures, as found by X-ray crystallography (14,15,10). Otherwise there would be no point in making these homology studies.
Now, since the members of a protein family have similar folding as well as significant amino acid homologies and furthermore in many cases have the same amino acids in their active sites, it should follow that the amino acids which comprise the active/binding sites of these proteins should occur within peptides that are themselves homologous to each other. They also should have similar three dimensional folding, at least throughout their homologous lengths. A direct consequence of this reasoning is that, if two proteins are known to bind the same molecule, then their binding sites should consist of homologous peptides folding similarly (16).
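The premise that two proteins binding the same molecule should share homologous peptides can be illustrated with a minimal sketch. It only looks for identical short peptides, which is a simplification and not the matching procedure of (7); the function name, the window length `k` and the two toy sequences are hypothetical and do not represent β gal, Ebg or the progenitor protein.

```python
from collections import defaultdict


def shared_peptides(seq_a, seq_b, k=4):
    """Return {peptide: (positions in seq_a, positions in seq_b)} for every
    length-k peptide that occurs in both sequences (exact matches only)."""
    # Index every length-k window of the first sequence by its start position.
    index_a = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index_a[seq_a[i:i + k]].append(i)

    # Scan the second sequence and keep the windows also seen in the first.
    shared = {}
    for j in range(len(seq_b) - k + 1):
        peptide = seq_b[j:j + k]
        if peptide in index_a:
            shared.setdefault(peptide, (index_a[peptide], []))[1].append(j)
    return shared


# Toy one-letter sequences, invented purely for illustration.
toy_a = "MKTAYIAKQRGLVE"
toy_b = "GGAYIAKQWLVE"
matches = shared_peptides(toy_a, toy_b, k=4)
for peptide, (pos_a, pos_b) in matches.items():
    print(peptide, pos_a, pos_b)
```

A realistic comparison would also have to accept substitutions between very similar amino acids, as the homology premise above requires; exact matching is used here only to keep the sketch short.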
If this premise is true then Ebg and β gal should have subsequences that are very similar, at least around their respective active/binding sites. The fact that all three of these proteins have been isolated through affinity chromatography, using the same molecule, suggests that this premise is quite reasonable. However, sequencing the Ebg and the progenitor protein, after suitable purification, will provide a test system for this premise. The sequences, then, can be matched against the β gal sequence (17), according to (7). I predict that considerable homology will be found among their respective sequences. As Ebg is nearly twice as large as the β gal, the homology cannot be total. Nevertheless, one may find evidence for, say, gene duplication, if one finds two independent regions, both showing homology against the β gal.
DEVELOPMENT OF THE MODEL
The actual chemical events which have initiated these mutations are very difficult to pinpoint at the moment. However, a convincing argument can be made for a particular sequence of events that may be one of several possibilities, and its description will constitute the rest of this paper.
The most striking aspect of this mutation is the fact that it must have been initiated by a normal cellular constituent, either some low molecular weight product(s) of intermediary metabolism or enzyme(s) that are normally present in these cells. The existence of many bifunctional, hence potentially mutagenic, molecules produced during intermediary metabolism is known, and any of them could conceivably cause mutations by crosslinking or derivatization of DNA bases. However, it is practically impossible to guess which of them would be responsible for such events, unless one goes through a systematic search covering all of them. Even then one would be faced with the problem of explaining why such mutations do not occur more frequently, since these compounds are so ubiquitous.
There are two enzymes, cytosine methylase (CM) and 5-methyl cytosine deaminase (MCD), whose presence has been shown in sea urchin eggs (18), in HeLa cells (19) and in Krebs-2 ascites cells (20), that can easily initiate such a chain of events. CM was shown to methylate ca. 1/15 of the cytosines which occur in C-isostichs or in CpG dinucleotides, to 5-methyl cytosine. The second enzyme converts these into thymines. The presence of such CMs has been demonstrated in bacteria, too, even though the occurrence of MCD in bacteria has not been shown conclusively. A systematic search, in other bacteria, for the presence of these two enzymes and for an increase of their concentration during the stress E. coli K 12 was subjected to will show whether they are involved in the initiation of these mutations, according to my hypothesis.
The binding of the protein product of a gene to its m-RNA during translation, to act as a positive repressor, has been reported in eukaryotes (23,24,25). Even though a similar phenomenon has not yet been reported for bacteria, the need for protein synthesis to occur simultaneously, to strip the newly transcribed m-RNA from DNA, is known in prokaryotes (26). I propose
that:
1. Bacteria also have MCD;
2. These two enzymes, CM and MCD, have evolved for the primary purpose of altering (mutating) the genetic material of an organism, when need arises, as is the case here;
3. Under the influence of a substrate or a closely related molecule, structural alterations occur within an enzyme, which are not limited to the active/binding site of that protein, as foreseen by the lock and key model for enzymes, but extend to other parts of the molecule as well, which cause the binding of the protein to its m-RNA-ribosome complex, while it is being translated, and slow the translation.
THE MODEL
The cells were initially streaked on broth-lactose agar and the first clones that were formed utilized the broth and appeared normal. When all of the broth was consumed most of the cells died. Some, however, we have to assume, must have survived. When β gal is present in wild type E. coli, lactose is degraded and utilized. When the cells cannot make β gal, lactose concentration increases and lactose is bound to the progenitor protein, and most probably to a few others that have similar sequence as well as tertiary structure, while they are being translated. This binding causes structural changes in the molecule, which induce these proteins to bind to their respective m-RNA-ribosome complexes and slow down the movement of m-RNA on ribosomes. Because this movement is needed, in bacteria, for the transcription to be completed and the m-RNA to be released from DNA, the slowdown marks the gene(s) as a trouble spot for CM to act on. This enzyme, then, methylates the first cytosine found within a C-isostich or in CpG, whichever happens to occur next to the region where m-RNA is still bound to DNA. Then MCD converts the methylcytosines into thymines. The modified gene will have a lower affinity for the m-RNA being translated, because of the introduction of a small region of perturbance due to the G:T mispairing. This lower affinity will aid the movement of the m-RNA on the ribosome and the transcription will be completed eventually.
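The single CM plus MCD step described above can be sketched as a small string operation, offered only as an illustration of the proposal and not as a model of any real enzyme. Several points are assumptions made for the sketch: the strand is a plain 5'-to-3' string, `start` stands for the point where the m-RNA is still bound, a C-isostich is taken to be a run of at least `min_run` consecutive cytosines, and all names and the toy sequence are hypothetical.

```python
def first_target_cytosine(strand, start=0, min_run=2):
    """Index of the first C at or after `start` that lies in a CpG
    dinucleotide or in a run of at least `min_run` cytosines, else None."""
    n = len(strand)
    for i in range(start, n):
        if strand[i] != "C":
            continue
        # CpG context: the cytosine is immediately followed by a guanine.
        if i + 1 < n and strand[i + 1] == "G":
            return i
        # C-isostich context: measure the whole run of Cs containing i.
        left = i
        while left > 0 and strand[left - 1] == "C":
            left -= 1
        right = i
        while right + 1 < n and strand[right + 1] == "C":
            right += 1
        if right - left + 1 >= min_run:
            return i
    return None


def methylate_and_deaminate(strand, start=0, min_run=2):
    """Apply one hypothetical CM + MCD step: the targeted C becomes T.
    Returns (new_strand, index), or (strand, None) if no site is found."""
    idx = first_target_cytosine(strand, start, min_run)
    if idx is None:
        return strand, None
    return strand[:idx] + "T" + strand[idx + 1:], idx


toy_strand = "ATGACGTTCCCA"  # invented sequence, for illustration only
step1, site1 = methylate_and_deaminate(toy_strand, start=0)
step2, site2 = methylate_and_deaminate(step1, start=site1 + 1)
print(step1, site1)
print(step2, site2)
```

Calling the function again from the position after the previous site, as in the last lines, mimics a later round of marking acting on the next CpG or C-isostich.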
The important point here is that these mutations are being caused by an excess of lactose, whose presence is slowing down some of the vital functions of the cell through its interference with the translocation of the m-RNA, a situation that would not have existed had the cells possessed the Z gene. Hence we can anticipate that the mutations induced at the affected genes, and the consequent changes taking place in the three dimensional folding of these proteins, will be those favoring an increase in the ability to split lactose by one of these proteins. I suggest that of those proteins affected in this way, the one whose tertiary structure was originally closest to β gal is the one which will succeed in performing this function, in this case the protein we have called the progenitor protein, because it will require the minimum number of mutations to acquire the necessary structure.
The next m-RNA molecule transcribed from the mutated gene will be modified to yield a protein which will have an altered amino acid but will still be unable to degrade lactose rapidly. It still would bind to the m-RNA-ribosome complex, in the presence of lactose, and slow down the transcription, but to a lesser extent. Thus marked, the gene would again be affected by CM and MCD at the next CpG or C-isostich. After five or so mutations the change induced in the tertiary structure of the progenitor protein increases its efficiency in splitting the lactose molecule, while decreasing the tendency to be affected by lactose and bind to the m-RNA-ribosome complex. A protein is thus produced, eventually, whose functional site becomes very similar but still not identical to the three dimensional structure of the β gal active site, as is suggested by the differences found in reaction kinetics and to a lesser degree by the absence of immunological crossreactivity. Figure 1 demonstrates, schematically, how these mutations might have led from the progenitor protein to Ebg.
It is possible to have some insight as to what effect these mutations will have on the folding of the progenitor protein as well as others that might undergo similar mutations. It was already mentioned that CM acts on cytosines found within:
a. CpG dinucleotides, or
b. C-isostichs
FIGURE 1. Schematic representation of the events which might have led to the formation of Ebg:
a) The Z gene of E. coli coding for wild type β gal. The gene is represented as a segment of the circular chromosome. In later drawings the deletion of this gene is represented as a missing segment at the same site on the chromosome.
b) The gene which produces the progenitor protein is represented on the lower left side of the circular chromosome, while the tertiary structure of the progenitor protein is schematically shown, across from this gene, under the tertiary structure of β gal.
c-f) The progressive changes that lead the progenitor protein into something resembling the β gal are rendered. The arrows represent the transcription and translation events, jointly. Open circles represent the active site.
a. This dinucleotide can occur either as CpGpX or XpCpG, along the DNA chain. If the mutation occurs within the sequence CpGpX, its effect would be silent, i.e., it would not cause any structural changes because none of the amino acids coded by this triplet would change by this mutation (Table I). If on the other hand the mutation should occur within the sequence XpCpG the result will be the conversion of an arginine into either a histidine or a glutamine.
b. Of the mutations occurring within C-isostichs, one will be silent, glycine → glycine, while the others will convert glycine into highly polar arginine or glutamic acid (Table II).
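The amino acid changes listed in a. and b. can be checked against the standard genetic code with a short sketch. It rests on a reading that is an assumption here, not something the text states explicitly: the C → T change is taken to occur on the transcribed strand, so it appears as G → A in the corresponding coding-strand codon, and each triplet is taken to coincide with a codon in frame. Under that reading CpGpX corresponds to codons of the form NCG, XpCpG to CGN, and a CCC run to GGG. The function and variable names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def g_to_a(codon, position):
    """Return (old, new) one-letter amino acids after changing the G at
    `position` (0-based) of a coding-strand codon into A."""
    if codon[position] != "G":
        raise ValueError("no G at the requested position")
    mutant = codon[:position] + "A" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutant]


# a. CpGpX on the transcribed strand <-> codon NCG, third base changes.
case_cpgpx = {n + "CG": g_to_a(n + "CG", 2) for n in BASES}
# a. XpCpG on the transcribed strand <-> codon CGN, second base changes.
case_xpcpg = {"CG" + n: g_to_a("CG" + n, 1) for n in BASES}
# b. CCC run on the transcribed strand <-> codon GGG, any base may change.
case_c_run = [g_to_a("GGG", p) for p in range(3)]

print(case_cpgpx)
print(case_xpcpg)
print(case_c_run)
```

Under these assumptions the sketch reproduces the outcomes stated in the text: the NCG codons stay Ser, Pro, Thr or Ala, the CGN arginine codons become histidine or glutamine, and GGG gives arginine, glutamic acid or an unchanged glycine.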
Since only the information of the (+) strand is being copied during transcription, CM affects only this strand and we do not have to worry about other mutations, due to different causes, which may affect the complementary strand.
Using the predictive α helical and β sheet parameters of Chou and Fasman (27), given for the amino acids affected, in Table I and II, one can see that:
Arginine to histidine transformation leads to a significant increase in the helicity while it decreases β sheet forming tendency somewhat. Arginine to glutamine exchange has the potential of increasing both the helicity and β structure formation tendency. Since the overall effect depends on the nature of the immediate neighbors of glutamine, it is impossible to state which of these will take place. One point is very clear: regardless of which of these secondary structures is enhanced or weakened, this change will definitely affect the secondary structure of the progenitor protein. According to the data given in Table II, a glycine to glutamic acid transformation would lead to a significant increase in helicity, while a glycine to arginine change will not really have any strong enhancing effect; however, because it eliminates a strong helix breaker, it may be considered to help an increase in helicity.
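The direction of these effects can be illustrated by comparing conformational parameters before and after each exchange. The values below are quoted from memory of the Chou and Fasman 1974 compilation and are an assumption of this sketch; Tables I and II of this paper are not reproduced in the text, so the numbers should be checked against (27) before any use. The dictionary and function names are hypothetical.

```python
# (P_alpha, P_beta) conformational parameters; ASSUMED values, recalled
# from the 1974 Chou-Fasman compilation and to be verified against (27).
CHOU_FASMAN = {
    "Arg": (0.79, 0.90),
    "His": (1.24, 0.71),
    "Gln": (1.17, 1.23),
    "Gly": (0.53, 0.81),
    "Glu": (1.53, 0.26),
}


def propensity_change(old, new):
    """Return (change in P_alpha, change in P_beta) for old -> new."""
    a_old, b_old = CHOU_FASMAN[old]
    a_new, b_new = CHOU_FASMAN[new]
    return round(a_new - a_old, 2), round(b_new - b_old, 2)


for old, new in [("Arg", "His"), ("Arg", "Gln"), ("Gly", "Glu"), ("Gly", "Arg")]:
    d_alpha, d_beta = propensity_change(old, new)
    print(f"{old} -> {new}: helix {d_alpha:+.2f}, sheet {d_beta:+.2f}")
```

A positive difference means the new residue favors that structure more than the old one. Single-residue differences say nothing about the neighboring residues, which is why the text declines to predict the outcome of the arginine to glutamine exchange.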
Effects of these mutations on the tertiary structure of these proteins do not necessarily parallel their effect on the secondary structure:
Arginine to histidine and arginine to glutamine transformations can be anticipated to produce a destabilizing effect on the tertiary structure of the progenitor protein due to the elimination of a strongly basic side chain. On the other hand, glycine to arginine and glycine to glutamic acid exchanges can be expected to affect the three dimensional folding of these proteins significantly, first because of the introduction of strongly charged side chains and secondly sterically, by substituting large side chains in place of a hydrogen atom. Thus it is obvious that even a few such mutations are going to alter the folding of the progenitor protein as well as others that are affected.
That a few other proteins, besides the progenitor protein, are mutated in the presence of lactose is quite likely. This we deduce from the description of clones (1): after each streaking on lactose agar, clones with different appearance were observed. Since there is no evidence that the deletion of β gal causes any change in the appearance of E. coli clones, other mutations must have been responsible for these changes. We can also expect the other proteins that undergo mutations, in the presence of lactose, to have amino acid sequences and overall three dimensional folding similar to β gal, and to have substrates chemically related to lactose (16).
This mechanism may also be responsible for the emergence of drug resistance and detoxification mechanisms for a novel chemical in bacteria.
Most organisms will be killed by an exposure to such a chemical, but some, whose proteins may already possess certain benign alterations, will be in a better position to accommodate the toxin within their active sites - the requirement still being that the tertiary structure of the toxin should be closest to the tertiary structure of the substrate of one protein. The protein, in the presence of the toxin, will bind to its m-RNA-ribosome complex, while being translated, and mark the gene. The gene will eventually be mutated into a form whose protein product will be capable of degrading, modifying or in general detoxifying the alien molecule.
REFERENCES
1. Campbell JR, Lengyel JA, Langridge J. Evolution of a second gene for β-galactosidase in E. coli. Proc Natl Acad Sci USA 70, 1841, 1973.
2. Hartl DL, Hall BG. Second naturally occurring β-galactosidase in E. coli. Nature 248, 152, 1974.
3. Hall BG, Hartl DL. Regulation of newly evolved enzymes. Selection of a novel lactase regulated by lactose in E. coli. Genetics 76, 391, 1974.
4. Arraj J. Personal communication, 1974.
5. Arraj J. Ph.D. Thesis, University of California, Los Angeles, 1973.
6. Hartl DL. Personal communication, 1975.
7. Greller LD, Erhan S. Short length amino acid sequence homology among ancestrally unrelated proteins. Int J Pept Prot Res 6, 165, 1974.
8. Erhan S, Greller LD. Presence of repeating sub-sequences and symmetry patterns in proteins. Int J Pept Prot Res 6, 1975, 1974.
9. Erhan S. Origin of the first cell. Zeit Naturforsch 32c, 1003, 1977.
10. Erhan S, Greller LD, Rasco B. Symmetry patterns in trypsinogen. Int J Pept Prot Res 9, 5, 1977.
11. Erhan S, Greller LD. Do immunoglobulins have proteolytic activity? Nature 257, 353, 1974.
12. Erhan S, Greller LD, Rasco B. Evolution of the t-RNA molecule. Zeit Naturforsch 32c, 413, 1977.
13. McLachlan AD. Repeating sequences and gene duplication in proteins. J Mol Biol 64, 417, 1971.
14. Stroud RM, Kay LM, Dickerson RE. The crystal and molecular structure of DIP-inhibited bovine trypsin at 2.7 Å resolution. Cold Spring Harbor Symposium 36, 125, 1971.
15. Birktoft JJ, Blow DM. Structure of crystallised chymotrypsin. J Mol Biol 68, 187, 1972.
16. Erhan S, Greller LD. Potential of amino acid homology studies: Homologies found between β galactosidase and lac repressor of E. coli. Int J Biomed Comp 8, 283, 1977.
17. Fowler AV, Zabin I. Amino acid sequence of β-galactosidase of E. coli. Proc Natl Acad Sci USA 74, 1507, 1977.
18. Tosi L, Granieri A, Scarano E. Enzymatic DNA modifications in isolated nuclei from developing sea urchin embryos. Exptl Cell Res 72, 257, 1972.
19. Volpe P, Eremenko T. Preferential methylation of regulatory genes in HeLa cells. FEBS Letters 44, 121, 1974.
20. Bourdon RH. Enzymatic modification of chromosomal macro-molecules. Biochim Biophys Acta 232, 359, 1971.
21. Fujimoto D, Srinivasan P, Borek E. Protein Methylases. Biochem 4, 2849, 1965.
22. Gold M, Gefter M, Hausmann R, Hurwitz J. Methylation of DNA. J Gen Physiol 49, 5, 1966.
23. Prekumar E, Shoyab M, Williamson AR. Germline basis for antibody diversity: Immunoglobulin VH and CH gene frequencies measured by DNA-RNA hybridization. Proc Natl Acad Sci USA 71, 99, 1974.
24. Egly JM, Johnson BC, Stricker C, Mandel P, Kempf J. Newly phosphorylated proteins associated with cytoplasmic dRNA. FEBS Letters 22, 181, 1972.
25. Kimmel CB, Larrabie KL. Cytoplasmic RNA in immunoglobulin secreting myeloma tumor cells. Biochim Biophys Acta 335, 374, 1974.
26. Miller OL, Hamlako BA, Thomas CA. Visualization of bacterial genes in action. Science 196, 392, 1970.
27. Chou PY, Fasman GD. Prediction of protein folding. Biochem 13, 222, 1974.