J. theor. Biol. (1981) 91, 13-31
Self Organization Origin of Life Scenarios and Information Theory

HUBERT P. YOCKEY

Army Pulse Radiation Division, Aberdeen Proving Ground, MD 21005, U.S.A.

(Received 30 July 1979, and in revised form 6 March 1980)

The self organizationist scenarios for the origin of life are examined by means of information theory. It is found that self organization must yield only genetic message ensembles of information content much too low to constitute a genome. It is shown that the statistical structure reflected in “the instructions in the amino acids themselves” is an impediment to the generation of genetic information, not a source of it. It is concluded that at present there are no scientifically valid origin of life scenarios. Consequently, belief in little green men in outer space is purely religious, not scientific.

1. Introduction
George Gaylord Simpson published in 1964 an important article on the origin of life on earth, exobiology and extra-terrestrial civilizations. He called the search for extra-terrestrial life “a gamble of the most adverse odds in history”. After fifteen years the first of these disciplines is no nearer its goal and the latter two have yet to demonstrate that their subject matter exists (Klein, 1978). Nevertheless, the established view is that the basic framework of a theory of the origin of life on earth is well in hand. The gaps and details which remain to be considered, however, will keep many scientists employed for a generation or two (Dickerson, 1978). It has been demonstrated that basic monomers, the “building blocks of the informational biomolecules,” especially amino acids, would have been formed (Miller, 1955) in a “hot dilute soup”. They have also been found in interstellar dust clouds (Goldanskii, 1977; Hoyle & Wickramasinghe, 1978). From this it is concluded that life is virtually certain to emerge spontaneously through “self organization” or “chance and natural causes” on all “suitable planets” at any time everywhere in the universe. From these premises it follows that since life exists on earth it must be common
in the universe. The search for extra-terrestrial life and technologically advanced extra-solar civilizations is based on this doctrine
(Kuiper & Morris, 1977). We have arrived at this point by a series of non sequiturs, the wish being the substance of things hoped for. In so far as chance plays a central role, the probability that even a very short protein, let alone a genome, could emerge from a primeval soup, if it ever existed, even with the help of a deus ex machina for $10^{9}$ years is so small that the faith of Job is required to believe it (Yockey, 1977b). This probability is favored by taking into account the functionally equivalent or synonymous residues, if any, at each site. The calculations show that the racemic nature of the primeval soup is extremely unfavorable. Furthermore, non-biological amino acids and analogues, which must have been present, would have been incorporated in naturally formed amino acid polymers. The relative concentration of amino acids in the primeval soup may have been different from that prevailing in primitive protein. The effect of these latter two conditions was not taken into account, but it is extremely unfavorable and so the real probability of the emergence of a genome by chance is much smaller than reported (Yockey, 1977b). Not everyone believes that informational biomolecules are “frozen accidents” which emerged from the primeval soup by chance. Some believe in self organization scenarios according to which, given the “basic building stones,” biological order or organization is predestined (Kenyon & Steinman, 1969) to emerge spontaneously from the deterministic chemical properties or “instructions available from the amino acids themselves” (Fox, 1977, 1980). Amino acids polymerize by forming peptide bonds with loss of water. This is thermodynamically impossible in an aqueous medium.
In order to avoid this problem, self organization scenarios have been proposed which call upon dry heat (Fox, 1972) or a polymerizing agent such as montmorillonite or other clays to concentrate the extremely dilute primeval soup and to polymerize the amino acids (Bernal, 1951). These scenarios regard the emergence of life as being essentially contained in the laws of physics and chemistry and hold that life, like crystal structure, is a natural consequence of the properties of matter and therefore was predestined from the beginning of the universe (Kenyon & Steinman, 1969; Eigen, 1971). Biology is regarded by those who hold such views as being merely a branch of physics and chemistry. The belief in numerous technologically advanced civilizations on planets near other stars rests on a belief in origin of life scenarios (Kuiper & Morris, 1977), and so we will begin our considerations there. In order to discuss the origin of life one needs an operational definition of life. This is notoriously difficult to give. Nevertheless, it is a fundamental part of molecular biology that the existence of a genome, that is, an ensemble of genetic messages
capable of controlling the biochemical and genetic requirements of the organism, is a necessary if not a sufficient condition to be met if matter is to be called living (Yockey, 1977b). The purpose of this paper will be to use information theory to determine whether or not an ensemble of messages with sufficient information content could be generated by the subject scenarios. The self organization proteinoid microsphere scenario of Fox and others has been severely criticized from the chemical point of view by Bernal (1967) and by Day (1979). Dillon (1978) criticized the various colloidal droplet models of the protobiont, including coacervates and proteinoid microspheres, from the biochemical point of view. He rejects their putative role and concludes that a genetic mechanism must have preceded the single cell, in agreement with the requirement of an ensemble of genetic messages stated above. It has often been remarked that biochemical genetics is extremely conservative. Members of the same families of compounds are found in taxonomically diverse organisms. Such facts and the universality of the genetic code lead one to believe that life on earth had a beginning and (to use a computer analogy) a basic program of genetic messages originated to form some ancient primitive organism, namely, the protobiont. The biochemical unity of this basic program has been retained throughout evolution, in some cases with little modification, and new subroutines have been added (Chaitin, 1979). Our general criterion will reject all origin of life scenarios which cannot generate a sufficient information content in the time allowed by the age of the earth and the universe. The surviving scenarios must then show that they can meet the requirement of generating the particular genetic text which forms a genome. This issue contains some subtleties which lead us beyond the scope of this paper.
A scenario for the origin of life and for evolution must construct a decision tree by which the very small number of proteins found in the genetic text is selected from an enormous number of possible texts. There is no way of knowing whether or not a number of protobiont genomes could be formed which would lead to evolutionary dead ends far below the organizational level of known organisms. A number of biologically losing strategies will have an adequate information content, at least for the protobiont. Our analysis will therefore be incomplete. We can be sure, on the other hand, that all proposed origin of life scenarios which do not meet our criterion are losers. The incompleteness of our analysis will not trouble us, since we will find that none of the scenarios to be examined meets even this necessary but not sufficient condition. This means that at present there is no valid scientific scenario for the origin of life.
2. The Effect of Internal Statistical Structure on Information Content of Symbol Sequences

In order to consider self organizationist scenarios which may be proposed for the origin of life we introduce a general mathematical model of the source of sequences (Shannon & Weaver, 1964; Khinchin, 1957) known as a Markov process. Suppose we have a system which has a finite number, a, of states, i. These states may be associated with symbols such as letters, or with amino acids and nucleotides. The probability that the system is in state i is $p_i$. The conditional probability that the system, being in state i, will go to state j is the transition probability $p(j|i)$. The transition probability may be influenced by the history of states of the system. Suppose $b_i$ is a block of m - 1 symbols. The probability that the system will pass from the block of symbols $b_i$ to symbol j is called $p(j|b_i)$. The statistical structure of any printed language ranges through letter frequencies, digrams, trigrams, word frequencies, etc., spelling rules, grammar and so forth and therefore can be represented by a Markov process given the states of the system, the $p_i$ and the $p(j|b_i)$ probabilities. If the statistical structure extends to m symbols the Markov process is said to be of order m. The relative amount of chance and necessity required to describe the problem can be incorporated in the $p_i$ and the $p(j|b_i)$. The entropy of the Markov process is a measure of the amount of chance, that is, uncertainty reflected in the statistical structure. We can learn a great deal about a source simply from the entropy without knowing the details of the statistical structure. According to the sequence hypothesis, the specificity of all proteins is recorded in the exact order in which the amino acid residues are arranged. At sites where any mutation destroys the specificity, the residue is said to be invariant. If mutations can be accepted, all such residues may be said to be synonymous at that site.
Therefore we may imagine a Markov process as a source of sequences of these compounds. Fox (1975) has objected that information theory treats amino acids as playing cards, that is, without intersymbol influence. The objection is invalid on two counts. First, a Markov process can be constructed to match any relationship between amino acid residues and the statistical structure of any polyamino acid sequence, true protein or proteinoid. Secondly, as we shall discuss later, there is, in fact, no intersymbol influence in true proteins and the playing card analogy is indeed a good one. In the following we will resort to illustrating our points by reference to the properties of language. It is important to understand that we are not reasoning by analogy. The sequence hypothesis applies directly to the protein and the genetic text as well as to written language and therefore the treatment is mathematically identical.
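The Markov source described in this section can be illustrated with a minimal Python sketch. The three-symbol alphabet, both transition matrices and all function names are hypothetical choices made for illustration; they are not data from any protein or proteinoid. The sketch computes the source entropy of a first-order process as the stationary average of the row entropies, and shows that a source without intersymbol influence (the "playing card" case) keeps its full single-symbol entropy, whereas strong intersymbol influence lowers it.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def stationary(P, iters=2000):
    """Approximate the stationary state probabilities p_i of the
    transition matrix P[i][j] = p(j|i) by simple power iteration."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

def entropy_rate(P):
    """Source entropy of a first-order Markov process, bits/symbol:
    the average over states i of the entropy of row p(.|i)."""
    p = stationary(P)
    return sum(p[i] * entropy(P[i]) for i in range(len(P)))

# Assumed example 1: no intersymbol influence, every row is the same
# set of symbol frequencies.
freqs = [0.5, 0.25, 0.25]
independent = [freqs, freqs, freqs]

# Assumed example 2: strong intersymbol influence, each symbol tends
# to be followed by itself.
structured = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

print("independent source:", entropy_rate(independent), "bits/symbol")
print("structured source: ", entropy_rate(structured), "bits/symbol")
```

For the independent source the result equals the entropy of the symbol frequencies alone; for the structured source it falls well below the single-symbol entropy of its stationary distribution.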
If the internal statistical structure of the Markov process is known, the source entropy or information content of the sequence generated can be calculated. In application to natural languages the exact nature of the Markov process will be very imperfectly known (Chomsky, 1957); therefore, following Shannon (1951), we calculate the information content by a series of approximations, $H_0, H_1, H_2, \ldots, H_m$, which take into account successively the statistical structure over more and more adjacent symbols. The m gram entropy or information content of an adjacent sequence of m members is:

$$H_m = -\sum_{i,j} p(b_i, j) \log_2 p(j|b_i) \tag{1}$$

where $p(b_i, j)$ is the probability of the block $b_i$ followed by the symbol j.
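As a hedged illustration of the m gram entropy, the Python sketch below estimates it from a sample sequence. It assumes, following Shannon (1951), that the m gram entropy is the conditional entropy of a symbol given the preceding m - 1 symbols, and computes it as the difference between the entropies of m-symbol and (m - 1)-symbol blocks. The function names and the test string are invented for the example, and the sequence is treated as circular so that block counts of different lengths stay mutually consistent.

```python
import math
from collections import Counter

def block_entropy(seq, m):
    """Entropy in bits of the empirical distribution of m-symbol blocks.
    The sequence is wrapped around (treated as circular) so that every
    position starts exactly one block."""
    if m == 0:
        return 0.0
    extended = seq + seq[:m - 1]
    counts = Counter(extended[k:k + m] for k in range(len(seq)))
    total = len(seq)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def mgram_entropy(seq, m):
    """Estimate of H_m: uncertainty of a symbol given the m - 1 symbols
    before it, as a difference of block entropies."""
    return block_entropy(seq, m) - block_entropy(seq, m - 1)

# Assumed toy sequence with the strongest possible intersymbol
# influence: each symbol is fully determined by its predecessor.
sample = "AB" * 50
print("H_1 =", mgram_entropy(sample, 1), "bits/symbol")
print("H_2 =", mgram_entropy(sample, 2), "bits/symbol")
```

In this toy case the letter frequencies alone give one bit per symbol, while taking the preceding symbol into account removes all uncertainty. A real estimate would need a sample long enough for the block counts to be reliable.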
The total number of sequences of length N which can be generated by a source with an alphabet of a members is $a^N$. This number, $a^N$, is of no practical interest, as we shall explain. The probability of a sequence depends on the letter probabilities $p_i$. A long sequence of N members composed primarily of letters with a low $p_i$ will be very much less probable than one chosen primarily from letters with a high probability. Shannon (1964, theorem 3) and Khinchin (1957) have shown that for a Markov source the sequences can be divided into two classes. In one class all sequences have approximately the same high probability, namely $2^{-NH}$. All other sequences belong to a class which has a very small probability for the entire class. This is a very general and fundamental theorem in information theory. McMillan (1953) has shown that it applies to any ergodic source. We are therefore justified in ignoring the second class and in considering only the first class, of which there are $2^{NH}$ members. For sequences of any reasonable length and with letter frequencies found in practice $2^{NH}$ is only a tiny fraction of $a^N$. We note that

$$a^N = 2^{N \log_2 a}. \tag{2}$$
For the protein alphabet a equals twenty and so if N equals 100 there are $1.26 \times 10^{130}$ possible sequences. If we take account of the amino acid residue frequencies (Yockey, 1977a, 1974) then $H \approx 4.153$ and we find $1.04 \times 10^{125}$. Thus the sequences with which we need to be concerned are only a $10^{-5}$ part of the total possible. Some authors (Eigen, 1971) ignore this theorem and replace H by its maximum value, $\log_2 20 = 4.322$. This is an egregious error which nullifies much of what these authors have to say, as we shall see in this paper. It is easy to show (Bremermann, 1963) that the $H_m$ form a monotone sequence

$$H_0 \geq H_1 \geq H_2 \geq \cdots \geq H_m. \tag{3}$$
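The sequence counts quoted above for the protein alphabet can be checked with a short Python sketch. The function name is hypothetical; the inputs (an alphabet of 20, N = 100 and 4.153 bits/symbol) are the values given in the text. Working with base-10 logarithms keeps the very large numbers readable.

```python
import math

def log10_sequence_count(n_symbols, bits_per_symbol):
    """log10 of 2**(N*H), the number of high probability sequences of
    length N for a source with entropy H bits/symbol."""
    return n_symbols * bits_per_symbol * math.log10(2)

N = 100
log_all = log10_sequence_count(N, math.log2(20))   # all 20**N sequences
log_typical = log10_sequence_count(N, 4.153)       # high probability class
log_fraction = log_typical - log_all

print(f"all sequences:      10^{log_all:.3f}")
print(f"high probability:   10^{log_typical:.3f}")
print(f"fraction of total:  10^{log_fraction:.3f}")
```

The exponents come out near 130, 125 and -5 respectively, in line with the figures in the text.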
That is, as consideration of the internal statistical structure is extended over more symbols, and the stronger it is, the smaller $H_m$ becomes. Evaluating $H_m$ in the protein alphabet, we first take $H_0 = \log_2 20 = 4.322$ bits/symbol. $H_1$, which reflects only the amino acid residue frequencies, is close to 4.153 bits/symbol (Yockey, 1977a, 1974). Very early in the history of molecular biology a search was made for order in protein sequences as reflected by digram, trigram and n-gram frequencies in the protein text. At that time only a few fragmentary sequences were known (Gamow, Rich & Yčas, 1956; Yčas, 1958). The digram frequencies were found to show no intersymbol influence. Goel et al. (1972) re-examined this problem when much more data were available. They too found no intersymbol influence in natural proteins. For this reason all $H_m = 4.153$ bits/symbol for $m \geq 1$. Let us now consider the effect of the statistical structure of amino acid polymers or proteinoids formed by natural processes, namely, by dry heat or by adsorption on montmorillonite or other clays or perhaps, like Aphrodite, in sea foam (Bernal, 1967). Fox & Dose (1977), Fox (1972), Fox (1975), Katchalsky (1973), and Paecht-Horowitz, Berger & Katchalsky (1970) report that proteinoids formed by dry heat or by polymerization on clay have sequences which are “non-random”. The relation between “order”, “randomness”, “complexity” and “information” was discussed previously (Yockey, 1977b, 1974). (It is well to keep in mind that Fox et al. (1977) have their own non-mathematical definition of “information”, namely: “... the capacity of a molecule or system to interact selectively with other molecules or systems.”) The self organizationists regard intersymbol influence reflected by “non-random” assembly as being the source of and leading to, by an evolutionary process, informational biomolecules and the genetic text of messages which reflect biochemical specificity.
That is, it is believed that, given the statistical structure of the amino acid sequences, the emergence of informational macromolecules is predestined. For example, Fox et al. (1977, 1975) believe their proteinoids were the first informational biomolecules although they are not proteins. Experimental values of the 400 residue digram probabilities p(i, j) and the 8000 trigram probabilities p(i, j, k) have not been reported. Self organizationists rest their case on a few digram probabilities which experimental evidence shows to exhibit a strong intersymbol influence (Dose, 1976; Fox, 1976; Nakashima, Jungck & Fox, 1977). They must argue and believe that the “non-randomness” in proteinoids or other polyamino acids is strong (Eigen, 1971). Until the complete statistical structure of proteinoids or other polyamino acids is reported, it is necessary to resort to a comparison with the statistical structure found in language for guidance as to the expected values of $H_m$. If the “non-randomness” is as strong as Shannon (1951) found it to be in written English then the information content of a polyamino acid formed by heat or on clay is between one and two bits/symbol. Let us now compare this with the information content of true proteins. If the genetic system is required to specify an amino acid residue chain uniquely, 4.153 bits/residue is required (Yockey, 1974). Less information is needed if only one member of a protein family need be specified (Yockey, 1977c). Cytochrome c is the only protein family for which the information content has been calculated by taking into account synonymous residues, and one finds 2.953 bits/residue (Yockey, 1977a). We may extrapolate this result to other proteins by taking note of the mutation rate. Those residue sites in a protein which have many synonymous residues have a small information content (Shannon & Weaver, 1964, fig. 1). The more synonymous residues which are separated by only one single base interchange, the more likely it is that neutral mutations will appear and be accepted. Therefore, proteins with a low information content will show a high neutral mutation rate. It is reasonable to expect that when proteins are arranged according to the increasing rate of mutation acceptance, those with increasing information content will be found in the reverse order (Yockey, 1979). We find in Dayhoff’s list (1978) that proteins which are regarded as ancient or even precellular, such as certain domains and structure of glyceraldehyde 3-PO4 dehydrogenase, lactate dehydrogenase, glutamate dehydrogenase (Buehner et al., 1973; Rossmann, Moras & Olsen, 1974; Wootten, 1974), ferredoxin and the histones, have a mutation rate which is nearly the same as or smaller than that of cytochrome c. It is therefore reasonable to believe that they have the same or larger information content.
As we pointed out above, if the tendency of amino acids to order themselves into meaningful sequences is as strong as the self organizationists believe, the source entropy of the Markov process representing this tendency is much smaller than the 2.953 bits/symbol needed to specify at least one member of the cytochrome c family. Therefore it is impossible for any of the proposed self organization scenarios to be responsible for cytochrome c or the other ancient and fundamental proteins. Fox (1972) is therefore clearly mistaken when he says that proteinoids are “as complex or more complex than contemporary proteins”. The so-called “instructions in the amino acids themselves” which, it is proposed, generate the first informational biomolecules actually merely play the role of grammar, spelling rules, etc. in ordinary language. Grammar and spelling are autonomous and independent of meaning (Chomsky, 1957), so it is clear that the genome of the protobiont could not have appeared in a “primitive soup” in this way.
3. The Self Organization of Matter from Sequences of Low Information Content
In some self organization scenarios the self ordered polymers are acknowledged to be of low information content but they are believed to be the first step in a chemical evolution which leads to more complex sequences (Fox et al., 1977; Dose, 1976) with increased information content and finally to true proteins. According to Cairns-Smith (1965) “... we might say that the very first organism was the primitive earth itself; that primitive (clay-based) organisms were initially viruses with the earth as a host ...”. This is delightfully reminiscent of the earth-mother goddess in classical mythology and in primitive religions. See also Genesis 1:11, 24; 2:7; 3:19. Thus Gaea joins Lachesis as a goddess involved in the origin of life. Cairns-Smith regards the information content in crystal imperfections as the source of genetic information from which organisms evolved. He is grossly mistaken in his statement that the information density due to crystal defects in a crystallite is comparable to that in DNA. Fox (1972) argues that the self ordered proteinoids aggregate to cell-like microspheres. These microspheres formed a minimal cell which preceded true cellular proteins and nucleic acids. He suggests that on the primitive earth and other suitable planets, a reproduction (or parent-connected multiplication) was carried out (Fox, 1973, 1976, 1980) in these microspheres. The products of this reproduction were screened by natural selection and this formed the basis of evolution. This parent-connected multiplication is prohibited between modern proteins by the central dogma where DNA, RNA and the genetic code are involved. I have shown that the central dogma is a mathematical property of the genetic code resulting from its degeneracy (Yockey, 1978). All degenerate codes have a central dogma but non-degenerate ones do not. The modern degenerate code may have evolved from a primitive non-degenerate doublet code (Jukes, 1965; Yockey, 1977a; Dose, 1976).
In that case the central dogma may not have been operating in such early times. Information transfer in either direction between RNA and DNA is not prohibited since that code is not degenerate. A non-degenerate protein-protein code is also not prohibited by information theory although it may not be of biological significance for other reasons. It is known that short amino acid sequences can be synthesized by an enzyme without encoding by nucleic acids (Lipman, 1971, 1974). The parent-connected multiplication in the proteinoid microspheres postulated by Fox is not proscribed by information theory. According to this scenario chemical evolution and selection leads through protocells to the acquisition of a coding system and to true cells. Self organizationists believe that this process of spontaneous formation of protocells would have occurred
innumerable times in innumerable locales on this planet during its long history (Fox et al., 1977; Fox, 1976; Nakashima et al., 1977). This view is supported by the fact that the yield of proteinoid microspheres is about 10” per g (Fox et al., 1977). The primitive soup is thought to have contained a number of amino acids variously estimated as $10^{44}$. Furthermore, it is widely believed that there are $10^{8}$ “suitable planets” in the galaxy. If the microspheres are reconstituted diurnally (Fox et al., 1977) on all $10^{8}$ “suitable planets” a very large number of trials can be carried out in $10^{9}$ years. Evolution is always regarded as a stochastic process and accordingly, it is believed, there is plenty of time and material for the self ordering of the proteinoids to scan almost all possibilities many times and to lead to self organization of the cell-like microspheres and by molecular evolution to protocells. Simpson (1964) pointed out that in spite of statements in the scientific literature that there are several hundred million suitable planets which are probably inhabited, in fact, scientists know no such thing. Recent work by Hart (1978, 1979) has shown that the zone about main sequence stars within which a planet avoids a runaway greenhouse effect or a runaway glaciation is very narrow around G type stars and even non-existent in K and M stars. There can be no more than one “suitable planet” per star and many will have no planet at all within this habitable zone. We have no clear idea of what a “suitable planet” must look like except that it must have an ocean of liquid water and some appropriate atmosphere for $10^{9}$ years or more. There may well be far fewer “suitable planets” than has been suggested. Nevertheless, in order to favor the scenario we entertain the case where $10^{8}$ suitable planets may be considered, to see if this will result in enough trials to generate a genome.
Rarely if ever do those who propose an origin of life scenario trouble themselves to make a quantitative estimate of the probability that events in the real world will indeed go as described. Let us therefore provide this service by calculating the time it takes Lachesis or Gaea, using all $10^{44}$ amino acids, to scan 95% of the high probability sequences of 100 elements, at least once, if each sequence is reconstituted diurnally (Fox et al., 1977). The product of the number of trials, n, and the probability p of each sequence is equal to the Poisson parameter, $\lambda$:

$$np = \lambda. \tag{4}$$
If $\lambda = 3$ the probability that each chain will appear at least once is 0.95. The number of high probability sequences of length N with information content H(x) is

$$2^{NH(x)}. \tag{5}$$
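The 95% figure attached to a Poisson parameter of 3 can be made explicit with a short worked step. It assumes, as the use of a Poisson parameter implies, that the number of times a given sequence appears in n trials is Poisson distributed with mean $\lambda = np$; the symbol $k$ for that count is introduced here only for the illustration.

```latex
\begin{align*}
P(k = 0) &= e^{-\lambda}, \\
P(k \geq 1) &= 1 - e^{-\lambda}, \\
\lambda = 3 \;\Rightarrow\; P(k \geq 1) &= 1 - e^{-3} \approx 1 - 0.0498 = 0.950 .
\end{align*}
```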
If H(x) = 2 bits/symbol and N = 100 there are $1.61 \times 10^{60}$ sequences which need be considered. Substituting in equation (4) we have

$$\frac{10^{44}}{100} \times t \times \frac{1}{1.61} \times 10^{-60} = 3. \tag{6}$$
We find $t = 4.8 \times 10^{18}$ days or $1.3 \times 10^{16}$ years. Currently, the universe is believed to have originated in an enormous nuclear explosion, the so-called “big bang”, $15$-$20 \times 10^{9}$ years ago. The scenario fails under the initial assumptions even though 100 residues is much too small to be a genome. Let us suppose that the first primitive organism needed one hundred chains of 100 residues each and that we allow the scenario to take account of all $10^{8}$ “suitable planets” believed to exist. We then find the value of t to be $13 \times 10^{9}$ years, which is comparable to the age of the universe. Lachesis or Gaea might generate a chain of $10^{4}$ residues at least once in the history of the universe if the information content is 2 bits/symbol. The scenario requires H(x) to increase during evolution and consequently the number of sequences which must be scanned increases enormously. However, H cannot increase from a value characteristic of amino acid polymers formed by abiological means without a genetic code with an adaptor function. We see again that true proteins could not evolve from such polymers. A far more serious criticism of this scenario arises from the following considerations. Let us take the value $H_m = 4.153$ bits/symbol which applies to true proteins. From equation (5) we calculated above that there are $1.04 \times 10^{125}$ high probability sequences for a 100 element chain. The high probability sequences for which H(x) = 2 bits/symbol are only a fraction $1.55 \times 10^{-65}$ of the larger ensemble of sequences. The fraction is reduced drastically as the chain lengthens. The burden of proof is on the self organizationist to prove that the genetic text of the protobiont lies in this very tiny fraction of sequences. This burden is too heavy to bear. True proteins will not be found in the smaller ensemble because of the strong intersymbol influence which, by definition, characterizes that ensemble.
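The arithmetic of this estimate can be reproduced with the Python sketch below. The constants are the figures given in the text; the function and variable names are hypothetical. One step is an assumption on my part: the text does not spell out how the 100-chain, all-planets figure is obtained, and the sketch scales the single-chain time linearly with the number of chains and inversely with the number of planets, which happens to reproduce the quoted value.

```python
import math

AMINO_ACIDS = 1e44      # monomers available in the primitive soup
CHAIN_LENGTH = 100      # residues per chain
POISSON_LAMBDA = 3.0    # gives a 95% chance of at least one appearance
DAYS_PER_YEAR = 365.25
PLANETS = 1e8           # "suitable planets"

def scan_time_days(bits_per_symbol, n=CHAIN_LENGTH,
                   monomers=AMINO_ACIDS, lam=POISSON_LAMBDA):
    """Days needed so that (chains made per day) * t * (probability of
    one sequence) equals lam, with one reconstitution per day."""
    sequences = 2.0 ** (n * bits_per_symbol)   # high probability class
    chains_per_day = monomers / n
    return lam * sequences / chains_per_day

days = scan_time_days(2.0)
years = days / DAYS_PER_YEAR

# Assumed scaling: 100 chains take 100 times as long, and the work is
# shared equally among all planets.
years_hundred_chains = 100 * years / PLANETS

# Fraction of the 4.153 bit/symbol ensemble occupied by the
# 2 bit/symbol ensemble, as a base-10 logarithm.
log10_fraction = CHAIN_LENGTH * (2.0 - 4.153) * math.log10(2)

print(f"t = {days:.2e} days = {years:.2e} years")
print(f"100 chains on all planets: {years_hundred_chains:.2e} years")
print(f"fraction of larger ensemble: 10^{log10_fraction:.2f}")
```

The outputs are of order $10^{18}$ days, $10^{16}$ years, $10^{10}$ years and $10^{-65}$ respectively.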
These calculations clearly do not support the view expressed by Fox & Dose (1977) that protocells would be formed spontaneously innumerable times in innumerable locales on the planet Earth and on all “suitable planets” elsewhere in the universe. The calculations given above are much kinder to the self organizationist scenario than reality would be. The scenarios are clearly not efficient in the use of the basic monomers on all suitable planets. Nevertheless, we have assumed complete efficiency in the use of all $10^{44}$ amino acids on all $10^{8}$ suitable planets. Furthermore, we have neglected the fact that the amino acids in the primitive soup must be racemic and that the primitive soup must
contain many non-biological amino acids and analogues. The fact that the primitive soup may not have contained amino acids in the same proportion as that required in protein is also neglected. All these factors drastically increase the time required to form a genome. In the form of the self organization scenario due to Eigen (1971) and Eigen & Schuster (1977) chance plays its role at the “beginning” so that the origin of life started from random events. According to them there is an initial period of stochastic scanning before the auto catalytic hypercycles appear. The information carriers, which Eigen regards as being nucleic acids, are believed to have formed by self organization from random polymers. “Orderliness” then emerges by means of self sustaining auto catalytic hypercycles. The calculations given above show that the required number of high probability sequences cannot be scanned with the material and the time available, so that the scenario cannot get started. The self organized hypercycles are subject to the same criticism as Fox’s proteinoids with regard to information content and the generation of a genome. It is clear from the above discussion that the path to functional protein must pass through a phase in which the natural affinity of amino acid residues is eliminated. This is accomplished by the adaptor function of the tRNA’s. Woese (1972) and Yčas (1974) have suggested that the translation of the first primitive code was “a very error ridden process”. It is possible to put quantitative limits on the accuracy required. In Yockey (1974) we showed that the probability P that a DNA or RNA sequence of length N codons would code a protein sequence correctly, as a function of a base change probability $\alpha$, is:

$$P = e^{-6.509 N \alpha}. \tag{7}$$

This equation takes into account the amino acid residue frequency and the error correction due to codon degeneracy.
If, as has been suggested (Jukes, 1965; Dose, 1976; Yockey, 1977c), the first primitive codes were nondegenerate doublets, no error correction would have been available from codon degeneracy. Some base changes result in a different but functionally equivalent or synonymous amino acid residue. The protein sequence remains in the same family and its functional effectiveness is preserved. This is discussed in Yockey (1977c) and we find in table 1 of that paper for yeast cytochrome c that equation (7) becomes

$$P = e^{-397\alpha}. \tag{8}$$
Considering that cytochrome c requires 101 codons, the exponent $397\alpha$ may be written $3.931 \times 101\alpha$ or, in general,

$$P = e^{-3.931 N \alpha}. \tag{9}$$
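Suppose that only 50% of the proteins produced need be functional; then, from equation (9), $e^{-3.931 N \alpha} > 0.5$ and:

$$N\alpha < 0.1763. \tag{10}$$

If $N = 10^{4}$ codons then

$$\alpha < 1.763 \times 10^{-5}. \tag{11}$$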
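The accuracy bound can be checked numerically. The Python sketch below solves $e^{-kN\alpha} = P$ for $\alpha$, with k = 3.931 taken from equation (9) and P = 0.5 from the 50% condition; the function name and its default arguments are hypothetical conveniences, not notation from the paper.

```python
import math

def max_base_change_probability(n_codons, functional_fraction=0.5, k=3.931):
    """Largest base change probability alpha for which
    exp(-k * N * alpha) is still at least functional_fraction."""
    return -math.log(functional_fraction) / (k * n_codons)

# With N = 1 the result is the bound on the product N * alpha.
n_alpha_bound = max_base_change_probability(1)

# Bound on alpha itself for a genome of 10**4 codons.
alpha_bound = max_base_change_probability(10**4)

print(f"N * alpha < {n_alpha_bound:.4f}")
print(f"alpha < {alpha_bound:.3e} for N = 10^4 codons")
```

Changing `functional_fraction` shows how sensitive the tolerable error rate is to the fraction of functional proteins demanded.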
This accuracy must be attained even by the most primitive effective code. It therefore appears that a quite accurate code was required even at the origin of life.

4. Discussion
The argument made in all origin of life scenarios is that one can trace the origin of life by going back in time along evolutionary pathways to simpler and yet more simple organisms. One descends the biochemical organization level until one arrives at the basic and finite granularity expressed by the “building blocks” of biological systems. Guided by the infallible doctrine of dialectic materialism, it is clear that the process can be reversed. Therefore on at least a reasonable fraction of any “suitable planets” anywhere and any time in the universe, in due course, it is believed to be very likely that humanoids will evolve and in turn develop highly technical and benevolent civilizations (Simpson, 1964). They will, of course, be anxious to communicate with other such civilizations and in particular our own. If they do not communicate with us it is because we have sinned or are too primitive to deserve their attention. We are, nevertheless, being watched and upon good behavior we can hope to be included eventually in the celestial dialogue. Many searches have been made for such signals. Not even the presumed carrier frequency, let alone the expected modulation, has been detected (Horowitz, 1978). Since the Earth has been bright in the radio frequencies only in the last thirty years or so our extra-terrestrial civilizations have not yet had time to receive and analyze these signals. Thus to the true believer (Hoffer, 1951) the absence of a carrier proves “they” must be listening. The sequence hypothesis is equally valid in molecular biology and in ordinary language. We may therefore perform the same thought procedure as above and trace the organization of a treatise or other composition through its sections, paragraphs, sentences and words to its letters. 
By the same token we assume it to be obvious that knowing the letter frequencies, digram, trigram, n-gram frequencies, rules of spelling, grammar, and composition one could also reverse the thought procedure and construct a complete body of all statements which can be made in a chosen language. As in the application to genetics we are relying on the “instructions in the words
themselves;” that is, in a tendency of words to order themselves. Quastler (1964) has pointed out that this is exactly the object of a project once underway in the Grand Academy of Lagado. One of the savants in that great institution of research and higher learning had constructed a simple frame which recorded all words in the language of that country in their several moods, tenses, and declensions. These words were arranged and rearranged in random order by students turning cranks and if sentences or fragments thereof appeared this was duly noted. This machine was crude and slow by modern standards so that in spite of a good deal of effort little was accomplished. Nevertheless, it was known in Lagado that Mars has two satellites of period 21h 30m and 10h 0m. This machine may have been the source of this knowledge. Clearly the correct identification of just two satellites and the nearly correct periods (modern values are 30h 18m and 7h 39m) could not have happened by chance. The lack of further results can be ascribed to the primitive character of this machine which was such that it could take into account only word frequencies in the statistical structure of the language. The idea was ahead of its time but it may be presumed that with the very large memories and tremendous speed of present day computers this project could be brought to fruition. The programming can now be much more sophisticated and can include all that is known about the statistical structure of the language. A large number of nonsense sequences will be generated, of course, but surely the computer program proposed above can be instructed to print out only the “ordered” and therefore significant sequences of paragraph length or longer for examination by human operators. Because the sequence hypothesis applies both to written language and to informational biomolecules the self organizationists should be willing and eager to put their scenario to the test.
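The kind of program imagined above can be sketched in a few lines. The following is a minimal illustration only, assuming a toy corpus and a first-order (digram) word model; the function names and the corpus are hypothetical and do not come from this paper or from Quastler (1964). It draws each word according to the frequencies with which words were observed to follow one another, and does nothing more than that.

```python
import random
from collections import defaultdict

def build_bigram_table(words):
    """Map each word to the list of words observed to follow it."""
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(words, length, rng):
    """Emit `length` words, each drawn according to digram frequencies."""
    table = build_bigram_table(words)
    word = rng.choice(words)  # first word drawn by plain word frequency
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        # Fall back to plain word frequencies when no follower was observed.
        word = rng.choice(followers) if followers else rng.choice(words)
        output.append(word)
    return output

# Hypothetical toy corpus, chosen only for illustration.
corpus = ("the machine arranged the words and the students turned the cranks "
          "and the words fell in random order").split()
sample = generate(corpus, 12, random.Random(0))
print(" ".join(sample))
```

Every adjacent pair in the output is a pair that already occurs in the corpus, so the statistical structure is respected locally, yet nothing in the procedure selects sequences for meaning.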
The advantages to human culture are beyond imagination. Properly programmed, the fast and large computers now available could write books on philosophy, poetry, politics, law, theology, and even on the origin of life. Since the idea has already had some success surely a sufficiently eminent projector (as they were called in Lagado) at a sufficiently prestigious institution should be able to get such a program funded. Unfortunately for him the self organizationist has constructed a Haman’s gallows. He argues that the statistical structure of the proteinoid is very strong and therefore he must accept an information content much lower than that of those proteins thought to be basic to all or almost all organisms. The source entropy of this scenario is therefore such that those proteins could not have been generated. The source entropy can be increased only by eliminating the statistical structure of the naturally formed amino acid polymers. Modern proteins dating after the origin of the first genetic code,
not necessarily the modern one, have a high information content and no detectable intersymbol statistical structure. The information content of modern proteins reflects a complexity nearly that of a random sequence (Yockey, 1977b). Since complexity and order are opposite concepts there is very little order in true proteins. Consideration of the adaptor operation of the genetic code mechanism shows clearly that natural intersymbol influence reflected by the “instructions in the amino acids themselves” must be removed from the protein formation process. The “order” in the naturally formed amino acid polymers is therefore an impediment and not a means of “self organization” which leads to informational biomolecules and thence to a genome. It is believed this is the first time this function of the code has been pointed out. The accepted scenarios for the origin of life regard the genetic code to be merely one of many necessary events in the course of the “inevitable appearance of life on all suitable planets everywhere in the universe”. According to Wong (1976, 1975) more than 5 x 10’ kinds of genetic code are mathematically possible and perhaps it was necessary to scan a number of codes to select the modern one (Fox et al., 1977; Eigen & Schuster, 1977). The calculations presented in this paper show that the origin of a rather accurate genetic code, not necessarily the modern one, is a pons asinorum which must be crossed to pass over the abyss which separates crystallography, high polymer chemistry and physics from biology. The information content of amino acid sequences cannot increase until a genetic code with an adaptor function has appeared. Nothing which even vaguely resembles a code exists in the physico-chemical world. One must conclude that no valid scientific explanation of the origin of life exists at present. Two lines of consideration are used in this field.
One argues that the universe is very very large and very very old and that although life is extremely improbable it is not impossible and that we, the happy product of this whim of Fortuna Imperatrix Mundi, could hardly be better located to observe it. This view is essentially a religious belief in miracles. It relieves us of the need to think about the problem today. Like Scarlett O’Hara we can think about it tomorrow. The second approach rejects the chance or “frozen accident” scenario and regards the origin of life to be predestined and an inevitable consequence of natural law as exhibited in chemistry and physics. Scientists, like philosophers and theologians, often regard their peculiar surroundings and circumstances as being the inevitable result of natural law or of Divine Providence in a best of all possible worlds. Aristotle advocated infanticide of physically handicapped infants (Politics Book VII, chapter 16). He believed that some people are slaves by nature (Politics Book VII, chapter 13) and regarded one
of the three purposes of government “to exercise the rule of a master over those who deserve to be treated as slaves” (Politics Book VII, chapter 14). Thus the philosophical justification of this unhappy institution was provided for the next 2200 years. St Paul in no fewer than four of his letters (Ephesians 6:5-6, Titus 2:9-10, Timothy 6:1-2, Colossians 3:22-24) instructed slaves to be loyal and obedient to their masters and also approved of stoning as a means of execution (Acts 8:1). By the same token the scientist is accustomed to “orderliness” as reflected, for example, by crystal structure, emerging by nature from supersaturated solutions. Accordingly, he finds it easy to believe that amino acids and nucleotides will arrange themselves by nature into protobionts. We have seen in this paper that proving that amino acids and nucleic acids are formed by natural causes does not move one very far toward a scientifically credible origin of life scenario. Belief or disbelief is a human decision. A practical man will not believe a scenario which appears to him to have a very small probability. He will choose one which appears to him to have a probability nearly equal to one. As we discussed previously (Yockey, 1977b), if a tossed coin is observed to fall heads ten times consecutively, a practical man will believe it to be two headed without examining it even though the sequence of all heads is exactly as probable as any other sequence. On the other hand, a true believer will continue to believe the coin fair no matter how long the sequence of heads. He will regard this as evidence that the one tossing the coin is capable of miracles. The longer the sequence of heads the more tenaciously do the practical man and the true believer each cling to his own divergent belief. Neither regards it necessary to examine the coin. Science and religion have different and opposite belief systems. Scientific beliefs are never absolute.
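The arithmetic behind the coin example can be made explicit. The sketch below is one possible formalization, assuming a Bayesian reading of the practical man's reasoning that the text itself does not state; the function name and the prior of one in a thousand are hypothetical choices made only for illustration. It computes the probability that a fair coin gives ten heads in a row, and the resulting belief that the coin is two headed.

```python
from fractions import Fraction

def posterior_two_headed(prior, heads_in_a_row):
    """Posterior probability that the coin is two headed after a run of heads."""
    like_two_headed = Fraction(1)                 # a two-headed coin always shows heads
    like_fair = Fraction(1, 2) ** heads_in_a_row  # a fair coin gives the run with (1/2)^n
    numerator = prior * like_two_headed
    return numerator / (numerator + (1 - prior) * like_fair)

prior = Fraction(1, 1000)          # hypothetical prior belief, not taken from the paper
p_run_fair = Fraction(1, 2) ** 10  # probability of ten heads from a fair coin
post = posterior_two_headed(prior, 10)
print(p_run_fair, post, float(post))
```

Under these assumptions even a prior of one in a thousand is raised to roughly even odds by ten heads, and each further head doubles the odds in favor of the two-headed coin. A true believer, in this picture, is one whose prior is exactly zero, so that no run of heads can move him.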
Doubt is a virtue in science and many well established theories have been replaced because of tiny discrepancies. Faith on the other hand plays a central role in religion. Doubters, who must touch the very stigmata to believe, are not well regarded even if they are saints (John 20:25-29). A true believer confronted with evidence contrary to his doctrine regards this as merely a test of the steadfastness of his faith. The more his doctrine is denied by experience and observation the more tenaciously he clings to his holy faith. His greatest fear is heresy and treason to his doctrines. He believes that, like Job, he will be rewarded in the end when the amazing grace of holy writ is at last revealed. He does not need to understand, only to believe. He can be converted but not convinced. He views the world behind a fact proof screen and depends on an infallible comprehensive doctrine and often on an infallible leader (Hoffer, 1951). Faith in the infallible and comprehensive doctrines of dialectic materialism plays a crucial role in origin of life scenarios and especially in
exobiology and its ultimate consequence the doctrine of advanced extraterrestrial civilizations. That life must exist somewhere in the solar system and on “suitable planets elsewhere” is widely and tenaciously believed in spite of lack of evidence or even abundant evidence to the contrary (Levin, Straat & Benton, 1978). The case for life on Venus until recently resided in a freedom to postulate or to believe “unfettered by inconvenient facts” (Dickerson, 1978). As the evidence on the climate of Venus has accumulated the true believers cling to the idea that life did once exist there but was killed by a runaway greenhouse effect. The case for life on Mars has never been very good. The “chicken soup” doctrine contrived to explain the origin of life on earth cannot be germane for Mars. It has long been known that Mars has no modern oceans. Photographs and other data from the Mariner and the recent Viking missions show that liquid water cannot exist there and furthermore there are no fossil beaches or river deltas. There is evidence that there have never been oceans on Mars (Rasool, 1972) and that the Martian regolith has never been wet (Nussinov, Chernyak & Ettinger, 1978; Nussinov & Vekhov, 1978). The few formations which have been interpreted as being caused by flowing liquid, not necessarily water, are not evidence of a fossil ocean (Clark, 1978; Trevina & Picard, 1978; Baker, 1978; Yung & Pinto, 1978). The wholly negative results of the effort to seek evidence of life on Mars merely test the faith of the true believer. The faith of many people in the doctrines of dialectic materialism has survived this test and they continue to believe ever more firmly in life on Mars (Levin et al., 1978) or Jupiter and/or its satellites or at least “elsewhere” in the galaxy. The literature in this field is completely gleichgeschaltet. No member of the scientific elite has published a paper questioning the established doctrine since 1964.
This doctrine is held at the highest levels of scientific achievement. Crick & Orgel (1973) have made the following statement. “If the probability that life evolves in a suitable environment is low, we may be able to prove that we are likely to be alone in the galaxy (Universe). If it is high the galaxy may be pullulating with life of many different forms. At the moment we have no means of knowing which of these alternatives is correct. We are thus free to postulate that there have been (and still are) many places in the galaxy where life could exist but that in at least a fraction of them, after several billion years the chemical systems had not evolved to the point of self-replication and natural selection” (italics ours). This statement, had it
come from a freshman student, would indicate that he would be wise to direct his talents to theology, sociology, politics or other pursuits not overly troubled by facts. However, Crick and Orgel are among the more creative of all scientists in the contemporary world. We therefore apply our principle
that a practical man will not believe a scenario or a statement which appears to him to be very unlikely. He will select one which appears to have a probability near unity. We must conclude that the quotation given above and the article from which it was taken and many others of similar content were intended by the authors and by the editors of the journal to be a contribution to religious literature. They have a right to do this as long as they call it religion and not science. As to the unseen little green men whose civilizations putatively pullulate in our galaxy, faith is the evidence of these things unseen. (Now faith is the substance of things hoped for, the evidence of things unseen. Hebrews 11:1.) Since science has not the vaguest idea how life originated on earth, whether life exists anywhere else, or whether little green men pullulate in our galaxy, it would be honest to admit this to students, the agencies funding research, and the public. Leaders in science, speaking ex cathedra, should stop polarizing the minds of students and younger creative scientists with statements for which faith is the only evidence. Research on the origin of life is legitimate and it would be much better off if the effort absorbed by defending scenarios which cannot meet the most elementary criteria for a scientific contribution were directed to a search for new knowledge. It is new knowledge, not another clever scenario, that is needed to achieve an understanding of the origin of life. This research must be carried out in an atmosphere of the creative skepticism typical of science. In this connection Simpson’s paper (1964) has proved prophetic and is well worth rereading. In the absence of better knowledge of the origin of life the search now being made for little green men and their signals from planets near other stars is based entirely on the evidence of faith and must therefore be regarded as an exercise of religious belief.

REFERENCES
BAKER, V. R. (1978). Science 202, 1249.
BERNAL, J. D. (1951). The Physical Basis of Life. London: Routledge and Kegan Paul.
BERNAL, J. D. (1967). The Origin of Life. Cleveland, New York: World Publishing Co.
BREMERMANN, H. J. (1963). IEEE Trans. Mil. 7, 200.
BUEHNER, M., FORD, G. C., MORAS, D., OLSEN, K. W. & ROSSMANN, M. G. (1973). Proc. natn. Acad. Sci. U.S.A. 70, 3052.
ROSSMANN, M. G., MORAS, D. & OLSEN, K. W. (1974). Nature 250, 194.
WOOTTEN, J. C. (1974). Nature 252, 542.
CAIRNS-SMITH, A. G. (1965). J. theor. Biol. 10, 53.
CHAITIN, G. J. (1979). In The Maximum Entropy Formalism. Cambridge, Mass: The MIT Press.
CHOMSKY, N. (1957). Syntactic Structures. The Hague: Mouton & Co.
CLARK, B. C. (1978). Icarus 34, 645.
CRICK, F. H. C. & ORGEL, L. E. (1973). Icarus 19, 341.
DAY, W. (1979). Genesis on Planet Earth. East Lansing, Michigan: Talos Publishing Co.
DAYHOFF, M. O. (ed.) (1978). In Atlas of Protein Sequence and Structure vol 5. Washington D.C.: National Biomedical Research Foundation.
DICKERSON, R. E. (1978). Sci. Am. 239, 70.
DOSE, K. (1976). In Protein Structure and Evolution (J. L. Fox, D. Zdenek & A. Blazej, eds). N.Y.: M. Dekker.
DILLON, L. S. (1978). The Genetic Mechanism and the Origin of Life. New York: Plenum Press.
EIGEN, M. (1971). Naturwissenschaften 58, 465.
EIGEN, M. & SCHUSTER, P. (1977). Naturwissenschaften 64, 541.
EIGEN, M. & SCHUSTER, P. (1978). Naturwissenschaften 65, 7.
EIGEN, M. & SCHUSTER, P. (1978). Naturwissenschaften 65, 341.
FOX, S. W. (1980). In The Nature of Life XIII Annual Nobel Conference (William H. Heidcamp, ed.). Baltimore: University Park Press.
FOX, S. W. & DOSE, K. (1977). Molecular Evolution and the Origin of Life. New York, Basel: M. Dekker.
FOX, S. W. (1976). In Protein Structure and Evolution (J. L. Fox, D. Zdenek & A. Blazej, eds). N.Y.: M. Dekker.
NAKASHIMA, T., JUNGCK, J. R. & FOX, S. W. (1977). Int. J. Quantum Chem. Quant. Biol. Symp. 4, 65.
FOX, S. W. (1975). Int. J. Quantum Chem. Quant. Biol. Symp. 2, 307.
FOX, S. W. (1972). Ann. New York Acad. Sci. 194, 71.
FOX, S. W. (1973). Pure Appl. Chem. 34, 641.
GAMOW, G., RICH, A. & YČAS, M. (1956). In Advances in Biological and Medical Physics Vol 4, p. 23. New York: Academic Press.
GOEL, N. S., RAO, G. S., YČAS, M., BREMERMANN, J. J. & KING, L. (1972). J. theor. Biol. 35, 399.
GOLDANSKII, V. I. (1977). Nature 269, 583.
HART, M. H. (1979). Icarus 37, 351.
HART, M. H. (1978). Icarus 33, 23.
HOFFER, E. (1951). The True Believer. New York: Harper & Row.
HOROWITZ, P. (1978). Science 201, 733.
HOYLE, F. & WICKRAMASINGHE, C. (1978). In Life Cloud, the Origin of Life in the Universe. New York: Harper & Row.
PAECHT-HOROWITZ, M., BERGER, J. & KATCHALSKY, A. (1970). Nature 228, 636.
JUKES, T. H. (1965). Biochem. Biophys. Res. Commun. 19, 391.
KATCHALSKY, A. (1973). Naturwissenschaften 60, 215.
KENYON, D. H. & STEINMAN, G. (1979). Biochemical Predestination. New York: McGraw-Hill.
KLEIN, H. P. (1978). Icarus 34, 666.
OYAMA, V. I., BERDAHL, B. J. & CARLE, G. C. (1977). Nature 265, 110.
KUIPER, T. B. H. & MORRIS, M. (1977). Science 196, 616.
LEVIN, G. V., STRAAT, P. A. & BENTON, W. D. (1978). J. theor. Biol. 75, 381.
LIPMANN, F. (1971). In Chemical Evolution and the Origin of Life (R. Buvet & C. Ponnamperuma, eds). Amsterdam: North-Holland.
LIPMANN, F. (1974). In The Origin of Life and Evolutionary Biochemistry (K. Dose, S. W. Fox, G. A. Deborin & T. E. Pavlovskaya, eds). New York: Plenum Press.
MCMILLAN, B. (1953). Ann. math. Stat. 24, 196.
MILLER, S. L. (1955). J. Amer. Chem. Soc. 77, 2351.
NUSSINOV, M. D., CHERNYAK, Y. B. & ETTINGER, J. L. (1978). Nature 274, 859.
NUSSINOV, M. D. & VEKHOV, A. A. (1978). Nature 275, 19.
QUASTLER, H. (1964). The Emergence of Biological Organization. New Haven, London: Yale University Press.
RASOOL, S. I. (1972). In Exobiology (C. Ponnamperuma, ed.). New York, Amsterdam: North-Holland.
SHANNON, C. E. & WEAVER, W. (1964). The Mathematical Theory of Communication. Urbana: The University of Illinois Press.
KHINCHIN, A. I. (1957). Mathematical Foundations of Information Theory. New York: Dover.
SHANNON, C. E. (1951). Bell Syst. Tech. J. 30, 50.
SIMPSON, G. G. (1964). Science 143, 769.
TREVINA, A. S. & PICARD, M. D. (1978). Icarus 35, 385.
WOESE, C. R. (1977). In Exobiology (C. Ponnamperuma, ed.). Amsterdam, London: North-Holland.
TZE-FEI WONG, J. (1976). Proc. natn. Acad. Sci. U.S.A. 73, 2336.
TZE-FEI WONG, J. (1975). Proc. natn. Acad. Sci. U.S.A. 72, 1909.
YČAS, M. (1958). In Symposium on Information Theory in Biology (H. P. Yockey, R. L. Platzman & H. Quastler, eds). New York, London: Pergamon Press.
YČAS, M. (1974). J. theor. Biol. 44, 145.
YOCKEY, H. P. (1974). J. theor. Biol. 46, 369.
YOCKEY, H. P. (1977a). J. theor. Biol. 67, 345.
YOCKEY, H. P. (1977b). J. theor. Biol. 67, 377.
YOCKEY, H. P. (1978). J. theor. Biol. 74, 149.
YOCKEY, H. P. (1979). J. theor. Biol. 80, 21.
YUNG, Y. L. & PINTO, J. P. (1978). Nature 273, 730.