Applied Mathematics and Computation 98 (1999) 293-300
The evolution in the brain

Gregor Kjellström
Hagvägen 29-A, 141 70 Huddinge, Sweden
Abstract
It is shown that populations of individual signal patterns in the central nervous system may be adapted for maximum mean fitness in about the same manner as populations of organisms in the Darwinian evolution of natural systems. This is also consistent with the Hebbian rule of associative learning and the maximum-entropy principle according to the second law of thermodynamics. © 1999 Elsevier Science Inc. All rights reserved.
Keywords: Evolution; Central nervous system; Maximum-entropy principle; Associative learning
1. Introduction
The Darwinian evolution of natural systems has been described as a process climbing an unknown hilly genetic value landscape [1,2,4]. The evolution, as well as the ontogenetic program of a particular individual, has also been described as a random walk over a set of DNA-messages [1,6]. Thus, as soon as we map the set of individual DNA-messages onto some set of parameters, those parameters tend to become Gaussian distributed in every large population. Similar models have also been used by biologists [7]. As examples of such parameters we may also consider morphological polygenic characters. The set of acceptable points (individuals) in the parameter space is usually defined by a probability function known as the fitness of the individual, s(x), i.e. the probability that the individual - having the phenotype vector x^T = (x_1, x_2, ..., x_n), where x^T is the transpose of x - will become a parent of a new individual in the progeny population. As shown earlier [1], the function s(x) may be replaced by a region of acceptability, A, having s(x) = q <= 1 for all x inside A and s(x) = 0 outside A, without any loss of generality.
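As a small illustration of this simplification, the following sketch (in Python; the spherical shape of the region, the value of q and the function names are assumptions introduced here for illustration only) encodes a fitness function that has been replaced by a region of acceptability:

    import numpy as np

    def inside_A(x, center, radius):
        # Hypothetical region of acceptability A: a ball around a chosen center.
        return np.linalg.norm(np.asarray(x) - np.asarray(center)) <= radius

    def s(x, center, radius, q=0.5):
        # s(x) = q <= 1 for all x inside A and s(x) = 0 outside A.
        return q if inside_A(x, center, radius) else 0.0

    print(s([0.2, 0.1], center=[0.0, 0.0], radius=1.0))   # 0.5
    print(s([2.0, 2.0], center=[0.0, 0.0], radius=1.0))   # 0.0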
One way to explain the evolution would be to solve the equations of genetic diffusion with respect to the rules of genetic variation over the set of DNA-messages. But such an approach will hardly be feasible. Instead, the theory of Gaussian Adaptation (GA) [1,3-5] may give a good second order approximation of the behaviour of the parametric diffusion. In the GA-process, the variability of parameters in a large population of individuals is approximated by a Gaussian distribution. Let v(x) be a Gaussian probability density function with mean m and moment matrix M, i.e.

v(x) = γ exp{-(m - x)^T M^{-1} (m - x)/2},

where γ is a constant such that the integral of v(x) over the whole space equals 1. The entropy (which is a measure of disorder) of the Gaussian is defined as

H = log{(2πe)^n det(M)}^{1/2},

where n is the number of dimensions.
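For a concrete reading of this entropy expression, here is a minimal numerical sketch (Python with numpy; the example moment matrix is an assumption chosen only for illustration):

    import numpy as np

    def gaussian_entropy(M):
        # H = log{(2*pi*e)^n det(M)}^(1/2) = 0.5 * log((2*pi*e)^n * det(M)).
        n = M.shape[0]
        return 0.5 * np.log((2.0 * np.pi * np.e) ** n * np.linalg.det(M))

    M = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                    # assumed moment matrix
    print(gaussian_entropy(M))                    # entropy of the Gaussian
    print(gaussian_entropy(4.0 * M))              # scaling M by k^2 adds n*log(k) to H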
This means that when the volume of the ellipsoid of concentration increases, then the entropy will also increase, like the entropy of an expanding gas. This corresponds to larger genetic differences between individuals in the population, in analogy with larger distances between molecules in the gas. Other entities of interest are the mean fitness of the population
P(m) = ∫_A q v(m - x) dx = ∫ s(x) v(m - x) dx

and the center of gravity of acceptable individuals (parents to individuals in the progeny generation)

m* = ∫_A x v(m - x) dx / ∫_A v(m - x) dx = ∫ x s(x) v(m - x) dx / ∫ s(x) v(m - x) dx.
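The mean fitness and the center of gravity can be approximated by sampling from v, as in the following sketch (Python; the spherical region of acceptability, the value of q and the function name are assumptions carried over from the earlier illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_P_and_m_star(m, M, center, radius, q=0.5, samples=100_000):
        # Draw individuals x from the Gaussian v with mean m and moment matrix M.
        x = rng.multivariate_normal(m, M, size=samples)
        inside = np.linalg.norm(x - center, axis=1) <= radius   # membership in A
        P = q * inside.mean()                   # mean fitness P(m)
        m_star = x[inside].mean(axis=0)         # center of gravity of acceptable individuals
        return P, m_star

    P, m_star = estimate_P_and_m_star(np.zeros(2), np.eye(2),
                                      center=np.array([0.5, 0.0]), radius=1.0)
    print(P, m_star)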
According to the theorem of GA, the entropy of the Gaussian distribution is maximized with respect to the constraints s(x), A and P if the conditions m = m* and M proportional to M* are satisfied [1,5]. Here M* is the moment matrix of parental parameters.

The nervous system of the newborn baby is already equipped with a tremendous amount of circuitry, adapted by evolution and immediately ready for use, for instance the grasping reflex and the sucking movement. A newborn fawn tries to get up on its legs as soon as possible. But a lot of adaptation still remains. After birth, the nervous system is more responsive to input signals from the environment, and an electro-chemical evolution and adaptation begins. Many researchers have come to the conclusion that there is an adaptive signal evolution going on in the human brain [9-11]. This mainly means that signal patterns are generated, modified and selected for the well-being and survival of the organism. Instead of genetic parameters, the vector x will now carry a signal pattern characteristic of the system. In our model the components of x are assumed to be ordinary real numbers. The function s(x) becomes
the probability that the signal pattern x will in some way be accepted by the system. In the ordinary genetic evolution, different individuals may be tested in parallel, but I am not so sure that individual signals may always be tested in parallel. For instance, it will hardly be possible to test different motor signals for a certain muscle activity in parallel. On the other hand, many adaptive processes for different purposes may be executed in parallel.

Many adaptive neural network models of the brain assume some sort of teacher to be available. Suppose, for instance, that x is an input signal pattern to the system, and that it is desired that the system should respond with the signal y = Wx, where W is an n-by-n adaptive matrix with elements w_ij. As shown earlier (Nass and Cooper, 1975, see reference in [12]), an efficient rule of modification is
w_ij(t + 1) = (1 - b) w_ij(t) + b x_j y_i,

where b is some small positive constant and t represents the time. Other models rely on some set of differential equations, which make the system completely deterministic. The drawback seems to be a lack of imagination and creativity of the system. The well-known method of back propagation of errors [12] still requires a teacher and suffers the risk of getting stuck at some local optimum. To avoid this type of problem, certain global optimization algorithms have also been tried. By a global algorithm we mean one that, at least, has the ambition to look for the highest peak in a large set of peaks in the landscape. Simulated Annealing (SA) (see references in [12]) is an example of such an algorithm. In short, the process is described like this: Small random displacements at the nodes of the neural network occur at given time intervals. If the change ΔE in energy (criterion) caused by such a perturbation is negative or zero, the displacement is accepted (i.e., allowed to reset the network) and carries over to the next time step. If, however, ΔE is positive, the displacement is accepted with probability p(ΔE), where p is an exponentially decaying function. Otherwise, the system reverts to its unperturbed state. (A small sketch of this acceptance step is given at the end of this section.) It has been shown that the energy will be Boltzmann distributed. This is the origin of the Boltzmann machine. An example of the Boltzmann machine is described by Hinton and Sejnowski (see reference in [12]). In this case the node weight w_ij will be modified according to

Δw_ij = -b w_ij + c (p_ij - p'_ij),

where p_ij and p'_ij are probabilities that must be determined by separate runs of the SA-algorithm. The advantage is that there is no need for a teacher; instead the teacher has been replaced by a certain energy function. The drawback, as the Boltzmann machine has been demonstrated in [12], is that it uses node weights as free parameters instead of signal patterns, which means that the algorithm
probably has to deal with a much larger number of parameters. For instance, the signal pattern x carrying n parameters may have to pass a matrix of n^2 nodes w_ij. Thus, the optimization problem becomes much more difficult. Another drawback is that several separate runs with the SA-algorithm are needed before the w_ij can be modified.

The purpose of this paper is the presentation of a signal version of the GA-process. The GA-process is a discrete random process that may fulfil the theorem of GA, which means that the entropy of a population of signal patterns will be maximized keeping P at some suitable level.
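As mentioned above, a minimal sketch of the SA acceptance step (Python; the exponential form exp(-ΔE/T) and the temperature parameter T are common SA conventions assumed here, not taken from the text) looks like this:

    import math
    import random

    def sa_accept(delta_E, T):
        # Accept a displacement if the energy does not increase; otherwise accept
        # it with an exponentially decaying probability, else revert to the old state.
        if delta_E <= 0:
            return True
        return random.random() < math.exp(-delta_E / T)

    print(sa_accept(-0.3, T=1.0))   # always accepted
    print(sa_accept(0.5, T=1.0))    # accepted with probability exp(-0.5)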
2. The GA-model of the brain
Of course, the circuitry of the brain is tremendously complex, and good models should perhaps include such things as circuits for modulation, summation or multiplication of signals, digital filters, delta modulators and many other types of circuits. Recent neuron research [13] also reveals that neurons or networks of neurons may behave like such circuits. But I have no intention to give a complete model of the human central nervous system. The model to be presented here is based on the GA-process, and we should therefore expect it to climb some mental landscape by a constrained maximization of signal pattern entropy. As has been shown earlier [1,4], this considerably increases the probability of finding higher peaks in the landscape. The only teacher assumed here is the well-being of the organism to the benefit of survival. The number of parameters is assumed to be n, i.e. the number of components in the signal pattern x.

In this model memes [14] such as the singing of birds, melodies, strategies, cars, computers, rifles and atomic bombs may be represented as points x in a signal space. Like the random walk evolution of an individual, the evolution of a meme may be seen as a small step process in some parameter system, which may then be mapped to the cortex. Out of a basis of knowledge brought about by parents, teachers, researchers and others, the human brain creates new knowledge by small changes in the available basis. The step from the radio receiver tube to the transistor is seemingly enormous, but it need not be in the brain of the inventor. With all respect to the invention of the transistor and its enormous impact on technical development, we may conclude that it has about the same function as a radio receiver tube. Existing knowledge about the behaviour of electrons in semiconductors (silicon or germanium) made the invention of the transistor possible. But if the invention of the transistor had been contemporary with the birth of Jesus or Muhammed, then the step would have been enormous.

As another example of a cultural small step process we may think of the development of melodies for the annual Eurovision Song Contest. Melodies,
instrumentations or harmonies that differ too much from the predominant pattern will hardly survive. Again, because the process is assumed to be a small step process, we expect it to become Gaussian distributed as soon as it is mapped on some parameter system. And P becomes the mean fitness of signal patterns in a population of individual signal patterns.

In the course of evolution, the brain has mainly been divided into three parts: (1) the brain stem, (2) the limbic system and (3) the cortex [10,15]. This, of course, is a very rough description of the brain, but it may still be sufficiently good for our purposes. In this model the brain becomes a dipole [10] with a chaotic brain stem inside, while the outside cortex may be seen as a highly ordered map of the environment. The idea is that the brain stem acts as a random signal generator to the limbic system, which serves as a very complex adaptive signal filter between the brain stem and the cortex. It is also known [10,13] that the brain stem has a chaotic structure and that it may include many neurons that fire at random. In the cortex the filtered signals are compared with the signals arriving from the environment. In this process perturbed individual signals in a large population may compete with each other, but only those signal patterns which lead to a state of well-being of the organism will survive, which means that the limbic system is successively modified so as to increase the mean fitness of signal patterns.

An important principle of associative learning (due to Hebb, see references in [12]) seems to be that the synaptic transmission between neurons is strengthened if the neurons are simultaneously active while the system is in a state of well-being; otherwise the transmission may be weakened. To see that GA satisfies the Hebbian rule, we may proceed as follows: In order to asymptotically fulfil the theorem of GA, M may in principle be modified after every trial y - leading to an acceptable signal pattern x = m + y - according to

M := (1 - a)M + a y y^T,   (a << 1).
In order to guarantee a suitable increase in entropy, y should be Gaussian distributed with moment matrix μ^2 M, where the scalar μ > 1 is used to increase the entropy. By a suitable choice of μ the process might also fulfil the theorem of efficiency [1,3]. But M will never be used. Instead we use the matrix W defined by W W^T = M. Thus, we have y = Wg, where g is Gaussian distributed with the moment matrix μ^2 U (U is the unit matrix). W and W^T may be modified by the formulas

W := (1 - b)W + b y g^T  and  W^T := (1 - b)W^T + b g y^T,   (b << 1),   (1)

because multiplication gives

W W^T = (1 - 2b) W W^T + 2b y y^T,

where terms including b^2 have been neglected. Thus, putting b = a/2, M will be indirectly adapted with good approximation. In practice it will suffice to modify W only.
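A small numerical sketch (Python; the dimension and the values of a and μ are assumptions chosen only to illustrate the approximation) shows that updating W by rule (1) changes M = W W^T in nearly the same way as the direct rule M := (1 - a)M + a y y^T:

    import numpy as np

    rng = np.random.default_rng(1)

    n, a, mu = 3, 0.02, 1.5                              # assumed illustration values
    b = a / 2.0
    W = rng.normal(size=(n, n))
    M = W @ W.T

    g = mu * rng.normal(size=n)                          # generator signal, moment matrix mu^2 U
    y = W @ g                                            # trial step y = Wg

    M_direct = (1 - a) * M + a * np.outer(y, y)          # direct rule for M
    W_new = (1 - b) * W + b * np.outer(y, g)             # rule (1) applied to W
    print(np.max(np.abs(W_new @ W_new.T - M_direct)))    # discrepancy of order b^2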
[Figure 1 here: signal components g1, y1 and x1; the chaotic generator feeds the criterion, which separates acceptable from not acceptable signal patterns.]
Fig. 1. An example of a two-dimensional signal process that might fulfil both the theorem of GA and the theorem of efficiency.
In Fig. 1, showing a signal pattern with only two components, it is assumed that the chaotic brain stem generates a chaotic signal pattern (g1, g2), where g1 and g2 are independently Gaussian distributed with standard deviation equal to μ. To the right we have the ordered cortex in which the criterion of well-being is defined and detected. If the system is in a state of well-being, the matrix element w_ij may be modified according to rule (1), i.e.
w_ij(t + 1) = (1 - b) w_ij(t) + b y_i g_j,   (2)
otherwise w_ij will remain unchanged. As can be seen, the connection between the neurons generating the signals y_i and g_j is strengthened if both y_i and g_j are strong; otherwise w_ij will slowly be forgotten. Thus, the GA-algorithm, which is a good second order approximation of the Darwinian evolution of natural systems, is also consistent with the Hebbian rule of associative learning. In fact, it is also consistent with the rule proposed by Nass and Cooper, even though their rule was used for another purpose. The center of gravity, m, may be modified after every single acceptable individual signal pattern x according to the formula

m := (1 - c)m + cx,   (3)
where c is a scalar < 1. The inverse of c may also represent the number of individuals in the population of signal patterns. But of course, the rules (2) and (3) will never be calculated by the nervous system. Eq. (2) may be seen as a natural behaviour of a neuron, and the center of gravity of a population (3) may exist without any calculations. Nevertheless, this signal process will maximize the entropy of signal pattern populations with respect to s(x) or A for any given value of P. If the environmental requirements are sharpened during the process, then A may shrink and the process may converge towards some of the peaks in the landscape.
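To make the whole signal process concrete, the following sketch (Python; the region of acceptability, all parameter values and the function name are assumptions introduced for illustration, not the author's implementation) iterates rules (1)-(3) as described above:

    import numpy as np

    rng = np.random.default_rng(2)

    def ga_signal_process(accept, n, steps=20_000, a=0.02, c=0.01, mu=1.5):
        # accept(x) plays the role of s(x) or the region of acceptability A.
        b = a / 2.0
        m = np.zeros(n)                     # center of gravity of acceptable signal patterns
        W = np.eye(n)                       # W W^T approximates the moment matrix M
        for _ in range(steps):
            g = mu * rng.normal(size=n)     # chaotic generator signal, moment matrix mu^2 U
            y = W @ g                       # filtered trial step
            x = m + y                       # candidate signal pattern
            if accept(x):                   # well-being: only acceptable patterns survive
                W = (1 - b) * W + b * np.outer(y, g)   # rule (1), the Hebbian-like update
                m = (1 - c) * m + c * x                # rule (3), moving center of gravity
        return m, W @ W.T

    # Example region of acceptability: a ball around (3, 3) (an assumption).
    target = np.array([3.0, 3.0])
    m, M = ga_signal_process(lambda x: np.linalg.norm(x - target) < 1.0, n=2)
    print(m, M)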
3. Discussion

Nobody knows how many peaks there may be in such high-dimensional landscapes, but some researchers are convinced that genetic landscapes, as well as landscapes on the surface of the earth, have a fractal structure [2,8]. Mental landscapes will most probably be no exception to that rule. Simple mathematical estimates also seem to show that the number of peaks may well exceed the total number of atoms in the universe (about 10^80). For instance, if we allow 300 different parameters to attain only 10 different parameter values each, then there will be 10^300 points in the parameter space. Further, if, on the average, only one point out of 10^200 is a peak, we will still have 10^100 peaks in the landscape.

The advantage of the GA-algorithm is that signals are accepted or rejected with respect to s(x) or A, which, in contrast to SA, makes it possible to utilize the second order information inherent in the landscape, and to climb even very sharp mountain crests. The node weights w_ij may be immediately modified by the utilization of every acceptable signal pattern. There is no need for a teacher or some special energy function. The only necessary teacher is the well-being of the organism. The theorem of efficiency [1,3] may also be fulfilled if a certain mutation rate p can be kept at a suitable level. As shown earlier [1,4], the algorithm is good at finding higher peaks in fractal landscapes.

Different GA-processes may be executed in parallel, if the corresponding signal patterns are being used for different purposes. Different populations of signal patterns may be adapted with respect to a mental landscape in about the same way as the evolution adapts different populations of organisms for maximum mean fitness in a genetic landscape. If the number of peaks in the mental landscape is of the same order of magnitude as the number of atoms in the universe, the process will become unpredictable. No one would say that such an algorithm is conscious or purposeful, but nevertheless, the algorithm is teachable, imaginative and creative; attributes that are normally ascribed to conscious persons. In principle, it will
also, as far as possible, find higher peaks prior to lower ones, or in other words: out of a large number of alternatives, the algorithm tries to find the best one, which is about the same as a person with a free will also tries to do. But, of course, the algorithm has no free will; it is only free to maximize the entropy, imagination and creativity with respect to the constraints s(x), A or P.
References

[1] G. Kjellström, Evolution as a statistical optimization algorithm, Evolutionary Theory 11 (1996) 105-117.
[2] M. Eigen, Steps Towards Life, Oxford Univ. Press, Oxford, 1992.
[3] G. Kjellström, On the efficiency of Gaussian adaptation, J. Optimiz. Theory Applic. 71 (3) (1991) 589-597.
[4] G. Kjellström, L. Taxén, Gaussian adaptation, an evolution-based efficient global optimizer, in: C. Brezinski, U. Kulisch (Eds.), Computational and Applied Mathematics, Elsevier, Amsterdam, 1992, pp. 267-276.
[5] G. Kjellström, L. Taxén, Stochastic optimization in system design, IEEE Trans. Circuits Systems CAS-28 (7) (1981) 702-715.
[6] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, J.D. Watson, The Molecular Biology of the Cell, Garland, New York, 1994.
[7] R. Lande, Quantitative genetic analysis of multivariate evolution applied to brain-body size allometry, Evolution 33 (1) (1979) 402-416.
[8] B.B. Mandelbrot, The Fractal Geometry of Nature, Freeman, New York, 1983.
[9] G.M. Edelman, Neural Darwinism: The Theory of Neuronal Group Selection, Basic Books, New York, 1987.
[10] M. Bergström, Neuropedagogik. En skola för hela hjärnan (in Swedish), Wahlström & Widstrand, Stockholm, 1995.
[11] J.H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[12] D.S. Levine, Introduction to Neural & Cognitive Modeling, Lawrence Erlbaum, London, 1991.
[13] E.R. Kandel, J.H. Schwartz, T.M. Jessell, Essentials of Neural Science and Behaviour, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[14] R. Dawkins, The Blind Watchmaker, Penguin, Harmondsworth, 1988.
[15] P.D. MacLean, A Triune Concept of the Brain and Behaviour, The University of Toronto Press, Toronto, 1973.