
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

6, 245-263 (1989)

A Hybrid Architecture for Programmable Computing and Evolutionary Learning

KIUMI AKINGBEHIN

Department of Electrical and Computer Engineering, University of Michigan, Dearborn, Michigan 48128

AND

MICHAEL CONRAD

Department of Computer Science, Wayne State University, Detroit, Michigan 48202

Received September 27, 1987

The relation between the structure and the computational function of the brain must have a plastic quality in order for it to have developed through the Darwinian mechanism of variation and selection. It is likely that such structure-function plasticity plays a role in ontogenetic learning as well. Present-day digital computers, by contrast, are highly programmable. But this powerful property entails rigidly defined structure-function relations that are generally incompatible with self-organization through evolution. Hybrid systems could combine the advantages of both programmability and evolvability. The system described in this paper comprises a conventional machine that communicates with an evolution-amenable cellular automaton system. The latter learns to use its low-level parallelism through a "copy thy neighbor" evolutionary algorithm. We describe some simulation experiments with the system and consider how direct fabrication might be achieved using silicon as well as nonsilicon technologies. © 1989 Academic Press, Inc.

1. INTRODUCTION

Recent years have seen a resurgence of interest in adaptive networks and computer learning. This has naturally led to a reexamination of the similarities and dissimilarities between the way conventional computers operate and the approach taken by living organisms. One of the most useful features of the conventional digital machine is that it is highly programmable. Spontaneous


learning, achieved easily by biological organisms, is however hard to implement. These divergent capabilities are probably inherent in the relation between structure and computational function [10, 12, 14]. Digital computers are built from components that are supposed to behave according to specifications that are simple and precise, like the specifications of symbols in a formal language. The possible interactions are highly constrained, in general to a highly sequential mode of operation. The brain, in contrast, is built from neurons, each of which is an intricate organization of receptor molecules, external and internal membranes, membrane and cellular cytoskeletons, channel proteins that control neuroelectric activity, and internal messengers that control channel proteins [20, 31, 30]. These potentially powerful and highly modifiable information processors are organized in a massively parallel and highly interconnected fashion, allowing for a powerful and plastic network level of processing. Information can thus flow along both horizontal and vertical spatial scales and in addition is highly distributed [15]. Low-level parallelism, at the level of groups of neurons and even within the neuron, probably contributes in significant ways to global function. Continuous dynamics and discrete switching processes are intermixed at each level of organization, and there is a great deal of redundancy. These plastic and dynamic features are highly favorable to evolution [13, 16], but inimical to the type of programmability that is supported by sequential digital computers.

The style of computing in programmable machines and in the brain would thus appear to be conceptually rather alien. However, this does not preclude combining them into a hybrid architecture. Our strategy will be to abstract some of the attributes of brain processing discussed above, without attempting to identify the elements in our architecture with components in the brain. The attributes to be represented include dynamics, redundancy, hierarchy, low-level parallelism, and learning through evolution. These attributes are treated as add-ons to a conventional digital computer, and in fact we simulate them using a conventional machine. The human-machine symbiosis as it exists today is also a hybrid architecture, but there the human provides the more fundamental layer of computing.

2. A HYBRID LANGUAGE

Conventional programming languages are designed to take advantage of the programmability of conventional digital computers. For our hybrid architecture, we augment such a language with syntactic constructs that exploit the evolvability typical of living organisms. The constructs used will not have the elementary (obviously implementable) characteristics of the constructs


in conventional languages. They will reflect the type of operations that are seemingly natural for adaptable organisms. A "LEARN" construct will associate a "pattern" with a "class." A "DECIDE" construct will request the "class" of a specified "pattern." The LEARN construct supplies information to an adaptable entity, while the DECIDE construct requests information from an adaptable entity. For example, the construct

LEARN JOHNNY IS STUDENT

associates the pattern "JOHNNY" with the class "STUDENT." This association is an input to the adaptable entity. The construct

DECIDE JOHNNY

on the other hand is a request to the adaptable entity for an association. The resulting association is an output from the adaptable entity. One hopes that the adaptable entity will respond with the correct association. Such behavior is quite natural and efficient for a living organism but very inefficient and computationally expensive for a conventional computer.

Table I is a formal specification of the programming language of the hybrid computer. The first eight instructions are conventional instructions patterned after the random access machine (RAM). All conventional digital computers are implementations of the RAM model. Since the RAM model can simulate a Turing machine in polynomial time [2], we know this instruction set is completely general. The last four instructions are more biological in style. A program that uses the biological instructions is called an evolutionary program. The LEARN instruction teaches the model how to classify a pattern, while the DECIDE instruction requests the model to classify a pattern. Once trained, the QUERY command is used to find out the current state of the model. This state can then be saved in nonvolatile storage for future use. The CONFIGURE command is used to restore the model to a known trained state. This makes it unnecessary to retrain the model to perform a task for which it was previously trained. Since the RAM machine is equivalent to a conventional programming language, our simulation simply augments a conventional programming language ("C" to be exact) with the biological instructions. The massive low-level parallelism remains invisible to the programmer (see also [25]).

TABLE I
FORMAL INSTRUCTION SET OF THE HYBRID MACHINE

The hybrid biocomputer has 26 registers named A through Z. The valid instructions are:

MOVE      operand1 TO operand2
SUBTRACT  operand1 FROM operand2
ADD       operand1 TO operand2
MULTIPLY  operand1 BY operand2
JUMP      location
JUMPZERO  location
JUMPGTZ   location
HALT

LEARN     pattern IS classification
DECIDE    pattern
QUERY     processor
CONFIGURE processor

Notes:
1. The operands of the MOVE instruction can be registers and/or external devices.
2. The Z register is a special register that is automatically updated after each instruction. The conditional jumps refer to the Z register.
3. Two or more instructions separated by an ampersand (&) will be executed concurrently; e.g., ADD A TO B & MULTIPLY C BY D.
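Since the simulation augments C with the biological instructions, an evolutionary program is just a C program. The following is a minimal sketch of how the LEARN and DECIDE constructs might surface in the host language; the bio_* names, the stub bodies, and the particular bit-string encodings are our assumptions, not the authors' actual simulator interface.

/* Sketch: the biological instructions exposed as C functions.  The
 * names, stub bodies, and encodings below are assumptions for
 * illustration; in the real system each call would message the
 * network of transducing nodes. */
#include <stdint.h>
#include <stdio.h>

typedef uint16_t pattern_t;                /* patterns and classes: 16-bit strings */

static pattern_t stored_pat, stored_class; /* one-association stub state */

static void bio_learn(pattern_t p, pattern_t c)   /* LEARN p IS c */
{
    stored_pat = p;
    stored_class = c;
}

static pattern_t bio_decide(pattern_t p)          /* DECIDE p */
{
    (void)p;                               /* stub ignores the pattern */
    return stored_class;
}

int main(void)
{
    pattern_t johnny  = 0x4A53;            /* arbitrary encoding of "JOHNNY"  */
    pattern_t student = 0x0001;            /* arbitrary encoding of "STUDENT" */

    for (int n = 0; n < 5; n++)            /* DO N TIMES ... END DO */
        bio_learn(johnny, student);        /* LEARN JOHNNY IS STUDENT */

    printf("DECIDE JOHNNY -> %04X\n", bio_decide(johnny));
    return 0;
}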

3. THE HYBRID ARCHITECTURE

The architecture of the model is shown in Fig. 1. The model consists of a control node and a network of adaptable transducing nodes. The control node has bidirectional communication with all the transducing nodes. Each transducing node in turn can communicate only with its immediate neighbors, with wrap-around at the edges so that every node communicates with four neighbors. The control node typically broadcasts a request to all the transducing nodes and then waits for the replies. The transducing nodes

FIG. 1. The hybrid model (control node plus network of transducing nodes).


process the request in parallel and then send the results back to the control node (cf. [34]). Note again that this low-level parallelism is invisible to the programmer.

The "neighbor only" communication mode is characteristic of cellular automata. Neighbor-only communication is also very attractive from an implementation point of view: such a topology is easily implemented using two-dimensional VLSI technology. Highly parallel structures such as systolic arrays use this configuration [40, 24]. The most obvious alternative to neighbor-only communication is full connectivity. However, the large number of communication links required for full connectivity makes such an approach impractical. Another possible topology is a bus topology. Bus topologies, however, have serious performance problems [38, 4, 39] if the application involves heavy instantaneous traffic, such as that in the hybrid architecture. The network here is two-dimensional, but higher dimensionality is in principle possible. The control node is a conventional silicon processor, while each transducing node could be all silicon or could contain organic components. Conventional instructions are executed at the control node, while biological instructions involve the transducing nodes.

The software simulation of the hybrid architecture is shown in the layered diagram of Fig. 2. One layer above the lowest level shown in Fig. 2 are several transducing elements. These elements operate in parallel and map a set of input patterns to a set of output patterns (classes). They provide low-level parallelism and pattern recognition mechanisms. At the next layer is the evolutionary learning algorithm. This algorithm varies and selects the lower-level transducing elements to provide evolution by variation and selection. This layer uses a histogram to map several classes (output patterns) to one resultant class. The next layer (programmable interface) provides a virtual programmable hybrid computer. A programmer writes evolutionary programs to run on this layer.

FIG. 2. Structure of the simulation.
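The wrap-around neighbor scheme described above is easy to make concrete. The sketch below computes the four neighbor indices of a node on the resulting torus; the 4 × 4 grid size, row-major node ids, and function naming are our assumptions.

/* Sketch of the neighbor-only topology: nodes on a ROWS x COLS grid,
 * row-major ids, with wrap-around at the edges (a torus), so every
 * node has exactly four neighbors. */
#include <stdio.h>

#define ROWS 4
#define COLS 4

/* Fill out[4] with the ids of the N, S, W, E neighbors of node id. */
static void neighbors(int id, int out[4])
{
    int r = id / COLS, c = id % COLS;
    out[0] = ((r + ROWS - 1) % ROWS) * COLS + c;   /* north (wraps to bottom row) */
    out[1] = ((r + 1) % ROWS) * COLS + c;          /* south (wraps to top row)    */
    out[2] = r * COLS + (c + COLS - 1) % COLS;     /* west  (wraps to right edge) */
    out[3] = r * COLS + (c + 1) % COLS;            /* east  (wraps to left edge)  */
}

int main(void)
{
    int nb[4];
    neighbors(0, nb);   /* corner node: all four neighbors exist via wrap-around */
    printf("node 0 neighbors: N=%d S=%d W=%d E=%d\n", nb[0], nb[1], nb[2], nb[3]);
    return 0;
}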

4. LEARNING ALGORITHM

The control node is the interface between a programmer and the model. The programmer submits instructions to the control node. The control node decides what to do with each instruction, takes the appropriate action, and then sends a reply back to the programmer. The conventional instructions are standard constructs of the host programming language and are executed directly on the control node. The biological instructions, however, require cooperation between the control node and the transducing nodes. The control node performs the following actions for each biological instruction:

QUERY: The control node broadcasts a request for the current configuration to all transducing nodes, receives all configurations, combines them into a single data structure, and returns the data structure to the requestor.

CONFIGURE: The control node receives a data structure from the requestor, extracts the individual configurations, sends them to the transducing nodes, waits for acknowledgment from all nodes, and notifies the requestor that configuration is complete.

LEARN: The control node receives a pattern and the correct classification for that pattern from the requestor, broadcasts the pattern and "correct class" to the transducing nodes, waits for acknowledgments, and notifies the requestor that the LEARN is complete.

DECIDE: The control node receives a pattern from the requestor, broadcasts it to the transducing nodes, waits for the classifications, builds a histogram of the classifications, selects the most frequently occurring classification, and sends it to the requestor.

The QUERY and CONFIGURE instructions are bookkeeping functions included for the convenience of the programmer and are irrelevant to the computational aspects of our model. A programmer typically trains the model to perform a task, executes a QUERY to get the resulting configuration, and then saves the configuration in nonvolatile storage such as magnetic disk media. To perform the same task in the future, the programmer need only CONFIGURE the model; it does not have to be retrained for the same task.
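The histogram step of DECIDE amounts to a majority vote over the classifications returned by the transducing nodes. Below is a minimal sketch, assuming the replies have already been gathered into an array; the interface and names are hypothetical.

/* Sketch of the control node's DECIDE tally: histogram the classes
 * returned by the n transducing nodes and return the most frequent.
 * A quadratic count is used for brevity; n is small. */
#include <stdint.h>
#include <stdio.h>

typedef uint16_t pattern_t;

static pattern_t decide_vote(const pattern_t *classes, int n)
{
    pattern_t best = classes[0];
    int best_count = 0;
    for (int i = 0; i < n; i++) {
        int count = 0;
        for (int j = 0; j < n; j++)
            if (classes[j] == classes[i])
                count++;
        if (count > best_count) {
            best_count = count;
            best = classes[i];
        }
    }
    return best;
}

int main(void)
{
    pattern_t replies[] = { 0x0001, 0x0001, 0x0003, 0x0001, 0x0002 };
    printf("winning class: %04X\n", decide_vote(replies, 5));  /* prints 0001 */
    return 0;
}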

During a LEARN operation, each transducing node executes the following steps:

1. receive "pattern" and "correctclass" from the control node
2. locally transduce "pattern" to "myclass"
3. send "myclass" to neighbors
4. receive "neighborclass" from neighbors
5. compare "myclass", "neighborclass", and "correctclass":
   - if I am correct, do nothing
   - if a neighbor is better, copy that neighbor
   - if we are both equally wrong, mutate¹

Note the order of steps 3 and 4. This order is important: if the receive operation preceded the send operation, a deadlock would develop. "Better" in step 5 is measured in terms of the Hamming distance between two patterns, that is, the number of bit positions that differ between two bit strings of the same length. Also note that a global DECIDE operation by the entire network is in effect an aggregate of several local "transduce" operations. The transducers do not necessarily evolve to the same pattern, first because differently structured transducers can have correct performance, and second because they may have correct performance under different input conditions.

It should be pointed out that this type of learning and adaptation is decentralized, each node interacting only with its neighbors during the learn cycle. This is in contrast to the centralized technique commonly used in Adaline-based connectionist networks (see, for example, [41]), where a central controlling node determines weights and modifies all the nodes appropriately.

¹ "Mutate" as used here means to randomly vary the thresholds that control the manner in which cells transduce information or signals.
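In C, the per-node decision rule might be sketched as follows. The node-state representation and the single-neighbor form are simplifications of ours; the paper's nodes compare against all four neighbors.

/* Sketch of step 5 of the per-node LEARN cycle: keep, copy, or mutate
 * according to Hamming distance from the correct class.  Shown for a
 * single neighbor; the real cycle compares all four.  The node state
 * (16 per-cell thresholds) is our assumption. */
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t pattern_t;

/* Hamming distance: number of differing bit positions. */
static int hamming(pattern_t a, pattern_t b)
{
    int d = 0;
    for (pattern_t x = a ^ b; x; x >>= 1)
        d += x & 1;
    return d;
}

typedef struct { uint16_t activity[16]; } node_state;

/* "Mutate": randomly vary one of the thresholds that control how the
 * node's cells transduce signals. */
static void mutate(node_state *s)
{
    s->activity[rand() % 16] = (uint16_t)rand();
}

static void learn_step(node_state *me, pattern_t myclass,
                       const node_state *nb, pattern_t nbclass,
                       pattern_t correct)
{
    int mine   = hamming(myclass, correct);
    int theirs = hamming(nbclass, correct);

    if (mine == 0)
        return;          /* I am correct: do nothing            */
    if (theirs < mine)
        *me = *nb;       /* neighbor is better: copy neighbor   */
    else if (theirs == mine)
        mutate(me);      /* both equally wrong: random mutation */
}

int main(void)
{
    node_state a = {{0}}, b = {{0}};
    learn_step(&a, 0x00FF, &b, 0x000F, 0x0000);  /* neighbor closer, so copy */
    return 0;
}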

5. GRADUALISM AND GENERALIZATION EXPERIMENTS

We now write a number of programs to illustrate how our evolutionary programming language works, and how it is related to the ability of the hybrid model to learn gradually and to generalize from training cases. Rather than representing patterns and classes as bit strings, as was done in the experiments, we arbitrarily express them as symbolic text strings. We also emphasize that at this point the operations performed on the bit strings are trivial; they merely map one bit string to another. Our objective is only to illustrate the structure of the programs. Also note that our system is not a content-addressable memory; rather, it is a pattern recognizer.

To investigate gradualism in the hybrid model, the following evolutionary program was used:

DO N TIMES
  LEARN JOHNNY IS STUDENT
  DECIDE CLASS OF JOHNNY
  PRINT CLASS OF JOHNNY
END DO



It was found that a value of around 5 for N was in general adequate for the model to correctly classify JOHNNY as a STUDENT. It was also found that the decided CLASS of JOHNNY gradually converged toward the correct value. This gradualism could be demonstrated with as few as three transducing nodes. Here, as in all the experiments to be described, the classes are represented by 16-bit numbers.

To investigate generalization, the following evolutionary program was used:

DO N TIMES
  LEARN JOHNNY IS STUDENT
END DO
DECIDE CLASS OF JOHNNY
DECIDE CLASS OF JOHN
DECIDE CLASS OF JOHNY
DECIDE CLASS OF MARY

Again a value of around 5 for N was found to be adequate. After being trained to classify JOHNNY as STUDENT, the model was found to give similar classifications to patterns that are similar to JOHNNY. Patterns that are very different from JOHNNY result in very different classifications. Reasonable generalizations were obtained with as few as four transducing nodes.

Suppose the model is required to retain the following associations at the same time:

JOHNNY IS STUDENT
MARY IS FEMALE
MIKE IS MALE
TERRY IS STUDENT
PETER IS FACULTY

An evolutionary program to train the model might be written as follows:

DO N TIMES
  LEARN JOHNNY IS STUDENT
END DO
DO N TIMES
  LEARN MARY IS FEMALE
END DO
DO N TIMES
  LEARN MIKE IS MALE
END DO


DO N TIMES
  LEARN TERRY IS STUDENT
END DO
DO N TIMES
  LEARN PETER IS FACULTY
END DO

An alternative evolutionary program to perform the same training is as follows:

DO N TIMES
  LEARN JOHNNY IS STUDENT
  LEARN MARY IS FEMALE
  LEARN MIKE IS MALE
  LEARN TERRY IS STUDENT
  LEARN PETER IS FACULTY
END DO

The first program was found to be indeterminate with small values of N (1 to 2). With slightly larger values of N (2 to 4), somewhat better performance was obtained. With large values of N (5 or above), the model would retain only the last association. The second program was generally found to perform better than the first. In particular, the model would stabilize at an optimally performing state when a large value of N (around 20) was used. Increasing the value of N beyond this caused slight variations but no gradual shift in performance.

6. PROCESSOR AND TOPOLOGY EXPERIMENTS

Another group of experiments involved varying the characteristics of the model. The two characteristics varied were the number of transducing nodes and the topology of the transducing network. It was fairly easy to vary these characteristics, since the simulation program was designed with this in mind. Varying the number of processors (i.e., transducing nodes) did not have much effect on performance in the single-association (one pattern and one class) case. The number of processors, however, was found to be very critical in the multiple-association (several patterns and classes) case: the more processors, the larger the number of associations that can be retained at the same time. In the absence of any significant transducer dynamics, a multiple-association system could as well be built out of a collection of single-association networks. In fact, in order to deal with the multiple-association case, it was necessary to move the simulation to a more powerful computer.

7. SHAPE RECOGNITION EXPERIMENTS

Now let us explore how the hybrid model can exploit its cellular automaton architecture. A cellular automaton is an array of cells (finite state machines) that process signals concurrently, each cell interacting with its immediate neighbors (cf. [42]). The experiments focused on the shape recognition capabilities of the model. To explain these experiments, a slightly different terminology will be used. As shown for the center node in Fig. 3, each transducing node is still divided into 16 cells, "a" through "p." Each cell within a transducing node interacts with its four neighboring cells, and each transducing node in turn interacts with its four neighboring nodes. For example, "f" interacts with "b," "g," "j," and "e." Similarly, "X" interacts with "B," "C," "D," and "A." The edges of each transducing node wrap around; for example, the "up" neighbor of "b" is "n." Associated with each cell is an activity (inhibition threshold). The actual range of values for the activity depends on the rules of interaction used.

To perform a local "transduce" operation, a 16-bit "initial state" is presented as input to a transducing node. The cells of the node process the input in parallel to produce a "final state" after a number of "unit time steps." In keeping with the terminology of the hybrid model, the initial state will be referred to as a pattern and the final state will be referred to as a class. The global "decide" operation by all transducing nodes is still an aggregate of the local transduce operations.

FIG. 3. Cell-to-cell interactions.


Also, the upper-layer evolutionary algorithm remains essentially the same.

The processing of a pattern to produce a class depends on a predefined rule of cell-to-cell interaction. The rules used were arbitrary and do not mimic any known biological processes; any of several rules could have been used. The rules used assume that the neuron is self-stimulatory and stimulated by its neighbors. The following two rules were tried:

RULE 1
(a) Activities vary (using 16-bit integers) from 0 to 65535.
(b) The class is produced from the pattern in one time step.
(c) A cell is active if the sum of its activity and its neighbors' activities (five activities in total) is greater than 5 × 32,768. The cell is otherwise passive. An active cell inverts the pattern bit incident on it, while a passive cell leaves the bit as is.

RULE 2
(a) Activities vary over a variable small range (0 to 5, typically).
(b) The class is produced from the pattern after a variable number of time steps (three, typically).
(c) A cell is active if its activity is greater than the sum of the pattern bits in its four neighbors. The cell is otherwise passive. An active cell inverts the pattern bit incident on it, while a passive cell leaves the bit as is.

In both cases, the upper-layer evolutionary algorithm varies the activities. The second rule obviously leads to more processing within cells.

In one set of experiments (using Rule 1), the model was subjected to several LEARN cycles with the horizontal line shown in Fig. 4a as the pattern and the point in Fig. 4b as the class. The class of a pattern will be referred to as a "readout pattern" or "readout state" in this and subsequent experiments. After training, it was found that DECIDE operations on other horizontal lines consistently produced the pattern in Fig. 4a, which in turn produced the pattern in Fig. 4b. Other input patterns in general did not produce this readout pattern. In other words, the model evolved so as to be sensitive to horizontal lines.
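For concreteness, a minimal sketch of a single Rule 1 transduce step follows. The 4 × 4 row-major cell layout and the data representation are our assumptions; only the rule itself (activity sum over a cell and its four wrap-around neighbors, threshold 5 × 32,768, bit inversion when active) comes from the text.

/* Sketch of one Rule 1 transduce step for one transducing node:
 * 16 cells ("a".."p") in a 4 x 4 wrap-around array, bit i of the
 * 16-bit state belonging to cell i (row-major).  A cell is active if
 * its activity plus its four neighbors' activities exceeds 5 * 32768,
 * and an active cell inverts its pattern bit. */
#include <stdint.h>
#include <stdio.h>

static uint16_t transduce_rule1(uint16_t pattern, const uint16_t act[16])
{
    uint16_t class_out = pattern;
    for (int i = 0; i < 16; i++) {
        int r = i / 4, c = i % 4;
        uint32_t sum = (uint32_t)act[i]
            + act[((r + 3) % 4) * 4 + c]   /* up   (wraps: up of "b" is "n") */
            + act[((r + 1) % 4) * 4 + c]   /* down                           */
            + act[r * 4 + (c + 3) % 4]     /* left                           */
            + act[r * 4 + (c + 1) % 4];    /* right                          */
        if (sum > 5u * 32768u)
            class_out ^= (uint16_t)(1u << i);  /* active cell inverts its bit */
    }
    return class_out;
}

int main(void)
{
    /* Highly reactive top row, as in the horizontal-line experiment;
     * the remaining twelve activities default to zero. */
    uint16_t act[16] = { 65535, 65535, 65535, 65535 };
    printf("class = %04X\n", transduce_rule1(0x0000, act));
    return 0;
}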

FIG. 4. Training pattern and class: (a) a horizontal line; (b) a single point.


The explanation for this behavior is surprisingly simple. When trained to produce Fig. 4b from Fig. 4a, the activities along the line essentially evolved to a high level of reactivity. This high level of reactivity served to blank out the line. We can understand this by recalling that simple rules of geometry stipulate that parallel straight lines never intersect. Hence, when presented with other horizontal lines, the line of high reactivity was not disturbed, and since high reactivity results in inversion, a line was produced along the high reactivities. Other patterns in general would intersect with the line and prevent the activities from producing a complete line. The important point here is that evolution can take advantage of relationships or principles of which the programmer need not be aware.

More extensive experiments were performed using Rule 2. The number of cycles required to learn the task of Fig. 4, and the convergence to the readout pattern as a function of learn cycles, were studied for different activity ranges and cell time steps. Figure 5 shows the number of LEARN cycles required to learn the task as a function of cell time steps. The bar graph indicates a cyclic behavior, with two or five time steps being optimum. This correlates with the 4 × 4 cell array structure; the cyclic behavior can be attributed to the fact that the edges of the cell array wrap around to form a continuous array. Figure 6 shows the number of LEARN cycles required to learn the task as a function of activity range. The task was never learned with maximum activities of three or less. Similarly, the task was never learned with maximum activities of eight or more. An activity range that is too tightly constrained or too broad impedes the learning process. Figure 7 shows the convergence of the readout pattern toward the desired pattern, measured as a Hamming distance. Most of the convergence occurred within the first few LEARN cycles. This is probably connected to the initial high heterogeneity of the system, and to the fact that subsequent variety arises only through random mutation.

FIG. 5. Number of LEARN cycles required to learn the task vs. number of cell time steps.

FIG. 6. Number of LEARN cycles required to learn the task vs. activity range (number of cell time steps = 3).

In another set of experiments (using Rule 2), the model was again trained to recognize a horizontal straight line. The performance was then observed on 16 patterns, which included vertical and horizontal lines. Figure 8 shows which patterns were classified as horizontal straight lines and which were not. The objective of the experiment was to evaluate the generalization properties of the model. Two of the sixteen patterns were misclassified. This 87.5% correct classification indicates good generalization capability. Also, as shown in Fig. 8, horizontal straight lines with slight imperfections were correctly classified as horizontal straight lines. This is significant: it indicates that impreciseness can sometimes be an advantageous computational technique [21, 43].

In the experiment above, the network learned to respond to horizontal lines, but not to ignore or respond differently to vertical or other kinds of lines. We thus performed a simple variation (using a 3 × 2 topology) in which the network was trained with two horizontal and two vertical samples to respond to vertical lines by firing all subcells, and to horizontal lines by suppressing the firing of all subcells.

FIG. 7. Hamming distance vs. number of LEARN cycles.

FIG. 8. Classification of horizontal straight lines: patterns classified as horizontal straight lines, and patterns not so classified.

The divergence between horizontal and vertical output patterns increased as the number of training cycles increased. We stopped the training process after 20 learning cycles, at which point the divergence was substantial. In testing the system, we classified the response as being of the vertical type if the majority of subcells fired, and of the horizontal type if the majority did not fire. We randomly created input patterns and classified them as vertical or horizontal. The classifications yielded by the network were:

Vertical patterns recognized correctly: 15 out of 18
Horizontal patterns recognized correctly: 8 out of 10.

The vast majority of imposed patterns were of course neither unambiguously vertical nor unambiguously horizontal to the human eye.

Experiments were also performed to test the ability of the system to perform a generalization task that is known to be possible, but on the basis of an arbitrary training set. The behavior of the untrained model was first observed on a sample set of patterns. An arbitrarily selected readout pattern was then used as a criterion for separating the patterns into two groups, A and B. The system was restarted with new random activities. A training set of three samples from group A was then used to retrain the model. In one such experiment, the following results were obtained:


Group A patterns classified as Group A: 8
Group A patterns classified as Group B: 3
Group B patterns classified as Group B: 9
Group B patterns classified as Group A: 5.

These numbers represent 68% generalization (17 of the 25 patterns classified correctly). While this is not very high, it should be remembered that the training set could define a large number of potential groupings. Similar experiments with model neurons obeying reaction-diffusion dynamics [18, 19, 27] have shown that near-perfect generalization over large sets of patterns can be obtained by evolving the readout mechanism and leaving the dynamics constant. Near-perfect here means that if the dynamics are capable of doing the task, then it can be learned from an arbitrarily selected small subset of training patterns. Systems that use their intrinsic dynamics to generalize can of course always generalize in a way that is incorrect relative to a differently grouped set of patterns. The fact that learning can proceed by varying either the dynamics (as in the present case, where the activities are varied) or the readout mechanism indicates that combining the two levels of evolution may be most effective. (Experiments to test this conjecture are currently under way.)

8. POSSIBLE IMPLEMENTATIONS

The hybrid model was simulated on a uniprocessor computer with the transducing nodes implemented as concurrent processes [3]. Three approaches to actual implementation are:

(a) more efficient simulation using several processors;
(b) novel effects using conventional materials;
(c) molecular computers based on organic materials.

A logical candidate for the first approach is the Massively Parallel Processor (MPP) built for NASA by Goodyear Aerospace Corp. [36, 35]. The MPP is currently located at the Goddard Space Flight Center in Greenbelt, Maryland. It contains 16,384 processing elements arranged in a two-dimensional network such as that used in the hybrid model.

Direct implementation of cellular automaton dynamics, using properties of silicon that are ordinarily suppressed, is a candidate for the second approach. As chips become more densely packed, tunneling effects interfere with their performance of conventional logical functions. However, these effects could be utilized to directly implement the neighbor-neighbor interactions in a cellular automaton [8].

There is much interest in, and work currently being done toward, realizing the third approach [9]. Such effort has variously been described as molecular


computing [7, 11], biochip technology [32, 17], molecular electronics [6], and biocomputing. The specificity, structure-behavior plasticity, and rich dynamical behavior of macromolecular systems make them ideally suited to being the nonprogrammable component of a hybrid computer. The silicon-based approaches (a and b) can be pursued in the immediate future, while the third approach (c) is obviously a longer-term prospect.

9. CONCLUSIONS

We have described a computer architecture with roots in biology, neuroscience, computer science, and artificial intelligence. Computational tasks performed by a simulation of the architecture produced modest but promising results.

The architecture we have described is not a content-addressable memory. The memory manipulation part would naturally be provided by the conventional component of the hybrid. Anderson-Kohonen type neural architectures for associative memory could easily be incorporated into the general architecture (cf. [1, 28]). The cellular automaton design that we have proposed differs from these by virtue of the fact that the cells have richer (and potentially much richer) pattern processing capabilities, corresponding conceptually to neurons whose behavior would be governed by internal biochemical dynamics. The purpose of our learning algorithm is to recruit this local pattern processing capability for large-scale pattern processing tasks, although obviously our simulation studies so far have been on a small scale. Our working hypothesis is that if the architecture were implemented in a large system, with VLSI techniques using true parallelism and thousands of processors, useful pattern processing tasks could be demonstrated.

As a hybrid, the architecture is only partially biological. Furthermore, we have attempted to capture a biological style of computing rather than to design a system in which the components have specific analogs in the nervous system. Conceivably we could suppose that the transducing nodes are analogous to compact collections of neurons, each cell corresponding to a neuron. Alternatively, we can think of each transducing node as a single neuron, with the "cells" corresponding to macromolecular components within the neuron. It is now known that subcellular processes contribute to the firing activity of neurons, and it is likely that the cytoskeletal network underlying the membrane acts as a kind of intraneuronal cellular automaton (cf. [29-31, 22]). For the purpose of the hybrid architecture, what is important is that slight changes in the dynamics of the transducing nodes allow for effective evolutionary learning, first because this provides for a plastic relation


between structure and computational function, and second because it allows for generalization from particular cases. We have, of course, purposely chosen dynamics that are suitable for evolutionary learning. In general this means a loss of conventional programmability, for example, because of the massive use of parallelism at a low hierarchical level. The evolutionary approach allows this low-level parallelism to be invisible to the programmer and yet effectively utilized. The LEARN and DECIDE instructions of the hybrid model are simply examples of language extensions that would be possible in a system that incorporated evolutionary adaptation mechanisms [5, 23, 26]. Some problems that have traditionally been very difficult for conventional computers (such as adaptive pattern processing) lend themselves to the evolutionary approach, whereas others are obviously more suited to a sequential, programmable mode of computing. Hybrid systems should make it possible to combine the best features of both modes using technology that is currently available, or by incorporating, in the future, novel silicon and molecular devices specifically suited to the evolutionary mode.

ACKNOWLEDGMENT

M.C. acknowledges Grant IRI-8702600 from the U.S. National Science Foundation.

REFERENCES

1. Anderson, J. A. Cognitive and psychological computation with neural models. IEEE Trans. Systems Man Cybernet. 13, 5 (1983), 799-815.
2. Aho, A., Hopcroft, J., and Ullman, J. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.
3. Akingbehin, K. Computational characterization of a hybrid biochip. Doctoral dissertation, Wayne State University, 1986.
4. Black, U. D. Data Communications and Distributed Networks. Prentice-Hall, Englewood Cliffs, NJ, 1987.
5. Bremermann, H. J. Optimization through evolution and recombination. In Yovits, M. C., Jacobi, G. T., and Goldstein, G. D. (Eds.). Self-Organizing Systems. Spartan Books, Washington, DC, 1962.
6. Carter, F. L. (Ed.). Molecular Electronic Devices. Dekker, New York, 1982.
7. Conrad, M. Molecular automata. In Conrad, M., Güttinger, W., and Dal Cin, M. (Eds.). Physics and Mathematics of the Nervous System. Springer-Verlag, New York, 1974b, pp. 419-430.


8. Conrad, M. The price of programmability. In Herken, R. (Ed.). The Universal Turing Machine: A Fifty Year Survey. Oxford Univ. Press, New York, 1988, pp. 285-307.
9. Conrad, M. The lure of molecular computing. IEEE Spectrum (Oct. 1986), 55-60.
10. Conrad, M. Adaptability. Plenum, New York, 1983.
11. Conrad, M. Bootstrapping on the adaptive landscape. Biosystems 11 (1979), 167-182.
12. Conrad, M. On design principles for a molecular computer. Comm. ACM 28 (May 1985), 464-480.

13. Conrad, M. Evolutionary learning circuits. J. Theoret. Biol. 46 (1974a), 167-188.
14. Conrad, M. Information processing in molecular systems. Curr. Modern Biol. (now Biosystems) 5 (1972), 1-14.
15. Conrad, M. Microscopic-macroscopic interface in biological information processing. Biosystems 16 (1984), 345-363.
16. Conrad, M., and Pattee, H. H. Evolution experiments with an artificial ecosystem. J. Theoret. Biol. 28 (1970), 393-409.
17. Conrad, M., Friedlander, C., and Akingbehin, K. Biochips: A feasibility study. Rep. prepared for National Geno Sciences, Aug. 1982.
18. Conrad, M., Kampfner, R. R., and Kirby, K. G. Simulation of an enzyme-driven computing device. Maecon Technical Session #15 on Biomedical Electronics, Utica, MI, 1984.
19. Conrad, M., Kampfner, R. R., and Kirby, K. G. Simulation of a reaction-diffusion neuron which learns to recognize events. Appendix to Conrad, M. Rapprochement of artificial intelligence and dynamics. European J. Oper. Res. (1987), 280-290.
20. Greengard, P. Cyclic Nucleotides, Phosphorylated Proteins and Neuronal Function. Raven Press, New York, 1978.
21. Grigonis, R. Sixth generation computers. Dr. Dobb's Journal 9, 5 (May 1984), 37-48.
22. Hameroff, S. R., and Watt, R. C. Information processing in microtubules. J. Theoret. Biol. 98 (1982), 549-561.
23. Holland, J. Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, Ann Arbor, 1975.
24. Hwang, K., and Briggs, F. A. Computer Architecture and Parallel Processing. McGraw-Hill, New York, 1984.
25. Kahn, R. E. A new generation in computing. IEEE Spectrum (Nov. 1983), 36-41.
26. Kampfner, R., and Conrad, M. Computational modelling of evolutionary learning processes in the brain. Bull. Math. Biol. 45, 6 (1983).
27. Kirby, K. G., and Conrad, M. Intraneuronal dynamics as a substrate for evolutionary learning. Phys. D 22 (1986), 205-215.
28. Kohonen, T. Self-Organization and Associative Memory. Springer-Verlag, Berlin, 1984.
29. Liberman, E. A., Minina, S. V., Shklovsky-Kordy, N. E., and Conrad, M. Change of mechanical parameters as a possible means for information processing by the neuron. Biophysics 27, 5 (1982), 906-915.
30. Liberman, E. A., Minina, S. V., Mjakotina, O. L., Shklovsky-Kordy, N. E., and Conrad, M. Neuron generator potentials evoked by intracellular injection of cyclic nucleotides and mechanical distension. Brain Res. 338 (1985), 33-44.
31. Matsumoto, G. A proposed membrane model for generation of sodium currents in squid giant axons. J. Theoret. Biol. 107 (1984), 649-666.


32. McAlear, J. H., and Wehrung, J. M. Bioconductors: Beyond VLSI. IEEE Symposium on VLSI Technology, 1981, pp. 82-83.
33. McCorduck, P. An introduction to the fifth generation. Comm. ACM (Sept. 1983), 629-630.
34. Murakami, K., Kakuta, T., Onai, R., and Ito, N. Research on parallel machine architecture for fifth-generation computer systems. IEEE Comput. 18, 6 (June 1985), 76-92.
35. Potter, J. L. (Ed.). The Massively Parallel Processor. MIT Press, Cambridge, MA, 1985.
36. Schaeffer, D. H., Fischer, J. R., and Wallgren, K. R. The massively parallel processor. AIAA J. Guidance, Control, Dynamics 5, 3 (May 1982), 313-315.
37. Shapiro, E. Y. The fifth generation project: A trip report. Comm. ACM (Sept. 1983), 637-641.
38. Stallings, W. Local Networks. Macmillan, New York, 1987.
39. Tanenbaum, A. S. Computer Networks. Prentice-Hall, Englewood Cliffs, NJ, 1981.
40. Ullman, J. D. Computational Aspects of VLSI. Computer Science Press, Rockville, MD, 1984.
41. Widrow, B., and Winter, R. Neural nets for adaptive filtering and adaptive pattern recognition. IEEE Spectrum (Mar. 1988), 25-39.
42. Wolfram, S. Approaches to complexity engineering. Phys. D 22 (1986), 385-399.
43. Zadeh, L. A. Making computers think like people. IEEE Spectrum (Aug. 1984), 26-32.

KIUMI AKINGBEHIN received a B.S. in electrical engineering from Howard University in 1972, an M.S. in nuclear engineering from the University of Tennessee in 1975, and an M.A. and a Ph.D. in computer science from Wayne State University in 1981 and 1985, respectively. His research interests are in computing architectures, operating systems, biological information processing, and software engineering. He is currently Assistant Professor of Electrical and Computer Engineering at the University of Michigan at Dearborn, where he also serves as Director of the Computer and Information Science Program. He has several years of industrial experience in real-time computer hardware and software.

MICHAEL CONRAD is Professor of Computer Science at Wayne State University. He did his undergraduate work at Harvard University and earned his Ph.D. at Stanford University. His research interests are in biological information processing, molecular computer design, evolutionary learning algorithms, artificial-worlds modeling of complex biological systems, and the theory of adaptable systems. He has published over 100 research papers. His book Adaptability (Plenum, 1983) presents a systems and information theory framework for analyzing how biological and other complex systems use information to function effectively in the face of an uncertain environment.