Copyright © IFAC Algorithms and Architectures for Real-Time Control, Cancun, Mexico, 1998
BIOLOGICALLY INSPIRED REAL-TIME RECONFIGURA TION TECHNIQUE FOR PROCESSOR ARRA YS
Cesar Ortega and Andy TyrreU
Department of Electronics University of York York, YOJ 5DD, UK E-mail.· (cesar. amt)@ohm.york.ac.uk
Abstract: Real-time control applications involve the interaction of multiple components. As systems become more complex their reliability tends to decrease, hence, fault tolerance must be incorporated to keep reliability within specified levels. A reconfiguration mechanism for processor arrays inspired by mechanisms that take place during the embryonic development of living beings is proposed in this paper. It is illustrated using an example that the rapid fault-recovery characteristic of the embryonic system makes it a promising approach for real-time control applications. Copyright © J998JFAC Keywords: Fault-tolerant systems, Parallel processors, Binary decision systems, Bio control, Embryonics.
1. INTRODUCTION
becoming too complex to be considered error-free. Evidently, it is necessary to look somewhere else for the answers.
Practical experience has demonstrated that the goal of building fault-free real-time control systems, although attractive, is impossible to achieve. Hardware deteriorates with time, and software has become so complex that design faults are difficult, if not impossible, to avoid. Hence, the more viable alternative for applications requiring high levels of dependability is to implement systems capable of tolerating faults, i.e. Fault-Tolerant Systems.
Nature offers some remarkable examples of how to deal with complexity and its associated unreliability. The human body is one of the most complex systems ever known. Failures are not rare, but the overall function is highly reliable because of self-diagnosis and self-healing mechanisms that work ceaselessly throughout our bodies. To borrow the main principles that sustain this mechanisms and applying them to the design of electronic systems, could result in a new approach for the design of fault-tolerant systems (A vizienis, 1997).
The complexity of real-time control problems solved by computers and the complexity of computers themselves is increasing with time, therefore, the risk of failure is greater now than when computers and the problems they solved were simple. Complexity implies unreliability . Hence, it is necessary to look for new methodologies and strategies to deal with complex systems. One approach is the refinement of traditional design techniques , but the techniques themselves are
Of particular interest are the fault tolerance attributes of massively parallel processing networks or processor arrays, similar in structure to cellular automata, capable of self-diagnosis and self-repair. In this approach the "knowledge" is distributed throughout multiple processing elements, therefore,
217
if one or a small subset of processors fail , the overall functionality can be maintained (Chean and Fortes, 1990a; Dyer, 1995). For the purposes of this paper processor arrays will be considered as a special case of cellular automata, therefore the terms processor and cell will be used interchangeably.
spares. In the ideal case a processor array with N spares must be able to tolerate N faulty cells; but, in practice, limitations on the interconnection capabilities of each cell prevents this goal from being achieved. The work presented in this paper is focused on this type of redundancy giving a new slant to the structure and reconfiguration process.
Self-healing mechanisms found in nature and the implicit redundancy in processor arrays constitute the inspiration for the presented work. Embryonics ' proposal is to construct field programmable processor arrays with self-diagnosis and selfreconfiguration abilities able to tolerate the presence of failing cells.
Most of hardware redundancy reconfiguration techniques rely on complex algorithms to re-assign physical resources to the elements of the logical array. In most cases these algorithms are executed by a central processor which also performs diagnosis functions and co-ordinates the reconfiguration of the physical array (Dutt and Mahapatra, 1997; Fortes and Raghavendra, 1985). This approach has demonstrated to be effective, but its centralised nature makes it prone to collapse if the processor in charge of the fault tolerance functions fails . Furthermore, the timing of signals involved in the global control from a host computer is often prohibitively long, therefore, unsuitable for real-time fault tolerance. Only distributed processing is feasible (Kung, et aI. , 1989).
The embryonics concept was initially presented in (Mange, et al., 1996 and Marchal, et aI. , 1996). Embryonics (Embryology + Electronics) is inspired by the embryonic development of multicellular organisms, i.e. a new individual is evolved from a single cell (the fertilised egg) through a process of multiple divisions and specialisation. Just after conception there is only one copy of the organism' s DNA, a replica of which is passed to every cell during embryo' s development. The function of a particular cell of group of cells is detennined by that of their neighbours (Ntisslen-Volhard, 1996), in other words , a cell's function would be different if located in a different position inside the embryo. This is possible because every cell possess a complete copy of the DNA describing the organism.
An alternative approach is to distribute the diagnosis and reconfiguration algorithms among all the cells in the array. In this way no central agent is necessary and consequently the reliability and time-response of the system should improve. However, this decentralised approach does tend to increase the complexity of the reconfiguration algorithm and the amount of communications within the network.
2. THE PRESENT APPROACH TO FAULT TOLERANCE IN PROCESSOR ARRAYS
Embryonics' proposal is to embed some distinctive characteristics of biological cellular systems into the design of programmable processor arrays in order to achieve rapid reconfiguration suitable for real-time control applications.
Fault tolerance in processor arrays implies the mapping of a logical array onto a physical non-faulty array, i. e. every logical cell must have a correspondent physical cell. When faults arise, a mechanism must be provided for reconfiguring the physical array such that the logical array can still be represented by the remaining non-faulty cells. All reconfiguring mechanisms can be considered to be based on one of two types of redundancy: Time redundancy or hardware redundancy (Chean and Fortes, 1990b).
3. THE BIOLOGICAL APPROACH TO FAULT TOLERANCE A human being consists of approximately 60 trillion (60xl0 12) cells. At each instant, in each of these 60 trillion cells, the genome, a ribbon of 2 billion characters, is decoded to produce the proteins needed for the survival of the organism. This genome contains the ensemble of the genetic inheritance of the individual and, at the same time, the instructions for both the construction and the operation of the organism. The parallel execution of 60 trillion genomes in as many cells occurs ceaselessly from conception to death of an individual. Faults are rare and, in the majority of cases, successfully detected and repaired (Mange, et aI. , 1996). Which part of the DNA is interpreted will depend on the physical location of the cell with respect to its neighbours.
In time redundancy the tasks performed by faulty cells are distributed among its neighbours. When reconfiguration occurs, processors share their time between performing their own tasks and the faulty cells functions , resulting in some degradation of system' s performance. In addition, the algorithm being executed must be flexible enough to allow a simple and flexible division of tasks in real time. In hardware redundancy physical spare cells and links are used to replace the faulty ones. Therefore, reconfiguring algorithms must optimise the use of
21 8
Embryonics is inspired by the basic processes of molecular biology. By adopting the features of biological cellular organisation, and by transposing them to the two-dimensional world of cellular arrays, it can be shown that properties unique to the living world, such as self-reproduction and selfrepair, can also be applied to integrated circuits. Figure 1 shows the basic architecture of an embryonic system.
becomes transparent allowing another cell to take its co-ordinates and therefore its function . The block corresponding to error-detection and error-handling is not shown in figure 1. This cell differs from those proposed to improve the yield of VLSI or WSI cellular architectures in that it is autonomous and the reconfigurability can be used both during manufacturing and normal operation. Every cell has communication links with its north east west and south neighbours. The programmable 110 router block allows the spread of information all over the array. This block is controlled by the configuration register se!ected by the cell. The reconfiguration logic is designed so that when a cell auto-diagnoses faulty, the row for the corresponding cell is eliminated and replaced by a spare row. Co-ordinates for the cells above the row being eliminated are recalculated and new configuration registers are selected accordingly. No "code" is being communicated around the array when reconfiguration takes place, only Boolean signals are passed between cells. The reconfiguration is implemented by simply using a different local memory location. This strategy is far from being optimal with respect to the use of spare resources, but the short time needed to recover from a failure makes it attractive to implement real-time control systems (Allworth, 1990). A detailed description of the cell can be found in (Ortega and Tyrrell, 1998).
'. W
Memory Confi!;ur.uinn Rc~isu:rs )
21 21 20 12
UORouter
11
10 02 Pointer
I/O
Cuntrol Biu
4. EXAMPLE Address
Logic block
A simple application is presented next in order to highlight the reconfiguration properties of the embryonics strategy. A multiplexer is selected to be the functional part of the basic cell. The multiplexer has the particular advantage of being able to implement any node from a binary decision diagram (BOO), which in turn can represent any combinatorial or sequential logic function (Akers, 1978). Therefore, the architecture presented is ideal for implementation on a field programmable gate array (FPGA) (Tempesti et al., 1997).
s
Fig. 1 Basic Components of an Embryonic Cell. This architecture presents the following advantages: • It is highly regular, which simplifies its implementation on silicon. • The actual functionality of the logic block is independent from the function of the remaining blocks. This modularity could be exploited to produce a family of Embryonic FPGAs, each member offering a particular logic function , e.g. ALU, MUX, DEC, etc. • The simplicity of blocks' architecture allows the implementation of built-in self test (BIST) logic to provide self-diagnosis without excessively incrementing the silicon area.
The goal was to design a programmable frequency divider. Figure 2 shows the block diagram of the circuit. It is composed by a 3-bit selector which latches either the division factor n or the next state of a 3-bit down-counter according to the output of a zero detector. In this way, a I-cycle wide pulse will be generated every n cycles of F. The output of the circuit is taken from the output of the zero detector. It will be 1 during one cycle of F when the down-counter reaches the 000 state.
A full description of the self-diagnosis system can be found in (Tempesti, 1997). If one of the cells fails , it
219
DS2Bc DSIBB DSOBA
FIn
sel
z s1
F
clk Selector
o
3
C+
0
Z B+
0
F
S 2
Z
s
A+
0
F
3
F
DS2DSO L..._ _- - '
z Fig. 2 Programmable frequency divider.
o Figure 3 shows the binary decision diagrams for the combinatorial down-counter and the zero detector. A,B,C are the outputs of the 3-bit down counter, C being the most significant bit. The diagrams must be evaluated from top to bottom. If the variable on each node takes the value 0, then the branch with a broken line is selected. Otherwise, if the variable takes the value 1, then the branch represented with a solid line is followed. The final nodes on a BDD represent the output of the function for a particular set of input values. A good tutorial on BDDs can be found in (Bryant, 1986; Rauzy, 1996).
B A+
o A
1
Fig. 4. Hardware implementation of BDDs in figure 3 using multiplexers. From figure 3 note that the BDD for A+ is repeated in the diagrams for B+ and C+, therefore the corresponding network of multiplexers is simpler. Multiplexers 1, 2 and 3 implement the selector block in figure 2. These multiplexers operate in synchronous mode, i.e. their outputs will be updated on the rising edge of F. DS2, DSl and DSO are used to set the value of D.
(A)' ~ ~A o
o
B+
A+
IA)~' ~
0
A
1
o
c+ F
a) DSO OSl a)
,~
,~ ,~ 0
OS2
0
o Z b)
Fig. 3. BDDs for a) 3-bit down counter; and b) 3-bit zero detector.
OSO OSl
b)
Figure 4 shows the hardware implementation of the BDDs shown in figure 3.
OS2
OSO
OSl
OS2
c)
Fig. 5. Frequency divider: a) Without fails ; b) with one faulty cell ; c) with two faulty cells.
220
Figure 5a shows the final distribution of multiplexers needed to implement this particular application. The numbers on each cell correspond with the numbers assigned to the multiplexers in figure 4. Cells labelled S are spare cells, two rows in this example. Cells labelled R are routing cells. Routing cells are needed because every cell has direct connections only to its cardinal neighbours. Figures 5b and Sc show the reconfiguration process when one and two cells become faulty, respectively. Remember that during the reconfiguration process the cells' co-ordinates are re-calculated so that every cell selects the configuration register of the cell it is substituting.
REFERENCES Akers S. (1978). Binary Decision Diagrams. IEEE Trans. on Computers, VoI.27-6, pp.509-516 Allworth S. and Zobel R. (1990). Introduction to Real-time Software Design, Macrnillan, Hong Kong A vizienis A. (1997). Toward Systematic Design of Fault-Tolerant Systems. IEEE Computer, April, pp.51-58 Bryant R. (1986). Graph-based Algorithms for Boolean Function Manipulation. IEEE Trans. on Computers, VoI.35-8, pp.677-691 Chean M. and Fortes 1. (1990a), The Full Use of Suitable Spares (FUSS) Approach to Hardware Reconfiguration for Fault-Tolerant Processor Arrays, IEEE Trans. on Computers, Vo1.39-4 Chean M. and Fortes 1. (1990b). A Taxonomy of Reconfiguration Techniques for Fault-Tolerant Processor Arrays. Computer, January, pp. 55-69 Dutt S. and Mahapatra N. (1997). Node-covering, Error-correcting Codes and Multiprocessors with Very High Average Fault Tolerance. IEEE Trans. on Computers, VoI.46-9, pp.997-1014 Dyer M. (1995). Toward Synthesising Artificial Neural Networks that Exhibit Cooperative Intelligent Behaviour: Some open issues in Artificial Life. In: Artificial Life: an Overview (Langton C. (Ed)), pp.III-134. MIT Press, USA. Fortes 1. and Raghavendra C. (1985). Gracefully Degradable Processor Arrays. Trans. on Computers, Vol.34-11, pp. 1033-1043 Mange D. et al. (1996). Embryonics: A new family of coarse-grained FPGA with self-repair and selfreproduction properties. In: Towards Evolvable Hardware (Sanchez E. et at. (Ed)), pp. 197-220. LNCS 1062, Springer-Verlag, Berlin. Marchal P. et at. (1996). Embryonics: The Birth of Synthetic Life. In: Towards Evolvable Hardware (Sanchez E. et at. (Ed)), pp. 166-196. LNCS 1062, Springer-Verlag, Berlin. Niisslein-Volhard C. (1996). Gradients That Organize Embyo Development. Scientific American, August, pp.38-43 Ortega C. and Tyrrell A. (1998). Design of a Basic Cell to Construct Embryonic Arrays. lEE Procs. on Computers and Digital Techniques, in print. Rauzy A. (1996). A Brief Introduction to Binary Decision Diagrams. Journal Europeen des Systemes Automatises, Vol. 30-8, pp. 1033-1 050 Kung S. et al. (1989). Fault-Tolerant Array Processors Using Single-Track Switches, Trans. on Computers, VoI.38-4, pp. 501-513 Tempesti G. et at. (1997). A robust multiplexerbased FPGA inspired by biological systems. Journal of Systems Architecture, Vo1.43, pp.719733
Figure 6 shows the simulation results obtained for the frequency divider. Labels correspond with those of figure 2. OK4 and OK8 simulate fails in cells 4 and 8 respectively. Reconfiguration takes from 3 to 5 clock cycles. ,
I, I
0K8 ~
I:~h D
're---' "-
'I:Hl~~ n J11l
Nn
linz l
(So
I
li
I
is
I6Z:0 .
I
11
,
!2
WmMJ1IJl 1llJ AA ~~Hill ~1 mu ~ UUlm r-Jl ~ ~ I- Jlfl nr-,Jl - fl ~
6u
&.
I(k,
12u
1411
Fig. 6. Simulation results for frequency divider.
5. CONCLUSIONS A novel paradigm for designing fault-tolerant processor arrays inspired by biological processes has been presented. The regularity of the architecture makes it suitable to be implemented using state of the art FPGAs or WSI circuits. The array's rapid reconfiguration characteristics should be ideal for the use of embryonic systems in real-time control applications where response time is a critical constraint. Embryonics is a nascent discipline, therefore much research must be done to investigate in depth the real-time fault-tolerant properties of these systems. The embryonics' approach is coherent with a recent uproar about the application of biological concepts to the solution of engineering problems, e.g. evolutionary computing, evolvable hardware and genetic algorithms.
ACKNOWLEDGEMENTS This work was partially supported by the Mexican Government under grant CONACYT-111183 .
221