INTEGRATION Workshop
A 16-bit specialized processor design
Toufic Ezzedine, Véronique Tempier and Georges Sagnes
L.A.M.M., Université des Sciences et Techniques du Languedoc, 34060 Montpellier cedex, France
Received 4 November 1986
Abstract. In this paper the circuit and the design of an experimental 16-bit processor are described. The circuit is used in controller applications between mass storage devices and the CPU of mainframes. The chip is fabricated in 2.5 µm NMOS technology. This component (45000 transistors, 35 mm², 40 pins) handles data generated by a CAD tool for real-time control systems (PIASTRE).
Keywords. Specialized processor, operative part, control part, PLAs, hardware implementation.
1. Introduction
Recent progress in VLSI, in architecture as well as in technology, has permitted the design and realization of complex integrated circuits and of coprocessors specialized for CAD tools, such as logic simulators, arithmetic accelerators and others. All these processors have in common that they implement heavy algorithms whose programs require long execution times. As a consequence, their algorithmic description is already defined as a compiled microprogram or in an interpreted high-level language; their architecture design must ensure fast and cost-effective solutions to justify the integration. Indeed, the need for integration appears when the execution time of specific algorithms is out of the range of their application field, or when a pseudodiscrete hardware implementation results in a dramatic increase of the cost. As a result, the integrated product must satisfy the temporal bounds as a final goal, and must be designed safely in a sufficiently short time, in agreement with industrial constraints.
North-Holland INTEGRATION, the VLSI journal 6 (1988) 101-110
0167-9260/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)
For this range of processors, classical synthesis methods for control parts quickly become ineffective for highly complex algorithms. As an approach to a new solution to this problem, we present in this paper the design of a specialized processor, POLLUX, the purpose of which is to accelerate a CAD tool, PIASTRE [1], used for real-time control systems. Because it must implement the equivalent of 250000 lines of INTEL 8085 instructions, limited to only four elementary ALU operations and I/O transfers via the central memory, it constitutes a good example of a specialized machine for which the final specifications depend mostly on the efficiency of the solutions adopted to implement the control part. In Section 2, we specify the operation mode of this processor and its environment, and we define its architecture. The solution proposed for the control part is developed in Section 3. In Section 4, we give details on the architectural solution used to realize the operative part. The overall performance of this implementation is summarized in Section 5.
Toufic Ezzedine was born in Lebanon in July 1957. He received the M.S. degree in Electronics Engineering in 1981 from the Institut des Sciences de l'Ingénieur de Montpellier and the Ph.D. degree in Microelectronics Engineering in 1985 from the Laboratory of Automatics and Microelectronics of Montpellier, France. Since 1986 he has been working as an assistant professor at the University of Montpellier and has been involved in the development of VLSI special purpose processors.
G. Sagnes was born in April 1941. He received the M.S. degree in Electronics in 1965 and the Ph.D. degree in Applied Physics in 1974, both from the University of Montpellier (USTL), Montpellier, France. From 1968 to 1984 he worked as an assistant professor at the University of Montpellier and was involved in research on acoustic surface waves and carrier traps in semiconductors. In 1981 he joined the Laboratory of Automatics and Microelectronics of Montpellier, France, as a Professor. He is engaged in the development of VLSI special purpose processors and the design of special architectures.
Véronique Tempier was born in Algeria in November 1960. She received the M.S. degree in electronics from the Université des Sciences et Techniques du Languedoc (USTL), Montpellier, France, in 1983. In 1984 she joined the Laboratory of Automatics and Microelectronics of Montpellier, France, where she is presently working towards the Ph.D. degree in microelectronics.
2. Internal architecture definition
Used as a routine accelerator, this coprocessor is an integral part of the specific CAD tool (PIASTRE), the purpose of which is to control the evolution of a process. To that end, the description of the control automaton is completely contained in the PIASTRE database as a hierarchical GRAFCET. The purpose of the coprocessor (hereafter referred to as POLLUX) is to manage the GRAFCET evolution rules [2], ensuring at any time the validation of the transitions allowed between the steps of the control graph. For that, POLLUX must integrate an algorithm that explores the database through pointers addressing well-defined positions located in the central memory (see Fig. 1). The arrangement of the database in a bank of 17 blocks imposes an equal number of pointers, organized in a 16-bit address field. Operations on pointers allow graph states to be looked up immediately, such as:
- step codes and the dynamic state code of each step (active, idle, ended);
- transition codes and the dynamic state codes of transitions between steps (allowed, non-allowed, firable).
The complete architectural specification is then given by:
- central memory communication and I/O transfers (MULTIBUS protocol [3]);
- permanent and local memorization of 17 × 16-bit pointer addresses;
- data operations: addition, subtraction, decrement, increment and test (equal to zero, one or greater than 2).
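The pointer memory and data operations listed in this specification can be illustrated with a small behavioural sketch. It is not the POLLUX implementation: the class and method names are hypothetical, wrap-around modulo 2^16 is an assumption about overflow behaviour, and the test flags simply follow the wording "equal to zero, one or greater than 2" literally.

```python
MASK16 = 0xFFFF      # pointers are held in a 16-bit address field
NUM_POINTERS = 17    # one pointer per database block


class PointerFile:
    """Hypothetical behavioural model of a 17 x 16-bit pointer memory."""

    def __init__(self):
        self.regs = [0] * NUM_POINTERS

    def load(self, index, address):
        # Store a 16-bit address in the selected pointer.
        self.regs[index] = address & MASK16

    def add(self, index, value):
        # Assumption: results wrap around modulo 2**16.
        self.regs[index] = (self.regs[index] + value) & MASK16
        return self.regs[index]

    def subtract(self, index, value):
        self.regs[index] = (self.regs[index] - value) & MASK16
        return self.regs[index]

    def increment(self, index):
        return self.add(index, 1)

    def decrement(self, index):
        return self.subtract(index, 1)

    def test(self, index):
        # Flags taken literally from the specification text.
        v = self.regs[index]
        return {"zero": v == 0, "one": v == 1, "gt2": v > 2}


pointers = PointerFile()
pointers.load(0, 0x1000)          # pointer 0 addresses a table in central memory
pointers.increment(0)             # step to the next entry
print(hex(pointers.regs[0]), pointers.test(0))
```

Keeping the pointers local to the coprocessor means that only the addressed table entries, not the pointers themselves, have to travel over the bus to central memory.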
3. Control part design
As mentioned elsewhere, the area devoted to the control part of a processor increases quickly with the complexity of the algorithm to be implemented, so traditional methodologies, such as unfolding of the control, are impractical. It is then necessary to find intuitively a hierarchical organization of the control part, allowing operations on reduced parts of the algorithm at each step of the synthesis [4,5]. The number of assembler lines (250000 for the POLLUX algorithm) can be minimized by formally merging identical or analogous sequences into subroutines. From a computing point of view, it is necessary to obtain heuristic definitions of M macro-instructions only, to be used in a main program and interpreted in as many subroutines as necessary (final generation of the micro-instructions). This structured programming technique allows easy and safe handling of highly complex algorithms.
Fig. 1. Hardware organisation of PIASTRE system.
At this point, it is necessary to note that for each subroutine it is possible to define an equivalent sequential state machine. Fig. 2 gives an illustration of this duality, in which the sequencing is organized through the program counter or the time generator. Nevertheless, in this kind of description problems may arise in the hardware counterpart when elementary controls are generated from different levels of state machines. In that case, final control of the operative part can only be done through several demultiplexing levels. It is then clear that an area-efficient implementation requires each control to be localized in only one machine, that is, at only one subroutine level. As a result, the structured decomposition of the algorithm into hierarchical machines requires the elementary controls to be generated from only one interpretation level. Fig. 3 shows that, from a control point of view, this is equivalent to describing the sequencing with structured graphs in which each action associated with a step of level i generates at level (i + 1) a series of more elementary actions.
Fig. 2. Language-machine duality.
Any sequential graph, such as a GRAFCET, can be implemented as a finite state machine [6]. A GRAFCET description is completely defined by the quad G(E, T, L, A), which represents respectively the numbers of states, branches, links and actions of the graph. As shown in Fig. 4, a graph can be implemented as a finite state machine defined by the quad M(X, Y, Z, N) (representing respectively the numbers of external inputs, internal variables, commands and minterms) if only one state of the graph is active at any time and if the following conditions are satisfied:
$$X = \log T, \quad Y = \log E, \quad Z = \log A, \quad N = L.$$
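The relations between G(E, T, L, A) and M(X, Y, Z, N) can be turned into a quick sizing estimate. The sketch below assumes that "log" means a base-2 logarithm rounded up to a whole number of bits (binary encoding of branches, states and actions); the function names and the example graph are hypothetical and do not come from the POLLUX design.

```python
def bits_needed(count):
    """Bits required to binary-encode `count` distinct items (assumed: ceil(log2))."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 2.
    return max(1, (count - 1).bit_length())


def machine_from_graph(E, T, L, A):
    """Map a graph G(E, T, L, A) to a state-machine size M(X, Y, Z, N)."""
    return {
        "X": bits_needed(T),  # external inputs, from the number of branches
        "Y": bits_needed(E),  # internal (state) variables, from the number of states
        "Z": bits_needed(A),  # command outputs, from the number of actions
        "N": L,               # one minterm per link
    }


# Hypothetical graph: 12 states, 9 branches, 20 links, 30 actions.
print(machine_from_graph(E=12, T=9, L=20, A=30))
# {'X': 4, 'Y': 4, 'Z': 5, 'N': 20}
```

Under these assumptions the size of the PLA grows with the number of links and only logarithmically with the numbers of states and actions, which is why splitting a large graph into smaller hierarchical machines keeps each PLA compact.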
Synchronization between the different state machines completely defines the hierarchical organization of the control part. Several approaches are possible. We choose the most natural one, which consists in distributing to each hierarchical level its state-transition clocks, synchronous with the general clock and controlled by the End of Execution signal (EE) of each machine.
Fig. 3. Hierarchical description of an algorithm.
Fig. 4. GRAFCET-finite state-machine equivalence.
Fig. 5. Hierarchical control part organization.
This distribution must obey the following conditions:
$$\text{if } \prod_{i=j}^{N} EE_i = 1, \text{ then } \phi_{1j} = \phi_1 \text{ and } \phi_{2j} = \phi_2;$$
$$\text{if } \prod_{i=j}^{N} EE_i = 0, \text{ then } \phi_{1j} = 0 \text{ and } \phi_{2j} = 0,$$
where $\phi_{1j}$ and $\phi_{2j}$ are the synchronization signals for the jth level of the hierarchy. Fig. 5 represents the full diagram of the control part of POLLUX. Following the preceding methodology, it is a three-level organization, synchronized by the clock distribution (TG), as defined below.
Level 1
It defines the machine which chains the main tasks of the algorithm and generates the execution code to be interpreted by the second level. It is implemented as a PLA (11 inputs, 13 outputs, 77 minterms) and an 8-bit master-slave register controlled by $\phi_{11}$ and $\phi_{12}$.
Level 2
This level generates the sequences of instructions to be executed by the last level. Each code issued from level 1 is divided into two fields:
- the task code field, interpreted by a finite state machine M1, which generates the micro-operating code of the main task;
- the address operand field, which controls the routing of data (machine M2).
This organization in two machines minimizes the number of sequences to be generated. Both are implemented as PLAs: M1 (9 inputs, 11 outputs, 93 minterms) and M2 (12 inputs, 5 outputs, 102 minterms). Sequencing of this level is obtained from a 4-bit master-slave register used in counting mode.
Level 3
It is the control interface between Level 2 and the operative part, used to interpret instructions into elementary actions. It is implemented as three small PLAs.
Time Generator module (TG)
It ensures the synchronization of these three levels. It is defined as described above.
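The clock distribution performed by the Time Generator can be sketched as a simple gating function: level j receives the general clocks only when the product of the End of Execution signals EE_i for i = j..N equals 1, and receives 0 otherwise. This is a logic-level illustration only; the function name, the names phi1 and phi2 for the general clock signals, and the list-based signal representation are assumptions.

```python
def gated_clocks(ee, phi1, phi2):
    """Return the (phi1_j, phi2_j) pair for each level j = 1..N.

    ee   -- list of End of Execution signals (0 or 1); ee[0] is level 1
    phi1 -- current value of the first general clock signal
    phi2 -- current value of the second general clock signal
    """
    levels = []
    for j in range(len(ee)):
        # Product of EE_i for i = j..N: 1 only if every one of them is 1.
        enable = all(ee[j:])
        levels.append((phi1 if enable else 0, phi2 if enable else 0))
    return levels


# Three levels; level 2 has not raised its End of Execution signal yet.
print(gated_clocks([1, 0, 1], phi1=1, phi2=0))
# [(0, 0), (0, 0), (1, 0)]
```

In this example levels 1 and 2 are frozen while level 3 keeps receiving the clock, which matches the idea that an upper-level machine waits for the machines below it.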
4. Operative part
It is divided into two structures:
- The internal memory, which contains the addresses of the tables representing the dynamic steps of the control GRAFCET, as well as the current state of execution. It is a 17 × 16-bit organization with a simulated (SPICE) access time of 30 ns. The layout of this memory is given in Fig. 6.
- The 16-bit arithmetic unit, organized as 4-bit slices of precharged Manchester carry chain and implemented following the optimization rules developed in [7] (see Fig. 7). It allows addition, subtraction, increment and decrement operations in less than 40 ns, as obtained from SPICE simulations and checked on the test vehicle realized in the 4.5 µm NMOS technology of the French CMP.
As shown in Fig. 8, the complete operative part ensures the fastest two-operand calculations. Peripheral data transfers are executed through the IREG and OREG registers.
Fig. 6. Internal memory organization.
Fig. 7. Precharged Manchester carry chain.
Fig. 8. POLLUX operative part.
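The logic function of a 16-bit adder built from 4-bit carry-chain slices can be sketched as follows. This is a behavioural model of the standard generate/propagate carry recurrence, not of the precharged dynamic circuit or its timing; the function names are hypothetical, and obtaining subtraction and decrement through two's-complement addition is an assumption rather than something stated in the paper.

```python
WIDTH = 16   # word size of the arithmetic unit
SLICE = 4    # bits per carry-chain slice


def slice_add(a, b, carry_in, bits=SLICE):
    """Add two `bits`-wide operands through a carry chain; return (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi      # carry passes through this stage
        generate = ai & bi       # carry is created at this stage
        total |= (propagate ^ carry) << i
        carry = generate | (propagate & carry)
    return total, carry


def alu_add(a, b, carry_in=0):
    """16-bit addition as four chained 4-bit slices; return (result, carry_out)."""
    mask = (1 << SLICE) - 1
    result = 0
    carry = carry_in
    for shift in range(0, WIDTH, SLICE):
        part, carry = slice_add((a >> shift) & mask, (b >> shift) & mask, carry)
        result |= part << shift
    return result, carry


def alu_sub(a, b):
    # Assumption: a - b computed as a + (~b) + 1 in two's complement.
    return alu_add(a, ~b & 0xFFFF, 1)


def alu_inc(a):
    return alu_add(a, 0, 1)


def alu_dec(a):
    # Assumption: a - 1 computed as a + 0xFFFF.
    return alu_add(a, 0xFFFF, 0)


print(alu_add(0x00FF, 0x0001))   # (256, 0)
print(alu_sub(5, 3))             # (2, 1)
```

The carry leaving one slice is the carry entering the next, so the four slices behave as one 16-bit chain while each slice stays short.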
5. Chip specifications
The layout of the chip is given in Fig. 9. It is implemented in a 4.5 µm NMOS technology on 35 mm², representing 45000 transistors, with 20 mm² devoted to the control part and 10 mm² to the operative part. Note in Fig. 9 the high degree of regularity obtained with the PLA design of the control part. The chip has 37 I/O pads and has been organized to work at a 10 MHz clock frequency. Testability and observability facilities have been implemented through a special I/O pad allowing a step-by-step operating mode and direct access to the state register.
Fig. 9. POLLUX photomicrograph.
6. Conclusions
We have presented here a novel approach to the design of specialized processors. Using hierarchical GRAFCET descriptions of control parts, we showed that it is possible to implement the full sequencing of the machines with PLAs, each specialized to one interpretation level of the graph, and synchronized through a specific Time Generator unit. This approach has been applied to the design of an accelerator for a real-time control automaton. It allows the equivalent of 250000 lines of assembly language to be implemented on 35 mm². Applying this approach, which ensures fast and safe design of specialized machines, to more general machines would help to solve the bottleneck of automatic generation of the control part in silicon compilers.
References
[1] F. Prunet, J.M. Dumas, P.M. Grojean and P. Leroy, Production et Implantation Assistée de Systèmes de Commande (PIASTRE), MICAD 82.
[2] Groupe de travail 'systèmes logiques' de l'AFCET, Pour une représentation normalisée du cahier des charges d'un automatisme logique, Automatisme no 61/62, Nov.-Déc. 1977.
[3] INTEL, MULTIBUS: Functional and electrical specifications.
[4] T. Ezzedine, J. Lassale and G. Sagnes, Hierarchical Description for Complex Control Parts Design, Proc. Integrated Circuit Technology Confer., NIHE, Limerick, Ireland, Sept. 15-17, 1986.
[5] G. Zimmerman, MDS - the mimola design method, J. Digital Syst. 4 (3) (1980).
[6] T. Ezzedine, V. Tempier, D. Auvergne and G. Sagnes, Design of Specific Processor for Real Time Control System, Proc. ISMM Internat. Symp. Mini and Microcomputers and their applications, Sant Feliu de Guixols, Spain, June 25-28, 1985.
[7] D. Auvergne, G. Cambon, D. Deschacht, M. Robert, G. Sagnes and V. Tempier, Delay-time evaluation in ED MOS logic LSI, J. IEEE Solid-State Circ. (April 1986).