Artificial Intelligence in Engineering 13 (1999) 43–53
A verification method for systolic arrays using induction-based theorem provers

Kazuko Takahashi a,1,*, Hiroshi Fujita b

a Advanced Technology R & D Center, Mitsubishi Electric Corporation, 8-1-1, Tsukaguchi-Honmachi, Amagasaki 661-8661, Japan
b Department of Intelligent Systems, Kyushu University, 6-1, Kasuga-Koen, Kasuga-Shi, Fukuoka 816-8580, Japan

Received 25 March 1997; received in revised form 2 June 1998; accepted 2 June 1998
Abstract

We propose a method for verifying hardware designs with an induction-based theorem prover such as the Boyer–Moore Theorem Prover. As a case study, we apply the method to the verification of the correctness of systolic array designs. In verifying circuits, we prove that an implementation satisfies a specification, in particular that the two are functionally equivalent. In proving the equivalence, induction is applied to the variables that denote time and position in the circuit. We discuss what lemmas should be used for an appropriate application of induction. The lemmas we have found reflect the characteristics of the structure of the circuit. With these lemmas, the method provides a systematic way of verifying systolic arrays and eases the user's burden in hardware verification. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: Hardware verification; Theorem proving; Automated deduction
1. Introduction

Recent advances in VLSI technology have accelerated the growth in the size and complexity of circuits. As a result, design verification has become a serious bottleneck in hardware development, since verification technology lags far behind the CAD tools used for circuit design. The correctness of a design is usually checked by simulation. Simulation, however, requires much computation time, since the circuit function must be simulated for all possible input patterns and the output produced by the simulation must be checked against the intended one. For that reason, when verifying very large circuits, it is usual to test only a limited number of patterns that are considered necessary to guarantee correctness to some extent. There therefore remains a possibility that a serious error is overlooked.

* Corresponding author. Tel: 00 81 774 95 1351; Fax: 00 81 774 95 1308; E-mail: [email protected]
1 Present address: ATR Interpreting Telecommunications Research Laboratories, 2-2, Hikaridai, Seika-cho, Souraku-gun, Kyoto, 619-0223, Japan
0954-1810/99/$ - see front matter © 1998 Elsevier Science Ltd. All rights reserved. PII: S0954-1810(98)00010-7

Recently, formal verification has been attracting a great deal of attention as a competitive method which can overcome the aforementioned shortcomings of simulation. In verifying circuits, we prove that an implementation satisfies a specification, in particular that the two are functionally equivalent. The BDD-based approach [3] is one of the popular formal verification methods. Although it is fully automatic and very efficient, it has the shortcoming that the size of the target circuit must be fixed. To verify a general property for circuits of arbitrary size, the property has to be described for each size and verified individually. Moreover, the time taken for the verification grows, often exponentially, as the size increases.

To avoid these problems of the BDD-based approach, it is more advantageous to use induction-based theorem provers when dealing with circuits that have parameters such as size. The Boyer–Moore Theorem Prover (BMTP) [1] is an induction-based theorem prover which tries to prove a given formula by performing simplification, heuristic rewriting and induction. A lot of research on hardware verification using BMTP has been carried out and successful results have been reported [2, 5, 7, 9, 17]. Some of these treated practical applications such as commercial microprocessors, which shows that the technology has reached the level of practical use. However, state-of-the-art theorem provers are not powerful enough to perform full verification automatically, and a user has to struggle with the prover, elaborating techniques to make it act as intended. Such techniques depend heavily on the user's knowledge of and experience with the prover. In order to allow a designer to use the techniques more easily, it is most important to provide him/her with relevant lemmas.

We have studied hardware verification using BMTP [14–16]. Our aim is to develop techniques which adapt theorem provers to hardware verification, and to acquire know-how such as relevant lemmas, strategies, and heuristics. The examples tried so far range from small combinational/sequential circuits to medium-sized ALUs and special-purpose signal processing chips, as well as some benchmark circuits.

In this paper, we discuss the verification of systolic arrays. A systolic array [11] is a collection of small-grained processing elements which are connected regularly in a simple layout. Data flows into and out of the array, passing through many processing elements in a rhythmical fashion.

There have been few attempts to verify systolic arrays using BMTP. German and Wang [4] showed a verification of a one-dimensional systolic array which performs a Boolean function. In that paper, they emphasized the effectiveness of parameterizing the number of cells, pointing out that a user can verify a design of arbitrary size (namely, the general case) at once, while in the earlier approaches users must verify the instances of a design one by one. However, the paper contains no explicit descriptions in the form of the Boyer–Moore logic, nor details of the difficult issues that must have arisen. Purushothaman and Subrahmanyan [13] showed the verification of a systolic algorithm for dynamic programming. The systolic array treated in their paper is two-dimensional. They described the details of the proof and the problems attacked. However, the circuit they treated is a rather easy one to reason about, in the sense that every signal flows through the elements in a single direction.
We try to tackle more complex circuits, where some signals flow through an element from left to right and others flow in different directions, interacting with each other. In particular, we are concerned with circuits in which such signals form a feedback loop.

Generally, an implementation of a systolic array is represented as a relation of signals which is parameterized with the time and the position of a cell. Induction is applied to a recursive function on the variables that decrease at every recursive call. When a circuit has a feedback loop, the number of cells has to be considered as another parameter in addition to time and position. The values of these parameters decrease at different rates and their relation is not simple. Thus, the prover generates an undesirable induction hypothesis, causing the proof of the theorem to fail. In this case, suitable lemmas have to be provided in order to achieve an appropriate application of induction.

A user usually tries to find a suitable lemma by examining the trace of the failed proof and by repeating the verification of the given formula over and over again. The user may take a
formula occurring in the failed proof as a lemma. However, a lemma chosen in such a way does not necessarily help the main formula to be proved successfully. Furthermore, there are many candidates which may cause the proof failure. Therefore, it is very hard for a user to find suitable lemmas without any guide at all.

In this paper, we clarify the problem and discuss the proof technique for systolic arrays with feedback loops. We discuss what lemmas should be added to solve these problems, and propose four techniques: (1) transform a recursive definition which has two base cases into a definition without recursion; (2) create a formula that relates the undesirable induction hypothesis generated by the system to the user-intended one; (3) introduce a new function and transform it to a recursive form; and (4) divide the summation of a sequence into parts, and perform the proof stepwise. The lemmas we have found reflect the characteristics of the structure of the circuit. With these lemmas, the method provides a systematic way of verifying systolic arrays and lightens the user's burden. We explain the framework of the verification procedure using two typical structures of systolic arrays which are provided as benchmarks [10].

This paper is organized as follows. In Section 2, we briefly explain the Boyer–Moore logic and the verification method. In Sections 3 and 4, we show the verifications for a one-dimensional systolic array and a two-dimensional one, respectively. There, we discuss the problems and give their solutions. In Section 5, we summarize the verification method and the techniques for applying appropriate induction. In Section 6, we show an experimental result. Finally, we conclude the paper in Section 7.
2. Preliminaries

2.1. The Boyer–Moore logic

The Boyer–Moore logic [1] is a quantifier-free first-order logic with equality. Its language, which has function symbols, resembles pure LISP. It provides a mechanism which permits a user to introduce a new data type in an inductively constructed form. The logic has a set of axioms and definitions as a database. Fundamental axioms and definitions are stored in the initial database as the basic theory. A user can add the definition of a new function to the database if it does not violate the Definition Principle. Under this principle, a definition is accepted only if it is recursively or non-recursively defined in terms of previously defined functions and if termination is guaranteed when the function is evaluated.

The prover tries to rewrite a given formula using elements in the database as rewrite rules. Besides the standard inference rules of propositional logic and instantiation, induction is applied to recursively-defined functions. If the formula is
rewritten to true, then the proof succeeds and the formula is added to the database.

2.2. Circuit description

In previous works [14, 15], we proposed the Time Parameterized Function method (TPF method) for verifying synchronous circuits. The principle of the TPF method is that a signal is represented not as a waveform, namely the sequence of values from the initial time to the current time, but as an instantaneous value. In the TPF method, each signal is represented as a time parameterized function I(t). Unlike a usual term, this is regarded as a function without a function body, which will not be rewritten in the proof procedure, since we are concerned only with the fact that I depends on t, and not with how I changes depending on t. This is realized by the dcl command in BMTP. In this paper, we extend the method so that we can use not only time, but also position and size as parameters.

2.3. Outline of verification

A specification is usually given as a relation of input/output values, together with the timing requirements. Timing requirements show when and where the corresponding input/output values are observed, and are usually given in the form of a table or a diagram. For matrix multiplication, for example, a specification is given as a relation of the elements of the matrices which is parameterized with the row and column indices of the matrices. On the other hand, an implementation is represented as a relation of signals which is parameterized with time and position in the circuit. Then, the main theorem, which asserts that the implementation satisfies the specification, will contain many variables. Given the main theorem as it is, the system would encounter so many candidate variables for induction that it cannot choose a suitable induction scheme by itself. Therefore, we break down the proof into several steps, leading the system to do an easier job.

First, we translate a given specification into a relation of the input/output signals of the whole circuit using the timing requirements. Then, we try to prove that it is satisfied by the implementation. In the second step, induction is applied to the variables that denote time and position in the circuit.

3. One-dimensional systolic array

In this section, we consider a one-dimensional systolic array [10]. Here we discuss what lemmas should be used for an appropriate application of induction.

Fig. 1. Black box view of one-dimensional systolic array.

3.1. Specification
Consider the following convolution problem. For given weights $w_1, \ldots, w_k$ and input values $a_1, a_2, \ldots$, compute the values $b_1, b_2, \ldots$, where $b_i$ ($i = 1, 2, \ldots$) is defined as:

$$ b_i = \sum_{j=1}^{k} a_{i+j-1} \times w_j $$

It is represented in the following recursive form in the Boyer–Moore logic:

$$ b(i,k) = \begin{cases} 0 & \text{if } k = 0 \text{ or } i = 0 \\ a(i+k-1) \times w(k) + b(i,k-1) & \text{otherwise} \end{cases} $$
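As a quick sanity check of the recursive form, the following Python sketch compares it with the direct summation. The sample functions `a` and `w` are hypothetical test data, not taken from the paper, and Python is used only as an executable stand-in for the Boyer–Moore logic.

```python
# Hypothetical sample data; any integer-valued a(n) and w(j) would do.
def a(n):
    return n * n + 1

def w(j):
    return 2 * j - 1

def b_sum(i, k):
    """Direct form: b_i = sum over j = 1..k of a(i+j-1) * w(j)."""
    return sum(a(i + j - 1) * w(j) for j in range(1, k + 1))

def b_rec(i, k):
    """Recursive form: peel off the k-th term and recurse on k - 1."""
    if k == 0 or i == 0:
        return 0
    return a(i + k - 1) * w(k) + b_rec(i, k - 1)
```

For i ≥ 1 the two functions agree for every k; the case i = 0 is a convention of the recursive definition only, since the problem statement indexes b from 1.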
Table 1 shows the timing requirements of the input/output values. The black box diagram of the circuit is shown in Fig. 1. An input to a systolic array from the outside is said to be a global input, and an output from the systolic array to the outside is said to be a global output. The cell that receives a global input is said to be an input port, and the cell that produces a global output is said to be an output port.

Let x and y be the global input and output signals, respectively. In the TPF method, x is represented as x(t), and y is represented as y(k,t), where t is a time and k is the number of cells. According to Table 1, an input value $a_i$ is given at time 2i − 1, and an output value $b_i$ is produced at time 2i + 2k − 1. Thus, the following correspondence holds:

$$ a_i \leftrightarrow x(2i-1), \qquad b_i \leftrightarrow y(k,\, 2i+2k-1) $$

Then, the convolution problem is represented as a relation of the global input/output signals in the Boyer–Moore logic as follows:

Specification

$$ y(k,t) = \begin{cases} 0 & \text{if } k = 0 \text{ or } t < 2 \times k \\ x(t-2) \times w(k) + y(k-1,\, t-2) & \text{otherwise} \end{cases} $$

Table 1
Input–output timing scheme

Clock     Input      Output
0         0          0
1         a_1        0
2         0          0
3         a_2        0
4         0          0
5         a_3        0
...       ...        ...
2k + 1    a_{k+1}    b_1
2k + 2    0          0
2k + 3    a_{k+2}    b_2
2k + 4    0          0
2k + 5    a_{k+3}    b_3
2k + 6    0          0
...       ...        ...
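The translation from the convolution problem to the signal-level specification can be exercised numerically. The sketch below is only an illustration: `A_VALUES` and `W` are hypothetical data, `x` encodes the input column of Table 1 (a_i at clock 2i − 1, zero otherwise), and `y` transcribes the specification.

```python
A_VALUES = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical a_1 .. a_8
W = [None, 2, 7, 1]                   # hypothetical w_1 .. w_3 (index 0 unused)

def x(t):
    """Global input: a_i is presented at clock 2i - 1, zero otherwise."""
    if t >= 1 and t % 2 == 1:
        i = (t + 1) // 2
        if i <= len(A_VALUES):
            return A_VALUES[i - 1]
    return 0

def y(k, t):
    """Specification: zero if k = 0 or t < 2k, else x(t-2)*w(k) + y(k-1, t-2)."""
    if k == 0 or t < 2 * k:
        return 0
    return x(t - 2) * W[k] + y(k - 1, t - 2)

def b(i, k):
    """Reference convolution b_i = sum over j = 1..k of a_{i+j-1} * w_j."""
    return sum(A_VALUES[i + j - 2] * W[j] for j in range(1, k + 1))
```

With k = 3, the value y(3, 2i + 5) coincides with b_i for each i for which enough inputs are available, and y is zero at the even clock ticks in between, as the output column of Table 1 requires.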
3.2. Implementation

3.2.1. General architecture

Fig. 2 shows a one-dimensional systolic array which realizes the aforementioned convolution problem.

Fig. 2. A systolic array for the convolution problem.

3.2.2. Implementation of one cell

Fig. 3 shows a unit cell $cell_u$ for the convolution problem. In this figure, R1, R2, R3 and R4 stand for registers whose initial values are 0. The weight $w_u$ is pre-loaded at $cell_u$. Each cell performs simple arithmetic. A two-phase clock is used for synchronization. In the first cycle, the values stored in R1 and R2 of $cell_u$ are multiplied and the product is passed to R3. In the second cycle, the values stored in R3 and R4 are added and the result is passed to $cell_{u+1}$. To achieve this, the input data must be separated by two clock ticks.

Let local_xout(u,t,k) and local_yout(u,t,k) be the output signals from $cell_u$ to $cell_{u-1}$ and $cell_{u+1}$ at time t, respectively. If u and k are out of range, the output signals are defined as 0. Otherwise, they are defined in terms of the signals from the adjacent cells and a global input signal. If we represent these relations in a naive manner, we get the following form:

Implementation

$$ \mathrm{local\_xout}(u,t,k) = \begin{cases} 0 & \text{if } (1 \le u \le k \text{ and } t = 0) \text{ or } u = 0 \text{ or } u > k+1 \text{ or } k = 0 \\ x(t) & \text{else if } u = k+1 \\ \mathrm{local\_xout}(u+1,\, t-1,\, k) & \text{otherwise} \end{cases} $$

$$ \mathrm{local\_yout}(u,t,k) = \begin{cases} 0 & \text{if } t < 2 \text{ or } u = 0 \text{ or } u > k \text{ or } k = 0 \text{ or } t-2 < k-u \\ \mathrm{local\_xout}(u+1,\, t-2,\, k) \times w(u) + \mathrm{local\_yout}(u-1,\, t-1,\, k) & \text{otherwise} \end{cases} $$

Note that the argument k is necessary, since in BMTP all the variables appearing in a definition must be declared as arguments of the function.

3.3. The main theorem

When a systolic array has a feedback loop, the same cell serves as both the input port and the output port. Then, the definition of local_yout gives the relation of the global input/output signals when u = k. Therefore, we try to prove the following main theorem:

Main theorem

$$ (t \ge 2 \times k \text{ and } k \ge 1) \rightarrow \mathrm{local\_yout}(k,t,k) = y(k,t) $$

An immediate attempt to prove the theorem without providing any lemmas is very difficult, and will lead the system to fail soon. In Section 3.4, we clarify the problems and provide their solutions.

3.4. Verification

3.4.1. Removing recursion

If the definition of local_xout is expanded as it is, the prover gets lost because of the two base cases 0 and x(t) in the recursive definition. To avoid this, the definition must be transformed. The transformation is performed as follows. We unfold local_xout(u+1, t−1, k) in the definition of local_xout and get local_xout(u+2, t−2, k). After further unfolding, this is finally unfolded either to x(t − (k+1−u)) or to 0. For the definition of local_yout, we unfold local_xout(u+1, t−2, k) similarly. The procedure is described in detail in the other paper [14].

Let xout and yout be the function names which are obtained as the result of transforming local_xout and local_yout, respectively. Then, we get the following formulae:

$$ \mathrm{xout}(u,t,k) = \begin{cases} 0 & \text{if } t < k+2-u \text{ or } u = 0 \text{ or } u > k+1 \text{ or } k = 0 \\ x(t-(k+2-u)) & \text{otherwise} \end{cases} $$

$$ \mathrm{yout}(u,t,k) = \begin{cases} 0 & \text{if } t < 1 \text{ or } u = 0 \text{ or } u > k \text{ or } k = 0 \text{ or } t-2 < k-u \\ x(t-2-(k-u)) \times w(u) + \mathrm{yout}(u-1,\, t-1,\, k) & \text{otherwise} \end{cases} $$

Note that each output signal is described as a function defined only by itself and the global input signals.

We first show the equivalence of the original definition and the result of the transformation:

Lemma 3.1.

$$ \mathrm{local\_yout}(u,t,k) = \mathrm{yout}(u,t,k) $$
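The definitions above can be transcribed into executable form to test Lemma 3.1 and the main theorem on small instances. This is a bounded numerical check under assumptions, not a proof: the input `x` and the weights `W` are hypothetical, and Python integers replace the natural numbers of the Boyer–Moore logic (the guards keep every argument of `x` non-negative).

```python
from functools import lru_cache

W = [None, 2, -1, 3, 5, 4]            # hypothetical weights w_1 .. w_5

def x(t):
    """Hypothetical global input; any function of t >= 0 works here."""
    return (7 * t * t + 3 * t + 1) % 11

@lru_cache(maxsize=None)
def local_xout(u, t, k):
    if (1 <= u <= k and t == 0) or u == 0 or u > k + 1 or k == 0:
        return 0
    if u == k + 1:
        return x(t)                   # position k+1 stands for the global input
    return local_xout(u + 1, t - 1, k)

@lru_cache(maxsize=None)
def local_yout(u, t, k):
    if t < 2 or u == 0 or u > k or k == 0 or t - 2 < k - u:
        return 0
    return local_xout(u + 1, t - 2, k) * W[u] + local_yout(u - 1, t - 1, k)

@lru_cache(maxsize=None)
def yout(u, t, k):
    """Transformed definition: refers only to itself and the global input."""
    if t < 1 or u == 0 or u > k or k == 0 or t - 2 < k - u:
        return 0
    return x(t - 2 - (k - u)) * W[u] + yout(u - 1, t - 1, k)

def y(k, t):
    """Specification of Section 3.1."""
    if k == 0 or t < 2 * k:
        return 0
    return x(t - 2) * W[k] + y(k - 1, t - 2)

def lemma_3_1_holds(max_k=5, max_t=30):
    return all(local_yout(u, t, k) == yout(u, t, k)
               for k in range(max_k + 1)
               for u in range(k + 2)
               for t in range(max_t + 1))

def main_theorem_holds(max_k=5, max_t=30):
    return all(local_yout(k, t, k) == y(k, t)
               for k in range(1, max_k + 1)
               for t in range(2 * k, max_t + 1))
```

Both checks succeed on this range. Such a test cannot replace the inductive proof for arbitrary k, but it is a cheap way to catch a transcription error before running the prover.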
Fig. 3. A unit cell for the convolution problem.
The proof of Lemma 3.1 succeeds with several simple sublemmas.

3.4.2. Bridging two induction steps

Next, we will show the equivalence:

$$ \mathrm{yout}(u,t,k) = y(k,t) $$

But the prover still fails because of an inappropriate application of induction. Inspecting the trace of the failed proof, we find the following formula 1 appearing as an induction step:

$$ \begin{array}{c} \mathrm{yout}(k-1,\, t-1,\, k-1) = y(k-1,\, t-1) \\ \downarrow \\ \mathrm{yout}(k,t,k) = y(k,t) \end{array} \tag{1} $$

On the other hand, the recursive forms appearing in the definitions of yout and y are yout(k−1, t−1, k) and y(k−1, t−2), respectively. According to the definitions, the following formula 2 should be generated as the induction step:

$$ \begin{array}{c} \mathrm{yout}(k-1,\, t-1,\, k) = y(k-1,\, t-2) \\ \downarrow \\ \mathrm{yout}(k,t,k) = y(k,t) \end{array} \tag{2} $$

The discrepancies that led the prover to generate the inappropriate induction step are:

1. yout is defined for computing the output of the u-th cell using the output of the (u−1)-th cell, with k, the number of cells, fixed. On the other hand, y is defined for computing the output of the circuit consisting of k cells using the output of the circuit consisting of k−1 cells. Hence, the meanings of the two recursions are different.
2. The variable t decreases by 1 at every recursive call in the definition of yout, while it decreases by 2 in the definition of y.

These are the essential problems encountered in the verification of systolic arrays with feedback loops using induction-based theorem provers.

Consider the meaning of the left-hand sides of the induction hypotheses of formulae 1 and 2. yout(k−1, t−1, k−1) of formula 1 stands for the output of $cell_{k-1}$ of the systolic array which consists of k−1 cells. On the other hand, yout(k−1, t−1, k) of formula 2 stands for the output of $cell_{k-1}$ of the systolic array which consists of k cells. S1 and S2 in Fig. 4 show the systolic arrays which consist of k−1 and k cells, respectively. It takes one more clock period for an input to the systolic array to enter $cell_{k-1}$ in S2 than in S1, since the input must pass through $cell_k$ before entering $cell_{k-1}$ in S2. On the other hand, it takes the same time in both S1 and S2 for an input signal to make a round trip from $cell_{k-1}$ to $cell_1$. Therefore, the output of $cell_{k-1}$ to the left (thick arrow) in S2 is equal to that in S1 one clock period earlier. Hence, the following formula holds:

Fig. 4. The two induction steps.

Lemma 3.2.

$$ \mathrm{yout}(k-1,\, t,\, k) = \mathrm{yout}(k-1,\, t-1,\, k-1) $$

This equivalence is the key to bridging the two induction steps. Since yout is the only recursive function appearing in this formula, induction is applied properly and the proof of this lemma succeeds.

3.4.3. Introducing a new function

Next, we consider the second problem which causes an inappropriate induction step. To simplify the problem, hereafter we discuss only the essential part, ignoring the base case and the exceptions. In the actual verification, the proofs for the base case and the exceptions are executed individually.

Let us consider the definition of yout:

$$ \mathrm{yout}(u,t,k) = x(t-2-(k-u)) \times w(u) + \mathrm{yout}(u-1,\, t-1,\, k). $$

The second argument t of yout decreases by 1 at every recursive call, while the first argument u also decreases by 1. Thus, the argument of x, t − 2 − (k − u), decreases by 2 at every recursive call. Therefore, we want to redefine the function yout as a new recursive function whose argument decreases by 2 at every recursive call. We introduce y1, which is defined by yout, and derive its recursive form by the unfold/fold technique, which is often used in program transformation.

First, we introduce a new function y1:

$$ y1(k,t) \stackrel{\mathrm{def}}{=} \mathrm{yout}(k,t,k) $$
Table 2
Input/output timing scheme

       Input port                              Output port
Clock  a4   a3   a2   a1   b4   b3   b2   b1   c4   c3   c2   c1   c'3  c'2  c'1
0      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1      0    0    0    a11  0    0    0    b11  0    0    0    0    0    0    0
2      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
3      0    0    a12  0    0    0    b21  0    0    0    0    0    0    0    0
4      0    0    0    a22  0    0    0    b22  0    0    0    0    0    0    0
5      0    a13  0    0    0    b31  0    0    0    0    0    0    0    0    0
6      0    0    a23  0    0    0    b32  0    0    0    0    0    0    0    0
7      a14  0    0    a33  b41  0    0    b33  0    0    0    0    0    0    0
8      0    a24  0    0    0    b42  0    0    c11  0    0    0    0    0    0
9      0    0    a34  0    0    0    b43  0    0    c12  0    0    c21  0    0
10     0    0    0    a44  0    0    0    b44  0    0    c13  0    0    c31  0
11     0    0    0    0    0    0    0    0    c22  0    0    c14  0    0    c41
12     0    0    0    0    0    0    0    0    0    c23  0    0    c32  0    0
13     0    0    0    0    0    0    0    0    0    0    c24  0    0    c42  0
14     0    0    0    0    0    0    0    0    c33  0    0    0    0    0    0
15     0    0    0    0    0    0    0    0    0    c34  0    0    c43  0    0
16     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
17     0    0    0    0    0    0    0    0    c44  0    0    0    0    0    0
Note that y1 corresponds to the output of the unit cell that is specialized to the output port. We unfold this definition by yout to get the following formula:

$$ y1(k,t) = x(t-2) \times w(k) + \mathrm{yout}(k-1,\, t-1,\, k) $$

Next, applying Lemma 3.2, we rewrite the second term of the right-hand side:

$$ y1(k,t) = x(t-2) \times w(k) + \mathrm{yout}(k-1,\, t-2,\, k-1). $$

Finally, we fold the second term of the right-hand side by y1 to obtain the recursive definition of y1, in which the argument t decreases by 2 at every recursive call:

Lemma 3.3.

$$ y1(k,t) = x(t-2) \times w(k) + y1(k-1,\, t-2) $$

We prove the correctness of this formula as a lemma. The proof succeeds easily.

3.4.4. Equivalence of sequences

Lastly, we show the equivalence:

Lemma 3.4.

$$ y1(k,t) = y(k,t) $$

It is trivial and is proved immediately. Thus the main theorem is proved with the four main lemmas.
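The chain of Lemmas 3.2 to 3.4 can also be spot-checked numerically. The sketch below re-states yout and y so that it is self-contained; `x` and `W` are hypothetical data. Since the text sets aside base cases and exceptions, Lemma 3.3 is tested only for k ≥ 1 and t ≥ 2, and Lemma 3.4 only under the hypothesis of the main theorem (k ≥ 1 and t ≥ 2k); both restrictions are assumptions of this sketch.

```python
from functools import lru_cache

W = [None, 2, -1, 3, 5, 4]            # hypothetical weights w_1 .. w_5

def x(t):
    """Hypothetical global input signal."""
    return (5 * t + 3) % 7

@lru_cache(maxsize=None)
def yout(u, t, k):
    if t < 1 or u == 0 or u > k or k == 0 or t - 2 < k - u:
        return 0
    return x(t - 2 - (k - u)) * W[u] + yout(u - 1, t - 1, k)

def y(k, t):
    if k == 0 or t < 2 * k:
        return 0
    return x(t - 2) * W[k] + y(k - 1, t - 2)

def y1(k, t):
    """New function: the output of the cell acting as the output port."""
    return yout(k, t, k)

def lemma_3_2(k, t):
    return yout(k - 1, t, k) == yout(k - 1, t - 1, k - 1)

def lemma_3_3(k, t):
    return y1(k, t) == x(t - 2) * W[k] + y1(k - 1, t - 2)

def lemma_3_4(k, t):
    return y1(k, t) == y(k, t)
```

On these ranges all three equalities hold. Outside them they need not: for instance y1(2, 3) is in general non-zero while y(2, 3) is 0, which is why the hypothesis t ≥ 2k appears in the main theorem.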
4. Two-dimensional systolic array

In this section, we apply the verification method to the two-dimensional hexagonal systolic array which computes matrix multiplication.

4.1. Specification

Let $C = (c_{ij})$ be the result of multiplying the k × k matrices $A = (a_{ij})$ and $B = (b_{ij})$. Then, the following relation holds:

$$ c_{ij} = \sum_{n=1}^{k} a_{in} b_{nj} $$

Assume that A is an upper-half triangular matrix, and B is a lower-half triangular matrix. That is, if i > j, then $a_{ij} = 0$, and if j > i, then $b_{ij} = 0$. Then, the relation is represented in the following recursive form in the Boyer–Moore logic:

$$ c(i,j,k) = \begin{cases} 0 & \text{if } k = 0 \text{ or } i > k \text{ or } j > k \\ a(i,k) \times b(k,j) + c(i,j,k-1) & \text{otherwise} \end{cases} $$
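The recursive form stops as soon as k drops below i or j, which is sound only because of the triangularity assumption: the skipped terms are products with a zero factor. A small Python sketch with hypothetical 4 × 4 matrices illustrates this.

```python
# Hypothetical 4 x 4 test matrices, accessed with 1-based indices via a() and b().
A = [[1, 2, 3, 4],
     [0, 5, 6, 7],
     [0, 0, 8, 9],
     [0, 0, 0, 1]]     # upper triangular: a_ij = 0 if i > j
B = [[2, 0, 0, 0],
     [3, 1, 0, 0],
     [4, 5, 6, 0],
     [7, 8, 9, 2]]     # lower triangular: b_ij = 0 if j > i

def a(i, j):
    return A[i - 1][j - 1]

def b(i, j):
    return B[i - 1][j - 1]

def c(i, j, k):
    """Recursive form of the specification."""
    if k == 0 or i > k or j > k:
        return 0
    return a(i, k) * b(k, j) + c(i, j, k - 1)

def c_direct(i, j, k):
    """Direct form: sum over n = 1..k of a_in * b_nj."""
    return sum(a(i, n) * b(n, j) for n in range(1, k + 1))
```

For these matrices c(i, j, 4) and c_direct(i, j, 4) agree for all 1 ≤ i, j ≤ 4. With non-triangular matrices the two would differ in general, since the recursion then drops non-zero terms.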
The values $a_{ij}$ and $b_{ij}$ have to be fed into the ports $a_1, \ldots, a_4$ and $b_1, \ldots, b_4$ at the times indicated in Table 2. The values $c_{ij}$ are produced at the ports $c_1, \ldots, c_4$, $c'_1$, $c'_2$, $c'_3$ according to the scheme given in the table. The black box diagram of the circuit for k = 4 is shown in Fig. 5. In this figure, $a_1, \ldots, a_4$ and $b_1, \ldots, b_4$ are the input ports for matrix A and matrix B, respectively, and $c_1, \ldots, c_4$, $c'_1$, $c'_2$, $c'_3$ are the output ports for matrix C. Let x and y be global input signals, and z be a global output signal. Since there are several global input/output
Fig. 5. Black box view of two-dimensional systolic array.
signals, they are declared as functions of both time and the position of the port. x(t,v) and y(t,u) denote the values of the input signals to the input ports $a_v$ and $b_u$ at a time t, respectively. z(k,v,t) and z(u,k,t) denote the values of the output signals from the output ports $c_v$ and $c'_u$ at a time t, respectively. According to Table 2, the following correspondence holds:

$$ a_{ij} \leftrightarrow x(i+2j-2,\; j-i+1), \qquad b_{ij} \leftrightarrow y(2i+j-2,\; i-j+1), $$

$$ c_{ij} \leftrightarrow \begin{cases} z(i-j+k,\; k,\; 2i+j+2k-3) & \text{if } i \le j \\ z(k,\; j-i+k,\; 2i+j+2k-3) & \text{if } i > j \end{cases} $$

Then, the above problem of matrix multiplication is represented as a relation of the global input/output signals in the Boyer–Moore logic as follows:

Specification

$$ z(u,k,t) = \begin{cases} 0 & \text{if } k = 0 \text{ or } u+t+3 > 6k \text{ or } t-2u+3 > 3k \\ x\!\left(\dfrac{u+t}{3}+k-1,\; -\dfrac{u+t}{3}+2k\right) \times y\!\left(\dfrac{t-2u}{3}+2k-1,\; -\dfrac{t-2u}{3}+k\right) + z(u-1,\, k-1,\, t-2) & \text{otherwise} \end{cases} $$

where both (u+t)/3 and (t−2u)/3 are integers.

4.2. Implementation

4.2.1. General architecture

Fig. 6 shows a two-dimensional hexagonal systolic array which computes matrix multiplication for k = 4. Generally, k × k cells are used for matrices of size k. We take the ⟨u,v⟩-coordinate axes shown in the figure and identify each cell by its coordinate ⟨u,v⟩. The cells ⟨k,v⟩ (v = 1,…,k) are the input ports $a_v$ and the cells ⟨u,k⟩ (u = 1,…,k) are the input ports $b_u$. The cells ⟨k,v⟩ and ⟨u,k⟩ (u,v = 1,…,k) are the output ports $c_v$ and $c'_u$, respectively. A value is fed every three clock cycles at each input port, and a value is likewise produced every three clock cycles at each output port.
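The signal-level specification of Section 4.1 can be exercised against the timing correspondence. The following Python sketch is illustrative only: the matrices are hypothetical, the dictionaries `X` and `Y` encode the correspondences for a_ij and b_ij, and only the branch i ≤ j of the correspondence for c_ij is exercised.

```python
K = 4
A = [[1, 2, 3, 4],
     [0, 5, 6, 7],
     [0, 0, 8, 9],
     [0, 0, 0, 1]]     # hypothetical upper triangular matrix
B = [[2, 0, 0, 0],
     [3, 1, 0, 0],
     [4, 5, 6, 0],
     [7, 8, 9, 2]]     # hypothetical lower triangular matrix

# Timing tables: a_ij <-> x(i+2j-2, j-i+1) and b_ij <-> y(2i+j-2, i-j+1).
X, Y = {}, {}
for i in range(1, K + 1):
    for j in range(1, K + 1):
        if i <= j:
            X[(i + 2 * j - 2, j - i + 1)] = A[i - 1][j - 1]
        if i >= j:
            Y[(2 * i + j - 2, i - j + 1)] = B[i - 1][j - 1]

def x(t, v):
    return X.get((t, v), 0)

def y(t, u):
    return Y.get((t, u), 0)

def z(u, k, t):
    """Specification; (u+t)/3 and (t-2u)/3 are assumed to be integers."""
    if k == 0 or u + t + 3 > 6 * k or t - 2 * u + 3 > 3 * k:
        return 0
    p = (u + t) // 3
    q = (t - 2 * u) // 3
    return (x(p + k - 1, -p + 2 * k) * y(q + 2 * k - 1, -q + k)
            + z(u - 1, k - 1, t - 2))

def c_direct(i, j):
    return sum(A[i - 1][n - 1] * B[n - 1][j - 1] for n in range(1, K + 1))
```

For example, z(4, 4, 8) evaluates to c_11, matching the entry of Table 2 that places c11 at clock 8, and more generally z(i−j+k, k, 2i+j+2k−3) equals c_ij for all i ≤ j in this instance.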
4.2.2. Implementation of one cell

The structure of a unit cell is shown in Fig. 7. In this figure, R1, R2 and R3 stand for registers whose initial values are 0. Each cell ⟨u,v⟩ has three inputs Ain, Bin, Cin and three outputs Aout, Bout, Cout. Ain is sent to cell ⟨u−1,v⟩ as Aout after being stored in R1. Similarly, Bin is sent to cell ⟨u,v−1⟩ as Bout after being stored in R2. Cin and the product of Ain and Bin are added, and the result is sent to cell ⟨u+1,v+1⟩ as Cout after being stored in R3.

For a systolic array consisting of k × k cells, let local_xout(u,v,t,k), local_yout(u,v,t,k) and local_zout(u,v,t,k) be the values of the output signals Aout, Bout and Cout of cell ⟨u,v⟩ at a time t, respectively. Then, they are defined as follows:

Implementation

$$ \mathrm{local\_xout}(u,v,t,k) = \begin{cases} 0 & \text{if } (1 \le u \le k \text{ and } t = 0) \text{ or } u = 0 \text{ or } u > k+1 \text{ or } k = 0 \\ x(t) & \text{else if } u = k+1 \\ \mathrm{local\_xout}(u+1,\, v,\, t-1,\, k) & \text{otherwise} \end{cases} $$

$$ \mathrm{local\_yout}(u,v,t,k) = \begin{cases} 0 & \text{if } (1 \le v \le k \text{ and } t = 0) \text{ or } v = 0 \text{ or } v > k+1 \text{ or } k = 0 \\ y(t) & \text{else if } v = k+1 \\ \mathrm{local\_yout}(u,\, v+1,\, t-1,\, k) & \text{otherwise} \end{cases} $$

$$ \mathrm{local\_zout}(u,v,t,k) = \begin{cases} 0 & \text{if } (t < 1 \text{ and } 1 \le u \le k \text{ and } 1 \le v \le k) \text{ or } u = 0 \text{ or } u > k \text{ or } v = 0 \text{ or } v > k \text{ or } k = 0 \\ \mathrm{local\_xout}(u+1,\, v,\, t-1,\, k) \times \mathrm{local\_yout}(u,\, v+1,\, t-1,\, k) + \mathrm{local\_zout}(u-1,\, v-1,\, t-1,\, k) & \text{otherwise} \end{cases} $$
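To see how the three recursive definitions behave, they can be transcribed into Python and compared with the recursion-free form zout that the paper states as Lemma 4.1. The sketch below makes two assumptions that go beyond the text above: the global inputs are taken as x(t,v) and y(t,u), carrying the port index as in Section 4.1 (the definitions above write x(t) and y(t)), and the test inputs are arbitrary hypothetical functions. It is a bounded check, not a proof.

```python
from functools import lru_cache

def x(t, v):
    """Hypothetical global input at port index v (assumed two-argument form)."""
    return (3 * t + 5 * v + 1) % 7

def y(t, u):
    """Hypothetical global input at port index u (assumed two-argument form)."""
    return (2 * t + 7 * u + 4) % 5

@lru_cache(maxsize=None)
def local_xout(u, v, t, k):
    if (1 <= u <= k and t == 0) or u == 0 or u > k + 1 or k == 0:
        return 0
    if u == k + 1:
        return x(t, v)
    return local_xout(u + 1, v, t - 1, k)

@lru_cache(maxsize=None)
def local_yout(u, v, t, k):
    if (1 <= v <= k and t == 0) or v == 0 or v > k + 1 or k == 0:
        return 0
    if v == k + 1:
        return y(t, u)
    return local_yout(u, v + 1, t - 1, k)

@lru_cache(maxsize=None)
def local_zout(u, v, t, k):
    if ((t < 1 and 1 <= u <= k and 1 <= v <= k)
            or u == 0 or u > k or v == 0 or v > k or k == 0):
        return 0
    return (local_xout(u + 1, v, t - 1, k) * local_yout(u, v + 1, t - 1, k)
            + local_zout(u - 1, v - 1, t - 1, k))

@lru_cache(maxsize=None)
def zout(u, v, t, k):
    """Form of Lemma 4.1: defined only by itself and the global inputs."""
    if (t < 1 or t - 1 < k - u or t - 1 < k - v
            or u == 0 or u > k or v == 0 or v > k or k == 0):
        return 0
    return (x(t - 1 - (k - u), v) * y(t - 1 - (k - v), u)
            + zout(u - 1, v - 1, t - 1, k))

def lemma_4_1_holds(max_k=4, max_t=15):
    return all(local_zout(u, v, t, k) == zout(u, v, t, k)
               for k in range(max_k + 1)
               for u in range(k + 2)
               for v in range(k + 2)
               for t in range(max_t + 1))
```

The agreement rests on the observation that an input needs k − u (respectively k − v) clock ticks to travel from the boundary to cell ⟨u,v⟩, so before that time the product term is zero in both formulations.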
4.3. The main theorem

It is sufficient to prove the case u ≤ v because of the symmetrical structure of the circuit. Therefore, we assume that 1 ≤ u ≤ v ≤ k and examine only the output ports ⟨u,k⟩ (u = 1,…,k). The definition of local_zout gives the relation of the global input/output signals when v = k. Therefore, we try to prove the following main theorem:

Main theorem

$$ (t \ge 2 \times k \text{ and } k \ge 1) \rightarrow \mathrm{local\_zout}(u,k,t,k) = z(u,k,t) $$

Fig. 6. A structure of the matrix multiplier.

4.4. Verification

4.4.1. Four main lemmas

The following four main lemmas are created by applying the verification method used in the one-dimensional case.

Lemma 4.1. Removing recursion

$$ \mathrm{local\_zout}(u,v,t,k) = \mathrm{zout}(u,v,t,k) $$

where

$$ \mathrm{zout}(u,v,t,k) = \begin{cases} 0 & \text{if } t < 1 \text{ or } t-1 < k-u \text{ or } t-1 < k-v \text{ or } u = 0 \text{ or } u > k \text{ or } v = 0 \text{ or } v > k \text{ or } k = 0 \\ x(t-1-(k-u),\, v) \times y(t-1-(k-v),\, u) + \mathrm{zout}(u-1,\, v-1,\, t-1,\, k) & \text{otherwise} \end{cases} $$

Lemma 4.2. Bridging two induction steps

$$ \mathrm{zout}(u,\, k-1,\, t,\, k) = \mathrm{zout}(u,\, k-1,\, t-1,\, k-1) $$

Lemma 4.3. Introducing a new function

$$ z1(u,k,t) = \begin{cases} 0 & \text{if } k = 0 \text{ or } u < 1 \text{ or } k < u \text{ or } t-1 < k-u \\ x(t-1-(k-u),\, k) \times y(t-1,\, u) + z1(u-1,\, k-1,\, t-2) & \text{otherwise} \end{cases} $$

where

$$ z1(u,k,t) = \mathrm{zout}(u,k,t,k) $$

Lemma 4.4. Equivalence of sequences

$$ z1(u,k,t) = z(u,k,t) $$

The first three lemmas can be proved in a manner similar to the one-dimensional case. However, Lemma 4.4 is not trivial in this case. Here, we encounter another problem: verifying the equivalence of shifted sequences.

4.4.2. Equivalence of shifted sequences

In this subsection, we discuss the technique of verifying the equivalence of shifted sequences, which appears in Lemma 4.4. The functions z and z1 have the same arguments (u, k and t), and have the same argument patterns in their recursive calls. Although both functions calculate a sum of the form $\sum (x(m,n) \times y(p,q))$, their intervals of summation are different. To be brief, we consider the following equivalence of summations (of `shifted sequences'), which explains exactly the case occurring in the aforementioned situation:

$$ \sum_{i=m}^{n} s(i) = \sum_{i=0}^{n-m} t(i) \tag{3} $$

where the left-hand side corresponds to the summation calculated by z1, and the right-hand side to that calculated by z. Here, it should be noted that the following equalities hold:

$$ s(n) = \cdots = s(n-m+1) = t(m-1) = \cdots = t(1) = t(0) = 0 \tag{4} $$

$$ s(i) = t(i) \quad \text{for } m \le i \le n-m \tag{5} $$

To prove Eq. (3), the system would generate the following induction step:

$$ \sum_{i=m}^{n-1} s(i) = \sum_{i=0}^{n-m-1} t(i) \;\rightarrow\; \sum_{i=m}^{n} s(i) = \sum_{i=0}^{n-m} t(i) \tag{6} $$

The system would then try to prove the formula s(n) = t(n − m), failing in the end since this equality does not hold. However, Eq. (3) does hold owing to the conditions (4) and (5), and this is proved by splitting the summation into two parts.

1. The summation for zero elements:

$$ \sum_{i=n-m+1}^{n} s(i) = 0 $$

This is proved by introducing a supplemental recursive function defined over the region i ≥ n − m + 1, and by using Eq. (4). Also, the following equality holds:

$$ \sum_{i=0}^{m-1} t(i) = 0 $$

2. The summations for non-zero elements:

$$ \sum_{i=m}^{n-m} s(i) = \sum_{i=m}^{n-m} t(i) $$
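The decomposition used for Eq. (3) can be illustrated with a small numerical instance. In the Python sketch below, the bounds `M`, `N` and the values in `CORE` are hypothetical; `s` and `t` are built so that conditions (4) and (5) hold, and they deliberately differ outside the summation ranges.

```python
M, N = 3, 10                              # hypothetical bounds with N >= 2*M - 1
CORE = {3: 5, 4: -2, 5: 7, 6: 1, 7: 4}    # common values on M <= i <= N - M

def s(i):
    if i < M:
        return 99                         # outside the range summed in Eq. (3)
    return CORE.get(i, 0)                 # zero for N-M+1 <= i <= N, as in (4)

def t(i):
    if i > N - M:
        return -42                        # outside the range summed in Eq. (3)
    return CORE.get(i, 0)                 # zero for 0 <= i <= M-1, as in (4)

def total(f, lo, hi):
    """Sum of f(i) for lo <= i <= hi."""
    return sum(f(i) for i in range(lo, hi + 1))

lhs = total(s, M, N)                      # left-hand side of Eq. (3)
rhs = total(t, 0, N - M)                  # right-hand side of Eq. (3)

zero_part_s = total(s, N - M + 1, N)      # part 1: zero elements of s
zero_part_t = total(t, 0, M - 1)          # part 1: zero elements of t
core_s = total(s, M, N - M)               # part 2: non-zero elements of s
core_t = total(t, M, N - M)               # part 2: non-zero elements of t
```

Here `lhs` and `rhs` are both 15, although s(N) = 0 differs from t(N − M) = 4, so the term-by-term step that the naive induction (6) would need is indeed false. Splitting each sum into its zero part and its common part avoids that comparison altogether.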
The equality of the summations for non-zero elements is easily proved by induction under the condition (5).

5. Summary of the method

We have described the framework of verification for systolic arrays using two typical examples. Since the method does not depend on a specific systolic structure, it is applicable to other systolic arrays as well. The method is summarized as follows.

Specification. Write down a given specification with vectors or matrices in the usual manner. Then, according to the timing requirements, translate the description into the relations of the global input/output signals Spec(t, io_port), where t is a time and io_port is the position of the input/output port in the circuit.

Implementation. Describe the implementation of a unit cell Impl(t, pos_cir), where t is a time and pos_cir is the position of a cell in the circuit.

Verification. Prove that Impl(t, io_port) satisfies Spec(t, io_port) as the main theorem. The main theorem claims that the relation of the input/output signals of the cells corresponding to the input/output ports satisfies that of the global input/output signals.

We prove the main theorem using BMTP. An attempt to prove the given theorem as it is, without providing any lemmas, tends to result in failure. The key to obtaining a successful proof is to guide the prover to perform an appropriate application of induction. When there is a variable whose value decreases at every recursive call, induction is applied on that variable. Assume that such a variable appears in more than one position in the formula being proved. If the value decreases at a different rate at each occurrence of the variable, then induction may be applied in a wrong way. This is the case we have encountered while dealing with systolic arrays that have feedback loops. In general, the user has to examine the failed proof to find the cause of the failure, and prepare appropriate lemmas for the system to perform successful induction. In the case of systolic arrays, we have used the following techniques to derive such lemmas.
• Transform the recursive definition that has two base cases into a definition without recursion, and prove their equivalence.
• Create a lemma that relates the undesirable induction hypothesis generated by the system to the user-intended one, and prove it.
• Introduce a new function and transform it into recursive form, then prove the correctness of the obtained formula.
• Divide the summation of a sequence into parts, and perform the proof stepwise.

The obtained lemmas reflect the structures of systolic arrays. The features reflected by the lemmas are as follows.

• The equivalence of the naive description of the relation of input/output signals of a unit cell and the definition in which each output signal is defined only by itself and the global inputs.
• The correspondence of the output signals of two systolic arrays consisting of k cells and k − 1 cells.
• The correctness of the recursive definition of a unit cell specialized to the input/output port.
• The equivalence of shifted sequences: the sequence given as a specification and the one derived from an implementation.

In fact, these lemmas could be suggested by the failed proof. However, it is more effective to devise suitable lemmas by considering the function of the circuit than to squeeze plausible lemmas out of the failed proof.

Fig. 7. A unit cell of the matrix multiplier.

6. Experimental results

Table 3 shows the experimental results of the verifications for one-dimensional and two-dimensional systolic arrays on a SPARC Server 670MP. We use the system NQTHM (1992 version) as the Boyer–Moore Theorem Prover.

We compare the results on the two systolic arrays. In the first step, the two-dimensional case needed 1.5 times as many sublemmas and consumed 1.5 times as much CPU time as the one-dimensional case. This is because there are three user-defined functions involved in the proof for the two-dimensional case, while there are two for the one-dimensional case. Most of the sublemmas used in this step are formulae representing explicit case splits, base cases and exceptions.
These trivial sublemmas can be created easily, whereas the four main lemmas are difficult to hit upon. In the second step, although the number of sublemmas is almost equal for the two cases, the CPU time consumed for the two-dimensional case is about five times greater than that for the one-dimensional case. This is because more case splits occurred due to the additional parameter in the two-dimensional case. Moreover, simplifying the formulae for each case takes much time, since the formula for the two-dimensional case is more complicated than that for the one-dimensional case.
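As a concrete reading of the Spec/Impl framework summarized in Section 5, the following Python sketch may help. It is not one of the circuits verified in this paper: the cell recurrence, the weights W, the input stream x and the names impl, spec and check are all hypothetical, chosen only to show a recursive unit-cell definition with two base cases next to a non-recursive description of the same signal. The bounded comparison is merely a sanity test over finitely many points; it does not replace the inductive proof carried out with BMTP.

```python
from itertools import product

# Hypothetical cell weights w[1..K]; index 0 is unused padding.
W = [0, 3, -1, 4, 2]
K = len(W) - 1


def x(t):
    """Hypothetical global input stream sampled at time t."""
    return (7 * t + 2) % 5


def impl(t, pos):
    """Unit-cell recurrence with two base cases (t == 0 and pos == 0)."""
    if t == 0 or pos == 0:
        return 0
    # Both t and pos decrease by one at every recursive call.
    return impl(t - 1, pos - 1) + W[pos] * x(t - 1)


def spec(t, port):
    """Non-recursive description of the signal observed at an output port."""
    return sum(W[port - j + 1] * x(t - j) for j in range(1, min(t, port) + 1))


def check(max_t, max_pos):
    """Compare impl and spec on a finite grid; return the first mismatch or None."""
    for t, pos in product(range(max_t + 1), range(max_pos + 1)):
        if impl(t, pos) != spec(t, pos):
            return (t, pos)
    return None
```

Calling check(12, K) compares the two definitions for every time up to 12 and every position up to K. In this toy cell, t and pos decrease at the same rate, so a single induction covers both; the difficulty discussed in Section 5 arises when the occurrences of a variable decrease at different rates.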
Table 3
Experimental results

Step (lemma)       One-dimension                         Two-dimension
                   CPU time (s)  Number of sublemmas     CPU time (s)  Number of sublemmas
Base case          76.8          11                      184.3         16
Bridging           29.1          7                       158.9         9
New function       1.5           1                       1.8           1
Shifted sequence   0.1           1                       47.3          16
Total              107.5         20                      392.3         42

In the third step, the proof does not depend on the parameters standing for the positions. The CPU time is very short in both cases compared with that consumed in the other steps. In the last step, the lemma is trivial in the one-dimensional case, whereas it is considerably difficult to prove in the two-dimensional case, hence the large difference in CPU time.

All the derived lemmas reflect typical features of systolic arrays in general and do not depend on a specific instance. In fact, the lemmas used in the first three steps for the two-dimensional systolic array were created simply by extending those used in the one-dimensional case. In a similar manner, lemmas for any other systolic array would be derived easily by modifying the lemmas we have already obtained.

7. Concluding remarks

We have presented a method for verifying systolic arrays with the induction-based theorem prover BMTP. With the method, we derived all the relevant lemmas for appropriate application of induction in a systematic way. Although there is much research on hardware verification using theorem provers, few works on formalizing the verification of systolic arrays have been undertaken. Our contribution is that we have shown the framework of the verification procedure using two typical structures of systolic arrays which are provided as benchmarks [10].

The proposed method is readily applicable to the verification of other circuits that have regular structures with feedback loops. It is also applicable to tactic-oriented provers such as HOL [6] and PVS [12] by adapting the method to their proof tactics for induction.

Recently, a new version of BMTP called ACL2 has been developed [8] by scaling up NQTHM to have `industrial strength.' It enables more efficient use of programming primitives, automatic use of certain rules, database tools and so on. Most of the trivial sublemmas required in our verification may not be necessary if ACL2 is used. Yet we doubt its ability to find or suggest the more `intelligent' lemmas that are the key to the verification.

As a matter of fact, state-of-the-art provers are not so powerful as to perform verification fully automatically. In order to lighten the users' burden, it is most important to provide them with relevant lemmas. We have presented such lemmas, specifically useful for verifying systolic arrays. The lemmas developed so far are to be compiled into a library that can be referred to as a `lemma-base' for later use. By extending the lemma-base, one could give a theorem prover substantial knowledge about circuits in general and make it well adapted to their verification.

References

[1] Boyer RS, Moore JS. A computational logic handbook. New York: Academic Press, 1988.
[2] Bronstein A, Talcott CL. Formal verification of synchronous circuits based on string-functional semantics: the 7 Paillet circuits in Boyer–Moore. In: Sifakis J, editor. Automatic verification methods for finite state systems. Springer, 1989:317–333.
[3] Bryant RE. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers 1986;C35:677–691.
[4] German SM, Wang Y. Formal verification of parameterized hardware designs. In: Proceedings of the International Conference on Computer Design. IEEE, 1985:549–552.
[5] Goldschlag DM. Mechanically verifying safety and liveness property of delay insensitive circuits. In: Larsen KG, Skou A, editors. Computer aided verification. Berlin: Springer, 1991:354–364.
[6] Gordon MJC. HOL: A proof generating system for higher-order logic. In: Birtwistle G, Subrahmanyam PA, editors. VLSI Specification, verification and synthesis. Boston, MA: Kluwer Academic Publishers, 1988:73–128.
[7] Hunt Jr WA. FM8501: A verified microprocessor, PhD thesis, University of Texas at Austin, 1985 (Also available through Computational Logic Inc).
[8] Kaufmann M, Moore JS. ACL2: An industrial strength version of Nqthm. In: Proceedings of Eleventh Annual Conference on Computer Assurance. IEEE Computer Society Press, 1996:23–34.
[9] Kinniment DJ, Koelmans AM. Modelling and verification of timing conditions with Boyer–Moore Prover. In: Stavridou V, Melham T, Boute RT, editors.
Theorem provers in circuit design. Amsterdam: Elsevier/North-Holland, 1992:111–127.
[10] Kropf T. Benchmark-circuits for hardware-verification. In: Theorem provers in circuit design 1995:1–12, http://goete.ira.uka.de/benchmarks, TPCD Benchmarks.
[11] Kung HT. Why systolic architectures? IEEE Computer 1982;X: pp. 37–46.
[12] Owre S, Rushby JM, Shankar N. PVS: a prototype verification system. In: 11th International Conference on Automated Deduction. New York: Springer, 1992:748–752.
[13] Purushothaman S, Subrahmanyam PA. Mechanical certification of systolic algorithms. Journal of Automated Reasoning 1989;5:67–91.
[14] Takahashi K, Fujita H. Time parameterized function method: a new method for hardware verification with the Boyer–Moore Theorem Prover. In: Proceedings of CHDL'95 (IFIP Conference on Hardware Description Languages and Their Applications). Chiba, Japan, 1995:545–552.
[15] Takahashi K, Fujita H. TPF: an effective method for verifying synchronous circuits with induction-based provers. IEICE Transactions on Information and Systems 1998;E81D:12–18.
[16] Takahashi K, Fujita H. Verification of systolic arrays using induction-based theorem provers. Journal of Information Processing 1998;39:2323–2330 (in Japanese).
[17] Verkest D, Vandenbergh J, Claesen L, de Man H. A description methodology for parameterized modules in the Boyer–Moore logic. In: Stavridou V, Melham T, Boute RT, editors. Theorem provers in circuit design. Amsterdam: Elsevier/North-Holland, 1992:37–57.