Journal of Systems Architecture 51 (2005) 424–434 www.elsevier.com/locate/sysarc
An application of functional decomposition in ROM-based FSM implementation in FPGA devices

Mariusz Rawski a,*, Henry Selvaraj b, Tadeusz Łuba a

a Institute of Telecommunications, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
b University of Nevada, Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4026, USA

Received 3 March 2003; received in revised form 15 January 2004; accepted 1 July 2004. Available online 8 December 2004.

* Corresponding author. Tel.: +48 22 825 15 80. E-mail address: [email protected] (M. Rawski).
Abstract

Modern FPLD devices have a very complex structure: they combine PLA-like structures, FPGA-like structures and even memory-based structures. However, the lack of appropriate synthesis methods does not allow the possibilities that modern FPLDs offer to be fully exploited. This paper presents a general synthesis method targeted at the implementation of sequential circuits using embedded memory blocks. The method is based on the serial decomposition concept and relies on decomposing the memory block into two blocks: a combinational address modifier and a smaller memory block. An appropriately chosen decomposition strategy may allow the required memory size to be reduced at the cost of additional logic cells for the address modifier implementation. This makes it possible to implement FSMs whose memory requirements exceed the available embedded memory, by combining embedded memory blocks with additional programmable logic.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Logic synthesis; FPGAs; Functional decomposition; ROM; Finite state machine
1. Introduction

Decomposition has become an important tool in the analysis and design of digital systems. It is fundamental to many fields of modern engineering and science [3,6,9,17–19]. Functional decomposition relies on breaking down a complex system into a network of smaller, relatively independent, co-operating sub-systems in such a way that the original system's behavior is preserved. A system is decomposed into a set of smaller sub-systems, each of which is easier to analyze, understand and synthesize.
By taking advantage of the opportunities that modern microelectronic technology provides, we are in a position to build very complex digital circuits and systems at relatively low cost. There is a large variety of logic building blocks that can be exploited. The library of elements
contains various types of gates; many complex gates can be generated in (semi-)custom CMOS design; and the field programmable logic families include different types of (C)PLDs and FPGAs. However, the opportunities created by modern microelectronic technology are not fully exploited because of weaknesses in traditional logic design methods. According to the International Technology Roadmap for Semiconductors (1997), the annual growth rate in the complexity of digital devices is 58%, while the annual growth rate in design productivity is only 21% (Fig. 1). This means that the number of logic gates available in modern devices grows faster than our ability to design them meaningfully. New methods are required to aid the design process so that the possibilities offered by modern microelectronics are utilized to the highest possible degree [2,5,8,14].

Fig. 1. Difference in growth of device complexity and productivity (logic transistors per chip: 58% annual growth rate in complexity; productivity in transistors/staff-month: 21% annual growth rate; 1980–2010).

Recently, new methods of logic synthesis based on functional decomposition have been developed [1,4,7,11,15].
Unfortunately, decomposition-based methods are generally regarded as suitable mainly for the implementation of combinational functions.
Modern FPLD devices have a very complex structure (Fig. 2). They combine PLA-like structures with FPGA-like and even memory-based structures. In many cases, designers cannot utilize all the possibilities such complex architectures provide, due to the lack of appropriate synthesis methods.

Fig. 2. Structure of modern programmable devices: I/O elements, LUT-based logic cells, product-term (PLA-like) blocks and embedded memory blocks.

Embedded memory arrays make possible the implementation of memory-like blocks, such as large registers, FIFOs, and RAM or ROM modules. These memory resources make up a considerably large part of the device; for example, the EP20K1500E device provides 51,840 logic cells and 442 Kbits of SRAM. Taking into account the conversion factors of logic elements and memory bits to logic gates (12 gates per logic element and four gates per memory bit, i.e. roughly 620,000 versus 1,810,000 gate equivalents), it turns out that the embedded memory arrays make up over 70% of all logic resources. In many cases, though, these resources are not utilized, since not every design contains modules such as RAM or ROM. However, embedded memory blocks can be used to implement sequential machines in a way that requires fewer logic cells than the traditional flip-flop based implementation. This may be used to implement "non-vital" sequential parts of the design, saving logic cell resources for more important sections. Since the size of embedded memory blocks is limited, such an implementation may require more memory than is available in a device. To reduce
memory usage in ROM-based sequential machine implementations, decomposition-based methods can be successfully used [10].
In this paper, the basic notions are introduced first. Secondly, the application of decomposition to the implementation of sequential machines is presented. Subsequently, experimental results, obtained with a prototype tool that implements functional decomposition, are discussed. The experimental results demonstrate that decomposition is capable of constructing solutions (utilizing embedded memory blocks) of comparable or even better quality than the methods implemented in commercial systems.
Table 1
Truth table of example function

       x1x2x3x4x5    y1y2y3
 1     00000         000
 2     00011         010
 3     00010         100
 4     01100         011
 5     01101         001
 6     01110         010
 7     01000         001
 8     11000         001
 9     11010         000
10     11100         100
11     11111         011
12     11110         010
13     10001         001
14     10011         000
15     10010         100
2. Basic notions

2.1. Functional decomposition

Let A and B be two subsets of X such that A ∪ B = X. Assume that the variables x1, ..., xn have been relabeled in such a way that

A = {x1, ..., xr} and B = {xn−s+1, ..., xn}.

Consequently, for an n-tuple x, the first r components are denoted by xA and the last s components by xB.
Let F be a Boolean function with n > 0 inputs and m > 0 outputs. Let (A, B) be as defined above. Assume that F is specified by a set F of the function's cubes. Let G be a function with s inputs and p outputs, and let H be a function with r + p inputs and m outputs. The pair (G, H) represents a serial decomposition of F with respect to (A, B) if, for every minterm b relevant to F, G(bB) is defined, G(bB) ∈ {0, 1}^p and F(b) = H(bA, G(bB)). G and H are called blocks of the decomposition.
Partition-based representation of Boolean functions can be used to describe functional decomposition algorithms [3,9,12,13,16]. If there exists an r-partition PG on F such that P(B) ≤ PG and P(A) · PG ≤ PF, then F has a serial decomposition with respect to (A, B).
The serial decomposition process consists of the following steps: input support selection (the most time-consuming part of the process), calculation of the partitions P(A), P(B) and PF, construction of the partition PG, and creation of the functions H and G [12].

Example 1. Serial functional decomposition. Let us decompose the function F of five input variables and three binary output variables shown in Table 1. For A = {x1, x3, x4} and B = {x2, x5} we obtain

P(A) = (1,7; 8,13; 2,3; 9,14,15; 4,5; 10; 6; 11,12),
P(B) = (1,3,15; 2,13,14; 4,6,7,8,9,10,12; 5,11) and
PG = (1,3,5,11,15; 2,4,6,7,8,9,10,12,13,14).

It can easily be verified that, since P(A) · PG ≤ PF, where PF = (1,9,14; 5,7,8,13; 2,6,12; 4,11; 3,10,15), function F is decomposable as F = H(x1, x3, x4, G(x2, x5)), where G is a one-output function of two variables.
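A minimal Python sketch of this check is given below. It is our own illustration, not the authors' tool: the data structures and helper names (partition, induced_by, refines, and so on) are ours. It encodes Table 1, builds P(A), P(B), PG and PF, and verifies the decomposition condition P(B) ≤ PG, P(A) · PG ≤ PF from Example 1.

# Illustrative sketch of the partition-based decomposition check of Section 2.1:
# F = H(x_A, G(x_B)) exists if there is a PG with P(B) <= PG and P(A)*PG <= PF.

# Table 1: minterm number -> (input bits x1..x5, output bits y1..y3)
TABLE1 = {
    1:  ("00000", "000"),  2: ("00011", "010"),  3: ("00010", "100"),
    4:  ("01100", "011"),  5: ("01101", "001"),  6: ("01110", "010"),
    7:  ("01000", "001"),  8: ("11000", "001"),  9: ("11010", "000"),
    10: ("11100", "100"), 11: ("11111", "011"), 12: ("11110", "010"),
    13: ("10001", "001"), 14: ("10011", "000"), 15: ("10010", "100"),
}

def partition(key):
    """Group minterms by a key function; a partition is a frozenset of blocks."""
    blocks = {}
    for m in TABLE1:
        blocks.setdefault(key(m), set()).add(m)
    return frozenset(frozenset(b) for b in blocks.values())

def induced_by(vars_):            # partition P(A) induced by the input variables in A
    return partition(lambda m: tuple(TABLE1[m][0][v - 1] for v in vars_))

def output_partition():           # PF: minterms with identical outputs share a block
    return partition(lambda m: TABLE1[m][1])

def product_partition(p1, p2):    # block-wise intersection, empty blocks dropped
    return frozenset(b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2)

def refines(p1, p2):              # p1 <= p2: every block of p1 lies inside a block of p2
    return all(any(b1 <= b2 for b2 in p2) for b1 in p1)

PA, PB, PF = induced_by([1, 3, 4]), induced_by([2, 5]), output_partition()
# PG merges the blocks of P(B) into two groups, as in Example 1
PG = frozenset({frozenset({1, 3, 5, 11, 15}),
                frozenset({2, 4, 6, 7, 8, 9, 10, 12, 13, 14})})
assert refines(PB, PG) and refines(product_partition(PA, PG), PF)
print("F = H(x1, x3, x4, G(x2, x5)) exists")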
The r-admissibility test presented below allows us to obtain the set A of variables which should be connected directly to the circuit H and for which there exists a function G (generally with t outputs) such that |A| + t < n, where n = |X|.
Let P1, ..., Pk be partitions on M, the set of minterms of a function F. The set of partitions {P1, ..., Pk} is r-admissible in relation to a partition h if and only if there is a set {Pk+1, ..., Pr} of two-block partitions such that the product p of the partitions P1, ..., Pk, Pk+1, ..., Pr satisfies the inequality p ≤ h, and there does not exist any set of r − k − 1 two-block partitions which meets this requirement.
The r-admissibility has the following interpretation. If a set of partitions {P1, ..., Pk} is r-admissible, then there might exist a serial decomposition of F (Fig. 3) in which component H has r inputs: k primary inputs corresponding to the input variables which induce {P1, ..., Pk}, and r − k inputs being outputs of G. Thus, to find a decomposition of F in which component H has r inputs, we must find a set of input variables which induces an r-admissible set of input partitions.

Fig. 3. Schematic representation of the serial decomposition: the input set X is split into subsets A and B; block G (inducing PG) is driven by xB, and block H is driven by xA together with the outputs of G and produces F.

The following theorem can be applied to check whether or not a given set of input partitions is r-admissible.

Theorem 1. Let, for r ≤ s, s|r denote the quotient partition and g(s|r) the number of elements in the largest block of s|r. Let e(s|r) denote the smallest integer equal to or larger than log2 g(s|r), i.e. e(s|r) = ⌈log2 g(s|r)⌉. Then {P1, ..., Pk} is r-admissible, where

r = k + e(p | pF),

where p is the product of the partitions P1, ..., Pk and pF = p · PF.

Theorem 1 provides an approach to evaluating the admissibility of a set of input variable partitions P(xi). In searching for a maximum set of variables which can be connected to the circuit H directly, we compute sets of t-admissible partitions P(xi) only, where t is the given number of inputs of the circuit H.

Example 2. R-admissibility evaluation. Consider the following set of partitions on M = {1, ..., 15}:

P1 = (1, ..., 7; 8, ..., 15),
P2 = (1,2,3,13,14,15; 4, ..., 12),
P3 = (1,2,3,7,8,9,13,14,15; 4,5,6,10,11,12),
P4 = (1,4,5,7,8,10,13; 2,3,6,9,11,12,14,15),
P5 = (1,3,4,6, ..., 10,12,15; 2,5,11,13,14),
PF = (1,9,14; 5,7,8,13; 2,6,12; 4,11; 3,10,15).

These partitions represent the function F from Table 1, where Pi denotes P(xi). By examining the admissibility of P1 we obtain

P1 · PF = (1; 9,14; 5,7; 8,13; 2,6; 12; 4; 11; 3; 10,15),
P1 | P1 · PF = ((1)(2,6)(3)(4)(5,7); (8,13)(9,14)(10,15)(11)(12)).

Hence the r-admissibility is r = 1 + ⌈log2 5⌉ = 4, i.e. r(P1) = 4. Also, as the quotient partition p | p · PF, where p = P1 · P3 · P4, is equal to

((1)(7); (8,13); (2)(3); (9,14)(15); (4)(5); (10); (6); (11)(12)),

the set {P1, P3, P4} is 4-admissible. Similarly, r = 4 for the set {P3, P4, P5}. Therefore, F might be realized in the circuit H with inputs x1, x3, x4, g1 or x3, x4, x5, g2, where F = H(x1, x3, x4, G1(x2, x5)) or F = H(x3, x4, x5, G2(x1, x2)), respectively.
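The admissibility computation of Theorem 1 can be sketched in a few lines of Python. Again this is only our illustration under our own naming (product_partition, r_admissibility), not the authors' implementation: given the partitions induced by a candidate variable set and the output partition PF, r equals k plus the ceiling of log2 of the largest block of the quotient p | p · PF.

from math import ceil, log2

def product_partition(p1, p2):
    """Block-wise intersection of two partitions (sets of frozensets)."""
    return frozenset(b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2)

def r_admissibility(input_partitions, pf):
    """Theorem 1: r = k + ceil(log2(largest block of p | p*PF)),
    where p is the product of the k input partitions."""
    p = input_partitions[0]
    for q in input_partitions[1:]:
        p = product_partition(p, q)
    p_pf = product_partition(p, pf)
    # the quotient p | p*PF groups the blocks of p*PF lying inside each block of p;
    # its largest block size is the largest number of p*PF-blocks per p-block
    largest = max(sum(1 for b in p_pf if b <= block) for block in p)
    return len(input_partitions) + ceil(log2(largest))

# Example 2, with the partitions written as frozensets of blocks:
P1 = frozenset({frozenset(range(1, 8)), frozenset(range(8, 16))})
PF = frozenset({frozenset({1, 9, 14}), frozenset({5, 7, 8, 13}),
                frozenset({2, 6, 12}), frozenset({4, 11}),
                frozenset({3, 10, 15})})
print(r_admissibility([P1], PF))   # prints 4, as in Example 2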
2.2. Finite state machine

Let A = ⟨V, S, δ⟩ be an FSM (completely or incompletely specified) with no outputs (outputs are omitted as they have an insignificant impact on the method), where V is the set of input symbols, S is the set of internal states and δ is the state transition function, and let m = ⌈log2 |V|⌉ and n = ⌈log2 |S|⌉
denote the number of input and state variables, respectively. To describe logic dependencies in such an FSM, a special partition description [3] and a special partition algebra [6] are employed.
Let K be a one-to-one correspondence between the domain Dδ of the transition function and the set K = {1, ..., p}, where p = |Dδ|. The characteristic partition Pc of an FSM is defined in the following way:

(k1, k2) ∈ B_Pc   iff   δ(K⁻¹(k1)) = δ(K⁻¹(k2)).

Thus, each block B_Pc of the characteristic partition includes those elements from K which correspond to pairs (v, s) from the domain Dδ such that the transition function δ(v, s) = s′ maps them onto the same next state s′.

Example 3. For the FSM shown in Table 2a, with the K mapping as in Table 2b:

Pc = (1,8,11,14; 2,6,10,16; 9,12,13; 3,5,7,15; 4).

A partition P on K is compatible with a partition p on S iff, for any inputs va, vb, the condition that si, sj belong to one block of partition p implies that the elements from K corresponding to the pairs (va, si) and (vb, sj) belong to one block of the partition P. A partition P on K is compatible with a partition h on V iff, for any states sa, sb, the condition that vi, vj belong to one block of partition h implies that the elements from K corresponding to the pairs (vi, sa) and (vj, sb) belong to one block of the partition P. In particular, a partition P on K is compatible with the set {p, h} if it is compatible with both p and h, while it is compatible with a set {p1, ..., pa} of partitions on S (or a set {h1, ..., ha} of partitions on V) iff it is compatible with p = p1 · p2 · ... · pa (h = h1 · h2 · ... · ha).

Example 4. For the FSM from Table 2: partition P1 = (1, 2, 3, 9, 10, 11, 12; 4, 5, 6, 7, 8, 13, 14, 15, 16) is compatible with the partition p = (s1, s3; s2, s4, s5), while partition P2 = (1, 2, 9, 10; 3, 4, 5, 11, 12; 6, 13, 14; 7, 8, 15, 16) is compatible with {p, h}, where p = (s1; s2; s3; s4; s5) and h = (v1; v2; v3; v4).
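As an illustration of these definitions (not of the authors' tool), the following Python sketch computes the characteristic partition Pc and checks compatibility of a partition on K with a state partition. The transition table used here is invented purely to exercise the definitions; it is not the FSM of Table 2.

# Hypothetical 3-state, 2-input FSM; delta maps (input symbol, state) -> next state.
delta = {
    ("v1", "s1"): "s2", ("v1", "s2"): "s3", ("v1", "s3"): "s1",
    ("v2", "s1"): "s2", ("v2", "s2"): "s1", ("v2", "s3"): "s1",
}

# K: one-to-one numbering of the transition-function domain, K = {1, ..., p}
domain = sorted(delta)                       # fixed order of (v, s) pairs
K = {i + 1: vs for i, vs in enumerate(domain)}

def characteristic_partition():
    """Pc: k1, k2 share a block iff their (v, s) pairs map to the same next state."""
    blocks = {}
    for k, vs in K.items():
        blocks.setdefault(delta[vs], set()).add(k)
    return frozenset(frozenset(b) for b in blocks.values())

def compatible_with_state_partition(P, pi):
    """P (a partition on K) is compatible with pi (a partition on S) iff, whenever
    si and sj share a pi-block, the K-elements of (va, si) and (vb, sj) share a P-block."""
    def block_of(part, x):
        return next(b for b in part if x in b)
    for k1, (va, si) in K.items():
        for k2, (vb, sj) in K.items():
            if block_of(pi, si) == block_of(pi, sj):
                if block_of(P, k1) != block_of(P, k2):
                    return False
    return True

Pc = characteristic_partition()
print(Pc)
# the partition on K induced by grouping states s1 with s3:
pi = frozenset({frozenset({"s1", "s3"}), frozenset({"s2"})})
P = frozenset({frozenset(k for k, (v, s) in K.items() if s in {"s1", "s3"}),
               frozenset(k for k, (v, s) in K.items() if s == "s2")})
print(compatible_with_state_partition(P, pi))   # True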
3. ROM implementation of finite state machines

An FSM can be implemented using Read Only Memory (ROM) [10]. Fig. 4 shows the general architecture of such an implementation. The state and input variables (q1, q2, ..., qn and x1, x2, ..., xm) constitute the ROM address variables (a1, a2, ..., am+n). The ROM consists of words, each storing an encoded state (control field) and the output values (information field); the next state is determined by the input values and the present-state information fed back from memory.

Fig. 4. Implementation of FSM using memory blocks: the m inputs X and the n state bits Q pass through an (m + n)-bit register that addresses the ROM, whose word holds the outputs y (F) and the state code Q fed back to the register.

Table 2. (a) FSM table, (b) transition mapping K.

This kind of implementation requires much fewer logic cells than the traditional flip-flop
implementation (or does not require them at all, if the memory can be controlled by the clock signal, in which case no register is required); therefore, it can be used to implement "non-vital" FSMs of the design, saving logic cell resources for more important sections. However, a large FSM may require too many embedded memory resources.
The size of the memory needed for such an implementation depends on the lengths of the address and of the memory word. Let m be the number of inputs, n the number of state encoding bits and y the number of output functions of the FSM. The size of the memory needed for the implementation of such an FSM can be expressed by the following formula:

M = 2^(m+n) · (n + y),

where m + n is the size of the address and n + y is the size of the memory word. Since modern programmable devices contain embedded memory blocks, there exists a possibility to implement FSMs using these blocks. The size of the memory blocks available in programmable devices is limited, though. For example, Altera's FLEX family Embedded Array Block (EAB) has a capacity of 2048 memory bits. A single device of this family consists of several such blocks (e.g. the FLEX10K10 consists of three EABs). Functional decomposition can be used to implement FSMs that exceed that size.
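A small Python sketch of this sizing calculation (our own illustration; the EAB capacity is taken from the text above, while the FSM parameters below are invented) shows how quickly the required memory grows with the address length:

from math import ceil, log2

def rom_bits(num_inputs, num_states, num_outputs):
    """Memory required for a ROM-based FSM: M = 2^(m+n) * (n + y),
    with n = ceil(log2(number of states)) state-encoding bits."""
    m, y = num_inputs, num_outputs
    n = ceil(log2(num_states))
    return 2 ** (m + n) * (n + y)

EAB_BITS = 2048   # capacity of one FLEX-family Embedded Array Block

# e.g. a hypothetical FSM with 4 inputs, 12 states and 3 outputs:
needed = rom_bits(num_inputs=4, num_states=12, num_outputs=3)
print(needed, "bits,", ceil(needed / EAB_BITS), "EAB(s)")   # 1792 bits, 1 EAB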
3.1. Address modifier

Any FSM, say A, defined by a given transition table can be implemented as in Fig. 5 using an address modifier. If p1, ..., pn are partitions on S, h1, ..., hm are partitions on V, and Pk is a partition on K compatible with either pi or hj, then P = {P1, ..., Pm+n} is the set of all partitions compatible with {p1, ..., pn, h1, ..., hm}. The partitions p1, ..., pn correspond to the state variables and h1, ..., hm correspond to the input variables.

Fig. 5. Implementation of FSM using an address modifier: b of the m + n address variables enter the address modifier, which replaces them with c < b outputs, so the register and ROM address width becomes w = a + c < m + n.

To achieve an unambiguous encoding of the address variables while maintaining consistency of the relation K with the transition function δ, partitions P1, ..., Pw have to be found such that:
P1 · P2 · P3 · ... · Pw ≤ Pc.

This is the necessary and sufficient condition for {P1, ..., Pw} to determine the address variables. This is because each memory cell is associated with a single block of Pc, i.e. with those elements from K which map the corresponding (v, s) pairs onto the same next state. The selection of w (w < n + m) partitions from the set {P1, ..., Pm+n} is made such that they produce the simplest addressing unit. Such a selection is possible thanks to the notion of r-admissibility [9]. A set {P1, ..., Pk} is r-admissible in relation to a partition P iff there is a set {Pk+1, ..., Pr} of two-block partitions such that the following condition holds:

P1 · P2 · P3 · ... · Pk · Pk+1 · ... · Pr ≤ Pc,

and no set of r − k − 1 partitions exists which meets this requirement.
For a partition q ≤ r, let r|q denote the quotient partition and g(r|q) the number of elements in the largest block of r|q. Let e(r|q) be the smallest integer equal to or larger than log2 g(r|q) (i.e. e(r|q) = ⌈log2 g(r|q)⌉). Then the r-admissibility of {P1, ..., Pk} is r = k + e(p|pf), where p is the product of P1, ..., Pk and pf is the product of p and P. If P = {P1, ..., Pk} is r-admissible in relation to P, then each subset of P is r′-admissible, where r′ ≤ r. The smallest partition, i.e. the one in which each element is a separate block, will be denoted by P(0), p(0), h(0), etc.

3.2. Input/state encoding

The source of the complexity of the address modifier lies in the address variables which depend on more than one input/state variable. Therefore it is important to choose an encoding of the input and internal state symbols such that we obtain a maximal set of partitions P (compatible with p or h) whose r-admissibility in relation to Pc is w, where w is the number of address bits of the given ROM block.
An appropriate encoding is determined by generating partitions with the knowledge that the r-admissibility of a partition P compatible with a partition p = (B1; ...; Bi; ...; Ba) or with a partition h = (B1; ...; Bi; ...; Ba) is

r = ⌈log2 a⌉ + ⌈log2 max |δ(Bi)|⌉,

where Bi is a block of partition p or h, δ is the transition function, and max |δ(Bi)| denotes the number of elements in the most populous set δ(Bi), i ∈ {1, ..., a} (see the sketch at the end of this subsection). Because of the one-to-one correspondence between the partitions P and p or h, the r-admissibility of p, h or {p, h} in relation to Pc can be considered.
For a given w, the encoding that allows the implementation of the FSM with the use of an address modifier can be found in the following way:

1. find r1 = r-admissibility of h(0) and r2 = r-admissibility of p(0);
2. if r1 = w (or r2 = w), then a1 = x1, ..., am = xm (or a1 = q1, ..., an = qn) and further encoding partitions are searched for among p(0) (or h(0));
3. if both r1 > w and r2 > w, then for the subsequent steps h is taken if |V| < |S|, or p if |V| > |S|;
4. assuming that h was chosen in the previous step, for i = 1, 2, ... and a = 2^(m−i) find h = (B1; ...; Ba) such that |B1| + |B2| + ... + |Ba| = |V| and whose r-admissibility equals w. In a similar way p = (D1; ...; Db), b = 2^(n−j), j = 1, 2, ..., is found. The set {p, h} must have an r-admissibility of w.

The partitions p and h can be represented as follows:

p = p1 · p2 · ... · pk,   h = h1 · h2 · ... · hl,   where k = ⌈log2 b⌉, l = ⌈log2 a⌉.

The encoding of the remaining input and state variables can be obtained from the following rules:

p1 · p2 · ... · pk · p′ = p(0),   h1 · h2 · ... · hl · h′ = h(0),

where p′ and h′ represent the partitions induced by those variables. After all the variables are encoded, the process may be considered as a decomposition of the memory block into two blocks: a combinational address modifier and a smaller memory block. The decomposition is computed for the partitions

P(A) = p · h = p1 · p2 · ... · pk · h1 · h2 · ... · hl,   P(B) = p′ · h′.

The partitions p1, ..., pk and h1, ..., hl which were chosen for the state and input symbol encoding, and the partitions Pi compatible with them, correspond to those address lines which are driven by a single variable, either a state variable q or an external variable x, i.e.

a1 = q1, ..., ak = qk,   ak+1 = x1, ..., ak+l = xl.
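The sketch below (our own illustration, with an invented two-block state partition and transition table, handling the state-partition case) evaluates the encoding-oriented admissibility formula r = ⌈log2 a⌉ + ⌈log2 max |δ(Bi)|⌉ for a candidate partition of the state set:

from math import ceil, log2

def encoding_r_admissibility(blocks, delta, inputs):
    """r = ceil(log2 a) + ceil(log2 max |delta(Bi)|) for a state partition
    given as a list of blocks; delta maps (input, state) -> next state."""
    a = len(blocks)
    worst = max(len({delta[(v, s)] for v in inputs for s in block})
                for block in blocks)
    return ceil(log2(a)) + ceil(log2(worst))

# Invented 4-state, 2-input FSM and a candidate two-block state partition:
delta = {("v1", "s1"): "s2", ("v2", "s1"): "s3",
         ("v1", "s2"): "s2", ("v2", "s2"): "s4",
         ("v1", "s3"): "s1", ("v2", "s3"): "s4",
         ("v1", "s4"): "s1", ("v2", "s4"): "s2"}
p = [{"s1", "s2"}, {"s3", "s4"}]
print(encoding_r_admissibility(p, delta, inputs=("v1", "v2")))   # prints 3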
An appropriately chosen decomposition strategy may allow the required memory size to be reduced at the cost of additional logic cells for the address modifier implementation. This makes possible the implementation of large FSMs that need more memory than is available, by making use of the embedded memory blocks and additional programmable logic.

Example 5. FSM implementation with the concept of an address modifier. Let us consider the FSM described in Table 2a. This FSM can be implemented using a ROM with five addressing bits, which requires a memory of 32 words. In order to implement this FSM in a ROM with four addressing bits, an address modifier is required. Let us implement the given FSM in the structure shown in Fig. 5 with three free variables (a = 3) and one output variable from the address modifier (c = 1). To find the appropriate state/input encoding and partitioning, the FSM's state transition table (with the next states numbered) is divided into eight sub-tables (encoded by the free variables), each of them having no more than two different next states, so that these can be encoded with one variable, the address modifier output variable (to achieve this, rows s3 and s4 were exchanged). Next, the appropriate state and input encoding is introduced (Table 2b). The partition-based description of this process is given below. Let A = {x1, x2, q1} and B = {q2, q3}; then

P(A) | PF = ((1)(6); (2); (3,7)(4); (5)(8); (9,13); (10)(14); (11)(15); (12)(16)),
P(B) = (1,2,3,13,14; 4,5,15,16; 6,7; 8,9,10,11,12).

Following the serial functional decomposition method, the partition PG has to be computed. Detailed descriptions of the process can be found in [12]. The decomposition cannot be constructed without adding to the set B the variable x1, which separates symbol 3 and allows constructing a partition PG that satisfies the decomposition condition:

B′ = {x1, q2, q3},
Fig. 6. Implementation of the FSM from Table 2b using an address modifier: block G computes g from x1, q2 and q3; the ROM block H, addressed by x1, x2, q1 and g, produces the state variables q1, q2, q3, which are fed back through the register.
P(B′) = (1,2,3; 13,14; 4,5,15; 16; 6; 7; 8,9,10,11; 12).

Finally, the partition PG is as follows:

PG = (1,2,3,7,8,9,10,11,12; 4,5,6,13,14,15,16).

This allows the truth table of the address modifier to be computed. The structure of the constructed implementation of the given FSM is shown in Fig. 6. Such an implementation requires a memory of 16 words and additional logic to implement the address modifier.
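To make the construction concrete, here is a short Python sketch of the final step (our own, operating on an invented toy ROM, not the FSM of Table 2): given the original ROM contents and a chosen address-modifier function g over the B-variables, it builds the smaller ROM addressed by the A-variables together with g, and checks that no two original addresses collide on different contents.

def decompose_rom(rom, a_bits, g):
    """rom: dict mapping full address bit-tuples -> word;
    a_bits: indices of address bits fed directly to the smaller ROM;
    g: address-modifier function over the remaining (B) bits.
    Returns the smaller ROM keyed by (A-bits, g-value), or raises if the
    chosen g does not yield a valid serial decomposition."""
    b_bits = [i for i in range(len(next(iter(rom)))) if i not in a_bits]
    small = {}
    for addr, word in rom.items():
        key = (tuple(addr[i] for i in a_bits), g(tuple(addr[i] for i in b_bits)))
        if small.setdefault(key, word) != word:
            raise ValueError("g does not satisfy the decomposition condition")
    return small

# Toy 3-address-bit ROM in which bits 1 and 2 only matter through their XOR:
rom = {(a0, a1, a2): (a0, a1 ^ a2) for a0 in (0, 1) for a1 in (0, 1) for a2 in (0, 1)}
small = decompose_rom(rom, a_bits=[0], g=lambda b: b[0] ^ b[1])
print(len(rom), "->", len(small), "words")   # 8 -> 4 words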
4. Experimental results

The proposed method was applied to implement several examples from a standard benchmark set in FLEX10K10 devices using the Altera MAX+PlusII system. Table 3 presents a comparison of different FSM implementation techniques. In the column ROM implementation, the number of bits required to implement a given FSM using ROM is given. The FLEX10K10 device is equipped with only three EAB memory blocks, each consisting of 2048 bits; most of the presented FSM examples cannot be implemented in this device, because their implementation requires much more memory than is available. In the column FF implementation, the number of logic cells required to implement the given FSM in the "traditional" way using flip-flops is given. For this kind of implementation the FSMs have been described as state machines using the Altera Hardware Description Language (AHDL).
Table 3
Comparison of FSM implementation results of standard benchmarks in FPGA architecture (EPF10K10LC84-3 device)

Benchmark    ROM implementation    FF implementation    AM implementation
             #bits                 #LCs                 #LCs      #bits
bbtas        160                   10                   7         80
beecount     448                   32                   14        112
d14          512                   60                   21        256
mc           224                   14                   2         56
lion9        320                   24                   1         80
train11      320                   25                   15        8
bbsse        22528 (a)             52                   3         5632
cse          22528 (a)             92                   2         5632
ex4          13312 (a)             28                   2         3328
mark1        10240 (a)             40                   2         5120
s1           24576 (a)             137                  96        5632
sse          22528 (a)             52                   3         5632
tbk          16384 (a)             759 (b)              333       4093
s389         22528 (a)             64                   9         5632
Total        156608                1389                 510       41293
%            100%                  100%                 36.7%     26.4%

(a) Implementation not possible (not enough memory resources).
(b) Implementation not possible (not enough CLB resources).
In the column AM implementation, the results of implementing the given FSM using the concept of an address modifier are presented. In this approach, the address modifier was implemented using logic cell resources and the ROM was implemented in EAB blocks (Fig. 5). The number of logic cells and the number of memory bits are given in the table. It can easily be noticed that the application of decomposition improves the quality of both the ROM and the flip-flop implementations. The application of the address modifier concept allows implementing the FSMs in such a way that
only about 37% of the logic cell resources required by the flip-flop implementation and about 27% of the memory resources required by the ROM implementation are used. This makes it possible to implement all the presented FSMs using the available memory, with the additional part (the address modifier) implemented in CLBs.
In Table 4, the results of implementing several "real life" FSMs are presented. The following examples were used in the experiments:

• DESaut – the state machine used in a DES algorithm implementation,
• 5B6B – the 5B–6B coder,
• count4 – a 4-bit counter with COUNT UP, COUNT DOWN, HOLD, CLEAR and LOAD.

Table 4
Comparison of FSM implementation results in FPGA architecture (EPF10K10LC84-3 device)

Example   FF_MAX+PlusII              ROM                        AM_ROM
          LCs/Bits    Speed [MHz]    LCs/Bits    Speed [MHz]    LCs/Bits    Speed [MHz]
DESaut    46/0        41.1           8/1792      47.8           7/896       47.1
5B6B      93/0        48.7           6/448       48.0           - (a)       - (a)
Count4    72/0        44.2           16/16384    - (c)          12/1024     39.5
          18/0 (b)    86.2 (b)

(a) Decomposition not possible.
(b) FSM described with a special AHDL construction.
(c) Not enough memory bits to implement the project.
Each sequential machine was described by a transition table. The results for each implementation method are presented using the number of logic cells and memory bits required (area of the circuit) and the maximal clock frequency (speed of the circuit). The columns under the FF_MAX+PlusII heading present the results obtained by the Altera MAX+PlusII system for the classical flip-flop implementation of the FSM. The ROM columns provide the results of the ROM implementation; the columns under AM_ROM present the results of the ROM implementation with the use of an address modifier.
Especially interesting is the implementation of the 4-bit counter. Its description with a transition table leads to a strongly non-optimal implementation. On the other hand, its description using a special AHDL construction produces very good results. This is possible because that AHDL construction describes the counter as an increment by a constant, so an optimized adder structure can be used for the implementation. The ROM implementation of this example requires too many memory bits (the size of the required memory block exceeds the available memory), so it cannot be implemented in the given structure. Application of the address modifier concept reduces the necessary memory size, which makes the implementation possible. The performance of the FSMs implemented with the use of the address modifier concept is not significantly degraded.
5. Conclusions

Despite an acknowledged design productivity gap, in which the number of available transistors grows faster than the ability to design them meaningfully, there are methods, such as functional decomposition, that allow complex digital designs to be managed by applying the well-known "divide and conquer" paradigm.
Balanced decomposition has proved to be very useful in the implementation of combinational functions using FPGA-based architectures. However, the results presented in this paper show that functional decomposition can be efficiently and effectively applied beyond the implementation of combinational circuits. Decomposition can be applied to implement large FSMs in an alternative way, using ROM. This kind of implementation requires much fewer logic cells than the traditional flip-flop implementation; therefore, it can be used to implement "non-vital" FSMs of the design, saving logic cell resources for more important sections. However, large FSMs may require too many embedded memory resources. With the concept of the address modifier, memory usage can be significantly reduced. The experimental results shown in this paper demonstrate that the synthesis method based on functional decomposition can help in implementing sequential machines using ROM memory.

Acknowledgement

This work was supported by the Polish State Committee for Scientific Research under grant no. 4 T11D 014 24 for the years 2003–2004.

References

[1] M. Burns, M. Perkowski, L. Jóźwiak, An efficient approach to decomposition of multi-output Boolean functions with large set of bound variables, in: Proceedings of the Euromicro Conference, Vasteras, 1998.
[2] I. Brzozowski, A. Kos, Minimisation of power consumption in digital integrated circuits by reduction of switching activity, in: Proceedings of the Euromicro Conference, Milan, vol. 1, 1999, pp. 376–380.
[3] J.A. Brzozowski, T. Łuba, Decomposition of Boolean Functions Specified by Cubes, Research Report CS-97-01, University of Waterloo, Waterloo, revised October 1998.
[4] S.C. Chang, M. Marek-Sadowska, T.T. Hwang, Technology mapping for TLU FPGAs based on decomposition of binary decision diagrams, IEEE Trans. CAD 15 (10) (1996) 1226–1236.
[5] G. De Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, New York, 1994.
[6] J. Hartmanis, R.E. Stearns, Algebraic Structure Theory of Sequential Machines, Prentice-Hall, 1966.
[7] L. Jóźwiak, A. Chojnacki, Functional decomposition based on information relationship measures extremely effective and efficient for symmetric functions, in: Proceedings of the Euromicro Conference, Milan, vol. 1, 1999, pp. 150–159.
[8] V.N. Kravets, K.A. Sakallah, Constructive library-aware synthesis using symmetries, in: Proceedings of the Design, Automation and Test in Europe Conference, 2000.
[9] T. Łuba, Multi-level logic synthesis based on decomposition, Microprocessors Microsyst. 18 (8) (1994) 429–437.
[10] T. Łuba, K. Gorski, L.B. Wronski, ROM-based finite state machines with PLA address modifiers, in: Proceedings of EURO-DAC, Hamburg, 1992.
[11] T. Łuba, H. Selvaraj, M. Nowicka, A. Krasniewski, Balanced multilevel decomposition and its applications in FPGA-based synthesis, in: G. Saucier, A. Mignotte (Eds.), Logic and Architecture Synthesis, Chapman & Hall, 1995.
[12] T. Łuba, H. Selvaraj, A general approach to Boolean function decomposition and its applications in FPGA-based synthesis, VLSI Design 3 (3–4) (1995) 289–300, Special Issue on Decompositions in VLSI Design.
[13] T. Łuba, M. Nowicka, M. Rawski, Performance-oriented synthesis for LUT-based FPGAs, in: Proceedings of Mixed Design of Integrated Circuits and Systems, Lodz, 1996, pp. 96–101.
[14] T. Łuba, C. Moraga, S. Yanushkevich, M. Opoka, V. Shmerko, Evolutionary multilevel network synthesis in given design style, in: Proceedings of the IEEE 30th International Symposium on Multiple-Valued Logic, 2000, pp. 253–258.
[15] J. Qiao, M. Ikeda, K. Asada, Optimum functional decomposition for LUT-based FPGA synthesis, in: Proceedings of the FPL'2000 Conference, Villach, 2000, pp. 555–564.
[16] M. Rawski, L. Jóźwiak, M. Nowicka, T. Łuba, Non-disjoint decomposition of Boolean functions and its application in FPGA-oriented technology mapping, in: Proceedings of the EUROMICRO'97 Conference, IEEE Computer Society Press, 1997, pp. 24–30.
[17] T. Ross, M. Noviskey, T. Taylor, D. Gadd, Pattern Theory: An Engineering Paradigm for Algorithm Design, Final Technical Report, Wright Laboratories, WL/AART/WPAFB, 1991.
[18] B. Zupan, M. Bohanec, Experimental Evaluation of Three Partition Selection Criteria for Decision Table Decomposition, Research Report, Department of Intelligent Systems, Josef Stefan Institute, Ljubljana, 1966.
[19] B. Zupan, M. Bohanec, Learning Concept Hierarchies from Examples by Functional Decomposition, Research Report, Department of Intelligent Systems, Josef Stefan Institute, Ljubljana, 1966.
Mariusz Rawski received the M.Sc. degree in electronics engineering from the Warsaw University of Technology, Poland, in 1995. He received his Ph.D. degree from the same University in 1990. Currently he is an Assistant Professor in the Department of Electronics and Information Technology at the Warsaw University of Technology. His research interests include logic synthesis of digital circuits, CAD tools for logic synthesis and optimization, and compilers for Programmable Logic Devices. Dr. Rawski has authored numerous publications in the area of logic synthesis. Within the European TEMPUS program he made visits to TIMA-CMP, France, and Eindhoven University of Technology, the Netherlands.

Henry Selvaraj earned his Master's and Ph.D. degrees from the Warsaw University of Technology in 1986 and 1994, respectively. During 1994–1998 he was a member of the Faculty of Computing and Information Technology, Monash University, Melbourne, Australia. Since 1998 he has been a faculty member in the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, USA. His research interests are logic synthesis, programmable logic devices, digital signal processing and systems engineering. He has published more than 80 scientific papers in refereed journals and proceedings.

Tadeusz Łuba received the M.S. degree in electronics engineering from the Warsaw University of Technology, Poland, in 1971. He received his Ph.D. and D.Sc. degrees from the same University in 1980 and 1988, respectively. In 1972 he joined the Institute of Telecommunications of the Warsaw University of Technology, where he has been a Professor since 1993. Currently he is the Head of the Telecommunications Fundamentals Department at the Institute of Telecommunications at WUT. His research interests include switching theory, CAD tools for logic synthesis and optimization, compilers for Programmable Logic Devices, and logic synthesis and testing of digital circuits. Prof. Łuba has authored numerous publications in the area of logic synthesis. He is a member of the Programme Committees of several international and national conferences and a Board of Directors member of EUROMICRO.