Author’s Accepted Manuscript Synthesis of Dual Mode Logic Lior Moyal, Itamar Levi, Adam Teman, Alexander Fish
www.elsevier.com/locate/vlsi
PII: S0167-9260(16)30036-0
DOI: http://dx.doi.org/10.1016/j.vlsi.2016.07.004
Reference: VLSI1229
To appear in: Integration, the VLSI Journal
Received date: 16 September 2015
Revised date: 7 June 2016
Accepted date: 28 July 2016
Cite this article as: Lior Moyal, Itamar Levi, Adam Teman and Alexander Fish, Synthesis of Dual Mode Logic, Integration, the VLSI Journal, http://dx.doi.org/10.1016/j.vlsi.2016.07.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Synthesis of Dual Mode Logic Lior Moyal, Itamar Levi, Student Member, IEEE, Adam Teman, Member, IEEE, and Alexander Fish, Member, IEEE
Abstract—In recent years, the major focus of VLSI design has shifted from high speed to low power consumption. While standard CMOS-based digital design provides substantial flexibility during pre-silicon design phases, the characteristics of the gates are set by fabrication variations and environmental conditions and cannot easily be changed at runtime. The recently proposed Dual Mode Logic (DML) family provides this runtime capability by introducing two configurable operating modes, static and dynamic, that enable fine-grained control of the power-performance tradeoff of a logic path. However, the introduction of a new topology requires the development of both a design methodology and techniques for integration into a robust design automation flow. Standard synthesis tools do not support dynamic gates, and in particular, dual-characteristic gates. Therefore, until now, DML has been limited to small, custom-made blocks and components. In this paper, we present a novel approach for the integration of DML into standard electronic design automation tools, as part of the standard digital design flow. This approach and the accompanying design methodology enable DML to be used in larger designs, such as state-of-the-art, high-speed and/or low-power SoCs. We demonstrate how the proposed approach exploits DML properties to reduce power consumption while simultaneously improving the operating frequency of a number of test designs.
I. INTRODUCTION
For many generations during the evolution of integrated circuits (ICs), the quest for high-frequency operation was at the forefront of system design. Many different approaches and techniques to accelerate the critical path were introduced and experimented with, including several digital logic families. One of the fastest and most popular topologies was the dynamic Domino Logic family, which was carefully integrated into datapaths and other critical elements of state-of-the-art, high-performance systems. However, as technology scaling continued below one micron and into the deep-nanoscale era, speed gave way to other factors, such as power and reliability, as the primary focus of VLSI design. In the past decade, static CMOS digital logic has almost entirely displaced alternative logic families, becoming virtually the only topology used in digital designs and trading off speed for simple integration, robustness, and low power consumption.
Manuscript received September 16, 2015. Manuscript revised January 26, 2016. L. Moyal is with the VLSI Systems Center, Ben-Gurion University of the Negev, P.O. Box 653, Beer Sheva, 84105. (phone: 972-8-647-7157; fax: 9728-6477620; e-mail:
[email protected]). I. Levi, A. Teman and A. Fish are with the Emerging Nanoscaled Integrated Circuits (EnICS) Labs of the Faculty of Engineering, Bar-Ilan University, Ramat Gan (e-mail:
[email protected],
[email protected],
[email protected]). A. Teman is also with the Telecommunications Circuits Laboratory (TCL) of the Institute of Electrical Engineering, EPFL, Lausanne, VD, 1015 Switzerland.
This trend of exclusively static CMOS digital design has spanned several nanoscale technology processes. However, the increasing drive for power savings has led to continuous efforts to scale the supply voltage, which is often considered the most efficient way to reduce both static and dynamic power consumption. But as the supply voltage drops, performance degrades exponentially, which is perhaps even more problematic. To address these modern challenges, Dual Mode Logic (DML) was recently proposed as an alternative to traditional CMOS [1], [2]. This logic family provides the ability to change the performance/power characteristics of each individual gate by controlling its mode of operation. In dynamic mode, a DML gate provides a speed advantage over CMOS at the expense of increased dynamic power, and in static mode, speed is relinquished for low power consumption. By switching between these two options at runtime, a system can adapt to process, data-dependency, and environmental variations, as well as to the current power-performance requirements of the application.
However, the introduction of a new logic family does not come free of charge. Virtually all steps of the standard digital design flow (SDDF) assume that all digital components are static CMOS or compliant with the features of static CMOS, and, based on this assumption, highly automated electronic design automation (EDA) tools are able to place millions of gates on a single die and quickly analyze their performance. Accordingly, any new logic family must propose solutions for integration with existing tools and design flows to enable smooth implementation within existing processes. Logic synthesis, the process of converting a design described in a high-level hardware description language (HDL) into a gate-level netlist (GTL), is one of the primary components of the SDDF.
Current synthesizers, which map behavioral register-transfer level (RTL) code to specific standard cells, only support CMOS-compliant gates, featuring static output levels driven by low-resistance devices, high-resistance capacitive inputs, and asynchronous combinational logic. Dynamic logic solutions, such as DML, have different characteristics and cannot be straightforwardly synthesized with current tools. Several past attempts have proposed methods to integrate dynamic logic families into the SDDF [3]–[7]. Yee et al. [3] proposed a method to synthesize dynamic logic that, along with the studies in [4] and [5], describes a CMOS-based solution with complex timing races that must be addressed. Chappel et al. [6] introduced a system-level solution for integrating Domino Logic into the SDDF, followed by a particular solution for synthesis proposed by Parmar [7]. However, none of these approaches considers logic, such as DML, that can function with several performance characteristics at the same operating corner, nor do they take into account any of the
constraints required for correct DML functionality.
In this paper, we propose a novel approach for synthesizing DML as a first and fundamental step toward full SDDF integration. This approach works on top of standard synthesis tools and optimizes the design while exploiting the flexibility of DML and meeting all of the constraints that DML introduces. By following this methodology, DML can reap the benefits of the well-studied and highly optimized algorithms already implemented in standard synthesis tools, while allowing further DML-specific optimization in a smooth integration process. To demonstrate the approach, more than 20 large benchmark circuits were synthesized in a 40 nm CMOS process and mapped to a DML library. The results show an improvement of up to 17% in timing, with as much as 15% power reduction for large designs.
[Fig. 1. Topologies of basic (unfooted) DML gates: (a) Type-A topology; (b) Type-B topology.]
Contributions: The contributions of this paper can be summarized as follows:
• A fully automated design flow for DML is presented for the first time, enabling seamless integration into the SDDF and rendering this new logic family applicable to modern large-scale designs.
• A methodology for using DML to reduce the delay of combinational logic is introduced, in order to improve the achievable frequency of a given design where this is required.
• The proposed approach is further shown to be effective in using DML to reduce the power consumption of high-speed components without any additional changes to the implementation architecture, when performance can be sacrificed.¹
The rest of this paper is organized as follows: Section II provides a brief overview of the DML logic family. Section III presents the constraints required for conducting synthesis with DML libraries and the proposed synthesis methodology. Section IV presents the implementation results obtained from synthesizing logic designs into DML, comparing timing, area and power results to those obtained by standard synthesis to CMOS. Finally, conclusions are provided in Section V.
¹ Note that a designer can control and switch between the high-performance and low-energy modes on the fly.
[Fig. 2. Topologies of footed DML gates: (a) Type-A footed topology; (b) Type-B footed topology.]
II. DUAL MODE LOGIC
The Dual Mode Logic family was originally proposed in [8]–[10] as a combination of static CMOS and dynamic logic, formed by adding a precharge or pre-discharge transistor to the output of a static CMOS gate. These two topologies of a basic DML gate, referred to as Type-A and Type-B, are illustrated in Fig. 1. Switching between the two modes of a DML gate is done simply by gating or enabling the clk signal. The slower, power-saving static mode is set by gating the clk signal, thereby cutting off the precharge (pre-discharge) device and essentially transforming the DML gate into its static CMOS counterpart. By enabling the clk signal, the gate switches into its high-speed dynamic mode, precharging (pre-discharging) the output during one phase of the clock and evaluating the result during the other. Since the final state of the output is determined exclusively by the pull-down network (PDN) in a Type-A gate and by the pull-up network (PUN) in a Type-B gate, this network is referred to as the evaluation network. The other network is only relevant during static operation, providing the complementary logic state, and is therefore referred to as the complementary network.
An extension of the basic DML topology, known as footed DML, is shown for both Type-A and Type-B in Fig. 2. In this configuration, an additional transistor is connected in series with the evaluation network (the PDN for Type-A or the PUN for Type-B). This device, referred to as the footer, ensures a full level at the output by eliminating short-circuit currents during the precharge phase, while decreasing the time required for this phase by eliminating the ripple effect of the precharge level advancing through cascaded gates. However, footed DML gates have a larger silicon footprint and longer propagation delays, and therefore should be used sparingly.
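As a minimal logic-level sketch (not a circuit model), the mode and type semantics described above can be captured in Python. The class and function names are hypothetical; timing, charge sharing and device sizing are ignored, and the evaluation-phase output assumes a correct precharge and no glitching.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DmlGate:
    """Logic-level sketch of a DML gate; ignores timing and analog effects."""
    func: Callable[..., int]  # logic function of the underlying static CMOS gate
    gate_type: str            # "A" precharges the output to 1, "B" pre-discharges it to 0
    clk_enabled: bool         # True: dynamic mode; False (clk gated): static mode

    @property
    def evaluation_network(self) -> str:
        # Type-A outputs are resolved by the PDN, Type-B outputs by the PUN
        return "PDN" if self.gate_type == "A" else "PUN"

    def output(self, inputs: Sequence[int], precharge_phase: bool) -> int:
        if self.clk_enabled and precharge_phase:
            # Dynamic mode, PC phase: the clocked device forces the output level
            return 1 if self.gate_type == "A" else 0
        # Evaluation phase, or static mode: the gate resolves its logic function
        return self.func(*inputs)


def nand2(a: int, b: int) -> int:
    return 1 - (a & b)


dynamic_nand = DmlGate(nand2, "A", clk_enabled=True)
static_nand = DmlGate(nand2, "A", clk_enabled=False)
print(dynamic_nand.output([1, 1], precharge_phase=True))   # precharged level
print(dynamic_nand.output([1, 1], precharge_phase=False))  # evaluated by the PDN
print(static_nand.output([1, 1], precharge_phase=True))    # clk gated: plain CMOS behavior
```

In this abstraction, gating clk simply removes the forced precharge level, which mirrors the statement that a static-mode DML gate behaves as its static CMOS counterpart.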
The performance benefit of the DML topology over its CMOS counterpart is clear: similar to domino logic, DML gates in dynamic mode provide one pre-evaluated (“zero propagation delay”) output state, and therefore can be sized for fast evaluation of the other state. In contrast, standard CMOS requires optimization of both the PUN and the PDN for timing, with the upsizing of one network contributing output capacitance, or “self-loading”, to the other. Because DML static mode is, by definition, a low-speed mode, the
complementary network can be sized minimally for a smaller silicon footprint and lower power consumption. In this way, a DML gate operated in dynamic mode provides a speed benefit over CMOS, while in static mode it has lower switching capacitance and lower leakage than an equivalent CMOS gate. Note that in the current discussion, only combinational logic is realized with DML. In other words, the use of DML assumes standard RTL design with the inference of CMOS registers that separate paths of combinational logic. This approach enables the seamless integration of DML and static CMOS combinational blocks within a single system, as shown in Fig. 3, such that if the proposed methodology does not yield an efficient solution for implementing a given path with DML gates, a design-time decision can be made to revert to static CMOS for that path.
[Fig. 3. Combinational block between two registers with DML and CMOS paths (CMOS DFFs bounding a CMOS combinational block and a clocked DML combinational block with its DML complementary networks).]
III. PROPOSED APPROACH
[Fig. 4. Two cascaded Type-A inverters and resulting waveforms. (a) Two cascaded Type-A DML inverters (in → out1 → out2). (b) Waveforms versus time (s, ×10⁻¹⁰) contrasting the dynamic logic cascading problem with DML cascading replenishment. The voltage droop at OUT2 causes a degradation in performance and increased power consumption, and therefore should be avoided.]
The synthesis of RTL to a standard CMOS library is a well-known procedure, and the constraints required to produce a logically equivalent netlist that meets the design requirements are implemented in various commercially available EDA tools. The most common form of synthesis optimizes synchronous paths between two sequential elements (such as those illustrated in Fig. 3), ensuring that both max-delay (setup) and min-delay (hold) timing requirements are met. DML uses a similar approach, employing static CMOS sequential elements and requiring that the paths between these sequential elements meet standard setup and hold constraints. However, similar to dynamic logic, the efficient operation of DML in dynamic mode relies on the precharge (PC) phase to drive the output to a predetermined level that should change at most once during the evaluation phase, and it therefore requires additional mapping constraints. To demonstrate the cascading problem that characterizes standard dynamic logic, which is also one of the hazards that can result from inefficient DML mapping, Fig. 4 shows two cascaded footed Type-A inverters. The complementary network PMOS devices are circled to denote that they exist only in DML gates, as opposed to standard PDN-based dynamic logic. During the PC phase, both output nodes are charged to ‘1’ for both DML and dynamic logic, as this is the precharge value for the Type-A topology. Therefore, at the start of the evaluation phase, when the clk signal rises, both gates have the high supply voltage (VDD) at their input and their outputs start to discharge, as shown in the waveforms of Fig. 4(b). For the dynamic gate topology, this is a functional hazard,
as the second output (out2) will stop discharging only when its input (out1) drops below the transistor threshold voltage (VTH), and since traditional dynamic logic gates have no PUN in the evaluation stage, the output node cannot be charged back to VDD. This functional risk does not occur in DML, since the complementary PUN of a DML gate will charge the output back to VDD. Such a charging process, however, results in a longer transition, because the PUN is minimally sized. Moreover, this transition consumes additional energy, and therefore should be avoided.
To eliminate performance- and power-degrading phenomena such as the one in the above example, two types of path connections must be considered when performing DML synthesis: 1) DML gates driven by other DML gates, and 2) DML gates driven by primary inputs (i.e., CMOS register outputs and macro port inputs). The following subsection defines three constraints that address these scenarios: correct precharge (CPC), footed gates (FG), and the single transition requirement (STR). The proposed synthesis methodology is presented thereafter.
[Fig. 5. CMOS-based DML gates in the PC phase. (a) NAND2 DML gate driven by an input vector of ‘10’. (b) CMOS-based Type-A DML building blocks with correct precharge. (c) Type-A NAND gate with correct precharge.]
[Fig. 6. Single transition hazard example: a logic block with inputs In[2:0], internal nodes W, X, Y and output Z, built from footed Type-B first-stage NANDs and Type-A gates, with waveforms of clk, In[2:0] (011 → 010), W, X, Y and Z showing a glitch on Z.]
A. Synthesis Constraints
1) Correct Precharge (CPC): A well-researched limitation of dynamic logic design is that during the PC phase, the output of the previous stage may partially or completely discharge the precharge level prior to evaluation. NORA, or n-p dynamic logic [11], fixes this problem by always cascading oppositely precharged gates, thereby always cutting off the evaluation network during the PC phase. A similar approach is adopted when cascading DML gates that are driven entirely by preceding DML stages, thereby limiting the latency and power overhead of incorrect precharge, as mentioned in Section II. In order to ensure that a DML gate starts the evaluation clock phase with a strong output level, the evaluation network must be cut off during the PC phase. If the previous stages are all DML gates, then their state during the PC phase is predetermined and known: Type-A gates precharge the output to ‘1’ (VDD) and Type-B gates pre-discharge the output to ‘0’. Determining the precharge input vector of a given gate therefore determines the gate type (Type-A or Type-B): the network that is cut off by this input vector is chosen as the evaluation network, and the network that is enabled is chosen as the precharge network. A simple example of this decision is given in Fig. 5(a), showing a DML NAND gate that is driven by one Type-A and one Type-B DML gate. During the PC phase, the Type-A gate provides a ‘1’ and the Type-B gate
provides a ‘0’ to the cascaded NAND gate. With this precharge state (‘10’ or ‘01’), the structure of the NAND dictates that its PUN is conducting and its PDN is cut off. Therefore, we choose the PUN to be the precharge network and the PDN to be the evaluation network, which is the definition of a Type-A DML gate, as shown in Figs. 5(b) and 5(c). A simple rule of thumb is that the gate type should match the logic value the gate produces for its precharge input vector, with Type-A matching a ‘1’ and Type-B matching a ‘0’. For the NAND example given above, a ‘10’ input to a NAND resolves to ‘1’, matching a Type-A gate. A ‘11’ input vector, on the other hand, would resolve to ‘0’, leading to the choice of a Type-B DML NAND gate.
2) Footed Gates (FG): By adhering to the previous constraint, a strong precharge level is ensured when cascading DML gates. However, one of the basic assumptions of digital design with DML is that only combinational elements are dual-mode, while sequential elements are realized with standard CMOS gates. Unlike DML-driven inputs, the state of CMOS-driven inputs during the PC phase is unknown, as is the state of macro input ports. Therefore, the evaluation network may conduct, depleting the precharged output level before the start of evaluation. The solution to this problem is simply to use footed gates to implement the first DML stage following primary inputs (CMOS logic or input ports). This ensures that the evaluation network is cut off during the precharge phase, at the expense of an extra series-connected device.
3) Single Transition Requirement (STR): According to the CPC constraint, the DML gate type is chosen to match the input vector determined for the PC phase. This resolves standard cascading issues, such as the one demonstrated in Fig. 4, but does not take signal glitching into account.
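The CPC type-selection rule of 1) reduces to evaluating the gate's logic function on its precharge input vector. The following is a minimal Python sketch, assuming gates are modeled as plain Boolean functions on 0/1 values; all function names are hypothetical.

```python
from typing import Callable, List, Sequence

# Output level of each DML type during the PC phase
PRECHARGE_VALUE = {"A": 1, "B": 0}


def nand(*inputs: int) -> int:
    return 0 if all(inputs) else 1


def inv(a: int) -> int:
    return 1 - a


def precharge_inputs(driver_types: Sequence[str]) -> List[int]:
    """Input vector seen during the PC phase when every input is driven by a DML gate."""
    return [PRECHARGE_VALUE[t] for t in driver_types]


def select_dml_type(func: Callable[..., int], pc_vector: Sequence[int]) -> str:
    """CPC rule: a gate whose precharge vector yields 1 (PUN on, PDN off) is Type-A;
    one whose precharge vector yields 0 (PDN on, PUN off) is Type-B."""
    return "A" if func(*pc_vector) == 1 else "B"


# NAND driven by one Type-A and one Type-B gate (Fig. 5(a)): '10' -> Type-A
print(select_dml_type(nand, precharge_inputs(["A", "B"])))
# NAND whose inputs both precharge to '1': '11' -> Type-B
print(select_dml_type(nand, precharge_inputs(["A", "A"])))
# Inverter driven by a Type-A gate must be Type-B
print(select_dml_type(inv, precharge_inputs(["A"])))
```

The last call shows why the Type-A/Type-A inverter cascade of Fig. 4 is problematic: under CPC, an inverter driven by a Type-A gate should be Type-B, so that its evaluation network is cut off during precharge.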
Due to the inherently unbalanced delays between the PDN and PUN of DML gates [1], there is a high probability of glitching on internal combinational nodes prior to stabilization at their final state. If such a node temporarily toggles to the wrong value and has to be replenished, this results not only in wasted power, but also in a significant delay penalty if the node has to be charged or discharged through the non-optimized network of its gate type. If the output of a DML gate switches more than once, the main benefit of using dynamic DML is lost.
An example of such a hazard is illustrated in Fig. 6. In the example, the first-stage NAND gates were chosen to be Type-B, and according to the FG constraint, footed gates were used. Thereafter, both the inverter and the output NAND were determined to be Type-A gates, according to the CPC constraint, as described above. Now assume that the 3-bit input, in[2:0], transitions from 011 to 010, as illustrated in the accompanying waveforms of Fig. 6. The shorter path to internal node W transitions before the path to internal node Y, causing an unwanted temporary glitch to ‘0’ on the output node Z. In this example, the selected configuration of DML gates (starting with the choice of Type-B NAND gates at the first stage) leads to an STR violation, and therefore this configuration should be disqualified. However, most circuits have at least one configuration that does not violate the STR constraint. The following procedure provides an algorithmic means for testing adherence to this constraint, and it should be applied to every DML gate in the design.
To carry out the STR check, the truth table of each logic gate is represented with a state diagram, with each input vector defining a state (i.e., 2^N states for a gate with a fan-in of N). Each state is assigned an output value, which is the gate output for the corresponding input vector, and the transitions between states represent the transition of a single input. For example, the state diagram of a 2-input NAND gate is shown in Fig. 7. To test the STR constraint, the initial state is the input vector due to precharge; thereafter, the possible transitions are followed to see whether a given order of input arrivals would cause an output glitch.
[Fig. 7. NAND2 single transition test: state diagram of a 2-input NAND gate with states 00, 01, 10 and 11.]
[Fig. 8. Example of first-stage gate selection and single transition constraint checking. (a) Example logic block, including precharge values and selected gate types (footed Type-A first-stage NANDs, a Type-B inverter and a Type-A output NAND). (b) State transition diagram, showing glitch-free operation.]
While the example of Fig. 6 demonstrated an STR hazard,
the following example demonstrates a successful configuration. In this case, the first-stage gates are implemented with footed Type-A NANDs, leading to the choice of a Type-B inverter and a Type-A output NAND, according to the CPC constraint, as illustrated in Fig. 8(a). The precharge input vector of the output NAND, in this case, is [in1,in2]=10, which is therefore the initial state in the state diagram of Fig. 7. The only applicable transition that could potentially cause a glitch is the input arrival order in1→in2, since the path to in2 is longer than the path to in1. Therefore, the state diagram is followed from state 10 through 00 (in1 changes from ‘1’ to ‘0’) to 01 (in2 changes from ‘0’ to ‘1’). Following the state diagram shows that the output of the NAND gate remains at ‘1’ while traversing these states, indicating that the STR constraint is met. In this case, all three constraints (CPC, FG, and STR) are met, and therefore this configuration is valid for efficient DML implementation. Choosing one Type-A and one Type-B gate for the first stage of NANDs would also have provided valid configurations. A methodology for implementing a design with DML and validating these constraints is presented next.
B. Proposed Methodology
In order for DML to be compatible with the standard digital design flow, a methodology must be created to generate a DML netlist, starting from a high-level HDL description. This methodology should take into consideration all the requirements described in the previous subsection and convert a CMOS netlist to a DML netlist. The gates in the netlist do not change; the only purpose of this process is to choose their DML type (Type-A or Type-B) and whether each gate is footed. The DML standard cells in use must be fully characterized, including all energy components such as the energy of the clocked elements. Furthermore, suitable setup and hold delay characterization is needed to verify both timing and the DML constraints (CPC, FG, and STR). The proposed methodology consists of the following stages, as illustrated in Fig. 9:
1) Determine the first-stage gates. These are the gates whose inputs originate from primary inputs (sequential elements and macro inputs). These gates are implemented with footed Type-A or Type-B DML gates, according to the FG constraint. The first-stage gates can be easily found by following the outputs of the sequential elements and the primary inputs of the design.
2) Choose a DML type for each of the first-stage gates. For N first-stage gates, there are 2^N possible choices of their DML types.
3) Derive the DML type of the internal gates by advancing through the netlist, stage by stage. The DML type of each gate is determined according to its logic function and its inputs during the PC phase, to meet the previously described CPC constraint.
4) Calculate arrival times. To make sure all DML constraints are met, the timing of each node in the netlist is calculated with standard static timing analysis (STA) methods. Capacitance and timing data are taken from DML timing libraries characterized for the dynamic mode of operation.
5) Check glitching requirements. Every internal gate must be checked to ensure that its output cannot make more than one transition. This is done by advancing through the gate state diagram, as previously described for the STR constraint.
6) If all constraints are met, a new GTL netlist containing DML gates is created. Otherwise, the current configuration is discarded, a different choice of first-stage gate types is made, and the flow returns to stage 3.
[Fig. 9. Approach flowchart (blocks: Read Netlist; Find First-stage Gates; Initialize First-stage Gates; Capacitance and Timing Calculation; Determine All Gate Types; Constraints Met? (Yes/No); Write Netlist).]
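The six stages above can be sketched in Python on the small network of Fig. 8(a). This is a hedged illustration, not the authors' Perl implementation: the netlist dictionary, the cell names (`NAND2`, `INV`) and all function names are hypothetical; the input wiring of the first-stage NANDs W and X is our reading of the figure (it does not affect the result, since first-stage types are enumerated); STA is replaced by a unit-delay model; and the 2^N first-stage choices are enumerated exhaustively, without divide-and-conquer pruning.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import permutations, product

# Hypothetical netlist modeled on Fig. 8(a): gate -> (cell, input nets).
# Nets that are not gates ("in0".."in2") are primary inputs (CMOS register outputs).
FUNCS = {"NAND2": lambda a, b: 1 - (a & b), "INV": lambda a: 1 - a}
NETLIST = {
    "W": ("NAND2", ["in0", "in1"]),
    "X": ("NAND2", ["in1", "in2"]),
    "Y": ("INV", ["X"]),
    "Z": ("NAND2", ["W", "Y"]),
}
PRECHARGE = {"A": 1, "B": 0}  # output level of each DML type during the PC phase
UNIT_DELAY = 1.0              # assumption: every gate has the same delay


def topo_order(netlist):
    """Gates ordered so that every gate follows the gates that drive it."""
    deps = {g: [n for n in ins if n in netlist] for g, (_, ins) in netlist.items()}
    return list(TopologicalSorter(deps).static_order())


def first_stage(netlist):
    """Stage 1: gates fed by primary inputs (implemented as footed gates, FG)."""
    return [g for g, (_, ins) in netlist.items() if any(n not in netlist for n in ins)]


def derive_types(netlist, order, first_types):
    """Stage 3 (CPC): type = value the gate computes for its precharge input vector."""
    types = dict(first_types)
    for g in order:
        if g not in types:
            cell, ins = netlist[g]
            vec = [PRECHARGE[types[n]] for n in ins]
            types[g] = "A" if FUNCS[cell](*vec) == 1 else "B"
    return types


def arrival_times(netlist, order):
    """Stage 4: unit-delay stand-in for STA with the dynamic-mode library."""
    at = {}
    for g in order:
        at[g] = max(at.get(n, 0.0) for n in netlist[g][1]) + UNIT_DELAY
    return at


def meets_str(cell, start, arrivals):
    """Stage 5 (STR): walk the state diagram from the precharge vector. Any subset
    of inputs may toggle (each at most once), in any order consistent with the
    arrival times; the output must never change more than once."""
    f, n = FUNCS[cell], len(start)
    for mask in product([False, True], repeat=n):
        toggling = [i for i in range(n) if mask[i]]
        for seq in permutations(toggling):
            if any(arrivals[a] > arrivals[b] for a, b in zip(seq, seq[1:])):
                continue  # this order contradicts the computed arrival times
            state, out, changes = list(start), f(*start), 0
            for i in seq:
                state[i] ^= 1
                new = f(*state)
                changes += int(new != out)
                out = new
            if changes > 1:
                return False
    return True


def run_flow(netlist):
    """Stages 2-6: try all 2^N first-stage type choices and keep the valid ones."""
    order, first = topo_order(netlist), first_stage(netlist)
    valid = []
    for choice in product("AB", repeat=len(first)):
        types = derive_types(netlist, order, dict(zip(first, choice)))
        at = arrival_times(netlist, order)
        if all(
            meets_str(netlist[g][0],
                      [PRECHARGE[types[n]] for n in netlist[g][1]],
                      [at[n] for n in netlist[g][1]])
            for g in order if g not in first
        ):
            valid.append("".join(choice))
    return valid


print(run_flow(NETLIST))
```

Running it prints `['AA', 'AB', 'BA']` (the types of the first-stage NANDs driving W and X), which agrees with the three valid configurations identified for this network; the Type-B/Type-B choice is rejected because the output NAND would glitch, as in Fig. 6. A real flow would recompute arrival times per configuration from the characterized dynamic-mode library, since footed and unfooted cells of each type have different delays.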
Due to the large number of possible first-stage combinations, a divide-and-conquer heuristic is employed to eliminate redundant possibilities, thereby significantly reducing the runtime needed to find a feasible solution. Large logic blocks are divided into smaller components, and the constraints are applied to these sub-blocks. This is demonstrated in the example of Fig. 10, where a logic block is divided into three sub-netlists. The two inner sub-blocks (marked ‘1’ and ‘2’) are equivalent to the network of Fig. 8(a), which has only three valid solutions: AA, AB, and BA, as previously shown. Therefore, all other combinations can be omitted from the constraint checking of the higher-level module (marked ‘3’ in Fig. 10). This reduces the number of possible input combinations for module ‘3’ from 2^4 = 16 to only 3^2 = 9. Similar reductions enable application of the proposed methodology to large designs.
[Fig. 10. Divide and conquer: logic divided into sub-netlists ‘1’, ‘2’ and ‘3’.]
[Fig. 11. Simulation flowchart: RTL is synthesized with the standard cell library, and static timing analysis is performed with the static and dynamic DML .lib files.]
IV. IMPLEMENTATION RESULTS
Previous demonstrations of DML were limited to small logic blocks or to hand-synthesized individual larger components for proof of concept. The proposed algorithm enables automated integration of DML in larger designs. This section demonstrates the application of the proposed approach to several designs of varying size and complexity, in order to test its performance on real-life designs and problematic netlists. The flow was implemented in Perl, and the resulting code will be made available on the lab website.
[Fig. 12. Timing and power comparisons of DML static and dynamic modes for 12 benchmark test circuits. (a) Delay (normalized to static CMOS) per benchmark, for DML dynamic and static timing. (b) Power (normalized to static CMOS) per benchmark, for DML dynamic and static power.]
A. Simulation Methodology
The simulation methodology used to evaluate the proposed approach is illustrated in the block diagram of Fig. 11. Several high-level RTL benchmarks of varying size and complexity were used for this evaluation, including some ISCAS’89 benchmarks. Each benchmark was synthesized using Cadence RTL Compiler (RC) and mapped to a 40 nm CMOS standard cell library to produce a CMOS gate-level (GTL) netlist. Each design was over-constrained in order to achieve the lowest delay (highest frequency) attainable with a standard CMOS library. The resulting GTL netlist was used as the input to the proposed algorithm, which produced a DML GTL netlist. This netlist was loaded back into RC and Synopsys PrimeTime (PT) to perform STA and analyze the results. The results for the two modes of DML operation were compared with those of the standard CMOS implementation before the application of the proposed algorithm. It is important to point out that since the concept of DML is new, standard library characterization flows are not prepared to correctly characterize such gates. Therefore, a specialized characterization flow was developed in collaboration with Dolphin Integration, based on the Dolphin Smash characterization tool, to produce the Liberty timing files (.lib) that enable integration with standard EDA tools. The DML library used in this work was developed in-house and characterized with this tool, following design, verification, and layout in Cadence Virtuoso. In particular, two separate libraries were constructed: a static-mode library and a dynamic-mode library. Proprietary setup and hold timing arcs were defined for each gate and characterized in addition to the standard CMOS-like timing arcs.
By implementing this approach, the presented results account for all aspects of power and performance evaluation, including the contributions of clocking and precharge, which do not occur in combinational CMOS gates. The outputs of this characterization process were verified through comparison with Spectre simulations of selected logic paths.
B. Simulation Results
To evaluate the proposed methodology, each benchmark circuit was first synthesized with the static CMOS libraries, targeting minimum delay (maximum frequency). The power, performance, and area results of this static CMOS implementation were extracted as a reference for comparison with the DML implementations. Thereafter, the static CMOS GTL netlist was used as the input to the proposed algorithm and mapped to the DML libraries, targeting the high-performance dynamic mode. The resulting timing and power characteristics of twelve of the evaluated benchmarks are presented in Fig. 12. Fig. 12(a) clearly displays the expected superiority of dynamic DML over CMOS and the delay penalty of operating in static mode. Similarly, Fig. 12(b) shows the power savings of operating in static DML mode compared to CMOS and the power penalty of operating in dynamic mode. However, this figure also shows that the efficiency of DML strongly depends on the characteristics of the underlying circuit. For example, DML is clearly a bad choice for implementing the 8b-MUX design, as this benchmark is inferior to CMOS even in static mode. Therefore, an additional analysis can provide better insight into the effectiveness of DML in a given design. Such an extended analysis is given in Fig. 13, which plots the achievable performance, power, and area improvements of DML as a function of the length of the logic path in which it is used. To extract this data, all of the logic paths in the implemented benchmarks were categorized according to the number of stages from startpoint to endpoint, and the average delay, energy, and area of the paths in each category were compared with those of their CMOS equivalents².
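The path-level analysis described above amounts to binning paths by stage count and comparing per-bin averages. The minimal Python sketch below assumes each path is available as a hypothetical `PathSample` record holding its stage count and one metric (delay, energy, or area) for both the DML and the equivalent CMOS path. The improvement formula, 1 - mean(DML)/mean(CMOS), is an assumed convention chosen so that positive values mean DML is better; the exact averaging used to produce Fig. 13 is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PathSample:
    """Hypothetical record for one timing path and one metric."""
    stages: int   # number of stages from startpoint to endpoint
    dml: float    # metric for the DML path (e.g., dynamic-mode delay)
    cmos: float   # same metric for the equivalent CMOS path


def improvement_by_depth(paths: list[PathSample]) -> dict[int, float]:
    """Group paths by stage count and compare average DML with average CMOS.

    Returns {stages: improvement} with improvement = 1 - mean(DML) / mean(CMOS),
    so a positive value means DML is smaller (better) than CMOS on average.
    """
    bins: dict[int, list[PathSample]] = defaultdict(list)
    for path in paths:
        bins[path.stages].append(path)

    result: dict[int, float] = {}
    for stages in sorted(bins):
        group = bins[stages]
        dml_avg = mean(p.dml for p in group)
        cmos_avg = mean(p.cmos for p in group)
        result[stages] = 1.0 - dml_avg / cmos_avg
    return result


# Usage with made-up path data (not results from this work).
paths = [
    PathSample(stages=5, dml=0.98, cmos=1.00),
    PathSample(stages=5, dml=1.02, cmos=1.00),
    PathSample(stages=25, dml=4.10, cmos=5.00),
]
by_depth = improvement_by_depth(paths)
```

Comparing averages per bin, rather than averaging per-path ratios, follows the wording that the average delay, energy, and area of each category were compared with their CMOS equivalents; averaging per-path ratios would be an equally plausible alternative.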
These results clearly show that the speed improvement achievable by operating DML in dynamic mode increases monotonically with path depth, reaching as high as 17% for paths of 25 stages. This is expected, as the first stages of all DML paths are realized with footed gates, which are slower than standard DML gates, and the effect of these gates is more significant for shorter paths. When the static mode of DML is utilized, energy efficiency rapidly increases, with the power reduction exceeding 10% for paths with more than 6 stages. This also follows from the inherent penalty of the footed gates, which is significant for short logic paths. The smaller footprint of DML gates also provides an area reduction for paths with more than 6 stages, stabilizing at an average improvement of approximately 10% for paths with 19 or more stages. This reflects the inherent efficiency of DML in constructing gates with smaller capacitance and superior performance. In conclusion, DML is clearly shown to provide an advantage over traditional CMOS in the major aspects of digital design. By automating the integration of DML into the SDDF, along with the design of smart controls that switch between the static and dynamic modes on-the-fly, DML can help address all of the traditional challenges of digital design.

² The evaluation of both the static and dynamic DML modes was applied to the same synthesized netlist, which was targeted at the dynamic mode.

Fig. 13. Timing, power and area improvement of DML compared to CMOS (x-axis: logic depth, 5 to 25 stages; y-axis: improvement, -10% to 20%; series: Timing (DML Dynamic/CMOS), Power (DML Static/CMOS), Area (DML/CMOS)).

V. CONCLUSION
The DML family was proposed to provide advantages over CMOS for both the application and the designer. Having two modes of operation that can be switched on-the-fly allows the application to operate at high frequencies when higher processing power is needed, and to save energy when the performance requirements can be relaxed. While these benefits have previously been demonstrated on several custom-designed blocks, porting DML to a generic design was impractical due to its lack of compatibility with the standard digital design flow. In this paper, we proposed the first methodology for performing DML-compatible logic synthesis to efficiently map an RTL design onto a DML standard cell library. A fully automated DML synthesis flow, compatible with standard tools and library characterization formats, was proposed and applied to numerous benchmark circuits. The resulting circuits were evaluated for performance, power consumption, and area in both the static and dynamic modes of operation, showing average performance improvements as high as 17% for deep logic paths, with the ability to switch to an energy-efficient mode that saves over 10% in energy. This is achieved with a smaller silicon footprint, providing an average area reduction of approximately 10%. The integration of DML into the standard digital design flow will encourage the use of this logic family in today’s IC industry.

REFERENCES
[1] I. Levi and A. Fish, “Dual mode logic - design for energy efficiency and high performance,” IEEE Access, vol. 1, pp. 258–265, 2013.
[2] I. Levi, A. Belenky, and A. Fish, “Logical effort for CMOS-based dual mode logic gates,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 5, pp. 1042–1053, May 2014.
[3] G. Yee and C. Sechen, “Dynamic logic synthesis,” in Proceedings of the IEEE 1997 Custom Integrated Circuits Conference, 1997, pp. 345–348.
[4] A. Pal and A. Mukherjee, “Synthesis of two-level dynamic CMOS circuits,” in Proceedings of the IEEE Computer Society Workshop on VLSI ’99, 1999, pp. 82–92.
[5] D. Samanta, A. Pal, and N. Sinha, “Synthesis of high performance low power dynamic CMOS circuits,” in Proceedings of the 2002 Asia and South Pacific Design Automation Conference, IEEE Computer Society, 2002, p. 99.
[6] B. Chappell, P. Saxena, J. Vendrell, X. Wang, P. Patra, M. Venkateshmurthy, S. Jain, H. Krishnamurthy, S. Hussain, S. Gupta et al., “A system-level solution to domino synthesis with 2 GHz application,” in 2012 IEEE 30th International Conference on Computer Design (ICCD), IEEE Computer Society, 2002, pp. 164–164.
[7] D. M. Parmar, M. Sarma, and D. Samanta, “A novel approach to domino circuit synthesis,” in 20th International Conference on VLSI Design, held jointly with 6th International Conference on Embedded Systems, 2007, pp. 401–406.
[8] I. Levi, O. Bass, A. Kaizerman, A. Belenky, and A. Fish, “High speed dual mode logic carry look ahead adder,” in 2012 IEEE International Symposium on Circuits and Systems (ISCAS), May 2012, pp. 3037–3040.
[9] I. Levi, A. Albeck, A. Fish, and S. Wimer, “A low energy and high performance DM² adder,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 11, pp. 3175–3183, Nov. 2014.
[10] A. Kaizerman, S. Fisher, and A. Fish, “Subthreshold dual mode logic,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 5, pp. 979–983, May 2013.
[11] Y. Ji-Ren, I. Karlsson, and C. Svensson, “A true single-phase-clock dynamic CMOS circuit technique,” IEEE Journal of Solid-State Circuits, vol. 22, no. 5, pp. 899–901, 1987.
Lior Moyal earned his B.Sc. degrees in Electrical and Computer Engineering in 2013 from Ben-Gurion University, and is now pursuing his M.Sc. degree in electrical engineering at Ben-Gurion University. He currently works as a design verification engineer at Apple.
Itamar Levi earned his B.Sc. and M.Sc. degrees in Electrical and Computer Engineering as a part of a direct excellence student track from Ben-Gurion University in 2012 and 2013, respectively. Currently, he is pursuing his Ph.D. degree in electrical engineering at Bar-Ilan University. His current research interests are hardware security and cryptography.
Adam Teman received the B.Sc. degree in Electrical Engineering from Ben-Gurion University, Be’er Sheva, Israel, in 2006. He worked as a Design Engineer at Marvell Semiconductors from 2006 to 2007, with an emphasis on Physical Implementation. He completed his M.Sc. at Ben-Gurion University in 2011 and his Ph.D. degree in 2014 under Prof. Alexander Fish as part of the Low Power Circuits and Systems (LPC&S) lab in Ben-Gurion University’s VLSI Systems Center. From 2014 to 2015, Dr. Teman was a post-doctoral researcher at the Telecommunications Circuits Lab (TCL) at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, under a Swiss Government Excellence Scholarship. In October 2015, Dr. Teman joined the Faculty of Engineering at Bar-Ilan University as a tenure track researcher in the Department of Electrical Engineering and as a partner in the Emerging Nanoscaled Integrated Circuits and Systems (EnICS) Labs research center. Dr. Teman’s research interests include low voltage digital design, energy efficient SRAM, NVM, and eDRAM memory arrays, low power CMOS image sensors, low power design techniques for digital and analog VLSI chips, energy efficient digital system implementation, approximate computing, and significance-driven computing for reliability and power optimization. He has authored over 40 scientific papers and 3 patent applications, and has presented excerpts from his research at a number of international conferences. In 2010-2012, Dr. Teman was honored with the Electrical Engineering Department’s Teaching Excellence recognition at Ben-Gurion University, and in 2011, he was awarded BGU’s Outstanding Project award. Dr. Teman received the Yizhak Ben-Ya’akov HaCohen Prize in 2010, the BGU Rector’s Prize for Outstanding Academic Achievement in 2012, the Wolf Foundation Scholarship for excellence of 2012, and the Intel Prize for Ph.D. students in 2013. His doctoral studies were conducted under a Kreitman Foundation Fellowship. Dr. Teman is an associate editor at the Microelectronics Journal and a member of the technical and review boards of several conferences and journals.
Alexander Fish received the B.Sc. degree in Electrical Engineering from the Technion, Israel Institute of Technology, Haifa, Israel, in 1999. He completed his M.Sc. in 2002 and his Ph.D. (summa cum laude) in 2006 at Ben-Gurion University in Israel. He was a postdoctoral fellow in the ATIPS laboratory at the University of Calgary (Canada) from 2006 to 2008. In 2008 he joined Ben-Gurion University in Israel as a faculty member in the Electrical and Computer Engineering Department. There he founded the Low Power Circuits and Systems (LPC&S) laboratory, specializing in low power circuits and systems. In July 2011 he was appointed head of the VLSI Systems Center at BGU. In October 2012 Prof. Fish joined the Faculty of Engineering of Bar-Ilan University as an Associate Professor and the head of the nanoelectronics track. Prof. Fish also leads the Emerging Nanoscaled Integrated Circuits and Systems (EnICS) Labs. Prof. Fish’s research interests include the development of secured hardware, ultra low power embedded memory arrays, CMOS image sensors, and high speed and energy efficient design techniques. He has authored over 100 scientific papers in journals and conferences, including the IEEE Journal of Solid-State Circuits, IEEE Transactions on Electron Devices, IEEE Transactions on Circuits and Systems, and many others. He has also submitted 22 patent applications. Prof. Fish has published two book chapters. He was a co-author of papers that were Best Paper Finalists at the IEEE ISCAS and ICECS conferences. Prof. Fish serves as the Editor in Chief of the MDPI Journal of Low Power Electronics and Applications (JLPEA) and as an Associate Editor for the IEEE Sensors Journal, IEEE Access, and the Elsevier Microelectronics Journal and Integration, the VLSI Journal. He has also served as a chair of different tracks of various IEEE conferences.
He was a co-organizer of many special sessions at IEEE conferences, including IEEE ISCAS, IEEE Sensors and IEEEI conferences. Prof. Fish is a member of Sensory, VLSI Systems and Applications and Bio-medical Systems Technical Committees of IEEE Circuits and Systems Society.