An efficient memristor crossbar architecture for mapping Boolean functions using Binary Decision Diagrams (BDD)


Integration, the VLSI Journal xxx (xxxx) xxx

Contents lists available at ScienceDirect

Integration, the VLSI Journal journal homepage: www.elsevier.com/locate/vlsi

Phrangboklang Lyngton Thangkhiew a,∗, Alwin Zulehner b, Robert Wille b,c, Kamalika Datta d, Indranil Sengupta e

a Department of Computer Science and Engineering, National Institute of Technology Meghalaya, India
b Institute for Integrated Circuits, Johannes Kepler University, Linz, Austria
c Cyber-Physical Systems, DFKI GmbH, Bremen, Germany
d School of Computer Science and Engineering, Nanyang Technological University, Singapore
e Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India

ARTICLE INFO

Keywords: Binary Decision Diagram (BDD); Crossbar array; In-memory computing; Logic synthesis; Memristor; MAGIC

ABSTRACT

The memristor is considered the fourth fundamental circuit element, alongside the resistor, capacitor and inductor. It is a two-terminal passive circuit element whose resistance changes with the amount of charge flowing through it. This resistance change is non-volatile, so the device can be used for non-volatile memory applications. Researchers have been exploring memristors from various perspectives such as logic design and storage applications. In this paper, a slicing crossbar architecture for the efficient mapping of Boolean functions is proposed, which exploits gate-level parallelism using the memristor aided logic (MAGIC) design style. A Boolean function is first represented as a Binary Decision Diagram (BDD). The BDD nodes are expressed as netlists of NOR and NOT gates, and are mapped to the proposed slicing crossbar architecture with parallel node evaluation where possible. This is the first approach that combines BDD-based synthesis with MAGIC gate evaluation on a memristor crossbar, while at the same time avoiding crossbar-related problems using a slicing architecture. Experimental evaluations on standard benchmark functions show considerable improvement in the solutions.

∗ Corresponding author. E-mail addresses: [email protected] (P.L. Thangkhiew), [email protected] (A. Zulehner), [email protected] (R. Wille), [email protected]. sg (K. Datta), [email protected] (I. Sengupta).
https://doi.org/10.1016/j.vlsi.2019.11.014
Received 5 April 2019; Received in revised form 27 October 2019; Accepted 23 November 2019. Available online XXX. 0167-9260/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Since Leon Chua postulated the memristor as the missing fourth fundamental passive circuit element in 1971 [1], researchers have explored a number of its applications. The resistance of a memristor depends on the total amount of charge flowing through it, and as such can be changed by applying a voltage of suitable polarity across it. Because of the unique properties of the oxide materials that are typically used to fabricate such devices, the change in resistance is non-volatile in nature. This property makes memristors suitable for non-volatile storage applications. Several researchers have also explored the applications of memristors in logic design, which leads to in-memory computing architectures, where both storage and logic operations are carried out in the same hardware subsystem.

Most of the scalable logic design efforts are based on one of two competing memristor-based logic design styles: (a) material implication (IMPLY), and (b) memristor aided logic (MAGIC). A given function to be synthesized is first mapped to an intermediate netlist consisting of IMPLY or NOR/NOT gates using an off-the-shelf synthesis tool like ABC [2], and then the gates are evaluated using memristors (typically in a crossbar). The ABC tool uses the AND-Inverter-Graph (AIG) data structure as the intermediate representation of the function. In some prior works [3–6], the authors have used the Binary Decision Diagram (BDD) data structure to implement a given function using memristor IMPLY operations. The BDD data structure is known to give compact representations for many functions, and has been used in many applications. MAGIC gate operations are more efficient and consume less energy than their IMPLY counterparts, which has motivated us to use MAGIC on a BDD-generated gate netlist in the present work.

In this paper, we represent a given function in terms of a BDD, and provide an efficient method for evaluating the BDD nodes on a memristor crossbar using the MAGIC design style. The BDD is generated using a commonly available tool, CUDD [7]. Each BDD node corresponds to a 2-to-1 multiplexer (MUX), and the complementary edges are mapped to NOT operations. The BDD is traversed in a depth-first manner, and in each step it is identified whether a set of nodes can be evaluated in parallel or not. A slicing memristor crossbar architecture allows each BDD node to be mapped to a separate slice, and the slices can be operated in parallel. A controller provides the control signals to the slices to carry out the required operations. Overall, this architecture allows parallelism in evaluating BDD nodes on one hand, and simplifies sneak path issues during gate evaluation (a major problem in many prior approaches) on the other. Experiments confirm the benefits of the proposed approach.

The rest of the paper is organized as follows. Section 2 presents a brief review of memristors and their use in stateful logic design. A brief motivation for using BDD in the present work is also provided. Section 3 discusses the synthesis approach, with particular emphasis on the slicing crossbar architecture that has been proposed. Section 4 presents the simulation and experimental results on IWLS benchmark functions, and finally Section 5 concludes the paper.

2. Background

In this section, we briefly discuss the basics of the memristor as well as its principle of operation and fabrication. We also review some of the stateful logic design styles using memristors, viz. material implication (IMPLY, [8]) and memristor aided logic (MAGIC, [9]). We also briefly review Binary Decision Diagrams (BDD) that are used in this work.

2.1. Memristor: the fourth circuit element

In 1971, Leon Chua theoretically predicted the existence of a fourth fundamental passive circuit element called the memristor [1]. The resistance value of a memristor changes in response to the total amount of charge flowing through it, and this change is non-volatile. The memristor captures the non-linear relationship between magnetic flux and electrical charge. The six relationships among the four fundamental circuit elements (including the memristor) are shown in Fig. 1(a), and the ideal v-i characteristic of the memristor is shown in Fig. 1(b). Any device that shows a pinched hysteresis loop in the v-i characteristic is classified as a memristor [1,10]. The voltages Vset and Vreset are used to set the memristor to the low resistance state (logic 1) and the high resistance state (logic 0), respectively. The voltage Von (Voff) is the threshold voltage at which the memristor starts to change its state from low to high (high to low) resistance.

In 2008, researchers at HP Labs fabricated a TiO2 based memristor [11], using thin films of pure TiO2 and doped TiO2−x with oxygen vacancies. Platinum nanowires are used to provide the connections to form a crossbar structure as shown in Fig. 2(a). The length of the region with oxygen vacancies changes in response to an applied voltage, which in turn changes the resistance of the device. The SET and RESET operations, shown in Fig. 2(b) and (c) respectively, illustrate the conduction states of the memristor. The crossbar design in Ref. [11] consists of only memristive elements and nanowires. Other crossbar designs such as [12] use memristors as storage elements and access transistors to control their switching. Another pure memristor based crossbar design is presented in Ref. [13]; it shows higher packing density and smaller power dissipation as compared to the hybrid design of [12].

2.2. Memristive stateful logic design

In addition to non-volatile storage applications, memristors can also be used in stateful logic design. The term stateful refers to the feature that resistive states are used to represent logic values [8,14]. Typically, the high (low) resistance state is used to represent logic 0 (1). Here we discuss the IMPLY and MAGIC design styles that are most widely used for implementing stateful logic operations. It is worth mentioning that other hybrid design styles such as Memristor Threshold Logic (MTL) [15] and Memristor Ratioed Logic (MRL) [16] can also be used in logic design. However, in those styles voltage levels instead of resistance values are used to represent logic states.

a) IMPLY: The IMPLY [8] design style requires two memristors and a resistor to implement the material implication operation, represented as A → B ≡ A′ + B (see Fig. 3). The false operation (A → 0 ≡ A′) along with IMPLY is functionally complete. An IMPLY gate can be mapped to a row of the crossbar, and requires that the gate inputs be loaded before the operation. Any arbitrary function can be implemented as a sequence of IMPLY operations. Such an implementation increases the size of the crossbar in one direction on the same row, and lacks the flexibility to make better use of the cells in other row wires. Alternate IMPLY gate designs such as MAD [17] have been used for multipliers that execute faster than IMPLY and CMOS based designs.

b) MAGIC: The MAGIC design style [9] requires only memristors, and as such is more compact than IMPLY. The MAGIC NOR/NOT gates can be implemented in a crossbar array as shown in Fig. 4, and require less energy than the IMPLY gate operation [9]. The output memristor has to be set to logic 1 prior to gate operation. MAGIC also supports parallel gate evaluation and is more flexible, as computation can be carried out along rows as well as along columns [18–20]. This leads to a much more regular crossbar structure [18,20–22] as compared to IMPLY based approaches.

Fig. 1. (a) Relationship among the four fundamental elements [1]; (b) I-V characteristic of an ideal memristor.

Fig. 2. (a) Schematic view of a TiO2 memristor fabricated at HP Lab [11]; (b) during SET operation the width of the doped region expands; (c) during RESET operation the width of the doped region shrinks.

Fig. 3. IMPLY gate implementation.

Fig. 4. (a) Row-wise MAGIC realization of 2-input NOR gate; (b) row-wise mapping in crossbar array; (c) column-wise MAGIC realization of 2-input NOR gate; and (d) column-wise mapping in crossbar array.

3. Motivation and general idea

Various mapping techniques using MAGIC have been presented in the literature [19,21,23,24]. In these methods, mapping is performed on a single crossbar and the gates are mostly evaluated sequentially. For large Boolean functions, this results in a larger crossbar and more evaluation steps, which makes the design of the controller complex. Additionally, larger crossbar arrays are more prone to sneak-path and write disturb problems. In Ref. [22], mapping is performed on a single crossbar array in a level-wise manner, with the same number of time steps required for each level. The mapping requires a smaller crossbar size and fewer time steps as compared to the mapping techniques in Refs. [19,21,23,24]. The cost metric is usually determined by time steps (cycles) and the number of memristors or crossbar area used. The BDD-based mapping technique presented in this paper exploits crossbar parallelism and simplifies the sneak-path and write disturb issues that were major problems in the previous approaches [19,21–24]. The motivation and detailed formulation are discussed in the following subsections.

3.1. BDD representation

Boolean functions can be expressed in various forms such as sum-of-products (SOP), binary decision diagram (BDD), AND-Inverter graph (AIG), etc. Some of the earlier memristor-based synthesis methods have used the BDD and AIG data structures. AIG translation (for an AND node) to NOR is shown in Fig. 5, and BDD translation to NOR is shown in Fig. 6. It can be seen that a BDD node is equivalent to a 2:1 MUX that requires 4 NOT and 3 NOR gates. An AND gate in the AIG, on the other hand, requires only 2 NOT gates and a NOR gate.

Two important aspects of representing a function are the number of terms and the length of the critical path (the number of levels of dependency). Of the two, the level of dependency is more important in multi-level logic synthesis, as it determines the overall delay. However, decreasing the number of levels often leads to an increase in the number of terms. This is a drawback in CMOS technology as it leads to an increase in power consumption, area, and delay. In general, the AIG representation may require fewer NOT/NOR operations than the BDD. But for most functions, the BDD representation is known to be more compact and requires a much shorter critical path than the AIG representation. For this reason, we have chosen the BDD representation for crossbar mapping. Fig. 7(a) and (b) show a full adder realization using AIG and BDD respectively. It is seen that the BDD representation has 4 levels and 6 nodes, while the AIG representation has 6 levels and 8 nodes.

In this work, we propose to schedule the evaluation of a given function in a level-wise manner with respect to the BDD nodes. We have used MAGIC to evaluate the gates with parallel operations where possible, while at the same time reusing the hardware across levels. Each BDD node can be represented using a 2-to-1 MUX. Also, BDD generation tools like CUDD [7] generate complementary edges that map to NOT gates. In our implementation, we have used MAGIC for realizing the MUX and NOT gates, which is more energy efficient than an IMPLY-based realization [9]. It may be noted that there exist a few earlier works that have used the IMPLY design style in conjunction with BDD to synthesize logic functions [4,25]. However, they suffer from the drawback that for fanout edges in the BDD, either the nodes have to be replicated or additional steps are required to forward the values to the target nodes. This has motivated us to explore the MAGIC design style on BDD representation in the present work.

As mentioned earlier, some of the major problems in existing works are: (a) the write-disturb problem [26,27] that can change the values in some of the cells during gate computation, (b) the sneak path problem [28–31] that can lead to erroneous reading of some of the cells, and (c) high complexity of the controller for large crossbar sizes. In the proposed work, we have used a slicing crossbar architecture of small size (3 × 3) where all these problems can be very easily controlled. We have used the isolation voltage method to tackle these problems, where the controller design becomes very simple due to the small size of the slices.

Fig. 5. (a) AND node used in AIG; (b) NOR/NOT representation.

Fig. 6. (a) 2:1 MUX used to represent a BDD node; (b) NOR/NOT representation.

Fig. 7. Full adder representation: (a) in terms of AIG; (b) in terms of BDD (ROBDD).

4. Proposed method

In this section, we discuss the proposed method to implement any given Boolean function in a memristor crossbar array. The CUDD tool is used to convert a given function into the BDD representation. The proposed mapping tool then traverses the BDD nodes in a bottom-up fashion, and maps them to MUXes in a level-wise fashion. The tool has been implemented in C++, and also generates the required micro-operations for gate evaluation, the resource utilization and the number of time steps required for mapping. The overall synthesis flow is shown in Fig. 8.

4.1. Implementing a 2:1 multiplexer in memristor crossbar

Each BDD node can be represented in terms of a 2-to-1 MUX, which in turn can be implemented using NOT and NOR gates as shown in Fig. 6. It can be observed from Fig. 6(b) that both the inputs A and B are fed to two NOT gates to obtain A′ and B′. In our proposed method, instead of writing A and B to the template, we directly write A′ and B′, thereby saving two NOT operations. Next, we write the select lines S and S′. Then we perform the gate computations, in which G1 and G2 are computed in parallel, followed by the sequential evaluation of G3 and G4 (see Fig. 9). Table 1 shows the different voltages that are required to implement the 2-to-1 MUX in the 3 × 3 crossbar array.

4.2. Slicing (multi-level) crossbar architecture

In previous synthesis approaches [19,21], a single crossbar is used to map a given Boolean function, where both storage and computation are carried out. To overcome the write-disturb and sneak-path problems in such a crossbar, we propose a multi-level crossbar in which each BDD node is assigned to a 3 × 3 template (see Fig. 10). The BDD netlist generated by CUDD is mapped to the crossbar in a level-wise fashion. The maximum number of templates required is equal to the maximum number of BDD nodes at any level. All the templates corresponding to a particular level are evaluated in parallel, where the evaluation steps consist of initialization, computation, and reading the final output. The final output from each template is stored in a buffer for use in subsequent levels. The buffer itself can be implemented by a memristor crossbar of size nm × nl, where nm denotes the number of BDD nodes (i.e. multiplexers) and nl denotes the number of levels.

To implement a 2-to-1 MUX in a template, a controller is used to generate the required micro-operations. Since the primary aim of the present work is to map Boolean functions to the crossbar, we have not analyzed the complexity of the controller. However, a controller as proposed in Refs. [32,33] can be used. The main advantages of the proposed multi-crossbar architecture over the conventional single-crossbar architecture are summarized below.

• The write-disturb problem does not affect the stored data as computation is carried out in a 3 × 3 template only.
• The sneak-path issue has to be handled only during the reading of the primary inputs from the data region. For this purpose, gating techniques as proposed in Refs. [30,31] can be used. Several other solutions to this problem are also available [28,29].
• In Ref. [28], the sneak path has been characterized by the presence of zero isosceles rectangles in the crossbar. It may be observed that the output cell in the 3 × 3 template does not exhibit the zero-isosceles rectangle problem. Thus, there is no issue of sneak path during computation.
• Another solution to mitigate the write disturb problem is to apply an isolation voltage or to selectively ground rows [27]. This leads to more energy consumption in larger crossbar arrays; however, within the 3 × 3 templates as proposed, the energy dissipation will be very

nominal. Similarly, for the data region the same mitigation technique can be used to prevent write disturb.

Fig. 8. Overall synthesis flow.

Fig. 9. Crossbar implementation of the 2:1 MUX using MAGIC NOR-NOT gates.

Fig. 10. Multi-level crossbar architecture where all the templates are connected to a common voltage controller.

Algorithm 1 Mapping algorithm.
 1: INPUT: Multiplexer Netlist ML
 2: OUTPUT: Crossbar Mapping
 3: MLsort ← Topological_Sort(ML)
 4: MAXMUX ← 0
 5: MAXNOT ← 0
 6: for each Level in MLsort do
 7:     Nmux ← Number of MUXes in Level
 8:     Nnot ← Number of NOTs in Level
 9:     Ntemplate ← Nmux
10:     if Ntemplate > MAXMUX then
11:         MAXMUX ← Ntemplate
12:     end if
13:     if Nnot > MAXNOT then
14:         MAXNOT ← Nnot
15:     end if
16:     Read from data region
17:     Init(Ntemplate)
18:     Eval(Ntemplate)
19:     Read & Write the result from each crossbar
20: end for
21: MAXMUX: number of templates
22: MAXNOT × 2: size of NOT template
∗ Logic 1 nodes are written to data region directly and NOT nodes are evaluated in cycle T3

Table 1 Voltage application in a 3 × 3 crossbar array to implement BDD using 2:1 MUX in crossbar array.

| Voltage | Remarks |
|---|---|
| VRESET = C1 | Reset Voltage |
| VG1 = R1, VG2 = R2 | if A′ = 0, VG1 = GND, else VG1 = Viso, and if B′ = 0, VG2 = GND, else VG2 = Viso |
| VRESET = C2 | Reset Voltage |
| VG1 = R1, VG2 = R2 | if S = 0, VG1 = GND, else VG1 = Viso, and if S′ = 0, VG2 = GND, else VG2 = Viso |
| Vo = C1, C2, GND = C3 | NOR Operating Voltage for G1, G2 |
| Viso = R3 | Isolation voltage |
| Vo = R3, GND = R1, R2 | NOT Operating Voltage for G3 |
| Viso = C1, C2 | Isolation voltage |
| Vo = C3, GND = C2 | NOT Operating Voltage for G4 |
| Viso = R1, R2 | Isolation voltage |

Table 2 Micro-operations to implement a MUX in a crossbar slice.

| Cycle | Operations |
|---|---|
| T1 | Set all memristors in template to 1 |
| T2 | Write A′ and B′ |
| T3 | Write S and S′ |
| T4 | Evaluate G1 = NOR(A′, S); G2 = NOR(B′, S′) |
| T5 | Evaluate G3 = NOR(G1, G2) |
| T6 | Evaluate G4 = NOT(G3) |
| T7 | Read G4 |
| T8 | Write result to buffer |
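The gate sequence of Table 2 can be checked with a small truth-table simulation. The sketch below is only a logical model (no voltages or device physics), and the names `nor`, `mux_template` and `check_all` are hypothetical. It assumes the cell pairing suggested by the remarks in Table 1, namely that A′ shares a row with S and B′ shares a row with S′, so that the template outputs A when S = 0 and B when S = 1.

```python
from itertools import product


def nor(*inputs):
    """Logical model of a MAGIC NOR: the output is 1 only if every input is 0."""
    return int(not any(inputs))


def mux_template(a, b, s):
    """Follow cycles T2 to T6 of Table 2 for one 3 x 3 template.

    Assumption (from the remarks in Table 1): A' is paired with S in one row,
    and B' is paired with S' in the other row.
    """
    a_n, b_n, s_n = 1 - a, 1 - b, 1 - s  # T2/T3: complemented inputs and select lines are written
    g1 = nor(a_n, s)     # T4: equals A AND (NOT S)
    g2 = nor(b_n, s_n)   # T4: equals B AND S, evaluated in parallel with G1
    g3 = nor(g1, g2)     # T5
    g4 = nor(g3)         # T6: a NOT gate is a single-input NOR
    return g4


def check_all():
    """Compare the template output with a reference 2-to-1 MUX on all 8 input patterns."""
    return all(
        mux_template(a, b, s) == (b if s else a)
        for a, b, s in product((0, 1), repeat=3)
    )
```

Calling `check_all()` exercises every combination of A, B and S; under the stated pairing assumption the NOR/NOT sequence behaves as a 2-to-1 MUX. If the actual layout pairs the select lines the other way round, the roles of A and B are simply swapped.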

The proposed slicing architecture reuses the crossbar slices at every BDD level. Level-wise scheduling of the BDD nodes also enables parallel computation. It also ensures high scalability and faster computation as compared to the methods in Refs. [19,21,23], where the whole function is mapped to a single crossbar array.

4.3. Mapping BDD nodes to multi-level crossbar

In the proposed architecture, all the MUXes in a level can be evaluated in 8 clock cycles as shown in Table 2, where each MUX is assigned to a separate crossbar slice. To implement the MUX operations, the controller applies the micro-operations in parallel to all the templates. Algorithm 1 maps a given function to a multi-crossbar array. The BDD netlist is sorted topologically based on dependency. At each level (comprising L nodes), the nodes are scheduled to execute in parallel on L templates. The NOT nodes in the BDD can be scheduled along with the MUXes. If a certain level consists only of NOT nodes, then only 3 clock pulses are sufficient to implement that level. This includes setting the NOT template, loading inputs and gate evaluation.
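The resource accounting of Algorithm 1 can be sketched in a few lines. This is a minimal illustration and not the authors' C++ tool: the names `Level` and `map_levels` are hypothetical, the input is assumed to be a list of levels that is already topologically sorted, and the buffer is assumed to hold one cell per MUX output, which is how the buffer column of Table 3 reads. The example profile is the 2-bit ripple carry adder of Table 3 (1, 3, 2, 3 and 2 MUXes in levels 1 to 5, no NOT nodes).

```python
from dataclasses import dataclass

MUX_CYCLES = 8  # cycles T1 to T8 of Table 2
NOT_CYCLES = 3  # set the NOT template, load inputs, evaluate


@dataclass
class Level:
    n_mux: int
    n_not: int = 0


def map_levels(levels):
    """Return (mux_templates, not_template_cells, total_cycles, buffer_cells)."""
    max_mux = max_not = cycles = buffer_cells = 0
    for level in levels:  # levels are assumed to be in dependency order
        max_mux = max(max_mux, level.n_mux)
        max_not = max(max_not, level.n_not)
        if level.n_mux:
            # NOT nodes are scheduled along with the MUXes of the same level
            cycles += MUX_CYCLES
        elif level.n_not:
            # a level that contains only NOT nodes needs 3 pulses
            cycles += NOT_CYCLES
        buffer_cells += level.n_mux  # assumption: one buffer cell per MUX output
    return max_mux, 2 * max_not, cycles, buffer_cells


# Level profile of the 2-bit ripple carry adder taken from Table 3
rca2 = [Level(1), Level(3), Level(2), Level(3), Level(2)]
```

For this profile, `map_levels(rca2)` gives 3 MUX templates, 40 cycles and 11 buffer cells, which agrees with the totals in Table 3. Because the templates are reused across levels, the template count follows the widest level rather than the total node count.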

4.4. Illustration for a 2-bit ripple carry adder (RCA)

In this subsection, we illustrate the proposed method for a 2-bit RCA. Fig. 11(a) and (b) respectively show the schematic diagram and the BDD network of the 2-bit RCA, for which we observe the following:

a) It consists of 5 levels;
b) Levels 2 and 4 consist of the maximum number of BDD nodes (referred to as maxNodeLevel).

To implement the 2-bit RCA, the total number of MUX templates used is 3, which is equal to the number of nodes in maxNodeLevel. Table 3 shows the number of templates (NOT and MUX) for the 2-bit RCA estimated in different levels during mapping, which requires a total of 40 time steps. The MUX outputs from each level are stored in a buffer, and the number of memristors required in the buffer is 11.

Fig. 11. (a) 2-bit ripple carry adder; (b) Equivalent BDD as generated from CUDD tool.

Table 3 Implementation steps of the 2-bit RCA.

| Levels | NOT | MUX | Buffer | Cycles |
|---|---|---|---|---|
| 1 | – | 1 | 1 | 8 |
| 2 | – | 3 | 3 | 8 |
| 3 | – | 2 | 2 | 8 |
| 4 | – | 3 | 3 | 8 |
| 5 | – | 2 | 2 | 8 |
| Total: | | | 11 | 40 |

∗ Number of MUX Template = 3

5. Experimental results

Here we first present the circuit simulation results for MAGIC NOR and NOT gate operations to estimate the energy consumption, followed by synthesis and crossbar mapping results on IWLS benchmarks.

5.1. SPICE simulation

We first present the SPICE simulation results for MAGIC NOR and NOT gate operations. For modeling the memristors, we have used the VTEAM model [34] with parameters as described in Refs. [18,21–23] (see Table 4), which provides a good fit for the fabricated devices reported in Ref. [35]. The devices require 1 ns for SET and RESET operations, with applied voltages of +2 V and −1 V respectively. The average switching energies for NOT and NOR gate operations are shown in Tables 5 and 6 respectively. It may be noted that the proposed mapping involves SET and RESET operations, which consume 91.63 fJ and 14.91 fJ of energy respectively. It is also required to read the memristor states for saving the MUX outputs in the buffer. The read energies for a memristor in states 0 and 1 are found to be 0.05 fJ and 58.47 fJ respectively. It is observed from Table 2 that 8 time steps are needed to implement a MUX, with the corresponding average energy consumptions shown in Table 7. The total energy consumption of a single crossbar slice is 351.68 fJ.

Table 4 Parameters used for the VTEAM memristor model.

| Parameters | Values | Parameters | Values |
|---|---|---|---|
| Ron | 1.0 kΩ | koff | 0.101 m/s |
| Roff | 300 kΩ | xon | 0 nm |
| Von | −1.5 V | xoff | 3 nm |
| Voff | 0.3 V | αon | 4 |
| kon | −242.7 m/s | αoff | 4 |

Table 5 Energy dissipation of MAGIC NOT operation.

| Inputs | 0 | 1 |
|---|---|---|
| Energy (in fJ) | 2.65 | 87.1 |

5.2. Synthesis results

We now present the experimental results of the proposed synthesis approach. We have used the CUDD 2.5.0 tool [7] to translate a given Boolean function into the corresponding BDD representation. To estimate the latency in terms of the number of voltage pulses and the area in terms of the required number of memristors, we have implemented the mapping tool in C++ and run it on an Intel i7 processor with a 2.6 GHz clock.

Table 6 Energy dissipation of MAGIC NOR operation.

| Inputs | 00 | 01 | 10 | 11 |
|---|---|---|---|---|
| Energy (in fJ) | 5.3 | 86.3 | 86.3 | 27.8 |

Table 7 Average energy consumption of the 2:1 MUX.

| Cycles | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
|---|---|---|---|---|---|---|---|---|
| Energy (in fJ) | 91.63 | 14.91 | 14.91 | 51.4 | 51.4 | 44.9 | 29.26 | 53.27 |
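The per-slice energy quoted in Section 5.1 can be reproduced directly from Tables 5 to 7. The following sketch only repeats that arithmetic; the dictionary and function names are hypothetical, and the observation that the T4 and T6 entries equal the input-averaged NOR and NOT energies is an inference from the numbers rather than something the text states.

```python
# Energies in fJ, copied from Tables 5, 6 and 7
NOT_ENERGY_FJ = {"0": 2.65, "1": 87.1}
NOR_ENERGY_FJ = {"00": 5.3, "01": 86.3, "10": 86.3, "11": 27.8}
MUX_CYCLE_ENERGY_FJ = {
    "T1": 91.63, "T2": 14.91, "T3": 14.91, "T4": 51.4,
    "T5": 51.4, "T6": 44.9, "T7": 29.26, "T8": 53.27,
}


def mean_energy(table):
    """Average energy over all input patterns of a gate."""
    return sum(table.values()) / len(table)


def slice_energy(cycles=MUX_CYCLE_ENERGY_FJ):
    """Total energy of one MUX evaluation (T1 to T8), rounded to 0.01 fJ."""
    return round(sum(cycles.values()), 2)
```

Here `slice_energy()` returns 351.68, the total stated for a single crossbar slice, while `mean_energy(NOR_ENERGY_FJ)` and `mean_energy(NOT_ENERGY_FJ)` round to the 51.4 fJ and 44.9 fJ entries used for cycles T4 and T6.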

We have used IWLS benchmarks for the purpose of experimentation, and compared the results with a recently published work [22]. The work in Ref. [22] uses the ABC tool for initial pre-processing, where a given function is translated into the AND-Inverter-Graph (AIG) representation, and then mapped to the crossbar. In both [22] and the proposed work, the gates/nodes are scheduled to be evaluated in a level-wise manner. Clearly, the latency increases with increase in the number of levels in the realization.

Table 8 summarizes the synthesis results. The first two columns in the table show the name and the number of inputs/outputs (I/O) of the benchmark functions. The next three columns show the results of the work reported in Ref. [22] in terms of the level of dependency in AIG, the number of memristors required, and the number of cycles. The next three columns show the corresponding values of the proposed work. The last two columns show the percentage improvements of the proposed work as compared to Ref. [22] in terms of the number of memristors and computation cycles.

Table 8 Synthesis results for IWLS benchmarks.

| Benchmark circuit | I/O | Levelwise-AIG [22]: Level | Levelwise-AIG [22]: Memristor | Levelwise-AIG [22]: Cycle | This work: Level | This work: Memristor | This work: Cycle | % of improvement: Memristor | % of improvement: Cycle |
|---|---|---|---|---|---|---|---|---|---|
| 5xp1 | 7/10 | 10 | 63 | 70 | 7 | 172 | 60 | −173.02 | 14.29 |
| 9sym | 9/1 | 20 | 159 | 140 | 9 | 34 | 76 | 78.62 | 45.71 |
| alu4 | 14/8 | 26 | 615 | 182 | 13 | 1650 | 113 | −168.29 | 37.91 |
| apex1 | 45/45 | 26 | 1137 | 182 | 28 | 2584 | 228 | −127.26 | −25.27 |
| apex2 | 38/3 | 28 | 138 | 196 | 36 | 1124 | 292 | −714.49 | −48.98 |
| apex4 | 9/19 | 24 | 1620 | 168 | 9 | 2588 | 76 | −59.75 | 54.76 |
| apex5 | 114/88 | 18 | 417 | 126 | 25 | 1104 | 204 | −164.75 | −61.90 |
| b12 | 15/9 | 10 | 57 | 70 | 8 | 132 | 68 | −131.58 | 2.86 |
| bw | 5/28 | 11 | 114 | 77 | 5 | 324 | 44 | −184.21 | 42.86 |
| clip | 9/5 | 14 | 75 | 98 | 9 | 326 | 76 | −334.67 | 22.45 |
| con1 | 7/2 | 8 | 30 | 56 | 5 | 42 | 44 | −40.00 | 21.43 |
| cordic | 23/2 | 17 | 81 | 119 | 21 | 26 | 188 | 67.90 | −57.98 |
| duke2 | 22/29 | 22 | 294 | 154 | 17 | 480 | 140 | −63.27 | 9.09 |
| e64 | 65/65 | 23 | 408 | 161 | 65 | 202 | 524 | 50.49 | −225.47 |
| ex1010 | 10/10 | 24 | 1719 | 168 | 10 | 2616 | 84 | −52.18 | 50.00 |
| ex5p | 8/63 | 17 | 366 | 119 | 8 | 442 | 68 | −20.77 | 42.86 |
| inc | 7/9 | 12 | 72 | 84 | 7 | 150 | 60 | −108.33 | 28.57 |
| misex1 | 8/7 | 9 | 51 | 63 | 8 | 88 | 52 | −72.55 | 17.46 |
| misex2 | 25/18 | 12 | 93 | 84 | 12 | 146 | 100 | −56.99 | −19.05 |
| misex3c | 14/14 | 24 | 369 | 168 | 14 | 1178 | 116 | −219.24 | 30.95 |
| misex3 | 14/14 | 26 | 666 | 182 | 14 | 694 | 116 | −4.20 | 36.26 |
| pdc | 16/40 | 24 | 480 | 168 | 16 | 604 | 132 | −25.83 | 21.43 |
| rd53 | 5/3 | 11 | 27 | 77 | 5 | 42 | 44 | −55.56 | 42.86 |
| rd73 | 7/3 | 17 | 78 | 119 | 5 | 56 | 57 | 28.21 | 52.10 |
| rd84 | 8/4 | 18 | 102 | 126 | 8 | 70 | 68 | 31.37 | 46.03 |
| sao2 | 10/4 | 14 | 111 | 98 | 10 | 216 | 84 | −94.59 | 14.29 |
| seq | 41/35 | 24 | 897 | 168 | 30 | 8320 | 249 | −827.54 | −48.21 |
| spla | 16/46 | 28 | 507 | 196 | 16 | 518 | 132 | −2.17 | 32.65 |
| squar5 | 5/8 | 10 | 42 | 70 | 5 | 90 | 44 | −114.29 | 37.14 |
| t481 | 16/1 | 18 | 33 | 126 | 15 | 16 | 129 | 51.52 | −2.38 |
| table3 | 14/14 | 26 | 987 | 182 | 14 | 1038 | 116 | −5.17 | 36.26 |
| table5 | 17/15 | 28 | 858 | 196 | 17 | 596 | 140 | 30.54 | 28.57 |
| vg2 | 25/8 | 14 | 111 | 98 | 21 | 378 | 177 | −240.54 | −80.61 |
| xor5 | 5/1 | 9 | 15 | 63 | 5 | 8 | 41 | 46.67 | 34.92 |

Table 9 Comparison with IMPLY-Based synthesis.

| Benchmark function | IMPLY [4]: Memristor | IMPLY [4]: Cycle | MAGIC-Proposed: Memristor | MAGIC-Proposed: Cycle | % of improvement: Memristor | % of improvement: Cycle |
|---|---|---|---|---|---|---|
| 5xp1 | 84 | 73 | 172 | 60 | −104.76 | 17.81 |
| alu4 | 642 | 334 | 1650 | 113 | −157.01 | 66.17 |
| apex1 | 1626 | 705 | 2584 | 228 | −58.92 | 67.66 |
| apex4 | 2073 | 447 | 2588 | 76 | −24.84 | 83.00 |
| apex5 | 806 | 888 | 1104 | 204 | −36.97 | 77.03 |
| clip | 120 | 89 | 326 | 76 | −171.67 | 14.61 |
| cordic | 32 | 149 | 26 | 188 | 18.75 | −26.17 |
| e64 | 94 | 456 | 202 | 524 | −114.89 | −14.91 |
| ex1010 | 1984 | 396 | 2616 | 84 | −31.85 | 78.79 |
| misex1 | 83 | 69 | 88 | 52 | −6.02 | 24.64 |
| misex3 | 444 | 185 | 694 | 116 | −56.31 | 37.30 |
| misex3c | 429 | 239 | 1178 | 116 | −174.59 | 51.46 |
| pdc | 507 | 142 | 604 | 132 | −19.13 | 7.04 |
| seq | 1566 | 692 | 8320 | 249 | −431.29 | 64.02 |
| squar5 | 93 | 56 | 90 | 44 | 3.23 | 21.43 |
| t481 | 26 | 107 | 16 | 129 | 38.46 | −20.56 |
| table5 | 580 | 168 | 596 | 140 | −2.76 | 16.67 |
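The improvement columns of Tables 8 and 9 are not accompanied by a formula. The sketch below assumes the usual relative definition, (baseline − proposed) / baseline × 100, under which a negative value means that the proposed mapping uses more of the resource than the baseline. The function name `improvement` is hypothetical.

```python
def improvement(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent.

    Assumed definition: (baseline - proposed) / baseline * 100,
    rounded to two decimals as in the tables.
    """
    return round(100.0 * (baseline - proposed) / baseline, 2)
```

With the 5xp1 row of Table 8, `improvement(63, 172)` gives −173.02 for memristors and `improvement(70, 60)` gives 14.29 for cycles. With the 5xp1 row of Table 9, `improvement(73, 60)` gives 17.81. These match the tabulated entries, which supports the assumed definition.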

For the IWLS benchmarks reported in Table 8, it can be observed that the BDD representation has fewer dependency levels than the AIG representation reported in Ref. [22]. On average, the dependency level of the proposed work is 9.16% less than that of [22], which results in faster execution. For the benchmarks apex4 and rd73, improvements of over 50% as compared to Ref. [22] have been observed. However, in the BDD representation there may be several nodes per level, which requires more memristors for the proposed parallel evaluation using templates. Both the proposed work and [22] reuse the memristors after evaluation at every level, and hence require fewer memristors than Refs. [19,21]. Also, the works reported in Refs. [19,21] map the entire gate-level netlist to the crossbar, resulting in less parallelism in operations, which makes the proposed approach faster.

Table 9 shows a comparison of the proposed approach with the IMPLY based method described in Ref. [4]. The authors in Ref. [4] have presented two synthesis schemes, serial and parallel. The parallel scheme gives better results and hence has been used for comparing with the proposed approach.

References

[1] L. Chua, Memristor - the missing circuit element, IEEE Trans. Circuit Theory 18 (5) (1971) 507–519.
[2] Berkeley Logic Synthesis and Verification Group, ABC - a system for sequential synthesis and verification. http://www.eecs.berkeley.edu/alanmi/abc/.
[3] H. Owlia, P. Keshavarzi, A. Rezai, A novel digital logic implementation approach on nanocrossbar arrays using memristor-based multiplexers, Microelectron. J. 45 (6) (2014) 597–603.
[4] S. Chakraborti, P.V. Chowdhary, K. Datta, I. Sengupta, BDD based synthesis of Boolean functions using memristors, in: 2014 9th International Design and Test Symposium (IDT), 2014, pp. 136–141.
[5] A. Chakraborty, R. Das, C. Bandopadhyay, H. Rahaman, BDD based synthesis technique for design of high-speed memristor based circuits, in: 2016 20th International Symposium on VLSI Design and Test (VDAT), 2016, pp. 1–6.
[6] S. Shirinzadeh, M. Soeken, P.E. Gaillardon, R. Drechsler, Logic synthesis for RRAM-based in-memory computing, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. (2017) 1–14.
[7] F. Somenzi, CUDD: CU Decision Diagram package: release 2.5.0. http://vlsi.colorado.edu/fabio/CUDD/, 2012.
[8] J. Borghetti, G. Snider, P. Kuekes, J. Yang, D. Stewart, R. Williams, Memristive switches enable stateful logic operations via material implication, Nature 464 (7290) (2010) 873–876.
[9] S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, (MAGIC) - memristor-aided logic, IEEE Trans. Circuits Syst. II: Express Briefs 61 (11) (2014) 895–899.
[10] L.O. Chua, S.M. Kang, Memristive devices and systems, Proc. IEEE 64 (2) (1976) 209–223.
[11] D.B. Strukov, G.S. Snider, D.R. Stewart, R.S. Williams, The missing memristor found, Nature 453 (7191) (2008) 80–83.
[12] P. Junsangsri, F. Lombardi, Design of a hybrid memory cell using memristance and ambipolarity, IEEE Trans. Nanotechnol. 12 (1) (2013) 71–80.
[13] Y. Zhang, Y. Shen, X. Wang, L. Cao, A novel design for memristor-based logic switch and crossbar circuits, IEEE Trans. Circuits Syst. I: Reg. Pap. 62 (5) (2015) 1402–1411.
[14] E. Lehtonen, M. Laiho, Stateful implication logic with memristors, in: 2009 IEEE/ACM International Symposium on Nanoscale Architectures, 2009, pp. 33–36.
[15] J. Rajendran, H. Manem, R. Karri, G.S. Rose, An energy-efficient memristive threshold logic circuit, IEEE Trans. Comput. 61 (4) (2012) 474–487.
[16] S. Kvatinsky, N. Wald, G. Satat, A. Kolodny, U.C. Weiser, E.G. Friedman, Memristor ratioed logic, in: International Workshop on Cellular Nanoscale Networks and their Applications, 2012, pp. 1–6.
[17] L. Guckert, E.E. Swartzlander, Optimized memristor-based multipliers, IEEE Trans. Circuits Syst. I: Reg. Pap. 64 (2) (2017) 373–385.
[18] P.L. Thangkhiew, R. Gharpinde, P.V. Chowdhary, K. Datta, I. Sengupta, Area efficient implementation of ripple carry adder using memristor crossbar arrays, in: 2016 11th International Design and Test Symposium (IDT), 2016, pp. 142–147.
[19] N. Talati, S. Gupta, P. Mane, S. Kvatinsky, Logic design within memristive memories using memristor-aided loGIC (MAGIC), IEEE Trans. Nanotechnol. 15 (4) (2016) 635–650.
[20] R.B. Hur, N. Wald, N. Talati, S. Kvatinsky, Simple magic: synthesis and in-memory mapping of logic execution for memristor-aided logic, in: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017, pp. 225–232.
[21] R. Gharpinde, P.L. Thangkhiew, K. Datta, I. Sengupta, A scalable in-memory logic synthesis approach using memristor crossbar, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 26 (2) (2018) 355–366.
[22] P.L. Thangkhiew, R. Gharpinde, K. Datta, Efficient mapping of boolean functions to memristor crossbar using MAGIC NOR gates, IEEE Trans. Circuits Syst. I: Reg. Pap. 65 (8) (2018) 2466–2476.
[23] P.L. Thangkhiew, K. Datta, Scalable in-memory mapping of boolean functions in memristive crossbar array using simulated annealing, J. Syst. Archit. 89 (2018) 49–59.
[24] D.N. Yadav, P.L. Thangkhiew, K. Datta, Look-ahead mapping of boolean functions in memristive crossbar array, Integration 64 (2019) 152–162.
[25] F. Lalchhandama, B.G. Sapui, K. Datta, An improved approach for the synthesis of boolean functions using memristor based IMPLY and INVERSE-IMPLY gates, in: 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016, pp. 319–324.
[26] C. Xu, X. Dong, N.P. Jouppi, Y. Xie, Design implications of memristor-based RRAM cross-point structures, in: 2011 Design, Automation and Test in Europe, 2011, pp. 1–6.
[27] Y. Cassuto, S. Kvatinsky, E. Yaakobi, Write sneak-path constraints avoiding disturbs in memristor crossbar arrays, in: 2016 IEEE International Symposium on Information Theory (ISIT), 2016, pp. 950–954.
[28] Y. Cassuto, S. Kvatinsky, E. Yaakobi, Sneak-path constraints in memristor crossbar arrays, in: 2013 IEEE International Symposium on Information Theory, 2013, pp. 156–160.
[29] Y. Cassuto, S. Kvatinsky, E. Yaakobi, Information-theoretic sneak-path mitigation in memristor crossbar arrays, IEEE Trans. Inf. Theory 62 (9) (2016) 4801–4813.
[30] M.A. Zidan, H.A.H. Fahmy, M.M. Hussain, K.N. Salama, Memristor-based memory: the sneak paths problem and solutions, Microelectron. J. 44 (2) (2013) 176–183.
[31] M.A. Zidan, A.M. Eltawil, F. Kurdahi, H.A.H. Fahmy, K.N. Salama, Memristor multiport readout: a closed-form solution for sneak paths, IEEE Trans. Nanotechnol. 13 (2) (2014) 274–282.

6. Conclusion

A scalable memristor-based logic synthesis approach has been presented in this paper, which uses the BDD data structure to represent a function and MAGIC gate evaluation on a memristor crossbar. A new slicing crossbar architecture has been proposed for the evaluation of the BDD nodes. The following conclusions can be drawn from the experimental analyses carried out on IWLS benchmarks.

a) As expected, the BDD representation has more nodes per level than the AIG representation. This comes at the expense of area, which can be tolerated in exchange for faster execution in memristor-based mapping, since crossbars offer high-capacity storage.

b) To overcome crossbar-related problems, we propose to map a Boolean function to the slicing crossbar architecture as follows:
– We assign each BDD node in a level to a 3 × 3 crossbar template, apply micro-operations on the templates, and repeat this for all levels.
– The write disturb problem is simplified during gate computation on the 3 × 3 template by applying isolation voltages to unused rows and columns. Since applying isolation voltages on a large crossbar leads to higher power consumption, the proposed approach ensures a more energy-efficient mapping.

Furthermore, we have carried out SPICE simulations, presented synthesis results on IWLS benchmarks, and compared them with a recently published work. As expected, the number of memristors required across all the crossbar slices is higher; however, due to parallelism in operations, the number of time steps is reduced.

Acknowledgement

This work was supported fully by the Department of Science and Technology, Government of India, for the project “Development of CAD Tools for Synthesis, Optimization and Verification of Digital Circuits using Memristors” (Grant No. INT/AUSTRIA/BMWF/P-02/2017), and by the Austrian Agency for International Cooperation in Education and Research (OeAD, Grant No. IN 08/2017).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.vlsi.2019.11.014.


[32] R.B. Hur, S. Kvatinsky, Memory processing unit for in-memory processing, in: 2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), 2016, pp. 171–172.
[33] L. Xie, H.A.D. Nguyen, M. Taouil, S. Hamdioui, K. Bertels, Fast boolean logic mapped on memristor crossbar, in: 2015 33rd IEEE International Conference on Computer Design (ICCD), 2015, pp. 335–342.
[34] S. Kvatinsky, M. Ramadan, E.G. Friedman, A. Kolodny, VTEAM: a general model for voltage-controlled memristors, IEEE Trans. Circuits Syst. II: Express Briefs 62 (8) (2015) 786–790.
[35] J.J. Yang, D.B. Strukov, D.R. Stewart, Memristive devices for computing, Nat. Nanotechnol. 8 (1) (2013) 13–24.

Kamalika Datta completed her Master of Science (MS) from Indian Institute of Technology Kharagpur, India in 2010, and her Ph.D. from Indian Institute of Engineering Science and Technology (IIEST), Shibpur, India in 2014. She worked as an Assistant Professor at the National Institute of Technology Meghalaya from 2014 to 2018. She is currently a Research Fellow at the Nanyang Technological University, Singapore. She has successfully guided two Ph.D. students and published more than 60 papers in peer-reviewed journals and conferences. Her research interests include logic design using emerging technologies, and reversible and quantum computing.

Phrangboklang Lyngton Thangkhiew received the Bachelor of Engineering (BE) from Visvesvaraya Technological University, India, in 2014, and the Ph.D. from National Institute of Technology Meghalaya, India in 2019. He has published eight papers in peer-reviewed journals and conferences. His research interests include memristors and their applications in logic design, synthesis, and optimization techniques.

Indranil Sengupta obtained his B.Tech., M.Tech. and Ph.D. degrees in Computer Science from the University of Calcutta in 1983, 1985 and 1990, respectively. He joined Indian Institute of Technology Kharagpur, India, as a faculty member in 1988, in the Department of Computer Science and Engineering, where he is presently a Full Professor. He has formerly served as Head of the Department of Computer Science and Engineering and of the School of Information Technology. He has over 30 years of teaching and research experience, has guided 21 Ph.D. students, and has published over 200 papers in peer-reviewed journals and conferences. He has served as the Program Chair / General Chair of several international conferences in the areas of VLSI design/test, reversible computing and information security. His research interests include reversible and quantum computing, VLSI design and test, and network security. He is a Senior Member of the IEEE.

Alwin Zulehner received his Ph.D. degree at the Institute for Integrated Circuits at the Johannes Kepler University Linz, Austria. His research interests include design automation for emerging technologies, focusing on quantum computing. He has published several papers in international conferences and journals such as the IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD), Asia and South Pacific Design Automation Conference (ASP-DAC), Design, Automation and Test in Europe (DATE), International Conference on Computer-Aided Design (ICCAD), and Design Automation Conference (DAC).

Robert Wille is a Full Professor at the Johannes Kepler University Linz, Austria. He received the Diploma and Dr.-Ing. degrees in computer science from the University of Bremen, Germany, in 2006 and 2009, respectively. He has also worked at the German Research Center for Artificial Intelligence and was a Lecturer with the University of Applied Science Bremen as well as a Visiting Professor with the University of Potsdam and the Technical University Dresden. His current research interests include the design of circuits and systems for both conventional and emerging technologies, with a particular focus on design automation for quantum computation. On these topics, he has published over 250 journal and conference papers and has received awards including an Under-40 Innovators Award and a Young Researcher Award.
