CMOS design and analysis of low-voltage signaling methodology for energy efficient on-chip interconnects


Microelectronics Journal 40 (2009) 1571–1581

Journal homepage: www.elsevier.com/locate/mejo

José C. García (a), Juan A. Montiel-Nelson (a), Saeid Nooshabadi (b)

(a) Institute for Applied Microelectronics, University of Las Palmas de Gran Canaria, Spain
(b) Department of Information and Communication, Gwangju Institute of Science and Technology, Republic of Korea

Article history: Received 2 April 2008; received in revised form 26 November 2008; accepted 17 December 2008; available online 11 February 2009.

Abstract

This paper provides a comparative study of low-voltage signaling methodologies in terms of delay, energy dissipation, energy delay product (energy × delay), and sensitivity to technology process variations and noise. We also present the design of two symmetric low-swing driver–receiver pairs for driving signals on the global interconnect lines. The key advantage of the proposed signaling schemes is that they require only one power supply and one threshold voltage, which significantly reduces the design complexity. The proposed signaling schemes were implemented in a 1.0 V, 0.13 µm CMOS technology, for signal transmission along a wire-length of 10 mm. When compared with counterpart symmetric and asymmetric low-swing signaling schemes, the proposed schemes perform better in terms of delay, energy dissipation and energy × delay. © 2009 Elsevier Ltd. All rights reserved.

Keywords: Digital CMOS; Interconnect signaling; Bus drivers; Bus receivers; Level converters; Low energy; Low-voltage; Performance tradeoffs

1. Introduction

Interconnect wires (busses, global clocks, and timing signals) and their associated driver and receiver circuitry account for an ever-increasing share of the energy budget in integrated circuits. It is reported in [1] that in some gate array design styles the power dissipation from the interconnect wires amounts to up to 40% of the total on-chip power dissipation. On field programmable gate array fabrics the reported power dissipation from interconnect wires is up to 90% [2]. Chip performance and robustness [3,4] are largely dominated by the interconnect.

One major design strategy to combat large energy dissipation on the interconnect and achieve energy × delay efficiency is the use of a low-voltage swing. The reduction in the voltage swing, however, generally comes at the expense of reduced reliability and performance and of increased driver and receiver complexity [4]. Most low-swing voltage techniques to date [1,5] rely on an extra power supply or reference voltage, a multiple-threshold process technology, a large area penalty, and multiple wire interconnects when differential signaling is employed [6]. They also suffer from large short-circuit currents, long propagation delay, and high power dissipation [1,5].

Because of the reduced voltage swing, drivers for low-swing voltage signaling schemes generally do not provide sufficient driving capability for larger loads. To improve the driving capability, some driver circuits rely on bootstrapping techniques [7,8]. However, these circuits require extra bootstrapping capacitors, and generally need access to the well terminals, which may not be readily available in many digital CMOS processes.

The signaling schemes for long interconnect lines are categorized according to the direction of the swing voltage reduction in the signal [1,9]. In the high-offset asymmetric (HOA) low-swing voltage scheme (e.g. the HOA source follower in [1]), the signal level on the interconnect ranges between 0 and Vbus, where Vbus ≤ Vddh, and Vddh is the nominal power supply used by the computational circuit blocks at the driver and receiver sides of the interconnect. To avoid employing a separate power supply, the source follower drivers [1] set Vbus = (Vdd − Vtn) or Vbus = (Vdd − 2Vtn), where Vtn is the nMOS transistor threshold voltage (see footnote 1). In the low–high-offset symmetric (LHOS) low-swing voltage signaling scheme, on the other hand, the output voltage range extends symmetrically between Vbusl and Vbush, where 0 ≤ Vbusl and Vbush ≤ Vddh. Values of Vbusl = Vtn and Vbush = (Vddh − |Vtp|) are chosen in this paper as well as in the diode-connected LHOS work in [10]. The LHOS signaling scheme is preferred to the HOA scheme because it works well with a simple inverter at the receiver end, without significant static power dissipation. The design in [10], unlike most alternatives, requires neither an extra power supply nor a multi-threshold process. However, in addition to its low driving capability, its performance is sensitive to variations in the power supply, device parameters, and loading conditions [11].

Interconnect signaling schemes that employ low-swing bus drivers require suitable matching level converters at the receiver end [5,9,10]. A receiver that is not designed properly results in excessive static power dissipation and loss of performance. The work in [5] proposes a series of level converters for the HOA signaling schemes that consume low power and are very fast. However, the level converters in [5] require two power supplies: Vddl (which can conveniently be set to Vbus) and Vddh. They also require nMOS devices with two different threshold voltages, Vtnl (low threshold voltage) and Vtnh (high threshold voltage). Unfortunately, no suitable low-complexity circuit design has been reported in the literature for the level restorer at the receiver side of the LHOS signaling schemes.

This paper introduces two new low-power LHOS signaling schemes (lhos–lhos and lhos-db) with high driving capability at the driver side and a suitable matching low-power level restorer at the receiver side. The lhos-receiver is a more complex design that improves delay performance at the expense of higher energy dissipation when the wire-length is large. The db-receiver is a simple double-buffer (db). We compare our proposed schemes with two other related designs. The quality metrics used for the comparative evaluation are delay performance, energy dissipation, energy × delay, area and design complexity, and sensitivity to noise and process parameters.

This paper is organized as follows. Section 2 presents the test platform and the circuit topologies of the two representative previous works used for the comparison. Section 3 presents the circuit structure of the proposed LHOS low-swing signaling schemes. Comparative measurement results and the analysis of sensitivity to noise and process parameters are reported in Section 4. Finally, Section 5 concludes the paper.

Footnote 1: It is also possible to construct a low-offset asymmetric (LOA) scheme where the signaling voltage range is between Vbus and Vddh, where Vbus ≥ 0. In the source follower drivers for LOA it is convenient to choose Vbus = |Vtp| or Vbus = 2|Vtp|, where Vtp is the pMOS transistor threshold voltage.

Corresponding author. Tel.: +98262 970 3120; fax: +98262 970 2204. E-mail addresses: [email protected] (J.C. García), [email protected] (J.A. Montiel-Nelson), [email protected], [email protected] (S. Nooshabadi).

0026-2692/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2008.12.003

2. Test architecture and previous work

Fig. 1. Interconnect signaling scheme: (a) test architecture and (b) the π wire model.

The test platform we use in this paper is shown in Fig. 1, and is the same as that used in [1,10]. All circuits were implemented in a 0.13 µm technology from UMC. We analyzed all the signaling schemes under identical loading conditions, power supply (Vddh = 1.0 V), and Vswing = 0.54 V. All circuits are simulated with a receiver output load capacitance of 25 fF. The interconnect is implemented in the metal-3 layer, with its length varying between 1 and 10 mm, and is modeled by a π3 distributed RC model (RI = 300 Ω/mm and CI = 0.23 pF/mm). An extra capacitive load CF of 0.25 pF per mm length of wire is distributed along the wire for the fanout.

Fig. 2. Circuit structure for the low–high-offset symmetric (LHOS) CMOS driver–receiver dc-db, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 3. Circuit structure for the high-offset asymmetric (HOA) CMOS driver–receiver sf-lr, with Vddh = 1.0 V, Vbus = 0.54 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 4. Circuit structure for the low–high-offset symmetric (LHOS) CMOS diode-connected dc-driver, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 5. Circuit structure for the CMOS double-buffer db-receiver, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 6. Circuit structure for the high-offset asymmetric (HOA) CMOS source follower sf-driver, with Vddh = 1.0 V, Vbus = 0.54 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 7. Circuit structure for the high-offset asymmetric (HOA) CMOS asymmetric level restorer lr-receiver, with Vddh = 1.0 V, Vbus = 0.54 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Figs. 2 (dc-db) and 3 (sf-lr) present the circuit topologies of the two previous works we use as the basis for comparison. The topology in Fig. 2 is the scheme reported in [10]. It contains the LHOS diode-connected dc-driver (Fig. 4) and the double-buffer db-receiver (Fig. 5). The topology in Fig. 3 combines the high-performance HOA-style source follower sf-driver from [1] (Fig. 6) at the driver end with the matching level restorer lr-receiver circuit from [5] (Fig. 7) at the receiver end. The magnitude of the symmetric voltage swing on the interconnect for the dc-db signaling scheme is computed as Vswing = (Vddh − |Vtp| − Vtn) = 0.54 V. The voltage swing for the sf-lr scheme is computed as Vswing = Vbus = 0.54 V.
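The test conditions of this section can be collected into a short numerical sketch. The Python below is only illustrative: it recomputes the LHOS bus levels from the quoted thresholds and expands the wire into lumped elements, assuming (this is not stated in the text) that the π3 model means three cascaded π sections and that the fanout load CF is spread uniformly along the wire. The function and constant names are hypothetical.

```python
# Illustrative sketch of the Section 2 test conditions (names are hypothetical).
VDDH = 1.0                 # nominal supply, V
VTN = 0.21                 # nMOS threshold, V
VTP_ABS = 0.25             # |Vtp|, pMOS threshold magnitude, V
R_PER_MM = 300.0           # wire resistance RI, ohm/mm
C_WIRE_PER_MM = 0.23e-12   # wire capacitance CI, F/mm
C_FANOUT_PER_MM = 0.25e-12 # distributed fanout load CF, F/mm


def lhos_levels():
    """Return (low level, high level, swing) of the LHOS bus in volts."""
    v_low = VTN
    v_high = VDDH - VTP_ABS
    return v_low, v_high, v_high - v_low


def pi_wire(length_mm, sections=3):
    """Expand the wire into `sections` cascaded pi sections.

    Assumption: each pi section carries half of its capacitance at either
    end, so interior nodes see a full section capacitance and the two end
    nodes see half of one.
    """
    r_total = R_PER_MM * length_mm
    c_total = (C_WIRE_PER_MM + C_FANOUT_PER_MM) * length_mm
    r_seg = r_total / sections
    c_seg = c_total / sections
    node_caps = [c_seg / 2] + [c_seg] * (sections - 1) + [c_seg / 2]
    return r_total, c_total, [r_seg] * sections, node_caps


if __name__ == "__main__":
    print("LHOS levels (V):", lhos_levels())
    print("10 mm wire:", pi_wire(10.0))
```

For a 10 mm wire this gives a total resistance of 3 kΩ and a total capacitance of 4.8 pF, of which 2.5 pF is the fanout load.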

3. lhos–lhos and lhos-db driver–receiver pairs circuit structures

Fig. 8 presents the circuit topology for the proposed lhos–lhos and lhos-db schemes with low-swing LHOS signaling, which are, respectively, faster and less energy consuming than the previously presented circuits. Figs. 9 and 10 show the circuit diagrams for the proposed lhos-driver and lhos-receiver, respectively. The magnitude of the symmetric voltage swing on the interconnect for the lhos–lhos and lhos-db schemes is identical to that of the dc-db driver–receiver, and is computed as Vswing = (Vddh − |Vtp| − Vtn) = 0.54 V.

Tables 1 and 2 show the transistor sizing for all the driver and receiver circuits, respectively, when optimized for the lowest energy × delay for 10, 5, and 1 mm wire-lengths. The channel length for all transistors is 0.13 µm.

Fig. 8. Circuit structure for the low–high-offset symmetric (LHOS) CMOS driver–receiver pairs lhos–lhos and lhos-db, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 9. Circuit structure for the low–high-offset symmetric (LHOS) CMOS lhos-driver, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Fig. 10. Circuit structure for the low–high-offset symmetric (LHOS) CMOS lhos-receiver, with Vddh = 1.0 V, Vtn = 0.21 V and |Vtp| = 0.25 V.

Table 1. Channel widths for transistors in dc-driver, lhos-driver, and sf-driver, optimized for the lowest energy × delay for 10, 5, and 1 mm wire-lengths. Entries with three values are given as 10/5/1 mm.

dc-driver (optimized for 10/5/1 mm wire-length; total area = 5.51/3.88/3.88 µm²)

Transistor(s)           Type   Width (µm)
PXIN0                   P      10.0
NXIN0                   N      5.0
M1                      N      10.0/4.0/4.0
M2, M3                  N      1.0/0.28/0.28
M4, M12                 N      0.28
M5                      P      0.28
M6                      P      0.28
M7                      P      1.0
M8                      P      10.0
M9                      P      1.0
M10                     P      0.5
M11                     P      0.28
M13                     N      0.5
M14                     N      1.0

lhos-driver (optimized for 10/5/1 mm wire-length; total area = 3.54/3.34/3.28 µm²)

Transistor(s)           Type   Width (µm)
PXIN0                   P      10.0
NXIN0                   N      5.0
PXIM1, PXIM2            P      0.28
NXIM1, NXIM2, NXIF1     N      0.28
PXIF1, MD0, MD1         P      0.28
MU0, MU1, MD2           N      0.28
MU2, MU3, MU4           P      0.28
MD3, MD4, MU5           N      0.28
MD5, MU8, MU11          P      0.28
MU7                     N      0.5/0.5/0.28
MD7                     P      0.5/0.5/0.28
MD8, MD11               N      0.28
MU10                    P      2.5/1.8/1.8
MD10                    N      2.5/1.8/1.8

sf-driver (optimized for 10/5/1 mm wire-length; total area = 5.0/5.0/4.3 µm²)

Transistor(s)           Type   Width (µm)
PXIN0                   P      10.0
PXIN1                   P      1.0/1.0/0.8
MU1                     N      20.0/20.0/15.0
NXIN0                   N      5.0
NXIN1                   N      0.5/0.5/0.5
MD1                     N      2.0/2.0/1.8

Important transistors for the purpose of optimization are highlighted in boldface. The channel length for all transistors is 0.13 µm. Triple well 0.13 µm 1.0/3.3 V process technology from UMC.

Table 2. Channel widths for transistors in lr-receiver, lhos-receiver, and db-receiver, optimized for the lowest energy × delay for 10, 5, and 1 mm wire-lengths. Entries with three values are given as 10/5/1 mm.

lr-receiver (optimized for 10/5/1 mm wire-length; total area = 1.24/1.24/1.15 µm²)

Transistor(s)           Type   Width (µm)
M1                      N      1.0
M2                      N      4.0
M3                      P      1.0/1.0/0.8
M4                      P      0.28
M5                      P      0.28
M6                      P      1.0
M7                      N      2.0/2.0/1.5

lhos-receiver (optimized for 10/5/1 mm wire-length; total area = 2.62/1.31/0.83 µm²)

Transistor(s)           Type   Width (µm)
M1                      N      10.0/3.5/0.5
M2                      N      0.4/0.28/0.28
M3                      P      0.4/0.4/0.28
M4                      P      0.4/0.28/0.28
M5, M7, M9              P      0.28
M6                      N      2.0
M8                      N      2.0/0.28/0.28
M10, M11, M14           N      0.28
M12                     P      0.28
M13                     P      1.0/0.7/0.28
M15                     N      1.5/0.4/0.28
M16                     P      0.5/0.28/0.28

db-receiver (optimized for 10/5/1 mm wire-length; total area = 0.28/0.20/0.17 µm²)

Transistor(s)           Type   Width (µm)
MU1                     P      0.8/0.5/0.28
MU2                     P      0.8/0.5/0.5
MD1                     N      0.28
MD2                     N      0.28

Important transistors for the purpose of optimization are highlighted in boldface. The channel length for all transistors is 0.13 µm. Triple well 0.13 µm 1.0/3.3 V process technology from UMC.

Optimization is carried out through a multi-dimensional sweep of the critical device sizes, within the range of interest, in HSPICE simulation. The lowest achievable energy × delay is taken as the optimum point.
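The sweep procedure can be sketched as an exhaustive search over a grid of candidate widths. The Python below is an illustration under stated assumptions, not the authors' flow: `toy_simulate` is a hypothetical stand-in for an HSPICE run, its coefficients are invented for demonstration only, and the choice of MU10 and MU7 as swept devices with the grid values shown is just an example.

```python
from itertools import product


def sweep_min_edp(candidates, simulate):
    """Exhaustively sweep device widths and keep the lowest energy x delay.

    candidates: dict mapping a transistor name to a list of widths (um).
    simulate:   callable taking a sizing dict and returning
                (energy_pJ, delay_ns); in practice this would wrap a
                circuit simulation.
    """
    names = list(candidates)
    best = None
    for combo in product(*(candidates[n] for n in names)):
        sizing = dict(zip(names, combo))
        energy_pj, delay_ns = simulate(sizing)
        edp = energy_pj * delay_ns
        if best is None or edp < best[0]:
            best = (edp, sizing)
    return best


def toy_simulate(sizing):
    """Hypothetical analytical stand-in for a simulator run.

    Wider devices cost energy but reduce delay; the numbers are made up.
    """
    w_out = sizing["MU10"]
    w_pre = sizing["MU7"]
    energy_pj = 0.5 + 0.1 * w_out + 0.2 * w_pre
    delay_ns = 1.0 + 4.0 / w_out + 0.5 / w_pre
    return energy_pj, delay_ns


if __name__ == "__main__":
    grid = {"MU10": [1.0, 1.8, 2.5], "MU7": [0.28, 0.5]}
    best_edp, best_sizing = sweep_min_edp(grid, toy_simulate)
    print(best_sizing, round(best_edp, 3))
```

An exhaustive grid is practical here because only a few devices per circuit are treated as critical; the number of simulations grows with the product of the grid sizes.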

Table 3 presents the sum of the active areas of the driver and receiver for all the signaling schemes, optimized for the three representative wire-lengths. When optimized for the lowest energy × delay for 10 mm wire-length, the total active area of lhos–lhos (6.16 µm²) is 61% and 6% more than that of lhos-db (3.82 µm²) and dc-db (5.79 µm²), respectively, and 1% less than that of sf-lr (6.24 µm²). Figs. 11, 12 and 13 show the layouts of the lhos-driver, lhos-receiver and db-receiver circuits. The respective areas are 2.97 µm × 16.44 µm = 48.82 µm², 2.97 µm × 17.28 µm = 51.32 µm², and 2.16 µm × 3.02 µm = 6.52 µm².

Table 3. Total active areas for the dc-db, lhos–lhos, lhos-db, and sf-lr signaling schemes, optimized for the lowest energy × delay for 10, 5, and 1 mm wire-lengths. Process: UMC 0.13 µm.

Scheme       10 mm (µm²)   5 mm (µm²)   1 mm (µm²)
dc-db        5.79          4.08         4.05
lhos–lhos    6.16          4.65         4.12
lhos-db      3.82          3.54         3.46
sf-lr        6.24          6.24         5.45

The active area for lhos-db is the lowest for all wire-lengths in the range. Triple well 0.13 µm 1.0/3.3 V process technology from UMC.

Fig. 11. The energy × delay optimized layout for lhos-driver in UMC 0.13 µm process.

Fig. 12. The energy × delay optimized layout for lhos-receiver in UMC 0.13 µm process.

Fig. 13. The energy × delay optimized layout for db-receiver in UMC 0.13 µm process.

3.1. Operation of the driver circuit

The key to the design of a high-performance, low-energy interconnect driver is to provide a large driving capability during the transitions while limiting the voltage swing. This requires special drivers that go into high-driving mode only during the transitions. In the lhos-driver circuit in Fig. 9, the transistor pairs (MU10, MU11) and (MD10, MD11) provide the diode-connected paths that limit the voltage swing to (Vddh − |Vtp|) and Vtn, respectively. The transistor pairs (MU7, MU10) and (MD7, MD10), on the other hand, provide the large driving capability during the low-to-high and high-to-low transitions, respectively. For this driver to work, however, it requires feedback. The combination of XIF1 and the gates in the upper and lower arms in Fig. 9 provides the necessary feedback for the brief activation of the (MU7, MU10) and (MD7, MD10) pairs during the low-to-high and high-to-low transitions, respectively. The operation of the lhos-driver circuit in Fig. 9 can be explained as follows.

Low state at the output, out: For the output out in the low state we have inn = out = low, ou1 = high, and ou2 = low, with MU7, MU10, and MU11 off, and MU8 on. In this state, the output is driven low through the diode-connected pair MD10–MD11.

Low-to-high transition at the output, out: After a low-to-high transition at inn, due to the delay in the feedback loop (XIF1), ou1 and ou3 will go low, and ou2 will go high briefly. This causes MU7, and consequently MU10, to turn on and strongly pull the output node out high, charging up the output load. The feedback loop eventually returns ou3 and ou2 to their steady-state values of high and low, respectively, turning MU7 off and disabling it from driving the gate of MU10. However, transistor MU11, which was turned on after inn went high, remains on, providing a diode-connected configuration (pair MU10–MU11) that maintains the output voltage at approximately (Vdd − |Vtp|).

High state at the output, out: For the output in the high state we have inn = out = high, od1 = low, and ou2 = high, with MD7, MD10, and MD11 off, and MD8 on. In this state, the output is driven high through the diode-connected pair MU10–MU11.

High-to-low transition at the output: After a high-to-low transition at inn, due to the delay in the feedback loop (XIF1), od1 and od3 will go high, and od2 will go low briefly. This causes MD7, and consequently MD10, to turn on and strongly pull the output node out low, discharging the output load. The feedback loop eventually returns od3 and od2 to their steady-state values of low and high, respectively, turning MD7 off and disabling it from driving the gate of MD10. However, transistor MD11, which was turned on after inn went low, remains on, providing a diode-connected configuration (pair MD10–MD11) that maintains the output voltage at approximately Vtn.

The optimized sizing reported in Table 1 depends on the length of the interconnect. Longer wires require larger driver transistors MU10 and MD10, which in turn require larger MU7 and MD7.
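The four driver states described above can be summarized as a small behavioural lookup. The Python below is only a bookkeeping sketch of that description, not a circuit model: it records which pair gives the brief strong drive, which diode-connected pair holds the bus afterwards, and the approximate steady bus level. The function name and dictionary keys are hypothetical.

```python
VDDH = 1.0      # supply, V
VTN = 0.21      # nMOS threshold, V
VTP_ABS = 0.25  # |Vtp|, V


def lhos_driver_phase(inn_high):
    """Summarize the lhos-driver behaviour for a given input level.

    Returns the transistor pair that is briefly active during the
    transition, the diode-connected pair that holds the bus afterwards,
    and the approximate steady-state bus level in volts.
    """
    if inn_high:
        return {
            "boost_pair": ("MU7", "MU10"),   # brief strong pull-up
            "hold_pair": ("MU10", "MU11"),   # diode-connected hold
            "hold_level_v": VDDH - VTP_ABS,
        }
    return {
        "boost_pair": ("MD7", "MD10"),       # brief strong pull-down
        "hold_pair": ("MD10", "MD11"),       # diode-connected hold
        "hold_level_v": VTN,
    }


if __name__ == "__main__":
    high = lhos_driver_phase(True)
    low = lhos_driver_phase(False)
    print(high)
    print(low)
    print("bus swing (V):", round(high["hold_level_v"] - low["hold_level_v"], 2))
```

With the quoted thresholds, the two hold levels are about 0.75 V and 0.21 V, which reproduces the 0.54 V swing used throughout the comparison.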

3.2. Operation of the receiver circuit

The key design requirement for the receiver at the end of a long interconnect in the proposed LHOS signaling scheme is the ability to restore the logic levels back to the normal levels in both directions (Vddh and GND). The proposed bi-directional level converter requires two features to function properly. First, it should prevent current from flowing back from the higher Vddh potential at the receiver side to the lower (Vdd − |Vtp|) potential on the bus. Second, due to the symmetric nature of the LHOS signaling scheme, a complementary feature is required for the flow of current from GND on the receiver side to the Vtn potential on the bus. The operation of the lhos-receiver circuit in Fig. 10 can be explained as follows.

In the lhos-receiver circuit, the pass transistor M1 isolates the internal node 2 from the previous stage. Without it, the lower potential from the previous stage causes current to flow from Vddh through M3 back to the driver side. With node 2 isolated, the feedback transistor M4 can pull up the gate of M3 above the high-swing voltage level at the input Vin. The proposed lhos-receiver uses the inverter (M13–M14) and transistor M15 to reduce the output pull-down transition time. Splitting the pull-up for node 2 between M4 and M5 helps to reduce the load on node 3 and reduces energy consumption without hurting the performance [5]. The introduction of M11 and M12 ensures that there is no static power dissipation when M2 is not fully turned off while Vin is low. Finally, the lhos-receiver improves the low-to-high propagation delay through the introduction of the additional pull-up transistor M16. The energy × delay rating of this topology is similar to that of the lr-receiver. Meanwhile, the lhos-receiver trades 7 times higher energy consumption for a factor of 7 higher speed over the lr-receiver.

The optimized sizing reported in Table 2 depends on the length of the interconnect. Longer wires require faster receiver circuits to compensate for the longer delay on the interconnect. This is achieved by increasing the sizes of transistors M1, M3 and M4 and of the two inverters.

Fig. 15. Waveforms of lhos–lhos driver of Fig. 8. Node voltages V(inlhos), V(outlhos), V(inw) and V(outw) correspond to the input of lhos-driver, the output of lhos-receiver, the driver and receiver ends of the wire, respectively. Wire-length is 10 mm. [Axes: voltages (V) versus time (s).]

4. Comparative evaluation

4.1. Delay, energy and energy × delay evaluation

4.1.1. Evaluation for 10 mm wire-length

Fig. 14 presents transient timing simulation results at nodes in and out for the circuit topologies of the four signaling schemes at 10 mm wire-length. As seen, the proposed lhos–lhos and lhos-db have significantly faster transitions at the out node. Fig. 15 presents transient timing simulation results for the input and output nodes in and out, and for the internal nodes inw and outw at the driver and receiver ends of the wire, for the lhos–lhos circuit topology of Fig. 8. The voltage swings at the inw and outw nodes are only 200 and 100 mV, respectively. The waveforms demonstrate the ability of the lhos-receiver to recover the transmitted signal in the presence of severe degradation.

Fig. 16 presents the delay versus the wire-length for the four signaling schemes, when optimized for 10 mm wire-length. The performance of the proposed lhos–lhos signaling scheme, at the wire-length of 10 mm and extra fanout capacitance of 2.5 pF, is 47%, 22% and 14% better than that of sf-lr, dc-db, and lhos-db, respectively. The superior performance of the lhos–lhos signaling scheme is due to the higher driving capability of the lhos-driver during the transitions.

Fig. 17 shows the energy dissipation versus the wire-length for the four signaling schemes. The proposed lhos–lhos, at the wire-length of 10 mm and extra fanout capacitance of 2.5 pF, consumes 25%, 25% and 53% more energy than sf-lr, dc-db and lhos-db, respectively. The higher energy dissipation of lhos–lhos comes from the lhos-receiver, which has higher node capacitance.

Fig. 18 presents the energy × delay versus the wire-length for the four signaling schemes. The proposed lhos–lhos performs 34% and 2% better than sf-lr and dc-db, respectively, and 32% worse than lhos-db, at the wire-length of 10 mm and extra fanout capacitance of 2.5 pF.

The components of delay, energy consumption and energy × delay in the four signaling schemes are enumerated in Table 4. The component of energy dissipation at the driver side is solely due to self-loading effects. Further, it should be noted that our HSPICE simulations have shown that the contribution of the static components to the overall energy consumption is insignificantly small (< 0.1%).

Fig. 14. Waveforms of lhos–lhos, lhos-db, dc-db and sf-lr for wire-length of 10 mm, with Vddh = 1.0 V, Vswing = 0.54 V, Vtn = 0.21 V, and |Vtp| = 0.25 V. [Axes: voltages (V) versus time (s).]

Fig. 16. Propagation delay time versus the wire-length for UMC 0.13 µm process. Signaling schemes lhos–lhos and lhos-db provide the best performing signaling schemes for all wire-lengths in the range. For the longer wire-lengths lhos–lhos performs significantly better than lhos-db. [Axes: delay (ns) versus line length (mm).]

Fig. 17. Energy consumption versus the wire-length for UMC 0.13 µm process. While lhos–lhos has significantly higher energy dissipation, the energy rating of lhos-db is significantly lower for the longer wire-lengths. [Axes: energy (pJ) versus line length (mm).]

Fig. 18. Energy × delay versus the wire-length for UMC 0.13 µm process. Signaling scheme lhos-db provides the lowest energy × delay values among the signaling schemes for all wire-lengths in the range. [Axes: energy delay product (pJ × ns) versus line length (mm).]

4.1.2. Evaluation for 5 mm wire-length

When the wire-length is reduced to 5 mm, the energy × delay advantage of lhos–lhos with respect to dc-db remains the same. However, it loses to sf-lr and lhos-db by 7% and 49%, respectively. Note that these values are obtained with the driver–receiver circuits optimized for a 10 mm wire-length. Reoptimization for the lowest energy × delay point at 5 mm wire-length improves the energy × delay performance of lhos–lhos, which then outperforms dc-db and sf-lr by 12% and 5%, respectively. It still loses to lhos-db, but the loss is reduced from 49% to 33%. The reoptimized entries for the four signaling schemes in Table 4 are the marked 5 mm entries.

4.1.3. Evaluation for 1 mm wire-length

When the wire-length is further reduced to 1 mm, the energy × delay advantage of lhos–lhos with respect to dc-db improves to 27%. It loses to lhos-db by a smaller margin of 41%. However, its loss with respect to sf-lr is very significant, standing at 121%. Again, these values are obtained with the driver–receiver circuits optimized for a 10 mm wire-length. Reoptimization for the lowest energy × delay point at 1 mm wire-length improves the energy × delay performance of lhos–lhos. It outperforms dc-db by 32%. Its loss to lhos-db and sf-lr is reduced to 6% and 67%, respectively. The reoptimized entries for the four signaling schemes in Table 4 are the marked 1 mm entries.

From the data in Table 4 it is evident that reoptimization for the lowest energy × delay point at 5 and 1 mm wire-lengths trades lower energy dissipation for higher propagation delay. One significant benefit of reoptimization is the reduction in silicon area of the driver–receiver circuits, as shown in Table 3. The reduction in the total area of the driver–receiver circuits is most noticeable for lhos–lhos, where an area saving of 33% is obtained.

4.2. A discussion on the wire-length regions of interest

From an analysis of the data in Table 4 the following observations can be made.

- Short wire-length: For short wire-lengths of less than 1 mm, the sf-lr scheme performs better than the other schemes with respect to delay, energy consumption and energy × delay. Therefore, sf-lr is the more appropriate signaling scheme at the shorter wire-lengths. It should, however, be noted that the area requirement for sf-lr is 58% more than that of the next best option, lhos-db. Further, the sf-lr scheme requires two power supplies.

- Medium wire-length: For medium wire-lengths of around 5 mm, the delay performance of the proposed lhos–lhos is the best among all the signaling schemes. However, it is only 2% better than the other proposed scheme, lhos-db, while the energy consumption and energy × delay of lhos-db are better than those of the other signaling schemes. Therefore, the proposed lhos-db is the appropriate signaling scheme at the medium wire-lengths. Further, the area requirement for lhos-db is 24% less than that of lhos–lhos.

- Long wire-length: For longer wire-lengths of around 10 mm, lhos–lhos is significantly better than the other signaling schemes with respect to delay performance. However, lhos-db is the evident winner, by a significant margin, in terms of both energy consumption and energy × delay. So if delay performance is the only design criterion, lhos–lhos is the most suitable choice. However, if energy dissipation and energy × delay are the design criteria, lhos-db is the most suitable scheme. Considering that lhos–lhos performs only 5% better than lhos-db in terms of propagation delay, lhos-db should overall be considered the most suitable signaling scheme. This is especially true considering that the area requirement for lhos-db is also 38% less than that of lhos–lhos.
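The 10 mm percentages quoted in Section 4.1.1 can be re-derived from the total energy and delay entries of Table 4. The short Python sketch below does this; the dictionary layout and function names are illustrative, and each figure is taken relative to the scheme being compared against, which is the convention that reproduces the quoted numbers.

```python
# Table 4 totals at 10 mm wire-length: (energy in pJ, delay in ns).
TOTALS_10MM = {
    "lhos-lhos": (1.58, 4.25),
    "dc-db": (1.26, 5.44),
    "lhos-db": (1.03, 4.95),
    "sf-lr": (1.26, 8.09),
}


def pct_change(ours, other):
    """Signed percentage by which `ours` differs from `other`.

    Negative means lower (better for delay, energy and energy x delay).
    """
    return 100.0 * (ours - other) / other


def compare_to(reference="lhos-lhos"):
    """Compare the reference scheme against every other scheme."""
    e_ref, d_ref = TOTALS_10MM[reference]
    result = {}
    for name, (energy, delay) in TOTALS_10MM.items():
        if name == reference:
            continue
        result[name] = {
            "delay": pct_change(d_ref, delay),
            "energy": pct_change(e_ref, energy),
            "edp": pct_change(e_ref * d_ref, energy * delay),
        }
    return result


if __name__ == "__main__":
    for name, row in compare_to().items():
        print(name, {k: round(v) for k, v in row.items()})
```

Run on the tabulated totals, the sketch gives delay differences of about −47%, −22% and −14%, energy differences of about +25%, +25% and +53%, and energy × delay differences of about −34%, −2% and +32% against sf-lr, dc-db and lhos-db, respectively.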

4.3. Reliability analysis

Similar to the works in [1,10], we have considered reliability degradation due to process variations, voltage supply noise and interline crosstalk using the worst-case method presented in [4]. The sources of noise and the associated parameters used for the UMC 0.13 µm process are presented in Table 5. In Table 6 the values of the signal-to-noise ratio (SNR) are assessed as

SNR = (0.5 × Vswing) / VNoise,

ARTICLE IN PRESS J.C. Garcı´a et al. / Microelectronics Journal 40 (2009) 1571–1581

1579

Table 4 Breakdown of energy, delay and energy  delay versus wire-length for dc-db, lhos–lhos, lhos-db, and sf-lr (Vddh ¼ 1:0 V, Vtn ¼ 0:21 V and jVtpj ¼ 0:25 V). dc-wire-db Length (mm)

1.0 1:0 5.0 5:0 10.0

Energy (pJ)

Delay (ns)

dc

Line

db

Total

dc

Line

db

Total

0.05 0.05 0.05 0.03 0.05

0.33 0.24 0.81 0.57 0.96

0.05 0.03 0.15 0.16 0.25

0.43 0.32 1.01 0.76 1.26

0.13 0.15 0.13 0.10 0.13

0.42 0.43 2.18 3.00 4.76

0.27 0.30 0.41 0.46 0.55

0.82 0.88 2.72 3.56 5.44

Energy  delay ðpJ  nsÞ

1.0 1:0 5.0 5:0 10.0

dc

Line

db

Total

0.01 0.01 0.01 0.00 0.01

0.14 0.10 1.76 1.71 4.56

0.01 0.01 0.06 0.07 0.14

0.35 0.28 2.75 2.71 6.85

lhos-wire-lhos Energy (pJ)

Delay (ns)

lhos-dri

Line

lhos-rec

Total

lhos-dri

Line

lhos-rec

Total

1.0 1:0 5.0 5:0

0.05 0.05 0.05 0.05

0.28 0.23 0.67 0.58

0.10 0.06 0.51 0.42

0.43 0.34 1.23 1.05

0.10 0.10 0.10 0.09

0.30 0.25 1.64 1.90

0.20 0.22 0.45 0.28

0.60 0.57 2.19 2.27

10.0

0.05 0.71 Energy  delay ðpJ  nsÞ

0.82

1.58

0.10

3.58

0.57

4.25

lhos-dri

Line

lhos-rec

Total

0.05 0.01 0.05 0.00 0.05

0.08 0.06 1.09 1.10 2.53

0.02 0.01 0.23 0.12 0.47

0.26 0.19 2.69 2.38 6.72

1.0 1:0 5.0 5:0 10.0 lhos-wire-db

Energy (pJ)

1.0 1:0 5.0 5:0 10.0

Delay (ns)

lhos-dri

Line

db

Total

lhos-dri

Line

db

Total

0.05 0.05 0.05 0.05 0.05

0.22 0.22 0.61 0.54 0.67

0.05 0.04 0.21 0.18 0.31

0.32 0.31 0.87 0.77 1.03

0.10 0.10 0.10 0.09 0.10

0.26 0.25 1.67 2.23 4.58

0.21 0.25 0.31 0.00 0.27

0.57 0.60 2.08 2.32 4.95

Energy  delay ðpJ  nsÞ

1.0 1:0 5.0 5:0 10.0

lhos-dri

Line

db

Total

0.05 0.01 0.05 0.00 0.05

0.06 0.05 1.09 1.20 3.05

0.01 0.01 0.07 0.00 0.08

0.18 0.18 1.81 1.79 5.10

sf-wire-lr Energy (pJ)

1.0 1:0 5.0 5:0 10.0

Delay (ns)

sf

Line

lr

Total

sf

Line

lr

Total

0.05 0.04 0.05 0.05 0.05

0.13 0.13 0.70 0.70 1.09

0.04 0.04 0.07 0.07 0.12

0.22 0.21 0.82 0.82 1.26

0.08 0.08 0.08 0.08 0.08

0.14 0.20 1.55 1.55 4.05

0.31 0.29 1.44 1.44 3.96

0.53 0.56 3.07 3.07 8.09

ARTICLE IN PRESS J.C. Garcı´a et al. / Microelectronics Journal 40 (2009) 1571–1581

1580

Table 4 (continued ) dc-wire-db Length (mm)

Energy (pJ) dc

Delay (ns) Line

db

Total

dc

Line

db

Total

Energy  delay ðpJ  nsÞ

1.0 1:0 5.0 5:0 10.0

sf

Line

lr

Total

0.00 0.00 0.00 0.00 0.00

0.02 0.03 1.09 1.09 4.42

0.01 0.01 0.10 0.10 0.48

0.12 0.12 2.52 2.52 10.19

Triple well 0:13 mm 1.0/3.3 V process technology from UMC. Entries 1 and 5 correspond to the driver–receiver circuits optimized for the minimum energy  delay, for 1 mm and 5 mm wire-lengths. All other entries are for 10 mm optimized circuits.

Table 5 Noise sources and parameters.

Process: UMC 0.13 μm. Driver–receiver circuit: $V_{ddh} = 1.0$ V, $V_{bus} = 0.54$ V, $V_{tn} = 0.21$ V and $\lvert V_{tp}\rvert = 0.25$ V.

| Parameter | Definition |
|---|---|
| $K_C$ | Crosstalk coupling coefficient for metal 3: $K_C = \frac{C_C}{C_W + C_L + C_C} = 0.09$, with $C_W = 0.23$ pF/mm for 0.23 μm width, $C_L = 0.25$ pF/mm, and $C_C = 0.05$ pF/mm for 0.4 μm spacing |
| $Attn_C$ | Static driver crosstalk noise attenuation, $Attn_C = 20\%$ |
| $K_{PS}$ | Power supply noise due to interconnect signal switching, $K_{PS} = 5\%$ |
| $K_N$ | $K_N = Attn_C \times K_C + K_{PS}$ |
| $Rx_O$ | Receiver input offset |
| $Rx_S$ | Receiver sensitivity |
| $PS$ | Signal unrelated power supply noise, $PS = 0.05 \times V_{ddh}$ |
| $Attn_{PS}$ | Power supply attenuation coefficient, $Attn_{PS} = \Delta V_{th} / \Delta V_{ddh}$, where $V_{th}$ is the receiver gate switching threshold |
| $Tx_O$ | Driver offset |
| $V_{UR}$ | $V_{UR} = Rx_O + Rx_S + Attn_{PS} \times PS + Tx_O$ |
| $V_{Noise}$ | $V_{Noise} = K_N V_{swing} + V_{UR}$ |

Table 6 Noise analysis.

| Parameter | lhos-db | lhos–lhos | dc-db | sf-lr | Units |
|---|---|---|---|---|---|
| $V_{swing}$ | 0.54 | 0.54 | 0.54 | 0.54 | V |
| $K_C$ | 0.09 | 0.09 | 0.09 | 0.09 | – |
| $Attn_C$ | 0.20 | 0.20 | 0.20 | 0.20 | – |
| $K_{PS}$ | 0.05 | 0.05 | 0.05 | 0.05 | – |
| $K_N$ | 0.07 | 0.07 | 0.07 | 0.07 | – |
| $K_N \times V_{swing}$ | 0.04 | 0.04 | 0.04 | 0.04 | V |
| $Rx_O$ | 0.06 | 0.07 | 0.06 | 0.04 | V |
| $Rx_S$ | 0.03 | 0.03 | 0.03 | 0.02 | V |
| $PS$ | 0.05 | 0.05 | 0.05 | 0.05 | V |
| $Attn_{PS}$ | 0.39 | 0.46 | 0.39 | 0.45 | – |
| $Tx_O$ | 0.02 | 0.02 | 0.02 | 0.02 | V |
| $V_{UR}$ | 0.13 | 0.14 | 0.13 | 0.10 | V |
| $V_N$ | 0.17 | 0.18 | 0.17 | 0.14 | V |
| SNR | 1.61 | 1.49 | 1.61 | 1.92 | – |

where $V_{Noise}$ is the total noise introduced in the interconnect, estimated as $K_N V_{swing} + V_{UR}$. In this case, the term $K_N V_{swing}$ accounts for the components of noise that are proportional to the amplitude of the voltage swing on the interconnect, $V_{swing} = 0.54$ V, and $V_{UR}$ represents the components of the noise that are unrelated to $V_{swing}$ [4]. The SNR ratings for all four signaling schemes are very similar to each other, to within 19%.
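As a worked illustration of the noise estimate, the sketch below evaluates $K_C$, $K_N$, $V_{UR}$ and $V_{Noise}$ from the definitions in Table 5, using the per-scheme receiver and driver parameters listed in Table 6. The function and variable names are illustrative assumptions. Intermediate values are kept unrounded, so the last digit can differ slightly from the two-decimal entries in the table.

```python
# Noise budget from the Table 5 definitions (a sketch; names are illustrative).
V_DDH = 1.0      # supply voltage (V)
V_SWING = 0.54   # interconnect voltage swing (V)
ATTN_C = 0.20    # static driver crosstalk noise attenuation
K_PS = 0.05      # supply noise due to interconnect signal switching
PS = 0.05 * V_DDH  # signal-unrelated power supply noise (V)

# Per-scheme parameters transcribed from Table 6.
SCHEMES = {
    "lhos-db":   {"rx_o": 0.06, "rx_s": 0.03, "attn_ps": 0.39, "tx_o": 0.02},
    "lhos-lhos": {"rx_o": 0.07, "rx_s": 0.03, "attn_ps": 0.46, "tx_o": 0.02},
    "dc-db":     {"rx_o": 0.06, "rx_s": 0.03, "attn_ps": 0.39, "tx_o": 0.02},
    "sf-lr":     {"rx_o": 0.04, "rx_s": 0.02, "attn_ps": 0.45, "tx_o": 0.02},
}


def coupling_coefficient(c_w, c_l, c_c):
    """K_C = C_C / (C_W + C_L + C_C), capacitances in pF/mm."""
    return c_c / (c_w + c_l + c_c)


def noise_budget(k_c, rx_o, rx_s, attn_ps, tx_o):
    """Return K_N, the swing-proportional noise, V_UR and V_Noise (in V)."""
    k_n = ATTN_C * k_c + K_PS                 # K_N = Attn_C * K_C + K_PS
    v_prop = k_n * V_SWING                    # noise proportional to the swing
    v_ur = rx_o + rx_s + attn_ps * PS + tx_o  # swing-unrelated noise
    return {"K_N": k_n, "K_N*Vswing": v_prop,
            "V_UR": v_ur, "V_Noise": v_prop + v_ur}


if __name__ == "__main__":
    k_c = coupling_coefficient(c_w=0.23, c_l=0.25, c_c=0.05)
    print(f"K_C = {k_c:.2f}")
    for name, params in SCHEMES.items():
        b = noise_budget(k_c, **params)
        print(f"{name:10s} K_N={b['K_N']:.2f}  V_UR={b['V_UR']:.2f} V  "
              f"V_Noise={b['V_Noise']:.2f} V")
```

With these inputs the computed $V_{UR}$ and $V_{Noise}$ agree with the Table 6 rows to within rounding; the sf-lr scheme has the smallest unrelated noise because of its lower receiver offset and sensitivity terms.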

4.4. Design corners sensitivity analysis

The signaling schemes discussed in this paper were analyzed using the parameters from the typical–typical design corner. In order to measure the sensitivity of the signaling schemes to process variations across the interconnect, we subjected the driver and receiver circuits to extreme design corners. We tried two cases: (1) slow–slow corner at the driver side and fast–fast corner at the receiver side of the interconnect, and (2) fast–fast corner at the driver side and slow–slow corner at the receiver side of the interconnect. The respective threshold voltages of the nMOS and pMOS transistors for the fast–fast corner are $V_{tn,ff} = 0.19$ V and $\lvert V_{tp}\rvert_{ff} = 0.23$ V. The corresponding threshold voltages for the slow–slow corner are $V_{tn,ss} = 0.24$ V and $\lvert V_{tp}\rvert_{ss} = 0.27$ V. From the simulation results the following observations can be made.

• The combined deviation from the typical–typical corner at the driver and receiver sides degrades the energy, delay and energy × delay characteristics. One notable exception is the lhos–lhos scheme, where improvements in delay and energy × delay are observed for the slow–slow driver and fast–fast receiver. The reason for the anomaly is that the lhos receiver is relatively slow and benefits to a larger extent from operating in the fast–fast corner.

• For all signaling schemes the overall performance (energy, delay and energy × delay) of the slow–slow driver, fast–fast receiver case is much better than that of the fast–fast driver, slow–slow receiver case. This indicates that the signaling schemes are much more sensitive to variation in process parameters at the receiver end than at the driver end of the interconnect. As a matter of fact, the sf-lr signaling scheme fails to function for the fast–fast driver and slow–slow receiver case. The one exception, where the sensitivity to process variation is higher at the driver end than at the receiver end, is the dc-db signaling scheme.
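To give a sense of how far the extreme corners move the device thresholds, the short sketch below computes the relative threshold shift of each corner with respect to the values $V_{tn} = 0.21$ V and $\lvert V_{tp}\rvert = 0.25$ V listed with Table 5. Treating those two values as the typical–typical thresholds is an assumption, and the names used here are illustrative.

```python
# Relative threshold-voltage shift of the extreme corners (a sketch).
# Assumption: the Table 5 values are the typical-typical thresholds.
TYPICAL = {"nMOS": 0.21, "pMOS": 0.25}   # V_tn and |V_tp| in volts
CORNERS = {
    "fast-fast": {"nMOS": 0.19, "pMOS": 0.23},
    "slow-slow": {"nMOS": 0.24, "pMOS": 0.27},
}

# The two mixed cases analyzed in the text, as (driver corner, receiver corner).
CASES = [("slow-slow", "fast-fast"), ("fast-fast", "slow-slow")]


def threshold_shift(corner):
    """Percentage shift of each threshold magnitude relative to typical."""
    return {
        device: 100.0 * (CORNERS[corner][device] - TYPICAL[device]) / TYPICAL[device]
        for device in TYPICAL
    }


if __name__ == "__main__":
    for driver, receiver in CASES:
        d, r = threshold_shift(driver), threshold_shift(receiver)
        print(f"driver {driver}: nMOS {d['nMOS']:+.1f}%, pMOS {d['pMOS']:+.1f}% | "
              f"receiver {receiver}: nMOS {r['nMOS']:+.1f}%, pMOS {r['pMOS']:+.1f}%")
```

Under this assumption the corner thresholds lie within roughly 8% to 14% of the typical values, so the two mixed cases apply threshold shifts of opposite sign at the two ends of the interconnect.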

5. Conclusions

Two new LHOS signaling schemes were designed in this paper. The proposed lhos–lhos and lhos-db schemes require neither reference voltages nor multiple threshold voltage processes. The proposed schemes perform better than the other schemes in terms of delay, energy and energy × delay for moderate to long wire-lengths. Comparative analysis of sensitivity to noise and process parameters shows that the proposed LHOS signaling schemes perform as well as or better than the other signaling schemes.

References

[1] H. Zhang, V. George, J.M. Rabaey, Low-swing on-chip signaling techniques: effectiveness and robustness, IEEE Trans. VLSI Syst. 8 (3) (2000) 264–272.
[2] E. Kusse, J.M. Rabaey, Low-energy embedded FPGA structures, in: International Symposium on Low Power Electronics and Design, August 1998, Monterey, CA, USA, pp. 155–160.
[3] J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits, Prentice-Hall, Englewood Cliffs, NJ, 2003.
[4] W. Dally, J. Poulton, Digital Systems Engineering, Cambridge University Press, Cambridge, 1998.
[5] S.H. Kulkarni, D. Sylvester, High performance level conversion for dual VDD design, IEEE Trans. VLSI Syst. 12 (9) (2004) 926–936.
[6] A. Narasimhan, M. Kasotiya, R. Sridhar, A low-swing differential signaling scheme for on-chip global interconnects, in: IEEE International Conference on VLSI Design, January 2005, Kolkata, India, pp. 634–639.
[7] J.C. García, J.A. Montiel-Nelson, S. Nooshabadi, Bootstrapped full-swing CMOS driver for low supply voltage operation, in: Design, Automation and Test in Europe Conference and Exhibition, vol. 1, March 2006, Munich, Germany, pp. 1–2.
[8] J.C. García, J.A. Montiel-Nelson, J. Sosa, H. Navarro, A direct bootstrapped CMOS large capacitive-load driver circuit, in: Design, Automation and Test in Europe Conference and Exhibition, vol. 1, February 2004, Paris, France, pp. 680–681.
[9] A. Rjoub, O. Koufopavlou, Efficient drivers, receivers and repeaters for low power CMOS bus architectures, in: IEEE International Conference on Electronics, Circuits and Systems, vol. 2, September 1999, Pafos, Cyprus, pp. 789–794.
[10] M. Ferretti, P.A. Beerel, Low swing signaling using a dynamic diode-connected driver, in: Solid-State Circuits Conference, September 2001, Villach, Austria, pp. 369–372.
[11] J.C. García, J.A. Montiel-Nelson, S. Nooshabadi, Adaptive low/high voltage swing CMOS driver for on-chip interconnects, in: International Symposium on Circuits and Systems, May 2007, New Orleans, USA.