Physica C 378–381 (2002) 1475–1480 www.elsevier.com/locate/physc
Design and component test of RSFQ packet decoders for shift register memories
K. Fujiwara, H. Hoshina, J. Koshiyama, N. Yoshikawa *
Department of Electrical and Computer Engineering, Faculty of Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya, Yokohama 240-8501, Japan
Received 27 September 2001; accepted 22 November 2001
Abstract
We present a design framework for shift register memories that can serve as the high-speed register file of an RSFQ microprocessor. The proposed shift register memory consists of an array of shift registers and a packet decoder that switches a high-speed serial data stream into the destination shift register. The target clock frequency is 20 GHz, assuming a 1 kA/cm² Nb standard process. The concept of data-driven self-timing (DDST) is employed to ease the timing design of the synchronous RSFQ circuit. We also show the design details of the DDST RSFQ packet decoder, which is composed of one-to-two DDST RSFQ packet switches. A D3 flip-flop, the main building element of the one-to-two DDST RSFQ packet switch, is newly developed as a non-destructive memory cell. A low-speed test shows that the DC bias margin of the D3 flip-flop is 32%. We have also estimated the latency and the circuit area of the DDST packet decoder.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 85.25.Hv; 85.25.Na; 85.40.Bh
Keywords: Superconducting device; RSFQ circuits; SFQ; Shift register; Memory; Data-driven self-timing
* Corresponding author. Tel.: +81-45-339-4259; fax: +81-45-338-1157. E-mail address: [email protected] (N. Yoshikawa).
0921-4534/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0921-4534(02)01760-4

1. Introduction
The lack of a high-density, high-speed memory is a serious impediment to the realization of large-scale RSFQ digital systems [1]. A shift register memory is one candidate for overcoming this drawback because of its high throughput and compact circuit structure. We have proposed that a shift register memory can be employed as the main memory of a high-end SFQ server to achieve a large processor-memory bandwidth [2]. The SFQ shift register memory has the following advantages:
(i) Low-power, DC-powered operation.
(ii) Simple and compact memory cell structure.
(iii) High throughput due to the high-frequency operation of the RSFQ shift register.
(iv) Short access time due to the small delay of the packet decoder, which is composed of a binary tree of packet switches.
In this paper, we show a design framework for the shift register memory. The shift register memory proposed here is composed of an array of shift registers and a packet decoder that switches the input data stream into the shift register selected by the address data. Once the address is selected, the serial data stream goes to the same destination shift register. Address data are also distributed along with the input data to improve the access time. To reduce the difficulty of timing design in synchronous RSFQ circuits, we use the concept of data-driven self-timing (DDST) [3] throughout the design of the shift register memory system. We also show the design details of one of the key components, a DDST packet decoder, which is composed of one-to-two DDST packet switches. We have estimated the access time and the circuit area of the DDST packet decoder assuming present and future Nb integrated circuit technology. A D3 flip-flop, the main building block of the one-to-two DDST packet switch, is newly developed as a non-destructive memory cell. Low-speed test results of the D3 flip-flop are shown at the end of the paper.
2. Shift register memory system
Fig. 1 shows a block diagram of the shift register memory system. The system is composed of an input buffer, a local clock generator, a DDST packet decoder, an array of DDST shift registers, an output SFQ gate and a completion detector. All circuit components are designed based on the DDST concept [3], where each module has dual-rail inputs and outputs. In a DDST circuit, the internal clock for each module is generated from the dual-rail input data themselves.
In the write operation, the input data are first stored in the input buffer, which absorbs the clock rate difference between the memory and the outer system. When the write trigger is input, high-speed clock pulses are sent to the input buffer and push the data in the buffer into the DDST packet decoder. The decoder switches the input serial data into the DDST shift register determined by the address data. Although the data previously held in the shift register are pushed out at the same time, they are not output from the memory because the output SFQ gate is in the off state during this operation.
In the read operation, the read trigger is sent to both the local clock generator and the output SFQ gate. It starts the clock pulses and sets the output SFQ gate to the on state. High-speed clock pulses sent to the decoder then push out the data in the shift register selected by the decoder. Because the output SFQ gate is in the on state in the read operation, the data are output this time. The output data have to be sent back to the input buffer because the read-out of the shift register is destructive. The completion detector detects the end of the output serial data and shuts down the output SFQ gate. At the same time it generates the high-speed clock that sends the data in the input buffer back into the original shift register.

Fig. 1. A block diagram of the shift register memory system.
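The write and read sequences described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: it ignores timing and dual-rail signalling, the class and method names are hypothetical, and it assumes that zeros are shifted in while a word is read out.

```python
from collections import deque


class ShiftRegisterMemory:
    """Behavioural sketch of the memory system (no timing, no SFQ pulses)."""

    def __init__(self, num_registers, word_length):
        self.word_length = word_length
        # Each shift register holds word_length bits, initially all zero.
        self.registers = [deque([0] * word_length) for _ in range(num_registers)]

    def _shift_in(self, address, bits, gate_on):
        """Clock bits into one register; return the bits passed by the output gate."""
        reg = self.registers[address]
        passed = []
        for bit in bits:
            shifted_out = reg.popleft()  # the oldest bit is pushed out
            reg.append(bit)              # the new bit enters the register
            if gate_on:                  # output SFQ gate: on = pass, off = discard
                passed.append(shifted_out)
        return passed

    def write(self, address, bits):
        if len(bits) != self.word_length:
            raise ValueError("word length mismatch")
        # Output gate is off, so the displaced old data never leave the memory.
        self._shift_in(address, list(bits), gate_on=False)

    def read(self, address):
        # Assumption: zeros enter the register while the stored word is clocked out.
        data = self._shift_in(address, [0] * self.word_length, gate_on=True)
        # The read is destructive, so the word is written back from the input buffer.
        self._shift_in(address, data, gate_on=False)
        return data
```

In this model, reading the same address twice returns the same word, which is the effect the completion detector and write-back path are meant to provide.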
3. Design of DDST packet decoder
Fig. 2 shows a block diagram of the one-to-sixteen DDST packet decoder, which is a binary-tree-shaped array of one-to-two DDST packet switches. The address data are also distributed in the same way to reduce the address setup time.
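As a rough illustration of the binary-tree organization, the sketch below counts the stages and switches of a one-to-N decoder and follows one address through the tree. The function names are hypothetical, and the convention that the most significant address bit steers the root switch is an assumption that the text does not specify.

```python
def decoder_parameters(num_outputs):
    """Return (tree depth, number of one-to-two switches) for a 1-to-N decoder."""
    if num_outputs < 2 or num_outputs & (num_outputs - 1):
        raise ValueError("num_outputs must be a power of two >= 2")
    depth = num_outputs.bit_length() - 1  # log2(N) switching stages
    return depth, num_outputs - 1         # a full binary tree has N - 1 nodes


def route(address_bits):
    """Follow one address through the tree and return the selected output index."""
    index = 0
    for bit in address_bits:     # assumed order: first bit steers the root switch
        index = 2 * index + bit  # 0 selects one branch, 1 the other
    return index
```

For the one-to-sixteen decoder, `decoder_parameters(16)` gives four stages and fifteen switches, so a packet passes through only four switches regardless of its destination.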
Fig. 3(a) shows a circuit diagram of the one-to-two DDST packet switch, which is composed of four non-destructive memory cells. As the non-destructive memory cell we employ a newly developed D3 flip-flop, which is a modified version of the D2 flip-flop [4]. Fig. 4(a) shows a circuit diagram of the D3 flip-flop. It has Data, Data-bar (complementary data) and Clk input terminals, and an Out terminal. When the internal state of the D3 flip-flop is "1" (corresponding to a counterclockwise current in the storage loop J2–J1–Ls–J4), it outputs an SFQ pulse on arrival of a Clk input, but the internal state is not reset. The internal state is reset only by the arrival of a Data-bar pulse. A symbol and a Moore diagram of the D3 flip-flop are shown in Fig. 4(b) and (c). The optimized D3 flip-flop has DC bias margins ranging from −36% to +34% at 20 GHz and from −38% to +34% at 10 GHz.
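The behaviour of the D3 flip-flop described above can be written as a two-state Moore machine. The following is a minimal logical sketch, assuming that the second data terminal is the complement of Data and is the input that resets the state; the class and method names are hypothetical and no circuit dynamics are modelled.

```python
class D3FlipFlop:
    """Logical sketch of a non-destructive read-out cell with set, reset and clock."""

    def __init__(self):
        self.state = 0  # 0: no stored flux quantum, 1: circulating current stored

    def data(self):
        """A pulse on Data sets the internal state to 1."""
        self.state = 1

    def data_bar(self):
        """A pulse on the complementary data input resets the state to 0."""
        self.state = 0

    def clk(self):
        """A clock pulse reads the state without changing it (1 = SFQ pulse on Out)."""
        return self.state
```

Repeated calls to `clk()` after `data()` keep returning 1 until `data_bar()` is called, which is the non-destructive read-out property that the packet switch relies on.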
Fig. 2. A block diagram of the one-to-sixteen DDST packet decoder.
Fig. 3. (a) A circuit diagram of the one-to-two DDST packet switch. (b) Simulation results of the packet switch operating at 20 GHz.
Fig. 4. (a) A circuit schematic of the D3 flip-flop. (b) A symbol of the D3 flip-flop. (c) Moore diagram of the D3 flip-flop.
The one-to-two DDST packet switch switches the input data packet into one of its two outputs depending on its internal state, i.e. "0" or "1". The internal state is determined by the dual-rail inputs S and S-bar, and is maintained even after the data are output because of the non-destructive nature of the D3 flip-flop. Fig. 3(b) shows a Jsim simulation result of the packet switch operating at 20 GHz, where we assume the Nb standard process with a critical current density of 1 kA/cm². The latency is estimated to be 30 ps. The DC bias margin is found to range from −29% to +34% at 20 GHz and from −36% to +34% at 10 GHz.
We have estimated the latency and the cell area of the DDST packet decoder. Table 1 summarizes the estimates for various decoder sizes, where we assume two process technologies: one with a critical current density of 1 kA/cm² and the other with 16 kA/cm², which will be available in the near future. In the estimation, the 8-bit system is constructed by connecting 2-bit systems using Josephson transmission lines, and the 64-bit and 512-bit systems are built from 8-bit systems using microstrip lines for wiring.

Table 1
Estimation of the latency and the cell area of the packet decoder (assuming Nb 1 and 16 kA/cm² processes)

                    Latency                  Cell area
                    1 kA/cm²    16 kA/cm²    1 kA/cm²             16 kA/cm²
2 bit decoder       30 ps       8 ps         1300 µm × 650 µm     87 µm × 43 µm
8 bit decoder       150 ps      40 ps        5.2 mm × 3.2 mm      347 µm × 213 µm
64 bit decoder      400 ps      91 ps        15.6 mm × 9.6 mm     1.0 mm × 0.64 mm
512 bit decoder     1054 ps     203 ps       46.8 mm × 28.8 mm    3.1 mm × 1.9 mm

Fig. 5. Photograph of the standard D3 flip-flop cell implemented using the Hypres process (cell size: 240 µm × 195 µm).
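To make the comparison between the two processes in Table 1 easier to read, the following sketch computes the latency and area ratios for each decoder size. The numbers are copied from Table 1; the dictionary layout and the function name are hypothetical, and the ratios are simple arithmetic on the tabulated estimates rather than additional results.

```python
# (latency in ps, width in um, height in um) per decoder size, copied from Table 1.
TABLE_1 = {
    "1 kA/cm2": {
        2: (30, 1300, 650),
        8: (150, 5200, 3200),
        64: (400, 15600, 9600),
        512: (1054, 46800, 28800),
    },
    "16 kA/cm2": {
        2: (8, 87, 43),
        8: (40, 347, 213),
        64: (91, 1000, 640),
        512: (203, 3100, 1900),
    },
}


def improvement(size):
    """Return (latency ratio, area ratio) of the 1 kA/cm2 over the 16 kA/cm2 estimate."""
    lat_1, w_1, h_1 = TABLE_1["1 kA/cm2"][size]
    lat_16, w_16, h_16 = TABLE_1["16 kA/cm2"][size]
    return lat_1 / lat_16, (w_1 * h_1) / (w_16 * h_16)


if __name__ == "__main__":
    for size in (2, 8, 64, 512):
        speedup, shrink = improvement(size)
        print(f"{size:3d} bit decoder: latency x{speedup:.1f}, area x{shrink:.0f}")
```

With the tabulated values, the estimated latency improves by roughly a factor of four to five and the cell area shrinks by more than two orders of magnitude when moving from the 1 kA/cm² to the 16 kA/cm² process.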
4. Test results of the D3 flip-flop
We have implemented and tested the D3 flip-flop cell. Fig. 5 shows a photograph of the D3 flip-flop cell implemented using the Hypres Nb 1 kA/cm² process. A low-speed test result is shown in Fig. 6, where rising edges in the input signals correspond to the input of SFQ pulses and transitions in the output signal correspond to the output of SFQ pulses. The figure shows that when the internal state of the D3 flip-flop is set to "1", Out signals are produced continuously by the arrival of Clk signals, whereas they are turned off by setting the flip-flop to the "0" state. The measured DC bias margin is found to be 32%, which is almost equal to the theoretical evaluation.

Fig. 6. A low speed test result of the D3 flip-flop.

5. Conclusions
We have shown the basic design of the SFQ shift register memory system. We employed the DDST concept to simplify the timing design. We have also shown the design details of the DDST packet decoder, which is composed of a binary tree of one-to-two DDST packet switches. We have estimated the performance of the packet decoder. Assuming the 16 kA/cm² Nb process, the latency and the circuit area of the 512 bit packet decoder are estimated to be 203 ps and 3.1 mm × 1.9 mm, respectively. Low-speed test results of the D3 flip-flop, which is the main building element of the packet switch, have indicated that its DC bias margin ranges from −32% to +30%, which is almost equal to the simulation results.

Acknowledgements
A part of this work was performed through Special Coordination Funds for Promoting Science and Technology of the MEXT.
References
[1] K.K. Likharev, V.K. Semenov, IEEE Trans. Appl. Supercond. 1 (1992) 1.
[2] A. Fujimaki, Y. Takai, N. Yoshikawa, IEICE Trans. Electron. E85-C (2002) 612.
[3] Z.J. Deng, N. Yoshikawa, S.R. Whiteley, T. Van Duzer, IEEE Trans. Appl. Supercond. 9 (1999) 7.
[4] N. Yoshikawa, H. Tago, K. Yoneyama, IEEE Trans. Appl. Supercond. 9 (1999) 3161.