Design and component test of a 1-bit RSFQ microprocessor


Physica C 378–381 (2002) 1454–1460 www.elsevier.com/locate/physc

N. Yoshikawa *, F. Matsuzaki, N. Nakajima, K. Yoda

Department of Electrical and Computer Engineering, Faculty of Engineering, Yokohama National University, Tokiwadai 79-5, Hodogaya, Yokohama 240-8501, Japan

Received 27 September 2001; accepted 19 November 2001

Abstract

We have designed a 1-bit rapid single flux quantum (RSFQ) microprocessor based on a simple architecture. The target local clock frequency is 10 GHz. The microprocessor consists of a 1-bit ALU, two 8-bit registers, a program counter and a state controller. In order to reduce the complexity of the system and increase the clock frequency, the width of the data bus is reduced to 1 bit and a distributed local clock architecture is employed. Though the instruction set comprises only six operations, it includes all the basic operations required in general purpose computing. The circuit design of the microprocessor has been carried out by using a binary decision diagram and a cell-based design methodology with the aid of a top-down CAD environment. One of the important components of the microprocessor, a 1-bit ALU containing 730 Josephson junctions, has been implemented using the 1 kA/cm² Nb process, and its successful operation has been confirmed at low speed. © 2002 Elsevier Science B.V. All rights reserved.

PACS: 85.25.Hv; 85.25.Na; 85.40.Bh
Keywords: Superconducting device; RSFQ circuits; SFQ; BDD; Microprocessor; ALU

1. Introduction

Rapid single flux quantum (RSFQ) integrated circuit systems [1] have the potential to outperform semiconductor systems because of their high operating frequency and extremely low power consumption. We have designed an 8-bit microprocessor [2,3] with a simple architecture in order to show the effectiveness of our cell-based design methodology based on the binary decision diagram (BDD) [4]. The project also gave us a lot of information about the issues that will appear when we design general purpose computing systems. The designed microprocessor contains about 7800 Josephson junctions and occupies a layout area of 10,500 µm × 7500 µm. Although the RSFQ system was found to be superior in operating speed to a CMOS system with the same architecture, its circuit is too large to fit the whole layout on a single chip. In addition, the clock frequency is lower than expected because we used a simple ripple carry adder architecture for the ALU, which limits the critical speed of the system.

In our next design of the RSFQ microprocessor, we have employed a bit serial architecture, in which a 1-bit data bus and a 1-bit ALU are used to decrease the circuit area and to increase the clock frequency. We also use a distributed local clock architecture (DLCA), in which an on-chip clock generator is placed beside each register to generate serial output data streams at a high clock frequency. The target local clock frequency is 10 GHz, assuming the 1 kA/cm² Nb standard process.

In this paper we present the design details of our 1-bit RSFQ microprocessor. We have also implemented one of the important components of the microprocessor, the 1-bit ALU. Its successful low-speed test results are presented at the end of the paper.

* Corresponding author. Tel.: +81-45-339-4259; fax: +81-45-338-1157. E-mail address: [email protected] (N. Yoshikawa).

0921-4534/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0921-4534(02)01756-2

2. Design methodology

In this section we briefly introduce our cell-based design methodology based on the BDD. Fig. 1(a) is a BDD representation of the logic function f = x1·x2 + x3. The BDD consists of binary switches, each having one input (root) and two outputs (branches 0 and 1). An SFQ pulse entering the root of a binary switch is routed to one of the two outputs depending on the internal state of the switch. The boxes denoted by "0" or "1" correspond to the results of the calculation. The binary switch can be implemented by using a D2 flip-flop, whose internal state is set by a dual rail input. The BDD RSFQ circuits are therefore dual rail, data-driven self-timed (DDST) logic [5]. They need neither a global clock nor precise timing between clocks and data. The only timing restriction is that all the internal states of the binary switches have to be set before the trigger pulse arrives.

Fig. 1. (a) A binary decision diagram. (b) Symbols of the basic cells in the BDD RSFQ standard cell library.

In the cell-based design approach, circuits are constructed by connecting tile-shaped basic cells [6]. Fig. 1(b) shows the symbols of the basic cells of our BDD RSFQ standard cell library. In the figure, Bina is the cell that functions as the binary switch. The internal state of the Bina is set by an SFQ input from "s0" or "s1". When an SFQ pulse is input at "r", it is routed to "b0" or "b1" depending on the internal state. The other small cells, line, cross, fork and join, are used for interconnection.

Fig. 2. A block diagram of the 1-bit BDD RSFQ microprocessor.

Table 1
Instruction set of the 1-bit BDD RSFQ microprocessor

Instruction example   Instruction name   Meaning
HLT                   Halt               Stop
ADD R1                Add                ACC ← ACC + RAM[R1]
LDA R2                Load               ACC ← RAM[R2]
STA R3                Store              RAM[R3] ← ACC
SKP                   Skip if zero       If (ACC == 0) PC ← PC + 1
JMP R4                Jump               PC ← R4

Fig. 3. A circuit schematic of the 8-bit accumulator.

Fig. 4. A circuit schematic of the 1-bit ALU.

Fig. 5. A mask layout of the 1-bit ALU.

We have also developed a top-down CAD environment. Our top-down design flow consists of five steps: logic synthesis by the BDD, schematic view entry, logic simulation by Verilog-XL, circuit simulation by Jsim, and layout extraction from the schematic view. All these processes are automated in the Cadence CAD environment [4].
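The pulse-routing picture of the BDD given in this section can be illustrated with a short behavioural sketch. It is not the authors' synthesis tool: the class name `Switch`, the function `route_pulse`, and in particular the variable ordering x1, x2, x3 are assumptions made for illustration, since the ordering used in Fig. 1(a) cannot be recovered from the text. Each switch holds a state that is set first, and a single trigger pulse is then routed from the root to a terminal box.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Union


@dataclass(frozen=True)
class Switch:
    """One binary switch: a pulse at the root leaves on branch 0 or branch 1."""
    var: str        # input variable that sets the internal state
    b0: "Node"      # destination when the state is 0
    b1: "Node"      # destination when the state is 1


Node = Union[Switch, int]   # an int (0 or 1) stands for a terminal box


def route_pulse(root: Node, state: Dict[str, int]) -> int:
    """Follow one trigger pulse from the root down to a terminal box.

    All switch states are given up front, mirroring the rule that the
    states must be set before the trigger pulse arrives.
    """
    node = root
    while isinstance(node, Switch):
        node = node.b1 if state[node.var] else node.b0
    return node


# f = x1*x2 + x3, built with the assumed ordering x1 -> x2 -> x3.
x3_switch = Switch("x3", 0, 1)
x2_switch = Switch("x2", x3_switch, 1)
bdd_f = Switch("x1", x3_switch, x2_switch)


def truth_table() -> Dict[tuple, int]:
    """Evaluate the BDD for all eight input combinations."""
    names = ("x1", "x2", "x3")
    return {bits: route_pulse(bdd_f, dict(zip(names, bits)))
            for bits in product((0, 1), repeat=3)}
```

Calling `truth_table()` returns the output for every input combination, which can be compared against `(x1 & x2) | x3` as a quick check of the diagram.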

3. Design of 1-bit microprocessor

A block diagram of the 1-bit microprocessor is shown in Fig. 2. It consists of an accumulator (ACC), an instruction register, a 1-bit ALU, a 5-bit program counter (PC), a 32-byte RAM and a state controller. Its basic organization is similar to that of our previous 8-bit microprocessor [2,3], except that the data bus width is reduced to 1 bit and the accumulator and the RAM have their own local clock generators to generate high-speed serial data. An 8-bit instruction is composed of a 3-bit opcode and a 5-bit operand. The instruction set comprises six instructions and the address space is 32 bytes. The instruction set and the meaning of each instruction are listed in Table 1. Here, we show the design details of some important components.

3.1. Accumulator

Because we utilize the bit serial architecture for the data bus, the data entering and leaving the register are 8-bit serial data. Since it seems difficult to achieve a globally synchronous timing design over the whole system at clock frequencies beyond 10 GHz, we employed a DLCA in the 1-bit microprocessor design. In the DLCA, each register has its own local clock generator. Fig. 3 shows a circuit schematic of the 8-bit accumulator, which is composed of an 8-bit clock generator and an 8-bit DDST shift register. In operation, its internal data are rewritten every time dual rail data enter the accumulator; during this time no data are output from the accumulator. When ACC_trg is applied, the clock generator generates eight clock pulses at 10 GHz, which push the data out of the accumulator.

Fig. 6. A circuit schematic of the 1-bit RSFQ microprocessor.

3.2. ALU

As in the previous design [3], the 1-bit ALU has four operations: ADD, Data_through, ACC_through, and Zero_check. Fig. 4 is the circuit schematic of the 1-bit ALU, which consists of a full adder, an OR gate for the Zero_check, and a multiplexer. The main differences from our previous 8-bit ALU are as follows:

(i) The multiplexer is composed of an array of D3 flip-flops [7], and its multiplex operation is not destructive, i.e. once the selection of the data path is established, it is maintained until a new selection is set up.
(ii) The 1-bit full adder has an internal feedback loop from the carry output to the carry input to perform a bit serial add operation.
(iii) The OR gate also has a feedback loop for a bit serial Zero_check operation.

The functional simulations by Verilog-XL indicate that the maximum operation frequency of the ALU is 6.25 GHz, which is limited by the add operation. We believe that the maximum frequency can be increased to more than 10 GHz by modifying the internal timing of the adder. Fig. 5 shows a layout of the 1-bit ALU. Its area is 2860 µm × 2860 µm.
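The feedback loops in items (ii) and (iii) can be illustrated at the bit level. The sketch below is a behavioural model only, not the circuit of Fig. 4. It assumes that the 8-bit words are streamed least significant bit first, that the carry and the OR state start at 0 for each word, and that the carry out of the eighth bit is discarded; none of these details is stated in the text. The helper names `to_serial`, `from_serial`, `serial_add` and `serial_zero_check` are hypothetical.

```python
from typing import Iterable, List


def to_serial(value: int, width: int = 8) -> List[int]:
    """Split a word into a bit stream, LSB first (assumed bit order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_serial(bits: Iterable[int]) -> int:
    """Reassemble an LSB-first bit stream into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits: Iterable[int], b_bits: Iterable[int]) -> List[int]:
    """Bit serial addition with one full adder.

    The carry output of each step is fed back as the carry input of the
    next step, which is the role of the feedback loop in item (ii).
    """
    carry = 0                      # assumed to be cleared before each word
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits                # the final carry is dropped (assumption)


def serial_zero_check(bits: Iterable[int]) -> bool:
    """Bit serial zero check with one OR gate.

    The OR output is fed back to its own input, as in item (iii), so it
    stays at 1 once any bit of the word has been 1.
    """
    seen_one = 0
    for bit in bits:
        seen_one |= bit
    return seen_one == 0
```

For example, `from_serial(serial_add(to_serial(100), to_serial(27)))` gives 127, and a sum that overflows eight bits wraps around because the last carry is not kept.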


3.3. Microprocessor

We have also designed the other components and put them together. Fig. 6 shows a circuit schematic of the 1-bit microprocessor; the RAM is not included in the figure. Microstrip lines with a width of 16 µm are used to interconnect the components in order to decrease the latency and the circuit area. The microprocessor contains 5118 Josephson junctions and occupies an area of 7300 µm × 7000 µm. These numbers are much smaller than those of our previous microprocessor design [3]. Fig. 7 shows the functional simulation results of the microprocessor for the STA operation. The microprocessor has five phases determined by the external clock pulses: instruction fetch, PC increment, operand fetch, ALU operation, and memory operation. The figure demonstrates that the microprocessor operates correctly at a local clock frequency of 6.25 GHz and a system cycle frequency of 670 MHz.

4. Low-speed test of 1-bit ALU

We have tested the 1-bit ALU at low speed and confirmed its successful operation. The tested 1-bit ALU has almost the same structure as the circuit shown in Fig. 4, except that the ACC_through operation is replaced by the AND operation. The total number of junctions is 730. The circuit was implemented using the Hypres 1 kA/cm² Nb standard process. Table 2 summarizes the tested DC bias margins for each operation. DC bias margins larger than 12% are obtained for every operation except ADD.

Table 2
Low-speed test results of 1-bit ALU

ALU operation   DC bias margin
                Testing (%)   Simulation (%)
Data_through    14.3          32
AND             14.3          32
OR              12.3          32
ADD             3.4           32

The deterioration of the DC bias margin in the ADD operation is thought to be due to a nonuniform distribution of the DC bias current in the adder circuit, which may be caused by contact resistance at the via.

Fig. 7. Functional simulation of the 1-bit microprocessor. The figure shows a sequence of the STA operation: instruction fetch, increment PC, operand fetch, ALU operation (STA), and memory operation. The local clock frequency is 6.25 GHz and the system cycle frequency is 670 MHz.
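The five phases named in the caption of Fig. 7 and the instruction meanings of Table 1 can be combined into a small word-level model of the processor. This is a sketch under several assumptions that the paper does not confirm: the numeric opcode values in `OPCODES` are hypothetical, program and data are assumed to share the 32-byte RAM, execution is assumed to start at address 0, and arithmetic is assumed to wrap at 8 bits. The bit serial transfer and the local clocks are not modelled.

```python
from typing import List, Sequence, Tuple

# Hypothetical 3-bit opcode values; the real encoding is not given in the paper.
OPCODES = {"HLT": 0, "ADD": 1, "LDA": 2, "STA": 3, "SKP": 4, "JMP": 5}
NAMES = {code: name for name, code in OPCODES.items()}


def encode(name: str, operand: int = 0) -> int:
    """Pack a 3-bit opcode and a 5-bit operand into one 8-bit instruction."""
    return (OPCODES[name] << 5) | (operand & 0x1F)


def run(ram: Sequence[int], max_steps: int = 1000) -> Tuple[int, int, List[int]]:
    """Execute until HLT and return (ACC, PC, RAM)."""
    memory = list(ram)
    if len(memory) != 32:
        raise ValueError("the address space is 32 bytes")
    acc, pc = 0, 0
    for _ in range(max_steps):
        word = memory[pc]                       # phase 1: instruction fetch
        pc = (pc + 1) % 32                      # phase 2: PC increment
        opcode, addr = word >> 5, word & 0x1F
        if opcode not in NAMES:
            raise ValueError(f"undefined opcode {opcode}")
        name = NAMES[opcode]
        operand = memory[addr]                  # phase 3: operand fetch
        if name == "HLT":
            break
        if name == "ADD":                       # phase 4: ALU operation
            acc = (acc + operand) % 256
        elif name == "LDA":
            acc = operand
        elif name == "SKP" and acc == 0:
            pc = (pc + 1) % 32
        elif name == "JMP":
            pc = addr
        if name == "STA":                       # phase 5: memory operation
            memory[addr] = acc
    return acc, pc, memory


def demo_program() -> List[int]:
    """RAM image that stores RAM[30] + RAM[31] into RAM[29]."""
    image = [0] * 32
    image[0] = encode("LDA", 30)
    image[1] = encode("ADD", 31)
    image[2] = encode("STA", 29)
    image[3] = encode("HLT")
    image[30], image[31] = 5, 7
    return image
```

Running `run(demo_program())` leaves the sum of the two data bytes in both ACC and RAM[29]. Every instruction passes through all five phases in this model even when a phase does nothing, which is a simplification chosen to mirror the fixed phase sequence described for Fig. 7.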

5. Conclusions

We have designed a 1-bit RSFQ microprocessor by using our cell-based design methodology based on the BDD. We employed the bit serial architecture and the DLCA to decrease the circuit complexity and to increase the local clock frequency. The designed microprocessor contains 5118 Josephson junctions and has a layout area of 7300 µm × 7000 µm. Functional simulations have confirmed its correct operation at a local clock frequency of 6.25 GHz. We have also demonstrated successful operation of the 1-bit ALU, which contains 730 Josephson junctions.

Acknowledgements

A part of this work was performed through Special Coordination Funds for Promoting Science and Technology of the MEXT.

References

[1] K.K. Likharev, V.K. Semenov, IEEE Trans. Appl. Supercond. 1 (1992) 1.
[2] N. Yoshikawa, J. Koshiyama, K. Motoori, F. Matsuzaki, K. Yoda, Physica C 357–360 (2001) 1529.
[3] F. Matsuzaki, K. Yoda, J. Koshiyama, K. Motoori, N. Yoshikawa, IEICE Trans. Electron. E85-C (2002) 659.
[4] N. Yoshikawa, J. Koshiyama, IEEE Trans. Appl. Supercond. 11 (2001) 1098.
[5] Z.J. Deng, N. Yoshikawa, S.R. Whiteley, T. Van Duzer, IEEE Trans. Appl. Supercond. 9 (1999) 7.
[6] J. Koshiyama, N. Yoshikawa, IEEE Trans. Appl. Supercond. 11 (2001) 263.
[7] K. Fujiwara, J. Koshiyama, N. Yoshikawa, Extended Abstract of ISEC'01, Osaka, 2001, p. 163.