# **Rapid Single-Flux-Quantum Dual-Rail Logic for Asynchronous Circuits**

M. Maezawa, I. Kurosawa\*, M. Aoyagi, and H. Nakagawa

Electrotechnical Laboratory, 1-1-4 Umezono, Tsukuba, Ibaraki 305, Japan

Y. Kameda and T. Nanya\*\*

Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153, Japan

Abstract: Dual-rail logic circuit elements based on rapid single-flux-quantum (RSFQ) technology have been designed and simulated. The proposed circuits can operate asynchronously, since dual-rail data carry their own timing information. The dual-rail logic scheme may therefore solve some problems of RSFQ circuits with flow clocking, which become more serious as the operating speed and complexity of the circuit increase. Implementation of RSFQ dual-rail AND and XOR cells is described. A scheme for transferring data from a serial flow-clocked circuit to a parallel dual-rail circuit, which uses a fully asynchronous dual-rail demultiplexer, is also proposed.

### I. INTRODUCTION

As the operating speed of circuits and systems increases, problems with timing and synchronization become more serious. Rapid single-flux-quantum (RSFQ) logic [1] is known as an ultrafast device technology for digital processing. In RSFQ circuits, even though a microstrip line is used as an interconnection wire, the propagation delay (~10 ps/mm) is no longer negligible compared with the gate delay (~10 ps/gate). This fact makes clock distribution and circuit synchronization hard. Moreover, as a consequence of its representation of binary data (Fig. 1(a)), an RSFQ logic element needs a "clock" pulse for each operation. Therefore, problems associated with clock distribution are still more significant for RSFQ circuits.

The flow clocking method [1] is a good solution to the problem because it does not require global synchronization. Flow clocking is a strong strategy for serial processing, especially for a simple straight-line pipeline structure. Such straight-line pipelined circuits with a flow clocking scheme have been implemented [2], [3]. However, although no global synchronization is necessary, the flow clocking scheme assumes local synchronization of data and "clock" pulses, and complete knowledge of delays in the local region is necessary. As the operating speed and complexity of the circuit increase, it becomes harder to localize the timing in a small area. Consequently, the delay estimate becomes difficult and the design cost increases seriously. Moreover, flow-clocked circuits operate with the worst-case delay, since the delays in "clock" lines are fixed so as to satisfy the condition of local synchronization. Thus, for a complex and large-scale system, the flow clocking scheme would not fully exploit the high-speed operation of RSFQ.

Manuscript received August 26, 1996. \* Present Address: Japan Women's University, 2-8-1 Mejirodai, Bunkyo-ku, Tokyo 112. \*\* Also with Department of Computer Science, Tokyo Institute of Technology.

Dual-rail logic has the possibility of solving the above problems with flow clocking [4], [5]. Dual-rail logic circuits can operate without an external timing signal, since dual-rail data carry their own timing information. In the dual-rail scheme a pair of signal lines (a true-line and a false-line) is used for one bit of information; the appearance of an SFQ pulse on the true-line is defined as the binary "1" and on the false-line as the binary "0" (Fig. 1(b)). The arrival of a pulse at either line of a dual-rail pair therefore immediately signals the arrival of data.
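The dual-rail convention can be made concrete with a small behavioural sketch. This is only an illustration of the encoding described above, not a circuit model: the names `DualRail`, `encode`, `arrived`, `decode` and `arrival_time` are hypothetical, and pulse times are assumed to be given in picoseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DualRail:
    """One dual-rail bit: arrival time (ps) of the SFQ pulse on each line, or None."""
    true_t: Optional[float] = None
    false_t: Optional[float] = None


def encode(bit: int, t: float) -> DualRail:
    """A "1" is a pulse on the true-line, a "0" is a pulse on the false-line."""
    return DualRail(true_t=t) if bit else DualRail(false_t=t)


def arrived(d: DualRail) -> bool:
    """A datum is present once exactly one of the two lines has carried a pulse."""
    return (d.true_t is None) != (d.false_t is None)


def decode(d: DualRail) -> int:
    """Recover the bit value; zero or two pulses is not a valid datum."""
    if not arrived(d):
        raise ValueError("no valid dual-rail datum")
    return 1 if d.true_t is not None else 0


def arrival_time(d: DualRail) -> float:
    """The pulse itself tells the receiver when the datum arrived."""
    if not arrived(d):
        raise ValueError("no valid dual-rail datum")
    return d.true_t if d.true_t is not None else d.false_t
```

The point of the sketch is that `arrived` needs no clock argument: the presence of a pulse on either line is the timing signal.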

In this paper we present an improved design of dual-rail logic elements based on RSFQ technology. The number of circuit elements per cell and the input/output delay of the circuits are smaller than those of our previous design [6], while margins for the supply voltage are estimated to be comparable. A scheme for transferring data from a serial flow-clocked block to a parallel dual-rail block, which uses an asynchronous dual-rail demultiplexer, is also described.

### **II. RSFQ DUAL-RAIL LOGIC ELEMENTS**

For simulations described in this section, the critical current density $J_c$, the specific capacitance $C_s$ and the McCumber parameter $\beta_c$ of the junctions are assumed to be 2 kA/cm<sup>2</sup>, 5.4 µF/cm<sup>2</sup> [7] and unity, respectively. Input and output terminals of simulated circuits were connected to Josephson transmission lines during the simulations.

Fig. 1. Representations of the binary data by single-flux-quantum pulses. (a) RSFQ basic convention and (b) dual-rail logic. In both cases, A="0" and B="1" are represented.

Fig. 2(a) shows an equivalent circuit of an RSFQ dual-rail AND element consisting of 27 junctions, 30 inductors and 10 bias lines. The basic concept of this new AND element is the same as that of our previous version [6], but there are two differences between them. First, the new version uses a D flip-flop with complementary output (DFFC) [8], while the previous one uses an AND and an OR of the RSFQ family. As shown in Fig. 2(a), the true- and false-outputs of the dual-rail AND are calculated as $\overline{\overline{a} + \overline{b}}$ and $\overline{a} + \overline{b}$, respectively. Second, after connecting elementary components (pulse splitters, confluence buffers, and so on), we have removed some junctions that function merely as a buffer stage. As a result, the number of circuit elements has been reduced to about 60% of that of our previous version. Numerical simulation has confirmed that the circuit operates properly for all correct inputs, regardless of the arrival timing of input pulses. An example of simulation results is shown in Fig. 2(b). The margins for the supply voltage have been calculated to be $\pm 25\%$. Input/output delays, defined as the delays between the later input and the output, are in the range of 42 ps to 52 ps. Evidently, this dual-rail AND also functions as an OR, a NAND and a NOR of the dual-rail logic family. For example, an OR is made by twisting the true- and the false-lines of all dual-rail pairs at input and output: $a + b = \overline{\overline{a} \cdot \overline{b}}$ and $\overline{a + b} = \overline{a} \cdot \overline{b}$.
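The claim that one AND cell yields the OR, NAND and NOR functions by twisting lines is De Morgan's law applied to dual-rail pairs, and it can be checked exhaustively. The following is a logic-level sketch only: a dual-rail value is modelled as a hypothetical `(true_line, false_line)` tuple of booleans, and the helper names are assumptions made for illustration, not part of the circuit description.

```python
from itertools import product


def dr(bit):
    """Logic-level dual-rail value as (true_line, false_line)."""
    return (bool(bit), not bit)


def twist(x):
    """Swap the true- and false-lines of a dual-rail pair (logical inversion)."""
    return (x[1], x[0])


def dr_and(a, b):
    """False-output is (not a) OR (not b); true-output is its complement."""
    false_out = a[1] or b[1]
    true_out = not false_out
    return (true_out, false_out)


def dr_or(a, b):
    """Twist all inputs and the output of the AND cell."""
    return twist(dr_and(twist(a), twist(b)))


def dr_nand(a, b):
    """Twist only the output."""
    return twist(dr_and(a, b))


def dr_nor(a, b):
    """Twist only the inputs."""
    return dr_and(twist(a), twist(b))


# Exhaustive check against ordinary Boolean operators.
for a, b in product([0, 1], repeat=2):
    assert dr_and(dr(a), dr(b)) == dr(a and b)
    assert dr_or(dr(a), dr(b)) == dr(a or b)
    assert dr_nand(dr(a), dr(b)) == dr(not (a and b))
    assert dr_nor(dr(a), dr(b)) == dr(not (a or b))
```

Because inversion costs only a wire crossing in this representation, no additional cell is needed for the three derived functions.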



Fig. 2. An RSFQ dual-rail AND element. (a) Equivalent circuit diagram: J1=J4=J25=J26=J27=130 µA, J2=J3=J22=150 µA, J5=J10=105 µA, J6=J7=J8=J9=80 µA, J11=J12=J13=J15=120 µA, J14=145 µA, J16=J19=125 µA, J17=175 µA, J18=111 µA, J20=123 µA, J21=86 µA, J22=151 µA, J23=180 µA, J24=140 µA, L1=L2=L3=L4=L23=L24=L31=L32=8 pH, L5=L10=0.1 pH, L6=L7=L8=L9=0.5 pH, L11=L15=2.9 pH, L12=L16=6.1 pH, L13=9 pH, L14=0.2 pH, L17=4.5 pH, L18=5.7 pH, L19=11.3 pH, L20=7.1 pH, L21=8.5 pH, L22=2 pH, L25=1.4 pH, L26=1.6 pH, L29=L30=15.4 pH, R1=R3=12 Ω, R2=23 Ω, R4=38 Ω, R5=28.3 Ω, R6=24.4 Ω, R7=34.5 Ω, R8=11.5 Ω, R9=R10=30.5 Ω. The circuit is biased by a common voltage source of 3 mV. (b) An example of simulation results.



Fig. 3. An RSFQ dual-rail XOR element. (a) Equivalent circuit diagram: J1=J2=J3=J4=150 µA, J5=J6=J7=J8=95 µA, J9=J10=J11=J28=J29=J30=130 µA, J12=J13=120 µA, J14=J16=143 µA, J15=170 µA, J17=J18=J19=J20=108 µA, J21=J22=140 µA, J23=J26=122 µA, J24=J25=101 µA, J27=160 µA, L1=L2=L3=L4=L7=L9=L13=L19=L20=L21=L26=L27=L33=L34=8 pH, L5=L6=L11=L12=0.3 pH, L8=L10=L14=L15=3 pH, L16=L30=6 pH, L17=3.1 pH, L18=5.9 pH, L22=L23=L24=L25=7.3 pH, L28=1.5 pH, L29=0.1 pH, L31=L32=15.4 pH, R1=R2=R3=21.5 Ω, R4=R5=13.2 Ω, R6=R9=38 Ω, R7=R8=42 Ω, R10=21 Ω, R11=R12=30.6 Ω. The circuit is biased by a common voltage source of 3 mV. (b) An example of simulation results.

An equivalent circuit of an RSFQ dual-rail XOR, which consists of 30 junctions, 32 inductors and 12 bias lines, is shown in Fig. 3(a). The basic concept of this new XOR is the same as that of the previous one [6]. This dual-rail XOR includes two XOR cells of the RSFQ family. The true-output is calculated by one of the XORs as $a \oplus b$, while the false-output is calculated by the other as $a \oplus \overline{b}$. As in the AND element, some functionless junctions have been removed, so that the number of circuit elements has been reduced. Proper operation of the circuit has been confirmed by numerical simulation (Fig. 3(b)). Input/output delays are calculated to be about 40 ps. The calculated margins for the supply voltage are $\pm 27.5\%$.
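A quick logic-level check shows why two single-rail XORs suffice: $a \oplus b$ and $a \oplus \overline{b}$ are always complementary, so exactly one output line carries a pulse for every valid input. The function names below are hypothetical and the model ignores timing entirely.

```python
from itertools import product


def rsfq_xor(x: bool, y: bool) -> bool:
    """Stand-in for one single-rail XOR cell (logic level only)."""
    return x != y


def dual_rail_xor(a: bool, b: bool):
    """Return (true_output, false_output) from two single-rail XORs."""
    true_out = rsfq_xor(a, b)        # a XOR b
    false_out = rsfq_xor(a, not b)   # a XOR (not b), i.e. XNOR
    return true_out, false_out


# For every input combination exactly one output line is active.
for a, b in product([False, True], repeat=2):
    t, f = dual_rail_xor(a, b)
    assert t != f
    assert t == (a != b)
```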

Using these logic elements, any combinational logic circuit can easily be constructed without a timing analysis. Since the proposed elements generate the output immediately after the completion of the input, the combinational circuit consisting of them operates with the average-case delay instead of the worst-case delay.
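The difference between average-case and worst-case operation can be illustrated with a toy timing model of a half adder built from one XOR and one AND cell. This is a sketch under stated assumptions: each cell is assumed to fire a fixed delay after its later input arrives, the XOR delay is taken as the 40 ps quoted above, and the AND delay is an assumed 47 ps (the midpoint of the 42 ps to 52 ps range). The function names and the input arrival times are purely illustrative.

```python
XOR_DELAY_PS = 40.0  # "about 40 ps" quoted for the dual-rail XOR
AND_DELAY_PS = 47.0  # assumed midpoint of the quoted 42-52 ps range


def fire(delay_ps: float, *arrivals_ps: float) -> float:
    """A self-timed cell emits its output delay_ps after its last input arrives."""
    return max(arrivals_ps) + delay_ps


def half_adder_times(t_a: float, t_b: float):
    """Return (sum_time, carry_time) in ps for inputs arriving at t_a and t_b."""
    t_sum = fire(XOR_DELAY_PS, t_a, t_b)
    t_carry = fire(AND_DELAY_PS, t_a, t_b)
    return t_sum, t_carry


# Inputs that happen to arrive close together finish early ...
early = half_adder_times(0.0, 5.0)
# ... while a late second input simply delays the outputs by the same amount.
late = half_adder_times(0.0, 30.0)
```

In this model a fixed clock delay would have to be set for the latest input arrival the designer must tolerate, so every operation would take as long as the `late` case. The self-timed cells instead complete as soon as the actual inputs allow, which is the sense in which the circuit runs at the average-case delay.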

### III. DATA TRANSFER FROM A FLOW-CLOCKED BLOCK TO A DUAL-RAIL BLOCK

The best solution for the timing scheme depends on each system. For most real applications it is likely that more than one timing scheme exists in a system. In this section, we present a concept of a data transfer scheme from a serial flow-clocked block to a parallel dual-rail block, using an asynchronous dual-rail demultiplexer.

A schematic diagram of a one-to-two cell of a fully asynchronous dual-rail demultiplexer (demux) is shown in Fig. 4(a). This demux cell consists of two B-flip-flop-based T•RS flip-flops (T•RSFF) [9], a T flip-flop (TFF), a confluence buffer (CB) and four pulse splitters (PS). A similar demux that uses the dual-rail scheme in part has been proposed by Deng, Whiteley and Van Duzer [5]. In their design, although the initial timing can be obtained from a dual-rail input datum, the timing information is lost at the outputs, since output data are represented in single-rail form. The demux cell proposed here operates fully asynchronously, since both input and output data are dual-railed. A pulse arriving at the demux cell is split in two at the input terminal. While one pulse triggers one of the T•RSFFs and generates an output pulse, the other is directed by the TFF and sets (or resets) the T•RSFFs. The only timing condition required for proper operation is that the T•RSFF should be set (or reset) after generating an output pulse, a condition that can be localized in a small region. Once an appropriate design of the demux cell is obtained, a one-to-N demux block can easily be constructed as shown in Fig. 4(b). Note that this asynchronous dual-rail demux tree is made of only the one-to-two demux cells; neither an external timing signal nor an additional control circuit is necessary.
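The routing behaviour of the cell and of the tree can be sketched at a purely behavioural level. The model below is an assumption-laden illustration, not the circuit: each cell is reduced to a toggle that alternates its output port on every incoming datum (the role played by the TFF), the first datum is assumed to leave on port 0, and the class names `DemuxCell` and `DemuxTree` are hypothetical. A datum is represented only by which line carried the pulse, `"T"` or `"F"`, and is passed through unchanged.

```python
class DemuxCell:
    """Behavioural one-to-two cell: alternates its output port on each datum."""

    def __init__(self):
        self.next_port = 0  # assumed initial state: first datum goes to port 0

    def route(self) -> int:
        port = self.next_port
        self.next_port ^= 1  # toggle, as the TFF would after each pulse
        return port


class DemuxTree:
    """One-to-2**levels tree built only from one-to-two cells."""

    def __init__(self, levels: int):
        self.levels = levels
        self.cells = {}  # maps the path from the root to the cell at that node

    def route(self, datum: str):
        """Send one dual-rail datum ("T" or "F") down the tree.

        Returns (path, datum), where path is the tuple of ports taken.
        """
        path = ()
        for _ in range(self.levels):
            if path not in self.cells:
                self.cells[path] = DemuxCell()
            path += (self.cells[path].route(),)
        return path, datum


tree = DemuxTree(levels=3)
serial_bits = ["T", "F", "F", "T", "T", "T", "F", "F"]
routed = [tree.route(bit) for bit in serial_bits]
leaves = [path for path, _ in routed]
```

With three levels, eight successive data reach eight different leaves and the ninth returns to the first leaf, with no control input beyond the data pulses themselves. In this toggle model the leaves are visited in bit-reversed order, which only affects how the outputs are wired to the parallel block.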

Fig. 5 shows a block diagram of a possible system consisting of a single-bit-wide serial flow-clocked block and a parallel dual-rail block. Output data of the flow-clocked block are stored in a DFFC and then converted into dual-rail form. These dual-rail data are assigned to each input of the dual-rail block by the asynchronous dual-rail demux of Fig. 4(b). After the arrival of each input pulse, the dual-rail block starts data processing asynchronously. Such a system as shown in Fig. 5, combining serial flow-clocked and parallel dual-rail blocks, would be useful for many real applications, since the flow clocking scheme is suitable for simple serial architectures and the dual-rail scheme for parallel architectures.

Fig. 4. An asynchronous dual-rail demultiplexer. (a) Schematic diagram of a one-to-two cell consisting of two T•RS flip-flops (T•RSFF), a T flip-flop (TFF), four pulse splitters (PS), and a confluence buffer (CB). (b) Block diagram of a one-to-eight demultiplexer tree.

Fig. 5. A data-transfer scheme from a serial flow-clocked block to a parallel dual-rail block.

### **IV. DISCUSSIONS**

Self-timed operation is a major advantage of dual-rail logic. Once elementary cells operating without an external timing signal are designed appropriately, a larger circuit block can easily be constructed as a simple network of the cells. Complicated timing analysis is unnecessary. This drastically reduces the design cost of large and complex circuits, e.g., random logic and logic with many data feedback paths. Moreover, in contrast to flow-clocked circuits, dual-rail circuits operate with the average-case delay. Even if circuits and systems become larger and more complex, the reduction of operating speed is not so serious.

Dual-rail logic also has some disadvantages. First, the dual-rail scheme carries an overhead. A dual-rail implementation requires more circuit elements than a flow-clocked implementation. For example, our dual-rail XOR includes 30 junctions, while an XOR of the conventional RSFQ family includes 8 junctions. Moreover, at the elementary cell level, a dual-rail circuit element operates more slowly than a flow-clocked one. Therefore the dual-rail scheme is not a good choice for small and simple systems. Second, some sequential RSFQ elements that play an important role in our implementations of dual-rail circuits, e.g., the coincidence junction and the T flip-flop, cannot be reset externally. Therefore initialization of the circuit in an arbitrary state is almost impossible. This fact might make testing of the asynchronous dual-rail circuit difficult and become a serious problem for practical application.

### V. CONCLUSION

Unlike a flow-clocked circuit, a dual-rail circuit can operate without an external timing signal, since dual-rail data carry their own timing information. The dual-rail scheme makes the design of large and complex RSFQ circuits easy. Because a dual-rail circuit operates with the average-case delay, it can make full use of the high-speed operation of RSFQ even if circuits and systems become larger and more complex.

We have presented an improved design of dual-rail AND and XOR elements based on RSFQ technology, which operate without an external timing signal. Numerical simulation has confirmed that the proposed circuits operate asynchronously and have wide margins for the supply voltage. A fully asynchronous dual-rail demultiplexer has also been proposed. This demultiplexer is useful for transferring data from a serial flow-clocked circuit to a parallel dual-rail circuit.

### ACKNOWLEDGMENT

We wish to thank M. Suzuki, Y. Hamazaki, S. Kiryu, T. Kikuchi, and A. Shoji for valuable discussions.

### REFERENCES

- [1] K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems," *IEEE Trans. Appl. Supercond.*, vol. 1, pp. 3-28, March 1991.
- [2] O. A. Mukhanov and A. F. Kirichenko, "Implementation of a FFT radix 2 butterfly using serial RSFQ multiplier-adders," *IEEE Trans. Appl. Supercond.*, vol. 5, pp. 2461-2464, June 1995.
- [3] K. Gaj, E. G. Friedman, M. J. Feldman, and A. Krasniewski, "A clock distribution scheme for large RSFQ circuits," *IEEE Trans. Appl. Supercond.*, vol. 5, pp. 3320-3324, June 1995.
- [4] I. Kurosawa, H. Nakagawa, M. Aoyagi, M. Maezawa, Y. Kameda, and T. Nanya, "A basic circuit for asynchronous superconductive logic using RSFQ gates," in *Extended Abstracts of ISEC'95*, pp. 204-206, Sept. 1995.
- [5] J. Z. Deng, S. R. Whiteley, and T. Van Duzer, "Data-driven self-timing of RSFQ digital integrated circuits," in *Extended Abstracts of ISEC'95*, pp. 189-191, Sept. 1995.
- [6] M. Maezawa, I. Kurosawa, Y. Kameda, and T. Nanya, "Pulse-driven dual-rail logic gate family based on rapid single-flux-quantum (RSFQ) devices for asynchronous circuits," *Proc. 2nd Int. Symposium on Advanced Research in Asynchronous Circuits and Systems*, pp. 134-142, March 1996.
- [7] M. Maezawa, M. Aoyagi, H. Nakagawa, I. Kurosawa, and S. Takada, "Specific capacitance of Nb/AlOx/Nb Josephson junctions with critical current densities in the range of 0.1-18 kA/cm<sup>2</sup>," *Appl. Phys. Lett.*, vol. 66, pp. 2134-2136, April 1995.
- [8] A. F. Kirichenko, V. K. Semenov, Y. K. Kwong, and V. Nandakumar, "4-bit rapid single-flux-quantum decoder," *IEEE Trans. Appl. Supercond.*, vol. 5, pp. 2857-2860, June 1995.
- [9] S. V. Polonsky, V. K. Semenov, and A. F. Kirichenko, "Single flux, quantum B flip-flop and its possible applications," *IEEE Trans. Appl. Supercond.*, vol. 4, pp. 9-18, Mar. 1994.