Architectures and technologies for an optoelectronic VLSI

Dietmar Fey

Friedrich-Schiller-Universität Jena, Institut für Informatik, Ernst-Abbe-Platz 1–4, 07743 Jena, Germany

Dedicated to Prof. Dr. Adolf Lohmann on the occasion of his 75th birthday

Abstract: The article discusses the possibilities of using short-distance optical interconnects in digital computing systems. It is postulated that high-density free-space optical interconnects in the chip-to-chip area offer the possibility of overcoming the serious pin limitation problem of VLSI circuits. The technologies currently under investigation for an optoelectronic three-dimensional VLSI are briefly presented. It is pointed out that the high channel density offered by optoelectronic interconnect technology is only exploited if the processor architecture supports it. The author presents a list of eight features that an architecture should largely fulfil to guarantee an efficient use of high-density optical interconnects. Examples of such architectures are presented. In a parallel optoelectronic digital signal processing unit, an increase in throughput by about a factor of 70 compared with purely electronic solutions is achievable due to a parallel optical processor-memory coupling. Investigations of a prototype circuit show that configuration times three orders of magnitude better can be achieved by using optics in dynamically reconfigurable hardware compared to pure electronics. In closing, a solution for a noise-robust receiver circuit is presented to realise smart detector circuits in cost-effective standard CMOS processes.

Key words: Optoelectronic VLSI – reconfigurable computing – optical interconnects – optics in computing

Received 15 January 2001; accepted 6 April 2001. Fax: ++49-3641-946372. E-mail: [email protected]
Optik 112, No. 7 (2001) 274–282. © 2001 Urban & Fischer Verlag. http://www.urbanfischer.de/journals/optik

1. Role of optics for short-distance interconnects in future digital systems

Optical interconnects for digital data transfer offer high bandwidth and high reliability. Due to these benefits, optical fibres became the first choice as transfer medium in the telecommunication area. Fibre optics replaced copper lines, and in order to satisfy the huge network capacities required in the internet market, optical switching nodes are the next step towards high-speed all-optical networks. More and more, electrical short-distance interconnects prove to be a bottleneck in the input/output (I/O) bandwidth, too. This holds for the data transport on printed circuit boards (PCBs) as well as for chip-to-chip [1] and on-chip data transfer. Hence the use of optical interconnects is discussed as a future solution for short-distance communication in digital systems [2, 3, 4, 5]. In contrast to long-distance data transfer in the telecommunication area, short-distance digital systems do not so much need a single high-speed line with a large temporal bandwidth. The benefit of optics that we primarily want to use in this work is the high space bandwidth, i.e. the high channel density, rather than the large temporal bandwidth of optics.

From today's point of view we have to state that optics does not seem to be an alternative for on-chip interconnects within the next decade. Industry and research rather favour low-k materials to reduce the RC factor of lines, which, in contrast to the switching time of transistors, remains constant under scaling [6]. The situation is different in inter- and intra-board communication. Optical backplanes and electrical-optical PCBs have reached a maturity that will soon allow them to become an industrial standard [7, 8]. Future developments will also allow the use of 2-D fibre arrays for parallel data transfer between boards [9].

The still growing number of transistors that can be integrated on a chip due to technological progress leads to a serious I/O bottleneck in chip-to-chip communication. This problem is known as the pin count limitation of current very large scale integrated (VLSI) circuits. The following simple example illustrates how serious this problem is. It is not unrealistic to assume that a system in the near future will consist of 100 million transistors on one chip. Rent's rule shows that such an integrated circuit has an I/O requirement of about 14 000 signals [10]. Electronic interconnects cannot offer such a huge pin count. In contrast, optical free-space interconnects based on two-dimensional (2-D) arrays of vertical-cavity surface-emitting lasers (VCSELs) and detectors can use the whole chip area for communication and not only the chip's edge, where external electronic pads are located in order to minimise disturbing RC effects in long bond wires.
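To make the pin-count estimate concrete, the following minimal sketch evaluates Rent's rule in its usual form T = t · G^p. The parameter values used here (t = 1.4 terminals per block, Rent exponent p = 0.5, and each of the 100 million transistors counted as one block) are illustrative assumptions, chosen only so that the result reproduces the order of magnitude quoted above; they are not taken from [10].

```python
def rent_terminals(blocks: float, t: float, p: float) -> float:
    """Rent's rule: external terminals T = t * blocks**p."""
    return t * blocks ** p


# Assumed, illustrative parameters (not from the paper or from [10]).
ASSUMED_T = 1.4   # average terminals per block
ASSUMED_P = 0.5   # Rent exponent

signals = rent_terminals(100e6, ASSUMED_T, ASSUMED_P)
print(f"estimated I/O signals: {signals:.0f}")
```

With other plausible Rent parameters the estimate changes considerably, so the sketch should be read as an illustration of the scaling law rather than as a derivation of the quoted number.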
The goal of 2-D optical interconnects is to make available 1000 channels with a transfer rate of up to 1 Gbit/s per channel for chip-to-chip communication within the next ten years, a potential that is sufficient to satisfy the SIA roadmap [11] requirements for the year 2010 for the pin count of processor and memory circuits. In this paper we focus especially on the role of optics at the chip-to-chip level. Furthermore, we want to point out explicitly that the benefits of high-density optical 2-D interconnects will only be sufficiently exploited if they are supported by an appropriate circuit or system architecture.

The remainder of the paper is organised as follows. In section 2 the state of the art in optoelectronic VLSI (OE-VLSI) is briefly described. In section 3 a list of architecture features, which was set up by the author, is presented. A system which fulfils all or some of these features is a good candidate to make the best use of the high potential of optical short-distance interconnects and to offer much more throughput than comparable purely electronic solutions. In section 4 some architecture examples are shown which are well suited for an OE-VLSI and for which we developed first prototypes and manufacturing-ready layout structures, respectively. The article closes with a summary and an outlook on the work that has to be done in the future to lead OE-VLSI to success.

2. Smart pixel systems for OE-VLSI

OE-VLSI combines the high spatial interconnection density of optics for data communication with the high transistor integration density of electronics for data processing. The basic device elements of an OE-VLSI are 2-D detector and emitter arrays for data transport and silicon circuits for data processing. The 2-D detector array works as parallel optical input for the silicon circuit and the 2-D emitter array as parallel optical output. The silicon circuit is mostly realised in CMOS, the dominating technology in today's semiconductor circuits. It is convenient to partition the CMOS circuit into processing elements (PEs) and to connect each PE to a group of locally neighbouring optical emitter and detector elements, which operate as optical sender and receiver pads for the PE. This means that the emitter and detector arrays are also partitioned, and each partition of sending and receiving pixels is assigned to a particular PE partition (fig. 1). In this context one speaks of a smart pixel system.

The first attempts to realise smart pixel systems were characterised by very simple pixels, e.g. they consisted only of a few Boolean logic gates [12, 13] with optical I/O. The author does not restrict the term smart pixel to simple gate groups but considers arrays of processor cores [14] as well as reduced instruction set computer (RISC)-like PEs with optical inputs and outputs as smart pixels, too. The main feature which justifies speaking of a smart pixel is the geometric locality of the PEs and their associated external I/O channels. This implies that architectures with regular communication and processing structures are well suited for an efficient realisation with smart pixels. Therefore typical von Neumann processor architectures with central bus structures, irregularly distributed on-chip data traffic and unequally sized functional basic units will not become a smart pixel system even if the functional units are equipped with optical input and output elements. The same holds for modern superscalar processor architectures consisting of pipelined computing units, hierarchically organised caches, and prediction and register renaming units for dynamic order scheduling.

Either refractive or diffractive optics, or a combination of both, is used to link smart pixel circuits with free-space optics to build up 3-D circuits. Hence an OE-VLSI with smart pixel technology is always a technology for a 3-D VLSI, too. Different optical packaging technologies were developed and experimentally investigated. The basic concepts favour either stacked structures with micro-mechanical elements to combine micro-optical and OE-VLSI devices [15, 16] or planar integration techniques in which free-space optics is folded within a glass substrate [17]. The surface of the glass substrate is simultaneously used as a kind of optical multi-chip module on which OE-VLSI circuits are bonded and in which the necessary optical devices are directly integrated.

For OE-VLSI circuits hybrid solutions are necessary because silicon optoelectronics is not yet mature enough. Therefore the emitter array has to be realised in GaAs-based technology. The most suitable devices are 2-D arrays of VCSELs due to their high optical output power and their narrow wavelength bandwidth, the latter being mandatory if diffractive optics is used.
Furthermore, the fact that emission occurs normal to the substrate is ideal for a 3-D OE-VLSI. Such devices are commercially available as arrays of 8 × 8 laser diodes with a transfer rate of up to 2 Gbit/s per laser diode. One can expect that the yield in manufacturing such devices will increase in the future, making larger array sizes possible. For the detector array, metal-semiconductor-metal photo diodes (MSM-PDs) or simple pn and pin structures can be used as optical receiving devices. If the diodes are monolithically integrated with the silicon circuitry we get smart detector circuits. The alternative is to interlace MSM-PDs with VCSELs on a single GaAs substrate [3]. Interleaved VCSEL/MSM-PD arrays and pure VCSEL arrays are flip-chip solder bumped on a silicon application-specific integrated circuit (ASIC) or on a smart detector circuit, respectively. Then the whole chip area can be used for communication between integrated circuits. Because the solder bumps serve not only as mechanical but also as electrical contacts, the driver circuitry for detectors and emitters is usually implemented in the silicon.
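The advantage of using the whole chip area instead of only its edge can be illustrated with a short calculation. The sketch below compares the number of I/O positions on the outer ring of a chip with the number of positions in a full 2-D grid. The chip edge of 1 cm, the pitch of 125 µm and the use of the same pitch for electrical pads and optical channels are assumptions made purely for comparison; the function names are hypothetical.

```python
def area_channels(edge_um: int, pitch_um: int) -> int:
    """I/O positions of a full 2-D grid covering a square chip."""
    per_edge = edge_um // pitch_um
    return per_edge * per_edge


def perimeter_pads(edge_um: int, pitch_um: int) -> int:
    """I/O positions on the outer ring of the same grid (corners counted once)."""
    per_edge = edge_um // pitch_um
    return 4 * per_edge - 4


EDGE_UM = 10_000   # assumed chip edge: 1 cm
PITCH_UM = 125     # assumed pitch for both pads and optical channels

print("perimeter positions:", perimeter_pads(EDGE_UM, PITCH_UM))
print("area positions:     ", area_channels(EDGE_UM, PITCH_UM))
```

Under these assumptions the area grid offers roughly twenty times more positions than the perimeter, and the gap widens quadratically as the pitch shrinks.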

Fig. 1. Smart pixel system with partitioned VCSEL, detector and PE array circuits.

3. Architectures for an OE-VLSI

Not every architecture realised in VLSI will profit from an optoelectronic interface with high connectivity. Unfortunately, at the moment optical interconnects are not a solution for the problem of high latency and insufficient bandwidth in the processor-memory communication of modern universal processors, due to the inherently irregular set-up of such architectures. On the contrary, latency would increase due to the additional time required to pass through the driver and receiver [18]. Another problem concerns geometric aspects. 2-D components like VCSEL and micro lens arrays or facets of holographic optical elements show fixed pitch sizes of 125 µm, 250 µm, or 500 µm. This regular set-up does not match the mostly irregular set-up of modern microprocessors. In such microprocessors, different modules with different geometrical sizes have considerably varying data traffic requirements to each other and to the external memory. In this case a thousand optical links running from the memory to the processor circuit and vice versa would offer no advantage, because the optical input signals, once converted into electronic signals, have to be routed on long and inefficient lines to their final destination on the chip. Hence architectures that exploit optical interconnections efficiently have to take into account the locality and regularity of optical imaging systems. Examples of such efficient architectures can be found in [19, 20, 21, 22].

Here, we postulate the following list of features that an architecture should fulfil to profit from an optoelectronic interface with high connectivity:

(i) Array processing on chip level: Carrying out a single calculation in an OE-VLSI circuit will never be faster than in its all-electronic counterpart, because an OE-VLSI circuit calculates electronically, too. Hence it should be a goal to integrate parallel processing structures on chip. The whole chip area has to contain so many PEs that the maximum possible pin count of the chip is exceeded. Then a parallel optical interface using the whole chip area for communication can be exploited reasonably.

(ii) Simply structured PEs: Simply structured PEs are to be preferred in order to get massively parallel architectures with tens to hundreds of thousands of PEs which are distributed in stacked and optically linked circuit planes.

(iii) Scalable architectures: The architectures have to be scalable in three dimensions in order to react flexibly to technological advances.

(iv) Fine-grained structures: The best way to achieve scalable architectures is to use fine-grained processing structures, which make the best use of the large space-bandwidth product of optical interconnects.

(v) Single instruction multiple data (SIMD)-like hardware: To simplify the design process we favour SIMD-like hardware. First an area-optimised PE has to be designed. Afterwards the PE has to be copied multiple times on the chip area.

(vi) Very long instruction word (VLIW) architecture: Realising SIMD-like hardware does not mean that the result must always be a SIMD computer. With a VLIW architecture it is still possible to carry out different operations in neighbouring PEs.

(vii) Parallel synchronous operation mode: To reduce the effort for the control unit, operations which are executed in parallel should start and terminate simultaneously.

(viii) Pipeline operation: Whenever possible, pipeline processing is to be exploited in stacked and optically linked circuits.

Fig. 2. Abstract model of a 3-D OE-VLSI architecture.

Fig. 2 shows an abstract model of an architecture fulfilling these eight architecture features. Arrays of OE-VLSI circuits linked together by 3-D optical interconnections offer a kind of natural pipeline. In each OE-VLSI circuit as many optoelectronic PEs as possible are integrated. The circuit planes are horizontally stacked and linked by optical interconnection modules. In such a 3-D architecture we combine the two most popular parallel processing techniques: array processing within one circuit and pipelining in circuit planes stacked in the third dimension.
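The combination of array processing within a plane and pipelining across stacked planes can be illustrated with a small behavioural simulation. The following is a minimal sketch under simplifying assumptions: each plane applies the same per-PE function to all elements of its input array (SIMD-like), all planes advance synchronously once per clock, and the optical link between planes is a plain 1-to-1 copy. The name `run_pipeline` and the example operations are hypothetical and only serve the illustration.

```python
from typing import Callable, List, Optional


def run_pipeline(stages: List[Callable[[int], int]],
                 inputs: List[List[int]]) -> List[List[int]]:
    """Simulate synchronously clocked, stacked circuit planes.

    stages[k] is the per-PE operation of plane k; every plane applies it
    to all elements of the array it received from the plane before it.
    """
    regs: List[Optional[List[int]]] = [None] * len(stages)
    outputs: List[List[int]] = []
    # Append empty clock cycles so that the pipeline is flushed at the end.
    stream: List[Optional[List[int]]] = list(inputs) + [None] * len(stages)
    for item in stream:
        if regs[-1] is not None:
            outputs.append(regs[-1])          # result leaves the last plane
        # Update from the last plane backwards so each plane reads old data.
        for k in range(len(stages) - 1, 0, -1):
            prev = regs[k - 1]
            regs[k] = None if prev is None else [stages[k](v) for v in prev]
        regs[0] = None if item is None else [stages[0](v) for v in item]
    return outputs


# Two planes: the first adds 1 in every PE, the second doubles the value.
results = run_pipeline([lambda v: v + 1, lambda v: v * 2],
                       [[1, 2], [3, 4]])
print(results)
```

After the pipeline is filled, one complete array leaves the stack per clock cycle, while the latency of a single array equals the number of planes.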

4. Examples for efficient OE-VLSI architectures

In the following section we present some architectures which exhibit all or at least most of the above eight architecture features. Furthermore, for some of them first prototypes and demonstrator circuits were realised to prove the technical feasibility.

Fig. 3. Architecture for a 3-D optoelectronic digital signal processor based on an optical memory-processor communication.

Fig. 4. Layout of the PE of the optoelectronic signal processing unit. The rectangle corresponds to a pn photo diode. To the right of the diode the digitising receiver circuit is shown, below it a section of the PE logic layout.

4.1. Parallel optoelectronic digital signal processor

A 3-D optoelectronic digital signal processing unit is presented as the first example. The performance of this architecture is mainly based on a parallel optical memory-processor coupling. The algorithms, which are hard-wired in the processor architecture, are based on so-called convergence procedures for calculating standard functions like sine, cosine, logarithm, exponential function, arctangent, square root, multiplication and division. The following iterative calculation scheme (1) is an example for the calculation of the exponential function:

\[
x_{i+1} =
\begin{cases}
x_i + 2^{-i} \cdot x_i, & y_{i+1} > 0 \\
x_i, & y_{i+1} \le 0
\end{cases}
\qquad
y_{i+1} =
\begin{cases}
y_i - \ln(1 + 2^{-i}), & y_{i+1} > 0 \\
y_i, & y_{i+1} \le 0
\end{cases}
\tag{1}
\]

\[
x_0 = 1, \qquad y_0 = a, \qquad x_i \xrightarrow{\; i \to \infty \;} e^{a}.
\]

The calculation is based on simple operations like add, bit check, shift and table access. Table access is necessary to read the pre-calculated constant values ln (1 + 2^{-i}) for each iteration step i. The algorithm starts with the argument a (0 ≤ a < 1) of the exponential function in the y register and the constant value 1 in the x register. In each iteration an addition is carried out to calculate the new values for x and y. If the new value for the y register is greater than 0, the new values are valid for the next iteration step. If not, the new values are cancelled and the calculation process is continued with the old values for x and y. If n is the operand word length, the desired result is available in the x register after n iteration steps, in accordance with scheme (1). Similar algorithms exist for the other functions.

We developed a single PE whose internal data paths can be configured in such a way that one of the eight functions is calculated. We carried out intensive architecture studies in which we determined which architecture approach, a bit-serial or a word-serial execution of the addition, offers the highest throughput, i.e. the most operations per second, for a parallel implementation. As a result we found that a bit-serial approach shows the best throughput. More details concerning the algorithms and the architecture studies can be found in [23] and [24].

A serious problem for an electronic solution of the bit-serial architecture is the table access, especially for a parallel digital signal processor. A parallel implementation of the tables needs too much chip area. Implementing the table only once on the chip and broadcasting the constant data values to the different signal PEs causes long on-chip interconnects. This leads either to a longer clock period or to large interconnections which need much chip area. In the past, designers were prevented from realising all-electronic solutions by the difficulty of implementing the tables for the constant values.
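The convergence procedure for the exponential function described above can be sketched in a few lines of Python. This is a behavioural illustration only: it uses floating-point arithmetic instead of the bit-serial fixed-point hardware, computes the table constants ln (1 + 2^{-i}) on the fly instead of reading them from a table, and the function name `exp_convergence` is hypothetical.

```python
import math


def exp_convergence(a: float, n: int = 32) -> float:
    """Approximate e**a for 0 <= a < 1 with the shift-and-add scheme (1)."""
    x, y = 1.0, a                      # x register starts at 1, y register at a
    for i in range(n):
        c = math.log1p(2.0 ** -i)      # table constant ln(1 + 2^-i)
        y_new = y - c                  # tentative new y value
        if y_new > 0:                  # accept the step only if y stays positive
            x = x + x * 2.0 ** -i      # shift and add: x * (1 + 2^-i)
            y = y_new
        # otherwise keep the old values of x and y
    return x


approx = exp_convergence(0.5)
print(approx, math.exp(0.5))
```

Each accepted step multiplies x by (1 + 2^{-i}) and removes the corresponding logarithm from y, so that x approaches e^a while y is driven towards zero.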
As a solution to the table-access problem we propose an OE-VLSI system based on a parallel optical interface between a memory containing the table values and the parallel processor. The table is implemented in the memory circuit only once, and the constant values are broadcast by optical fan-out interconnections to their destinations in the processor circuit.

Fig. 3 shows a scheme of the proposed architecture concept. The core devices are a memory and a processor circuit. Input of operands and output of results are realised electronically. At the bottom of the processor array the operands are read in. The data flow of the operands in the processor array runs from bottom to top. The calculated result bits are read out at the top of the circuit. Each PE in a column calculates exactly one iteration step. In neighbouring columns different functions can be calculated simultaneously. In addition to the constant values ln (1 + 2^{-i}), further constants arctan (2^{-i}) are required for trigonometric functions. In all PEs of one row the same iteration is carried out, and in each iteration the same constant values are required. Therefore the constant values can be transmitted via a broadcast from the memory circuit to the PEs of that row in all columns. Moreover, we decided to distribute the clock globally to all PEs. Hence the memory circuit contains a corresponding clock generator. Due to the bit-serial architecture approach, the bits of the constant values can be stored in a ring shifter and are transmitted bit by bit to the PEs. Each PE of the processor array needs three optical inputs (for two constant bits and the clock signal) and no optical outputs. So the processor array can be designed as a smart detector. For the optical interface between the memory and the processor circuit we propose a 1-D VCSEL array. Dammann gratings [25] can be used to realise the 1-to-n splitting of the optical beams.

Fig. 5. Scheme for an optoelectronic reconfigurable system. The configware is duplicated, e.g. with a binary phase grating. The data is interleaved with the I/O of a data memory. The whole binary matrix is mapped with appropriate optics onto the reconfigurable logic array.

Fig. 4 shows the layout for one PE, which we developed in a combined automatic-manual process. The PE logic was synthesised with Cadence from a Verilog description, and the result was subsequently connected to the photodiode within the layout editor. With the help of the layout tool we determined, for a 32-bit processor array and a 0.8 µm CMOS process, a critical path length of 3.66 ns; rounding the clock period up to 4 ns results in a clock frequency of 250 MHz. Furthermore, the design tool determined an area of 539 416 µm² for one PE. This means that on a chip of 1 cm² seven pipelines can be arranged horizontally side by side and that one VCSEL of the memory circuit has to perform a fan-out of seven with a detector pitch of at most 1438 µm. Pitch size and fan-out splitting represent moderate requirements for optical hardware. Consequently, about 50 mega operations per second (7 × 250 MHz / 32-bit word length) can be achieved. One operation corresponds to the calculation of a standard function like sine or cosine, for example. Compared to the 2.7 µs for the simultaneous calculation of sine and cosine in signal processors of the TMS320C2XX series, our optoelectronic approach improves this performance by about a factor of 70.

The presented digital signal processing unit is an excellent example of an architecture which fulfils the architecture requirements listed above for an efficient implementation with optoelectronic technologies. The processor consists of a fine-grained, massively parallel array of PEs. Its structure is fully scalable, and all PEs are built identically; nevertheless, by reconfiguring the data paths it is possible to carry out different operations on different PE column vectors simultaneously (VLIW architecture with SIMD-like hardware). All operations start and terminate at the same time, and the operations are carried out in pipeline mode. Most of these features are also valid for the architecture class we present in the next section.
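The throughput estimate for the signal processing unit can be retraced with a short calculation. The sketch below uses only the figures quoted in the text (seven pipelines, 250 MHz, 32-bit word length, 2.7 µs reference time). Counting two results (sine and cosine) per 2.7 µs for the reference processor is an assumption about how the comparison was made, and the function names are hypothetical.

```python
def throughput_mops(pipelines: int, clock_mhz: float, word_length: int) -> float:
    """Bit-serial pipelines deliver one result every word_length clock cycles."""
    return pipelines * clock_mhz / word_length


def speed_up(oe_mops: float, ref_time_us: float, ref_results: int) -> float:
    """Ratio of the optoelectronic throughput to the reference throughput."""
    ref_mops = ref_results / ref_time_us   # results per microsecond = MOPS
    return oe_mops / ref_mops


exact = throughput_mops(pipelines=7, clock_mhz=250.0, word_length=32)
print(f"throughput: {exact:.1f} MOPS")     # the text rounds this to 50 MOPS

# Assumption: the reference delivers two results (sine and cosine) in 2.7 us.
factor = speed_up(oe_mops=50.0, ref_time_us=2.7, ref_results=2)
print(f"speed-up:   {factor:.1f}")
```

Under this assumption the rounded throughput of 50 MOPS gives a ratio slightly below 70, which is consistent with the factor stated in the text.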

4.2. Dynamically reconfigurable optoelectronic hardware

Computing with reconfigurable hardware aims to combine the efficiency of an ASIC with the flexibility of a universal programmable microprocessor. The flexibility is achieved by implementing logical functions not directly hard-wired, as in an ASIC, but by configuration of SRAM-based look-up-tables or of hierarchical multiplexer structures. In the case of look-up-table based reconfigurable logic, the input variables serve as the address under which the result of the desired Boolean function was stored during configuration. Besides, it is possible to link arbitrary outputs with inputs of different look-up-tables by configuring switching matrices during the configuration phase. In analogy to the software of a microprocessor, the bit stream for configuring look-up-tables and switching matrices is sometimes called configware. Normally a matrix of reconfigurable cells with corresponding switching matrices is integrated on a reconfigurable chip. Hence reconfigurable hardware is also a special kind of fine-grained, massively parallel and scalable SIMD-like hardware.

Currently one of the main applications of reconfigurable hardware, e.g. of FPGAs (Field Programmable Gate Arrays), is rapid prototyping. Hardware can be tested with FPGAs, and the reconfigurability of SRAM-based logic allows design errors to be corrected. From the point of view of computer science research, the most attractive feature of reconfiguration is the possibility of dynamic reconfiguration at run time. In the ideal case this allows very complex operations to be carried out with less and simpler hardware, which is permanently adapted to new tasks by dynamic reconfiguration. But to be competitive with a universal processor, dynamic reconfiguration has to be executed extremely fast. To achieve this within milliseconds, much basic research is still necessary. One reason for the difficulties is again the limited I/O pin count, because pins are placed only around the edge of the memory chip containing the configware and of the processor chip containing the reconfigurable logic. To program the processor chip, a time-consuming serial bit stream is transferred from the configware memory to the processor chip. As a solution to this problem we propose to use optoelectronic hardware, which allows configuration times much shorter than milliseconds due to the high-density parallel optical access between the reconfigurable hardware and the memories in which configware and data are stored (fig. 5).
The configware can be transferred directly to the location where it is needed on the processor chip due to the direct optical access. It is not necessary to transfer the configware data along long lines. Furthermore, all reconfigurable logic cells can be addressed in parallel. Neither the direct access nor the parallel addressing can be realised with electronics as efficiently as with optical interconnects. The direct access via optoelectronic interconnects permits configurations to be carried out within nanoseconds in the best case and microseconds in the worst case. In contrast, a complete reconfiguration carried out electronically needs milliseconds. Therefore an optical solution offers the potential for a performance increase of more than three orders of magnitude compared with a purely electronic solution. Furthermore, an optical off-chip reconfiguration with a 3-D OE-VLSI supports the elimination of global wires, which require much chip area and which are necessary for the usual on-chip configuration. The saved chip area can be used for the integration of additional configurable logic cells, which increases the degree of parallelism. Besides, the optical access makes possible a partial reconfiguration of single cells at run time, too.

Using optical interconnects offers a simple and efficient method for dynamic reconfiguration. In the author's view dynamic reconfiguration is also the only useful application of optics for reconfigurable hardware. In rapid prototyping there is not much gain in configuring an FPGA in nanoseconds instead of milliseconds at the beginning of a test run. Hence for this application the additional effort of introducing optoelectronics in VLSI is not justified.

Prototypes recently presented in the literature demonstrate the importance of optical interconnects for reconfigurable hardware. At McGill University optoelectronic programmable logic device (PLD) structures were developed to realise different optoelectronic VLSI architectures rapidly and flexibly [26]. 3-D FPGAs consisting of stacked optoelectronic FPGA devices were proposed and already realised in first prototypes at Gent University [27]. The goal of this work is to achieve a better utilisation of configurable logic cells due to additional optical 3-D wiring resources. At the University of Tokyo, high-resolution liquid crystal devices are used to generate holographic optical interconnects dynamically for an optoelectronic parallel computing system [19]. Holographic memory techniques are favoured at the California Institute of Technology for the configware memory [28].

Reconfigurable architectures with optical interconnects were realised by the research group of the author at the University of Jena [24, 29]. We implemented on the layout level an ASIC in a 0.5 µm CMOS process with SRAM-based look-up-tables of size 4 × 2. Each look-up-table possesses optical modulators as optical input and output. Using these modulators, all look-up-tables of the chip can be loaded simultaneously via optical interconnects. Including the address and amplifier logic for the SRAM cells as well as the driver circuitry for the modulators, one configurable cell requires 356 transistors and an area of about 4160 µm². In SPICE simulations an access frequency of 250 MHz was determined for loading one bit. Consequently 24 000 look-up-tables can be integrated on 1 cm² chip area, and all look-up-tables can be reconfigured within nanoseconds due to the parallel optical access.

Fig. 6. Die of the test chip. The layout of an SRAM-based look-up-table is shown. The large rectangles are p- and n-contacts on which optical modulators are bonded. The modulators operate as optical I/O pads for the circuit. The horizontal pitch is 62.5 µm, the vertical pitch between two p-contacts of the modulators is 125 µm.

In laboratory experiments we measured a clock frequency of 90 MHz. One reason for the deviation from the simulation result concerns the different coding schemes we used. The simulation result refers to a differential dual-rail coding scheme, i.e. two physical optical links belong to one logical channel. Such a coding is more immune to noise than a one-to-one coding, in which each optical link corresponds to one logical channel. In the realised chip we used the 1-to-1 coding in order not to reduce the channel density of our tiny test chip of 1.2 × 1.2 mm² size. Fig. 6 shows the die of the test chip. Currently research work is carried out at Fernuniversität Hagen to integrate this chip with optical interconnects and a laser array in a planar optical multi-chip module [2]. This approach can also be extended to a coupling with optoelectronic memories storing configware and data.
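The difference between serial and parallel loading of the configware can be estimated with a simple calculation. The sketch below rests on several assumptions that are not stated in the text: each 4 × 2 look-up-table is taken to hold 8 configuration bits, each look-up-table is loaded bit by bit over its own optical channel at the simulated access frequency of 250 MHz, and the hypothetical serial reference stream runs at the same bit rate. Real electronic configuration interfaces may well be slower, so the serial figure is an optimistic lower bound.

```python
def load_time_s(bits: int, bit_rate_hz: float) -> float:
    """Time needed to shift the given number of bits at the given bit rate."""
    return bits / bit_rate_hz


LUT_COUNT = 24_000        # look-up-tables on 1 cm^2 (from the text)
BITS_PER_LUT = 8          # assumption: a 4 x 2 look-up-table stores 8 bits
BIT_RATE_HZ = 250e6       # simulated access frequency (from the text)

# One serial stream has to carry the bits of all look-up-tables.
serial_s = load_time_s(LUT_COUNT * BITS_PER_LUT, BIT_RATE_HZ)
# With parallel optical access every look-up-table is loaded at the same time.
parallel_s = load_time_s(BITS_PER_LUT, BIT_RATE_HZ)

print(f"serial:   {serial_s * 1e6:.0f} us")
print(f"parallel: {parallel_s * 1e9:.0f} ns")
print(f"ratio:    {serial_s / parallel_s:.0f}")
```

Under these assumptions the gain equals the number of look-up-tables that are loaded in parallel, which illustrates why the parallel optical access moves the configuration time from the sub-millisecond range into the nanosecond range.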

4.3. Pipelined optoelectronic parallel integer processor in smart detector technology

The last architecture example we present in this paper concerns a general-purpose optoelectronic PE capable of performing integer operations like addition, subtraction, multiplication and division. The architecture consists of stacked OE-VLSI circuits which all operate in pipeline mode. Subtraction, multiplication and division are reduced to successive additions. In each circuit layer an addition step is carried out. Because we have not only one pipeline but many pipelines operating in parallel, which are spatially arranged along all circuit planes, the architecture represents a 3-D massively parallel OE-VLSI system. The optical interconnection scheme between directly neighbouring circuits is a simple 1-to-1 mapping. Due to high-density optical chip-to-chip interconnects, such an architecture has the potential to implement thousands of PEs in a 3-D optoelectronic processor cube in the future. Details concerning the algorithm and architecture can be found in [30].
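How a multiplication can be reduced to successive additions, with one addition step per circuit layer, can be sketched as follows. The code shows the generic textbook shift-and-add scheme for unsigned integers; it is an assumption that the processor described in [30] works along similar lines, and the function name is hypothetical.

```python
def multiply_by_additions(a: int, b: int, n_bits: int = 8) -> int:
    """Unsigned multiplication as n_bits conditional additions.

    Each loop iteration corresponds to one pipeline layer: the layer adds
    the shifted multiplicand if the matching bit of the multiplier is set.
    """
    if not (0 <= b < 1 << n_bits) or a < 0:
        raise ValueError("operands must be unsigned and b must fit into n_bits")
    acc = 0
    for layer in range(n_bits):
        if (b >> layer) & 1:
            acc += a << layer          # addition step of this layer
    return acc


print(multiply_by_additions(13, 11))
```

In a stacked realisation a new operand pair could enter the first layer in every clock cycle, so that the number of layers determines the latency but not the throughput.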

Fig. 7. Die of the smart detector chip in a standard CMOS process. The layout of a PE (dark part in the last diode row) of an optoelectronic parallel integer processor unit is shown together with a 4 × 4 photo diode array. The diode size is 30 × 30 µm², the pitch of the diodes is 125 µm.

Fig. 8. Schematic of a receiver circuit with given transistor channel width/length ratios (top). Results of SPICE simulations (bottom). The circuit correctly digitises the analog input for photo currents between the two curves.

Figure 7 shows the die of a smart detector chip in which one PE is integrated with a 4 × 4 photo diode array. The diodes operate as external optical chip inputs. For test purposes, a further test chip containing only photo diodes was also realised. Our intention was to use a cheap standard CMOS process for the monolithic integration of optical pads and digital logic circuitry, and to find an efficient receiver circuit for converting the optical input signal. Grimm [31] intensively investigated different photo diode sizes and different receiver circuits to find an optimum size and a receiver circuit that is stable against variations in the photo current. A trade-off has to be found between a large diode area, which simplifies the optical imaging onto the diode, and a size that is as small as possible, which reduces the capacitance and thus yields low RC values. A diode size of 40 × 40 µm was found to be ideal in experiments carried out with an illuminating multi-mode fibre with a 50 µm core diameter. For single-mode fibres we found that an edge length of 35 µm is sufficient to illuminate the whole diode area.

The circuit shown in fig. 8 is based on a transimpedance amplifier. For a maximum photo current of 15 µA, a data rate of 500 Mbit/s was determined by SPICE simulations. To reach such a high data rate, Vtune has to be adjusted very precisely to around 0.6105 V; otherwise this data rate cannot be reached. Furthermore, for Vtune = –90 mV the circuit offers a high dynamic range from 32.2 µA up to 61 µA, i.e. the circuit works correctly within these values. By measurement we determined a responsivity of 0.4 A/W for the pn diodes in a standard 0.8 µm CMOS process. Hence a dynamic range of 28.8 µA (61 µA – 32.2 µA) corresponds to fluctuations in the optical input power of about 80 µW. These results refer exclusively to the electronic circuits. Other groups are engaged in designing special diode structures, e.g. fingered pn regions, to reduce the diffusion effects that arise from the deep penetration of light into the silicon substrate at 850 nm [32]. The receiver circuit of fig. 8 will work with every photo diode structure. The more efficiently the photo current is generated, the better the potential of such a circuit is exploited.
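The conversion from photo current range to optical power range follows from the definition of responsivity, R = I / P. The short sketch below reproduces this arithmetic with the values quoted in the text; the function name `optical_power_swing_uW` is a hypothetical helper introduced only for this illustration, and a constant responsivity over the whole current range is assumed.

```python
def optical_power_swing_uW(i_min_uA: float, i_max_uA: float,
                           responsivity_A_per_W: float) -> float:
    """Optical power range (in µW) matching a photo current range (in µA).

    Uses dP = dI / R and assumes a constant responsivity R.
    """
    return (i_max_uA - i_min_uA) / responsivity_A_per_W


# Values quoted in the text: 32.2 µA to 61 µA at 0.4 A/W.
swing_uW = optical_power_swing_uW(32.2, 61.0, 0.4)
print(f"tolerated optical power swing: {swing_uW:.1f} µW")
```

Direct division gives 28.8 µA / 0.4 A/W = 72 µW, which is of the same order as the "about 80 µW" stated in the text.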

5. Conclusion and future directions

In this paper we evaluated the prospects of short-distance optical interconnects for VLSI circuits. Using high-density optical interconnects in the chip-to-chip area makes it possible to overcome the pin limitation problem in current VLSI technology. Monolithic and hybrid techniques based on VCSEL and detector arrays are being developed to realise 3-D OE-VLSI consisting of stacked CMOS circuits equipped with parallel optoelectronic I/O interfaces. Appropriate architectures are required to exploit efficiently the potential of 1000 optical links with data rates of up to 1 Gbit/s between integrated circuits in the future. Fine-grain, massively parallel and pipelined architectures are in focus because they best exploit the high space bandwidth of optics. As an example of such an architecture, a digital signal processing unit was presented. With this architecture, a speed-up factor of 70 over purely electronic solutions can be achieved in the calculation of standard functions, owing to parallel optical memory-processor coupling. Another architecture class that will profit from short-distance optical interconnects is reconfigurable hardware systems. A parallel optoelectronic interface between the memory storing the configware and the reconfigurable logic processor allows more parallelism in dynamically reconfigurable systems. Reconfiguration times three orders of magnitude shorter, as well as easier partial reconfiguration, can be achieved compared to purely electronic solutions. A first prototype chip using modulator technology was realised to prove the principle. The chip allows the content of lookup tables to be loaded dynamically via optical interconnections. Furthermore, we investigated smart detector technology for a future massively parallel 3-D optoelectronic integer processing unit. The achievable high throughput of such an architecture is primarily based on pipelined, optically linked circuit planes that are realised in a cheap standard CMOS process.
The basic PE of such a universal processor architecture was integrated in a smart detector test chip together with an array of different pn photo diode structures, which operate as external optical input pads. The diodes were tested experimentally to determine their optimum size. A diode size of 40 × 40 µm was found to be ideal, and a recently developed receiver circuit that is robust against photo current fluctuations of about 28.8 µA was presented. There is still some work to do before optoelectronics becomes a widespread standard in CMOS VLSI systems. A promising level of technological maturity has already been reached in recent years. Nevertheless, improvements in component development have to continue, e.g. the yield of large integrated VCSEL arrays should be improved and their power dissipation reduced. Furthermore, it is necessary, especially with regard to commercial aspects, to achieve standardisation in component manufacturing. This concerns standard packaging and assembly techniques as well as the design of standard CMOS library cells for efficient on-chip driver circuitry for detectors and VCSELs. It is likely that the standardisation of optical components will be strongly driven by the need of future high-speed network technology for a faster internet and optical system-area networks. The main bottleneck preventing higher bandwidth in the internet today is the latency of the electronic routing nodes. This latency is caused by discrete detector and sender arrays, which convert optical into electronic signals, and by the CMOS circuitry performing routing tasks. An all-optical internet and optical system-area networks might also involve the direct attachment of optical interconnects to the CMOS die. This will also promote the availability of low-cost components that can be used for OE-VLSI. If standardisation in component manufacturing and in design tools for OE-VLSI systems is achieved, a major step will have been taken towards the widespread use of OE-VLSI in the next decade.

References

[1] Goodman JW, Leonberger FJ, Kung S, Athale RA: Optical Interconnections for VLSI systems. Proc. IEEE 72 (1984) 850–865
[2] Fey D, Erhard W, Gruber M, Jahns J, Bartelt H, Grimm G, Hoppe L, Sinzinger S: Optical Interconnects for Neural and Reconfigurable Architectures. Proc. IEEE 88 (2000) 850–865
[3] Liu Y, Strzelecka EM, Nohava J, Hibbs-Brenner MK, Towe E: Smart-Pixel Array Technology for Free-Space Optical Interconnects. Proc. IEEE 88 (2000) 764–768
[4] Sinzinger S, Jahns J: Integrated Microoptical Imaging Systems with High Interconnection Capacity Fabricated in Planar Optics. Appl. Opt. 36 (1997) 4729–4735
[5] Bähr J, Brenner K-H: Optical Motherboard: a Planar Chip-to-Chip Interconnection Scheme for Dense Optical Wiring. Proceedings Optics in Computing OC’98, Brugge, June 1998, pp. 419–422
[6] Ghosh P, Mangaser R, Mark C, Rose K: Interconnect-Dominated VLSI Design. Proceedings IEEE Conference on Advanced Research in VLSI ARVLSI’99, Atlanta 1999, pp. 114–123
[7] Moisel J, Bogenberger M, Guttman J, Krumpholz O, Kuhn K-P, Huber H-P, Rohde M: Optical Backplanes with Integrated Polymer Waveguides. Opt. Eng. 39 (2000) 673–679
[8] Griese R: Optical Interconnections on Printed Circuit Boards. Proc. SPIE 4089 (2000) 958–968
[9] Hoppe L, Köhler JM, Bartelt H, Höfer B: Zweidimensionales Faserarray und Verfahren zu seiner Herstellung. Patent application by the German Patent Office, number 199 25 015.4
[10] Levi AFJ: Optical Interconnects in Systems. Proc. IEEE 88 (2000) 750–757


[11] The International Technology Roadmap for Semiconductors. Semiconductor Industry Association, San Jose, USA 2000
[12] Chen J, Zhou P, Sun SZ, Hersee S, Myers DR, Zolper J, Vawter GA: Surface-Emitting Laser-Based Smart Pixels for Two-Dimensional Optical Logic and Reconfigurable Optical Interconnections. IEEE J. Quantum Electron. 29 (1993) 741–755
[13] Lentine AL, Miller DAB: Evolution of the SEED technology: Bistable logic gates to optoelectronic smart pixels. IEEE J. Quantum Electron. 29 (1993) 655–669
[14] Kiamilev FE, Lambirth JS, Rozier RG, Krishnamoorthy AV: Design of a 64-bit microprocessor core IC for hybrid CMOS-SEED technology. In Proc. MPPOI’96, Maui, Hawaii, IEEE CS Press, Los Alamitos 1996, pp. 53–60
[15] Thienpont H, Debaes C, Baukens V, Ottovaera H, Vynck P, Tuteleers P, Verschaffelt G, Volckaerts B, Hermanne A, Hanney M: Plastic Microoptic Interconnection Modules for Parallel Free-Space Inter- and Intra-MCM Data Communication. Proc. IEEE 88 (2000) 769–779
[16] Haney HW, Christensen MP, Miljokovic P, Fokken GJ, Vickberg M, Gilbert BK, Rieve J, Ekman J, Chandramani P, Kiamilev F: Description and Evaluation of the FAST-Net Smart Pixel-Based Optical Interconnection Prototype. Proc. IEEE 88 (2000) 819–828
[17] Jahns J: Planar Packaging of Free-Space Optical Interconnections. Proc. IEEE 82 (1994) 1623–1631
[18] Lytel R, Davidson HL, Nettleton N, Sze T: Optical Interconnections within High-performance Computing Systems. Proc. IEEE 88 (2000) 758–763
[19] Ishikawa M, McArdle N: Optically Interconnected Parallel Computing Systems. IEEE Computer (February 1998) 61–68
[20] Chai SM, Gentile A, Lugo-Beauchamp WE, Fonesca J, Cruz-Rivera JL, Wills DS: Focal-plane Processing Architectures for Real-time Hyperspectral Image Processing. Appl. Opt. 39 (2000) 835–849
[21] Wu SM, Kuznia CB, Hoanca B, Chen C-H, Sawchuk AA: Demonstration and Architectural Analysis of Complementary Metal-oxide Semiconductor/Multiple-quantum-well Smart-Pixel Array Cellular Logic Processors for Single-Instruction Multiple-Data Parallel-Pipeline Processing. Appl. Opt. 38 (1999) 2270–2280

[22] Krishnamoorthy AV, Miller DAB: Firehose Architectures for Free-space Optically Interconnected VLSI Circuits. J. Parallel and Distributed Computing 41 (1997) 109–114
[23] Fey D, Kasche B, Burkert C, Tschaeche O: Specification for a reconfigurable optoelectronic VLSI signal processor suitable for digital signal processing. Appl. Opt. 37 (1998) 284–295
[24] Fey D: Algorithmen, Architekturen und Technologie der optoelektronischen Rechentechnik. Habilitation thesis University Jena, 1999
[25] Dammann H, Görtler K: High-efficiency Inline Multiple Imaging by Means of Multiple Phase Holograms. Optics Comm. 3 (1971) 312–315
[26] Sherif SS, Griebel SK, Au A, Hui D, Szymanski TH, Hinton HS: Field Programmable Smart Pixel Arrays: Design, VLSI Implementation and Applications. Appl. Opt. 37 (1998) 284–295
[27] van Campenhout J, van Marck H, Depreitere J, Dampre J: Optoelectronic FPGAs. IEEE J. Selected Topics Quantum Electron. 5 (1999) 306–315
[28] Mombru J, Panotopoulus G, Psaltis D, An X, Mok F, Ay S, Barna S, Fossum ER: Optically Programmable Gate Array. SPIE-Proc. 4089 (2000) 763–771
[29] Grimm G, Fey D: An Associative Memory based on Hybrid SEED Technology. SPIE-Proc. 3460 (1998) 339–342
[30] Fey D, Degenkolb M: Digit Pipelined Arithmetic for 3-D Massively Parallel Optoelectronic Circuits. Supercomput. 16 (2000) 177–196
[31] Grimm G: Integration optischer Anschlüsse auf Schaltkreisen mit hoher Kanaldichte. Dissertation Universität Jena, 2001
[32] Kuijk M, Coppée D, Vounckx R: Spatially Modulated Light Detector in CMOS integrated with Sense-Amplifier Receiver Operating at 180 Mb/s for Optical Data Link Applications and Parallel Optical Interconnects Between Chips. IEEE J. Selected Topics Quantum Electron. 4 (1998) 1040–1045