MAD, a floating-point unit for massively-parallel processors

Literature survey

Bartoloni, A, Battista, C, Cabasino, S, Cabibbo, N, Del Prete, F, Marzano, F, Paolucci, P S, Sarno, R, Salina, G, Todesco, G M, Torelli, M, Tripiccione, R, Tross, W and Zanetti, E 'MAD, a floating-point unit for massively-parallel processors' Part. World Vol 2 No 3 (1991) pp 65-73 The authors describe in detail the architecture and implementation of the MAD chip, a floating-point unit used as the elementary processing element of the APE100 array processor. The design has been closely tailored to the requirements of a SIMD, floating-point-intensive machine.

Buonanno, G, Lombardi, F, Sciuto, D, Shen, Y-N 'Design for testability techniques for CMOS combinational gates' IEEE Trans. Instrum. Meas. Vol 40 No 4 (August 1991) pp 703-708 The design of easily testable CMOS combinational circuits is discussed. Two CMOS structured design techniques are presented. The novelty of this approach is the complete fault detection of single- and multiple-line stuck-at, transistor stuck-open, and stuck-on faults for combinational circuits. The test algorithm requires only minimal modifications to detect a large number of bridging faults. These techniques are both based on the addition of two transistors, a P-FET and an N-FET, which are placed in series between the P and N sections. In the first case (dynamic fully CMOS, DFCMOS), the transistors are controlled by a single input; in the other case (testable fully CMOS, TFCMOS), there is one input for each additional transistor. The test procedure is presented, and it is shown that multiple fault detection can be easily achieved.

Eshraghian, K, Sarmiento, R, Carballo, P P, Nunez, A 'Speed-area-power optimization for DCFL and SDCFL class of logic using ring notation' Microprocess. Microprogr. Vol 32 No 1-5 (August 1991) pp 75-82 Advances in the development of digital GaAs integrated circuits have progressed to the point that designers of signal and data processors can discern the system applications for which GaAs is best suited. Basic computation primitives in DSP and image processing systems are usually adders, multipliers and delay elements. In this paper, the authors present the results of a systematic study conducted to evaluate the influence of layout and design methodologies, both conventional and innovative ones, on the performance of those DSP computation primitives.

Fouts, D J and Butner, S E 'Architecture and design of a 500-MHz gallium-arsenide processing element for a parallel supercomputer' IEEE J. Solid-State Circuits Vol 26 No 9 (September 1991) pp 1199-1211 The design of the processing element of GASP, a GaAs supercomputer with a 500-MHz instruction issue rate and 1-GHz subsystem clocks, is presented. The novel, functionally modular, block data flow architecture of GASP is described. The architecture and design of a GASP processing element is then presented. The processing element (PE) is implemented in a hybrid semiconductor module with 152 custom GaAs ICs of eight different types. The effects of the implementation technology on both the system-level architecture and the PE design are discussed. SPICE simulations indicate that parts of the PE are capable of being clocked at 1 GHz, while the rest of the PE uses a 500-MHz clock. The architecture utilizes data flow techniques at a program block level, which allows efficient execution of parallel programs while maintaining reasonably good performance on sequential programs. A simulation study of the architecture indicates that an instruction execution rate of over 30 000 MIPS can be attained with 65 PEs.

Irwin, G W 'Novel control architectures' Advanced Methods in Adaptive Control for Industrial Applications, Prague, Czechoslovakia, 14-16 May 1990, Springer-Verlag, Berlin, Germany (1991) pp 154-170 Advances in microelectronics have had a significant impact on the implementation of digital controllers and a variety of processor hardware has been employed by the control engineer to meet the discipline of real-time operation including microprocessors, application specific integrated circuits, digital signal processors and digital signal controllers. The paper describes work on parallel Kalman filtering, and in particular fine-grained systolic algorithms and their hardware realization using transputers. After a background discussion on systolic arrays, algorithms for regular and square root covariance Kalman filtering are defined, then systolic array architectures are described. The mapping of the systolic arrays onto transputers is discussed. Results are included to illustrate the effectiveness of the mapping method and the potential speedups for a simple application. The broader implications of the results are briefly discussed.

Itoh, Y, Wada, A, Morimoto, K, Tomita, Y and Yamazaki, K 'Four-layer 3-D IC with a function of parallel signal processing' Microelectron. Eng. Vol 15 No 1-4 (October 1991) pp 187-190 A four-layer 3-D IC was designed and fabricated as a primitive 3-D device with parallel signal processing function. The 3-D IC has the following four layers: optical sensor, level-detector, memory and ALU. The chip features array information input using optical sensor, vertical transfer of array information using via holes, and parallel processing. The fundamental operation of moving object sensing was confirmed.

Kuo, Sy-Yen and Liang, Sheng-Chiech 'Defect-tolerant hierarchical sorting networks for wafer-scale integration' IEEE J. Solid-State Circuits Vol 26 No 9 (September 1991) pp 1212-1222

Vol 15 No 10 December 1991
