Product focus
Digital signal processors
96-bit floating-point DSP
The DSP96002 is the first member of Motorola's family of single-chip HCMOS, 96-bit general-purpose floating-point DSPs. It was designed for numerically intensive applications requiring fast IEEE floating-point arithmetic and access to large memory subsystems, such as graphics and numeric processing. Motorola claims a peak performance of 40.5 MFLOPS with the 27 MHz device, which will be sampled in the fourth quarter of this year. The device has a 'dual-natured' architecture: there are two independent data memory spaces, two address arithmetic units, two on-chip DMA controllers, and two buses (see block diagram). This duality makes it easier to write software for numerically intensive applications because, for example, data are naturally partitioned into X and Y coordinates for graphics and image-processing applications, and into real and imaginary spaces for performing complex arithmetic. The architecture exhibits a high
degree of parallelism: up to three floating-point operations, two data moves and two address pointer updates can be executed in a single instruction cycle. The CPU consists of three 32-bit execution units operating in parallel. First, the data ALU performs all arithmetic (fixed- and floating-point) and logical operations. It consists of five blocks: a format conversion unit, a general-purpose register file for storing ALU results, a floating-point multiplier, a floating-point add/subtract unit, including a 32-bit barrel shifter, and a special function unit. Second, an address generation unit (AGU) performs all address storage and address calculations to address data operands in memory. Third, the program controller permits data transfers between any two locations in any combination of memory spaces without intervention of the DSP96002 core. It consists of a program address generator, program decode controller and the program interrupt controller. The DSP96002 features 1024 words of data RAM equally divided
Fifth generation fixed-point DSP
Texas Instruments has disclosed details of its fifth generation of fixed-point DSP chips, the TMS320C5x, which is source-code compatible
[Block diagram labels: external address switches; bus control and arbitration control; internal data switch; program RAM (1024 x 32); X memory RAM (512 x 32); Y memory RAM (512 x 32); dual-channel DMA controller; bootstrap ROM (64 x 32); cosine ROM (512 x 32); sine ROM (512 x 32); program address generator; address generation unit; port A/host interface]
into X data and Y data memory, 1024 words of full-speed on-chip program RAM and two preprogrammed data ROMs. On-chip bootstrap ROM allows convenient loading of user programs into the program RAM. Two independent expansion bus ports facilitate interfacing to SRAMs, DRAMs and VRAMs. A package of software development tools, DSP96000CLASx, is available now to allow the user to develop assembly language code for DSP96002 applications, to link it and then to test it fully. The package consists of a simulator, an assembler, a linker and callable modules. (Motorola Ltd, 3501 Ed Bluestein Blvd, Austin, TX 7872•, USA. Colvilles Road, Kelvin Estate, East Kilbride, Glasgow G75 0TG, UK. Tel: (03552) 39101)
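The partitioning of complex data into real and imaginary spaces, mentioned for the DSP96002, can be pictured with a minimal Python sketch. It is only an illustration of the data layout, not Motorola code: the lists `x_mem` and `y_mem` and the function `complex_multiply_xy` are hypothetical names standing in for the X and Y data memories.

```python
from typing import List, Tuple


def complex_multiply_xy(
    x_mem: List[float], y_mem: List[float], a: int, b: int
) -> Tuple[float, float]:
    """Multiply the complex numbers stored at indices a and b.

    Assumption for illustration: real parts are held in x_mem and
    imaginary parts in y_mem, mirroring the X/Y split in the text.
    """
    a_re, a_im = x_mem[a], y_mem[a]  # one operand read from each memory
    b_re, b_im = x_mem[b], y_mem[b]
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


# (1 + 2j) * (3 + 4j) = -5 + 10j
x_mem = [1.0, 3.0]  # real parts
y_mem = [2.0, 4.0]  # imaginary parts
print(complex_multiply_xy(x_mem, y_mem, 0, 1))  # (-5.0, 10.0)
```

Keeping the two parts in separate arrays means each operand needs one read from each memory, which is the kind of access pattern two independent data spaces are suited to.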
[Block diagram labels, continued: program decode controller; program interrupt controller; port B/host interface (BHI); arbitration control; data unit with IEEE floating-point and integer ALU; program control unit; CLK; 32-bit buses; serial debug port]
Block diagram of the DSP96002
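Motorola's peak figure for the DSP96002 can be checked against the numbers quoted in the text: a 27 MHz clock and up to three floating-point operations per instruction cycle. The sketch below assumes that one instruction cycle takes two clock periods; that ratio is not stated in the article and is an assumption made here. The function name `peak_mflops` is hypothetical.

```python
def peak_mflops(
    clock_mhz: float, clocks_per_instruction: int, flops_per_instruction: int
) -> float:
    """Peak floating-point rate in MFLOPS.

    clocks_per_instruction is an assumption (not given in the article);
    flops_per_instruction is the 'up to three' figure quoted in the text.
    """
    instructions_per_us = clock_mhz / clocks_per_instruction  # i.e. MIPS
    return instructions_per_us * flops_per_instruction


print(peak_mflops(27.0, 2, 3))  # 40.5
```

Under that assumption the device executes 13.5 million instructions per second, and three floating-point operations in each gives the claimed 40.5 MFLOPS.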
Vol 13 No 7 September 1989
[Block diagram labels: external memory interface with D(15-0) and A(15-0); program/data RAM (8k x 16); data RAM (544 x 16); boot ROM (2k x 16); JTAG test/emulation control; program and data buses; CPU with 32-bit ALU, 16 x 16-bit multiplier, 32-bit accumulator, accumulator/product shifter, pre/post shifters, context switch registers, PLU, status registers, program control registers and instruction registers; timer; software wait states]
Block diagram of the TMS320C50
with all TMS320C1x and TMS320C2x DSPs. The latest generation is designed to perform an instruction in 35 ns, giving a performance of 28.6 MIPS, claims TI. It is targeted at telecommunication, automotive, military and computer peripheral applications. The first chip in this family is the TMS320C50, samples of which will be made available in a 50 ns version capable of 20 MIPS in the first quarter of 1990. It is said to outperform existing fixed-point DSPs by a factor of two to four. The 0.8 µm CMOS device has an advanced Harvard architecture and a high degree of parallelism. There is single-cycle address generation and program execution, with a majority of instructions operating in a single cycle. The architecture (see block diagram) is optimized for efficient bit manipulation, zero-overhead context switching (through the use of stack registers), and block-repeat execution of code. Large on-chip memories and many peripherals are integrated to maximize system performance and minimize cost. The DSP incorporates the JTAG IEEE P1149.1 standard for improved testability and ease of emulation. The central processor unit consists of a multiplier, which performs a multiply in a single machine cycle; an arithmetic logic unit (ALU), which performs single-cycle 16- or 32-bit logical or arithmetic operations, such as add and subtract; five shifters, of which three are barrel shifters, which give flexibility in extended-precision arithmetic and in the handling of
overflow associated with large numbers of multiply/accumulates; and a parallel logic unit (PLU), which operates independently of the ALU, for bit manipulations. There are also 19 program control registers for servicing of interrupts. There are three on-chip memory blocks: a 544 x 16-bit RAM (for storage and recovery of CPU calculations), an 8192 x 16-bit RAM (for program execution at full speed), and a 2048 x 16-bit ROM (used as a boot loader). The TMS320C50 can also address 128 k x 16 bit of external memory. Peripherals are linked through a common bus structure, the TI Bus; this will facilitate the development of spin-off devices. The peripherals are: a full-duplex serial port, operating at rates of 10 MIPS, which provides direct communication with serial devices or may be used in a multiprocessor configuration; an internal timer; 32 software wait state generators, which allow the device to be used with slower off-chip memory and I/O devices; and a parallel I/O port of 16-bit width. (Texas Instruments, 12501 Research Blvd, Austin, TX 78759, USA. Tel: (512) 250-7655. Manton Lane, Bedford MK41 7PA, UK. Tel: (0234) 270111)
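The MIPS figures TI quotes for the TMS320C5x family follow directly from the instruction cycle times, if one instruction completes per cycle (the article says only that a majority of instructions are single-cycle, so this is a best-case assumption). A short Python check, with the hypothetical helper `mips_from_cycle_ns`:

```python
def mips_from_cycle_ns(cycle_ns: float) -> float:
    """Millions of instructions per second for a given cycle time.

    Assumes one instruction per cycle (best case). 1 s = 1e9 ns, so the
    rate in MIPS is 1e9 / cycle_ns / 1e6 = 1000 / cycle_ns.
    """
    return 1000.0 / cycle_ns


for cycle_ns in (35.0, 50.0):
    print(f"{cycle_ns:.0f} ns -> {mips_from_cycle_ns(cycle_ns):.1f} MIPS")
# 35 ns -> 28.6 MIPS
# 50 ns -> 20.0 MIPS
```

Both results agree with the 28.6 MIPS and 20 MIPS figures given for the 35 ns design target and the 50 ns sample version.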
ST18 family upgrade
The SGS-Thomson ST18 family has been upgraded with the addition of two 32-bit DSPs: the ST18940 (a microcontroller; contains ROM) and the ST18941 (a microprocessor; ROMless).
The ST18940/1 features an advanced Harvard architecture and a high degree of parallelism: in a single operation cycle the device can read two independent operands, perform a multiplication and an ALU operation, write a result back to memory, modify three address pointers and perform an I/O operation. With a cycle time of 100 ns, SGS-Thomson claims a throughput of 10 MIPS. The device is upwardly source compatible with the earlier 16-bit members of the family. It is aimed at advanced DSP applications in telecommunications, speech and image processing, spectrum analysis, high-speed control systems and digital filtering. This announcement quickly follows that of the ST18930/1, a CMOS version of the initial member of the ST18 family, the TS68930, but with a faster instruction cycle time (80 ns) and additional hardware and software facilities. In comparison with the earlier members of the family, the ST18940/1 provides enhanced arithmetic capabilities (and is particularly suited to fast Fourier transform, convolution and echo cancelling), addressing modes and additional I/O functions. The architecture (see block diagram) is based on four independent address calculation units (ACUs), three internal 16-bit data buses and three internal data memories, with a separate 32-bit program bus. The ST18940 has a 3 k x 32-bit program ROM and a 512 x 16-bit coefficient ROM. The ST18941 microprocessor version can address up to 64 k of program memory on a dedicated bus, thus providing true real-time emulation of the ST18940 ROM version. In addition it has two internal RAMs (X and Y); a 128 x 16-bit coefficient RAM is included for coefficient memory emulation. The two external buses, the system bus and the local bus, allow the device to be connected to a host processor or to other DSPs without additional glue logic. With the 16-bit local bus, either peripherals, such as analogue interfaces, can be controlled or up to 64 k x 16-bit of external memory can be accessed in
Microprocessors and Microsystems