CMOS single-chip digital signal processor
by Hideo Hara*, Takashi Akazawa* and Yoshimune Hagiwara**
*Musashi Works, Hitachi, Ltd. **Central Research Laboratory, Hitachi, Ltd.
The HSP (HD61810) is a single-chip digital signal processor which includes a high speed arithmetic logic unit, a high speed multiplier, and a large memory on a single silicon chip. Its architecture features floating point arithmetic and a pipeline structure. With the floating point arithmetic operation, the HSP can manipulate a wide dynamic range of data. The instruction cycle of the HSP is 250 ns. One multiply/add operation can be executed per cycle. The HSP uses 3 µm CMOS technology and so achieves low power consumption. It is programmed by an internal instruction ROM.
1. Introduction
Recent improvements in VLSI technology have made it possible to integrate signal processing elements for voice band applications on a single chip. Such a device is called a single-chip digital signal processor (DSP). It processes signals digitally by following the instruction program held in its internal ROM, rather like a single-chip microcomputer. DSPs have already been announced by semiconductor manufacturers [1-4]. Hitachi has developed a single-chip DSP with a new architecture. Called the High Performance Signal Processor (HSP) [5], its architecture features floating point arithmetic, which gives accuracy high enough to implement speech processing without increasing the number of chips. The HSP is fabricated in 3 µm CMOS LSI technology and so achieves low power consumption.
2. Design concept of HSP
The performance and accuracy required by DSP applications have been studied over a long period, and the HSP's architecture was designed on the basis of these studies. A stored program architecture was selected. This is because the HSP, like conventional DSPs, can then be used in applications that require internally programmed instruction memories or coefficient data memories. With this architecture the HSP can be applied to various systems with only internal ROM programming. An HSP design should provide the following general purpose functions:
(i) General purpose multiplier and add/subtract circuits and a multibus structure should be incorporated on one chip for efficient execution of various types of signal processing.
(ii) An instruction sequence and constant data, such as the coefficients of a digital filter, need to be programmed in an internal ROM to realize a low cost single-chip DSP.
(iii) Efficient microinstructions should be designed for high speed multiply/add operations. Moreover, the HSP should contain the basic instructions found in general purpose microcomputers.
(iv) It should be possible to use the HSP alone for compact systems, but also to use it as a peripheral LSI for an 8 or 16 bit microcomputer in complex systems.
MICROELECTRONICS JOURNAL Vol 15 No 4 © 1984 Benn Electronics Publications Ltd, Luton
2.1 High accuracy
High accuracy is an important characteristic of the HSP. Signal processing accuracy is represented by dynamic range. Conventional single-chip DSPs do not have enough accuracy to implement linear predictive coding (LPC) speech processing, which requires a wide dynamic range, greater than 24 bits. Conventional DSPs use fixed point arithmetic. However, this approach cannot provide a wide dynamic range, because LSI technology does not yet allow economical integration of a large bit-size multiplier, arithmetic logic unit (ALU), data bus, memory, and associated circuits on one silicon chip. In contrast, a floating point arithmetic architecture is generally useful for a wide dynamic range, but it needs a larger ALU area than a fixed point arithmetic ALU. For speech and telecommunication applications, the accuracy of the floating point operation needs to be at the level of a 32 bit dynamic range and 16 bit resolution. Thus, 16 bit resolution floating point arithmetic was selected for the HSP architecture, because the aim of the HSP is to realize sufficient accuracy to implement LPC speech processing.
2.2 High speed operation
The operational speed of a DSP is generally evaluated by its repeated multiply/add operation. Voice band signal processing applications require a processing speed of 250 ns per operation or faster. In order to improve the operation speed, a pipeline structure, parallel operation, a high speed multiplier, and a multibus structure were employed. These elements would consume more than two watts of power if the chip were produced by NMOS technology. Therefore, it was designed for CMOS technology, realizing low power consumption.
3. Floating point arithmetic
Conventional DSPs have fixed point arithmetic circuits. To integrate the floating point architecture circuit on a chip, the HSP is designed with a new floating point technique for signal processing. For speech and telecommunication applications, the accuracy of the floating point arithmetic can be represented by a 32 bit dynamic range and a 16 bit maximum resolution, as is shown by the shaded area in Fig. 1. The left part of the shaded area can be determined by fixed point arithmetic and the right by floating point arithmetic. If fixed point arithmetic is used, only the left part will be realized.
[Figure 1: plot against amplitude (normalized value), vertical axis ticks 4 to 16, with regions labelled fixed point arithmetic and floating point arithmetic spanning a 32 bit dynamic range.]
Fig. 1 Accuracy of HSP Floating Point Arithmetic. The HSP achieves a 32 bit wide dynamic range and maximum 16 bit resolution using floating point arithmetic.
[Figure 2: data word layout showing the mantissa field and the exponent field; S: sign bit.]
Fig. 2 Floating Point Data Format. The floating point data are represented by a 16 bit mantissa and 4 bit exponent in the HSP.
The distinguishing feature of the HSP floating point architecture is that it switches automatically between two types of arithmetic, floating point and fixed point, according to the amplitude of the data. If the amplitude is equal to or larger than $2^{-8}$, the operation employs floating point arithmetic. If the amplitude is smaller than $2^{-8}$, the operation employs fixed point arithmetic. The data format for the HSP's floating point arithmetic is shown in Fig. 2. The HSP has a 16 bit mantissa and a 4 bit exponent, both represented as two's complement numbers. Thus each piece of data can be represented by the following expression: $M \times 2^{E}$ where M : Mantissa,-1-
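The automatic switching described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the HSP's circuit: it assumes the 4 bit two's complement exponent covers -8 to 7, that the 16 bit mantissa is a two's complement fraction in [-1, 1), and that the switching threshold is $2^{-8}$ (the smallest assumed exponent). The names `encode`, `decode` and `uses_fixed_point` are hypothetical.

```python
MANTISSA_BITS = 16
EXP_MIN, EXP_MAX = -8, 7          # assumed range of a 4 bit two's complement exponent
SCALE = 1 << (MANTISSA_BITS - 1)  # mantissa integer m stands for the fraction m / 2**15


def encode(x):
    """Return (mantissa, exponent) such that x ~= mantissa / SCALE * 2**exponent."""
    e = EXP_MIN
    # Pick the smallest exponent for which the mantissa fits in [-1, 1).
    # Small amplitudes never leave EXP_MIN, so they behave like fixed point data.
    while e < EXP_MAX and not (-1.0 <= x / 2.0 ** e < 1.0):
        e += 1
    m = round(x / 2.0 ** e * SCALE)
    m = max(-SCALE, min(SCALE - 1, m))  # saturate values outside the representable range
    return m, e


def decode(m, e):
    """Convert a (mantissa, exponent) pair back to a Python float."""
    return m / SCALE * 2.0 ** e


def uses_fixed_point(x):
    """True when the amplitude is below the assumed switching threshold 2**-8."""
    return abs(x) < 2.0 ** EXP_MIN


if __name__ == "__main__":
    for value in (0.75, 100.0, 2.0 ** -12):
        m, e = encode(value)
        print(value, (m, e), decode(m, e), uses_fixed_point(value))
```

In this model, values below the threshold keep the exponent pinned at its minimum and lose leading mantissa bits as they shrink, which is the fixed point behaviour; larger values keep a normalized mantissa and full resolution.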
4. Functional description of the HSP
A block diagram of the HSP is shown in Fig. 3. The HSP architecture is suitable for effective digital signal processing, for example as a digital filter. The HSP's features include an internal multibus structure, horizontal type microinstructions, and a large capacity two port RAM.
4.1 Floating point multiplier and ALU
The floating point multiplier (FMULT) contains a 12 × 12 → 16 bit parallel multiplier for the mantissa and a 4 bit full adder for the exponent. The multiplier and ALU operate in parallel in each cycle. The 20 bit product (16 bit mantissa, 4 bit exponent) of the multiplier is entered into the FALU in the following cycle. By means of this pipeline control, the multiply/add operation achieves a throughput of one operation per instruction cycle. The floating point ALU (FALU) manipulates 20 bit floating point data (16 bit mantissa, 4 bit exponent). The FALU also permits execution of fixed point arithmetic and logical operations. FALU operation affects a condition code register (CCR), and the output of the FALU enters the A or B accumulator (ACC).
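The effect of the pipeline between the multiplier and the ALU can be sketched with a cycle-level model. This is an illustrative sketch only: it uses ordinary Python numbers rather than the 20 bit floating point format, and the function name `pipelined_mac` and the single pipeline register are assumptions made for the example.

```python
def pipelined_mac(coeffs, samples):
    """Accumulate sum(c * x) with a two-stage multiplier/ALU pipeline.

    Returns (accumulator, cycles). In each cycle the ALU adds the product
    formed in the previous cycle while the multiplier forms the next one.
    """
    pairs = list(zip(coeffs, samples))
    acc = 0.0
    product_reg = None  # pipeline register between multiplier and ALU
    next_pair = 0
    cycles = 0
    while next_pair < len(pairs) or product_reg is not None:
        # ALU stage: consume the product latched in the previous cycle.
        if product_reg is not None:
            acc += product_reg
        # Multiplier stage: form the next product in the same cycle.
        if next_pair < len(pairs):
            c, x = pairs[next_pair]
            product_reg = c * x
            next_pair += 1
        else:
            product_reg = None
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    print(pipelined_mac([1, 2, 3], [4, 5, 6]))
```

With N coefficient and sample pairs the model needs N + 1 cycles: one cycle to fill the pipeline, after which one multiply/add completes per cycle.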
[Figure 3: block diagram showing the data ROM, data RAM, instruction ROM, instruction register, serial input register, serial output register, input register and output register on the data bus. Note: CCR: Condition code register; D0-D15: Data input/output pins; GR: General register; PC: Program counter; SI: Serial input pin; SO: Serial output pin.]
Fig. 3 Block Diagram of the HSP. The data memory, multiplier, ALU, and input/output registers are connected to each other by the multibus in the HSP.
4.2 Data memory
The HSP has a data RAM, a data ROM which is used as a constant memory for the coefficients of the digital filter, and four general registers (GRs). High density and high speed static RAM technology is used to mount a 200 word (16 bits for each word) RAM on chip. Because the RAM has two read ports and one write port, two different data words can be read at the same time. The two words exit on different buses and are then transferred to the FMULT and FALU. The address of the data memory is defined by a two page address, which selects each output bus of the data memory, and by a pointer address. The pointer address is decided by two pointer address registers for RAM or one pointer address register for ROM. Figure 4 shows the data memory organization.
[Figure 4: data memory map showing the GR, ROM and RAM areas, addressed by the X/Y-page address together with the ROM pointer address and the RAM pointer address.]
Fig. 4 Data Memory Organization. The data RAM and ROM have a two-port structure. The data memory is addressed by a two-dimensional address.
4.3 Instruction memory
The user's program is a sequence of 22 bit horizontal type microinstructions and is stored in the instruction ROM. This is a high speed ROM of 512 words (22 bits for each word). The instruction ROM is addressed by a 9 bit program counter (PC). The PC has two stack registers, so two level nesting of subroutines or interrupts is available.
4.4 Input and output interface
The HSP has an 8 or 16 bit parallel input/output interface, a 16 bit serial input, and a 16 bit serial output. The parallel input/output interface connects to an 8 or 16 bit microcomputer. It can also implement direct memory access data transfer. Data transfer is performed through the input register (IR) and output register (OR). The asynchronous serial interface can be used for interfacing with external analog-to-digital or digital-to-analog converters. The serial data transfer is performed through the serial input register (SIR) and the serial output register (SOR).
The performance of these input and output interfaces is enhanced by the interrupt feature of the HSP.
4.5 Example of HSP program
Using HSP instructions, high performance digital signal processing can be achieved. As an example of a typical digital filter, consider the transversal filter shown in Fig. 5. The transversal filter is represented by
$$Y = \sum_{i=0}^{n} C_{n-i} X_{n-i}$$
$$X_{i} \leftarrow X_{i-1} \quad \text{(one sample delay)}$$
These equations are programmed by only 12 HSP instructions. The program uses the floating point operation method and includes data conversion and input/output interface functions. Furthermore, a 32 tap transversal filter can be executed by this program within only 10.5 µs.
[Figure 5: transversal filter signal flow diagram. $z^{-1}$: One sample delay.]
Fig. 5 Transversal Filter. The transversal filter is realized by 12 steps of the HSP program. Then, for example, the 32 tap transversal filter can be performed within 10.5 µs.
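The transversal filter described above can be sketched in software as a multiply/add loop fused with the one sample delay. This is a minimal illustration in ordinary Python floats, not the 12 instruction HSP program; the function name `transversal_step` and the indexing convention (`delay[i]` holds the sample delayed by i periods) are assumptions made for the example.

```python
def transversal_step(coeffs, delay, new_sample):
    """Process one input sample and return one output sample.

    `delay` is updated in place. The loop runs from the oldest tap to the
    newest so that each shift and its multiply/add happen in the same pass.
    """
    acc = 0.0
    for i in range(len(coeffs) - 1, 0, -1):
        delay[i] = delay[i - 1]        # one sample delay
        acc += coeffs[i] * delay[i]    # multiply/add for tap i
    delay[0] = new_sample
    acc += coeffs[0] * new_sample
    return acc


# Cycle budget quoted in the text: 10.5 us at a 250 ns instruction cycle.
CYCLE_NS = 250
BUDGET_CYCLES = 10_500 // CYCLE_NS  # 42 instruction cycles

if __name__ == "__main__":
    coeffs = [0.5, 0.25, 0.125]
    delay = [0.0] * len(coeffs)
    # Feeding a unit impulse returns the coefficients in order.
    print([transversal_step(coeffs, delay, x) for x in (1.0, 0.0, 0.0, 0.0)])
    print(BUDGET_CYCLES)
```

The quoted 10.5 µs corresponds to 42 instruction cycles of 250 ns. If one tap is processed per cycle, which is an assumption based on the one multiply/add per cycle figure, the 32 taps would account for 32 of those cycles.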
TABLE I Results of chip fabrication. The HSP operates at a 5 V single power supply voltage. The instruction cycle time is 250 ns.

Input clock          16 MHz
Instruction cycle    250 ns
Integration          55 k transistors
Die size             6.88 mm × 7.16 mm
Supply voltage       5 V
Power consumption    250 mW
Package              40 pin dual-in-line ceramic package
Fig. 6 Microphotograph of the HSP. The HSP is fabricated using CMOS technology and integrates 55,000 transistors on a 6.88 mm × 7.17 mm die.
5. Results of chip fabrication
The HSP is fabricated with 3 µm CMOS technology. The memory cell of the data RAM consists of two high resistance polysilicon resistors and four NMOS transistors on a P-well, to obtain high speed and high density. The HSP's features are listed in Table I. A microphotograph of the HSP is shown in Fig. 6. The input clock frequency is 16 MHz and the HSP operates with an instruction cycle of 250 ns. Power consumption is 250 mW. Approximately 55,000 transistors are integrated on a 6.88 mm × 7.16 mm die. The chip is housed in a standard dual-in-line 40 pin package (DILC-40). The pin assignment is shown in Fig. 7. Pins D0 to D15 are the data bus for the microcomputer interface, but if an 8 bit microcomputer is used, pins D8 to D15 will not be used.
6. Applications
The HSP has been applied to various voice band signal processing applications. The following are examples of HSP applications: (1) Real time LPC speech analysis and synthesis, (2) 9.6 kbps high speed modem, (3) Echo canceller, (4) Servo controller. In this paper, the LPC speech processing system is shown. Before the HSP, real time LPC
[Figure 7: pin diagram of the 40 pin package (top view); legible signal names include SYNC, TEST, BIT I/O, SICK, SOEN, SOCK and the data bus pins.]
Fig. 7 Pin Assignment of HSP (Top View). The HSP is housed in a standard dual-in-line 40 pin package. Pins D8 to D15 will not be used in the case of an 8 bit microcomputer interface.
speech analysis was realized only by big computers. But a compact system can be realized with two HSPs. Figure 8 shows this system's block diagram. It consists of two HSPs, an 8 bit microcomputer, memory, an analog-to-digital converter, and a speech synthesis LSI. The analysis program is divided into three parts. Spectrum analysis is executed by one HSP, and pitch detection is executed by the other HSP. Control, decisions, and autocorrection are executed by the microcomputer.
[Figure 8: block diagram with an ADC taking 64 kbps speech, HSP (spectrum), HSP (pitch), memory, MC and SS, and a 2.4 to 9.6 kbps data path. Note: ADC: Analog-to-digital converter; MC: Microcomputer; SS: Speech synthesis LSI.]
Fig. 8 LPC Speech Analysis and Synthesis System. This system can execute speech data reduction. The 64 kbps data rate of the original voice is compressed to 2.4 to 9.6 kbps.
In this system, the input speech signal is converted to a digital signal, which is then digitally processed to obtain the compressed data (LPC parameters). The system reduces the bit rate to between 2.4 kbps and 9.6 kbps, compared with the 64 kbps of the original speech. The spectrum analysis HSP uses 456 program steps, 118 coefficient words, and 131 data words. The pitch detection HSP uses 479 program steps, 93 coefficient words, and 200 data words.
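The figures above can be checked against the memory sizes quoted elsewhere in the paper (512 instruction words, 128 coefficient words, 200 data words). The sketch below is only arithmetic on those published numbers; the dictionary keys and function names are hypothetical.

```python
# Memory capacities quoted in the paper for the HSP.
HSP_CAPACITY = {"instruction_steps": 512, "coefficient_words": 128, "data_words": 200}

# Resource usage quoted for the two LPC analysis programs.
PROGRAMS = {
    "spectrum analysis": {"instruction_steps": 456, "coefficient_words": 118, "data_words": 131},
    "pitch detection": {"instruction_steps": 479, "coefficient_words": 93, "data_words": 200},
}


def fits(program):
    """True when every resource of the program is within the HSP capacity."""
    return all(program[key] <= HSP_CAPACITY[key] for key in HSP_CAPACITY)


def compression_ratio(original_kbps, compressed_kbps):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_kbps / compressed_kbps


if __name__ == "__main__":
    for name, usage in PROGRAMS.items():
        print(name, "fits" if fits(usage) else "does not fit")
    print(round(compression_ratio(64, 2.4), 1), round(compression_ratio(64, 9.6), 1))
```

Both programs fit on one chip each, with the pitch detection program using the whole 200 word data RAM. The bit rate is reduced by a factor of roughly 7 to 27.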
7. Program development tool
The internal program of the HSP is developed by the customer for each application, so a program development tool will be prepared. A cross assembler running on a microcomputer development system will convert the user program, which is written in mnemonic code, to HSP object code. This object code is transferred to the HSP evaluation chip, called the HSP-RM, and is used for real time emulation of the user program. The HSP-RM includes 128 words of data RAM and 512 words of instruction RAM in place of the ROM area of the HSP. The contents of these RAMs are written by the external microcomputer before the program evaluation. The HSP-RM is designed based on the HSP chip. It is housed in a 68-pin grid array package (PGA-68). A comparison of the HSP and the HSP-RM is shown in Table II.

TABLE II Comparison of HSP and HSP-RM. The HSP-RM has a large capacity RAM instead of the data and instruction ROM of the HSP.

                      HSP                                    HSP-RM
Part number           HD61810                                HD61811
Data memory           200 × 16 bits (RAM)                    200 × 16 bits (RAM)
Coefficient memory    128 × 16 bits (ROM)                    128 × 16 bits (RAM)
Instruction memory    512 × 22 bits (ROM)                    512 × 22 bits (RAM)
Address trap          -                                      Available
Package               40 pin dual-in-line ceramic package    68-pin pin grid array
8. Conclusions
A powerful single-chip CMOS digital signal processor (the HSP) has been developed. It has a floating point architecture and achieves high accuracy. It also achieves low power consumption by the use of CMOS technology. The HSP makes it possible to build a compact real time LPC speech analysis system. In addition, the HSP can be applied to various kinds of digital processing.
9. References
[1] Kawakami, Y., et al., "A Single Chip Digital Signal Processor for Voice Band Applications," ISSCC Dig. Tech. Papers (1980), pp. 40-41.
[2] Nishitani, T., et al., "A Single Chip Digital Signal Processor for Telecommunication Applications," IEEE J. of Solid State Circuits SC-16 (1981), pp. 372-376.
[3] Boddie, J., et al., "A Digital Signal Processor for Telecommunications Applications," ISSCC Dig. Tech. Papers (1980), pp. 40-45.
[4] Nicholson, W., et al., "The S2811 Signal Processing Peripheral," WESCON (1979).
[5] Hagiwara, Y., et al., "A High Performance Signal Processor Speech Analysis and Synthesis," late paper (session DSP 8-11), ICASSP (1982).