Bandwidth reduction for speech transmission: A microprocessor based simulator


North-Holland Microprocessing and Microprogramming 12 (1983) 231-236

Neal Hutchinson
The Microcomputer Unit, Polytechnic of Central London, London, England

Hardware based on a sixteen bit microprocessor has been developed to allow for the reduction of the bandwidth required to transmit speech. The system, based on the MC68000, performs in real time. To assist in determining the parameters required for the real time operation a simulator has been designed. The simulator runs on a microprocessor development system but since most of the software is written in Pascal it is relatively portable.

1. INTRODUCTION

This paper describes a simulator which has been designed to allow for the investigation of the parameters which affect the quality of speech when bandwidth reduction is used. The method of bandwidth reduction involves the use of the Walsh Transform. The Walsh functions take the value +1 or -1 and are easily computed using a microprocessor. Bandwidth reduction is based on the fact that the energy of the speech is concentrated among a small number of coefficients in the transform domain and it is these coefficients which are transmitted rather than the time domain data. The microprocessor based real time system performs the transform, the selection of coefficients and the inverse transform, and is programmed in assembly language. The simulator allows the same word or phrase to be reconstructed using different parameters, and listening tests then allow comparisons to be made. Since the execution time for the coefficient selection routine is a function of the actual speech, the simulator is used to determine worst case and average execution times.

In a 64 point transform, between 4 and 12 coefficients suffice to allow good quality speech to be reproduced.

2. BANDWIDTH REDUCTION USING FWHT

There are a number of techniques available for reducing the bandwidth required to transmit speech. The method used in this system involves the use of orthogonal transforms (1). In this method bandwidth reduction is based on the fact that the energy of the speech is concentrated among a small number of coefficients in the transform domain and it is these coefficients, rather than the time domain data, which are transmitted. The most widely known orthogonal transform is the Fourier, which transforms between the time and frequency domains. This transform requires the use of complex multiplication. The Fourier Transform is not the only useful one and for speed and simplified hardware the Walsh-Hadamard Transform is an attractive alternative. The Walsh function (Fig. 1) only takes the value +1 or -1.

Figure 1. Continuous Walsh Functions WAL(0,t), WAL(1,t), WAL(2,t) and WAL(3,t), each taking the values +1 and -1.

Figure 2 shows the complete speech transmission system. A speech input signal is low pass filtered (3.4 kHz) and digitised to 10 bits at an 8 kHz sampling rate. Blocks of 64 samples are then transformed. The dominant set of coefficients is selected, labelled and transmitted. The receiver performs the inverse transform.

Figure 2. Speech Transmission System. (Speech input, low pass filter, microcomputer performing FWHT and coefficient sort, transmitted data, received data, microcomputer performing inverse transform, speech output.)

An orthogonal transform involves the matrix operation:

\[ [F] = [T]\,[f] \tag{1} \]

where $[F]$ and $[f]$ are vectors representing the transformed data and the input data respectively and $[T]$ is the transform matrix. The transform matrix takes the form (for 4 points):

\[ T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \tag{2} \]

In general for an N point transform $N^2$ multiplications by +1 or -1 are required (i.e. additions and subtractions). The fast Walsh Hadamard Transform (FWHT) is an algorithm for computing the transform in $N \log_2 N$ additions and subtractions (ignoring a division by N) (2).
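The operation count quoted above can be checked with a short sketch. The following Python is an illustration only, not the assembly or Pascal code of the system: the function names `fwht` and `inverse_fwht` are hypothetical, the coefficients come out in natural (Hadamard) order, which may differ from the ordering used in the paper, and placing the division by N in the inverse is an assumption.

```python
def fwht(samples):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two block.

    Returns (coefficients, operations), where operations counts every
    addition and subtraction performed by the butterflies.
    """
    data = list(samples)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    operations = 0
    span = 1
    while span < n:                      # log2(N) passes
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b   # one add, one subtract
                operations += 2
        span *= 2
    return data, operations


def inverse_fwht(coefficients):
    """Inverse transform: the same butterflies followed by a division by N
    (assumed here to belong to the inverse)."""
    data, _ = fwht(coefficients)
    n = len(data)
    return [value / n for value in data]
```

For a 64 point block the counter reports 64 x 6 = 384 additions and subtractions, compared with 4096 for the direct matrix product.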

3. THE REAL TIME IMPLEMENTATION

To achieve this bandwidth reduction a 16 bit microprocessor based system has been designed (3). A 64 point transform is used. The hardware uses an MC68000 based single board computer (MEX 68KDM) and a purpose built board containing analogue converters and fast static RAM. The single board computer contains the 8 MHz microprocessor, 32K dynamic RAM, 8K ROM, programmable counter, interface devices and interrupt circuitry. The static RAM used allows the processor to run without wait states.

The software, written in assembly language, consists of 4 modules: the FWHT, the sorting routine, the inverse FWHT and the analogue input/output software. The analogue converter software is interrupt driven. The sorting routine must find the largest N coefficients out of a set of 64. An insertion sort algorithm is used. Assuming the first k elements are sorted into a list, each of the remaining (64-k) is compared with the smallest element on the list and either rejected if smaller, or inserted into the list at the appropriate position. After an insertion, all the elements in the list below the new value are pushed down one place and the smallest element drops out.

The total time needed to execute this sorting routine depends on the actual data and will vary from block to block. It is necessary to show that in the worst case for speech data, the time needed is available after allowing for the FWHT, the inverse FWHT and the analogue play and store. The 64 point transform takes 1550 µS to execute. The time to execute the analogue input/output routine is 580 µS, and to perform the transform, inverse transform and analogue input/output takes 3880 µS of the 8000 µS block period (64 samples at 8 kHz), leaving 4120 µS for the sorting routine. One of the functions of the simulator is to determine if this is enough when typical speech input is used.
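The selection step described in this section can be sketched as follows. This is a minimal Python illustration, not the assembly routine: the name `select_largest` is hypothetical, ranking by magnitude is an assumption (the text says only "largest"), and the initial list is seeded with Python's built-in sort for brevity.

```python
def select_largest(coefficients, keep):
    """Keep the `keep` largest-magnitude coefficients of a block.

    Returns (kept, insertions): kept is a list of (index, value) pairs,
    largest magnitude first; insertions counts how many of the remaining
    coefficients displaced an entry of the list.
    """
    # Seed the list with the first `keep` coefficients, largest first.
    kept = sorted(enumerate(coefficients[:keep]),
                  key=lambda pair: abs(pair[1]), reverse=True)
    insertions = 0
    for index in range(keep, len(coefficients)):
        value = coefficients[index]
        if abs(value) <= abs(kept[-1][1]):
            continue                      # not larger than the smallest: rejected
        insertions += 1
        position = keep - 1               # the smallest entry drops out
        while position > 0 and abs(kept[position - 1][1]) < abs(value):
            kept[position] = kept[position - 1]   # push down one place
            position -= 1
        kept[position] = (index, value)
    return kept, insertions
```

Because the index travels with each value, the receiver can be told which coefficient each transmitted value belongs to. The amount of work depends on the data: a block whose values keep increasing forces an insertion at the top of the list for every remaining coefficient.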

4. SIMULATOR REQUIREMENTS

The second generation 16 bit microprocessors, such as the MC68000, can run the sorting routines in real time but with very little to spare. The quality of the reconstructed speech will depend on a number of parameters (4); these include the number of coefficients transmitted, the number of bits per coefficient and any threshold used to limit low valued coefficients. Since, in order to achieve this real time operation, the system is programmed in assembly language, the scope for easy manipulation of the system parameters is limited. The simulator was designed with the following objectives:

(1) To allow the user to listen to sections of speech reconstructed using different parameter values.

(2) To determine the timings of critical sections of the real time system.

(3) To present statistics relating to the characteristics of the speech being listened to.

5. IMPLEMENTATION DETAILS

5.1 Hardware

The simulator has been implemented on an Intel Development System (MDS800) which is 8080 based. This is a multibus based microcomputer with 64K read/write memory and dual double density floppy discs. The MDS runs under the Intel operating system ISIS-II. Extensive use is made of the system calls available under ISIS, particularly those relating to the disc system. A multibus compatible analogue input/output card was designed. This card contains a sample and hold amplifier, an analogue to digital converter and dual digital to analogue converters.

5.2 Software

There are 3 separate tasks which the simulator must perform. These involve digitising speech, file manipulation, and the bulk of the simulator's operations such as the FWHT, sorting, etc. Once the digitised data is stored in memory, part (or all) of it must be transferred to a named disc file. At the reconstruction stage, data may be taken from a number of different files and transferred to RAM for listening tests. Sections of data from a number of files may be combined in a single file to allow comparison of the same segment of speech reconstructed using different parameters. This communication with the disc can be most efficiently undertaken using the operating system's low level system calls.

For the real time sections assembly language is the most appropriate. The two remaining sections can be implemented in a high level language, although the file handling software must be very closely coupled to the computer's I/O drivers. The choice of language available on microcomputers is often restricted. However, a number of languages are available on the MDS including Basic, Pascal, Fortran and PLM80. The file manipulation involves accessing the disc I/O drivers which are part of the operating system and can be easily done using PLM. The question of using Pascal for this function arises; however, the version available (5) does not have the same low level capability as PLM. There is, of course, strong coupling between the hardware and software at the front and back ends of the simulator and PLM is ideal for this work.

The front and back ends require as much memory as possible to hold the data (Fig. 3) and since PLM is compiled and needs no run time system, it is very compact. In addition the version of Pascal available is interpreted and would use up a large amount of memory. The main advantages of PLM in this file handling application are:

(1) Allows interaction with ISIS.

(2) Easily linked with assembly language routines.

(3) Fast, since it is compiled.

Figure 3. Simulator Memory Map

Start address (hex)    Region
F800                   Monitor (up to FFFF)
6000                   Data Area
31B0                   Simulator Programs
0000                   ISIS Resident

5.3 The FWHT and Sorting Programs

The main part of the simulator is written in Pascal and includes the following modules:

(1) FWHT.

(2) Various sorting routines.

(3) Inverse transform.

There are a number of variations on the sorting programs employed. A number of these programs involve the use of records, in that each coefficient produced by the FWHT has an associated index and sorting must be done on both coefficient value and index at different times. Although not part of standard Pascal, some use has been made of the string handling extensions provided by Pascal 80. In general, however, to aid portability, only the standard has been used.

5.4 Data Files

Data is held on file in a twos complement format consistent with ISIS. In addition to the data, each file contains a header. This allows the system to display information about the file to the user. Files which are made up of sections of other files have a file header (Table 1) which contains the number of segments and their length. Simple files containing one segment of speech have a one word header containing the number of 64 word blocks.

5.5 The Front End Software

The front end software is written in a combination of PLM and assembly language. The main program is a PLM module (STORE) which calls an assembly routine (PICK). It is necessary for speed reasons to use assembly language since at an 8 kHz sampling rate there is only 125 µS available for the storing program. Timing is provided by a programmable 16 bit counter (8253). The counter initiates the analogue to digital conversion. The end of convert (EOC) is monitored by the program. After each EOC the least significant, then the most significant bytes are stored.

When the program is run the input/output devices are initialised and the processor halted (wait for interrupt). Once the segment of speech is about to be played, an interrupt is generated from the front panel switch which initiates the store cycle. The total memory available for data storage is 38910 bytes (Fig. 3). Once this space is filled, the data stored is played back and the listener can decide whether or not the required speech segment has been captured.

The second half of PICK allows the captured data to be windowed by moving in or out a left and right hand cursor and playing back this windowed speech. Four keys are used: 'I' moves the left hand side in (closes the window), 'D' opens the window from this side, 'E' moves the right hand side out (opens the window), while 'L' closes it. 'S' returns control to the main program.

Once the segment required has been selected, it can be stored on disc. Each word is converted from complementary offset binary to twos complement and sign extended. This is done since both PLM 80 and Pascal 80 use this format (ISIS standard). The user is prompted for a file name and the system checks for its previous use before saving the data.

5.6 The Back End Software

The playback facility is similar to the record facility in that two programs are involved: one in assembly language (PLAY) which actually plays back the data, and the main program REPLAY, which is a PLM module and provides the interactive facility. REPLAY allows the user to open any file (prompt: Which file?) and play it back or select part of it (prompt: W(indow) E(xit) N(ew file)), followed by the number of blocks and the current window, followed by a prompt for new left and right block numbers. This allows the user to listen to any segment of the stored speech. The block numbers are entered in decimal (range 0-999).
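The conversion from complementary offset binary to sign extended twos complement mentioned in 5.5 can be sketched as below. This is a hedged Python illustration: the function names are hypothetical, the 10 bit converter width is borrowed from the figure quoted for the real time system and may not match the simulator's card, and the 16 bit word width is an assumption.

```python
def cob_to_twos_complement(code, bits=10, word_bits=16):
    """Convert a complementary offset binary sample to a sign extended
    twos complement word.

    In complementary offset binary, all zeros is positive full scale and
    all ones is negative full scale.
    """
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    offset_binary = ~code & mask          # undo the complement
    twos = offset_binary ^ sign_bit       # offset binary -> twos complement
    if twos & sign_bit:                   # negative: extend the sign bit
        twos |= ((1 << word_bits) - 1) & ~mask
    return twos


def as_signed(word, word_bits=16):
    """Interpret a stored word as a signed Python integer."""
    if word & (1 << (word_bits - 1)):
        return word - (1 << word_bits)
    return word
```

With these assumptions the code 0x000 maps to +511, 0x1FF maps to 0 and 0x3FF maps to -512.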

TABLE 1. File Header

Word 0       Total number of words in the file
Word 1       Number of speech segments
Word 2-5     Number of words per segment
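The header layout of Table 1 can be illustrated with a short sketch. The Python below is an assumption-laden example rather than the PLM code: the names `pack_header` and `unpack_header` are hypothetical, words are taken to be 16 bit with the low byte first, unused segment entries are zero filled, and the total is assumed to count data words only.

```python
import struct

HEADER_FORMAT = "<6H"   # six 16 bit words, low byte first (assumed)


def pack_header(segment_lengths):
    """Build the six word header for a file of one to four segments."""
    if not 1 <= len(segment_lengths) <= 4:
        raise ValueError("a spliced file holds between one and four segments")
    padded = list(segment_lengths) + [0] * (4 - len(segment_lengths))
    total_words = sum(segment_lengths)    # assumed to exclude the header
    return struct.pack(HEADER_FORMAT, total_words, len(segment_lengths), *padded)


def unpack_header(raw):
    """Return (total_words, segment_lengths) from the first twelve bytes."""
    total_words, count, *lengths = struct.unpack(HEADER_FORMAT, raw[:12])
    return total_words, lengths[:count]
```

A file spliced from three segments of 640, 640 and 1280 words would carry a total of 2560 in word 0 and a segment count of 3 in word 1.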

6. USE OF THE SIMULATOR

The simulator is designed to allow easy manipulation of speech data which is stored on disc. In a typical session, speech will be digitised as it is played from a tape recorder and stored in read/write memory. The start of this conversion routine (STORE) is initiated by an interrupt generated by the user activating a front panel switch. Once the data is in memory any part of the speech may be selected to be stored on the floppy disc. To do this a 'window' is opened on the data. A right side and left side cursor are defined and these can be moved in or out from the keyboard. After each change the windowed data is played back. Once the user has specified the data to be saved the system prompts for a file name. The file name is checked to prevent inadvertent destruction of previously stored data. A typical sequence is shown in Appendix B.

The time domain data now stored can be transformed by invoking WOLX. This program performs a Fast Walsh Hadamard Transform on the audio data. The coefficients produced can be sorted, a number selected and the inverse transform performed. The reconstructed speech can then be played back. In assessing the quality and performance of the system, this is a crucial stage.

A number of playback programs can be used. REPLAY is analogous to the front end program STORE. Any part of the reconstructed speech can be played back. Since the transform is performed on blocks of 64 words at a time, the speech output is specified in terms of block numbers. The user is prompted with the current block settings, i.e. the total number of blocks and the current right and left block number. The system then prompts for a new set of block numbers.

It is desirable to be able to listen to the same phrase or word when it has been reconstructed using a different number of coefficients or a different number of bits per coefficient. To allow comparisons to be made, the various versions of the word or phrase must be played back one after the other with a short gap between them. The program 'SPLICE' allows this. Up to four files can have sections spliced together in a new file. Each of the four files can be individually windowed. SPLICE prompts the user with the total number of blocks and the number of sections (1 to 4). The listener can then play any combination of the sections in any order simply by keying in the required sequence.

Information relating to the distribution of coefficients in the Walsh domain can be directed to the line printer. This data can then be related to the played back speech.
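The transform, select and inverse transform sequence described in this section can be sketched for a single block. The Python below is an illustration under stated assumptions, not the simulator's Pascal: `fwht` and `reconstruct_block` are hypothetical names, coefficients are ranked by magnitude, discarded coefficients are set to zero, and quantisation to a given number of bits per coefficient is left out.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform (natural order)."""
    data = list(values)
    span = 1
    while span < len(data):
        for start in range(0, len(data), 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


def reconstruct_block(block, keep):
    """Transform a block, retain the `keep` largest-magnitude coefficients,
    zero the rest and transform back."""
    coefficients = fwht(block)
    ranked = sorted(range(len(coefficients)),
                    key=lambda i: abs(coefficients[i]), reverse=True)
    retained = set(ranked[:keep])
    sparse = [c if i in retained else 0 for i, c in enumerate(coefficients)]
    # The inverse uses the same butterflies followed by a division by N.
    return [value / len(block) for value in fwht(sparse)]
```

Running the same block through `reconstruct_block` with different values of `keep` gives the versions that a listener would then compare. A block that is itself a single Walsh function is recovered exactly from one coefficient.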

7. USING THE SIMULATOR TO DETERMINE THE PARAMETERS FOR THE REAL TIME SYSTEM

(1) The system running on the MC68000 must perform the FWHT, the insert sort and the inverse FWHT in real time. Of these, only the speed of the sort routine depends on the data. The timing of the sort depends on how many attempted inserts must be made for each data word. The timing of the rest of the program is independent of the data. The simulator has been programmed to count the number of attempted inserts and using this number the timing for the complete sort routine can be calculated. This procedure is carried out for a number of sentences and the program prints the average number of attempted inserts as well as the minimum and maximum. Table 2 shows some of the results. The theoretical worst case results from sorting a data vector in which each element is greater than the previous one. The worst case and average relate to actual speech.

Table 2. Sort Routine Timings (Micro Seconds)

N      Theoretical Worst Case      Worst Case      Average
8      1840                        650             550
12     4106                        1600            1300

Appendix A. (Diagram showing the program modules and data files, beginning at the speech input.)

(2) A modification to the sort routine involves setting a threshold value and rejecting all coefficients below that value without going through the test-for-insert procedure. A fixed threshold can easily be employed, or a dynamic one whose value is a function of the coefficient values in the most recent blocks. The simulator has been run to examine both these systems and a further time saving without degrading the speech quality has been found using the fixed threshold. The use of the dynamic threshold is currently under investigation.
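The fixed threshold modification can be sketched as follows. This is a minimal Python illustration with hypothetical names: `select_with_threshold` compares magnitudes (an assumption) and counts how many coefficients reach the test-for-insert step, so that the saving from a given threshold can be seen. The dynamic threshold is not modelled.

```python
def select_with_threshold(coefficients, keep, threshold):
    """Keep the `keep` largest-magnitude coefficients, rejecting anything
    below `threshold` before the test-for-insert step.

    Returns (kept, tests): kept is a list of (index, value) pairs, largest
    magnitude first; tests counts coefficients that passed the threshold.
    """
    kept = []
    tests = 0
    for index, value in enumerate(coefficients):
        if abs(value) < threshold:
            continue                      # rejected without a list search
        tests += 1
        position = len(kept)
        while position > 0 and abs(kept[position - 1][1]) < abs(value):
            position -= 1
        kept.insert(position, (index, value))
        del kept[keep:]                   # anything past `keep` drops out
    return kept, tests
```

Setting `threshold` to zero reproduces the unmodified routine, so the two values of `tests` can be compared directly for the same block.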

8. CONCLUSION

The main objectives of the simulator have been met. The effect of the system parameters on the quality of reproduction has been investigated using listening tests, as has the effect of fixed thresholds. In simulating the real time system the time required for the critical sorting routine has been established. The main limitations are due to slow operation of the microcomputer used. A compiled version of Pascal would improve this. Further work on the system will include a graph plotting feature and the inclusion of the date and the phrase being spoken on all data files.


Appendix B

A typical simulator sequence. File COG.POT contains the word "one", reproduced using three different sets of parameters. The total number of blocks is 105. The listener selects each of the three segments in turn and then in the sequence 2,3,1.

    ISIS-II, V3.4
    -:F1:REPLAY.LOC
    Which file ? :F1:COG.POT
    W(indow) E(xit) N(ew file) S(egment) *W
    No of blocks is 105
    Current window 000 000
    Left window block number ?0
    Right window block number ?105
    W(indow) E(xit) N(ew file) S(egment) *S
    Which section ,(C returns to command level), no of sections = 003
    1 2 3 231 C
    W(indow) E(xit) N(ew file) S(egment) *W
    No of blocks is 105
    Current window 000 105
    Left window block number ?50
    Right window block number ?100
    W(indow) E(xit) N(ew file) S(egment) *N
    Which file ?

REFERENCES

1. Campanella, S.J. & Robinson, G.S. "A comparison of orthogonal transformations for digital speech processing", IEEE Transactions on Communications Technology, Vol COM-19 no 6, 1971

2. Shum, Y.Y., Elliott, A.R. & Brown, O.W. "Speech processing with Walsh-Hadamard transforms", IEEE Transactions 1973, AU-21, pp 174-179

3. Hutchinson, N. "Bandwidth Reduction for speech transmission using a sixteen-bit microprocessor", Journal of Microcomputer Applications (1982) 5, 119-128

4. Frangoulis, E. & Turner, L.G. "Hadamard transformation technique for speech coding: some further results", Proc IEE 124, 845-852, 1977

5. Intel Corp. "PASCAL-80 USERS GUIDE", 1980

N.J. Hutchinson, after graduating, worked as a research officer at the Central Electricity Research Laboratories. Subsequently he has worked in R&D for a number of major companies including EMI Medical and GEC. Microprocessor based projects he has worked on include cash dispensers, X-ray scanners and environmental control equipment. Since 1979 he has been the manager of the PCL Microcomputer Unit. The Unit provides a consultancy and training service to industry. His current research interests include speech processing and the application of microprocessors in industrial control.