Copyright © IFAC Programmable Devices and Systems, Ostrava, Czech Republic, 2003
ELSEVIER IFAC PUBLICATIONS www.elsevier.com/locate/ifac
SPEECH COMMUNICATIONS APPLICATIONS
Piotr Klosowski
Division of Telecommunication, Institute of Electronics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland, Phone: (+48) 32 237-17-66, Fax: (+48) 32 237-22-25, E-Mail:
[email protected]
Abstract: The paper presents the results of research in the main areas of speech communication applications: speech synthesis, speech recognition, and speaker verification and identification systems. The research was conducted at the Institute of Electronics, Silesian University of Technology. Copyright © 2003 IFAC
Keywords: speech, speech analysis
1. INTRODUCTION
The Division of Telecommunication, part of the Institute of Electronics, Silesian University of Technology, has for many years specialized in advanced fields of telecommunication engineering. One of them is speech communication applications. The main research areas in this field are speech synthesis, speech recognition, and speaker verification and identification systems (Fig. 1).

2. RESULTS OF RESEARCH ON SPEECH SYNTHESIS SYSTEMS
At present, speech synthesis is widely used in many applications, first of all in telecommunications (Rabiner, 1994). Two generations of speech synthesizers for Polish based on TTS (Text-to-Speech) technology were developed at the Institute of Electronics. The structure of the TTS synthesis system is presented in Figure 2. The full TTS system converts arbitrary ASCII text to speech. The first task of the system, realized in the text processing unit shown in Figure 3, is to extract the phonetic components of the required message. The output of this stage is a string of symbols representing sound units (phonemes or allophones) and the boundaries between words, phrases and sentences, along with a set of prosody markers (indicating the speed of speech, the intonation, etc.). The second part of the process is to match the sequence of symbols with items stored in the phonetic inventory, link them together and send the result to a voice output device. This task is realized in the speech processing unit shown in Figure 4.
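The two-stage structure described above (text processing producing a symbol string, then speech processing that looks units up in a phonetic inventory and links them) can be illustrated with a toy sketch. It is not the SM10 or SM23 implementation: the one-letter-per-phoneme rule, the "#" word-boundary symbol and the tiny inventory of made-up sample values are all hypothetical, and prosody markers are omitted.

```python
from __future__ import annotations

# Hypothetical symbol marking a boundary between words (the real symbol set is not given).
WORD_BOUNDARY = "#"


def text_processing(text: str) -> list[str]:
    """Stage 1 (text processing unit): text -> string of sound-unit symbols.

    Simplifying assumption: one letter is one phoneme symbol. Real Polish
    letter-to-sound rules must handle digraphs (e.g. "sz", "cz") and context,
    and a real system would also emit prosody markers.
    """
    symbols: list[str] = []
    for word in text.lower().split():
        symbols.extend(ch for ch in word if ch.isalpha())
        symbols.append(WORD_BOUNDARY)
    return symbols


def speech_processing(symbols: list[str], inventory: dict[str, list[float]]) -> list[float]:
    """Stage 2 (speech processing unit): match symbols to inventory units and link them."""
    signal: list[float] = []
    for symbol in symbols:
        signal.extend(inventory[symbol])  # raises KeyError for a missing unit
    return signal


# Toy inventory: each unit is a few made-up samples; the boundary maps to a short silence.
inventory = {"a": [0.1, 0.2], "l": [0.3], "m": [0.4], WORD_BOUNDARY: [0.0, 0.0]}
samples = speech_processing(text_processing("Ala ma"), inventory)
```

Keeping the two stages as separate functions mirrors the split between the text processing and speech processing units: the inventory (phonemes in a phoneme-level synthesizer, allophones in an allophone-level one) can be swapped without touching the text side.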
Fig. 1. Speech communication applications (speech synthesis, speech recognition, speaker identification and verification, others)
Fig. 2. Structure of the text-to-speech synthesis system (text processing → string of phonemes and allophones → speech processing)
Fig. 3. Structure of the text processing unit (input: text; blocks include clause separation; output: string of phonemes and allophones)
Fig. 4. Structure of the speech processing unit (input: sequence of phonemes and allophones)

The first stage requires linguistic analysis, which includes converting abbreviations and special symbols (decimal points, plus, minus, etc.) to their spoken form. Figure 5 shows the two generations of speech synthesis systems for Polish developed at the Institute of Electronics, Silesian University of Technology. The developed speech synthesis systems are:
• SM10, a text-to-speech system for Polish, was the first speech synthesizer developed at the Institute of Electronics, Silesian University of Technology. It simulates the human vocal tract and is intended for blind users. SM10 ensures correct word pronunciation and word stress by means of full phoneme transcription. Speech synthesis is performed at the phoneme level.
• SM23 is the next-generation speech synthesis system, operating at the allophone level. The quality of allophonic speech synthesis is better than that of the SM10 phoneme-based synthesizer. SM23 provides natural-sounding, highly intelligible text-to-speech synthesis.

Fig. 5. Two generations of speech synthesis systems for Polish developed by the Institute of Electronics, Silesian University of Technology: speech synthesis software SM10 (based on phoneme synthesis) and SM23 (based on allophone synthesis)
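The first-stage linguistic analysis mentioned earlier, converting abbreviations and special symbols to their spoken form, can be sketched as simple table lookups. This is only an illustration under stated assumptions: the abbreviation and symbol tables are tiny hypothetical samples, both "," and "." between digits are read as the Polish "przecinek" (decimal comma), and digits are left as digits (a real normalizer would also expand numbers into inflected Polish words).

```python
import re

# Hypothetical sample tables (Polish spoken forms); a real system needs far larger ones.
ABBREVIATIONS = {
    "np.": "na przykład",
    "itd.": "i tak dalej",
    "dr": "doktor",
}
SYMBOLS = {
    "+": "plus",
    "-": "minus",
    "%": "procent",
}


def normalize(text: str) -> str:
    """Replace abbreviations and special symbols with their spoken form."""
    # Assumption: a decimal separator ("," or ".") between two digits is read "przecinek".
    text = re.sub(r"(?<=\d)[.,](?=\d)", " przecinek ", text)
    spoken = []
    for token in text.split():
        key = token.lower()
        # Look the token up as an abbreviation, then as a symbol; otherwise keep it.
        spoken.append(ABBREVIATIONS.get(key) or SYMBOLS.get(key) or token)
    return " ".join(spoken)


print(normalize("Dr Nowak: 3,5 + 2 itd."))
```

The lookbehind and lookahead in the regular expression make sure only separators sitting between two digits are rewritten, so the periods in abbreviations such as "itd." are left for the table lookup.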
Fig. 6. Detailed structure of the speech recognition process (preprocessing of the speech signal → parameter extraction → optimalization of parameters → classification against speech patterns obtained in a teaching process)
Fig. 7. Two steps of the speech recognition process (phoneme recognition system → sequence of phonemes or allophones → phonemes-to-text conversion)
3. RESULTS OF RESEARCH ON SPEECH RECOGNITION SYSTEMS
Speech recognition is the conversion of an acoustic waveform into a written equivalent of the message information. The nature of the speech recognition problem depends heavily on the constraints placed on the speaker, the speaking situation and the message context. The detailed structure of the speech recognition process is shown in Figure 6. The speech recognition process is realized in two steps (Figure 7). In the first step, the speech signal is processed by the phoneme recognition system. The result of this process is a sequence of phonemes or allophones. This sequence is then processed by the phonemes-to-text conversion unit, which includes elements of a speech understanding system. The final result of this process is text.
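The chain in Fig. 6 (preprocessing, parameter extraction, classification against patterns learned in a teaching process) can be illustrated with a generic nearest-pattern classifier. This is a textbook template scheme, not necessarily the method used by the author: pre-emphasis as the preprocessing step, short-time energy and zero-crossing rate as parameters, the frame length, the 0.95 coefficient and the class names are all assumptions, and the "optimalization of parameters" step is omitted.

```python
from __future__ import annotations

import math


def preprocess(signal: list[float], alpha: float = 0.95) -> list[float]:
    """Preprocessing (assumed): pre-emphasis y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def extract_parameters(signal: list[float], frame_len: int = 4) -> tuple[float, float]:
    """Parameter extraction (assumed): mean short-time energy and zero-crossing rate."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal shorter than one frame")
    energy = sum(sum(x * x for x in f) / frame_len for f in frames) / len(frames)
    # Fraction of adjacent sample pairs whose signs differ, averaged over frames.
    zcr = sum(
        sum((a >= 0) != (b >= 0) for a, b in zip(f, f[1:])) / (frame_len - 1) for f in frames
    ) / len(frames)
    return energy, zcr


def teach(examples: dict[str, list[tuple[float, ...]]]) -> dict[str, tuple[float, ...]]:
    """Teaching process: store the mean parameter vector of each class as its pattern."""
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in examples.items()
    }


def classify(vector: tuple[float, ...], patterns: dict[str, tuple[float, ...]]) -> str:
    """Classification: choose the class whose stored pattern is nearest (Euclidean)."""
    return min(patterns, key=lambda label: math.dist(vector, patterns[label]))


# Made-up training signals: a slowly varying "vowel-like" one and a rapidly alternating one.
vowel_like = [1, 1, 1, 1, -1, -1, -1, -1]
fricative_like = [0.2, -0.2] * 4
patterns = teach({
    "vowel": [extract_parameters(preprocess(vowel_like))],
    "fricative": [extract_parameters(preprocess(fricative_like))],
})
```

Because energy and zero-crossing rate have different scales, the raw Euclidean distance weights them unequally; compensating for this is one plausible role of the parameter optimalization block that the sketch leaves out.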
The second major area of research in speech communication applications is speech recognition, and particularly improving the speech recognition process for the Polish language using linguistic knowledge (phonetics and phonology) (Ostaszewska and Tambor, 1983). This idea is presented in Figure 8. The speech recognition process is improved by using acoustic, phonetic, syntactic and semantic knowledge of the Polish language.
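One common way to let linguistic knowledge correct acoustic decisions is to rescore a list of candidate phoneme sequences with a lexical prior. The sketch below shows that generic rescoring idea; it is an illustrative technique, not the author's method, and the candidate strings, scores, lexicon probabilities and the out-of-vocabulary penalty are all hypothetical.

```python
import math


def rescore(
    hypotheses: list[tuple[str, float]],
    lexicon: dict[str, float],
    oov_log_prob: float = -5.0,
) -> str:
    """Pick the hypothesis with the best combined acoustic + linguistic log-score.

    hypotheses: (phoneme string, acoustic log-score) pairs from a phoneme recognizer
    lexicon: phoneme string of a known word -> prior probability of that word
    oov_log_prob: hypothetical penalty for sequences that are not known words
    """
    def combined(hypothesis: tuple[str, float]) -> float:
        phonemes, acoustic = hypothesis
        prior = lexicon.get(phonemes, 0.0)
        linguistic = math.log(prior) if prior > 0 else oov_log_prob
        return acoustic + linguistic

    return max(hypotheses, key=combined)[0]


# Made-up example: acoustically "kof" scores best, but it is not a Polish word.
hypotheses = [("kof", -9.0), ("kot", -9.5), ("kod", -9.6)]
lexicon = {"kot": 0.5, "kod": 0.1}
```

With the lexicon, the non-word "kof" is penalized and "kot" wins; with an empty lexicon every candidate receives the same penalty, so the acoustically best "kof" is chosen.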
Fig. 8. Methods of improving the speech recognition process (acoustic, phonetic, syntactic and semantic knowledge contributing to improved speech recognition)
Fig. 9. Multilayer speech recognition system with elements of speech understanding (speech → acoustic layer → physical speech parameters → articulation layer → vector of distinctive parameters → phonetic layer → sequence of phonemes → syntactic layer → sequence of characters → semantic layer → orthographic text → application layer)

The result of this research was the creation of a multilayer speech recognition system in which each layer realizes one step of the speech recognition process. The layers are the acoustic layer, articulation layer, phonetic layer, syntactic layer, semantic layer and application layer. The model of the multilayer speech recognition system is shown in Figure 9.
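The layered model can be pictured as a chain of functions, each consuming the output of the layer below. The sketch covers only the phonetic, syntactic and semantic layers: it assumes the acoustic and articulation layers have already produced one (class, place, manner) distinctive-parameter vector per phoneme, with None marking a word boundary. The lookup tables, the "#" symbol and the toy semantic layer (which only capitalizes and punctuates) are hypothetical stand-ins for trained models, a real pronunciation dictionary and actual meaning analysis.

```python
from __future__ import annotations

from typing import Optional, Tuple

Vector = Tuple[str, str, str]  # (class, place of articulation, manner of articulation)

BOUNDARY = "#"  # hypothetical word-boundary symbol

# Hypothetical tables standing in for trained models and a real pronunciation dictionary.
PHONEME_OF = {
    ("consonant", "velar", "plosive"): "k",
    ("vowel", "back", "mid"): "o",
    ("consonant", "dental", "plosive"): "t",
    ("consonant", "bilabial", "nasal"): "m",
    ("vowel", "central", "open"): "a",
}
PRONUNCIATION = {("k", "o", "t"): "kot", ("m", "a"): "ma"}


def phonetic_layer(vectors: list[Optional[Vector]]) -> list[str]:
    """Distinctive-parameter vectors -> phonemes (None marks a word boundary)."""
    return [BOUNDARY if v is None else PHONEME_OF[v] for v in vectors]


def syntactic_layer(phonemes: list[str]) -> list[str]:
    """Phoneme sequence -> orthographic words via the pronunciation dictionary."""
    words: list[str] = []
    current: list[str] = []
    for p in phonemes + [BOUNDARY]:
        if p != BOUNDARY:
            current.append(p)
        elif current:
            words.append(PRONUNCIATION[tuple(current)])
            current = []
    return words


def semantic_layer(words: list[str]) -> str:
    """Placeholder: turn words into a sentence (capitalization and final period only)."""
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


LAYERS = [phonetic_layer, syntactic_layer, semantic_layer]


def recognize(vectors: list[Optional[Vector]]) -> str:
    """Pass the data up through the layers; each layer consumes the previous output."""
    data = vectors
    for layer in LAYERS:
        data = layer(data)
    return data
```

Keeping each layer behind a plain function boundary means an individual layer (for example, a better phonetic classifier) can be replaced without changing the others, which matches the one-step-per-layer organization of the model.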
The first, acoustic layer provides the physical parameters of speech. The second, articulation layer provides vectors of distinctive parameters of speech. On the basis of these vectors, the phonetic layer generates a sequence of speech phonemes. The syntactic layer, using a dictionary of pronunciation rules, provides the orthographic notation of speech. The semantic layer establishes the meaning of the orthographic sequence of characters and provides sentences in the Polish language. The task of the application layer depends on the purpose of the speech recognition system. The newly applied method of speech recognition was based on the detection of distinctive acoustic parameters of phonemes in the Polish language. Distinctiveness was assumed to be the most important criterion for selecting the parameters that represent objects from the recognized classes of phonemes.

4. IMPROVING RECOGNITION PROCESS OF POLISH PHONEMES
Phonemes are the sound units that determine the meaning of words. Effective recognition of phonemes, the sound units of each language, enables effective recognition of continuous speech. The phoneme recognition process can be improved using the phonetics and phonology of the Polish language (Klosowski, 2002). The new method of speech recognition was based on the detection of distinctive acoustic parameters of phonemes in the Polish language (Klosowski and Izydorczyk, 1999). Each phoneme is specified by a vector of distinctive parameters of the speech signal. The first distinctive parameter denotes the class of the phoneme, the second the place of articulation of the phoneme, and the third the manner of articulation of the phoneme (Klosowski and Izydorczyk, 2001). The average number of distinctive parameters required to recognize a phoneme equals 2.71 and was estimated using formula (1):
$$N_s = \sum_{k=1}^{M} P_k \cdot N_k = \sum_{k=1}^{37} P_k \cdot N_k = 2.71 \qquad (1)$$
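Formula (1) is a probability-weighted average, so it is easy to evaluate once P_k and N_k are known. The sketch below does this for a made-up six-phoneme inventory; the 37 Polish phonemes and their probabilities are not reproduced here, so the inventory, probabilities and parameter values are hypothetical. It also assumes, as an interpretation rather than the paper's stated definition, that N_k is the length of the shortest prefix of the (class, place, manner) vector that distinguishes the k-th phoneme from all other phonemes.

```python
# Hypothetical toy inventory: phoneme -> (class, place, manner) distinctive parameters.
INVENTORY = {
    "a": ("vowel", "central", "open"),
    "i": ("vowel", "front", "close"),
    "e": ("vowel", "front", "mid"),
    "p": ("consonant", "bilabial", "plosive"),
    "m": ("consonant", "bilabial", "nasal"),
    "s": ("consonant", "alveolar", "fricative"),
}
# Hypothetical phoneme probabilities P_k (they sum to 1).
PROBABILITY = {"a": 0.3, "i": 0.2, "e": 0.2, "p": 0.1, "m": 0.1, "s": 0.1}


def parameters_needed(phoneme: str, inventory: dict) -> int:
    """N_k (assumed definition): shortest parameter prefix no other phoneme shares."""
    vector = inventory[phoneme]
    for n in range(1, len(vector) + 1):
        prefix = vector[:n]
        if all(other[:n] != prefix for name, other in inventory.items() if name != phoneme):
            return n
    return len(vector)  # not separable by these parameters alone


def average_parameters(inventory: dict, probability: dict) -> float:
    """N_s = sum over k of P_k * N_k, as in formula (1)."""
    return sum(probability[p] * parameters_needed(p, inventory) for p in inventory)


print(average_parameters(INVENTORY, PROBABILITY))
```

For this toy table, "a" and "s" are identified after two parameters and the other four phonemes need all three, giving N_s of about 2.6. The value 2.71 in formula (1) comes from the real Polish inventory and is not reproduced by this example.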
In the formula above, N_s is the average number of distinctive parameters of speech, M is the number of phonemes, P_k is the probability of articulation of the k-th phoneme, and N_k is the number of distinctive parameters required to recognize the k-th phoneme.

SUMMARY
The research on speech recognition is ongoing. At present, the efforts of the Telecommunication Group concentrate on creating an efficient speech recognition system based on the multilayer speech recognition model using distinctive parameters of speech. The second major effort is the creation of speaker verification and identification systems and the implementation of speaker identification algorithms in the speech recognition system. The future goal of the research is the construction of a full speech dialog system with elements of speech understanding based on artificial intelligence technology. The structure of this system is presented in Figure 10.
Fig. 10. Structure of the dialog system with a speaker recognition feature (components: speaker recognition system, speech recognition system, speech understanding system, conversation system with knowledge database, and speech synthesis system; input and output: question or answer)
REFERENCES
Klosowski P., Improving of speech recognition process for Polish language, Transactions on Automatic Control and Computer Science, Vol. 47 (61), 2002, ISSN 1224-600X, CONTI'2002: 5th International Conference on Technical Informatics, 18-19 October 2002, Timisoara, Romania.
Klosowski P., Izydorczyk J., Base acoustic properties of Polish speech, International Conference Programmable Devices and Systems PDS2001, IFAC Workshop, Gliwice, 2001.
Klosowski P., Izydorczyk J., Acoustic properties of Polish vowels, Bulletin of the Polish Academy of Sciences - Technical Sciences, Vol. 47, No. 1, Warsaw, 1999, pp. 29-37.
Ostaszewska D., Tambor J., Podstawowe wiadomości z fonetyki i fonologii współczesnego języka polskiego [Basic information on the phonetics and phonology of contemporary Polish], University of Silesia, No. 488, Katowice, 1993 (in Polish).
Rabiner L. R., Applications of voice processing to telecommunications, Proceedings of the IEEE, Vol. 82, No. 2, pp. 197-228, Feb. 1994.