Digital speech processing: Speech coding, synthesis, and recognition
Signal Processing 30 (1993) 133-134, Elsevier
Book review: "Digital Speech Processing: Speech Coding, Synthesis, and Recognition", A. Nejat Ince, Editor. Publisher: Kluwer Academic Publishers, 1992, xiv + 242 pp., ISBN 0-7923-9220-5
This book discusses, in a tutorial manner, the use of voice for civil and military communications. It is based on lectures sponsored by AGARD/NATO, and the lecture topics are organised as individual chapters. The chapters are well written by some of the leading researchers in the field, and together they form a well-organised whole for readers interested in speech processing in general. As the book is a collection of lecture papers by several authors, the writing style and the depth of treatment vary from chapter to chapter. Most of the chapters provide a non-mathematical introduction to the basic concepts and applications of speech processing. By avoiding mathematical expressions, the book differs from most recently published books on digital speech processing and makes the topic accessible to a wide variety of readers. Another distinctive feature is its coverage of military communications: one chapter is devoted to applications of speech recognition for military requirements, and several military speech coding standards are also presented.

Chapter 1 provides an overview of voice communications. Network structures such as the NATO communication network and the Integrated Services Digital Network (ISDN) are reviewed. Various speech processing techniques are briefly outlined as an introduction to the succeeding chapters.

In Chapter 2, a non-mathematical introduction to the speech signal is provided. The chapter discusses speech production and perception as well as phonemics. Special emphasis is given to exploring the differences between written communication and voice communication. With some enlightening examples, the chapter argues successfully that, in order to develop efficient speech processing, we must avoid projecting the properties of text onto speech.

Chapter 3 deals with speech coding and compression. The concepts of linear prediction and predictive quantisation are briefly reviewed. In addition, speech compression methods such as LPC vocoding, pitch prediction, adaptive predictive coding and vector quantisation are discussed. The emphasis in this chapter is on analysis-by-synthesis excitation coding. Vector Excitation Coding (VXC), also known as Code Excited Linear Prediction (CELP), is discussed in detail. The chapter gives a good, uniform overview of predictive speech coding, from ADPCM to the sophisticated modern vector excitation methods.

Chapter 4 provides a brief overview of voice interactive information systems. The performance of speech recognisers is discussed. The HuMaNet system is presented as an example of the integration of voice into multimedia systems.

Chapter 5 discusses speech recognition based on the pattern recognition approach. The general pattern recognition framework is described, and it is shown how pattern recognition techniques have been applied to the problems of isolated word recognition, connected word recognition and continuous speech recognition. The performance of current systems is also estimated.

Chapter 6 discusses methods developed for evaluating speech quality. Several subjective and objective measures are reviewed, and application examples of quality measurement are provided for the three cases (human-to-human,
machine-to-human and human-to-machine).

Chapter 7 discusses several international speech processing standards issued by CCITT and by NATO. The chapter also gives an overview of CCITT's activities towards future standards. Even the working methods of the CCITT are covered to some extent. Some important standards, such as the GSM speech coding algorithm RPE-LTP, are only briefly mentioned. The chapter would be more valuable if it included, for example, the many emerging cellular radio speech coding standards. However, some of the low bit-rate algorithms missing here are covered in Chapter 3 (the U.S. cellular standard and CCITT's 16 kbit/s LD-CELP recommendation).

The last chapter of the book, Chapter 8, describes some recent applications of automatic speech recognition for military use and discusses the vast challenges for the future.

The book also contains a selective bibliography of speech processing literature, with abstracts included. Together with the
reference lists of the individual chapters, it provides access to further reading.

The three main fields of speech processing are mentioned in the title of this book. In my opinion, this is misleading, since speech synthesis is discussed only very briefly, in connection with voice interactive systems. The book is of most value to readers who are not necessarily familiar with speech processing and who are interested in getting an overview of the fundamental algorithms and their performance, as well as of the existing standards. It should be particularly valuable for those interested in military applications, although most of its contents are devoted to civil applications.

Kari Järvinen
Nokia Research Center
Kanslerinkatu 8
SF-33720 Tampere
Finland