Knowledge-based speech pattern recognition


Book reviews



Knowledge-Based Speech Pattern Recognition by Michael Allerhand. The Fifth Generation Computing Series. Kogan Page.

Michael Allerhand’s monograph, published in Kogan Page’s Fifth Generation Computing Series, proposes an integration of parametric and structural approaches to automatic speech recognition (ASR) using the formalism of attributed grammars. The first chapter introduces the problem and its difficulties and discusses possible approaches to ASR. The author supports the use of a knowledge-based (KB) approach when unrestricted speech recognition is the goal. According to the author, other approaches based purely on statistical methods have the advantage of using learning algorithms of acceptable complexity, but their performance is relatively restricted. Chapter two contains a review of syntactic pattern recognition and fuzzy set theory and introduces an isolated word recognizer in which lexical access is based on the order and duration of only three features: voiced, voiceless and silence. Word patterns are described using structural information (segment order) and quantitative information (duration distributions). A context-free description of syllable structure in terms of features is used. Phonotactic constraints upon sequences of intermediate manner-of-articulation categories are expressed. The syllabic grammar is used to parse strings of features into manner-of-articulation classes. Strings of these classes are matched with strings in the lexicon to access particular words. Chapter three starts with an important statement that seems to be confirmed by recent results in ASR. The author says that the use of simple models creates an inherent performance limitation: a plateau is reached and further improvement cannot be made. Models of sufficient detail cannot be inferred from observations of speech data alone. A context-free grammar is then proposed, based on linguistic knowledge obtained from expert linguists. Methods based on dynamic time warping and hidden Markov models (HMMs) are also reviewed and their limitations are discussed.
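The chapter-two idea of lexical access from segment order plus duration distributions can be illustrated with a minimal sketch. It is not the author's recognizer: the lexicon, the duration values, the Gaussian duration model, and the names `LEXICON`, `log_gaussian` and `recognize` are all assumptions made for illustration, and the syllabic grammar that maps features to manner-of-articulation classes is left out.

```python
import math

# Broad classes named in the review: voiced (V), voiceless (U), silence (S).
# Hypothetical lexicon: each word is an ordered list of
# (class, mean duration in ms, standard deviation in ms). All values invented.
LEXICON = {
    "six": [("U", 120.0, 30.0), ("V", 90.0, 25.0), ("S", 60.0, 20.0), ("U", 130.0, 35.0)],
    "two": [("S", 50.0, 20.0), ("U", 40.0, 15.0), ("V", 200.0, 50.0)],
    "see": [("U", 130.0, 30.0), ("V", 220.0, 50.0)],
    "sue": [("U", 110.0, 30.0), ("V", 300.0, 50.0)],
}


def log_gaussian(x, mean, std):
    """Log density of a normal distribution (assumed duration model)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def recognize(segments):
    """Return (word, log score) for the best lexical match, or None.

    segments is an ordered list of (class, duration in ms).
    Structural constraint: the class order must equal the word pattern.
    Quantitative constraint: durations are scored under each segment's
    assumed Gaussian duration distribution.
    """
    observed_order = [cls for cls, _ in segments]
    best = None
    for word, pattern in LEXICON.items():
        if [cls for cls, _, _ in pattern] != observed_order:
            continue  # segment order rules this word out
        score = sum(
            log_gaussian(duration, mean, std)
            for (_, duration), (_, mean, std) in zip(segments, pattern)
        )
        if best is None or score > best[1]:
            best = (word, score)
    return best
```

In this sketch "see" and "sue" share the same segment order, so only the duration scores separate them, while an input whose order matches no entry is rejected outright. That mirrors the division of labour between structural and quantitative information described above.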
Chapter four discusses feature extraction through property detectors. Mathematical techniques for feature selection are reviewed. Methods based on hierarchical linear regression are discussed in detail and applied to waveform segmentation. Chapter five contains a useful review of decision algorithms and introduces a composite pattern recognition model for ASR. This model is based on rewriting rules of the form:


where P is the probability of the rule and a is a collection of a priori probability distributions relating extracted features to class hypotheses. It can be shown that HMMs can be represented by rules of this type. Parameter distributions can be estimated using known methods. If decisions are based on mini-max rules rather than upon probability products, an admissible strategy leading to optimal sequential decisions can be used. The role of the knowledge captured by rewriting rules is to guide the search by eliminating sequences that conflict with phonotactics. This reduces the complexity of the problem, which would otherwise be exponential. Chapter six is dedicated to modelling allophonic and phonotactic constraints. The pattern constraints contained in the hybrid model introduced in chapter five take the form of structural constraints in the domain of phonotactics, and quantitative constraints in the domain of allophonic variants. Allophonic variants are represented by symbolic qualifiers (e.g. high, low). These qualifiers define probabilistic likelihood functions which describe the quantitative characteristics of each allophonic variant. Various examples of probability distributions are reported. Chapter seven contains conclusions, with a perspective on the possibility of using the proposed methods with parallel computers. The book is well written, well structured and rich in interesting considerations and ideas for future developments. It contains some provocative statements that, rather than disappointing the reader, should stimulate further thinking.

R. De Mori

An Artificial Intelligence Approach to Understanding Natural Language by J. Pitrat. North Oxford Academic, 1988.

This is a strange book. It asserts some general propositions about language understanding, establishing a framework at the level of Winston’s arch. It then treats individual topics (analysis, generation and so forth) by presenting the problems involved, throwing off a few practical remarks about cutting your natural language processing (NLP) cloth to suit your task figure, and detailing a few strategies, including (at some length) the author’s own. The book is stated to be “intended not only for computer scientists, but also for workers in natural languages”. The author wants to sell NLP to linguists as suppliers of language expertise and to computer scientists as the potential consumers of NLP boxes as system modules. But the book is unsatisfactory because, in spite of the more technical material, e.g. the account of the author’s analyser, it does not provide enough detail to give the reader a real feeling for what one does in NLP in the way that Charniak and McDermott’s ‘Introduction to Artificial Intelligence’, for instance, does. Much of the text is very discursive and informal, with more emphasis on problems than on how far people have got in solving them and what types of solution have been proposed. It is also a dangerous book. It is written in a style suggesting a rather broad coverage of the approaches that have been adopted in NLP, when it is in fact much more selective. There are chapters on meaning representation, knowledge bases, analysis, generation, and pragmatic (i.e. world or domain) knowledge, with a very short concluding one on the state of the art. The chapters follow a generally similar pattern, with a discussion of problems followed by a presentation of specific approaches. The unwary reader would