Language modeling using stochastic context-free grammars


Speech Communication 13 (1993) 163-170 North-Holland


A. Corazza
Istituto per la Ricerca Scientifica e Tecnologica, 38050 Povo di Trento, Italy

R. De Mori
School of Computer Science, McGill University, 3480 University Street, Montreal, Quebec, Canada, H3A 2A7

R. Gretter
Istituto per la Ricerca Scientifica e Tecnologica, 38050 Povo di Trento, Italy

G. Satta
University of Pennsylvania, Philadelphia, PA 19104-6228, USA

Received 22 December 1992
Revised 26 May 1993

Abstract. Island-driven parsers have interesting potential applications in Automatic Speech Understanding (ASU). Most of the recently developed ASU systems are based on an Acoustic Processor (AP) and a Language Processor (LP). AP computes the a priori probability of the acoustic data given a linguistic interpretation. LP computes the probability of the linguistic interpretation. This paper describes an effort to adapt island-driven parsers to handle stochastic context-free grammars. These grammars could then be used as Language Models (LM) by LP to compute the probability of a linguistic interpretation.


Keywords. Stochastic context-free grammars; upper-bounds; best derivation.

1 This paper is dedicated to Professor Hiroya Fujisaki.


1. Introduction

Automatic Speech Understanding (ASU) differs from Automatic Speech Recognition (ASR) because it has to produce a semantic representation of a spoken message rather than a sequence of recognized words. The structure of a semantic representation depends on the use that has to be made of it. In most ASU applications, a sentence is spoken or a dialog is performed to produce certain actions. Examples of actions based on semantic representations are data-base query, knowledge-based inference, and robot planning and replanning. 2

In the case of speech processing, the data to be analyzed are represented by a signal and not by a sequence of words. Given a Partial Interpretation (PI) of the input signal, the words which can extend it are predicted in some way, using syntactic and semantic information, to form a new interpretation that has to be scored with syntactic, semantic and acoustic likelihoods. Therefore, acoustic, syntactic and semantic Knowledge Sources should be involved in a problem-solving activity whose goal is to translate speech into actions. Problem solving generates a list of competing PIs, represented by sequences of words which may contain gaps corresponding to not yet recognized parts. PIs are scored by an evaluation function computed by probabilistic models which are defined in this paper. The scores assign priorities for expanding PIs in the search for the complete interpretation with the highest score (or at least one of them, if there is more than one optimal solution).

A decision criterion to find the semantic representation C, expressed by the sequence of words γ, given the acoustic description A of a speech signal, can be based on the maximization of the following probability:

$$\Pr(C, \gamma \mid A) = \Pr(A)^{-1} \Pr(A \mid C, \gamma) \Pr(C \mid \gamma) \Pr(\gamma). \quad (1)$$

2 The research described in this paper is part of an effort with the purpose of providing spoken dialog capabilities to person-robot interaction. These capabilities are developed in the MAIA project at IRST (Trento, Italy) and at the Institute for Robotics and Intelligent Systems, a Canadian Network of Centers of Excellence.

The probability Pr(A | C, γ) can be approximated by Pr(A | γ), provided that an acoustic description is chosen that mostly depends on phonemes and their context. Pr(A)^{-1} can be disregarded because it does not affect the decision process, being constant with respect to C and γ. Therefore, for any semantic interpretation C, a likelihood function can be computed, corresponding to the word sequence which is most likely to have expressed it:

$$f(C \mid A) = \max_{\gamma} \Pr(A \mid \gamma) \Pr(C \mid \gamma) \Pr(\gamma). \quad (2)$$

Finally, the semantic interpretation C* is chosen such that

$$f(C^* \mid A) = \max_{C} f(C \mid A). \quad (3)$$

Pr(A | γ) can be computed by an acoustic model based on Hidden Markov Models (HMMs). The computation of Pr(γ) can be based on a probabilistic Language Model (LM), while Pr(C | γ) should be obtained with a probabilistic semantic model. In many applications C can be almost certainly derived from the best syntactic interpretation of γ, in which case Pr(C | γ) = 1. Otherwise, the value of this term must be taken into account in the search for the optimal solution.

Based on this decision criterion, an optimal complete solution can be obtained by searching a space in which nodes are PIs and operators grow PIs, generating extensions which will eventually become complete interpretations. Each PI is scored by an evaluation function that is an upper-bound of the function f defined by (2) for every possible completion. At each step of the problem-solving activity, priorities for PI expansions are based on these upper-bounds. If σ is a PI represented by a string with words and gaps, an upper-bound on the product to be maximized in (3) can be computed as the product of the upper-bounds of each single term. Given a description A of the input signal, determining C* requires growing a PI until a sequence γ without gaps is found that maximizes (3).

The most popular LMs used so far have been based on bigram or trigram probabilities (Jelinek, 1991). Notable exceptions using formal grammars are described in (Fujisaki, 1990).


In (Corazza et al., 1991; Jelinek et al., 1992), it is shown how LMs based on Stochastic Context-Free Grammars (SCFGs) are effective in discarding unacceptable sentences and in predicting a list of words that can possibly expand a PI. The type of search considered in these papers does not lead to the interpretation of the input, but to the best sequence of words. This approach is useful for ASR applications such as dictation, but it has an unacceptable computational complexity if search is driven by an island-driven strategy. Furthermore, in these cases the evaluation function does not compute the tightest upper-bound for a PI.

In this paper, computations of probabilities of parts of sentences including islands and gaps are introduced, and a new scoring method is proposed to be used in the search for the best semantic representation of a spoken signal. The new score is based on the probability of the best derivation of γ in the considered SCFG. The proposed new score for a PI is the maximum of the scores of all possible completions, which is the tightest, and thus the most informative, upper-bound that can be used in the evaluation function.

2. Background

Let A be a generic (finite) set of symbols. A string u over A is a finite sequence of symbols from A; |u| denotes the length of u. The null string ε is the (unique) string whose length equals zero. Let u = a_1 ⋯ a_n, n ≥ 1; k:u and u:k, 0 ≤ k ≤ n, denote the prefix string a_1 ⋯ a_k and the suffix string a_{n-k+1} ⋯ a_n, respectively. The set of all strings over A is denoted A* (ε included); the set of all strings over A whose length is k, k ≥ 0, is denoted A^k. Let u and v be two strings in A*; uv denotes the concatenation of u and v (u before v). Concatenation is extended to sets of strings in the following way. Let L_1, L_2 ⊆ A*; uL_1 denotes the set {x | x = uy, y ∈ L_1}, L_1 u denotes the set {x | x = yu, y ∈ L_1} and L_1 L_2 denotes the set {x | x = yz, y ∈ L_1, z ∈ L_2}.

Some definitions are now briefly introduced. A more comprehensive discussion of them can be found in the literature on stochastic context-free languages (see for instance (Gonzales and Thomason, 1978; Wetherell, 1980)).

An SCFG is a 4-tuple G_s = (N, Σ, P_s, S), where N is a finite set of nonterminal symbols, Σ is a finite set of terminal symbols such that N ∩ Σ = ∅, S ∈ N is a special symbol called the start symbol, and P_s is a finite set of pairs (p, Pr(p)), where p is a production and Pr(p) > 0 is the probability associated with p in G_s. Productions are represented as H → γ, H ∈ N, γ ∈ (N ∪ Σ)*, and symbol P denotes the set of all productions in P_s. As a convention, it is assumed that productions not belonging to P have zero probability. G_s is assumed to be a proper SCFG, that is, the following relation holds for every nonterminal H:

$$\sum_{\gamma \in (N \cup \Sigma)^*} \Pr(H \to \gamma) = 1. \quad (4)$$

The grammar G_s is in Chomsky Normal Form (CNF) if all productions in P have the form H → F_1 F_2 or H → a, where H, F_1, F_2 ∈ N, a ∈ Σ. Without any loss of generality, it is assumed in the following that G_s is in CNF. As a convention, it is also assumed that N = {H_1, …, H_|N|}, H_1 = S. A derivation in G_s is any sequence γ_1, γ_2, …, γ_n, n ≥ 1, such that γ_i ∈ (N ∪ Σ)* for every 1 ≤ i ≤ n and each γ_{i+1} is obtained from γ_i by rewriting one nonterminal occurrence in γ_i by means of a production in P. Every derivation of a string from a nonterminal H can be represented by a derivation tree τ with root H; the probability Pr(τ) is the product of the probabilities of all (occurrences of) productions used in τ. For H ∈ N and γ ∈ (N ∪ Σ)*, H(γ) denotes the set of all derivation trees with root H and yield γ, and Pr(H(γ)) denotes the sum of the probabilities

of all derivation trees in such a set. On the other hand, we will use the operator P_m, defined as follows, to represent the maximum of such probabilities:

$$P_m(H(\gamma)) = \max_{\tau \in H(\gamma)} \{\Pr(\tau)\}. \quad (5)$$

Let L be a string set, L ⊆ (N ∪ Σ)*. We extend the previous notation and write H(L) to represent the set of all derivation trees with root H and yield in L; P_m(H(L)) is then the maximum among all the probabilities of derivation trees in H(L). The language generated by G_s, denoted L(G_s), is the set of all strings in Σ* that can be derived in G_s from the start symbol S. Similarly, T(G_s) denotes the set of all derivation trees in S(γ), γ ∈ Σ*. It follows that Pr(L(G_s)) = Pr(T(G_s)).

3. Computation of relevant probabilities

In this section we assume that the grammar G_s is consistent, that is, the following condition holds (see (Gonzales and Thomason, 1978)): 3

$$\sum_{\gamma \in \Sigma^*} \Pr(S(\gamma)) = 1. \quad (6)$$

3 The normalization property expressed in (4) guarantees that the probabilities of all finite and infinite derivations of terminal strings from the start symbol in G_s sum to one. Nevertheless, the language generated by the grammar only corresponds to those derivations that are finite. The probability of such a subset can be less than one.

From this hypothesis it can be derived that a similar condition holds for all nonterminals. Let u = w_i ⋯ w_{i+p} be a string in Σ*; Pr(H(u)) is the inside probability, i.e., the probability that the nonterminal symbol H derives the string u in every possible way, given by the sum of the probabilities of all trees derived in G_s having root H and yield u. The inside probability can be computed with an algorithm known as the Inside algorithm (Baker, 1979; Lari and Young, 1990). 4

4 The Inside algorithm is a specialization of the well-known Kasami-Younger-Cocke (CYK) algorithm for recognition of context-free languages (Aho and Ullman, 1972).
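As an illustration, the following is a minimal sketch of the Inside computation for a grammar in CNF. The encoding is our own assumption, not notation from the paper: `unary` maps (H, a) to Pr(H → a) and `binary` maps (H, F1, F2) to Pr(H → F1 F2), with nonterminals represented by strings.

```python
from collections import defaultdict

def inside(words, unary, binary, nonterminals):
    """Inside algorithm for a CNF SCFG: fills a table P with
    P[i, j][H] = Pr(H(w_i ... w_j)), the total probability that
    nonterminal H derives the substring w_i ... w_j."""
    n = len(words)
    P = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 are covered by the lexical productions H -> a.
    for i, a in enumerate(words):
        for H in nonterminals:
            P[i, i][H] = unary.get((H, a), 0.0)
    # Longer spans: sum over every binary rule and every split point,
    # i.e. the CYK recursion with summation in place of maximization.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (H, F1, F2), p in binary.items():
                for k in range(i, j):
                    P[i, j][H] += p * P[i, k][F1] * P[k + 1, j][F2]
    return P  # Pr(H(u)) for the whole string u is P[0, n - 1][H]
```

For example, on a toy grammar with productions S → S S (probability 0.4) and S → a (probability 0.6), `inside(['a', 'a'], {('S', 'a'): 0.6}, {('S', 'S', 'S'): 0.4}, ['S'])` yields Pr(S(aa)) = 0.4 × 0.6 × 0.6 = 0.144.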

The prefix-string probability, denoted by Pr(H(uL)), L = Σ* or L = Σ^m, is either the probability that a nonterminal symbol H derives all the strings beginning with u, if L = Σ*, or the probability that H derives exactly a string composed of u followed by m terminal symbols (a gap), if L = Σ^m. An algorithm called Left-to-Right Inside has been recently proposed in (Jelinek and Lafferty, 1991) for the computation of Pr(H(uΣ*)). A method is presented in (Corazza et al., 1991) to compute the quantity Pr(H(uΣ^m)). The latter probability differs from the former just because the length of the gap is known. In a symmetrical way, the suffix-string probability, denoted by Pr(H(Lu)), L = Σ^m or L = Σ*, is either the probability that a nonterminal symbol H derives all the strings ending with u, if L = Σ*, or the probability that H derives exactly a string composed of a gap of length m followed by u, if L = Σ^m. The computation of both Pr(H(Σ*u)) and Pr(H(Σ^m u)) is similar to the computation of Pr(H(uΣ^m)).

Extending the framework proposed in (Jelinek and Lafferty, 1991), some new quantities are now introduced. The gap probability, denoted by Pr(H(L)), L = Σ* or L = Σ^m, is the sum of the probabilities of all trees generated in G_s having root H and spanning a gap, i.e. all trees spanning some string in Σ*. In the case L = Σ^m, the computation of the gap probability can be carried out in a recursive way as follows:

$$\Pr(H(\Sigma^m)) = \sum_{F_1, F_2 \in N} \Pr(H \to F_1 F_2) \times \sum_{j=1}^{m-1} \Pr(F_1(\Sigma^j)) \Pr(F_2(\Sigma^{m-j})), \quad m > 1;$$

$$\Pr(H(\Sigma^1)) = \sum_{w \in \Sigma} \Pr(H \to w). \quad (7)$$
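The recursion in (7) translates directly into a small dynamic program over gap lengths. The sketch below reuses the hypothetical `unary`/`binary` encoding introduced earlier.

```python
def gap_probabilities(m_max, unary, binary, nonterminals):
    """Computes Pr(H(Sigma^m)) for every nonterminal H and every gap
    length m = 1 .. m_max, following (7): a gap of length 1 is covered
    by the lexical rules of H, and a gap of length m > 1 is split into
    two sub-gaps of lengths j and m - j covered by the children of
    some binary rule H -> F1 F2."""
    gap = {}
    for H in nonterminals:
        # Base case of (7): sum of the lexical productions of H.
        gap[H, 1] = sum(p for (G, _a), p in unary.items() if G == H)
    for m in range(2, m_max + 1):
        for H in nonterminals:
            total = 0.0
            for (G, F1, F2), p in binary.items():
                if G == H:
                    for j in range(1, m):
                        total += p * gap[F1, j] * gap[F2, m - j]
            gap[H, m] = total
    return gap
```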

As previously observed, the probability Pr(H(Σ*)) equals 1 for every H ∈ N, because of the consistency assumption.

The gap-in-string probability, indicated by Pr(H(uLv)), L = Σ* or L = Σ^m, is the probability of all trees generated in G_s by nonterminal H and spanning a string beginning with u and ending with v, such that u and v are separated by a gap of unspecified length in the case of L = Σ*, or a gap of length m if L = Σ^m. In other words, Pr(H(uLv)) is the probability that a nonterminal symbol H in G_s generates a string beginning with u and ending with v (with no overlapping between u and v) in all possible ways if L = Σ*, or with a gap of given length if L = Σ^m. Expressions for the computation of the gap-in-string probability are presented in (Corazza et al., 1991).

The island probability, indicated by Pr(H(LvΣ*)), is the probability of all trees generated in G_s by nonterminal H and spanning a string containing a given "island" v between two gaps. The computation of this probability is described in (Corazza et al., 1991). The case in which the length of the final gap is known is not of interest for the theoretical framework developed in this paper. This is true also for the following probabilities.

The prefix-string-with-gap probability, indicated by Pr(H(uLvΣ*)), L = Σ^m or L = Σ*, is the probability of all trees generated in G_s whose yields include a prefix string u and, after a first gap, an island v followed by a second gap. The computation of this probability requires the solution of a complex system of equations (Corazza et al., 1991) if the length of the gap is not known, i.e. if L = Σ*. A simpler computation requiring polynomial time can be used if the length of the gap is known (L = Σ^m).

In the next definitions two additional probabilities are introduced, which are needed for the computation of the island probabilities only in the case of unknown gap length. The following definition is symmetrical with respect to the previous one. The suffix-string-with-gap probability, denoted by Pr(H(Σ*vΣ*t)), is the probability of all trees generated in G_s by a nonterminal symbol H and spanning a string composed of a gap, followed by an island v and, after a second gap, by a suffix string t. The string-with-gap-and-island probability, denoted by Pr(H(uΣ*vΣ*t)), is the probability of all trees generated in G_s by a nonterminal symbol whose yield begins with u and ends with t; these two substrings are separated by an island v surrounded by two gaps of unknown length.

Computation of all these probabilities can be found in (Corazza et al., 1991). Unfortunately, this approach is impracticable when the number of words which can fill in the gap is unknown, due to its computational complexity. Nevertheless, acoustic information can be used to obtain an estimate of the number of words in a gap. An algorithm with an acceptable polynomial time complexity is proposed in Section 4 for a similar purpose.

4. Computation of probabilities of the best derivation tree with gaps

The problems mentioned in Section 3 do not arise if the purpose of the computation is the probability of the best derivation tree. In this section a framework is introduced for the computation of P_m(H(L)) for L = {u}, L = uΣ* and L = Σ*uΣ*, where u is a given string over Σ. As will be discussed, these quantities can be used as upper-bounds in the search for the most likely complete interpretation of the speech signal. For details about this approach, see (Corazza et al., 1992a, 1992b). Some of the expressions required in such a computation do not depend on the sequence of words to be analyzed, but only on the grammar; therefore these expressions can be computed off-line. Methods for this computation are introduced in (Corazza et al., 1992b), using a dynamic programming technique reminiscent of well-known methods for removing useless symbols from a context-free grammar (see for example (Harrison, 1978)).

4.1. Off-line computations

Among the quantities that can be computed off-line are the probabilities of the best complete derivation trees having root H_i and generating a string in Σ*. Notice that there is one such quantity for every nonterminal in the grammar. These probabilities will be used in the next subsection to compute the score of parts of the sentence that have not yet been analyzed; we will therefore call these quantities gap upper-bounds. More formally, gap upper-bounds are defined by means of an |N| × 1 array L_g in the following way:

$$L_g[i] = P_m(H_i(\Sigma^*)). \quad (8)$$

Instead of describing the relations to be used in the computation of L_g (see (Corazza et al., 1992b)), an intuitive introduction to such a computation is provided. First of all, notice that the analysis can be limited to derivation trees with height not greater than |N|. In fact, let τ be a derivation tree such that a path from its root to one of its leaves contains more than one occurrence of a nonterminal H_j. Let also τ′ be the derivation tree obtained from τ by removing the subtree rooted in the occurrence of H_j closest to the root of τ and replacing it with the subtree rooted in the occurrence of H_j closest to the yield of τ. In (Corazza et al., 1992b) it is shown that the probability of τ is strictly less than the probability of τ′: thus every optimal tree must have height not greater than |N|. There is still a second property that can be used to speed up the computation of the gap upper-bounds. Let T_k be the set of derivation trees with height not greater than k. In (Corazza et al., 1991) it is shown that a set of at least k trees {τ_1, τ_2, …, τ_k} is always found within T_k, whose roots are labeled by different symbols H_{i_1}, H_{i_2}, …, H_{i_k} and whose probabilities are optimal and define the elements L_g[i_1], L_g[i_2], …, L_g[i_k]. A tabular method can then be used to compute array L_g, where, at the k-th iteration, set T_k is considered. As a consequence of the two observations above, the method converges to L_g in at most |N| iterations, and at each iteration at least one new element of array L_g is obtained; a sketch of this computation is given below.
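The following is our own rendering of the tabular method under the hypothetical grammar encoding used above, not the relations of (Corazza et al., 1992b). Each round applies a Viterbi-style one-step recurrence; by the two properties above, at least one new entry of L_g reaches its final value per round, so at most |N| rounds are needed.

```python
def gap_upper_bounds(unary, binary, nonterminals):
    """Computes Lg[H] = Pm(H(Sigma*)), the probability of the best
    complete derivation tree rooted in H, as in (8)."""
    Lg = {H: 0.0 for H in nonterminals}
    for _ in range(len(nonterminals)):
        new = {}
        for H in nonterminals:
            # Best tree of height 1: a single lexical production.
            best = max((p for (G, _a), p in unary.items() if G == H),
                       default=0.0)
            # Best tree rooted in a binary production, combining the
            # current best subtrees of its two children.
            for (G, F1, F2), p in binary.items():
                if G == H:
                    best = max(best, p * Lg[F1] * Lg[F2])
            new[H] = best
        if new == Lg:  # fixed point reached before |N| rounds
            break
        Lg = new
    return Lg
```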

A second family of probabilities is considered that depends only upon the grammar G_s, and can therefore be computed independently of the input words. These probabilities are associated with "optimal" derivation trees whose root is H_i and whose yield is composed of a nonterminal H_j followed by a string in Σ*. Notice that these probabilities depend on a pair of nonterminal symbols. They can then be grouped into an |N| × |N| array L_p in the following way:

$$L_p[i, j] = P_m(H_i(H_j \Sigma^*)). \quad (9)$$
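The paper leaves the relations for L_p to (Corazza et al., 1992a). One plausible reconstruction under our toy encoding, analogous to the L_g computation, rests on the observation that a tree with root H_i and yield H_j w, w ∈ Σ*, is either the trivial single-node tree (i = j) or applies a rule H_i → F_1 F_2 whose F_1 subtree yields H_j followed by terminals and whose F_2 subtree yields terminals only. This sketch is our own assumption, not the authors' method.

```python
def prefix_upper_bound_table(binary, Lg, nonterminals):
    """Plausible fixed-point computation of Lp[i, j] = Pm(Hi(Hj Sigma*)),
    keyed here by nonterminal names.  By a height argument analogous to
    the one for Lg (our assumption), at most |N| relaxation rounds are
    needed; Lg is the gap upper-bound table computed above."""
    Lp = {(i, j): 1.0 if i == j else 0.0
          for i in nonterminals for j in nonterminals}
    for _ in range(len(nonterminals)):
        changed = False
        for (H, F1, F2), p in binary.items():
            for j in nonterminals:
                # Spine goes through F1; F2 covers a gap in Sigma*.
                cand = p * Lp[F1, j] * Lg[F2]
                if cand > Lp[H, j]:
                    Lp[H, j] = cand
                    changed = True
        if not changed:
            break
    return Lp
```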

These quantities will be used in the next subsection to compute optimal upper-bounds for the probability that a single derivation produces a given prefix string. More details can be found in (Corazza et al., 1992a). Let us finally consider the maximum probability of the derivation trees whose root is H_i and whose yield is a nonterminal H_j surrounded by two strings in Σ*; these probabilities will be used in the next subsection for the computation of syntactic upper-bounds for the probability of the derivation of an "island". Let us define an |N| × |N| array L_i such that

$$L_i[i, j] = P_m(H_i(\Sigma^* H_j \Sigma^*)). \quad (10)$$

4.2. Probabilities to be computed on-line

As already discussed, we are interested in finding the probability of an "optimal" derivation in G_s of a sentence which includes, as a prefix or as an island, an already recognized word sequence u. In this subsection we discuss an efficient computation of such a probability. Using the notation and expressions introduced in the previous subsections, the problem can be seen as the one of finding the probability of the "optimal" derivations of sentences in the languages uΣ* and Σ*uΣ*. In the following, quantities depending on a string u = w_1 ⋯ w_n ∈ Σ^n are considered, which must be computed on-line. The functional notation adopted for all these probabilities expresses the dependence of the defined quantities upon the given string u. The probability of the most likely derivation of a given string u according to a grammar G_s, defined by

$$M_b(u)[i] = P_m(H_i(u)), \quad (11)$$

can be computed using a probabilistic version of the Kasami-Younger-Cocke (CYK) recognizer (see for instance (Younger, 1967; Aho and Ullman, 1972)) based on the Viterbi algorithm, as shown in (Jelinek et al., 1992).
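A minimal sketch of such a Viterbi-style CYK, again under the hypothetical dictionary encoding used above; maximization replaces the summation of the Inside algorithm.

```python
def viterbi_cyk(words, unary, binary, nonterminals):
    """Fills M with M[i, j][H] = Pm(H(w_i ... w_j)), the probability
    of the single best derivation of the substring from H; the array
    Mb(u) of (11) is read off the top cell M[0, n - 1]."""
    n = len(words)
    M = {}
    for i, a in enumerate(words):
        M[i, i] = {H: unary.get((H, a), 0.0) for H in nonterminals}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = {H: 0.0 for H in nonterminals}
            for (H, F1, F2), p in binary.items():
                for k in range(i, j):
                    cand = p * M[i, k][F1] * M[k + 1, j][F2]
                    if cand > cell[H]:
                        cell[H] = cand
            M[i, j] = cell
    return M
```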

For a given string u, the set H_i(uΣ*) contains all possible derivation trees having a root node labeled by nonterminal H_i and, as a yield, a string of terminal symbols that includes u = w_1 ⋯ w_n ∈ Σ^n as a prefix. The highest among all probabilities of elements in H_i(uΣ*) are optimal syntactic upper-bounds for trees generating partial interpretations starting with u; therefore, in the following, these probabilities will be referred to as prefix upper-bounds. Such quantities are grouped together in an |N| × 1 array M_p(u) as follows:

$$M_p(u)[i] = P_m(H_i(u\Sigma^*)). \quad (12)$$

The highest among all probabilities of derivation trees in the set H_i(Σ*uΣ*) are called island upper-bounds. Given a string u = w_1 ⋯ w_n in Σ^n, M_i(u) is the |N| × 1 array whose elements are defined as follows:

$$M_i(u)[i] = P_m(H_i(\Sigma^* u \Sigma^*)).$$

If H_i = S, these probabilities represent optimal syntactic upper-bounds in the search for the most likely derivation of a sentence in L(G_s), starting from different competing hypotheses which correspond to already analyzed "islands".
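To make concrete how these upper-bounds drive the search described in the Introduction, here is a sketch of a best-first loop over partial interpretations; `expand`, `upper_bound` and `is_complete` are hypothetical callbacks standing in for the word-prediction step, the scoring step (e.g. combining M_p(u) or M_i(u) with acoustic likelihoods) and the termination test of an actual ASU system.

```python
import heapq
import itertools

def best_first_search(initial_pis, expand, upper_bound, is_complete):
    """Expands partial interpretations (PIs) in order of decreasing
    upper-bound score.  Because each score is an upper-bound on the
    score of every completion of the PI, the first complete
    interpretation popped from the queue is an optimal one."""
    counter = itertools.count()  # tie-breaker: never compare PIs
    heap = [(-upper_bound(pi), next(counter), pi) for pi in initial_pis]
    heapq.heapify(heap)
    while heap:
        _, _, pi = heapq.heappop(heap)
        if is_complete(pi):
            return pi
        for successor in expand(pi):
            heapq.heappush(heap, (-upper_bound(successor),
                                  next(counter), successor))
    return None
```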

5. Conclusions

Stochastic grammars are a useful tool for driving the interpretation of a written or spoken sentence. Pioneering work in stochastic syntax analysis is described in (Fu, 1982) and has its roots in (Salomaa, 1969; Lee and Fu, 1972; Persoon and Fu, 1975; Lu and Fu, 1977). When these grammars are used, there is a need to compute the probability that they generate a sentence, given only some of the words in it. Expressions for the calculation of probabilities of strings of this type have been proposed in (Jelinek and Lafferty, 1991) for the case in which only the prefix of a sentence is known. This case is also considered in (Persoon and Fu, 1975) for scoring partial parses of strings which cannot be further parsed because they have been affected by an error. The probability computation in the more general case, in which partial word sequences interleaved by gaps are known, is presented in (Corazza et al., 1991). This computation is too complex in practice unless the lengths of the gaps are known.


The probability of the best parse tree that can generate a sentence of which only a part (a prefix, a suffix or an island) is known is the lowest possible upper-bound of an evaluation function to be used in the search for the best interpretation of the speech signal. Its computation has polynomial time complexity even if the size of the gap preceding an island is unknown. This makes it possible to use SCFGs in practice for driving interpretations of sentences in natural language.

References

A.V. Aho and J.D. Ullman (1972), The Theory of Parsing, Translation and Compiling, Volume 1 (Prentice Hall, Englewood Cliffs, NJ).

J.K. Baker (1979), "Trainable grammars for speech recognition", in Proc. Spring Conf. of the Acoustical Society of America.

A. Corazza, R. De Mori, R. Gretter and G. Satta (1991), "Computation of probabilities of a stochastic island-driven parser", IEEE Trans. Pattern Anal. Machine Intell., Vol. 13, No. 9, pp. 936-950.

A. Corazza, R. De Mori, R. Gretter and G. Satta (1992a), Optimal probabilistic evaluation functions for search controlled by stochastic context-free grammars, Technical Report 9207-01, Istituto per la Ricerca Scientifica e Tecnologica, I-38050 Povo di Trento, Italy.

A. Corazza, R. De Mori and G. Satta (1992b), "Computation of upper-bounds for stochastic context-free languages", in Proc. Tenth National Conf. on Artificial Intelligence, San Jose, California, pp. 344-349.

K.S. Fu (1982), Syntactic Pattern Recognition and Applications (Prentice Hall, Englewood Cliffs, NJ).

H. Fujisaki, ed. (1990), Recent Research Toward Advanced Man-Machine Interface Through Spoken Language (Steering Group on Advanced Man-Machine Interface Through Spoken Language, The Ministry of Education, Science and Culture of Japan, Tokyo, Japan).

R.C. Gonzales and M.G. Thomason (1978), Syntactic Pattern Recognition (Addison-Wesley, Reading, MA).

M.A. Harrison (1978), Introduction to Formal Language Theory (Addison-Wesley, Reading, MA).

F. Jelinek (1991), "Up from trigrams! The struggle for improved language models", in Proc. European Conf. on Speech Communication and Technology, Genova, Italy, pp. 1037-1040.

F. Jelinek and J.D. Lafferty (1991), "Computation of the probability of initial substring generation by stochastic context-free grammars", Computational Linguistics, Vol. 17, No. 3, pp. 315-323.

F. Jelinek, J.D. Lafferty and R.L. Mercer (1992), "Basic methods of probabilistic context free grammars", in Speech Recognition and Understanding, ed. by R. De Mori and P. Laface (Springer-Verlag, Berlin).

K. Lari and S.J. Young (1990), "The estimation of stochastic context-free grammars using the inside-outside algorithm", Comput. Speech Language, Vol. 4, No. 1, pp. 35-56.

H.C. Lee and K.S. Fu (1972), "A stochastic syntax analysis procedure and its application to pattern classification", IEEE Trans. Comput., Vol. 4, No. 3, pp. 660-666.

S.Y. Lu and K.S. Fu (1977), "Stochastic error-correcting syntax analysis for recognition of noisy patterns", IEEE Trans. Comput., Vol. 26, No. 12, pp. 1268-1276.

E. Persoon and K.S. Fu (1975), "Sequential classification of strings generated by stochastic context-free grammars", Internat. J. Comput. Information Sci., Vol. 4, No. 3, pp. 205-218.

A. Salomaa (1969), "Probabilistic and weighted grammars", Information and Control, Vol. 15, pp. 529-544.

C.S. Wetherell (1980), "Probabilistic languages: A review and some open questions", Computing Surveys, Vol. 12, No. 4, pp. 361-379.

D.H. Younger (1967), "Recognition and parsing of context-free languages in time n³", Information and Control, Vol. 10, pp. 189-208.