P. R. Krishnaiah and L. N. Kanal, eds., Handbook of Statistics, Vol. 2
© North-Holland Publishing Company (1982) 417-449
Applications of Stochastic Languages*
K. S. Fu
1. Introduction
Formal languages and their corresponding automata and parsing procedures have been used in the modeling and analysis of natural and computer languages [2, 9] and in the description and recognition of patterns [15, 18]. One natural extension of one-dimensional string languages to higher dimensions is tree languages. A string can be regarded as a single-branch tree; the capability of having more than one branch often gives trees a more efficient pattern representation. Interesting applications of tree languages to picture recognition include the classification of bubble chamber events, the recognition of fingerprint patterns, and the interpretation of LANDSAT data [27-35]. In some applications a certain amount of uncertainty exists in the process under study. For example, noise or distortion occurs in the communication, storage and retrieval of information, or in the measurement or processing of physical patterns. Under these situations, in order to model the process more realistically, the approach of using stochastic languages has been suggested [11, 15, 18, 21]. For every string or tree x in a language L, a probability p(x) can be assigned such that 0 < p(x) ≤ 1 and Σ_{x∈L} p(x) = 1. Thus the function p(x) is a probability measure defined over L, and it can be used to characterize the uncertainty and randomness of L. In this paper, after a brief introduction to stochastic string languages, three applications are described.¹ They are (1) communication and coding, (2) syntactic pattern recognition, and (3) error-correcting parsing. Stochastic tree languages are then introduced, and their application to texture modeling is described.

*This work was supported by NSF Grant ENG 78-16970.
¹Other applications of stochastic languages include language learning [6, 40] and digital system design [5].
2. Review of stochastic languages
Two natural ways of extending the concept of formal languages to stochastic languages are to randomize the productions of grammars and the state transitions
of recognition devices (acceptors), respectively. In this section some major results in stochastic grammars and stochastic syntax analysis are reviewed.
DEFINITION 2.1. A stochastic phrase structure grammar (or simply stochastic grammar) is a four-tuple G_s = (V_N, V_T, P_s, S) where V_N and V_T are finite sets of nonterminals and terminals; S ∈ V_N is the start symbol;² P_s is a finite set of stochastic productions, each of which is of the form

    α_i →^{p_ij} β_ij,    j = 1, ..., n_i;  i = 1, ..., k    (1)

where α_i is a string in V^* containing at least one nonterminal, β_ij ∈ V^*, V = V_N ∪ V_T, and p_ij is the probability associated with the application of this stochastic production,

    0 < p_ij ≤ 1   and   Σ_{j=1}^{n_i} p_ij = 1.³    (2)
V^* denotes the set of all strings composed of symbols of V, including the empty string λ. Suppose that α_i →^{p_ij} β_ij is in P_s.⁴ Then the string ξ = γ_1 α_i γ_2 may be replaced by η = γ_1 β_ij γ_2 with probability p_ij. We shall denote this derivation by

    ξ ⇒^{p_ij} η

and we say that ξ directly generates η with probability p_ij. If there exists a sequence of strings ω_1, ..., ω_{n+1} such that

    ξ = ω_1,   η = ω_{n+1},   ω_i ⇒^{p_i} ω_{i+1},   i = 1, ..., n,

then we say that ξ generates η with probability p = Π_{i=1}^{n} p_i and denote this derivation by ξ ⇒*^{p} η.
²In a more general formulation, instead of a single start symbol, a start symbol (probability) distribution can be used.
³If these two conditions are not satisfied, p_ij will be denoted as the weight w_ij and P_s the set of weighted productions. Consequently, the grammar and the language generated are called a weighted grammar and a weighted language, respectively [12].
⁴In running text we will set indices superior and inferior to arrows instead of indices centered above or below the arrow.
The probability associated with this derivation is equal to the product of the probabilities associated with the sequence of stochastic productions used in the derivation. It is clear that ⇒*^{p} is the reflexive and transitive closure of the relation ⇒^{p}. The stochastic language generated by G_s is

    L(G_s) = { (x, p(x)) | x ∈ V_T^*, S ⇒^{p_j} x, j = 1, ..., k, and p(x) = Σ_{j=1}^{k} p_j }    (3)

where k is the number of all distinctively different derivations of x from S and p_j is the probability associated with the jth distinctive derivation of x. In general, a stochastic language L(G_s) is characterized by (L, p) where L is a language and p is a probability distribution defined over L. The language L of a stochastic language L(G_s) = (L, p) is called the characteristic language of L(G_s). Since the productions of a stochastic grammar are exactly the same as those of the non-randomized grammar except for the assignment of the probability distribution, the language L generated by a stochastic grammar is the same as that generated by the non-randomized version.
EXAMPLE 2.2. Consider the stochastic finite-state grammar G_s = (V_N, V_T, P_s, S) where V_N = {S, A, B}, V_T = {0, 1} and

    P_s:  S →^{1} 1A,
          A →^{0.8} 0B,    A →^{0.2} 1,
          B →^{0.3} 0,     B →^{0.7} 1S.
A typical derivation, for example, would be S ⇒ 1A ⇒ 10B ⇒ 100, with p(100) = 1 × 0.8 × 0.3 = 0.24. The stochastic language generated by G_s, L(G_s), is illustrated in Table 1. It is noted that

    Σ_{x ∈ L(G_s)} p(x) = 0.2 + 0.24 + Σ_{n=1}^{∞} (0.2 + 0.24)(0.56)^n = 1.
Table 1
String generated x     p(x)
11                     0.2
100                    0.24
(101)^n 11             0.2 × (0.56)^n
(101)^n 100            0.24 × (0.56)^n
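To make the entries of Table 1 concrete, the following minimal sketch enumerates leftmost derivations of the grammar of Example 2.2 up to a length bound and accumulates the string probabilities. The grammar, nonterminal names and probabilities are those of the example; the length bound and function names are arbitrary choices for illustration.

```python
# Illustration of Example 2.2: enumerate derivations of the stochastic
# finite-state grammar and accumulate the string probabilities p(x).
from collections import defaultdict

# Productions: lhs -> list of (rhs, probability)
productions = {
    "S": [("1A", 1.0)],
    "A": [("0B", 0.8), ("1", 0.2)],
    "B": [("0", 0.3), ("1S", 0.7)],
}

def expand(sentential, prob, max_len, table):
    """Recursively expand the leftmost nonterminal, accumulating p(x)."""
    for i, sym in enumerate(sentential):
        if sym in productions:                      # leftmost nonterminal found
            for rhs, p in productions[sym]:
                new = sentential[:i] + rhs + sentential[i + 1:]
                if len(new) <= max_len + 1:          # crude pruning bound
                    expand(new, prob * p, max_len, table)
            return
    if len(sentential) <= max_len:                   # terminal string produced
        table[sentential] += prob

table = defaultdict(float)
expand("S", 1.0, max_len=12, table=table)

for x, px in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{x:>12s}  p = {px:.6f}")
print("partial sum of p(x):", sum(table.values()))   # approaches 1 as max_len grows
```

Running the sketch reproduces the probabilities 0.2, 0.24, 0.2 × 0.56, 0.24 × 0.56, ... of Table 1, and the partial sum approaches 1 as the length bound grows.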
DEFINITION 2.3. If, for a stochastic grammar G_s,

    Σ_{x ∈ L(G_s)} p(x) = 1,    (4)

then G_s is said to be a consistent stochastic grammar. The condition for a stochastic context-free grammar to be consistent is briefly described below [6, 15, 18]. Whether or not a stochastic context-sensitive grammar is consistent under certain conditions is not yet known. The consistency condition for stochastic finite-state grammars is trivial, since every stochastic finite-state grammar is consistent. For a context-free grammar, all the productions are of the form

    A → α,   A ∈ V_N,   α ∈ V^+ = V^* − {λ}.
In this case a nonterminal at the left side of a production may directly generate zero or a finite number of nonterminals. The theory of multitype Galton-Watson branching processes [23] can be applied to study the language generation process of stochastic context-free grammars. The zero-th level of a generation process corresponds to the start symbol S. The first level will be taken as β_1, where β_1 is the string generated by the production S → β_1. The second level will correspond to the string β_2, which is obtained from β_1 by applying appropriate productions to every nonterminal in β_1. If β_1 does not contain any nonterminal, the process is terminated. Following this procedure, the jth level string β_j is defined to be the string obtained from the string β_{j−1} by applying appropriate productions to every nonterminal of β_{j−1}. Since all nonterminals are considered simultaneously in going from the (j−1)th level to the jth level, only the probabilities p_ir associated with each production A_i → α_r need to be considered. Define the equivalence class C_{A_i} = {all productions of P with left side A_i}. Thus,

    P = ∪_{i=1}^{k} C_{A_i}.

For each equivalence class C_{A_i},

    Σ_{C_{A_i}} p_ir = 1.
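This bookkeeping is easy to mechanize. The short sketch below groups a production list by its left-hand-side nonterminal (the equivalence classes C_{A_i}) and checks the normalization condition; the grammar of Example 2.2 is reused purely for illustration.

```python
# Sketch: partition productions into equivalence classes C_{A_i} (grouped by
# left-hand side) and verify the normalization condition sum_j p_ij = 1.
from collections import defaultdict
from math import isclose

# (lhs, rhs, probability) -- the grammar of Example 2.2, used for illustration
P = [("S", "1A", 1.0),
     ("A", "0B", 0.8), ("A", "1", 0.2),
     ("B", "0", 0.3), ("B", "1S", 0.7)]

classes = defaultdict(list)
for lhs, rhs, p in P:
    classes[lhs].append((rhs, p))

for lhs, rules in classes.items():
    total = sum(p for _, p in rules)
    status = "ok" if isclose(total, 1.0) else "NOT normalized"
    print(f"C_{lhs}: {len(rules)} productions, total probability {total:.2f} ({status})")
```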
DEFINITION 2.4. For each C_{A_i}, i = 1, ..., k, define the k-argument generating function f_i(s_1, s_2, ..., s_k) as

    f_i(s_1, ..., s_k) = Σ_{C_{A_i}} p_ir s_1^{μ_i1(α_r)} s_2^{μ_i2(α_r)} ··· s_k^{μ_ik(α_r)}    (5)

where μ_il(α_r) denotes the number of times the nonterminal A_l appears in the string α_r of the production A_i → α_r, and S = A_1.
DEFINITION 2.5. The jth-level generating function F_j(s_1, ..., s_k) is defined recursively as

    F_0(s_1, ..., s_k) = s_1,
    F_1(s_1, ..., s_k) = f_1(s_1, ..., s_k),
    F_j(s_1, ..., s_k) = F_{j−1}(f_1(s_1, ..., s_k), ..., f_k(s_1, ..., s_k)).    (6)

EXAMPLE 2.6.
G_s = (V_N, V_T, P_s, S) where V_N = {A_1, A_2}, V_T = {a, b}, S = A_1 and

    P_s:  A_1 →^{p_11} aA_1A_2,   A_1 →^{p_12} b,
          A_2 →^{p_21} aA_2A_2,   A_2 →^{p_22} aa.

The generating functions for C_{A_1} and C_{A_2} are, respectively,

    f_1(s_1, s_2) = p_11 s_1 s_2 + p_12,   f_2(s_1, s_2) = p_21 s_2^2 + p_22.

The jth level generating functions for j = 0, 1, 2 are

    F_0(s_1, s_2) = s_1,
    F_1(s_1, s_2) = f_1(s_1, s_2) = p_11 s_1 s_2 + p_12,
    F_2(s_1, s_2) = F_1(f_1(s_1, s_2), f_2(s_1, s_2)) = p_11 f_1(s_1, s_2) f_2(s_1, s_2) + p_12
                  = p_11^2 p_21 s_1 s_2^3 + p_11^2 p_22 s_1 s_2 + p_11 p_12 p_21 s_2^2 + p_11 p_12 p_22 + p_12.
After examining the previous example we can express F_j(s_1, ..., s_k) as

    F_j(s_1, ..., s_k) = G_j(s_1, ..., s_k) + K_j

where G_j(s_1, ..., s_k) denotes the polynomial in s_1, ..., s_k without the constant term. The constant term K_j corresponds to the probability of all the strings x ∈ L(G_s) that can be derived in j or fewer levels. This leads to the following theorem.
THEOREM 2.7. A stochastic context-free grammar G_s is consistent if and only if

    lim_{j→∞} K_j = 1.    (7)
Note that if the above limit is not equal to 1, there is a finite probability that a generation process may never terminate. Thus, the probability measure defined over L will be less than 1 and, consequently, G_s will not be consistent. On the other hand, if the limit is equal to 1, then there exists no such infinite (nonterminating) generation process, since the limit represents the probability of all the strings which are generated by applications of a finite number of productions. Consequently, G_s is consistent. The problem of testing the consistency of a given stochastic context-free grammar can be solved by using the following testing procedure developed for branching processes.

DEFINITION 2.8.
The expected number of occurrences of the nonterminal A_j in the production set C_{A_i} is

    e_ij = ∂f_i(s_1, ..., s_k)/∂s_j evaluated at s_1 = ··· = s_k = 1.    (8)
DEFINITION 2.9. The first moment matrix E of the generation process corresponding to a stochastic context-free grammar G_s is defined as

    E = [e_ij],   1 ≤ i, j ≤ k,    (9)
where k is the number of nonterminals in G_s.

THEOREM 2.10. For a given stochastic context-free grammar G_s, order the eigenvalues, or characteristic roots, ρ_1, ..., ρ_k of the first moment matrix according to the descending order of their magnitudes; that is,

    |ρ_i| ≥ |ρ_j|   if i < j.    (10)

Then G_s is consistent if ρ_1 < 1 and it is not consistent if ρ_1 > 1.
For the stochastic context-free grammar G_s given in Example 2.6,

    e_11 = ∂f_1(s_1, s_2)/∂s_1 |_{s_1 = s_2 = 1} = p_11,    e_12 = ∂f_1(s_1, s_2)/∂s_2 |_{s_1 = s_2 = 1} = p_11,
    e_21 = ∂f_2(s_1, s_2)/∂s_1 |_{s_1 = s_2 = 1} = 0,       e_22 = ∂f_2(s_1, s_2)/∂s_2 |_{s_1 = s_2 = 1} = 2p_21.

Thus

    E = | p_11   p_11  |
        | 0      2p_21 |.
The characteristic equation associated with E is φ(x) = (x − p_11)(x − 2p_21). Hence, G_s will be consistent as long as p_11 < 1 and p_21 < 1/2.
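The test of Theorem 2.10 is easy to carry out numerically. The sketch below builds the first moment matrix of Example 2.6 for particular numerical values of p_11 and p_21 (the values are chosen arbitrarily for illustration) and inspects the largest eigenvalue magnitude; numpy is assumed to be available.

```python
# Consistency test of Theorem 2.10 for the grammar of Example 2.6:
# f1 = p11*s1*s2 + p12,  f2 = p21*s2**2 + p22.
import numpy as np

def first_moment_matrix(p11, p21):
    # e_ij = d f_i / d s_j evaluated at s1 = s2 = 1
    return np.array([[p11,       p11],    # df1/ds1, df1/ds2
                     [0.0, 2.0 * p21]])   # df2/ds1, df2/ds2

for p11, p21 in [(0.3, 0.4), (0.3, 0.6)]:     # illustrative values
    E = first_moment_matrix(p11, p21)
    rho = max(abs(np.linalg.eigvals(E)))
    verdict = "consistent" if rho < 1 else "not consistent"
    print(f"p11={p11}, p21={p21}: spectral radius {rho:.2f} -> {verdict}")
```

For p_21 = 0.4 the spectral radius is 0.8 and the grammar is consistent; for p_21 = 0.6 it is 1.2 and the grammar is not, in agreement with the condition p_21 < 1/2.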
3. Application to communication and coding
An information source of a communication system is defined as a generative device which produces, at a uniform rate, sequences of symbols from an alphabet. In channel coding these symbols are usually assumed to be randomly generated and therefore they do not represent any specific meanings other than their own identifications. We shall call sources of this kind lexicographical, to emphasize the total lack of regularity or 'structure' in the formation of source messages. The realm of algebraic coding has hitherto been concerned almost exclusively with these sources, and we shall also call the encoder, decoder, as well as the coding technique lexicographical, in order to emphasize the contrast between these and the new type of syntactic decoder introduced in this section. Other types of information sources, of course, do exist. In particular, Markov information source models are widely accepted in source coding. Several attempts have been made to treat the problem of human communication from the narrow, strictly probabilistic point of view, based on the measure of information introduced by Shannon in 1948. Recently it was pointed out that information theory fell quite short of an acceptable description of the process of transference of ideas that occurs when intelligent beings communicate [39]. It has been demonstrated that syntactic information about the source is useful for designing a 'syntactic decoder' which supplements an ordinary decoder [19]. On the other hand, Hutchins [25] showed that we can do source coding for some formal languages by removing a certain amount of 'redundancy' in the languages. Smith [38] suggested that the existence of 'built-in redundancy' in formal languages can be utilized in achieving error detection without the obligatory addition of redundancy. Problems of a similar nature were recently studied by Hellman [24], who developed a method of combining source coding (i.e., data compression) and channel coding. A brief discussion of the maximum-likelihood syntactic decoder is given in this section. Let the linguistic source be characterized by a context-free grammar G = (V_N, V_T, P, S) which generates L(G) ⊆ V_T^*. The source generates a sequence of sentences of L(G). A special kind of synchronization mechanism is provided so that there is no error in locating the beginning and the end of every sentence issued by the source. With this assumption we may proceed to consider syntactic decoding for individual sentences. Referring to Fig. 1, the output of the source is encoded into binary form which is transmitted over a binary symmetric memoryless channel N with bit error rate p. Output symbols V_T of the source are encoded by C_n. Suppose V_T has l symbols, i.e., V_T = {o_1, o_2, ..., o_l}; then n ≥ ⌈log_2 l⌉. Let u_i = C_n(o_i) be the codeword for the ith symbol o_i of the alphabet, which can be expressed as a binary sequence
Fig. 1. (Block diagram of the communication system: source, encoder C_n, binary symmetric channel N, lexicographical decoder D_0, syntactic decoder D, and user.)
u_i = (u_i1, u_i2, ..., u_in). Then the conditional probability of a binary sequence v = (v_1, v_2, ..., v_n) at the output of the channel, given that the input sequence is u = (u_1, u_2, ..., u_n), is

    P(v|u) = P(s_O = v_1 | s_I = u_1) ··· P(s_O = v_n | s_I = u_n)
           = (1−p)^{Σ_{k=1}^{n} δ(u_k, v_k)} · p^{n − Σ_{k=1}^{n} δ(u_k, v_k)}    (11)

where

    P(s_O = v_i | s_I = u_i) = p if v_i ≠ u_i, and 1 − p otherwise, for i = 1, ..., n,

and δ(·,·) is the Kronecker delta function. At the receiver a decoder assigns a symbol o to every binary sequence v ∈ {0, 1}^n. More precisely, let B be a block code defined by

    B = {(u_1, U_1), (u_2, U_2), ..., (u_l, U_l)}

where the u_i's are codewords for the alphabet V_T, and the U_i's are disjoint subsets of {0, 1}^n such that U_1 ∪ ··· ∪ U_l = {0, 1}^n. A lexicographical decoder D_0 is a mapping from {0, 1}^n into V_T, defined by

    D_0(v) = o_j   if and only if   v ∈ U_j.    (12)
Designs of the code as well as its encoders and decoders are studied extensively in information theory (e.g., [26]). We shall present in the following a decoding scheme which in a certain sense is superior to the lexicographical decoding scheme. The encoder, binary symmetric channel N and the decoder D_0 can be combined and represented by a mutational channel M characterized by the following transition probabilities:

    Q_B(β = o_j | α = o_i) = P_B(v ∈ U_j | u_i) = Σ_{v ∈ U_j} P_B(v = (v_1, v_2, ..., v_n) | u_i)
                           = Σ_{v ∈ U_j} (1−p)^{Σ_{k=1}^{n} δ(u_ik, v_k)} · p^{n − Σ_{k=1}^{n} δ(u_ik, v_k)}    (13)

for every i, j = 1, 2, ..., l. Here α and β are respectively the input and output of the mutational channel M, and Q_B(β = o_j | α = o_i) is the probability of observing a symbol o_j at the output of M when symbol o_i is the true symbol at the input of M. Note that Σ_j Q_B(β = o_j | α = o_i) = 1 for all i. As illustrated in Fig. 1, a syntactic decoder D is introduced at the output of the mutational channel, i.e., at the output of the lexicographical decoder D_0, before the messages are conveyed to the user. Suppose the source grammar G generates a sentence w = c_1c_2···c_N ∈ L(G). The symbols c_1, ..., c_N are fed into the encoder C_n
sequentially, such that a sequence of binary codewords C_n(c_1), ..., C_n(c_N) comes out of the encoder and is modulated and transmitted over the noisy channel N. Let u_i = C_n(c_i), i = 1, ..., N, and let v_i be the binary sequence of length n coming out of the channel N corresponding to the transmitted codeword u_i. These binary sequences v_1, ..., v_N are decoded according to (12) so that the lexicographical decoder outputs a string of symbols in V_T, z = a_1a_2···a_N, where a_i = D_0(v_i), i = 1, 2, ..., N. We can symbolically describe this coding scheme by z = a_1a_2···a_N = M(c_1)M(c_2)···M(c_N). The block code B is designed in such a way that B is optimal in some sense. For example, B results in the minimal average probability of error for each individual symbol, among all possible block codes of codeword length n. That is, for every symbol o_i transmitted, the average probability of error per symbol,

    P(e|B) = (1/l) Σ_{i=1}^{l} Q_B(β ≠ o_i | α = o_i),    (14)
is minimized. However, it is immediately obvious that the above optimal block code B is based on the performance of the code for individual symbols. For linguistic messages we may take advantage of our knowledge about the syntactic structure among the transmitted symbols to improve the efficiency of the communication system. Since the utility of the communicated linguistic messages depends mostly on the correctness of individual sentences, the average probability of obtaining a correct sentence for the source grammar will be considered to be a significant factor in evaluating the efficiency of the communication system. Let z = a_1a_2···a_N be the decoded sequence of symbols from the lexicographical decoder D_0. The syntactic decoder D is a mapping from V_T^* into L(G), which is optimal according to the maximum-likelihood criterion:

    P(z = a_1a_2···a_N | D(z) = c*_1c*_2···c*_N) ≥ P(z = a_1a_2···a_N | w̃ = c̃_1c̃_2···c̃_N)    (15)

for all w̃ ∈ L(G) ∩ V_T^N. Here, D(z) = c*_1c*_2···c*_N is the output sequence of the syntactic decoder with input sequence z. Since the channel N is memoryless and since the symbols of a transmitted sentence are encoded and decoded sequentially, (15) can be written as

    Π_{i=1}^{N} Q(β = a_i | α = c*_i) ≥ Π_{i=1}^{N} Q(β = a_i | α = c̃_i).    (16)
Define d(o_i, o_j) = −log Q(β = o_i | α = o_j). Then (16) is equivalent to

    Σ_{i=1}^{N} d(a_i, c*_i) ≤ Σ_{i=1}^{N} d(a_i, c̃_i).    (17)
Extend the definition of d(·,·) to the case of two strings z = a_1a_2···a_N and w = c_1c_2···c_N which are both of length N, N = 2, 3, .... Then

    d(z = a_1a_2···a_N; w = c_1c_2···c_N) = Σ_{i=1}^{N} d(a_i, c_i).    (18)

Similarly, we can also define

    d(z; L(G)) = inf_{w ∈ L(G)} d(z; w).    (19)

Hence, using (17) and (18), we can rewrite (15) as

    D(z) = { w* | w* = c*_1···c*_N ∈ L(G) such that d(z; w*) ≤ d(z; w̃) for all sentences w̃ ∈ L(G) }.    (20)

It follows from (19) that w* ∈ D(z) implies that

    d(z; w*) = d(z; L(G)).    (21)

We say that such a w* of the source language L(G) is the 'nearest' to the string z. An algorithm based on the Cocke-Younger-Kasami parsing scheme has been proposed for the syntactic decoder [19]. The syntactic decoder is basically an error-correcting parser which attempts to produce a collection of strings D(z) ⊆ L(G) for any string z not in the language L(G), so that every string w* in D(z) is 'nearest' to z as compared with the other strings in L(G). An example of applying such a syntactic decoder to computer networks is given in [20].
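The decoding rule (20) can be illustrated with a small brute-force sketch: the distances d(a_i, c_i) = −log Q(β = a_i | α = c_i) are summed symbol by symbol and the nearest sentence of a short finite source language is returned. The alphabet, channel matrix and language below are invented for illustration, and the exhaustive search stands in for the much more efficient CYK-based decoder of [19], which is not reproduced here.

```python
# Maximum-likelihood syntactic decoding by exhaustive search (an illustration
# of eqs. (17)-(20), not the CYK-based algorithm of [19]).
import math

alphabet = ["a", "b", "c"]                      # hypothetical V_T
# Q[(o_out, o_in)] = probability that the mutational channel outputs o_out
# when o_in was transmitted (rows of a made-up confusion matrix).
Q = {("a", "a"): 0.8, ("b", "a"): 0.1, ("c", "a"): 0.1,
     ("a", "b"): 0.1, ("b", "b"): 0.8, ("c", "b"): 0.1,
     ("a", "c"): 0.1, ("b", "c"): 0.1, ("c", "c"): 0.8}

# A small finite source language standing in for L(G); in practice L(G) is
# defined by the source grammar G.
language = ["abc", "aabc", "abcc", "bca"]

def d(z, w):
    """Distance (18): sum of -log Q(received | transmitted), equal lengths."""
    return sum(-math.log(Q[(zi, wi)]) for zi, wi in zip(z, w))

def syntactic_decode(z):
    candidates = [w for w in language if len(w) == len(z)]
    return min(candidates, key=lambda w: d(z, w)) if candidates else None

z = "abb"                                        # output of the lexicographical decoder
print("received:", z, "-> decoded sentence:", syntactic_decode(z))
```

In the example the corrupted string "abb" is mapped back to "abc", the sentence of the toy language nearest to it in the sense of (19).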
4. Application to syntactic pattern recognition

In syntactic or linguistic pattern recognition [15, 18], patterns are described by sentences of a language characterized by a grammar. Typically, each class of patterns is characterized by a grammar which generates sentences describing each of the patterns in the class. The recognition of a pattern then becomes a syntax analysis or parsing of the sentence describing the pattern with respect to the grammar characterizing each pattern class. In some practical applications, however, noise and distortion occur in the measurement or processing of physical patterns. Languages used to describe the noisy and distorted patterns under consideration are often ambiguous in the sense that a sentence (a pattern) can be generated by more than one grammar, each of which characterizes the patterns (sentences) generated from a particular pattern class. In terms of statistical pattern recognition [10], this is the case of overlapping pattern classes: patterns belonging to different classes may have the same descriptions or measurement values (but with different probabilities of occurrence).
Fig. 2. Three sample chromosomes.
(a) Median string representation: cbbbabbbbdbbbbabbbcbbbabbbbdbbbbabbb
(b) Submedian string representation: cbabbbdbbbbbabbbcbbbabbbbbdbbbab
(c) Acrocentric string representation: cadbbbbbbabbbbbcbbbbbabbbbbbda
In these situations, in order to model the process more realistically, the approach of using stochastic languages has been suggested [11, 15, 18, 21]. For example, three types of chromosome patterns and their corresponding string representations are shown in Fig. 2. It is easy to notice that with noise and distortion one type of chromosome (e.g., submedian) could become another type (e.g., median or acrocentric). In other words, the grammar characterizing submedian chromosomes should sometimes (with small probabilities) also generate median and acrocentric chromosomes. Similar conclusions can also be drawn for the grammars characterizing median and acrocentric chromosomes. One approach to modelling the situation is to use one grammar which generates all three types of chromosome patterns but with three different sets of production probabilities. The following is a grammar generating median, submedian, and acrocentric chromosome patterns. Three different production probability assignments then result in three different stochastic grammars which characterize median, submedian, and acrocentric chromosome patterns, respectively. Presumably, the stochastic grammar for median chromosome patterns will have a higher probability of generating median strings than submedian and acrocentric strings; the stochastic grammar for submedian chromosome patterns will have a higher probability of generating submedian strings, and so on. In practice, the production probabilities will have to be inferred from string probabilities or assigned subjectively by the designer [15, 16].
Chromosome grammar: G_s = (V_N, V_T, P_s, S), where

    V_N = {S, A, B, D, H, J, E, F, W, G, R, L},   V_T = {a, b, c, d}

and P_s:

    S →^{1} AA,        R →^{p_3} RE,
    A →^{1} cb,        R →^{p_4} HDJ,
    B →^{p_1} FBE,     D →^{p_1} FDE,
    B →^{p_2} HDJ,     D →^{p_2} d,
    B →^{p_3} RE,      D →^{p_3} FG,
    B →^{p_3} FL,      D →^{p_3} WE,
    W →^{p_3} WE,      L →^{p_3} FL,
    W →^{p_4} d,       L →^{p_4} HDJ,
    F →^{1} b,         G →^{p_3} FG,
    E →^{1} b,         G →^{p_4} d,
    H →^{1} a,
    J →^{1} a,

where p_1 + p_2 + 2p_3 = 1 and p_3 + p_4 = 1.
By associating probabilities with the strings, we can impose a probabilistic structure on the language to describe noisy patterns. The probability distribution characterizing the patterns in a class can be interpreted as the probability distribution associated with the strings in a language.
Fig. 3. Maximum-likelihood syntactic pattern recognition system.
Thus, statistical decision rules can be applied to the classification of a pattern under ambiguous situations (for example, using the maximum-likelihood or Bayes decision rule). A block diagram of such a recognition system using the maximum-likelihood decision rule is shown in Fig. 3. Furthermore, because of the availability of information about production probabilities, the speed of syntactic analysis can be improved through the use of this information [29, 37].
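A minimal sketch of the maximum-likelihood recognizer of Fig. 3, assuming each class is modelled by a stochastic finite-state (right-linear) grammar: p(x|G_i) is computed by summing over all derivations of x, and the class with the largest likelihood is selected. The two grammars and their probabilities are invented purely for illustration.

```python
# Sketch of the maximum-likelihood recognizer of Fig. 3 for classes modelled
# by stochastic finite-state grammars.  Productions have the form A -> tB or
# A -> t with an attached probability.
def string_probability(grammar, x, nonterminal="S"):
    """Sum of probabilities of all derivations of x from the nonterminal."""
    if x == "":
        return 0.0
    total = 0.0
    for rhs, p in grammar.get(nonterminal, []):
        if len(rhs) == 1:                       # A -> t
            if x == rhs:
                total += p
        else:                                   # A -> t B
            t, B = rhs[0], rhs[1]
            if x[0] == t:
                total += p * string_probability(grammar, x[1:], B)
    return total

G1 = {"S": [("aA", 0.7), ("b", 0.3)], "A": [("aA", 0.4), ("b", 0.6)]}
G2 = {"S": [("bA", 0.6), ("a", 0.4)], "A": [("bA", 0.5), ("a", 0.5)]}
classes = {"class 1": G1, "class 2": G2}

x = "aab"
likelihoods = {c: string_probability(G, x) for c, G in classes.items()}
decision = max(likelihoods, key=likelihoods.get)
print(likelihoods, "-> decide", decision)
```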
5. Application to error-correcting parsing

The syntactic decoder discussed in Section 3 is actually implemented as an error-correcting parser for substitution errors. In language parsing, in addition to the substitution error (the substitution of a symbol by another symbol), two other types of syntactic error also occur: the insertion error (the insertion of an extra symbol) and the deletion error (the deletion of a symbol). Application examples of this kind include error-correcting compilers [28] and the recognition of continuous speech [3, 32]. In this section, error-correcting parsing for the three types of error is briefly described. When the error statistics are known or can be estimated, the error transformations become probabilistic. For instance, we may know that the probability of substituting terminal a by terminal b (the probability associated with T_S) is q_S(b|a), the probability of inserting terminal b in front of a is q_I(b|a), and the probability of deleting terminal a (i.e., replacing it by the empty string λ, the sentence of length zero) is q_D(a). With the information of these error transformation probabilities, we can apply the maximum-likelihood decision criterion to obtain an error-corrected sentence in L. If the language L is also stochastic, the Bayes decision rule can be used [18, 33].

DEFINITION 5.1. The deformation probabilities associated with the substitution, insertion and deletion transformations T_S, T_I, T_D are defined as follows:

    (1) T_S, q_S(b|a):   xay ⊢ xby,

where q_S(b|a) is the probability of substituting terminal a by b;

    (2) T_I, q_I(b|a):   xay ⊢ xbay,

where q_I(b|a) is the probability of inserting terminal b in front of a;

    (3) T_D, q_D(a):   xay ⊢ xy,
where q_D(a) is the probability of deleting terminal a from a string; and

    (4) T_I', q_I'(a):   x ⊢ xa,

where q_I'(a) is the probability of inserting terminal a at the end of a string. It can be shown that the deformation probabilities for both the single-error and the multiple-error cases are consistent if

    Σ_{b ∈ Σ, b ≠ a} q_S(b|a) + q_D(a) + Σ_{b ∈ Σ} q_I(b|a) + q(a) = 1
for all a ∈ Σ, where q(a) is the probability that no error occurs on terminal a [18, 33]. Let L(G_s) be a given stochastic context-free language and let y be an erroneous string, y ∉ L(G_s). The maximum-likelihood error-correcting parsing algorithm searches for a string x, x ∈ L(G_s), such that

    q(y|x) p(x) = max { q(y|z) p(z) | z ∈ L(G_s) }

where p(z) is the probability of generating z by G_s. By adopting the approach of constructing covering grammars used in [1], we present an algorithm for expanding the original stochastic context-free grammar to accommodate the stochastic deformation model as follows.

ALGORITHM 5.2 (Stochastic error-induced grammar)
Input. A stochastic context-free grammar G_s = (N, Σ, P_s, S).
Output. G'_s = (N', Σ', P'_s, S'), the stochastic grammar induced by error transformations for G_s.
Method.
Step 1. N' = N ∪ {S'} ∪ {E_a | a ∈ Σ}.
Step 2. Σ' ⊇ Σ.
Step 3. If A →^{p} α_0 b_1 α_1 b_2 α_2 ··· b_m α_m, m ≥ 0, is a production in P_s such that each α_i is in N* and each b_i is in Σ, then add the production A →^{p} α_0 E_{b_1} α_1 E_{b_2} ··· E_{b_m} α_m to P'_s, where each E_{b_i} is a new nonterminal, E_{b_i} ∈ N'.
Step 4. Add to P'_s the productions
  (a) S' →^{1 − q'_I} S, where q'_I = Σ_{a ∈ Σ'} q'_I(a);
  (b) S' →^{q'_I(a)} S'a for all a ∈ Σ'.
Step 5. For all a ∈ Σ, add to P'_s the productions
  (a) E_a →^{q(a)} a,
  (b) E_a →^{q_S(b|a)} b for all b ∈ Σ', b ≠ a,
  (c) E_a →^{q_D(a)} λ,
  (d) E_a →^{q_I(b|a)} b E_a for all b ∈ Σ'.
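A sketch of the expansion performed by Algorithm 5.2, under the assumption that the deformation probabilities q, q_S, q_D, q_I and q'_I are supplied as tables. The toy grammar and the (uniform) error statistics below are invented; the function and variable names are ours, not from [33].

```python
# Sketch of Algorithm 5.2: expand a stochastic CFG into a stochastic
# error-induced grammar covering substitution, deletion and insertion errors.
def error_induced_grammar(productions, sigma, q, q_sub, q_del, q_ins, q_ins_end):
    """productions: list of (lhs, rhs-as-list-of-symbols, probability)."""
    new_prods = []
    # Step 3: replace every terminal b in a right-hand side by the nonterminal E_b.
    for lhs, rhs, p in productions:
        new_rhs = [("E_" + s) if s in sigma else s for s in rhs]
        new_prods.append((lhs, new_rhs, p))
    # Step 4: sentence-final insertions via the new start symbol S'.
    qI = sum(q_ins_end[a] for a in sigma)
    new_prods.append(("S'", ["S"], 1.0 - qI))                       # (a)
    for a in sigma:
        new_prods.append(("S'", ["S'", a], q_ins_end[a]))           # (b)
    # Step 5: error productions for each terminal a.
    for a in sigma:
        Ea = "E_" + a
        new_prods.append((Ea, [a], q[a]))                           # no error
        for b in sigma:
            if b != a:
                new_prods.append((Ea, [b], q_sub[(b, a)]))          # substitution
        new_prods.append((Ea, [], q_del[a]))                        # deletion (lambda)
        for b in sigma:
            new_prods.append((Ea, [b, Ea], q_ins[(b, a)]))          # insertion
    return new_prods

# Toy grammar S -> a S b | a b over terminals {a, b}; the error statistics
# are invented so that each consistency condition sums to one.
sigma = ["a", "b"]
P = [("S", ["a", "S", "b"], 0.6), ("S", ["a", "b"], 0.4)]
q         = {"a": 0.90, "b": 0.90}
q_sub     = {("b", "a"): 0.04, ("a", "b"): 0.04}
q_del     = {"a": 0.02, "b": 0.02}
q_ins     = {("a", "a"): 0.02, ("b", "a"): 0.02, ("a", "b"): 0.02, ("b", "b"): 0.02}
q_ins_end = {"a": 0.05, "b": 0.05}

for lhs, rhs, p in error_induced_grammar(P, sigma, q, q_sub, q_del, q_ins, q_ins_end):
    print(f"{lhs} -> {' '.join(rhs) if rhs else 'lambda'}   ({p})")
```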
Suppose that y is an error-deformed string of x, x = a_1a_2···a_n. By using productions added to P'_s by Step 3, we have S ⇒^{p_i} X, where X = E_{a_1}E_{a_2}···E_{a_n}, if and only if S ⇒^{p_i} x, where p_i is the probability of the ith derivation of x in G_s. Applying Step 4(a) first and then repeatedly applying Step 4(b), we can further derive S' ⇒^{p'_i} X α_{n+1} where p'_i = p_i q'_I(α_{n+1}). The productions in Step 5 generate E_{a_i} ⇒ α_i for all 1 ≤ i ≤ n, where α_1, α_2, ..., α_{n+1} is a partition of y. Steps 5(a)-(d) correspond to the non-error transformation, the substitution transformation, the deletion transformation and the insertion transformation (which allows multiple insertions), respectively. Thus the stochastic language generated by G'_s assigns to y the probability

    q(y|G_s) = Σ_{x ∈ L(G_s)} Σ_{r_i} q(y, r_i | x) p(x)

where r_i is one form of deforming x to y. The consistency of L(G'_s) can be proved [33]. It is proposed to use a modified Earley parser on G'_s to implement the search for the most likely error correction. The algorithm is essentially Earley's algorithm [2] with a provision added to keep accumulating the probabilities associated with each step of the derivations.

ALGORITHM 5.3 (Maximum-likelihood error-correcting algorithm)
Input. A stochastic error-induced grammar G'_s = (V', Σ', P'_s, S') of G_s, and a string y = b_1b_2···b_m in Σ'*.
Output. q(y|G_s) and a string x such that q(y|x)p(x) = q(y|G_s).
Method.
Step 1. Set j = 0 and add [E → ·S', 0, 1] to I_j.
Step 2. (a) If [A → α·Bβ, i, p] is in I_j, and B →^{q} γ is a production in P'_s, add the item [B → ·γ, j, q] to I_j.
 (b) If [A → α·, i, p] is in I_j and [B → β·Aγ, k, q] is in I_i, and no item of the form [B → βA·γ, k, r] can be found in I_j, add a new item [B → βA·γ, k, pq] to I_j. If [B → βA·γ, k, r] is already in I_j, then replace r by pq if pq > r.
Step 3. If j = m go to Step 5. Otherwise, set j = j + 1.
Step 4. For each item in I_{j−1} of the form [A → α·b_jβ, i, p] add the item [A → αb_j·β, i, p] to I_j; go to Step 2.
Step 5. If the item [E → S'·, 0, p] is in I_m, then q(y|G_s) = p; stop.

Algorithm 5.3 together with Algorithm 5.2 is called the maximum-likelihood error-correcting parser. For a multiclass classification problem, assume that there are K classes of patterns, denoted C_1, C_2, ..., C_K, each of which is described by a set of strings. The grammar G_i inferred to characterize the strings in C_i satisfies C_i ⊆ L(G_i) for all i, 1 ≤ i ≤ K. We call the strings in L(G_i) − C_i unwanted or illegitimate strings caused by grammar error. By assigning very low probabilities to unwanted strings, a properly inferred stochastic grammar can discriminate between unwanted and wanted strings from their frequencies of occurrence [15].
Using Bayes rule, for a given string y, y ∈ L(G_i) for all i, 1 ≤ i ≤ K, the a posteriori probability that y is in class j can be computed as

    P(C_j|y) = q(y|C_j) P(C_j) / Σ_{i=1}^{K} q(y|C_i) P(C_i)

where P(C_i) is the a priori probability of class C_i. The Bayes decision rule that classifies y as class C_j is

    P(C_j|y) = max_i { P(C_i|y) | 1 ≤ i ≤ K }.
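The decision rule can be illustrated numerically without the Earley-style parser. In the sketch below a small finite sample stands in for each L(G_i), the deformation probability q(y|x) is computed by a probabilistic edit-distance recursion covering substitutions and deletions only (a simplification we introduce here for brevity; insertions are omitted and this is not Algorithm 5.3), and the posterior is formed with Bayes rule. All strings, probabilities and priors are hypothetical.

```python
# Sketch: Bayes classification of a noisy string from deformation probabilities.
# Simplified deformation model: each transmitted symbol is independently kept,
# substituted, or deleted; insertions are omitted for brevity.
q_keep = 0.85
q_sub = 0.10        # probability of substituting a symbol (binary alphabet assumed)
q_del = 0.05

def deformation_probability(y, x):
    """q(y|x): probability that transmitted x is observed as y."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(0, m + 1):
            D[i][j] = D[i - 1][j] * q_del                     # x[i-1] deleted
            if j > 0:
                emit = q_keep if y[j - 1] == x[i - 1] else q_sub
                D[i][j] += D[i - 1][j - 1] * emit             # kept or substituted
    return D[n][m]

# Hypothetical classes: a few high-probability strings sampled from each L(G_i).
classes = {
    "median":    {"abab": 0.6, "aabb": 0.4},
    "submedian": {"abba": 0.7, "baba": 0.3},
}
priors = {"median": 0.5, "submedian": 0.5}

y = "abba"
likelihood = {c: sum(deformation_probability(y, x) * px for x, px in strings.items())
              for c, strings in classes.items()}
evidence = sum(likelihood[c] * priors[c] for c in classes)
for c in classes:
    print(c, "posterior =", round(likelihood[c] * priors[c] / evidence, 4))
```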
6. Stochastic tree grammars and languages

In this section we present some major results on stochastic tree languages [4].
DEFINITION 6.1. A stochastic tree grammar G_s is a four-tuple G_s = (V, r', P, S) over a ranked alphabet ⟨V_T, r⟩, where
- ⟨V, r'⟩ is a finite ranked alphabet such that V_T ⊆ V and r'|V_T = r; the elements of V_T and of V − V_T = V_N are called terminal and nonterminal symbols, respectively;
- P is a finite set of stochastic production rules of the form Φ →^{p} Ψ, where Φ and Ψ are trees over ⟨V, r'⟩ and 0 < p ≤ 1;
- S ⊆ T_V is a finite set of start symbols, where T_V is the set of all trees over V.

DEFINITION 6.2. α →^{p}_a β in G_s if and only if there is a production Φ →^{p} Ψ in P such that α/a = Φ and β is obtained from α by replacing the occurrence of Φ at address a by Ψ; i.e., Φ is a subtree of α at 'a'. We write α →^{p} β in G_s if and only if there exists a ∈ D_α, the domain of α, such that α →^{p}_a β.
If there exists a sequence of trees t 0, tl,...,t m such that Pi
a = t 0,
f l = t m,
ti- 1 ---~ti,
i=l,...,m,
then we say that a generates fl with probability p = IIi=_Tpi and denote this derivation by a ~-- off or a ~ Pfl. The sequence of trees t 0 , . . . , t m is called a derivation of 13 from a. The probability associated with this derivation is equal to the p r o d u c t of the probabilities associated with the sequence of stochastic productions used in the derivation.
DEFINITION 6.4. The language generated by a stochastic tree grammar G_s is

    L(G_s) = { (t, p(t)) | t ∈ T_{V_T}, S ⇒^{p_j} t, j = 1, ..., k, and p(t) = Σ_{j=1}^{k} p_j }

where T_{V_T} is the set of all trees over V_T, k is the number of all distinctly different derivations of t from S, and p_j is the probability associated with the jth distinct derivation of t from S.
DEFINITION 6.5. A stochastic tree grammar G_s = (V, r', P, S) over ⟨V_T, r⟩ is simple if and only if all rules of P are of one of the forms

    X_0 →^{p} x(X_1, ..., X_{r(x)}),   X_0 →^{q} X_1,   or   x(X_1, ..., X_{r(x)}) →^{r} X_0,

where x(X_1, ..., X_{r(x)}) denotes the tree with root x and sons X_1, ..., X_{r(x)}, X_0, X_1, ..., X_{r(x)} are nonterminal symbols, x ∈ V_T is a terminal symbol, and 0 < p, q, r ≤ 1. A rule of the first form can also be written as X_0 → xX_1···X_{r(x)}.
LEMMA 6.6. Given a stochastic tree grammar G_s = (V, r, P, S) over ⟨V_T, r⟩, one can effectively construct a simple stochastic tree grammar G'_s = (V', r', P', S') over V_T which is equivalent to G_s.

PROOF. To construct G'_s, examine each rule Φ_i →^{p} Ψ_i of P. Introduce new symbols U_a^i and V_b^i for each a ∈ D_{Φ_i} and b ∈ D_{Ψ_i}. Let P' consist of:
(1) rules that contract the tree Φ_i a level at a time, having the form x U_{a·1}^i ··· U_{a·n}^i → U_a^i, where Φ_i(a) = x ∈ V_{T_n} and V_{T_n} is the set of terminal symbols with rank n;
(2) the rule U_0^i →^{p} V_0^i;
(3) rules of the form V_a^i →^{1} x V_{a·1}^i ··· V_{a·n}^i that expand V_0^i to the tree Ψ_i.
The construction of G'_s is clearly effective. We must now show L(G'_s) = L(G_s). Note that Φ_i ⇒^{p} Ψ_i in G'_s for each rule Φ_i →^{p} Ψ_i in P. Suppose that α →^{p} β in G_s. Then for some rule Φ_i →^{p} Ψ_i in G_s, α/a = Φ_i and β is obtained by replacing the occurrence of Φ_i at a by Ψ_i with probability p.
By the above argument, α/a = Φ_i ⇒^{p} Ψ_i = β/a in G'_s, and hence α ⇒^{p} β in G'_s. Thus α →^{p} β in G_s implies α ⇒^{p} β in G'_s, and L(G_s) ⊆ L(G'_s). For the converse, suppose α ∈ L(G'_s), i.e., S ⇒^{p} α in G'_s and α ∈ T_{V_T}. A derivation of S ⇒^{p} α in G_s may be constructed as follows. Examine the derivation S ⇒^{p} α in G'_s. Each time a rule U_0^i →^{p} V_0^i is applied at b, apply the rule Φ_i →^{p} Ψ_i at b. The result will be a derivation of S ⇒^{p} α in G_s, since if the rule U_0^i →^{p} V_0^i can be applied at b, all contracting rules of P' (i.e., those involving U_a^i, a ∈ D_{Φ_i}) must have been applied previously at the corresponding addresses b·a, and all expanding rules of P' (i.e., those involving V_a^i, a ∈ D_{Ψ_i}) must be applied later at b·a, since all symbols U_a^i and V_a^i are elements of V' − V_T. Note that the application of a single rule Φ_i →^{p} Ψ_i in G_s simulates the application of all these rules of P'. An example should make this clear.
(1)
p
S~
/
+,
(2)
\
S
q
S~X,
p+q=l.
×
In this case P': (1')
SPUo l,
(4')
V,'~S,
(7')
Uo2 ~ V 0,
1
1
2
(2')
UolLV01,
(5')
V2'~X,
(8')
Vo2 ~ × .
1
(3')
V? L -J- ~,-~Iv.1 1 2,
(6')
squ?,
1
Note the productions (1'),(2'),...,(5') in P ' are the result of production (1) in P and productions (6'), (7'), (8') are due to production (2) in P. A simple deduction in Gs is as follows: S PUo ~ ~, V°~ ~, _~_V11V21 ~1
+ SV21 ,
1
1
st
The corresponding deduction in Gs is p
s--,+s×
q
--,+xx,
Pq
sI -
+XX.
Pq
+xx.
Note that if the tree Φ_i on the left-hand side of a production rule is a single symbol of the alphabet V, we will have no contracting production rules in our grammar.

DEFINITION 6.8. A stochastic tree grammar G_s = (V, r, P, S) over ⟨V_T, r⟩ is expansive if and only if each rule in P is of the form

    X_0 →^{p} x(X_1, ..., X_{r(x)})   or   X_0 →^{p} x

where x ∈ V_T and X_0, X_1, ..., X_{r(x)} are nonterminal symbols contained in V − V_T. The following is a stochastic expansive tree grammar.
EXAMPLE 6.9. G_s = (V, r, P, S) over ⟨V_T, r⟩ where V_N = V − V_T = {S, A, B, C}, V_T = {a, b, $}, r(a) = {2, 0}, r(b) = {1, 0}, r($) = 2, and P:

    (1) S →^{1.0} $(A, B),    (4) B →^{q} b(C),
    (2) A →^{p} a(A, B),      (5) B →^{1−q} b,
    (3) A →^{1−p} a,          (6) C →^{1.0} a.
DEFINITION 6.10. Define a mapping h: T_{V_T} → V_{T_0}^* as follows:
(i) h(t) = x if t = x ∈ V_{T_0} (obviously, p(t) = p(x));
(ii) h(x(t_1, ..., t_n)) = h(t_1)···h(t_n) if x ∈ V_{T_n}, n > 0 (obviously, p(x(t_1, ..., t_n)) = p(x) p(t_1)···p(t_n)).

The function h forms a string in V_{T_0}^* obtained from a tree t by writing the frontier of t. Note that the frontier is obtained by writing, in order, the images (labels) of all end points (leaves) of the tree t.
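If a tree over V_T is represented as a nested structure (label, subtree_1, ..., subtree_n), the frontier map h of Definition 6.10 is a short recursion. The representation and the two sample trees below (both generated by the grammar of Example 6.9) are our own illustrative choices.

```python
# Frontier map h of Definition 6.10, with trees represented as nested tuples
# (label, child_1, ..., child_n); a leaf is just its label.
def h(t):
    if isinstance(t, str):                          # t = x in V_T0 (a leaf)
        return t
    _, *children = t                                # internal node: drop the label,
    return "".join(h(c) for c in children)          # concatenate the children's frontiers

# Two trees generated by the grammar of Example 6.9:
t1 = ("$", "a", "b")                                # S => $(a, b)
t2 = ("$", ("a", "a", ("b", "a")), "b")             # S => $(a(a, b(a)), b)
print(h(t1), h(t2))                                 # prints: ab aab
```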
THEOREM 6.11. If L_T is a stochastic tree language, then h(L_T) is a stochastic context-free language with the same probability distribution on its strings as on the trees of L_T. Conversely, if L(G'_s) is a stochastic context-free language, then there is a stochastic tree language L_T such that L(G'_s) = h(L_T) and both languages have the same probability distribution.
PROOF. By Lemma 6.6, if L_T is a stochastic tree language, there is a simple stochastic tree grammar G_s = (V, r, P, S) such that L_T = L(G_s). Let

    P' = { X_0 →^{p} xX_1···X_n | X_0 →^{p} x(X_1, ..., X_n) ∈ P, x ∈ V_T, n > 0 }
         ∪ { X_0 →^{p} x | X_0 →^{p} x ∈ P, x ∈ V_{T_0} }.

Then if G'_s is the stochastic context-free grammar (V − V_T, V_T, P', S),

    L(G'_s) = h(L(G_s)) = h(L_T).

For the converse, suppose L(G'_s) is generated by the stochastic context-free grammar G'_s = (V' − V'_T, V'_T, P', S). It may be assumed that all rules of G'_s are of the form (Chomsky normal form)

    X_0 →^{p} X_1X_2   or   X_0 →^{p} x

where X_0, X_1, X_2 ∈ (V' − V'_T) and x ∈ V'_{T_0}. Let V_T = V'_T ∪ {+} and V = V' ∪ {+}, where + ∉ V'_T. Let r(x) = 0 for x ∈ V'_T and r(+) = 2. Let

    P = { X_0 →^{p} +(X_1, X_2) | X_0 →^{p} X_1X_2 ∈ P' } ∪ { X_0 →^{p} x | X_0 →^{p} x ∈ P' }.

Let G_s = (V, r, P, S); then if L_T = L(G_s), L(G'_s) = h(L(G_s)) = h(L_T). This completes the proof.
DEFINITION 6.12. By a consistent stochastic representation for a language L(G_s) generated by a stochastic tree grammar G_s, we mean that the following condition is satisfied:

    Σ_{t ∈ L(G_s)} p(t) = 1

where t is a tree generated by G_s, L(G_s) is the set of trees generated by G_s, and p(t) is the probability of the generation of tree t. The set of consistency conditions for a stochastic tree grammar G_s is the set of conditions which the probability assignments associated with the set of stochastic
tree productions in G_s must satisfy such that G_s is a consistent stochastic tree grammar. The consistency conditions for stochastic context-free grammars have been discussed in Section 2. Since nonterminals in an intermediate generating tree appear only at its frontier, they can be considered to be causing further branching. Thus, if only the frontier of an intermediate tree is considered at each level of branching then, by Theorem 6.11, the consistency conditions for stochastic tree grammars are exactly the same as those for stochastic context-free grammars, and the tree generating mechanism can be modelled by a generalized branching process [23]. Let P = Γ_{A_1} ∪ Γ_{A_2} ∪ ··· ∪ Γ_{A_K} be the partition of P into equivalence classes such that two productions are in the same class if and only if they have the same premise (i.e., the same left-hand side nonterminal). For each Γ_{A_j} define the conditional probability p(t|A_j) as the probability that the production rule A_j → t, where t is a tree, will be applied to the nonterminal symbol A_j, where Σ_{Γ_{A_j}} p(t|A_j) = 1.
Let μ_l(t) denote the number of times the variable A_l appears in the frontier of the tree t of the production A_j → t.

DEFINITION 6.13. For each Γ_{A_j}, j = 1, ..., K, define the K-argument generating function g_j(S_1, S_2, ..., S_K) as

    g_j(S_1, S_2, ..., S_K) = Σ_{Γ_{A_j}} p(t|A_j) S_1^{μ_1(t)} S_2^{μ_2(t)} ··· S_K^{μ_K(t)}.

EXAMPLE 6.14. For the stochastic tree grammar G_s in Example 6.9,
    g_1(S_1, S_2, S_3, S_4) = p($(A, B)|S) S_2S_3 = S_2S_3,
    g_2(S_1, S_2, S_3, S_4) = p(a(A, B)|A) S_2S_3 + p(a|A) = pS_2S_3 + (1−p),
    g_3(S_1, S_2, S_3, S_4) = p(b(C)|B) S_4 + p(b|B) = qS_4 + (1−q),
    g_4(S_1, S_2, S_3, S_4) = p(a|C) = 1.0.

These generating functions can be used to define a generating function that describes all ith level trees. Note that for statistical properties, two ith level trees are equivalent if they contain the same number of nonterminal symbols of each type in their frontiers.
DEFINITION 6.15. The ith level generating function F_i(S_1, S_2, ..., S_K) is defined recursively as

    F_0(S_1, S_2, ..., S_K) = S_1,
    F_1(S_1, S_2, ..., S_K) = g_1(S_1, S_2, ..., S_K),
    F_i(S_1, S_2, ..., S_K) = F_{i−1}[g_1(S_1, S_2, ..., S_K), g_2(S_1, S_2, ..., S_K), ..., g_K(S_1, S_2, ..., S_K)].

F_i(S_1, S_2, ..., S_K) can be expressed as

    F_i(S_1, S_2, ..., S_K) = G_i(S_1, S_2, ..., S_K) + C_i

where G_i(·) does not contain any constant term. The constant term C_i corresponds to the probability of all trees t ∈ L(G_s) that can be derived in i or fewer levels.

THEOREM 6.16. A stochastic tree grammar G_s with unrestricted probabilistic representation R is consistent if and only if lim_{i→∞} C_i = 1.

PROOF. If the above limit is not equal to 1, this means that there is a finite probability that the generation process enters a generating sequence that has a finite probability of never terminating. Thus the probability measure defined upon L(G_s) will always be less than 1 and R will not be consistent. On the other hand, if the limit is 1, this means that no such infinite generation sequence exists, since the limit represents the probability measure of all trees that are generated by the application of a finite number of production rules. Consequently R is consistent.

DEFINITION 6.17. The expected number of occurrences of the nonterminal symbol A_j in the production set Γ_{A_i} is

    e_ij = ∂g_i(S_1, S_2, ..., S_K)/∂S_j evaluated at S_1 = S_2 = ··· = S_K = 1.

The first moment matrix E is defined as E = [e_ij], 1 ≤ i, j ≤ K.

LEMMA 6.18. A stochastic tree language with probabilistic representation R is consistent if all the eigenvalues of E are smaller than 1. Otherwise, it is not consistent.
EXAMPLE 6.19. In this example the consistency conditions for the stochastic tree grammar G_s of Example 6.9 are found directly (part (a)) and by Lemma 6.18 (part (b)), and the consistency criterion is thus verified.

(a) The set of trees generated by G_s, with their generation probabilities, begins as follows:

Tree t                         Probability of generation p(t)
$(a, b)                        (1−p)(1−q)
$(a, b(a))                     (1−p)q
$(a(a, b), b)                  p(1−p)(1−q)^2
$(a(a, b(a)), b(a))            p(1−p)q^2
etc.

In all of the above trees production (1) is always applied. If production (2) is applied (n−1) times, there will be one 'A' and n 'B's in the frontier of the tree so obtained. Production (3) is then applied when no more applications of production (2) are needed. Of the n 'B's in the frontier, any one, two, three, or all n may have production (4) applied, and to the remaining 'B's production (5) is applied. Production (6) always follows production (4). Thus we have

    Σ_{t ∈ L(G_s)} p(t) = (1−p)p^0 [C(1,0)(1−q) + C(1,1)q]
        + (1−p)p^1 [C(2,0)(1−q)^2 + C(2,1)q(1−q) + C(2,2)q^2]
        + (1−p)p^2 [C(3,0)(1−q)^3 + C(3,1)q(1−q)^2 + C(3,2)q^2(1−q) + C(3,3)q^3]
        + ···
        + (1−p)p^{n−1} [C(n,0)(1−q)^n + C(n,1)(1−q)^{n−1}q + ··· + C(n,r)(1−q)^{n−r}q^r + ··· + C(n,n)q^n] + ···

where C(n, r) denotes the binomial coefficient.
Note that the power of p in the above terms shows the number of times production (2) has been applied before applying production (3). So

    Σ_{t ∈ L(G_s)} p(t) = (1−p)[(1−q) + q] + (1−p)p[(1−q) + q]^2 + ··· + (1−p)p^{n−1}[(1−q) + q]^n + ···

or

    Σ_{t ∈ L(G_s)} p(t) = (1−p) + (1−p)p + ··· + (1−p)p^{n−1} + ···
        = (1−p)[1 + p + p^2 + ··· + p^{n−1} + ···]
        = (1−p) · 1/(1−p) = 1   (if p < 1).

Hence, G_s is consistent for all values of p such that 0 ≤ p < 1.

(b) Let us now find the consistency condition for the grammar G_s using Lemma 6.18 and verify the consistency criterion. From the generating functions of Example 6.14 we obtain
    E = | 0   1   1   0 |
        | 0   p   p   0 |
        | 0   0   0   q |
        | 0   0   0   0 |

The characteristic equation for E is φ(τ) = (τ − p)τ^3. Thus, the probabilistic representation will be consistent as long as 0 ≤ p < 1.
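The same conclusion can be checked by simulation. The sketch below samples trees from the grammar of Example 6.9 by expanding nonterminals recursively (with a depth cut-off to catch non-terminating runs) and reports how often generation terminates; for p < 1 the termination frequency approaches 1, while including the boundary value p = 1 shows the inconsistent case. The cut-off, trial count and seed are arbitrary choices.

```python
# Monte Carlo check of the consistency of the grammar of Example 6.9.
import random

def sample_tree(p, q, symbol="S", depth=0, max_depth=200):
    """Sample a tree; returns None if the depth cut-off is hit (non-termination)."""
    if depth > max_depth:
        return None
    if symbol == "S":                                   # production (1)
        kids = [sample_tree(p, q, s, depth + 1) for s in ("A", "B")]
        return None if None in kids else ("$",) + tuple(kids)
    if symbol == "A":
        if random.random() < p:                         # production (2): A -> a(A, B)
            kids = [sample_tree(p, q, s, depth + 1) for s in ("A", "B")]
            return None if None in kids else ("a",) + tuple(kids)
        return "a"                                      # production (3): A -> a
    if symbol == "B":
        if random.random() < q:                         # production (4): B -> b(C)
            kid = sample_tree(p, q, "C", depth + 1)
            return None if kid is None else ("b", kid)
        return "b"                                      # production (5): B -> b
    return "a"                                          # production (6): C -> a

random.seed(0)
trials = 5000
for p in (0.5, 0.95, 1.0):
    finished = sum(sample_tree(p, q=0.3) is not None for _ in range(trials))
    print(f"p = {p}: fraction of terminating generations = {finished / trials:.3f}")
```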
7. Application of stochastic tree grammars to texture modelling
Research on texture modelling in picture processing has received increasing attention recently [43]. Most of the previous research has concentrated on the statistical approach [22, 42]. An alternative approach is the structural approach [31]. In the structural approach, a texture is considered to be defined by subpatterns which occur repeatedly according to a set of well-defined placement rules within the overall pattern. Furthermore, the subpatterns themselves are made of structural elements. We have proposed a texture model based on the structural approach [18, 34]. A texture pattern is divided into fixed-size windows. Repetition of subpatterns or a portion of a subpattern may appear in a window. A windowed pattern is treated
Fig. 4. Two tree structures for texture modeling: (a) Structure A, (b) Structure B.
as a subpattern and is represented by a tree. Each tree node corresponds to a single pixel or a small homogeneous area of the windowed pattern. A tree grammar is used to characterize windowed patterns of the same class. Two convenient tree structures and their corresponding windowed patterns are illustrated in Fig. 4. The advantage of the proposed model is its computational simplicity. The decomposition of a pattern into fixed-size windows and the use of a fixed tree structure for representation make the texture analysis procedures and their implementation very easy. However, the proposed model is very sensitive to local noise and to structural distortion such as shift, rotation and fluctuation. In this section we will describe the use of stochastic tree grammars and high-level syntax rules to model local noise and structural distortions. Figs. 5a and 5b are digitized pictures of the patterns D22 and D68 from Brodatz' book Textures [8]. For simplicity we use only two primitives: black as primitive '1' and white as primitive '0'. For pattern D22, the reptile skin, we may consider that it is the result of twisting a regular tessellation such as the pattern shown in Fig. 6. The regular tessellation pattern is composed of two basic subpatterns, shown in Fig. 7.
Fig. 5a. Texture pattern: D22, reptile skin.
A distorted tessellation can result from shifting a series of basic subpatterns in one direction. Let us use the set of shifted subpatterns as the set of primitives. There will be 81 such windowed pattern primitives; Fig. 8 shows several of them. A tree grammar can be constructed for the generation of the 81 windowed patterns [18, 34]. Local noise and distortion of the windowed patterns can be taken care of by constructing a stochastic tree grammar. The procedure for inferring a stochastic tree grammar from a set of texture patterns is described in [18, 35]. A tree grammar for the placement of the 81 windowed patterns can then be constructed for the twisted texture pattern. A pattern D22 generated using a stochastic tree grammar is shown in Fig. 9. The texture pattern D68, the wood grain pattern, consists of long vertical lines. It shows a higher degree of randomness than D22. No clear tessellation or subpattern exists in the pattern. Using vertical lines as subpatterns we can construct a stochastic tree grammar G68 to characterize the repetition of the subpatterns. The density of vertical lines depends on the probabilities associated with the production rules. Fig. 10 shows two patterns generated from G68 using different sets of production probabilities.
Fig. 5b. Texture pattern: D68, wood grain.
Fig. 6. The ideal texture of pattern D22.
Fig. 7. Basic pattern of Fig. 6.
Fig. 8. Windowed pattern primitives.
Fig. 9. Synthesis results for pattern D22.
Fig. 10. Synthesis results for pattern D68.
G68 = (V, r, P, S) where V = {S, A, B, 0, 1}, V_T = {0, 1}, r(0) = r(1) = {0, 1, 2, 3}, and P is

    S →^{0.50} 0(A, S, A),    S →^{0.09} 1(B, S, B),
    S →^{0.09} 0(A, S, B),    S →^{0.09} 1(B, S, A),
    S →^{0.09} 0(B, S, A),    S →^{0.09} 1(A, S, B),
    S →^{0.05} 0(B, S, B),
    A →^{0.90} 0(A),          B →^{0.85} 1(B),
    A →^{0.05} 0(B),          B →^{0.10} 1(A),
    A →^{0.05} 0,             B →^{0.05} 1.
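The effect of the production probabilities on the synthesized texture can be illustrated without the full tree machinery. Assuming chain rules of the form A → 0(A) and B → 1(B) as in the production table above, a white or black column element continues downward with the chain-rule probability, so vertical runs are roughly geometrically distributed. The toy renderer below is our own simplification for illustration, not the tree-grammar synthesis procedure of [34, 35]; all numerical settings are invented.

```python
# Toy illustration of how chain-rule probabilities control vertical-line
# density in a wood-grain-like texture (a simplification of G68; not the
# tree-grammar synthesis procedure of [34, 35]).
import random

def wood_grain(width, height, p_black_start, p_black_continue, p_white_continue):
    cols = []
    for _ in range(width):
        col, row = [], 0
        while row < height:
            black = random.random() < p_black_start          # start a new run
            cont = p_black_continue if black else p_white_continue
            run = 1
            while row + run < height and random.random() < cont:
                run += 1                                      # extend the run downward
            col.extend(["1" if black else "0"] * run)
            row += run
        cols.append(col)
    # transpose the columns into printable rows of pixels
    return ["".join(cols[x][y] for x in range(width)) for y in range(height)]

random.seed(2)
for p_cont in (0.85, 0.95):        # two different production-probability settings
    print(f"--- black runs continue with probability {p_cont} ---")
    for line in wood_grain(32, 12, p_black_start=0.35, p_black_continue=p_cont,
                           p_white_continue=0.90):
        print(line)
```

Raising the continuation probability lengthens the black runs and produces denser, longer vertical lines, mimicking the difference between the two synthesized patterns of Fig. 10.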
Conclusions and remarks
Stochastic string languages are first introduced and some of their applications to coding, pattern recognition, and language analysis are briefly described in this paper. With probabilistic information about the process under study, maximumlikelihood and Bayes decision rules can be directly applied to the coding/decoding and analysis of linguistic source and the classification of noisy and distorted linguistic patterns. Stochastic finite-state and context-free languages are easier to analyze compared with stochastic context-sensitive languages; however, their descriptive power of complex processes is less. The consistency problem of
stochastic context-sensitive languages is still a problem under investigation. This, of course, directly limits the practical applications of stochastic context-sensitive languages except in a few special cases (e.g., stochastic context-sensitive languages generated by stochastic context-free programmed grammars [15]). Only very limited results in grammatical inference are practically useful [16, 18]. Efficient inference algorithms are definitely needed before we can design a system to automatically infer a grammar from sample sentences of a language. Stochastic tree grammars are also introduced and some of their properties studied. Tree grammars have been used in the description and modelling of fingerprint patterns, bubble chamber pictures, highway and river patterns in LANDSAT images, and texture patterns. In order to describe and model noisy and distorted patterns more realistically, stochastic tree grammars have been suggested. We have briefly presented some recent results in texture modeling using stochastic tree grammars. For a given stochastic tree grammar describing a set of patterns we can construct a stochastic tree automaton which will accept the set of patterns with their associated probabilities [4]. In the case of multiclass recognition problems, the maximum-likelihood or Bayes decision rule can be used to decide the class label of an input pattern represented by a tree [15, 18]. In order to characterize the patterns of interest realistically, it would be desirable to have the stochastic tree grammar actually inferred from the available pattern samples. Such an inference procedure requires the inference of both the tree grammar and its production probabilities. Unfortunately, a general inference procedure for stochastic tree grammars is still a subject of research. Only some very special cases of practical interest have been discussed [7, 35].
References
[1] Aho, A. V. and Peterson, T. G. (1972). A minimum distance error-correcting parser for context-free languages. SIAM J. Comput. 4.
[2] Aho, A. V. and Ullman, J. D. (1972). Theory of Parsing, Translation and Compiling, Vol. 1 (Vol. 2: 1973). Prentice-Hall, Englewood Cliffs.
[3] Bahl, L. R. and Jelinek, F. (1975). Decoding for channels with insertion, deletion and substitutions with applications to speech recognition. IEEE Trans. Inform. Theory 21, 4.
[4] Bhargava, B. K. and Fu, K. S. (1974). Stochastic tree system for syntactic pattern recognition. Proc. 12th Annual Allerton Conf. on Comm., Control and Comput., Monticello, IL, U.S.A.
[5] Booth, T. L. (1974). Design of minimal expected processing time finite-state transducers. Proc. IFIP Congress 74. North-Holland, Amsterdam.
[6] Booth, T. L. (1969). Probability representation of formal languages. IEEE 10th Annual Symp. Switching and Automata Theory.
[7] Brayer, J. M. and Fu, K. S. (1977). A note on the k-tail method of tree grammar inference. IEEE Trans. Systems Man Cybernet. 7 (4) 293-299.
[8] Brodatz, P. (1966). Textures. Dover, New York.
[9] Chomsky, N. (1956). Three models for the description of language. IEEE Trans. Inform. Theory 2, 113-124.
[10] Fu, K. S. (1968). Sequential Methods in Pattern Recognition and Machine Learning. Academic Press, New York.
[11] Fu, K. S. (1972). On syntactic pattern recognition and stochastic languages. In: S. Watanabe, ed., Frontiers of Pattern Recognition. Academic Press, New York.
[12] Fu, K. S. and Huang, T. (1972). Stochastic grammars and languages. Internat. J. Comput. Inform. Sci. 1 (2) 135-170.
[13] Fu, K. S. (1973). Stochastic languages for picture analysis. Comput. Graphics and Image Processing 2 (4) 433-453.
[14] Fu, K. S. and Bhargava, B. K. (1973). Tree systems for syntactic pattern recognition. IEEE Trans. Comput. 22, 1087-1099.
[15] Fu, K. S. (1974). Syntactic Methods in Pattern Recognition. Academic Press, New York.
[16] Fu, K. S. and Booth, T. L. (1975). Grammatical inference: Introduction and survey, I-II. IEEE Trans. Systems Man Cybernet. 5, 95-111, 409-423.
[17] Fu, K. S. (1976). Tree languages and syntactic pattern recognition. In: C. H. Chen, ed., Pattern Recognition and Artificial Intelligence. Academic Press, New York.
[18] Fu, K. S. (1981). Syntactic Pattern Recognition and Applications. Prentice-Hall, Englewood Cliffs.
[19] Fung, L. W. and Fu, K. S. (1975). Maximum-likelihood syntactic decoding. IEEE Trans. Inform. Theory 21.
[20] Fung, L. W. and Fu, K. S. (1976). An error-correcting syntactic decoder for computer networks. Internat. J. Comput. Inform. Sci. 5 (1).
[21] Grenander, U. (1969). Foundation of pattern analysis. Quart. Appl. Math. 27, 1-55.
[22] Haralick, R. M., Shanmugam, K. and Dinstein, I. (1973). Texture features for image classification. IEEE Trans. Systems Man Cybernet. 3.
[23] Harris, T. E. (1963). The Theory of Branching Processes. Springer, Berlin.
[24] Hellman, M. E. (1973). Joint source and channel encoding. Stanford Electronics Lab., Stanford University.
[25] Hutchins, S. E. (1970). Stochastic sources for context-free languages. Ph.D. Dissertation, University of California, San Diego, CA.
[26] Jelinek, F. (1968). Probabilistic Information Theory. McGraw-Hill, New York.
[27] Keng, J. and Fu, K. S. (1976). A syntax-directed method for land use classification of LANDSAT images. Proc. Symp. Current Math. Problems in Image Sci., Monterey, CA, U.S.A.
[28] Lafrance, J. E. (1971). Syntax-directed error-recovery for compilers. Rept. No. 459, Dept. of Comput. Sci., University of Illinois, IL, U.S.A.
[29] Lee, H. C. and Fu, K. S. (1972). A stochastic syntax analysis procedure and its application to pattern recognition. IEEE Trans. Comput. 21, 660-666.
[30] Li, R. Y. and Fu, K. S. (1976). Tree system approach to LANDSAT data interpretation. Proc. Symp. Machine Processing of Remotely Sensed Data, Lafayette, IN, U.S.A.
[31] Lipkin, B. S. and Rosenfeld, A., eds. (1970). Picture Processing and Psychopictorics, 289-381. Academic Press, New York.
[32] Lipton, R. J. and Snyder, L. (1974). On the optimal parsing of speech. Res. Rept. No. 37, Dept. of Comput. Sci., Yale University.
[33] Lu, S. Y. and Fu, K. S. (1977). Stochastic error-correcting syntax analysis for recognition of noisy patterns. IEEE Trans. Comput. 26, 1268-1276.
[34] Lu, S. Y. and Fu, K. S. (1978). A syntactic approach to texture analysis. Comput. Graphics and Image Processing 7.
[35] Lu, S. Y. and Fu, K. S. (1979). Stochastic tree grammar inference for texture synthesis and discrimination. Comput. Graphics and Image Processing 8, 234-245.
[36] Moayer, B. and Fu, K. S. (1976). A tree system approach for fingerprint pattern recognition. IEEE Trans. Comput. 25 (3) 262-274.
[37] Persoon, E. and Fu, K. S. (1975). Sequential classification of strings generated by SCFG's. Internat. J. Comput. Inform. Sci. 4 (3) 205-217.
[38] Smith, W. B. (1970). Error detection in formal languages. J. Comput. System Sci.
[39] Souza, C. R. and Scholtz, R. A. (1969). Syntactical decoders and backtracking S-grammars. ALOHA System Rept. A69-9, University of Hawaii.
[40] Suppes, P. (1970). Probabilistic grammars for natural languages. Synthese 22, 95-116.
[41] Tanaka, E. and Fu, K. S. (1976). Error-correcting parsers for formal languages. Tech. Rept. EE-76-7, Purdue University.
[42] Weszka, J. S., Dyer, C. R. and Rosenfeld, A. (1976). A comparative study of texture measures for terrain classification. IEEE Trans. Systems Man Cybernet. 6.
[43] Zucker, S. W. (1976). Toward a model of texture. Comput. Graphics and Image Processing 5, 190-202.