0031-3203/86 $3.00 + .00 Pergamon Journals Ltd. Pattern Recognition Society
Pattern Recognition, Vol. 19, No. 5, pp. 407-412, 1986.
Printed in Great Britain.

HIGH SPEED ERROR CORRECTION OF PHONEME SEQUENCES

EIICHI TANAKA, TAKANORI TOYAMA and SACHIKO KAWAI

Department of Information Science, Faculty of Engineering, Utsunomiya University, 2753 Ishiimachi, Utsunomiya 321, Japan

(Received 13 September 1985; in revised form 18 December 1985)
Abstract: This paper describes two high speed error-correction methods for garbled phoneme sequences. The methods make skillful use of a confusion matrix to achieve both higher correction rates and shorter computation time than a dictionary method. Experiments with the proposed methods on a dictionary of 5561 phoneme sequences showed good performance, that is, higher correction rates and less computing time than the dictionary method.

Key Words: Spelling correction, Levenshtein distance, Substitution, Insertion, Deletion, One dimensional weighted Levenshtein distance, Speech recognition

1. INTRODUCTION

Spelling correction is required in OCR output correction, typographical error correction, speech recognition, and linguistic information transmission and storage. Many spelling correction methods have already been proposed. They fall into two classes: methods that use statistical information such as n-grams, confusion matrices and occurrence information, and methods based on similarities or distances.(1-3)

Recently, researchers have taken a growing interest in fast correction methods. Itahashi and Yokoyama(4) reported that the set of words containing a few designated phonemes is rather small. For instance, if three phonemes are designated among 5620 high-occurrence Japanese words, the number of words including those phonemes is about 128. Sugamura and Furui(5) proposed the SPLIT method, which uses pseudo-phoneme templates to reduce the amount of computation for large vocabulary word recognition. Kurita and Aizawa(6) reported a high speed correction method for substitution errors only; with this method they obtained a slightly lower correction rate than the dictionary method(7) but a speed ten times higher. Tanaka, Kohashiguchi and Shimamura(8) presented another high speed correction method for substitution errors. This paper describes two fast error correction methods based on the WLD method.(11) These methods can be regarded as generalizations of the methods described in Ref. (8).
2. PRELIMINARIES

2.1. The metric(11)

Let $X = x_1 x_2 \cdots x_m$ and $Y = y_1 y_2 \cdots y_n$ be two finite strings of symbols from a given alphabet $\Sigma$. Define a mapping $M_s$ from $X$ to $Y$: $M_s$ is a set of pairs of integers $(i, j)$ satisfying the following conditions, where $i$ and $j$ are labels for $x_i$ and $y_j$, respectively.

(1) $1 \le i \le m$, $1 \le j \le n$.
(2) For $(i_1, j_1), (i_2, j_2) \in M_s$,
    (a) $i_1 = i_2$ iff $j_1 = j_2$;
    (b) $i_1 < i_2$ iff $j_1 < j_2$.   (2.1)

Let $u_s$ be the number of elements $(i, j)$ in $M_s$ such that $x_i \ne y_j$. Let $v_s = n - |M_s|$ and $w_s = m - |M_s|$, where $|M_s|$ denotes the number of elements in $M_s$; $u_s$, $v_s$ and $w_s$ are the numbers of substitutions, of insertions of extra symbols, and of deletions, respectively, needed to transform $X$ into $Y$. The one dimensional weighted Levenshtein distance (1WLD) from $X$ to $Y$, denoted by $\mathrm{WLD}(X, Y)$, is defined as

$$\mathrm{WLD}(X, Y) = \min_s \{\, p\,u_s + q\,v_s + r\,w_s \,\}, \qquad (2.2)$$

where $p$, $q$ and $r$ are nonnegative weights assigned to a substitution, an insertion and a deletion, respectively. 1WLD has the following properties:

(1) $\mathrm{WLD}(X, Y) \ge 0$, with equality iff $X = Y$.
(2) $\mathrm{WLD}(X, Z) + \mathrm{WLD}(Z, Y) \ge \mathrm{WLD}(X, Y)$.   (2.3)
(3) $\mathrm{WLD}(X, Y) = \mathrm{WLD}(Y, X)$ if $q = r$.

$\mathrm{WLD}(X, Y)$ can be computed by the following recurrence relation:

$$d(i, j) = \min\{\, d(i-1, j) + r,\; d(i-1, j-1) + p(i, j),\; d(i, j-1) + q \,\}, \qquad (2.4)$$

where

$$d(i, 0) = i\,r, \qquad d(0, j) = j\,q, \qquad p(i, j) = \begin{cases} p, & \text{if } x_i \ne y_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (2.5)$$

Then

$$\mathrm{WLD}(X, Y) = d(m, n). \qquad (2.6)$$
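The recurrence (2.4)-(2.6) is a standard dynamic program over an $(m+1) \times (n+1)$ table. The following is a minimal Python sketch, assuming phoneme sequences are given as Python strings or tuples of symbols and the weights as plain numbers; the function name `wld` is ours, not the paper's.

```python
def wld(x, y, p=1, q=1, r=1):
    """One dimensional weighted Levenshtein distance WLD(x, y), eqs. (2.4)-(2.6).

    p, q and r are the substitution, insertion and deletion weights.
    x and y may be strings or tuples of phoneme symbols.
    """
    m, n = len(x), len(y)
    # d[i][j] holds d(i, j) = WLD(x_1..x_i, y_1..y_j).
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * r  # boundary (2.5): delete x_1..x_i
    for j in range(1, n + 1):
        d[0][j] = j * q  # boundary (2.5): insert y_1..y_j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p_ij = 0 if x[i - 1] == y[j - 1] else p  # p(i, j) of (2.5)
            d[i][j] = min(d[i - 1][j] + r,         # delete x_i
                          d[i - 1][j - 1] + p_ij,  # match or substitute
                          d[i][j - 1] + q)         # insert y_j
    return d[m][n]  # (2.6)


if __name__ == "__main__":
    print(wld("kitten", "sitting"))            # unit weights: 3
    print(wld("pes", "pis"), wld("pes", "pe", r=2))  # 1 2
```

With unit weights this reduces to the ordinary Levenshtein distance; the table makes the $O(mn)$ cost of a single distance computation explicit, which is what the dictionary method pays once per dictionary entry.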
The computational complexity of computing $\mathrm{WLD}(X, Y)$ is $O(mn)$.

2.2. Error-correction of garbled phoneme sequences using 1WLD

Let Dict, $\alpha$ and $\alpha'$ be a dictionary of phoneme sequences, a phoneme sequence in Dict, and a garbled phoneme sequence of $\alpha$, respectively. Consider an error-correction method for a garbled phoneme sequence $\alpha'$. Define $\mathrm{WLD}_{\min}$ as follows:

$$\mathrm{WLD}_{\min} = \min_X \{\, \mathrm{WLD}(X, \alpha') \mid X \in \mathrm{Dict} \,\}. \qquad (2.7)$$

(1) Assume that there is only one phoneme sequence $X$ which satisfies $\mathrm{WLD}_{\min} = \mathrm{WLD}(X, \alpha')$. Then:
    (a) if $X = \alpha$, $\alpha'$ is identified with $\alpha$ correctly;
    (b) if $X \ne \alpha$, $\alpha'$ is identified with $X$ incorrectly.
(2) If there are at least two phoneme sequences $X$ and $Y$ ($X \ne Y$) in Dict which satisfy $\mathrm{WLD}_{\min} = \mathrm{WLD}(X, \alpha') = \mathrm{WLD}(Y, \alpha')$, $\alpha'$ is rejected.

We call this method the dictionary method. The dictionary method can be adapted to the error tendencies of a system by adjusting the weights $p$, $q$ and $r$. However, because $\mathrm{WLD}(X, \alpha')$ must be computed for every $X$ in Dict, it takes considerable time to correct garbled phoneme sequences.

3. DIVIDING A DICTIONARY USING A CONFUSION MATRIX

Recognition errors inevitably occur because of the difficulties of segmentation and of phoneme recognition. These errors have characteristic tendencies. For instance, vowels and nasal vowels are apt to be misrecognized as each other, but vowels are unlikely to be misrecognized as plosives. This kind of information can be summarized in a confusion matrix. According to Dixon and Silverman,(9) English phonemes can be classified into the following classes:

$G_1 = \{p, t, k\}$, $G_2 = \{b, d, g\}$, $G_3 = \{m, n, ng\}$, $G_4 = \{v, th, f, th, z, s, zh, sh\}$, $G_5 = \{h\}$, $G_6 = \{w, l, y, r\}$, $G_7 = \{\text{vowels}\}$,

where phoneme symbols follow the notation of Davis.(10) A phoneme belonging to the class $G_k$ is more liable to be misrecognized as another phoneme in $G_k$ than as a phoneme not in $G_k$. We call $G_k$ ($k = 1, \ldots, 7$) a class name.

If we write $\alpha$ = /pēs/ (pease) using class names, we have $C(\alpha) = G_1 G_7 G_4$, which is called the class name expression of $\alpha$. /kĭs/ (kiss) and /kōt/ (cote) have $C(\alpha)$ as their class name expression. In other words, we can classify a dictionary by class name expressions. Let $C_1, C_2, \ldots, C_n$ be the class name expressions generated by the phoneme sequences in Dict, and let $D(C_k)$ ($k = 1, \ldots, n$) be the subdictionary with the catch word $C_k$. Then

$$\mathrm{Dict} = \bigcup_{k=1}^{n} D(C_k), \qquad D(C_i) \cap D(C_j) = \emptyset \ \text{(the null set)}, \qquad (3.1)$$

where $i \ne j$ and $1 \le i, j \le n$. Three classifications are considered:

(Case 1: 7 classes) $A_1 = \{p, t, k\}$, $A_2 = \{b, d, g\}$, $A_3 = \{m, n, ng\}$, $A_4 = \{v, th, f, th, z, s, zh, sh, ch, j\}$, $A_5 = \{h, hw\}$, $A_6 = \{w, l, y, r\}$, $A_7 = \{\text{vowels (including } oi \text{ and } ou)\}$, where a phoneme with an underline is a phoneme which is found in Ref. (12) but not in Ref. (10).
(Case 2: 3 classes) $B_1 = \{\text{voiced consonants}\}$, $B_2 = \{\text{unvoiced consonants}\}$, $B_3 = \{\text{vowels}\}$.
(Case 3: 2 classes) $C_1 = \{\text{consonants}\}$, $C_2 = \{\text{vowels}\}$.

The numbers of class name expressions for the three cases are shown in Table 1. Note that the average numbers of phoneme sequences contained in a subdictionary are 2.0, 4.4 and 14.3 for Case 1, Case 2 and Case 3, respectively. The following are examples of subdictionaries using the Case 2 classification:

D(C(B2B~B2B3B2B1)) = {/strĕst/ (stressed), /strĕcht/ (stretched)},
D(C(B1B3B2B3B1B3B2)) = {/pŏzətĭv/ (positive), /pŏsəbəl/ (possible), /kĕmĭkəl/ (chemical), /tĕrəbəl/ (terrible), /kĕnədēz/ (Kennedy's)}.
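Splitting Dict by class name expressions, as in (3.1), amounts to grouping words by a key. The sketch below assumes, for illustration only, that each phoneme is a single character and uses a hypothetical, simplified seven-class map loosely modelled on the Case 1 classes; it is not the paper's phoneme inventory, and the function names are ours.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for the Case 1 classes A_1..A_7.
# Each phoneme is assumed to be a single character (illustration only).
CLASSES = {1: "ptk", 2: "bdg", 3: "mn", 4: "vfzs", 5: "h", 6: "wlyr", 7: "aeiou"}
CLASS_OF = {ph: k for k, members in CLASSES.items() for ph in members}


def class_name_expression(seq):
    """C(seq): replace every phoneme by the name (index) of its class."""
    return tuple(CLASS_OF[ph] for ph in seq)


def split_dictionary(dictionary):
    """Partition Dict into disjoint subdictionaries D(C_k) keyed by catch word C_k, as in (3.1)."""
    subdicts = defaultdict(list)
    for word in dictionary:
        subdicts[class_name_expression(word)].append(word)
    return dict(subdicts)


if __name__ == "__main__":
    toy_dict = ["pes", "kis", "pin", "kin", "bad"]
    subdicts = split_dictionary(toy_dict)
    for catch_word, members in subdicts.items():
        print(catch_word, members)
    # Average subdictionary size: the quantity reported as 2.0, 4.4 and 14.3 for the real dictionary.
    print(len(toy_dict) / len(subdicts))
```

Because every word has exactly one class name expression, the groups are disjoint and together cover the whole dictionary, which is exactly condition (3.1). Coarser class maps give fewer, larger subdictionaries, matching the trend from Case 1 to Case 3.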
Table 1. Distribution of phoneme sequences and numbers of class name expressions

               Numbers of class name expressions
  L       N     Case 1    Case 2    Case 3
  1      11          1         1         1
  2     156         12         5         4
  3     739         62        17         7
  4     989        219        41         9
  5     996        428        91        14
  6     848        548       142        25
  7     641        513       224        29
  8     511        415       245        45
  9     315        269       205        60
 10     169        155       139        72
 11     109         96        93        65
 12      47         41        38        32
 13      25         22        20        21
 14       4          4         4         4
 15       1          1         1         1
Total  5561       2783      1266       389

(L: length of a phoneme sequence, N: number of phoneme sequences.)

4. HIGH SPEED CORRECTION METHOD 1

In the previous section, we showed that the average sizes of subdictionaries are remarkably small. Assume that $\alpha' = a_1 a_2 \cdots a_m$ is a garbled phoneme sequence of $\alpha$. As is easily imagined, the probability that $\alpha'$ has the same class name expression as $\alpha$, so that $\alpha$ lies in $D(C(\alpha'))$, is rather high. Let $f$ be the number of outer class substitution errors, that is, substitutions in which a phoneme in $G_i$ is replaced by a phoneme in $G_j$ ($i \ne j$). Let $g$ and $h$ be the numbers of insertion errors and of deletion errors, respectively. Assume that

$$p f + q g + r h \le d_{\max}, \qquad (*1)$$

where $d_{\max}$ is a given constant; this guarantees $\mathrm{WLD}(C(\alpha), C(\alpha')) \le d_{\max}$. Method 1 proceeds as follows.

(i) Determine $C(\alpha')$, the class name expression of $\alpha'$.
(ii) Compute $\mathrm{WLD}(C_k, C(\alpha'))$ ($k = 1, \ldots, n$). Let $C_{h_i}$ ($1 \le i \le S \le n$) be the catch words such that $\mathrm{WLD}(C_{h_i}, C(\alpha')) \le d_{\max}$, and let

$$\beta = \bigcup_{i=1}^{S} D(C_{h_i}). \qquad (4.1)$$
(iii) Compute $\mathrm{WLD}(X, \alpha')$ for all $X \in \beta$.
(iv) Let $\mathrm{WLD}_{\min} = \min\{\mathrm{WLD}(X, \alpha') \mid X \in \beta\}$.
    (a) If there is only one phoneme sequence $X$ such that $\mathrm{WLD}_{\min} = \mathrm{WLD}(X, \alpha')$, the system outputs $X$ as the result of correction. If $X = \alpha$, the correction is right; otherwise it is wrong.
    (b) If there are at least two phoneme sequences $X$ and $Y$ ($X \ne Y$) such that $\mathrm{WLD}_{\min} = \mathrm{WLD}(X, \alpha') = \mathrm{WLD}(Y, \alpha')$, the system reports that $\alpha'$ is rejected.

Note that if a phoneme of $\alpha$ is replaced by another phoneme of the same class, the class name expression of $\alpha'$ is the same as that of $\alpha$. For instance, if $\alpha$ = /pēs/ and $\alpha'$ = /pīs/, then $C(\alpha) = C(\alpha') = A_1 A_7 A_4$. In other words, inner class substitution errors, in which a phoneme in $G_i$ is replaced by another phoneme in $G_i$, are not taken into account in step (ii). Therefore, every phoneme sequence $X$ in $\beta$ satisfies the following condition:

"$X$ can be transformed into $\alpha'$ by an arbitrary number of inner class substitutions, $f$ outer class substitutions, $g$ insertions and $h$ deletions, with $p f + q g + r h \le d_{\max}$."

Note that if (*1) is satisfied, $\alpha$ is in $\beta$.
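Steps (i)-(iv) of Method 1 can be sketched as follows. This is a minimal illustration under stated assumptions: single-character phonemes, a hypothetical seven-class map standing in for the paper's classes, and a hypothetical helper `judge` implementing the unique-minimum-or-reject rule of step (iv). Returning `None` stands for a rejection; treating an empty $\beta$ as a rejection is our assumption, since the text does not cover that case.

```python
from collections import defaultdict

# Hypothetical single-character phoneme classes (toy stand-in for G_1..G_7).
CLASSES = {1: "ptk", 2: "bdg", 3: "mn", 4: "vfzs", 5: "h", 6: "wlyr", 7: "aeiou"}
CLASS_OF = {ph: k for k, members in CLASSES.items() for ph in members}


def wld(x, y, p=1, q=1, r=1):
    """WLD(x, y) by the recurrence (2.4)-(2.6), keeping one row of d at a time."""
    prev = [j * q for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * r] + [0] * len(y)
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else p
            cur[j] = min(prev[j] + r, prev[j - 1] + sub, cur[j - 1] + q)
        prev = cur
    return prev[-1]


def class_name_expression(seq):
    return tuple(CLASS_OF[ph] for ph in seq)


def split_dictionary(dictionary):
    subdicts = defaultdict(list)
    for word in dictionary:
        subdicts[class_name_expression(word)].append(word)
    return dict(subdicts)


def judge(candidates, garbled, p=1, q=1, r=1):
    """Step (iv): the unique nearest candidate, or None (rejection) on a tie."""
    scored = [(wld(x, garbled, p, q, r), x) for x in candidates]
    if not scored:
        return None  # assumption: an empty candidate set counts as a rejection
    best = min(score for score, _ in scored)
    winners = [x for score, x in scored if score == best]
    return winners[0] if len(winners) == 1 else None


def method1(subdicts, garbled, dmax, p=1, q=1, r=1):
    c_garbled = class_name_expression(garbled)                # step (i)
    beta = [x for catch_word, members in subdicts.items()      # step (ii), eq. (4.1)
            if wld(catch_word, c_garbled, p, q, r) <= dmax
            for x in members]
    return judge(beta, garbled, p, q, r)                       # steps (iii)-(iv)


if __name__ == "__main__":
    subdicts = split_dictionary(["pes", "pin", "bad"])
    print(method1(subdicts, "pis", dmax=0))  # pes
    print(method1(subdicts, "pis", dmax=1))  # None
```

With $d_{\max} = 0$ only the subdictionary sharing $C(\alpha')$ is searched and "pes" is returned. Raising $d_{\max}$ to 1 also admits "pin", whose distance to "pis" ties with that of "pes", so the sketch rejects, as the dictionary method would on this toy dictionary. The same `wld` routine is applied to class name expressions in step (ii) and to phoneme sequences in step (iii).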
5. HIGH SPEED CORRECTION METHOD 2

For simplicity, let us assume the following (*2):
(1) Inner class substitutions occur, but outer class substitutions do not.
(2) The sum of the numbers of insertions and deletions is at most 1.

Let $D(k_1, k_2, \ldots, k_7)$ be the set of phoneme sequences in Dict which have $k_i$ phonemes belonging to $G_i$ ($1 \le i \le 7$), where $k_i \ge 0$.

Example 2
Consider $\alpha$ = /pēs/, $\beta$ = /omfiroko/ and $\gamma$ = /ĭvănjəlĭzəm/. We have $C(\alpha) = A_1 A_7 A_4$, $C(\beta) = A_7 A_3 A_7 A_6 A_7 A_1 A_7$ and $C(\gamma) = A_7 A_4 A_7 A_3 A_4 A_7 A_6 A_7 A_4 A_7 A_3$. Therefore $\alpha \in D(1, 0, 0, 1, 0, 0, 1)$, $\beta \in D(1, 0, 1, 0, 0, 1, 4)$ and $\gamma \in D(0, 0, 2, 3, 0, 1, 5)$.

Assume that $\alpha'$ has $k_i$ phonemes in $G_i$. If $\alpha'$ is generated by (inner class) substitutions only, $C(\alpha) = C(\alpha')$, so $\alpha \in D(k_1, k_2, \ldots, k_7)$. If $\alpha'$ is generated by insertion of a phoneme belonging to $G_i$, $\alpha \in D(k_1, \ldots, k_i - 1, \ldots, k_7)$. If $\alpha'$ is generated by deletion of a phoneme belonging to $G_i$, $\alpha \in D(k_1, \ldots, k_i + 1, \ldots, k_7)$. Hence $\alpha$ can be found in one of the following sets:

$$D(k_1, k_2, \ldots, k_7), \qquad (5.1)$$

$$D(k_1 + 1, k_2, \ldots, k_7),\ D(k_1, k_2 + 1, \ldots, k_7),\ \ldots,\ D(k_1, k_2, \ldots, k_7 + 1), \qquad (5.2)$$

$$D(k_1 - 1, k_2, \ldots, k_7),\ D(k_1, k_2 - 1, \ldots, k_7),\ \ldots,\ D(k_1, k_2, \ldots, k_7 - 1). \qquad (5.3)$$
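Under (*2), the candidate set is a union of at most 1 + 7 + 7 count-indexed sets. The sketch below indexes a dictionary by count vectors $(k_1, \ldots, k_7)$ and collects the sets (5.1)-(5.3) for a garbled sequence. As before, the single-character phonemes and the seven-class map are hypothetical simplifications, and the function names are ours.

```python
from collections import defaultdict

# Hypothetical single-character phoneme classes (toy stand-in for G_1..G_7).
CLASSES = {1: "ptk", 2: "bdg", 3: "mn", 4: "vfzs", 5: "h", 6: "wlyr", 7: "aeiou"}
CLASS_OF = {ph: k for k, members in CLASSES.items() for ph in members}


def count_vector(seq):
    """(k_1, ..., k_7): how many phonemes of seq fall in each class."""
    k = [0] * len(CLASSES)
    for ph in seq:
        k[CLASS_OF[ph] - 1] += 1
    return tuple(k)


def index_by_counts(dictionary):
    """Group Dict into the sets D(k_1, ..., k_7)."""
    groups = defaultdict(list)
    for word in dictionary:
        groups[count_vector(word)].append(word)
    return dict(groups)


def neighbour_keys(k):
    """Count vectors of the sets (5.1), (5.2) and (5.3), in that order."""
    keys = [k]                                       # (5.1): substitutions only
    for i in range(len(k)):                          # (5.2): a phoneme of G_i was deleted
        keys.append(k[:i] + (k[i] + 1,) + k[i + 1:])
    for i in range(len(k)):                          # (5.3): a phoneme of G_i was inserted
        if k[i] > 0:
            keys.append(k[:i] + (k[i] - 1,) + k[i + 1:])
    return keys


def method2_candidates(groups, garbled):
    """The union of the sets (5.1)-(5.3) for the garbled sequence."""
    result = []
    for key in neighbour_keys(count_vector(garbled)):
        result.extend(groups.get(key, []))
    return result


if __name__ == "__main__":
    print(len(neighbour_keys(count_vector("pis"))))  # 11
    groups = index_by_counts(["pes", "pin", "bad", "pe", "spes"])
    print(method2_candidates(groups, "pis"))  # ['pes', 'spes', 'pe']
```

Only the count vector of $\alpha'$ is needed to locate the candidates, so no distance to a class name expression is computed at all; "pin" and "bad" are never touched. For a vector with three nonzero counts, as for "pis", there are 1 + 7 + 3 = 11 keys.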
Let $\hat{D}(k_1, k_2, \ldots, k_7)$ be the set of phoneme sequences in (5.1)-(5.3). Evidently, $\alpha \in \hat{D}(k_1, k_2, \ldots, k_7)$.

Example 3
If $\alpha'$ = /pīs/, then $k_1 = k_4 = k_7 = 1$ and the other $k_i$ are 0. Therefore

$$\begin{aligned}
\hat{D}(1,0,0,1,0,0,1) = {} & D(1,0,0,1,0,0,1) \cup D(2,0,0,1,0,0,1) \cup D(1,1,0,1,0,0,1) \cup D(1,0,1,1,0,0,1) \\
& \cup D(1,0,0,2,0,0,1) \cup D(1,0,0,1,1,0,1) \cup D(1,0,0,1,0,1,1) \cup D(1,0,0,1,0,0,2) \\
& \cup D(0,0,0,1,0,0,1) \cup D(1,0,0,0,0,0,1) \cup D(1,0,0,1,0,0,0).
\end{aligned}$$

The correcting procedure is as follows:
(i) Compute $\mathrm{WLD}(X, \alpha')$ for each $X$ in $\hat{D}(k_1, k_2, \ldots, k_7)$.
(ii) The judging principle is the same as step (iv) of Method 1.

Note that it is quite easy to relax the condition (*2).

6. COMPARISON BETWEEN THE DICTIONARY METHOD AND THE PROPOSED METHODS

In both Method 1 and Method 2, the set of phoneme sequences from which we compute distances to $\alpha'$ is limited to a subdictionary $\hat{D}$, that is, $\hat{D} = \beta = \bigcup_{i=1}^{S} D(C_{h_i})$ for Method 1 and $\hat{D} = \hat{D}(k_1, k_2, \ldots, k_7)$ for Method 2. We always have $|\mathrm{Dict}| \ge |\hat{D}|$. Let $\bar{D} = \mathrm{Dict} - \hat{D}$, $d = \mathrm{WLD}(\alpha, \alpha')$, $\bar{d} = \min\{\mathrm{WLD}(\gamma, \alpha') \mid \gamma \in \hat{D}, \gamma \ne \alpha\}$ and $\mathrm{Min} = \min\{\mathrm{WLD}(\delta, \alpha') \mid \delta \in \bar{D}\}$. We assume that $\alpha \in \hat{D}$, which holds when (*1) is satisfied for Method 1 and when (*2) is satisfied for Method 2.

Fig. 1. Relation between symbols.

(1) If $\bar{d} > d$, $\alpha'$ can be corrected by the proposed methods. The following cases for the dictionary method need to be considered.
    (a) $\mathrm{Min} < d$.
        (i) If there is only one phoneme sequence $\delta$ such that $\mathrm{Min} = \mathrm{WLD}(\delta, \alpha')$ and $\delta \in \bar{D}$, $\alpha'$ is wrongly corrected by the dictionary method.
        (ii) If there are at least two phoneme sequences $\delta$ and $\varepsilon$ such that $\mathrm{Min} = \mathrm{WLD}(\delta, \alpha') = \mathrm{WLD}(\varepsilon, \alpha')$ and $\delta, \varepsilon \in \bar{D}$, $\alpha'$ is rejected by the dictionary method.
    (b) $\mathrm{Min} = d$: $\alpha'$ is rejected by the dictionary method.
    (c) $\mathrm{Min} > d$: $\alpha'$ is corrected by the dictionary method.
(2) If there is only one phoneme sequence $\beta$ such that $\bar{d} = \mathrm{WLD}(\beta, \alpha')$, $\bar{d} < d$ and $\beta \in \hat{D}$, then $\alpha'$ is wrongly corrected by the proposed methods. The following cases arise for the dictionary method.
    (a) If there is at least one phoneme sequence $\delta$ such that $\bar{d} = \mathrm{Min} = \mathrm{WLD}(\delta, \alpha')$ and $\delta \in \bar{D}$, or if there are at least two phoneme sequences $\delta$ and $\varepsilon$ such that $\bar{d} > \mathrm{Min} = \mathrm{WLD}(\delta, \alpha') = \mathrm{WLD}(\varepsilon, \alpha')$ and $\delta, \varepsilon \in \bar{D}$, then $\alpha'$ is rejected by the dictionary method.
    (b) Otherwise, $\alpha'$ is wrongly corrected by the dictionary method.
(3) If there is at least one phoneme sequence $\beta$ such that $\bar{d} = d = \mathrm{WLD}(\beta, \alpha')$ and $\beta \in \hat{D}$, or if there are at least two phoneme sequences $\beta$ and $\gamma$ such that $\bar{d} = \mathrm{WLD}(\beta, \alpha') = \mathrm{WLD}(\gamma, \alpha')$, $\bar{d} < d$ and $\beta, \gamma \in \hat{D}$, then $\alpha'$ is rejected by the proposed methods. The following cases arise for the dictionary method.
    (a) If there is only one phoneme sequence $\delta$ such that $\bar{d} > \mathrm{Min} = \mathrm{WLD}(\delta, \alpha')$ and $\delta \in \bar{D}$, then $\alpha'$ is wrongly corrected by the dictionary method.
    (b) Otherwise, $\alpha'$ is rejected by the dictionary method.

Summing up the above discussion, we have the following lemma.

Lemma 1. (i) The dictionary method rejects some garbled phoneme sequences which are corrected by the proposed methods. (ii) The dictionary method wrongly corrects some garbled phoneme sequences which are corrected by the proposed methods. (iii) The dictionary method rejects some garbled phoneme sequences which are wrongly corrected by the proposed methods. (iv) The dictionary method wrongly corrects some garbled phoneme sequences which are rejected by the proposed methods. (v) The proposed methods always correct garbled phoneme sequences which are corrected by the dictionary method.

Lemma 2, stated below, follows from Lemma 1.
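Case (1)(b) above ($\bar{d} > d$ but $\mathrm{Min} = d$), one source of Lemma 1(i), can be reproduced on a toy dictionary. The sketch assumes unit weights, single-character phonemes and a hypothetical seven-class map; the three-word dictionary and the garbled input are made up for illustration. The garbled sequence "pis" arises from "pes" by one inner class substitution, so (*2) holds and "pes" lies in the Method 2 candidate set, whereas "pin" is at the same distance but outside it.

```python
# Hypothetical single-character phoneme classes (toy stand-in for G_1..G_7).
CLASSES = {1: "ptk", 2: "bdg", 3: "mn", 4: "vfzs", 5: "h", 6: "wlyr", 7: "aeiou"}
CLASS_OF = {ph: k for k, members in CLASSES.items() for ph in members}


def wld(x, y, p=1, q=1, r=1):
    """WLD(x, y) by the recurrence (2.4)-(2.6), keeping one row of d at a time."""
    prev = [j * q for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * r] + [0] * len(y)
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else p
            cur[j] = min(prev[j] + r, prev[j - 1] + sub, cur[j - 1] + q)
        prev = cur
    return prev[-1]


def count_vector(seq):
    k = [0] * len(CLASSES)
    for ph in seq:
        k[CLASS_OF[ph] - 1] += 1
    return tuple(k)


def neighbour_keys(k):
    """Count vectors of the sets (5.1)-(5.3)."""
    keys = [k] + [k[:i] + (k[i] + 1,) + k[i + 1:] for i in range(len(k))]
    keys += [k[:i] + (k[i] - 1,) + k[i + 1:] for i in range(len(k)) if k[i] > 0]
    return keys


def decide(candidates, garbled):
    """Unique nearest candidate, or None for a rejection (tie or no candidates)."""
    scores = {x: wld(x, garbled) for x in candidates}
    if not scores:
        return None
    best = min(scores.values())
    winners = [x for x, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


dictionary = ["pes", "pin", "bad"]
garbled = "pis"  # "pes" after one inner class (vowel) substitution

# Dictionary method: search the whole of Dict.
print("dictionary method:", decide(dictionary, garbled))  # None

# Method 2: search only the union of the sets (5.1)-(5.3).
keys = set(neighbour_keys(count_vector(garbled)))
hat_d = [w for w in dictionary if count_vector(w) in keys]
print("Method 2:", decide(hat_d, garbled))  # pes
```

Here $d = \mathrm{WLD}(\text{pes}, \text{pis}) = 1$, $\hat{D}$ contains only "pes" (so $\bar{d}$ is a minimum over an empty set and exceeds $d$), and $\mathrm{Min} = 1$ is attained by "pin" in $\bar{D}$. The dictionary method therefore rejects $\alpha'$ while Method 2 corrects it.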
Lemma 2. The correction rates of the proposed methods are equal to or higher than that of the dictionary method.

7. EXPERIMENTAL RESULTS

The computer experiments were carried out on an ACOS-600S (a 1 MIPS computer). The dictionary used contains the 5561 phoneme sequences summarized in Table 1. In all experiments, phoneme sequences of length 6 were garbled under the restriction (*2) of Section 5 and then corrected. The experiments were conducted for the six error cases shown in Table 2, and each case had 500 garbled phoneme sequences.

Experiment 1. The Case 2 classification was assumed. The results are shown in Table 2. The computing times were reduced to 27% to 61% of those of the dictionary method, while the correction rates were higher than those of the dictionary method.

Experiment 2. The Case 1 classification was assumed. The results in Table 3 show a remarkable reduction in computing time: 3.8% to 6.9% of that of the dictionary method. An experiment based on Method 1 was not carried out, because we did not expect any reduction in computing time: there are too many class name expressions (step (ii) of Method 1 would have to compute a distance to each of the 2783 Case 1 catch words).

Table 2. Results of experiment 1

                   Dictionary method           Method 1                  Method 2
Errors        C     M     R     T        C     M     R     T        C     M     R     T
S  I  D      (%)   (%)   (%)   (s)      (%)   (%)   (%)   (s)      (%)   (%)   (%)   (s)
1  0  0     81.2   0.4  18.4  16.7     84.0   0.4  15.6   7.5     84.0   0.4  15.6   5.9
0  1  0     98.2   0     1.8  17.5     99.4   0     0.6  10.7     99.4   0     0.6   4.8
0  0  1     58.4   4.0  37.6  14.3     67.2   4.0  28.8   4.3     67.2   4.0  28.8   4.8
2  0  0     53.2   1.6  45.2  16.7     62.6   1.0  36.4   7.5     61.0   1.0  38.0   5.6
1  1  0     83.6   0    16.4  17.5     90.4   0     9.6  10.7     89.2   0    10.8   4.6
1  0  1     12.0  12.8  75.2  14.2     28.4  11.4  60.2   4.4     24.2  11.4  64.4   4.6

(S: substitution, I: insertion, D: deletion, C: correction rate, M: miscorrection rate, R: rejection rate, T: computing time.)

Table 3. Results of experiment 2

                   Method 2
Errors        C     M     R     T
S  I  D      (%)   (%)   (%)   (s)
1  0  0     86.0   0.4  13.6  0.98
0  1  0     99.8   0     0.2  0.75
0  0  1     69.6   4.0  26.4  0.98
2  0  0     68.2   1.0  30.8  0.91
1  1  0     91.6   0     8.4  0.67
1  0  1     33.0  10.2  56.8  0.91

8. CONCLUDING REMARKS

Two high speed phoneme sequence correction methods were presented. Method 1 could be improved by introducing multi-stage decision processes. Method 2 will require more computing time when there are more errors than assumption (*2) in Section 5 allows.

REFERENCES
1. J. L. Peterson, Computer programs for detecting and correcting spelling errors, CACM 23, 676 (1980).
2. P. A. V. Hall and G. R. Dowling, Approximate string matching, Comput. Surv. 12, 381 (1980).
3. T. Ito, Correcting spelling errors in English sentences, J. Inf. Process. Soc. (Jpn) 25, 471 (1984). (In Japanese.)
4. S. Itahashi and S. Yokoyama, Vocabulary reduction effect by specifying phoneme sequences in words, Trans. Inst. Electro. Comm. Engr. (Jpn) J67-D, 869 (1984). (In Japanese.)
5. N. Sugamura and S. Furui, Large vocabulary word recognition using pseudo-phoneme templates, Trans. Inst. Electro. Comm. Engr. (Jpn) J65-D, 1041 (1982). (In Japanese.)
6. T. Kurita and T. Aizawa, A method for correcting errors in Japanese words input and its application to spoken word recognition with large vocabulary, J. Inf. Process. Soc. (Jpn) 25, 831 (1984). (In Japanese.)
7. R. Shinghal, A hybrid algorithm for contextual text recognition, Pattern Recognition 16, 261 (1983).
8. E. Tanaka, T. Kohashiguchi and K. Shimamura, High speed error-correcting methods for substitution errors, Trans. Inf. Process. Soc. (Jpn) (to appear). (In Japanese.)
9. N. R. Dixon and H. R. Silverman, A general language-operated decision implementation system (GLODIS): its application to continuous-speech segmentation, IEEE Trans. ASSP ASSP-24, 137 (1976).
10. P. Davis, The American Heritage Dictionary of the English Language. Dell, New York (1983).
11. T. Okuda, E. Tanaka and T. Kasai, A method for the correction of garbled words based on the Levenshtein metric, IEEE Trans. Comput. C-25, 172 (1976).
12. H. Kucera and W. N. Francis, Computational Analysis of Present-day American English. Brown Univ. Press, Providence, RI (1967).
About the Author: EIICHI TANAKA received a B.E. in Electrical Engineering from the University of Osaka Prefecture, Japan, in 1962 and an M.E. and a Dr.E. in Communication Engineering from Osaka University in 1964 and 1968, respectively. From 1967 to 1977 he was with the University of Osaka Prefecture. Since 1977 he has been a professor in the Department of Information Science at Utsunomiya University. His main interests are in pattern recognition and linguistics.

About the Author: TAKANORI TOYAMA received a B.E. in Information Science from Utsunomiya University, Japan, in 1985. He is now working at Fujitsu Shizuoka Engineering Co. Ltd., Shizuoka, Japan.

About the Author: SACHIKO KAWAI received a B.E. in Information Science from Utsunomiya University, Japan, in 1985. She is now working at Mitsubishi Control Software Co. Ltd., Kobe, Japan.