INFORMATION AND CONTROL 12, 121-142 (1968)
Nth-Order Autocorrelations in Pattern Recognition

J. A. MCLAUGHLIN AND J. RAVIV

International Business Machines Corporation, Thomas J. Watson Research Center, Yorktown Heights, New York 10598

Given a real-valued time function x(t), its nth-order autocorrelation function is defined by

r_x(\tau_1, \ldots, \tau_n) = \int x(t)\,x(t + \tau_1) \cdots x(t + \tau_n)\,dt.

It is easy to see that this function is translation invariant in the sense that x(t) and y(t) = x(t + \tau) have the same nth-order autocorrelation function. In this paper we investigate the properties of these functions and their use in pattern recognition. Even for small values of n the dimensionality of nth-order autocorrelation space can become very high. We describe indirect methods of using nth-order autocorrelations without explicitly computing them. We also investigate conditions for uniqueness in the sense that equality of r_x(\tau_1, \ldots, \tau_n) and r_y(\tau_1, \ldots, \tau_n) implies that x(t) = y(t + \tau) for some \tau. Finally we present some experimental results on character recognition.

I. INTRODUCTION

In most pattern recognition problems, each pattern can be described as a scalar function of time or spatial coordinates. In many cases, a translation or scale change has no effect on class membership. For example, the symbol 5 belongs to the class of fives regardless of position or size. In such cases, position and scale are nuisance parameters that one would like to remove before evaluating decision functions. A common approach is to register and normalize the pattern, but this is often difficult to do reliably. An alternate approach, followed in this paper, is to apply transformations which would eliminate these parameters. Regarding each pattern as a point in a vector space, we wish to map all points corresponding to translated (or scaled) versions of one pattern into a single point. In addition, patterns which differ in other ways should map into
distinct points, and in some sense, patterns which are similar should map into points that are close together.

To simplify notation, most of this paper is written for the special case of real-valued functions of time, but essentially all of the results extend in an obvious way to functions of vectors. In particular, the experiments include functions of two spatial coordinates. Most of the analysis is in terms of continuous variables, but is easily converted to discrete variables with integrals replaced by sums. Practical problems of computation involve finite-dimensional spaces, and some comments are made about dimensions, where appropriate, even though the main discussion is in terms of infinite-dimensional spaces. Unless otherwise noted, all integrals are over the infinite interval (−∞, ∞).

NTH-ORDER AUTOCORRELATION FUNCTIONS
Given a real-valued time function x(t), its nth-order autocorrelation function is defined by

r_x(\tau_1, \ldots, \tau_n) = \int x(t)\,x(t + \tau_1) \cdots x(t + \tau_n)\,dt.    (1)
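As a concrete illustration of (1), the following minimal sketch (ours, not from the paper; it assumes NumPy and a finitely supported discrete signal, and for brevity it tabulates only nonnegative shifts) evaluates the discrete nth-order autocorrelation and checks translation invariance numerically.

```python
import itertools
import numpy as np

def nth_autocorr(x, n, max_shift):
    """Discrete version of (1): r_x(tau_1,...,tau_n) = sum_t x(t) x(t+tau_1)...x(t+tau_n),
    for shifts 0 <= tau_i < max_shift; x is taken to be zero outside its samples."""
    T = len(x)
    xp = np.concatenate([x, np.zeros(max_shift)])   # zero-pad so t + tau never wraps
    r = np.zeros((max_shift,) * n)
    for taus in itertools.product(range(max_shift), repeat=n):
        prod = xp[:T].copy()
        for tau in taus:
            prod = prod * xp[tau:tau + T]
        r[taus] = prod.sum()
    return r

x = np.array([0., 1., 2., 1., 0., 0.])
y = np.roll(x, 1)        # y(t) = x(t - 1); the support stays inside the array
print(np.allclose(nth_autocorr(x, 2, 4), nth_autocorr(y, 2, 4)))   # True
```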
It is easy to see that this function is translation invariant in the sense that x(t) and y(t) = x(t + τ) have the same nth-order autocorrelation function. The use of these functions in pattern recognition was suggested by Horwitz and Shelton (1965), but their experiments were limited to n ≤ 2 because of the high dimensionality of nth-order autocorrelation space. If each τ ranges over m values, nth-order autocorrelation space (denoted by ℛ_n) has dimension m^n (for example, m = 100 and n = 3 already give 10^6 dimensions). In this paper we describe indirect methods of using nth-order autocorrelations in pattern recognition without explicitly computing the autocorrelations, thus avoiding this limitation on n. We also investigate conditions for uniqueness in the sense that equality of r_x(τ_1, …, τ_n) and r_y(τ_1, …, τ_n) implies that x(t) = y(t + τ) for some τ. Finally we present some experimental results on character recognition.

II. PROPERTIES OF NTH-ORDER AUTOCORRELATION FUNCTIONS

Consider a real-valued time function x(t) with nth-order autocorrelation function r_x(τ_1, …, τ_n) and Fourier transform
X(f) = \int x(t)\,e^{-2\pi j f t}\,dt.    (2)
The Fourier transform of r_x(\tau_1, \ldots, \tau_n) is given by

R_x(f_1, \ldots, f_n) = \int \cdots \int r_x(\tau_1, \ldots, \tau_n) \exp[-2\pi j(f_1\tau_1 + \cdots + f_n\tau_n)]\, d\tau_1 \cdots d\tau_n

= \int \cdots \int x(t)\,x(t + \tau_1) \cdots x(t + \tau_n) \exp[-2\pi j(f_1\tau_1 + \cdots + f_n\tau_n)]\, dt\, d\tau_1 \cdots d\tau_n

= X(f_1)\,X(f_2) \cdots X(f_n)\,X(-f_1 - f_2 - \cdots - f_n)

= X(f_1)\,X(f_2) \cdots X(f_n)\,X^*(f_1 + f_2 + \cdots + f_n).    (3)
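The factorization (3) is easy to verify numerically in the circular (DFT) setting, where all shifts are taken modulo N. The sketch below is our illustration (assuming NumPy), not part of the paper; it checks the discrete analogue R_x(f_1, f_2) = X(f_1) X(f_2) X*(f_1 + f_2) for n = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)                       # a real signal

# Circular second-order autocorrelation: r(t1, t2) = sum_t x(t) x(t+t1) x(t+t2) (mod N).
r = np.zeros((N, N))
for t1 in range(N):
    for t2 in range(N):
        r[t1, t2] = np.sum(x * np.roll(x, -t1) * np.roll(x, -t2))

X = np.fft.fft(x)
R = np.fft.fft2(r)                               # 2-D transform of r, as in (3)
f1, f2 = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
print(np.allclose(R, X[f1] * X[f2] * np.conj(X[(f1 + f2) % N])))   # True
```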
SCALE INVARIANCE¹
Now consider the transform² given by

L_x(f) = G[x(t)] = \int e^{f}\,x(t)\,e^{-2\pi j e^{f} t}\,dt = e^{f} X(e^{f}).    (4)

Let y(t) = x(kt), where k is an arbitrary scaling constant. Then

L_y(f) = G[y(t)] = \int e^{f}\,x(kt)\,e^{-2\pi j e^{f} t}\,dt = \int e^{f}\,x(v)\,e^{-2\pi j e^{f} v/k}\,d(v/k)

= \int e^{f - \ln k}\,x(v)\,e^{-2\pi j e^{f - \ln k} v}\,dv = L_x(f - \ln k).    (5)
The change of time scale in x(t) is transformed into a frequency translation in L_x(f), and nth-order autocorrelations can be used to get a composite function which is invariant to scale:

r_{L_x}(\tau_1, \ldots, \tau_n) = \int L_x(f)\,L_x(f + \tau_1) \cdots L_x(f + \tau_n)\,df.    (6)
¹ Since this paper was submitted for publication the authors have formulated a more satisfactory method for handling problems of invariance to scale change and rotation. This method will be presented in a forthcoming paper.
² In an internal IBM Research Note (No. 46, 1962), J. B. Gunn suggested the use of this transform in pattern recognition.
A combination of translation and scale variation can be removed by a composite of three transformations. First find the autocorrelation function r_x(τ_1, …, τ_n) of the original pattern x(t) to remove translation, noting that a scale change in x(t) gives a scale change in r_x(τ_1, …, τ_n). Then apply (4) to convert scale change to translation, and finally take the autocorrelation function again (in the frequency domain) to remove translation. This process is fairly complex and presents serious computational difficulties in most applications. We have not yet explored this problem area in depth,³ and the remainder of the paper will be confined to the problem of translation.

UNIQUENESS FROM TIME DOMAIN VIEWPOINT
We have noted that x(t) and x(t + τ), viewed as two points in a pattern space, are mapped into the same point in nth-order autocorrelation space ℛ_n. But if y(t) is not a translation of x(t), we would like them to be mapped into distinct points. It is easy to show that the first-order autocorrelation function does not have this property, but for a broad class of functions, the second-order (and higher even-order) autocorrelation functions are equal only if one function is a translation of the other. In this section we outline a constructive proof, in the sense that we obtain the original pattern from its second-order autocorrelation function (except for a possible translation). In the next section we give a more general proof in terms of Fourier transforms.⁴

To allow an easy geometric interpretation, the constructive proof is given for a function of two spatial coordinates. Let x(s, t) be an integer-valued function defined for all integer values of s and t, and assume x(s, t) = 0 outside a bounded region. These conditions are more restrictive than necessary, but apply to all functions dealt with by digital computers. Now consider a line l_1 (see Fig. 1) with an irrational slope, say √2. Then at most one point (s_1, t_1) with integer coordinates can lie on l_1. Now place l_1 in the unique position such that x(s_1, t_1) ≠ 0, (s_1, t_1) lies on l_1, and x(s, t) = 0 for all points to the right of l_1. Similarly define a line l_2 with slope √2 positioned such that x(s_2, t_2) ≠ 0, (s_2, t_2) lies on l_2, and x(s, t) = 0 for all points to the left of l_2.

³ See footnote 1.
⁴ The more general proof given in the next section applies to the class of functions whose Fourier transforms are analytic. This is not the same class of functions for which a constructive proof is given in this section.
Fig. 1. Construction for showing uniqueness.

If (s_0, t_0) lies to the left of l_1, then (s_0 + s_2 − s_1, t_0 + t_2 − t_1) lies to the left of l_2 and x(s_0 + s_2 − s_1, t_0 + t_2 − t_1) = 0. If (s_0, t_0) lies to the right of l_1, x(s_0, t_0) = 0. Hence the product of x(s, t) and x(s + s_2 − s_1, t + t_2 − t_1) is nonzero only if (s, t) = (s_1, t_1). The second-order autocorrelation function of x(s, t) is given by
r_x(u_1, v_1, u_2, v_2) = \sum_s \sum_t x(s, t)\,x(s + u_1, t + v_1)\,x(s + u_2, t + v_2).    (7)
Letting Δs = s_2 − s_1 and Δt = t_2 − t_1,

r_x(\Delta s, \Delta t, u_2, v_2) = \sum_s \sum_t x(s, t)\,x(s + s_2 - s_1, t + t_2 - t_1)\,x(s + u_2, t + v_2)
= x(s_1, t_1)\,x(s_2, t_2)\,x(s_1 + u_2, t_1 + v_2),    (8)

r_x(\Delta s, \Delta t, 0, 0) = x^2(s_1, t_1)\,x(s_2, t_2),    (9)

r_x(\Delta s, \Delta t, \Delta s, \Delta t) = x(s_1, t_1)\,x^2(s_2, t_2).    (10)
Combining (8), (9), and (10) we get

x(s_1 + s, t_1 + t) = \frac{r_x(\Delta s, \Delta t, s, t)}{\left[r_x(\Delta s, \Delta t, 0, 0)\,r_x(\Delta s, \Delta t, \Delta s, \Delta t)\right]^{1/3}}.    (11)
In the coordinate system (u, v), let l_3 be a straight line with slope √2 passing through the point (Δs, Δt). Now consider the autocorrelation function

r_x(u, v, 0, 0) = \sum_s \sum_t x^2(s, t)\,x(s + u, t + v)    (12)

in this coordinate system. Letting (u, v) = (Δs, Δt), we see that r_x(Δs, Δt, 0, 0) ≠ 0. It is also easy to see that r_x(u, v, 0, 0) = 0 for all points (u, v) to the left of l_3. Since (Δs, Δt) is the only point with integer coordinates on l_3, we see that (Δs, Δt) can be found from the function r_x(u, v, 0, 0) by the procedure of finding the unique line with slope √2, containing a point for which r_x is nonzero, and such that r_x is zero on all points to the left of the line. This result, in combination with (11), shows that x(s, t) can be found from r_x, except for an unknown translation (s_1, t_1).
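This constructive procedure can be exercised numerically. The sketch below is our illustration (assuming NumPy; the small pattern is an arbitrary example, not from the paper): it evaluates (7) directly, locates (Δs, Δt) as the extreme point of the support of r_x(u, v, 0, 0) in the direction where lines of slope √2 bound it, and recovers the pattern up to translation via (11).

```python
import numpy as np

x = np.array([[0., 2., 1.],
              [1., 3., 0.],
              [0., 2., 0.]])               # small nonnegative integer pattern
S, T = x.shape

def val(s, t):
    """x(s, t), taken as zero outside the array."""
    return x[s, t] if 0 <= s < S and 0 <= t < T else 0.0

def r2(u1, v1, u2, v2):
    """Second-order autocorrelation (7)."""
    return sum(val(s, t) * val(s + u1, t + v1) * val(s + u2, t + v2)
               for s in range(S) for t in range(T))

shifts = [(u, v) for u in range(-S + 1, S) for v in range(-T + 1, T)]

# (ds, dt) = (s2 - s1, t2 - t1) is the point of the support of r(u, v, 0, 0)
# that minimizes u - v/sqrt(2): the slope-sqrt(2) line through it leaves the
# whole support on one side, and carries no other integer point.
supp = [(u, v) for (u, v) in shifts if r2(u, v, 0, 0) != 0]
ds, dt = min(supp, key=lambda p: p[0] - p[1] / np.sqrt(2))

# Equation (11): recover x up to the unknown translation (s1, t1).
denom = (r2(ds, dt, 0, 0) * r2(ds, dt, ds, dt)) ** (1.0 / 3.0)
recon = {(s, t): r2(ds, dt, s, t) / denom for (s, t) in shifts}

# Check against the original, using the (s1, t1) from the construction.
s1, t1 = max(((s, t) for s in range(S) for t in range(T) if x[s, t] != 0),
             key=lambda p: p[0] - p[1] / np.sqrt(2))
print(all(np.isclose(recon[(s, t)], val(s1 + s, t1 + t)) for (s, t) in shifts))  # True
```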
UNIQUENESS FROM FREQUENCY DOMAIN VIEWPOINT
We will now show that if x(t) and y(t) are real-valued functions of time whose Fourier transforms are analytic, then r_x(τ_1, τ_2) = r_y(τ_1, τ_2) implies x(t) = y(t + τ) for some τ.

Proof. Assume r_x(τ_1, τ_2) = r_y(τ_1, τ_2), implying

R_x(f_1, f_2) = R_y(f_1, f_2),    (13)

and

X(f_1)\,X(f_2)\,X(-f_1 - f_2) = Y(f_1)\,Y(f_2)\,Y(-f_1 - f_2).    (14)

It is sufficient to show that for some real number b,

X(f) = e^{jbf}\,Y(f).    (15)

The quotient X(f)/Y(f) is defined for all f such that Y(f) ≠ 0. We assume that X(f) and Y(f) are analytic on the real line, and in the Appendix we show that X(f) has a zero of order k at a point f_1 if and only if Y(f) has a zero of order k at f_1. Therefore X(f)/Y(f) has a continuous extension for all f on the real line. Let I(f) be this extension. From (14) we obtain

I(f_1)\,I(f_2)\,I(-f_1 - f_2) = 1.    (16)
Letting f_1 = f_2 = 0 we have I^3(0) = 1. In the Appendix we show that I(0) is real, thus I(0) = 1. In the Appendix we also show that I(f) ≠ 0 for all f on the real line. Substituting f_2 = 0 in (16) we obtain

I(-f) = \frac{1}{I(f)},    (17)

and thus

I(f_1)\,I(f_2) = I(f_1 + f_2).    (18)

Recalling that I(0) = 1 and applying a theorem in Hille and Phillips (1957), we conclude that

I(f) = e^{(a + jb)f}    (19)

for some real a and b. Setting f_2 = 0 in (16) and recalling that X(−f) = X*(f), we have |I(f)| = 1. This implies that a = 0, and I(f) = e^{jbf}. From the Appendix X(f_1) = 0 if and only if Y(f_1) = 0, and for such f (15) is satisfied trivially. Therefore (15) holds for all f on the real line.

The same type of proof can be extended for n > 2, n even. For the case of n odd we immediately encounter the difficulty that x(t) and y(t) = −x(t) will have the same nth-order autocorrelation function.

It is well known that the class of functions whose Fourier transforms are analytic is very rich. For example, it includes functions which vanish outside an interval (−a, a) for some positive a (see Paley and Wiener (1934)). Chazan and Weiss (1966) pursued this problem in more generality. They verified a conjecture of Adler and Konheim (1962), which states that in general no finite set of nth-order autocorrelation functions forms a complete set of invariants (in the sense that two functions whose nth-order autocorrelation functions agree for all n in a finite set are translates of each other).

III. LINEAR OPERATIONS IN AUTOCORRELATION SPACE
INNER PRODUCTS
Many pattern recognition procedures can be described in terms of inner products. We will now show that an inner product in autocorrelation space can be found directly from the original patterns without explicitly finding the autocorrelations. This result is useful in
computations because the dimension of nth-order autocorrelation space may be very large compared with the original pattern space, particularly for large values of n. For autocorrelations r_x(τ_1, …, τ_n), r_y(τ_1, …, τ_n) of patterns x(t), y(t) we will use the inner product in ℛ_n given by

(r_x, r_y) = \int \cdots \int r_x(\tau_1, \ldots, \tau_n)\,r_y(\tau_1, \ldots, \tau_n)\,d\tau_1 \cdots d\tau_n

= \int \cdots \int \left\{\int x(t)\,x(t + \tau_1) \cdots x(t + \tau_n)\,dt \int y(v)\,y(v + \tau_1) \cdots y(v + \tau_n)\,dv\right\} d\tau_1 \cdots d\tau_n

= \iint x(t)\,y(v) \left\{\int x(t + \tau)\,y(v + \tau)\,d\tau\right\}^{n} dv\,dt

= \iint x(t)\,y(u + t) \left\{\int x(s)\,y(u + s)\,ds\right\}^{n} du\,dt

= \int \left\{\int x(s)\,y(u + s)\,ds\right\}^{n+1} du.    (20)
A discrete version of this result was used throughout our experiments, and is illustrated in the sketch below.
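Concretely, for discrete finitely supported patterns, the identity says that the huge sum over all shift tuples collapses to a single sum over cross-correlation lags. A small numerical check (ours, assuming NumPy; 1-D signals, n = 2):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 5
x = rng.standard_normal(T)
y = rng.standard_normal(T)

def v(a, t):
    return a[t] if 0 <= t < len(a) else 0.0     # zero outside the support

shifts = range(-(T - 1), T)                     # every lag that can be nonzero

def r(a, taus):
    """Discrete nth-order autocorrelation, as in (21), at one shift tuple."""
    return sum(v(a, t) * np.prod([v(a, t + tau) for tau in taus]) for t in range(T))

# Left side of (20): brute-force inner product over all (2T-1)^n shift tuples.
lhs = sum(r(x, taus) * r(y, taus) for taus in itertools.product(shifts, repeat=n))

# Right side, i.e. the discrete version (29): one cross-correlation suffices.
f = lambda u: sum(v(x, t) * v(y, t + u) for t in range(T))
rhs = sum(f(u) ** (n + 1) for u in shifts)
print(np.isclose(lhs, rhs))                     # True
```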
PROJECTIONS

To allow us to use simple matrix analysis, we will now restrict ourselves to finite-dimensional spaces, t and τ taking on only a finite set of values. The nth-order autocorrelation function of a pattern x(t) is given by

r_x(\tau_1, \ldots, \tau_n) = \sum_t x(t)\,x(t + \tau_1) \cdots x(t + \tau_n).    (21)
The values of r_x can be listed in some order, for all possible values of the arguments, to form a column vector r representing the nth-order autocorrelation function of x(t). Let r_1, …, r_k be the corresponding column vectors representing the nth-order autocorrelation functions of patterns x_1(t), …, x_k(t). The inner product between autocorrelations of x(t) and x_i(t) is then given by r'r_i, where the prime symbol denotes transpose.

Suppose r_1, …, r_k are linearly independent, and let W denote the
subspace spanned by them. We will now show how to find the distance from r to W without explicitly finding r, r_1, …, r_k.

The vector r can be expressed as the sum

r = a + b,    (22)

where a lies in W and b is orthogonal to W. Let R be a matrix of k columns consisting of the vectors r_1, …, r_k. Since a is a linear combination of r_1, …, r_k, there is a k-dimensional vector c such that

a = Rc.    (23)

Using the fact that R'b = 0, we then have

R'r = R'a = R'Rc,    (24)

which implies

c = (R'R)^{-1} R'r,    (25)

and

a = R(R'R)^{-1} R'r.    (26)

Thus we have the orthogonal projection of r in W expressed in terms of a nonorthogonal basis for W. To find the squared norm of a we have

\|a\|^2 = a'a = r'R(R'R)^{-1}R'R(R'R)^{-1}R'r = (R'r)'(R'R)^{-1}(R'r).    (27)

The squared norm of b is given by

\|b\|^2 = \|r\|^2 - \|a\|^2 = r'r - (R'r)'(R'R)^{-1}(R'r).    (28)
Notice that the elements of the vector R'r and the matrix R'R are simply inner products between the vectors r, r_1, …, r_k. These inner products can be computed from a discrete version of (20) given by

(r_i, r_j) = \sum_\tau \left\{\sum_t x_i(t)\,x_j(t + \tau)\right\}^{n+1}.    (29)
PRINCIPAL COMPONENTS
Given a set of vectors r_1, …, r_k, suppose we wish to find a basis v_1, …, v_s, s < k, for a subspace W such that the vectors are well
approximated by their orthogonal projections into W. If the error of approximation is measured by the mean square distance to the subspace, a solution is given by the s principal components of the scatter matrix of the vectors r_1, …, r_k (see Wilks, 1962). Various interpretations of principal components in the analysis of multiple measurements were given by Rao (1964). The use of principal components (or Karhunen-Loève expansions) for dimensionality reduction in pattern recognition has been discussed by a number of workers (Watanabe, 1965; Raviv and Streeter, 1965). Here we wish to show that principal component analysis can be carried out indirectly in nth-order autocorrelation space in spite of its high dimensionality.

Let R be the matrix whose columns are the vectors r_1, …, r_k, V a matrix whose columns are vectors v_1, …, v_s, s ≤ k, A a matrix whose columns are vectors a_1, …, a_k, and let

\hat{R} = VA'.    (30)
Given R, we wish to find V and A such that the mean square error

\frac{1}{k} \sum_{i=1}^{k} \|r_i - \hat{r}_i\|^2    (31)

is minimized, where \hat{r}_1, …, \hat{r}_k are the columns of \hat{R}. In the principal component solution, v_1, …, v_s are orthonormal eigenvectors corresponding to the s largest eigenvalues of the symmetric matrix RR', and

a_j = R'v_j.    (32)
The mean square error of approximation is equal to the sum of the remaining eigenvalues divided by k. The rank of RR' is no greater than k, and in our problem is generally much smaller than the dimension of the vectors r_i. But RR' is usually much too large to find eigenvectors directly, and we must resort to an indirect method using the smaller k × k matrix R'R. We first solve the equation

R'R\,w_i = \lambda_i w_i,    (33)

requiring the eigenvectors w_i to be orthonormal. Premultiplying by R we have

RR'(Rw_i) = \lambda_i(Rw_i).    (34)
If λ_i ≠ 0 (the only case of interest) we can let

v_i = \frac{R w_i}{(\lambda_i)^{1/2}},    (35)

and we see that λ_i, v_i are solutions to the equation

RR'\,v_i = \lambda_i v_i    (36)

for the larger matrix RR'. Since the rank of RR' and R'R is the same, there are no other nonzero eigenvalues of RR'. Also note that the set of v_i's obtained this way is orthonormal. From (32),

a_i = \frac{R'R\,w_i}{(\lambda_i)^{1/2}} = (\lambda_i)^{1/2} w_i,    (37)

and these coefficients can be found without explicitly finding any of the vectors r_i or v_i. Note that v_i generally is not a valid nth-order autocorrelation function of any pattern, hence the inner product of v_i with a new vector r must generally be computed by the formula

r'v_i = \frac{r'R\,w_i}{(\lambda_i)^{1/2}},    (38)

which requires k inner products to obtain r'R.
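A compact sketch of this indirect principal component computation (ours, assuming NumPy). Here the k × k matrix G = R'R and the row vector r'R are presumed to come from (29); for the sanity check we fabricate a small explicit R, which would be far too large to form in a real autocorrelation-space problem.

```python
import numpy as np

def indirect_pca(G, s):
    """Solve (33) for the k x k matrix G = R'R and keep the s largest eigenpairs.
    The eigenvectors v_i = R w_i / sqrt(lambda_i) of RR' (35) are never formed."""
    lam, W = np.linalg.eigh(G)               # ascending eigenvalues, orthonormal W
    keep = np.argsort(lam)[::-1][:s]
    return lam[keep], W[:, keep]

def coords(rR, lam, W):
    """Coordinates r'v_i = r'R w_i / sqrt(lambda_i), equation (38)."""
    return (rR @ W) / np.sqrt(lam)

rng = np.random.default_rng(3)
R = rng.standard_normal((50, 4))             # stand-in for the true (huge) R
G = R.T @ R                                  # in practice: entries from (29)
lam, W = indirect_pca(G, s=2)

r = rng.standard_normal(50)                  # a new autocorrelation vector
print(coords(r @ R, lam, W))                 # via the indirect route
print(r @ ((R @ W) / np.sqrt(lam)))          # direct route (35): same numbers
```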
IV. RECOGNITION METHODS AND EXPERIMENTS

In exploring the use of nth-order autocorrelation functions for pattern recognition, we conducted experiments with typewritten samples of the characters a, e, s and handlettered samples of numerics. This recognition problem was chosen because of readily available experimental data, and the existence of performance results with other recognition methods on the same data.

To start with, we wanted to perform simple experiments using only one reference pattern per class. Let x_1(t), …, x_c(t) be the reference patterns for c classes, and let r_1, …, r_c be the corresponding nth-order autocorrelations (where t is a discrete two-dimensional vector corresponding to spatial coordinates, and ranges over a finite set of values). Suppose r is the nth-order autocorrelation of a pattern x(t) which is to be classified. In this application, multiplying x(t) by a scalar factor b does not affect its class membership; that is, y(t) = bx(t) belongs to the same class as x(t).
Fig. 2. Typewritten characters for Experiment I. (a) References. (b) Errors. (c) Typical samples.
Notice that the nth-order autocorrelations of y(t) and x(t) differ only by the multiplicative factor b^{n+1}. Therefore the lengths of r, r_1, …, r_c are irrelevant and it seems appropriate to use the angle between autocorrelation vectors as a measure of similarity. Specifically, for each class i we computed the score

s_i = \frac{(r, r_i)^2}{(r, r)(r_i, r_i)},    (39)
which gives the squared cosine of the angle between r and r_i in nth-order autocorrelation space. Then x(t) was assigned to a class k such that

s_k \geq s_i, \quad i = 1, \ldots, c.    (40)
This procedure was also used by Horwitz and Shelton (1961) with first-order autocorrelation functions. Our work was largely motivated by additional small-scale experiments with second-order autocorrelations of patterns perturbed by additive noise, described by Shelton in a private communication. By the Schwarz inequality, 0 ≤ s_i ≤ 1, and s_i = 1 only if r and r_i are collinear. Notice that we restricted our choice of references r_i to valid nth-order autocorrelations of some patterns x_i(t). This restriction was used to allow computation of inner products from (29).
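Combining (29), (39), and (40) gives the complete one-reference-per-class decision rule without ever materializing an autocorrelation vector. A sketch (ours, assuming NumPy; 1-D toy patterns stand in for the two-dimensional rasters of the experiments):

```python
import numpy as np

def autocorr_inner(a, b, n):
    """Inner product (29) of the nth-order autocorrelations of a and b."""
    f = np.correlate(b, a, mode="full")      # all lags of sum_t a(t) b(t+u)
    return float(np.sum(f ** (n + 1)))

def classify(x, refs, n):
    """Squared-cosine score (39) against each reference; decision rule (40)."""
    rr = autocorr_inner(x, x, n)
    s = [autocorr_inner(x, xi, n) ** 2 / (rr * autocorr_inner(xi, xi, n))
         for xi in refs]
    return int(np.argmax(s)), s

refs = [np.array([1., 2., 1., 0., 0., 0.]),
        np.array([1., 0., 1., 0., 1., 0.]),
        np.array([2., 2., 0., 0., 0., 0.])]
x = np.array([0., 0., 1., 2., 1., 0.])       # a translate of the first reference
k, s = classify(x, refs, n=4)
print(k, s)                                  # class 0, with score 1.0
```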
EXPERIMENT I: TYPEWRITTEN a, e, s

The data for this experiment consisted of 1354 samples of typewritten characters a, e, s. They were optically scanned and represented in discrete black and white form as illustrated in Fig. 2 (a black point corresponds to x(t) = 1, and absence of a black point corresponds to x(t) = 0). The characters were obtained from field documents and represent a variety of typewriters and fonts. Typical characters are given in Fig. 2(c). The classes a, e, s were chosen because previous experiments with other recognition methods indicated they were the most difficult to classify. The first pattern from each class was chosen to be a reference character x_i(t). As can be seen in Fig. 2(a), they are from a common font and are of relatively good quality.

Decision runs were made using (39) and (29), with n ranging from 1 to 12. The number of errors is shown in Table I for each value of n. Notice that the performance improves rapidly with increasing n until fairly large values of n are reached. Explicit computation of nth-order autocorrelations would be impossible for these cases.
TABLE I
EXPERIMENT I: 1354 SAMPLES OF a, e, s

n                Errors
1                218
2                64
3                26
4                12
5                5
6                5
7                5
8                4
9                3
10               3
11               3
12               3
Matched filter   7
For n = 12, the three characters classified incorrectly are shown in Fig. 2(b). In each case, s was called e. For purposes of comparison, we also used a matched filter approach in which we found

s_i = \frac{\max_\tau \sum_t x(t)\,x_i(t + \tau)}{\left\{\sum_t x^2(t)\right\}^{1/2} \left\{\sum_t x_i^2(t)\right\}^{1/2}}.    (41)
This method gave seven errors.

We also ran an experiment with multi-level values of x(t) by using the output of the scanner before the threshold operation. Performance did not differ greatly from the binary case.

EXPERIMENT II: HANDLETTERED NUMERICS

To obtain experience with patterns having greater variability, we next worked with a collection of handlettered numeric characters.⁵ We assumed at the outset that the simple decision method of Experiment I would not be adequate, and looked for reasonable ways to select k representative characters x_{ij}(t), j = 1, …, k (with nth-order autocorrelation vectors r_{ij}) for each class i. Let W_i be the subspace spanned by r_{ij}, j = 1, …, k. If x(t) belongs to class i, we would like the orthogonal projection of its autocorrelation r into W_i to be large, or equivalently the distance from r to W_i to be small. Then the projection of r in W_i becomes a good approximation of r.

⁵ These numerals were produced in the course of routine operations by four inventory clerks in a department store.

To estimate how good an approximation is possible, we did a principal component analysis of 100 normalized seventh-order autocorrelations of numeric fours, using the method described in Section III. The elements in the matrix R'R of Equation (33) were set equal to

\frac{(r_{4i}, r_{4j})}{\|r_{4i}\| \|r_{4j}\|}, \quad i = 1, \ldots, 100, \quad j = 1, \ldots, 100,
the inner products being computed by (29). The first twenty eigenvalues are listed in Table II.

TABLE II
EIGENVALUES OF R'R FOR 100 FOURS

19.55   2.85   1.56   1.11
 7.86   2.29   1.34   1.08
 5.83   1.95   1.27   1.07
 4.65   1.77   1.25   1.01
 3.01   1.76   1.18   0.95

On the average, one-third of the "energy" in each autocorrelation vector was projected into the first three eigenvectors, and half the "energy" was projected into the first ten eigenvectors (since the vectors were normalized, the sum of the eigenvalues is 100). Of course this experimental result depended on the choice of n. As n is increased, the nth-order autocorrelation vectors tend to be more nearly orthogonal; but on the other hand, the approximation needn't be as good.

We considered selecting representative samples for each class by choosing sample nth-order autocorrelations that fall closest to the first few eigenvectors, but this procedure has two possible disadvantages: there may not be any sample autocorrelations close to an eigenvector, and the mean square distance criterion is probably not the most appropriate for this problem. We felt that more emphasis should be given to the few sample autocorrelations lying nearly orthogonal to the subspace we plan to project into. We then decided to try the following procedure: Given a set of sample patterns from one class, pick the pattern x_1(t) from this set whose nth-order autocorrelation vector r_1 has the largest
norm. Then pick the second pattern x_2(t) such that the angle between r_2 and r_1 is maximized. Next pick x_3(t) such that the angle between r_3 and its projection into the subspace spanned by r_1, r_2 is maximized. In general, pick x_j(t) to maximize the angle between r_j and its projection
a_j on the subspace spanned by r_1, …, r_{j-1}. Using (26) we find that the squared cosine of this angle is given by

\left\{\frac{(r_j, a_j)}{\|r_j\| \|a_j\|}\right\}^2 = \frac{(R'r_j)'(R'R)^{-1}(R'r_j)}{r_j' r_j},    (42)
where R is the matrix of column vectors r_1, …, r_{j-1}.

Using this technique, we picked 10 patterns out of 100 samples for each of the 10 classes, and the selected patterns are shown in Fig. 3 in order of selection.

Fig. 3. Outlying patterns selected in Experiment II.

As before, we used seventh-order autocorrelations. Let W_i denote the subspace spanned by the seventh-order autocorrelations of the 10 patterns selected for class i. Let a_i now denote the projection of r on W_i, where r is the seventh-order autocorrelation vector of an unknown pattern x(t) which we wish to classify. Now suppose we assign x(t) to a class k such that
\|a_k\| \geq \|a_i\|, \quad i = 1, \ldots, 10.    (43)
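Both the selection rule (42) and the decision rule (43) need only Gram-matrix algebra on inner products computed from (29). A sketch (ours, assuming NumPy; `K` holds the pairwise inner products of all candidate autocorrelation vectors, and the random stand-ins below replace the values that (29) would supply):

```python
import numpy as np

def greedy_select(K, m):
    """Select m indices: first the candidate of largest norm, then repeatedly
    the one whose squared cosine (42) to the span of those already chosen is
    smallest, i.e. whose angle to its projection is largest."""
    chosen = [int(np.argmax(np.diag(K)))]
    while len(chosen) < m:
        G = K[np.ix_(chosen, chosen)]                 # R'R for current picks
        cos2 = {}
        for j in range(K.shape[0]):
            if j not in chosen:
                g = K[chosen, j]                      # R'r_j
                cos2[j] = g @ np.linalg.solve(G, g) / K[j, j]
        chosen.append(min(cos2, key=cos2.get))
    return chosen

def classify(class_grams, class_dots):
    """Decision rule (43): pick the class maximizing ||a_i||^2 from (27).
    class_grams[i] = R_i'R_i and class_dots[i] = R_i'r, both from (29)."""
    norms = [g @ np.linalg.solve(G, g) for G, g in zip(class_grams, class_dots)]
    return int(np.argmax(norms))

rng = np.random.default_rng(4)
V = rng.standard_normal((12, 40))     # stand-ins for candidate autocorrelations
print(greedy_select(V @ V.T, m=3))    # indices of the selected "outlying" patterns
```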
This experiment was run with a set of 3100 characters (distinct from the set used for finding the subspaces W_i), and 38 classification errors were made.

Another decision experiment was run as follows: Let r_i, i = 1, …, 100 be the seventh-order autocorrelation vectors of the patterns shown in Fig. 3. For each pattern x(t) with seventh-order autocorrelation vector r, compute the column vector s with elements

s_i = \frac{r'r_i}{\|r\| \|r_i\|}.    (44)

These may be viewed as measurements. For a linear decision we compute a vector z = As, where z has 10 elements z_i, and assign x(t) to a class k such that

z_k \geq z_i, \quad i = 1, \ldots, 10.

To determine the matrix A, we used a separate set of 2725 sample patterns in a training procedure which adjusted A to try to make

0.5\,z_k \geq z_i, \quad i = 1, \ldots, 10, \quad i \neq k,

where k is the correct class. On the test set, 18 errors were made. Fig. 4 shows the characters which were classified incorrectly, and below each character the correct class and chosen class are indicated.

Fig. 4. Characters misrecognized in Experiment II.

Several zeroes
were classified incorrectly because the scanner improperly scanned only part of the character, a difficulty that was not present in the training set. In other experiments with the same data, Bakis, Herbst and Nagy⁶ obtained 23 errors using a combination of n-tuple and topological measurements.

⁶ Our colleagues R. Bakis, N. Herbst, and G. Nagy performed extensive experiments on handlettered number recognition, and we used part of their data in our experiments.

V. DISCUSSION AND CONCLUSIONS

We have shown that for a broad class of functions, x(t) and y(t) can have the same second-order autocorrelation function only if x(t) = y(t + τ) for all t and some τ. Thus decision functions applied to second-order autocorrelations give a composite function which essentially includes all translation invariant decision functions. This suggests that we could concentrate our attention on the second-order case. But we obtained better experimental results with higher-order autocorrelations, apparently because they allow the use of simple decision functions.

Our method of calculating inner products in nth-order autocorrelation space allows the use of large n, at least for exploratory experiments on a general purpose digital computer. We have also given examples of feasible computational methods for fairly general pattern recognition techniques applied in autocorrelation space.

We have not yet made a serious study of the practical application of these ideas in recognition systems. It is apparent that a major problem is to obtain fast and economical computation of the function
f(\tau) = \sum_t x(t)\,y(t + \tau)

(which can be written as a convolution). This function is of great importance in many applications, and much work has been done on different ways of implementing it; one fast implementation is sketched below.
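One standard fast route (our sketch, assuming NumPy; the paper does not prescribe an implementation) computes every lag of f at once by pointwise multiplication in the frequency domain, zero-padding so that the circular correlation does not wrap:

```python
import numpy as np

def cross_corr(x, y):
    """All lags of f(tau) = sum_t x(t) y(t + tau), computed via the FFT."""
    N = len(x) + len(y) - 1                      # room for every nonzero lag
    F = np.conj(np.fft.rfft(x, N)) * np.fft.rfft(y, N)
    f = np.fft.irfft(F, N)
    return np.roll(f, len(x) - 1)                # entry len(x)-1 is lag tau = 0

x = np.array([1., 2., 0., 1.])
y = np.array([0., 1., 2., 0., 1.])
direct = [sum(x[t] * y[t + u] for t in range(len(x)) if 0 <= t + u < len(y))
          for u in range(-(len(x) - 1), len(y))]
print(np.allclose(cross_corr(x, y), direct))     # True
```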
It is interesting to note the similarity (and differences) between our use of \sum_\tau \{f(\tau)\}^{n+1} and Woodward's (1953) use of \sum_\tau \exp \lambda f(\tau) for computing a probability in the presence of a nuisance translation parameter. If the term for one τ dominates all others in the sum, both functions are closely related to

\max_\tau f(\tau),
which corresponds to the peak response of a matched filter. In our experiments, the last expression was not as effective as the first for moderately large n.

Optical processing techniques offer the possibility of very fast filtering and correlation operations. A discussion of the application of these techniques to character recognition is given in Vander Lugt et al. (1965). Bliss and Crane (1965) mention functions of the form \sum_\tau \{f(\tau)\}^{n+1} in the context of optical processing, but apparently failed to apply a suitable normalization procedure.

Our experiments show that nth-order autocorrelation functions can be applied effectively in pattern recognition, and in our opinion, this is a rich area for further research.

APPENDIX

We shall show that if X(f) and Y(f) are analytic on the real line R and if
X(f_1)\,X(f_2)\,X^*(f_1 + f_2) = Y(f_1)\,Y(f_2)\,Y^*(f_1 + f_2),    (45)
then X(f) has a zero of order k at a point f_1 if and only if Y(f) has a zero of order k at f_1.

Proof. We shall first show that given a point f_1 there exists a point f^0 such that
(a) X(f^0) ≠ 0,
(b) X(f_1 + f^0) ≠ 0,
(c) Y(f^0) ≠ 0,
(d) Y(f_1 + f^0) ≠ 0.
Assume this is not true. Then for all f ∈ R at least one of these conditions does not hold. Let us define

F_a = {f : condition (a) does not hold},
and similarly F_b, F_c, and F_d. Therefore

F_a ∪ F_b ∪ F_c ∪ F_d = R,

and at least one set must have an uncountable number of points. But this contradicts the fact that X(f) and Y(f) were assumed to be analytic (an analytic function that is not identically zero has at most countably many zeros on R). Thus given f_1 there exists an f^0 such that (a), (b), (c), and (d) hold, and let us define
c_1(f) = \frac{Y(f^0)\,Y^*(f + f^0)}{X(f^0)\,X^*(f + f^0)}.    (46)
Since X(f) and Y(f) were assumed to be analytic and (a), (b), (c), and (d) hold, it follows that c_1(f) is defined and continuous in a neighborhood of f_1 and

\lim_{f \to f_1} c_1(f) = c_1(f_1) \neq 0.

Moreover, for some neighborhood of f_1,

0 < k_1 \leq |c_1(f)| \leq k_2 < \infty.
Substituting (46) in (45) we obtain

X(f) = c_1(f)\,Y(f)    (47)
in a neighborhood of f_1. It therefore follows that the zeros of X(f) and Y(f) are of the same order.

Let us also note that given f_1 = 0, there exists an f^0 such that

\lim_{f \to 0} \frac{X(f)}{Y(f)} = \lim_{f \to 0} c_0(f) = c_0(0) = \frac{Y(f^0)\,Y^*(f^0)}{X(f^0)\,X^*(f^0)} = \frac{|Y(f^0)|^2}{|X(f^0)|^2} = I(0) = \text{real} \neq 0.
[I(0) is clearly real if X(0) ≠ 0 and Y(0) ≠ 0.]

RECEIVED: June 6, 1967

REFERENCES

ADLER, R. L., AND KONHEIM, A. G. (1962), A note on translation invariants. Proc. Am. Math. Soc. 13, 425-428.

BLISS, J. C., AND CRANE, H. D. (1965), Relative motion and nonlinear photocells in optical image processing. Proceedings of Symposium on Optical and Elec-
tro-Optical Information Processing, M.I.T. Press, Cambridge, Massachusetts, pp. 615-637.

CHAZAN, D., AND WEISS, B. (1966), "A Further Note on Translation Invariants." IBM Research Paper RC-1659.

HILLE, E., AND PHILLIPS, R. S. (1957), "Functional Analysis and Semi-Groups," rev. ed. American Mathematical Society, Providence, Rhode Island (American Mathematical Society Colloquium Publications, Vol. 31), pp. 144-145.

HORWITZ, L. P., AND SHELTON, JR., G. L. (1961), Pattern recognition using autocorrelations. Proc. IRE 49, 175-185.

HORWITZ, L. P., AND SHELTON, JR., G. L. (1965), Specimen identification techniques employing nth-order autocorrelation functions. U.S. Patent No. 3,196,397.

PALEY, R. E. A. C., AND WIENER, N. (1934), "Fourier Transforms in the Complex Domain." American Mathematical Society, Providence, Rhode Island (American Mathematical Society Colloquium Publications, Vol. 19), pp. 3-4.

RAO, C. R. (1964), The use and interpretation of principal components analysis in applied research. Sankhya, Indian J. Stat., Ser. A 26, 329-358.

RAVIV, J., AND STREETER, D. N. (1965), "Linear Methods for Biological Data Processing." IBM Research Report RC-1577.

VANDER LUGT, A., ROTZ, F. B., AND KLOOSTER, JR., A. (1965), Character reading by optical spatial filtering. Proceedings of Symposium on Optical and Electro-Optical Information Processing, M.I.T. Press, Cambridge, Massachusetts, pp. 125-141.

WATANABE, S. (1965), Karhunen-Loève expansion and factor analysis, theoretical remarks and applications. Proceedings of Fourth Prague Conference on Information Theory, Prague, Czechoslovakia.

WILKS, S. S. (1962), "Mathematical Statistics." Wiley, New York, London.

WOODWARD, P. M. (1953), "Probability and Information Theory, with Applications to Radar." McGraw-Hill, New York, pp. 72-73, 81-82.