Pattern Recognition Letters 12 (1991) 635-643 North-Holland
October 1991
A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks

A.E. Grace and M. Spann

School of Electronic and Electrical Engineering, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
Received 9 January 1991
Revised 19 June 1991
Abstract

Grace, A.E. and M. Spann, A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks, Pattern Recognition Letters 12 (1991) 635-643.

Two algorithms for invariant object recognition are compared using a neural network and two statistical classifiers on features taken from noiseless and noisy images. It was found that the Fourier-Mellin descriptors perform as well as moment based features for noiseless images but perform significantly better when noise is added. Neural networks perform better than statistical classifiers when noise is added. It was also found that Fourier-Mellin descriptors are very sensitive to uncertainty in the position of the object centroid, whereas the moment based features provide good performance when the object is not presented at the centroid.
Keywords. Moment based features, Fourier-Mellin descriptors, neural networks, invariant object recognition.
1. Introduction

This paper compares two methods for invariant object recognition. The first method is based on spatial moments computed across the image [1] and the second is a combination of the circular harmonic expansion [2] and the Mellin transform [3,4]. These methods are used with a neural network and two statistical classifiers, the nearest neighbour classifier and the minimum mean distance classifier. The paper demonstrates the effectiveness of these methods for a number of scaled and rotated objects for varying signal to noise ratios.
2. Moment invariant features

Moment invariant features are a set of non-linear functions which are invariant to translation, scale and orientation and are defined on the geometrical moments of the image. The M × M image plane is first mapped onto a square defined by x ∈ [−1, +1], y ∈ [−1, +1]. The
0167-8655/91/$03.50 © 1991 - - Elsevier Science Publishers B.V. All rights reserved
(p + q)th geometrical moment for the image described by f(x, y) is defined as

m_pq = Σ_{x=−1}^{+1} Σ_{y=−1}^{+1} x^p y^q f(x, y).   (2.1)

To make these moments invariant to translation, a central moment can be defined as

μ_pq = Σ_{x=−1}^{+1} Σ_{y=−1}^{+1} (x − x̄)^p (y − ȳ)^q f(x, y)   (2.2)

with

x̄ = m_10/m_00   and   ȳ = m_01/m_00.

Central moments can be normalised to become invariant to scale change by defining

η_pq = μ_pq / μ_00^γ,   γ = (p + q)/2 + 1.   (2.3)

A set of non-linear functions defined on η_pq which are invariant to rotation, translation and scale has been derived [1]. They are

(a) φ1 = η_20 + η_02,
(b) φ2 = (η_20 − η_02)² + 4η_11²,
(c) φ3 = (η_30 − 3η_12)² + (3η_21 − η_03)²,
(d) φ4 = (η_30 + η_12)² + (η_21 + η_03)²,
(e) φ5 = (η_30 − 3η_12)(η_30 + η_12)[(η_30 + η_12)² − 3(η_21 + η_03)²] + (3η_21 − η_03)(η_21 + η_03)[3(η_30 + η_12)² − (η_21 + η_03)²],
(f) φ6 = (η_20 − η_02)[(η_30 + η_12)² − (η_21 + η_03)²] + 4η_11(η_30 + η_12)(η_21 + η_03).

The absolute values of these six functions are selected as features representing the image.

3. Fourier-Mellin descriptors

Fourier-Mellin descriptors transform the image space to provide a code which is invariant to scale and rotation. The following describes how this is achieved.

3.1. Rotation invariance

Rotation invariance is achieved using the circular harmonic function expansion [2]. The image is sampled radially from the centre of the image space and its spectral content obtained. The Fourier series of these samples is taken. Removing the phase yields a rotation independent code for the image at radius r. The complete image can be described by sampling in r for particular harmonics depending upon the images. More formally, the image space is a two-dimensional function described by f̃(r, θ). The convention used here is that f̃( ) denotes polar coordinates and f( ) denotes cartesian coordinates. The image can be expressed as a Fourier series:

f̃(r, θ) = Σ_{n=−∞}^{+∞} f_n(r) e^{jnθ}   (3.1)

where

f_n(r) = (1/2π) ∫_0^{2π} f̃(r, θ) e^{−jnθ} dθ,   (3.2)

n = ..., −1, 0, 1, ..., is known as the nth circular harmonic expansion coefficient of f̃(r, θ). It can be shown that the vector

(|f_0(r)|, ..., |f_N(r)|)

is a rotation invariant signature for f̃(r, θ) that can be used for rotation invariant pattern recognition.

3.2. Scale invariance

Scale invariance is achieved using the Mellin transform M(u, v) [3,4] of an image. For an image f(x, y) the Mellin transform is defined by
M_f(u, v) = ℳ[f(x, y)] = ∫_0^∞ ∫_0^∞ f(x, y) x^{ju−1} y^{jv−1} dx dy   (3.3)

where ℳ denotes the Mellin transform. The Mellin transform can be realised efficiently by using the non-linear mapping x = e^{−ξ} and y = e^{−η}. Now

M_f(u, v) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} h(ξ, η) e^{−j(uξ + vη)} dξ dη   (3.4)
where

h(ξ, η) = f(e^{−ξ}, e^{−η}).

Equation (3.4) shows that the Mellin transform can be implemented by a non-linear mapping followed by a two-dimensional Fourier transform.

Figure 1. Algorithm for signature extraction: the circular harmonic component f_n(r) is extracted from f̃(r, θ), the non-linear mapping r = e^{−x} is applied, and the fast Fourier transform of f_n(e^{−x}) is taken; the magnitude |M_{f_n}(w)| is the rotation and scale invariant signature.
3.3. Combined circular harmonic expansion and Mellin transform

The two methods previously described for scale and rotation invariance can be combined into a single algorithm. Consider a scaled and rotated version of an image

g̃(r, θ) = f̃(αr, θ + a).   (3.5)

The Mellin transform of its nth circular harmonic is given by

M_{g_n}(w) = e^{jna} α^{jw} M_{f_n}(w)   (3.6)

where

M_{f_n}(w) = ∫_0^∞ f_n(r) r^{jw−1} dr.   (3.7)

Taking the modulus of both sides yields a rotation and scale invariant signature for g̃(r, θ), since the dependence on α and a occurs only in the phase.
3.4. Implementation

Equation (3.7) can be realised by the substitution r = e^{−x}. This gives

M_{f_n}(w) = ∫_{−∞}^{+∞} f_n(e^{−x}) e^{−jwx} dx.   (3.8)
Hence, a transformation by the mapping r = e^{−x} is required, followed by a Fourier transform, to obtain M_{f_n}(w). Figure 1 shows the complete algorithm for rotation and scale signature extraction.
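The pipeline of Figure 1 can be sketched as follows, assuming the image has already been resampled onto a polar grid polar[k, t] = f̃(r_k, θ_t) with exponentially spaced radii r_k = e^{−x_k} (uniform x_k) and uniform angles, so that both eq. (3.2) and eq. (3.8) reduce to FFTs. The paper does not give its implementation details, so the function name and harmonic count are illustrative.

```python
import numpy as np

def fm_signature(polar, n_harmonics=4):
    """Rotation and scale invariant signature (Figure 1 pipeline).

    polar[k, t] holds ftilde(r_k, theta_t) with r_k = exp(-x_k) sampled
    at uniform steps in x and theta_t uniform on [0, 2*pi).
    """
    # Eq. (3.2): circular harmonic coefficients f_n(r) via an FFT over theta.
    harmonics = np.fft.fft(polar, axis=1) / polar.shape[1]
    sig = []
    for n in range(n_harmonics):
        fn = harmonics[:, n]          # f_n at each log-spaced radius
        # Eq. (3.8): the Mellin transform of f_n becomes a Fourier
        # transform over x because the radii are exponentially spaced.
        Mfn = np.fft.fft(fn)
        # Rotation and scale dependence sit in the phase; keep the magnitude.
        sig.append(np.abs(Mfn))
    return np.concatenate(sig)
```

A rotation of the object corresponds to a circular shift along the θ axis of the polar grid, which changes only the phase of the harmonics, so the signature is unchanged.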
4. Classifiers

In these tests three classifiers were used. The multi-layer perceptron is compared with two statistical classifiers, the nearest neighbour and minimum mean distance types.
4.1. The multi-layer perceptron The type of neural network used is known as the multi-layer perceptron. It is a feed-forward type of network with input and output layers of nodes. Between the inputs and outputs are more layers of nodes called hidden nodes. The output of a node is connected to the input of every node in the following layer via a variable weight. The weights are adjusted according to the back-propagation algorithm presented in [5]. For training, all outputs are set to zero except the desired output which is set to one. In this paper the inputs are either the moment invariant features or the Fourier-Mellin descriptors. The multi-layer perceptron used has two hidden layers. For the moment invariant features, the network structure has 6 input nodes, 6 nodes in the first and second hidden layers and
Figure 2. The 10 characters used.
10 output nodes. For the Fourier-Mellin descriptors, the network structure had 256 input nodes, 128 nodes in the first hidden layer, 64 nodes in the second hidden layer and 10 output nodes. There is no precise way to choose the number of nodes in the hidden layers. In [6] it was suggested that the number of second layer nodes required in the worst case should at least be equal to the number of disconnected regions in the input distributions. It was also suggested that there should be more than three times as many nodes in the second layer as in the first layer. However, it has been shown in [3] that increasing the number of hidden nodes does not improve performance.
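A minimal numpy sketch of such a two-hidden-layer perceptron trained by back-propagation [5] is given below. The initialisation, learning rate and toy layer sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Feed-forward network with sigmoid units trained by on-line
    back-propagation (gradient descent on the squared error)."""

    def __init__(self, sizes, seed=0):
        # e.g. sizes = (6, 6, 6, 10) for the moment invariant features
        rng = np.random.default_rng(seed)
        self.W = [rng.normal(0.0, 0.5, (a, b))
                  for a, b in zip(sizes, sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        acts = [np.asarray(x, dtype=float)]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(acts[-1] @ W + b))
        return acts

    def train_step(self, x, target, lr=0.5):
        acts = self.forward(x)
        # Delta at the output layer for squared error with sigmoid units.
        delta = (acts[-1] - target) * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(self.W))):
            if i > 0:  # propagate before the weights are overwritten
                delta_prev = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
            self.W[i] -= lr * np.outer(acts[i], delta)
            self.b[i] -= lr * delta
            if i > 0:
                delta = delta_prev

    def predict(self, x):
        return int(np.argmax(self.forward(x)[-1]))
```

Training follows the scheme in the text: the desired output node is set to one and all others to zero.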
4.2. The nearest neighbour classifier

When an unknown sample X is to be classified, the nearest neighbour of X is found among the pool of all available training samples and its label assigned to X. To prevent one subgroup of vector components dominating another, the feature vectors are normalised. The normalisation consists of subtracting the sample mean and dividing by the standard deviation of the corresponding class. Let t_k^{(i)} = [t_{k1}^{(i)}, ..., t_{kn}^{(i)}] and N_i denote the kth n-dimensional training feature vector of the ith class and the number of available training samples of class i respectively. The unknown test sample X is assigned to the class i* which has minimum city block distance from X in feature space, where

i* = min_i d(X, t_k^{(i)}),   i = 1, 2, ..., C,   k = 1, 2, ..., N_i,   (4.1)

d(X, t_k^{(i)}) = Σ_{m=1}^{n} |x_m − t_{km}^{(i)}| / σ_m^{(i)}   (4.2)

with σ_m^{(i)} representing the standard deviation of the mth component of the n-dimensional training feature vectors of class i and where C is the number of classes.

4.3. Minimum mean distance rule
The weighted minimum mean distance classifier characterises each category by the means and standard deviations of the components of its training feature vectors. The weighted distance d(X, i) between an unknown sample X and the mean of the features of class i is then measured. The weighting factor is the standard deviation of the respective feature. The unknown sample is then assigned to the class i* which has minimum distance from X in feature space, that is,

i* = min_i d(X, i),   i = 1, 2, ..., C   (4.3)

and

d(X, i) = Σ_{m=1}^{n} |x_m − t̄_m^{(i)}| / σ_m^{(i)}   (4.4)

where t̄_m^{(i)} represents the sample mean of the mth component of the n-dimensional training feature vectors of class i. Again, weighting by the standard deviation is to balance the effect of all n feature vector components on the distance d.

Figure 3. Various scales and rotations for H, E and O.

Figure 4. Character H with different levels of noise. From left to right the signal to noise ratio is noiseless, 50, 25, 12, 8 and 5 dB.

Figure 5. Noise immunity test for moment invariant features, 3 samples per class (percentage correct classification against signal to noise ratio for the neural network, nearest neighbour and minimum mean distance classifiers).
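The two statistical classifiers of Sections 4.2 and 4.3 can be sketched directly from equations (4.1)-(4.4). The function names are illustrative, and sigma is assumed to hold the per-class, per-feature standard deviations estimated from the training set.

```python
import numpy as np

def nearest_neighbour(x, train_feats, train_labels, sigma):
    """Eqs. (4.1)-(4.2): label of the nearest training sample under the
    city block distance weighted by that sample's class deviations."""
    dists = [np.sum(np.abs(x - t) / sigma[c])
             for t, c in zip(train_feats, train_labels)]
    return train_labels[int(np.argmin(dists))]

def min_mean_distance(x, means, sigma):
    """Eqs. (4.3)-(4.4): class whose weighted distance to the class
    mean is smallest."""
    dists = [np.sum(np.abs(x - means[i]) / sigma[i])
             for i in range(len(means))]
    return int(np.argmin(dists))
```

The minimum mean distance rule stores only one mean vector per class, whereas the nearest neighbour rule compares against every stored training sample.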
5. The test data sets

The images used were 256 × 256 binary images. The characters used are shown in Figure 2. For the training sets, random rotations and scales of each image were generated. Figure 3 shows various scales and rotations for three images.

In addition to the above noiseless image set, five other sets of noisy images with respective signal to noise ratios of 50, 25, 12, 8 and 5 dB were also constructed from the noiseless images. The noise was added by randomly selecting some of the pixels of a noiseless image and inverting their values. The random pixel selection is done according to a uniform probability distribution. The signal to noise ratio is computed using

SNR = 20 log[(N² − L)/L]

where L is the number of pixels which are different between the noisy images and the noiseless images and N is the image size. Figure 4 shows one image with different signal to noise ratios.

Figure 6. Translation invariance tests for moment invariant features (percentage correct classification against pixels displaced from the centroid).
In a realistic situation the position of the character in the image space is uncertain. Ten noiseless images from each class at random rotations and scales are shifted one pixel at a time in the x-direction to determine the effect upon classification.
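The noise model above can be sketched by inverting the signal to noise ratio formula for the number of flipped pixels, L = N²/(10^{SNR/20} + 1). The paper does not give this inversion explicitly, so the function below is a sketch under that reading.

```python
import numpy as np

def add_noise(img, snr_db, rng=None):
    """Invert L uniformly chosen pixels of a binary image so that
    20*log10((N*N - L)/L) equals the requested signal to noise ratio."""
    rng = np.random.default_rng(rng)
    N = img.shape[0]
    # Solve 20*log10((N^2 - L)/L) = snr_db for L.
    L = int(round(N * N / (10.0 ** (snr_db / 20.0) + 1.0)))
    flat = img.astype(np.uint8).ravel().copy()
    idx = rng.choice(flat.size, size=L, replace=False)  # uniform selection
    flat[idx] ^= 1                                      # invert the values
    return flat.reshape(img.shape)
```

For a 256 × 256 image at 25 dB this flips roughly 3500 pixels, and the achieved ratio matches the request to within rounding.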
6. Description of experiments
For training, data sets were created consisting of 3, 5 and 10 presentations of each randomly rotated and scaled image. For testing, the data set comprised 10 presentations of each randomly rotated and scaled image. In experiments with noisy images, each classifier was trained using noiseless images and was tested with noisy images. For the neural network, the training features were normalised to have zero mean and unit variance. This is necessary to ensure that a subgroup of the features does not dominate the weight adjustment process during training. The mth feature is normalised by

F_m = (f_m − f̄_m) / σ_m   (6.1)

where f̄_m and σ_m are the sample mean and standard deviation of the mth training feature over all the classes.
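The normalisation of eq. (6.1) can be sketched as follows; the statistics are estimated from the training set and reapplied to the test set, which is our reading of the text, and the names are illustrative.

```python
import numpy as np

def normalise(train, test):
    """Eq. (6.1): zero mean, unit variance per feature, with the mean
    and standard deviation estimated from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (test - mean) / std
```

Reusing the training statistics on the test set keeps the two sets on the same scale without letting test data influence the normalisation.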
7. Experiments with moment invariant features

Figure 5 shows a graph of results when trained with 3 samples per class. The moment invariant features perform well for noiseless images, achieving 99% correct classification. However, when the signal to noise ratio is 50 dB or below, none of the classifiers can perform better than 10%. This is because the image space has been defined between −1 and +1. Added noise near these limits has a greater effect than added noise around the image origin. Hence even if no noise is added to the object, the effect of noise near the edges of the image will dominate the moments. This effect increases greatly for higher order moments. If the number of samples per class is increased to 5 and 10, performance does not improve. Figure 6 shows the effect of translation on the moment invariant features. All classifiers gave 96% or better classification even when the centroid was displaced by 30 pixels.
Figure 7. Noise immunity tests for Fourier-Mellin descriptors, 3 samples per class (percentage correct classification against signal to noise ratio).
Figure 8. Noise immunity tests for Fourier-Mellin descriptors, 5 samples per class.
8. Experiments with Fourier-Mellin descriptors
Figure 7 shows a graph of results for 3 samples per class. The neural network gives the best results, achieving 99% for noiseless and 50 dB SNR images. The two other classifiers perform approximately the same and cannot distinguish classes at signal to noise ratios of 25 dB and below. If the number of samples per training set is increased to 5 and then 10, as shown in Figures 8 and 9, the performance of the nearest neighbour and minimum mean distance classifiers should improve. This is because a better estimate of the mean and covariance is gained. From Figure 8, it can be seen that the performance of the neural network has improved for the noisy images. Results for the other classifiers are similar and initial performance is improved. The two statistical classifiers still perform badly at 25 dB SNR and below. For 10 samples per class the performance of all the classifiers is not greatly improved. The best that the nearest neighbour classifier could achieve was 88% correct classification. One reason for this is that statistical classifiers do not perform well with data sets that have disjoint regions in classification space. For Gaussian distributions the nearest mean distance test is optimum in terms of Bayesian classification for equal priors [7]. Neural networks are not model based and require no prior statistical knowledge.

Figure 9. Noise immunity tests for Fourier-Mellin descriptors, 10 samples per class.

Figure 10 shows the translation effects on the Fourier-Mellin descriptors. The classification performance is worse than for the moment invariant features, with correct classification reduced to 43% for a shift of 5 pixels using the neural network. Correct classification is reduced to approximately 10% for a shift of 30 pixels. Hence the method is not translation invariant.

Figure 10. Translation invariance test for Fourier-Mellin descriptors (percentage correct classification against pixels from the centroid).
9. Conclusions

This paper has shown that a neural network classifier can provide improved classification when compared with conventional statistical classifiers. The neural network performs better with noisy images, whereas the other classifiers degrade below a signal to noise ratio of 25 dB. The neural network was still able to give a best figure of 27% correct classification for a signal to noise ratio of 5 dB
using the Fourier-Mellin descriptors. A drawback with the neural network is the training time required.

The performance of the neural network was not greatly affected when the number of training samples per class was changed from 3 to 5 and then to 10. For the other classifiers, performance increased when the number of training samples was increased from 3 to 5 but changed little when 10 samples were used. For the continuous image space these errors would not occur, as the codes for all noiseless objects from the same class would be the same. The quantisation errors that occur for the discrete image space are averaged, and the total error becomes smaller as the number of samples is increased.

The Fourier-Mellin descriptors perform as well as the moment invariant features for noiseless images. When a small amount of noise is added, the moment invariant features become ineffective. The Fourier-Mellin descriptors continue to give a high percentage correct classification when used with the neural network.

The Fourier-Mellin descriptors perform well with the images used in the tests. A problem is that the image must be presented with its centroid in the same position every time. Therefore, the algorithm is not translation invariant, as demonstrated by the shifting experiments. Another drawback is that the information extracted is global; no local information is used. This tends to lead to a large amount of data containing little information about the actual image.
Acknowledgements The Science and Engineering Research Council (UK) and Inmos Ltd. are gratefully thanked for sponsoring this work.
References [1] Hu, M. (1962). Visual pattern recognition by moment invariants. IRE Trans. Inform. Theory 8, 179-187.
[2] Hsu, Y.N., H.H. Arsenault and G. April (1982). Rotational invariant digital pattern recognition using circular harmonic expansion. Appl. Opt. 21, 4016-4019.
[3] Khotanzad, A. and J. Lu (1990). Classification of invariant image representations using a neural network. IEEE Trans. Acoust. Speech Signal Process. 38 (6).
[4] Wu, R. and H. Stark (1986). Three dimensional object recognition from multiple views. J. Opt. Soc. Am. 3 (9).
[5] Rumelhart, D.E., G.E. Hinton and R.J. Williams (1986). Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA.
[6] Lippmann, R.P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, April.
[7] Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. Academic Press, New York.