Pattern Recognition Letters 12 (1991) 635-643 North-Holland

October 1991

A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks

A.E. Grace and M. Spann

School of Electronic and Electrical Engineering, University of Birmingham, Edgbaston, Birmingham, United Kingdom B15 2TT

Received 9 January 1991 Revised 19 June 1991

Abstract

Grace, A.E. and M. Spann, A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks, Pattern Recognition Letters 12 (1991) 635-643.

Two algorithms for invariant object recognition are compared using a neural network and two statistical classifiers on features taken from noiseless and noisy images. It was found that the Fourier-Mellin descriptors perform as well as moment based features for noiseless images but perform significantly better when noise is added. Neural networks perform better than statistical classifiers when noise is added. It was also found that Fourier-Mellin descriptors are very sensitive to uncertainty in the position of the object centroid, whereas the moment based features provide good performance when the object is not presented at the centroid.

Keywords. Moment based features, Fourier-Mellin descriptors, neural networks, invariant object recognition.

1. Introduction

This paper compares two methods for invariant object recognition. The first method is based on spatial moments computed across the image [1] and the second is a combination of the circular harmonic expansion [2] and the Mellin transform [3,4]. These methods are used with a neural network and two statistical classifiers, the nearest neighbour classifier and the minimum mean distance classifier. The paper demonstrates the effectiveness of these methods for a number of scaled and rotated objects for varying signal to noise ratios.

2. Moment invariant features

Moment invariant features are a set of non-linear functions which are invariant to translation, scale and orientation and are defined on geometrical moments of the image. The $M \times M$ image plane is first mapped onto a square defined by $x \in [-1, +1]$, $y \in [-1, +1]$. The

0167-8655/91/$03.50 © 1991 - - Elsevier Science Publishers B.V. All rights reserved

Volume 12, Number 10, October 1991
$(p+q)$th geometrical moment for the image described by $f(x, y)$ is defined as

$$ m_{pq} = \sum_{x=-1}^{+1} \sum_{y=-1}^{+1} x^p y^q f(x, y). \tag{2.1} $$

To make these moments invariant to translation, a central moment can be defined as

$$ \mu_{pq} = \sum_{x=-1}^{+1} \sum_{y=-1}^{+1} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{2.2} $$

with

$$ \bar{x} = \frac{m_{10}}{m_{00}} \quad \text{and} \quad \bar{y} = \frac{m_{01}}{m_{00}}. $$

Central moments can be normalised to become invariant to scale change by defining

$$ \eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1. \tag{2.3} $$

A set of non-linear functions defined on $\eta_{pq}$ which are invariant to rotation, translation and scale have been derived [1]. They are

(a) $\phi_1 = \eta_{20} + \eta_{02}$,
(b) $\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$,
(c) $\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$,
(d) $\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$,
(e) $\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$,
(f) $\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$.

The absolute values of these six functions are selected as features representing the image.
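These features are straightforward to compute directly from the definitions. The following Python sketch (an illustrative reconstruction, not the authors' implementation) evaluates equations (2.1)-(2.3) and the first two invariants for a small binary image:

```python
# Illustrative sketch of the moment based features of Section 2; not the
# authors' implementation. The M x M image plane is mapped onto the square
# [-1, +1] x [-1, +1] before the moments are computed.

def moment_features(img):
    """Return the first two invariants (phi1, phi2) of a binary image,
    given as a list of rows of 0/1 pixel values."""
    M = len(img)
    coord = [-1.0 + 2.0 * i / (M - 1) for i in range(M)]  # map onto [-1, +1]

    def m(p, q):  # geometrical moment m_pq, equation (2.1)
        return sum(coord[x] ** p * coord[y] ** q * img[y][x]
                   for y in range(M) for x in range(M))

    xb, yb = m(1, 0) / m(0, 0), m(0, 1) / m(0, 0)  # object centroid

    def mu(p, q):  # central moment mu_pq, equation (2.2)
        return sum((coord[x] - xb) ** p * (coord[y] - yb) ** q * img[y][x]
                   for y in range(M) for x in range(M))

    def eta(p, q):  # normalised central moment, equation (2.3)
        return mu(p, q) / mu(0, 0) ** ((p + q) / 2.0 + 1.0)

    phi1 = eta(2, 0) + eta(0, 2)                              # invariant (a)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2  # invariant (b)
    return phi1, phi2
```

Because the central moments subtract the centroid, shifting the object by whole pixels leaves $\phi_1$ and $\phi_2$ unchanged, which is the translation invariance exploited later in the experiments.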

3. Fourier-Mellin descriptors

Fourier-Mellin descriptors transform the image space to provide a code which is invariant to scale and rotation. The following describes how this is achieved.

3.1. Rotation invariance

Rotation invariance is achieved using the circular harmonic function expansion [2]. The image is sampled radially from the centre of the image space and its spectral content obtained. The Fourier series of these samples is taken. Removing the phase yields a rotation independent code for the image at radius $r$. The complete image can be described by sampling in $r$ for particular harmonics depending upon the images. More formally, the image space is a two-dimensional function described by $\tilde{f}(r, \theta)$. The convention used here is that $\tilde{f}(\cdot)$ denotes polar coordinates and $f(\cdot)$ denotes cartesian coordinates. The image can be expressed as a Fourier series:

$$ \tilde{f}(r, \theta) = \sum_{n=-\infty}^{+\infty} f_n(r)\, e^{jn\theta} \tag{3.1} $$

where

$$ f_n(r) = \frac{1}{2\pi} \int_0^{2\pi} \tilde{f}(r, \theta)\, e^{-jn\theta}\, d\theta, \tag{3.2} $$

$n = \ldots, -1, 0, 1, \ldots$, is known as the $n$th circular harmonic expansion coefficient of $\tilde{f}(r, \theta)$. It can be shown that the vector

$$ (|f_0(r)|, \ldots, |f_N(r)|) $$

is a rotation invariant signature for $\tilde{f}(r, \theta)$ that can be used for rotation invariant pattern recognition.

3.2. Scale invariance

Scale invariance is achieved using the Mellin transform $M(u, v)$ [3,4] of an image. For an image $f(x, y)$ the Mellin transform is defined by

$$ M_f(u, v) = \mathcal{M}[f(x, y)] = \int_0^{\infty} \int_0^{\infty} f(x, y)\, x^{ju-1} y^{jv-1}\, dx\, dy \tag{3.3} $$

where $\mathcal{M}$ denotes the Mellin transform. The Mellin transform can be realised efficiently by using the non-linear mapping $x = e^{-\xi}$ and $y = e^{-\eta}$. Now

$$ M_f(u, v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(\xi, \eta)\, e^{-j(u\xi + v\eta)}\, d\xi\, d\eta \tag{3.4} $$

where $h(\xi, \eta) = f(e^{-\xi}, e^{-\eta})$.
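The rotation invariant signature of Section 3.1 can be sketched numerically by approximating equation (3.2) with a discrete Fourier sum over equally spaced angular samples. This is an illustrative sketch under assumed sampling, not the authors' implementation:

```python
# Sketch of the rotation invariant signature of Section 3.1: a discrete
# approximation of equations (3.1)-(3.2). The image is sampled on a circle
# of radius r and the magnitudes of the circular harmonic coefficients
# f_n(r) are retained. Illustrative only, not the authors' code.
import cmath
import math

def harmonic_magnitudes(samples, N):
    """samples: values of f(r, theta) at K equally spaced angles.
    Returns (|f_0(r)|, ..., |f_N(r)|), a rotation invariant signature."""
    K = len(samples)
    sig = []
    for n in range(N + 1):
        # f_n(r) = (1/(2 pi)) * integral of f(r, theta) e^{-j n theta} dtheta,
        # approximated by a sum over the K angular samples (dtheta = 2 pi / K)
        fn = sum(s * cmath.exp(-1j * n * 2 * math.pi * k / K)
                 for k, s in enumerate(samples)) / K
        sig.append(abs(fn))
    return sig
```

Rotating the image by a whole angular sample step cyclically shifts the sample list; each coefficient then only picks up a phase factor, so the magnitudes, and hence the signature, are unchanged.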

Figure 1. Algorithm for signature extraction: the circular harmonic component $f_n(r)$ is extracted from $\tilde{f}(r, \theta)$, the non-linear mapping $r = e^{-x}$ is applied to give $f_n(e^{-x})$, a fast Fourier transform yields $M_{f_n}(w)$, and taking the magnitude gives the rotation and scale invariant signature.

Equation (3.4) shows that the Mellin transform can be implemented by a non-linear mapping followed by a two-dimensional Fourier transform.

3.3. Combined circular harmonic expansion and Mellin transform

The two methods previously described for scale and rotation invariance can be combined into a single algorithm. Consider a scaled and rotated version of an image

$$ \tilde{g}(r, \theta) = \tilde{f}(ar, \theta + \alpha). \tag{3.5} $$

The Mellin transform of this is given by

$$ M_{g_n}(w) = e^{jn\alpha} a^{-jw} M_{f_n}(w) \tag{3.6} $$

where

$$ M_{f_n}(w) = \int_0^{\infty} f_n(r)\, r^{jw-1}\, dr. \tag{3.7} $$

Taking the modulus of both sides yields a rotation and scale invariant signature for $\tilde{g}(r, \theta)$ since the $a$ and $\alpha$ dependence occurs only in the phase.
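The scale invariance of the modulus can be checked numerically. The sketch below, our illustration rather than the paper's code, evaluates equation (3.7) through the substitution $r = e^{-x}$ and compares $|M(w)|$ for a radial profile and a radially scaled copy; the profile itself is an arbitrary test function, not taken from the paper:

```python
# Numerical sketch of equations (3.6)-(3.7): the Mellin transform of a
# radial profile f_n(r) is computed through the substitution r = e^{-x},
# and the magnitude is checked to be insensitive to a scaling f_n(a r).
# The test profile below is an arbitrary assumption for illustration.
import cmath
import math

def mellin_mag(f, w, x_lo=-6.0, x_hi=10.0, steps=4000):
    """|M_f(w)| with M_f(w) = integral_0^inf f(r) r^{jw-1} dr, evaluated
    as integral of f(e^{-x}) e^{-jwx} dx by the trapezoidal rule."""
    h = (x_hi - x_lo) / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = x_lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end weights
        total += weight * f(math.exp(-x)) * cmath.exp(-1j * w * x)
    return abs(total * h)

profile = lambda r: r * r * math.exp(-r)  # an assumed radial profile f_n(r)
scaled = lambda r: profile(1.5 * r)       # f_n(a r) with a = 1.5
```

Evaluating `mellin_mag(profile, 2.0)` and `mellin_mag(scaled, 2.0)` gives closely agreeing values, since by (3.6) the scaling contributes only the unit-magnitude phase factor $a^{-jw}$.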

3.4. Implementation

Equation (3.7) can be realised by the substitution $r = e^{-x}$. This gives

$$ M_{f_n}(w) = \int_{-\infty}^{+\infty} f_n(e^{-x})\, e^{-jwx}\, dx. \tag{3.8} $$

Hence, a transformation to $e^{-x}$ is required followed by a Fourier transform to obtain $M_{f_n}(w)$. Figure 1 shows the complete algorithm for rotation and scale signature extraction.

4. Classifiers

In these tests three classifiers were used. The multi-layer perceptron is compared with two statistical classifiers, the nearest neighbour and minimum mean distance types.

4.1. The multi-layer perceptron

The type of neural network used is known as the multi-layer perceptron. It is a feed-forward network with input and output layers of nodes. Between the inputs and outputs are further layers of nodes called hidden nodes. The output of a node is connected to the input of every node in the following layer via a variable weight. The weights are adjusted according to the back-propagation algorithm presented in [5]. For training, all outputs are set to zero except the desired output, which is set to one. In this paper the inputs are either the moment invariant features or the Fourier-Mellin descriptors. The multi-layer perceptron used has two hidden layers. For the moment invariant features, the network structure has 6 input nodes, 6 nodes in the first and second hidden layers and


Figure 2. The 10 characters used.


10 output nodes. For the Fourier-Mellin descriptors, the network structure has 256 input nodes, 128 nodes in the first hidden layer, 64 nodes in the second hidden layer and 10 output nodes. There is no precise way to choose the number of nodes in the hidden layers. In [6] it was suggested that the number of second layer nodes required in the worst case should at least be equal to the number of disconnected regions in the input distributions. It was also suggested that there should be more than three times as many nodes in the second layer as in the first layer. However, it has been shown in [3] that increasing the number of hidden nodes does not improve performance.
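A minimal sketch of such a network and its back-propagation update is given below. This is our reconstruction in outline; the initial weight range, learning rate and toy training sample are assumptions, not the parameters used in the paper:

```python
# Minimal sketch of the multi-layer perceptron of Section 4.1: a 6-6-6-10
# feed-forward network trained by back-propagation [5]. The weight
# initialisation, learning rate and data are illustrative assumptions.
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class MLP:
    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # self.w[l][j][i] connects node i of layer l (plus a bias input)
        # to node j of layer l+1
        self.w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_out)]
                  for n_in, n_out in zip(sizes, sizes[1:])]

    def forward(self, x):
        acts = [list(x)]
        for layer in self.w:
            prev = acts[-1] + [1.0]  # append the constant bias input
            acts.append([sigmoid(sum(wi * ai for wi, ai in zip(node, prev)))
                         for node in layer])
        return acts

    def train_step(self, x, target, lr=0.05):
        """One gradient step on 0.5 * sum (t - o)^2; returns the loss
        measured before the update."""
        acts = self.forward(x)
        delta = [(o - t) * o * (1.0 - o) for o, t in zip(acts[-1], target)]
        for l in range(len(self.w) - 1, -1, -1):
            prev = acts[l] + [1.0]
            # back-propagate the error using the pre-update weights
            new_delta = [acts[l][i] * (1.0 - acts[l][i]) *
                         sum(delta[j] * self.w[l][j][i]
                             for j in range(len(delta)))
                         for i in range(len(acts[l]))]
            for j, node in enumerate(self.w[l]):
                for i in range(len(node)):
                    node[i] -= lr * delta[j] * prev[i]
            delta = new_delta
        return 0.5 * sum((t - o) ** 2 for o, t in zip(acts[-1], target))
```

As in the paper, the target vector is all zeros with a one at the desired output; repeated steps on a sample drive the loss down.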


4.2. The nearest neighbour classifier

When an unknown sample $X$ is to be classified, the nearest neighbour of $X$ is found among the pool of all available training samples and its label is assigned to $X$. To prevent one subgroup of vector components dominating another, the feature vectors are normalised. The normalisation consists of subtracting the sample mean and dividing by the standard deviation of the corresponding class. Let $t_k^{(i)} = [t_{k1}^{(i)}, \ldots, t_{kn}^{(i)}]$ and $N_i$ denote the $k$th $n$-dimensional training feature vector of the $i$th class and the number of available training samples of class $i$ respectively. The unknown test sample $X$ is assigned to the class $i^*$ which has minimum city block distance from $X$ in feature space, where

$$ i^* = \arg\min_{i,\,k} d(X, t_k^{(i)}), \quad i = 1, 2, \ldots, C, \; k = 1, 2, \ldots, N_i, \tag{4.1} $$

$$ d(X, t_k^{(i)}) = \sum_{m=1}^{n} \frac{|x_m - t_{km}^{(i)}|}{\sigma_m^{(i)}} \tag{4.2} $$

with $\sigma_m^{(i)}$ representing the standard deviation of the $m$th component of the $n$-dimensional training feature vectors of class $i$ and where $C$ is the number of classes.

4.3. Minimum mean distance rule

The weighted minimum mean distance classifier characterises each category by the mean and standard deviations of the components of its training feature vectors. The weighted distance between an unknown sample $X$ and the mean of the features of class $i$, $d(X, i)$, is then measured. The weighting factor is the standard deviation of the respective feature. The unknown sample is then assigned to the class $i^*$ which has minimum distance from $X$ in feature space, that is,

$$ i^* = \arg\min_{i} d(X, i), \quad i = 1, 2, \ldots, C \tag{4.3} $$

and

$$ d(X, i) = \sum_{m=1}^{n} \frac{|x_m - \bar{t}_m^{(i)}|}{\sigma_m^{(i)}} \tag{4.4} $$

where $\bar{t}_m^{(i)}$ represents the sample mean of the $m$th component of the $n$-dimensional training feature vectors of class $i$. Again, weighting by the standard deviation is to balance the effect of all $n$ feature vector components on the distance $d$.

Figure 3. Various scales and rotations for H, E and O.

Figure 4. Character H with different levels of noise. From left to right the signal to noise ratio is: noiseless, 50, 25, 12, 8, 5 dB.
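Both decision rules of Sections 4.2 and 4.3 reduce to a weighted city block distance and can be sketched as follows (an illustrative reconstruction; the toy data in the usage below is not from the paper):

```python
# Sketch of the two statistical classifiers of Sections 4.2 and 4.3. Both
# weight the city block distance by per-class standard deviations, as in
# equations (4.2) and (4.4). Illustrative reconstruction, not the authors'
# code.
import math

def class_stats(train):
    """train: {class_label: [feature vectors]}.
    Returns per-class (mean, standard deviation) for each component."""
    stats = {}
    for c, vecs in train.items():
        n = len(vecs[0])
        mean = [sum(v[m] for v in vecs) / len(vecs) for m in range(n)]
        # population standard deviation; a zero value falls back to 1.0
        std = [math.sqrt(sum((v[m] - mean[m]) ** 2 for v in vecs)
                         / len(vecs)) or 1.0 for m in range(n)]
        stats[c] = (mean, std)
    return stats

def nearest_neighbour(x, train, stats):
    """Equations (4.1)-(4.2): label of the training vector with minimum
    weighted city block distance from x."""
    best = None
    for c, vecs in train.items():
        _, std = stats[c]
        for t in vecs:
            d = sum(abs(xm - tm) / sm for xm, tm, sm in zip(x, t, std))
            if best is None or d < best[0]:
                best = (d, c)
    return best[1]

def min_mean_distance(x, stats):
    """Equations (4.3)-(4.4): class whose weighted mean is closest to x."""
    return min(stats,
               key=lambda c: sum(abs(xm - tm) / sm
                                 for xm, tm, sm in zip(x, *stats[c])))
```

For example, with two well-separated toy classes both rules recover the obvious label for a query near either cluster.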

Figure 5. Noise immunity test for moment invariant features, 3 samples per class (percentage correct classification against signal to noise ratio for the neural network, nearest neighbour and minimum mean distance classifiers).

5. The test data sets

The images used were 256 × 256 binary images. The characters used are shown in Figure 2. For the training sets, random rotations and scales of each image were generated. Figure 3 shows various scales and rotations for three images.

In addition to the above noiseless image set, five other sets of noisy images with respective signal to noise ratios of 50, 25, 12, 8 and 5 dB were also constructed from the noiseless images. The noise was added by randomly selecting some of the pixels of a noiseless image and inverting their values. The random pixel selection is done according to a uniform probability distribution. The signal to noise ratio is computed using

$$ 20 \log\left[(N^2 - L)/L\right] $$

Figure 6. Translation invariance tests for moment invariant features (percentage correct classification against pixels from centroid, for the neural network, nearest neighbour and minimum mean distance classifiers).


where L is the number of pixels which are different between the noisy images and the noiseless images and N is the image size. Figure 4 shows one image with different signal to noise ratios.
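The noise model can be sketched as follows. This is our reconstruction: the paper writes "20 log", which is assumed here to be a base-10 logarithm (the figures quote dB), and the inversion giving $L$ for a target SNR is our derivation:

```python
# Sketch of the noise model of Section 5: L pixels of an N x N binary image
# are selected uniformly at random and inverted, giving a signal to noise
# ratio of 20*log10((N^2 - L)/L) dB. The base-10 logarithm and the
# inversion for L are our assumptions, not stated explicitly in the paper.
import math
import random

def snr_db(n, flipped):
    return 20.0 * math.log10((n * n - flipped) / flipped)

def pixels_for_snr(n, target_db):
    """Invert the SNR formula: L = N^2 / (10^(SNR/20) + 1)."""
    return round(n * n / (10.0 ** (target_db / 20.0) + 1.0))

def add_noise(img, flipped, seed=0):
    """Invert `flipped` uniformly chosen pixels of a square 0/1 image."""
    n = len(img)
    noisy = [row[:] for row in img]
    positions = [(r, c) for r in range(n) for c in range(n)]
    for r, c in random.Random(seed).sample(positions, flipped):
        noisy[r][c] = 1 - noisy[r][c]
    return noisy
```

For a 256 × 256 image, a 25 dB set is obtained by flipping roughly 3500 pixels; the round trip through `snr_db` recovers the target to within the rounding of $L$.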


In a realistic situation the position of the character in the image space is uncertain. Ten noiseless images from each class at random rotations and scales are shifted one pixel at a time in the x direction to determine the effect upon classification.

6. Description of experiments

For training, data sets were created consisting of 3, 5 and 10 presentations of each randomly rotated and scaled image. For testing, the data set comprised 10 presentations of each randomly rotated and scaled image. In experiments with noisy images, each classifier was trained using noiseless images and tested with noisy images. For the neural network, the training features were normalised to have zero mean and unit variance. This is necessary to ensure that a subgroup of the features does not dominate the weight adjustment process during training. The $m$th feature is normalised by

$$ F_m = \frac{t_m - \bar{t}_m}{\sigma_m} \tag{6.1} $$

where $\bar{t}_m$ and $\sigma_m$ are the sample mean and standard deviation of the $m$th training feature over all the classes.
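The normalisation of equation (6.1) is a standard z-scoring of each feature over the pooled training set, sketched below (an illustrative reconstruction; the symbol names follow our reading of the garbled original):

```python
# Sketch of the feature normalisation of equation (6.1): each feature is
# given zero mean and unit variance over the whole training set (all
# classes pooled) before being presented to the neural network.
import math

def normalise(features):
    """features: list of training feature vectors, all classes pooled.
    Returns z-scored copies: F_m = (t_m - mean_m) / std_m."""
    n = len(features[0])
    mean = [sum(v[m] for v in features) / len(features) for m in range(n)]
    # population standard deviation; a zero value falls back to 1.0
    std = [math.sqrt(sum((v[m] - mean[m]) ** 2 for v in features)
                     / len(features)) or 1.0 for m in range(n)]
    return [[(v[m] - mean[m]) / std[m] for m in range(n)] for v in features]
```

After normalisation every feature column has zero mean and unit variance, so no subgroup of features dominates the weight updates.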

7. Experiments with moment invariant features

Figure 5 shows a graph of results when trained with 3 samples per class. The moment invariant features perform well for noiseless images, achieving 99% correct classification. However, when the signal to noise ratio is 50 dB or below, none of the classifiers can perform better than 10%. This is because the image space has been defined between -1 and +1. Added noise near these limits has a greater effect than added noise around the image origin. Hence even if no noise is added to the object itself, noise near the edges of the image will dominate the moments; this effect increases greatly for higher order moments. If the number of samples per class is increased to 5 and 10, performance does not improve. Figure 6 shows the effect of translation on the moment invariant features. All classifiers gave 96% or better classification even when the centroid was displaced by 30 pixels.

Figure 7. Noise immunity tests for Fourier-Mellin descriptors, 3 samples per class.

Figure 8. Noise immunity tests for Fourier-Mellin descriptors, 5 samples per class.

8. Experiments with Fourier-Mellin descriptors

Figure 7 shows a graph of results for 3 samples per class. The neural network gives the best results, achieving 99% for noiseless and 50 dB SNR images. The two other classifiers perform approximately the same and cannot distinguish classes below 25 dB SNR. If the number of samples per training set is increased to 5 and then 10, as shown in Figures 8 and 9, the performance of the nearest


neighbour and minimum mean distance classifiers should improve. This is because a better estimate of the mean and covariance is gained. From Figure 8, it can be seen that the performance of the neural network has improved for the noisy images. Results for the other classifiers are similar and their initial performance is improved. The two statistical classifiers still perform badly below 25 dB SNR. For 10 samples per class the performance of all the classifiers is not greatly im-

Figure 9. Noise immunity tests for Fourier-Mellin descriptors, 10 samples per class.

Figure 10. Translation invariance test for Fourier-Mellin descriptors (percentage correct classification against pixels from centroid).

proved. The best that the nearest neighbour classifier could achieve was 88% correct classification. One reason for this is that statistical classifiers do not perform well with data sets that have disjoint regions in classification space. For Gaussian distributions the nearest mean distance test is optimum in terms of Bayesian classification for equal priors [7]. Neural networks are not model based and require no prior statistical knowledge. Figure 10 shows the translation effects on the Fourier-Mellin descriptors. The classification performance is worse than that of the moment invariant features, with correct classification reduced to 43% for a shift of 5 pixels using the neural network. Correct classification is reduced to approximately 10% for a shift of 30 pixels. Hence the method is not translation invariant.

9. Conclusions

This paper has shown that a neural network classifier can provide improved classification when compared with conventional statistical classifiers. The neural network performs better with noisy images, whereas the other classifiers degrade below a signal to noise ratio of 25 dB. The neural network was still able to give a best figure of 27% correct classification at a signal to noise ratio of 5 dB

using the Fourier-Mellin descriptors. A drawback with the neural network is the training time required. The performance of the neural network was not greatly affected when the number of training samples per class was changed from 3 to 5 and then to 10. For the other classifiers, performance increased when the number of training samples was increased from 3 to 5 but changed little when 10 samples were used. For a continuous image space these errors would not occur, as the codes for all noiseless objects from the same class would be the same. The quantisation errors that occur for the discrete image space are averaged, and the total error becomes smaller as the number of samples is increased. The Fourier-Mellin descriptors perform as well as the moment invariant features for noiseless images. When a small amount of noise is added the moment invariant features become ineffective, whereas the Fourier-Mellin descriptors continue to give a high percentage of correct classification when used with the neural network. The Fourier-Mellin descriptors perform well with the images used in the tests. A problem is that the image must be presented with its centroid in the same position every time. Therefore, the algorithm is not translation invariant, as demonstrated by the shifting experiments. Another drawback is that the


information extracted is global; no local information is used. This tends to lead to a large amount of data containing little information about the actual image.

Acknowledgements The Science and Engineering Research Council (UK) and Inmos Ltd. are gratefully thanked for sponsoring this work.

References

[1] Hu, M. (1962). Visual pattern recognition by moment invariants. IRE Trans. Inform. Theory 8, 179-187.
[2] Hsu, Y.N., H.H. Arsenault and G. April (1982). Rotation-invariant digital pattern recognition using circular harmonic expansion. Appl. Opt. 21, 4016-4019.
[3] Khotanzad, A. and J. Lu (1990). Classification of invariant image representations using a neural network. IEEE Trans. Acoust. Speech Signal Process. 38 (6).
[4] Wu, R. and H. Stark (1986). Three dimensional object recognition from multiple views. J. Opt. Soc. Am. 3 (9).
[5] Rumelhart, D.E., G.E. Hinton and R.J. Williams (1986). Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA.
[6] Lippmann, R.P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, April.
[7] Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. Academic Press, New York.
