Face recognition using perspective invariant features

Pattern Recognition Letters 15 (1994) 877-883, September 1994

M.S. Kamel a,*, H.C. Shen a, A.K.C. Wong a, T.M. Hong a, R.I. Campeanu b

a PAMI Group, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
b Department of Computer Science, Glendon College, York University, Toronto, Ontario, Canada M4N 3M6

Received 6 January 1993; revised 8 July 1993

Abstract

Although the analytic methods based on front view facial images seem to be the best candidate for a practical facial recognition system, the existing attempts to employ these methods are not very successful. A major obstacle is the fact that these attempts do not consider perspective variations. The present paper describes an approach which makes full use of the invariant properties of a number of interfeature point distances. Our method, which considers both rotation invariance and feature normalization, achieves a very good retrieval rate when searching for an individual for whom the database contains pictures taken in different postures.

1. Introduction

Facial identification is a natural human skill. We recognize faces even under different variations and alterations such as aging, hair style, or makeup. Machine recognition of faces is, however, still in its infancy. The existing approaches that may be used for automatic facial recognition can be classified according to several criteria, such as their methodology (holistic or analytic), their level of automation, or their capability to recognize faces under different observation angles.

The holistic methods emphasize the global properties of the form of the pattern. Some were based on neural networks (Kohonen, 1984; Stonham, 1986), others on the analysis of the isodensity lines from the full facial images (Sakaguchi et al., 1989). The neural network approach requires large training sets consisting of different views of each individual who is to be recognized. The approach based on the analysis of the isodensity lines requires all images to be taken in a precise face-on position and under a special illuminating device. A recent holistic approach (Turk and Pentland, 1989, 1991) is based on projecting face images onto a feature space which is defined by the eigenvectors of the set of faces.

The analytic methods concentrate on spatial domain feature extraction. In this approach the pictures are stored together with a set of fiducial features, extracted manually or automatically by an image processing system. These features are used by a search method to retrieve candidates from the image database. A number of studies have been conducted on the recognition of facial profiles (Kaufman and Breeding, 1976; Harmon et al., 1981; Samal and Iyengar, 1992), but the analytic methods based on front view facial images seem to be the most suitable for a practical recognition system working with a large image database. In spite of a relatively large number of attempts, the performance of the existing systems is

* Corresponding author.

0167-8655/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0167-8655(94)00047-7




rather limited. One of the major obstacles is the fact that the number of features that can be automatically extracted by the system is small (Sakai et al., 1972; Tsui, 1989; Nixon, 1985; Craw et al., 1987; Wong et al., 1989a; Lim et al., 1992). The difficulties encountered in automatic feature extraction led to the development of several interactive systems. Such systems are of use in assisting police investigations in identifying individuals from a set of descriptions of qualitative features, which are combined with some quantitative features extracted from photos (Batten et al., 1978; Shepherd, 1986; Riccia and Iserles, 1978; Goldstein et al., 1972). These interactive systems are the only computerized face recognition systems that are used in practical applications. They all imply extensive work by trained operators. The digital image processing component generally plays a minor role. Even the recent system of Shepherd (1986) uses the image analysis component only to check on the ratings given by a group of trained judges. Another drawback of these systems is that, although they include geometric measurements and account for the distance between the camera and the subject, they are unable to take into consideration perspective variations and the invariant properties of the interfeature point distances. Particularly when the image database contains different views of the individual who is to be identified, perspective variations become extremely important.

Our work addresses all these problems. We introduced a semi-automatic feature extraction method, which significantly reduces the role of the operator. Our transformation and matching techniques use some of the recent developments in image database management. These and many other details of our system were presented elsewhere (Kamel et al., 1993). In this paper we concentrate on the geometric facial features component of our analytic face recognition system. A general discussion of the facial features employed in our system also appears in (Kamel et al., 1993), but here we give more details. In particular, we shall discuss the invariant properties of our set of interfeature point distances. We believe that the use of these interfeature point distances, coupled with a fully automatic feature extraction method, will be the key elements in the development of a practical facial recognition system using a very large photo database.

2. Quantitative facial features

In this section we discuss the facial interfeature distances which could be selected for candidate screening and identification. We call these distances quantitative features. Qualitative features, such as the size of the nostrils and the position of the ears, although considered in our work, are discussed only in (Kamel et al., 1993). While there is no doubt that the interfeature distances are extremely effective for subject comparison and identification, it is not clear how many distances are needed to uniquely characterize a human face. This is because the spatial relation of distinct facial features usually furnishes highly redundant information in the identification of a subject. In our early study (Wong et al., 1989b), based on the existing literature, we considered 14 interfeature distances, which implied 23 feature points. At the end of our project, only 7 feature points proved sufficient for the identification of a face out of a group of over 80 faces. We expect that for larger databases the number of important feature points could be larger.

Since in practice the same posture of a subject cannot be easily obtained in a photograph or a live image, our set of features has to be perspective invariant. As will be shown in the next section, we employed a new feature configuration, "invariant" to perspective transformation, developed for recognizing perspective invariant features in 3-D vision. This configuration employs a "Cross Ratio" which is independent of both spatial transformations and facial expressions. Based on the deviation of certain measurements from symmetry, the rotation of the head relative to a vertical plane can be estimated, and the projection of a 3-D feature configuration on an imaginary plane can be recovered and used for identification. The Cross Ratio, together with random graphs (Wong et al., 1990), can also be used to reduce the search space and enhance the effectiveness of the retrieval process. Hence, essential human face features, which usually vary with posture, can be recovered from an image. Experiments have demonstrated the


effective use of such feature configurations.

Another point of concern is that the distance of the subject to the camera, or the size of the digitized photo, should play no role in the recognition process. The solution to this problem is the normalization process discussed in (Kamel et al., 1993). This normalization, together with the perspective invariance of the chosen features, also ensures that the positioning angle of the photo does not influence the recognition process.

Fig. 1 shows the set of facial features which collectively defines a configuration possessing the perspective invariant characteristics. Their extraction and measurement from an image are relatively easy to accomplish. They can be used for matching the subject in an image against potential candidates retrieved from the database if the orientation of the head of the subject or of the candidate facing the camera is less than 45 degrees. The seven feature points (A, B, C, D, E, F, G in Fig. 1) are extracted from each picture through a semi-automated process which involves 3 stages:

A, B, C, D: four eye corners; E: base point for the nose; F, G: mouth corners.
Fig. 1. The feature point configuration.


1. Either the image of a human subject or a photo of the subject is digitized by a CCD camera. Each digitized image is stored in an image database and displayed for feature capture. Next to the image, the system presents a feature list window containing the feature names and X-Y coordinates.
2. The user indicates which feature is to be located on the image by clicking the mouse on the appropriate feature name (in the feature list window). As a result, a cross-hair marker is positioned in the vicinity of the chosen feature point.
3. The operator clicks the mouse after having established the exact position of the feature point. At that moment the X- and Y-coordinates are entered into the feature list window.
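The three-stage capture procedure above implies a simple per-image record mapping feature names to X-Y coordinates. The following sketch (a hypothetical illustration, not the authors' implementation) shows one way such a feature list could be represented:

```python
# Minimal sketch of the feature-list record built during the three-stage
# capture process: each of the seven feature points of Fig. 1 is stored
# by name with its X-Y image coordinates. Names and values illustrative.

FEATURE_NAMES = ["A", "B", "C", "D", "E", "F", "G"]  # four eye corners,
# nose base point, two mouth corners

def new_feature_list():
    """Empty feature list shown next to the displayed image."""
    return {name: None for name in FEATURE_NAMES}

def record_feature(feature_list, name, x, y):
    """Stage 3: the operator confirms the cross-hair position and the
    X- and Y-coordinates are entered into the feature list."""
    if name not in feature_list:
        raise ValueError(f"unknown feature point: {name}")
    feature_list[name] = (float(x), float(y))
    return feature_list

features = new_feature_list()
record_feature(features, "A", 102.0, 144.5)
record_feature(features, "E", 160.0, 210.0)
```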

3. Perspective invariance

In the present work we assumed that the images satisfy the following requirements: 1. the rotation of the head about the X-axis is not significant; 2. the four corners of the eyes (A, B, C, D) are collinear, forming a line (D1); 3. the line (D2) passing through the mouth corners F and G, together with (D1), will form a plane



Fig. 2. Determination of feature points for recognition.

Fig. 3. Feature points and interfeature distances used in the recognition process.

opposite: spatially invariant, but affected by the facial expression. We will not consider them in the present work. Since in most cases the human head posture is captured under rotations about the Y-axis, the vertical measurements will be quite invariant from one photo to another. The horizontal ones are the most distorted by the Y-axis rotation. Yet, a very effective method to recover this information is presented in this section.

Given that the Cross Ratio of any four points on a line is completely independent of any perspective transformation, the quantity

R(A, B, C, D) = [d(A, C) x d(B, D)] / [d(A, D) x d(B, C)]

on line (D1) constitutes an invariant feature. Here d(A, C) denotes the distance between the points A and C, d(B, D) the distance between B and D, etc. In reality, the left and right parts of the human face are not perfectly symmetric. However, the following hypothesis is adopted in our measurement estimation (Fig. 3):

d(H, B) = d(H, C) = d_i,   d(H, A) = d(H, D) = d_o.

This implies that

R(A, B, C, D) = (d_i + d_o)^2 / (4 x d_i x d_o) = (1 + β)^2 / (4 x β),

where β = d_i/d_o; this provides a first estimate β_1 of β. Moreover,

R(A, B, H, C) = R(D, C, H, B) = 2 / (1 + β)

implies a second estimate β_2, which can be expressed as

β_2 = 1/R(A, B, H, C) + 1/R(D, C, H, B) - 1.

The two estimates are averaged: β = (β_1 + β_2)/2.

When the human head is turned to one side (for example to the right, by less than 45 degrees), the distance d(A, C) will be much more precise than d(B, D) and will stay nearly invariant, as shown in Table 1. Thus, only corrections on points B and D are necessary to recover the original distances d(C, D') and d(B', D') (all the others can easily be deduced from these two distances by applying the symmetry of the four eye corners).
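The Cross Ratio computations above are easy to verify numerically. The sketch below (illustrative values only; assuming the symmetric configuration d(H,B) = d(H,C) = d_i and d(H,A) = d(H,D) = d_o) computes R(A, B, C, D), recovers β from the second estimate, and checks that the cross ratio survives a 1-D projective distortion of the eye line:

```python
# Hedged sketch of the Cross Ratio and the two beta estimates. Point
# positions are 1-D coordinates along the eye line (D1); H is the
# intersection with the symmetry axis. Numeric values are illustrative.

def d(p, q):
    return abs(p - q)

def cross_ratio(p1, p2, p3, p4):
    """R(p1,p2,p3,p4) = d(p1,p3)*d(p2,p4) / (d(p1,p4)*d(p2,p3))."""
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

# Symmetric configuration: d(H,B) = d(H,C) = d_i, d(H,A) = d(H,D) = d_o.
d_i, d_o = 14.5, 41.5
A, B, H, C, D = -d_o, -d_i, 0.0, d_i, d_o

# First estimate: R(A,B,C,D) = (1 + beta)^2 / (4*beta).
R1 = cross_ratio(A, B, C, D)

# Second estimate: beta_2 = 1/R(A,B,H,C) + 1/R(D,C,H,B) - 1.
beta_2 = 1.0 / cross_ratio(A, B, H, C) + 1.0 / cross_ratio(D, C, H, B) - 1.0

beta = d_i / d_o
assert abs(R1 - (1 + beta) ** 2 / (4 * beta)) < 1e-9
assert abs(beta_2 - beta) < 1e-9

# The cross ratio survives a 1-D projective map x -> (a*x + b)/(c*x + e),
# which models the perspective distortion of the eye line.
def proj(x, a=2.0, b=5.0, c=0.01, e=1.0):
    return (a * x + b) / (c * x + e)

R1_distorted = cross_ratio(proj(A), proj(B), proj(C), proj(D))
assert abs(R1 - R1_distorted) < 1e-9
```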


Table 1
Invariant properties of various interfeature point distance measures; all distances are given in millimeters

AC       AD       BD       FG       HM       HE       Pos.    Candidate
56.06    83.02    56.01    44.00    50.50    31.46    FrtB    Andrew
56.07    83.08    56.07    44.03    50.48    31.45    FrtA    Andrew
54.08    73.17    44.17    39.20    52.28    34.57    T45B    Andrew
56.08    83.08    56.08    44.51    48.96    32.37    T45A    Andrew

FrtB, FrtA: frontal picture before and after applying the Cross Ratio correction
T45B, T45A: turning at 45 degrees before and after applying the Cross Ratio correction
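As a quick sanity check on Table 1, the deviations from the frontal reference before and after the correction can be computed directly from the tabulated values (only the four eye and mouth columns are used; this snippet is illustrative, not part of the original system):

```python
# Percentage deviation of each Table 1 measurement from the frontal
# reference, before (T45B) and after (T45A) the Cross Ratio correction.
# Values are taken directly from the table.

frontal = {"AC": 56.06, "AD": 83.02, "BD": 56.01, "FG": 44.00}
t45_before = {"AC": 54.08, "AD": 73.17, "BD": 44.17, "FG": 39.20}
t45_after = {"AC": 56.08, "AD": 83.08, "BD": 56.08, "FG": 44.51}

def pct_err(ref, meas):
    return {k: 100.0 * abs(meas[k] - ref[k]) / ref[k] for k in ref}

before = pct_err(frontal, t45_before)
after = pct_err(frontal, t45_after)

for k in frontal:
    print(f"{k}: {before[k]:5.1f}% -> {after[k]:4.1f}%")
# BD, for example, is off by about 21% at 45 degrees but by well under
# 1% once the correction is applied.
```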

Let D', B' be the original points, so that

d(H, D') = d(H, D) + X_d,
d(B', D') = d(B, D) + X_d - X_b.

Then from

d(H, D')/d(B', D') = d_o/(d_o + d_i) = 1/(1 + β) = a
                   = (d(H, D) + X_d)/(d(B, D) + X_d - X_b)

and from

d(B', D') = d(B, D) + X_d - X_b = d(A, C),

it follows that a system of two equations in the two unknowns X_d and X_b,

(1 - a) X_d + a X_b = a d(B, D) - d(H, D),
X_d - X_b = d(A, C) - d(B, D),

can be solved to infer the original distances d(C, D') and d(B', D') (for d(C, D') = d(A, B') = d(A, B) + X_b).
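The two-equation system above reduces to simple substitution. The following sketch (with illustrative measurements; a = 1/(1 + β) as defined in the text) solves for X_d and X_b and recovers the corrected distances:

```python
# Sketch of the correction step under the stated symmetry hypothesis:
# given the measured distances d(H,D), d(B,D), d(A,C) and the estimated
# beta, solve the 2x2 linear system for the offsets X_d and X_b.
# Numeric inputs are illustrative only.

def solve_correction(dHD, dBD, dAC, beta):
    a = 1.0 / (1.0 + beta)
    # (1 - a)*X_d + a*X_b = a*d(B,D) - d(H,D)
    #        X_d -   X_b  = d(A,C)   - d(B,D)
    rhs1 = a * dBD - dHD
    rhs2 = dAC - dBD
    # Substitute X_b = X_d - rhs2 into the first equation:
    # (1 - a)*X_d + a*(X_d - rhs2) = rhs1  =>  X_d = rhs1 + a*rhs2
    x_d = rhs1 + a * rhs2
    x_b = x_d - rhs2
    return x_d, x_b

# Frontal "ground truth": d_i = 14.5, d_o = 41.5, so beta ~ 0.349.
beta = 14.5 / 41.5
# Distorted measurements (illustrative): the right side (B, D) is
# foreshortened by the turn; the left side (A, C) is nearly unaffected.
dHD_meas, dBD_meas, dAC_meas = 33.0, 47.5, 56.0

x_d, x_b = solve_correction(dHD_meas, dBD_meas, dAC_meas, beta)
dHD_corr = dHD_meas + x_d          # d(H, D'), recovers d_o = 41.5
dBD_corr = dBD_meas + x_d - x_b    # d(B', D'), equals d(A, C)
assert abs(dBD_corr - dAC_meas) < 1e-9
```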

Now let k = d(A, D')/d(A, D). Then the original distance d(F', G') can be expressed as d(F', G') = k x d(F, G). Because the base point E under the nose is not necessarily on the plane determined by A, B, C, D, F, G, the real distance d(H, E') is obtained by projecting d(H, E) onto the symmetry axis (D3) (Fig. 4).

In summary, the four most reliable quantitative features consist of d(A, C'), d(A, D'), d(H, E') and d(H, M). These features are normalized in order to account for different exposure distances between the subjects and the camera.

Fig. 4. The actual eye-nose distance.

4. Results

Details about the way in which the feature information is stored in our database and about the possible operation modes of our system are presented in (Kamel et al., 1993). This paper deals only with the experiment in which retrieval is made by similarity to a given image. When candidate retrieval and identification is required, the search subject image is captured, the features are extracted, and the screening and resemblance ranking procedures are executed (Kamel et al., 1993). In this case, images will be retrieved from the database based on the level of retrieval that is required and the type of feature measures that are supplied. The program typically produces the 10 most likely matches (the number 10 can be modified by the user), based on a similarity measure defined using the City Block norm. The system also allows the user to request additional candidates. Each search presents a list of image file names of the potential candidates in descending order of the similarity measures (also shown in the list). The user can then display the search subject image and the candidate images and make the final decision.

Our tests were executed with a set of 84 images: 44 were obtained by digitizing existing photos, while 40 were obtained by capturing and digitizing the images of a number of subjects. For each of these subjects a number of pictures were taken, each corresponding to a different posture. The tests were all based on the "hold-one-out" idea, i.e., one of the 84 images was the search subject and the database searched had 83 images. In the list of the 10 candidates which match the subject, the similarity measures varied depending on the search subject. In some cases all the similarity measures were relatively small; in others, the similarity measures for some of the candidates were fairly large (i.e., they were very unlikely candidates). We noticed that if for a certain search subject the database contained images of the same person but in different postures, the search would always put the corresponding images in the candidate list. In 95% of cases the person was in the top 4 on the similarity list, in 86% of the cases the person was in the top 2, and in 66% of the cases the person was number 1 on the list.

Our review of the existing analytic systems shows that no other projects contain perspective invariant features, and therefore there are no similar results to compare our data against. The only other face recognition systems allowing for perspective variation were of holistic type, but in those cases the number of training pictures was extremely large (Kohonen, 1984; Stonham, 1986). In our system, good results are obtained even if the database contains a single picture corresponding to the search subject in a different posture.
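The retrieval step described above can be sketched as follows (feature vectors, file names, and normalization are illustrative, not the paper's data; the actual normalization follows Kamel et al., 1993):

```python
# Minimal sketch of the retrieval step: rank database entries by the
# City Block (L1) distance between normalized feature vectors and
# return the most likely matches.

def city_block(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def rank_candidates(query, database, top_n=10):
    """database: {image_file_name: feature_vector}. Returns (name, score)
    pairs; a smaller L1 distance means a more similar candidate."""
    scored = [(name, city_block(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]

# Four normalized features per image: d(A,C'), d(A,D'), d(H,E'), d(H,M).
db = {
    "andrew_frontal.img": [0.56, 0.83, 0.31, 0.50],
    "andrew_turn45.img":  [0.56, 0.83, 0.32, 0.49],
    "other_subject.img":  [0.61, 0.90, 0.28, 0.55],
}
query = [0.56, 0.83, 0.31, 0.50]
for name, score in rank_candidates(query, db):
    print(f"{name}: {score:.3f}")
```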

5. Conclusion

The analytic method based on front view facial images seems to be the best candidate for a practical automatic face recognition system. The previous attempts encountered two major problems: (a) the fact that the number of facial features that can be automatically extracted by the system is too small to replace manual extraction, and (b) the lack of facial features that are invariant to perspective and aspect angles. Although we have made considerable progress in the user-machine interface (Kamel et al., 1993), our system is still semi-automated. Much more work is required in order to provide the system with a fully automated feature capture mechanism. Our paper solves, however, the second problem. The present paper presents for the first time an analytic approach based on front view facial images which makes full use of the invariant properties of a number of interfeature point distances. Together with feature normalization, this capability makes our approach very reliable, particularly when used to recognize individuals who appear in different postures in the photos stored in the database. The key element in our approach was the use of the Cross Ratio correction, as explained in the section on perspective invariance. Table 1 showed that the inclusion of this correction makes the relevant interfeature point distances nearly rotation invariant. One could slightly improve on this performance by deriving a set of features which is perspective invariant relative to rotations not only about the Y-axis, but also about the X-axis. While large rotations about the X-axis are not very usual in practical situations, our present results show that small rotations about the X-axis do not influence the recognition rates significantly.

Acknowledgement

This work was supported in part by a research contract from the IBM Toronto Lab. The authors would also like to acknowledge the development effort of the following graduate students at the University of Waterloo: Keith Chan, Lian Guan, Bruce McArthur, Glen Newton, Kim Nguyen and Quentin Tang.

References

Batten, G.W., Jr. and B.T. Rhodes, Jr. (1978). UHMFS: The University of Houston MUG File System. Proc. 1978 Carnahan Conf. on Crime Counter Measures, Kentucky, May 1978, 15-26.
Craw, I., H. Ellis and J.R. Lishman (1987). Automatic extraction of face features. Pattern Recognition Lett. 5, 183-187.
Goldstein, A.J., L.D. Harmon and A.B. Lesk (1972). Man-machine interaction in human-face identification. Bell Syst. Tech. J. 51, 399-427.
Harmon, L.D., M.K. Khan and P.F. Ramig (1981). Machine identification of human faces. Pattern Recognition 13 (2), 97-110.
Kamel, M.S., H.C. Shen, A.K.C. Wong and R.I. Campeanu (1993). System for the recognition of human faces. IBM Syst. J. 32 (2), 307-320.
Kaufman, G.J., Jr. and K.J. Breeding (1976). The automatic recognition of human faces from profile silhouettes. IEEE Trans. Syst. Man Cybernet. 6, 113-121.
Kohonen, T. (1984). Self-Organization and Associative Memory. Springer, Berlin.
Lim, K.M., Y.C. Sim and K.W. Oh (1992). A face recognition system using fuzzy logic and artificial neural network. 1992 IEEE Internat. Conf. on Fuzzy Systems, 1063-1069.
Nixon, M. (1985). Eye spacing measurement for facial recognition. Proc. SPIE Int. Soc. Opt. Eng. 575, 279-285.
Riccia, G.D. and A. Iserles (1978). Automatic identification of pictures of human faces. Proc. 1977 Carnahan Conf. on Crime Counter Measures, Kentucky, 145-148.
Sakaguchi, T., O. Nakamura and T. Minami (1989). Personal identification through facial images using isodensity lines. SPIE 1199, 643-654.
Sakai, T., M. Nagao and T. Kanade (1972). Computer analysis and classification of photographs of human faces. Proc. 1st USA-Japan Computer Conf., 55-62.
Samal, A. and P.A. Iyengar (1992). Automatic recognition and analysis of human faces and facial expressions: a survey. Pattern Recognition 25 (1), 65-77.
Shepherd, J.W. (1986). An interactive computer system for retrieving faces. In: H.D. Ellis, M.A. Jeeves, F. Newcombe and A. Young, Eds., Aspects of Face Processing. Nijhoff, Dordrecht/Boston, 398-409.
Stonham, T.J. (1986). Practical face recognition and verification with WISARD. In: H.D. Ellis, M.A. Jeeves, F. Newcombe and A. Young, Eds., Aspects of Face Processing. Nijhoff, Dordrecht/Boston, 426-441.
Tsui, K.K. (1989). Computer Recognition of Human Faces. PhD Thesis, School of Electrical Engineering, University of Sydney.
Turk, M. and A. Pentland (1989). Eigenfaces for recognition. Proc. SPIE 1192, Intelligent Robots and Computer Vision, 22-32.
Turk, M. and A. Pentland (1991). Eigenfaces for recognition. J. Cognitive Neuroscience 3 (1), 71-86.
Wong, K., H. Law and P. Tsang (1989a). A system for recognising human faces. Proc. ICASSP, May 1989, 1638-1642.
Wong, A.K.C., M.S. Kamel and H.C. Shen (1989b). Feature extraction and recognition techniques for content based retrieval in image databases. Part I: a semi-automated system, Progress Reports I and II.
Wong, A.K.C., J. Constant and M.L. You (1990). Random graphs. In: H. Bunke and A. Sanfeliu, Eds., Syntactic and Structural Pattern Recognition - Fundamentals, Advances, and Applications. World Scientific, Singapore.