Pattern Recognition 34 (2001) 1033–1046
Multi-cues eye detection on gray intensity image

Guo Can Feng, Pong C. Yuen*

Department of Computer Science, Hong Kong Baptist University, 224 Waterloo Road, Kowloon, Hong Kong
Department of Mathematics, Zhongshan University, Guangzhou, People's Republic of China

Received 21 June 1999; accepted 24 January 2000
Abstract

This paper presents a novel eye detection method for gray intensity images. The precise eye position can be located if the eye windows are accurately detected. The proposed method uses multiple cues to detect eye windows from a face image. Three cues from the face image are used, each indicating the positions of the potential eye windows. The first cue is the face intensity, because the intensity of eye regions is relatively low. The second cue is based on the estimated direction of the line joining the centers of the eyes. The third cue is the response of convolving the proposed eye variance filter with the face image. Based on the three cues, a cross-validation process is performed. This process generates a list of possible eye window pairs. For each possible case, the variance projection function is used for eye detection and verification. A face database from the MIT AI laboratory, which contains 930 face images with different orientations and hairstyles captured from different people, is used to evaluate the proposed system. The detection accuracy is 92.5%. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Eye detection; Face recognition; Face detection; Eye variance filter
1. Introduction

Eye detection is a crucial step in face recognition and human–computer communication. By determining the orientation of the face and locating the positions of the eyes, the gaze of the face can be determined; in turn, the computer "knows" where the operator is looking. Many face recognition systems are based on facial features, such as eyes, nose and mouth, and their spatial relationship, called the constituted approach [1,2]. Eyes are the most important facial features, so the detection of eyes is the first step in a recognition system. In the face-based approach, eye corners are important [1,3]: faces have to be aligned before recognition. Moreover, eye corners are also important landmarks in the face [4]. Many statistical face recognition systems, such as the eigenface [3] or independent
* Corresponding author. Tel.: +852-2339-7811; fax: +852-2339-7892.
E-mail addresses: [email protected] (G.C. Feng), [email protected] (P.C. Yuen).
component analysis [5] methods, use eye corners for alignment.

Many eye detection methods have been developed in the last decade. The existing methods can be divided into two categories. The first category assumes that rough eye regions (which we call eye windows) have been located, or that there are some restrictions on the face image such that the eye windows can be easily located. The detection of eyes in the face then operates within the eye windows. Basically, if the eye windows can be located, the results are generally good. However, in practice, the eye windows cannot be easily detected. The second category starts from a cluttered image. A face detection algorithm [6,7] is adopted to locate faces in the image, and eye detection then proceeds on the detected face(s). As face detection algorithms are mature, it is reasonable to adopt an existing algorithm for face detection. However, there are two problems. The first problem is that a face detection algorithm usually gives only a rough estimate of the face region. The accuracy in locating the eye windows is therefore highly sensitive to (1) how accurately the face can be detected and (2) the hairstyle of the person. This problem
can be partially solved by using skin color information [8]. However, in many situations, such as images captured at night, color information cannot be obtained. The second problem is that even if the face can be accurately determined, the orientation of the face is not known. A common way to determine the eye windows is based on the anthropological human model [9]: if the orientation of the face is known, the eye windows can be easily determined. In short, if we can detect the eye windows from a face with unknown orientation and hairstyle, the precise positions of the eyes can be detected. This paper proposes the use of multiple cues of face information to determine the eye windows. The variance projection function [10] is employed to detect the final position of the eye.

The organization of this paper is as follows. A brief review of existing eye detection methods is given in Section 2. The proposed method is presented in Section 3, while the experimental results are discussed in Section 4. Finally, the conclusion is presented in Section 5.
2. Review of existing methods

As mentioned in Section 1, eye detection consists of two steps, namely, (1) locating the face and the eye windows, and (2) detecting the eyes within the eye windows. A brief review of the methods developed for each step is given below.

Sung and Poggio [7] developed an example-based approach for locating vertical frontal views of human faces in complex scenes. The basic idea is to model the distribution of human faces by means of "face" and "non-face" images. As this method is developed for vertical frontal-view faces, faces with other orientations cannot be detected. Rowley et al. [6] developed a neural network-based upright frontal face detection system. They employed a connected neural network to examine small windows of an image and decide whether each window contains a face. This system was further developed [11] to detect faces with different orientations. Lam and Yan [2] and Yuen et al. [10] employed a snake model for detecting the face boundary. The eye windows are then located based on the anthropological human model [9]. However, it is well known that a snake gives a good result in boundary detection only if the initial position is close to the target. Moghaddam and Pentland [12] proposed the use of principal component analysis to describe the face pattern in a lower-dimensional feature space. Matsuno et al. [13] proposed a potential net to detect human faces; the precise location of the face is determined by horizontal and vertical projections. Jeng et al. [14] adopted morphological operations and boost-filtering to remove the effect of complex backgrounds. A run-length local table method is employed to group the active pixels into maximally connected blocks, and moments are employed to determine the center and the orientation of each block. The connected
blocks become the potential eye windows, and a grouping algorithm is developed to determine the correct eye windows.

All the methods mentioned in the previous paragraph are based on gray-level images. Sobottka et al. [15] showed that human skin color for all races is clustered in the normalized RGB space. Saber and Tekalp [8] employed skin color characteristics for face detection, using shape symmetry to verify whether a candidate is a correct face region. Again, they focused on frontal-view face images.

After locating the eye windows in the face, eye detection can be performed. Yuille et al. [16] first proposed the use of a deformable template for locating the human eye. In this method, an eye model is designed and the eye position is obtained through a recursive process. However, this method is feasible only if the initial position of the eye model is placed near the actual eye position. Moreover, the deformable template suffers from two limitations. First, it is computationally expensive. Second, the weighting factors for the energy terms have to be determined manually, and improper selection of the weighting factors will produce an unexpected result. In view of these limitations, Lam et al. [17] introduced the concept of eye corners to guide the recursive process and partially solved these problems. In [17], the corner detection algorithm of Xie et al. [18] is adopted. However, this detection algorithm is based on the edge image, and a good edge image is hard to obtain when the contrast of the eye image is relatively low; in turn, the performance of the eye detection algorithm is degraded. Along the Lam et al. direction, Feng and Yuen [10] developed the variance projection function (VPF) for locating the landmarks (corner points) of an eye. It is observed that some eye landmarks have relatively high contrast, such as the boundary points between the eye white and the eyeball. The located landmarks are then employed to guide the eye detection process. Saber et al. [8] and Jeng et al. [14] proposed the use of the geometrical structure of facial features to estimate the location of the eyes, but precise locations have not been reported in their articles. Takacs et al. [4] developed iconic filter banks, based on a biologically motivated image representation, for detecting facial landmarks. The filter banks are implemented via self-organizing feature maps (SOFM). The method is generic for object detection; however, before detection, an object model (such as an eye or mouth) has to be trained, and for an object with different orientations, different models may be required.
3. Proposed method

Currently, there are a number of promising face detection methods [6,7,19]. This paper therefore assumes that
Fig. 1. Block diagram of the proposed system.
(1) a rough face region has been located or the image contains only one face, and (2) the eye(s) in the face image can be seen.

The proposed method is based on multiple cues from a face image, and its block diagram is shown in Fig. 1. When a rough face region is presented to the system, an improved snake model [20] is adopted to locate the precise head contour. Very often, hair is included in the detected head contour, so the second step is to locate the precise face region using morphological operations. To locate the exact eye windows, three cues are used. The first cue is the face intensity, because the intensity of eye regions is relatively low. The second cue is the direction of the line joining the centers of the eyes, which is determined using principal component analysis on the face edge image. The third cue is the response of convolving the proposed eye variance filter, developed for extracting potential eye windows, with the face image. Based on the three cues, a cross-validation process is conducted. This process generates a list of possible eye window pairs; experimental results show that in most of the images, the number of possible cases is less than three. For each possible case, the variance projection function [10] is adopted for eye detection and verification. If eyes are detected in the eye windows, the detection process is completed; if not, the next possible case is tested. Details of each block are discussed below. In order to clearly present the proposed method, the image shown in Fig. 2(a) will be used as an example to illustrate each step.
Fig. 2. (a) Detected contour using the improved snake model; (b) detected head region for further processing.
3.1. Detect head boundary

It is well known that the snake is a popular method for boundary detection, and it has been adopted for face boundary detection [2,17,21]. However, the performance of a snake is sensitive to its initial position. In our case, since an approximate head region has been detected using existing face detection methods, we can apply the snake model to locate the actual head position. Various snake algorithms have been developed, each with its own characteristics, such as the ability to locate concave shapes, convergence speed and computational time.
In this paper, the improved snake algorithm proposed by Yuen et al. [20] is adopted. The main advantage of this algorithm is its fast convergence rate. The improved snake modifies the internal energy term of the original snake model: instead of minimizing the length of the boundary, it minimizes the distance between the boundary snake points and the boundary center. The new internal energy is defined as

$E_{\mathrm{int}}^{\mathrm{imp}} = \alpha\,\lvert l(s) - G \rvert + \beta\,\lvert l_s(s) \rvert$,   (1)

where $l(s)$ is the contour, $G$ is the boundary center, and $\alpha$ and $\beta$ are normalizing factors. The detailed derivation can be found in Ref. [20]. Applying the improved snake algorithm to the image shown in Fig. 2(a), the detected head boundary is represented by the black line overlaid on the original image. The detected head region is cut and pasted onto a black background for further processing, as shown in Fig. 2(b).
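As an illustration, Eq. (1) can be evaluated for a discrete closed contour as in the following sketch; the forward-difference approximation of $l_s$, the choice of the snake-point mean as the boundary center $G$, and the function name are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def improved_internal_energy(contour, alpha=1.0, beta=1.0):
    """Discrete form of Eq. (1): alpha*|l(s) - G| + beta*|l_s(s)|.

    contour: (N, 2) array of snake points l(s) on a closed contour.
    G is taken here as the mean of the snake points (an assumption).
    """
    G = contour.mean(axis=0)                             # boundary center
    dist_term = np.linalg.norm(contour - G, axis=1)      # |l(s) - G|
    l_s = np.diff(contour, axis=0, append=contour[:1])   # wrap-around difference
    length_term = np.linalg.norm(l_s, axis=1)            # |l_s(s)|
    return float(np.sum(alpha * dist_term + beta * length_term))
```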
3.2. Locate the face skin region

The detected face boundary usually includes the hair, and hairstyles differ from person to person. The objective of this step is to locate the skin region within the detected face region and remove the effect of different hairstyles.

Under normal illumination, the facial features, such as eyes, nose and mouth, possess relatively low gray levels. The intensity histogram of the image in Fig. 2(b) is shown in Fig. 3(a). Such a histogram can always be obtained because skin has a relatively high gray intensity while the other facial components have relatively low intensity. In this way, it is easy to find the threshold, which is marked by a "+" in Fig. 3(a). The thresholding result is shown in Fig. 3(b). It can be seen that, in order to obtain the face skin region, we have to solve the following problems:

• eliminate the false regions created by the ear, head boundary, nose, etc.;
• remove the effect of the holes that the facial features create in the face skin region.

Fig. 3. Thresholding result.

To remove these false regions, we employ the erosion and dilation processes, which have proved to be effective morphological operations for analyzing regions of a binary image [22]. Suppose the object $X$ and the structuring element $B$ are represented as sets in 2D Euclidean space, and let $B_x$ denote the translation of $B$ so that its origin is located at $x$. The erosion of $X$ by $B$ is defined as the set of all points $x$ such that $B_x$ is included in $X$:

Erosion: $X \ominus B = \{x : B_x \subseteq X\}$.   (2)

Similarly, the dilation of $X$ by $B$ is defined as the set of all points $x$ such that $B_x$ hits $X$, i.e., they have a non-empty intersection:

Dilation: $X \oplus B = \{x : B_x \cap X \neq \emptyset\}$.   (3)

In the proposed method, we adopt a 3×3 all-ones square as the structuring element $B$. Clearly, erosion makes an object (white region) shrink; it causes the background (black region) to expand and makes thin white regions separate. Dilation, on the other hand, makes the object expand, but it is not an inverse operation that completely recovers the shrinkage caused by erosion. Erosion followed by dilation [22], denoted $(X \ominus B) \oplus B$, smooths contours and suppresses small islands in the image. The result of applying the erosion process three times to the image shown in Fig. 3(b) is shown in Fig. 4(c), while the intermediate results of the first and second iterations are shown in Figs. 4(a) and (b), respectively.
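To make the opening step concrete, a minimal sketch using SciPy is given below; the use of scipy.ndimage and the helper name are illustrative assumptions, while the 3×3 structuring element and the three iterations of each operation follow the description above.

```python
import numpy as np
from scipy import ndimage

def extract_skin_region(binary_face):
    """Erode 3 times, dilate 3 times (Eqs. (2)-(3)), keep the largest region."""
    B = np.ones((3, 3), dtype=bool)        # 3x3 all-ones structuring element
    eroded = ndimage.binary_erosion(binary_face, structure=B, iterations=3)
    opened = ndimage.binary_dilation(eroded, structure=B, iterations=3)
    labels, n = ndimage.label(opened)      # connected components
    if n == 0:
        return opened
    sizes = ndimage.sum(opened, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)   # largest connected region
```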
Fig. 4. Face image after erosion: (a) first iteration, (b) second iteration and (c) third iteration.
Fig. 5. Face image after dilation: (a) first iteration, (b) second iteration and (c) third iteration (final result).
Applying the dilation process to the image in Fig. 4(c) one, two and three times gives the results shown in Figs. 5(a), (b) and (c), respectively; the image in Fig. 5(c) is the final result. After applying the erosion and dilation processes, the face skin region is explicitly shown in Fig. 5(c). However, there may still be small spurious regions in the image. To determine the face skin region, we select the largest connected region in the image as the face skin area.

3.3. Determine the potential eye windows based on intensity

The detected face skin region contains holes and some concave parts, which are in fact facial features such as eyes, eyebrows, nose or mouth. This is because these facial features have a relatively low intensity compared with the skin intensity; after thresholding the face region, all these features are classified as low-intensity regions. When the head is rotated in depth, one of the eyes and the mouth will be close to the head boundary. After the opening operations, such eye and mouth regions are merged with the background, and concave parts are therefore formed. Also, some of these features may merge to form a new region. Therefore, these holes and concave parts are important facial features. An efficient two-step algorithm is developed to recover the convex shape (a code sketch is given below):

1. Represent the face region boundary by an 8-direction chain code.
2. Connect the concave points by straight lines in order to construct a convex shape; the result is shown in Fig. 6(a).

The concave point detection algorithm of Rosenfeld and Johnston [19] is adopted. Let FR be the face skin region. The holes in region FR indicate potential facial features. In order to find the correct eye windows at a later stage, we label those holes as $h_1, h_2, \ldots, h_m$; in this example, $m$ is equal to 3. This information will be used in Section 3.6 for determining the actual eye windows.
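The paper recovers the convex shape with a chain-code traversal of the boundary; the sketch below substitutes a ready-made convex hull routine for that step and then labels the holes $h_1, \ldots, h_m$, so it is a functional stand-in rather than the authors' algorithm.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import convex_hull_image

def find_feature_holes(skin_region):
    """Approximate the holes/concave parts inside the face skin region FR.

    NOTE: convex_hull_image replaces the paper's 8-direction chain-code
    step; the difference between hull and region gives candidate features.
    """
    hull = convex_hull_image(skin_region)
    holes = hull & ~skin_region            # low-intensity feature regions
    labels, m = ndimage.label(holes)       # h_1, ..., h_m
    return labels, m
```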
Fig. 6. (a) Detected holes in face region. (b) Detected face skin region boundary (black line) overlaid on the original image.
3.4. Estimate the direction of the line joining the centers of the eyes

The direction of the line joining the centers of the eyes (or eye windows) is very useful information for selecting the possible eye window pairs. Based on the face skin region detected in the previous section, principal component analysis (PCA) is employed to estimate this direction. It is well known that, for a 2D geometrical shape, PCA can be used to detect the principal directions of the spatial shape. Fig. 7 shows face edge images and the principal directions of the images. As the face is symmetric and there are many edge points near the eye positions, the first principal axis indicates the direction of the head while the second principal axis shows the direction of the line joining the two eyes.

Applying the Sobel edge operator to the skin region in Fig. 6(b), the edge image shown in Fig. 8 is obtained. Let $P_i(x, y)$ be a point on the edge image in the face region FR (Fig. 8(b)). The spatial covariance matrix of $(x, y)$, $M_{\mathrm{cov}}$, is given as

$M_{\mathrm{cov}} = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix}$,   (4)

where $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$ and $\sigma_{yx}$ represent the covariances between $x$ and $x$, $y$ and $y$, $x$ and $y$, and $y$ and $x$, respectively. The eigenvalues of $M_{\mathrm{cov}}$, $(\lambda_x, \lambda_y)$, provide a reasonable estimate of the extent of FR in the directions of the corresponding eigenvectors $(v_x, v_y)$; the directions of the eigenvectors indicate the principal axes of FR. Without loss of generality, we can assume that $\lambda_x \leq \lambda_y$. Letting $a_y$ and $a_x$ be the major and the minor axes, respectively, we have

$a_x = \sqrt{k\lambda_x}$,   (5)

$a_y = \sqrt{k\lambda_y}$,   (6)

where the parameter $k$ is simply estimated by the formula $k = \mathrm{Area(FR)}/\bigl(\pi\sqrt{\lambda_x\lambda_y}\bigr)$.
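Under the assumption that the edge points are available as coordinate arrays, the PCA step of Eqs. (4)-(6) can be sketched as follows; NumPy's covariance and symmetric eigendecomposition stand in for whatever routine the authors used.

```python
import numpy as np

def principal_axes(xs, ys, area):
    """Eqs. (4)-(6): covariance of edge points, eigenvectors, ellipse axes."""
    M_cov = np.cov(np.stack([xs, ys]))              # Eq. (4), 2x2 matrix
    lam, vecs = np.linalg.eigh(M_cov)               # eigenvalues, ascending
    k = area / (np.pi * np.sqrt(lam[0] * lam[1]))   # k = Area(FR)/(pi*sqrt(l1*l2))
    a_minor, a_major = np.sqrt(k * lam)             # Eqs. (5) and (6)
    v_minor, v_major = vecs[:, 0], vecs[:, 1]       # eye / head-up directions
    return (a_minor, v_minor), (a_major, v_major)
```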
Fig. 7. Principal axes of face skin region.
Fig. 8. Estimated direction of the line joining the eyes using PCA.
The major direction $v_y$ represents the head-up direction, while the minor direction $v_x$ represents the eye direction. The detected principal axes are shown in Fig. 8(b); it can be seen that the minor axis $v_x$ indicates the direction of the eyes. Meanwhile, the major and minor axes also provide an estimate of the face size: as the human face is oval in shape, if the face region is approximated by an ellipse with axes $a_y$ and $a_x$, the scale of the face can be normalized.

3.5. Locate potential eye windows using the eye variance filter

The previous sections extracted information for the potential eye windows, namely the holes after the opening operation and the eye direction from PCA. This section provides an additional cue for determining the potential eye windows using an eye variance filter.

Eye detection is a difficult task for two reasons. First, the eye is an active model constructed from an eyeball and two eyelids. Second, the edge features in this region are faint. Although a good edge image of the eye region is hard to obtain, the change of gray intensity in this region is more pronounced than in other regions of the human face. The variance, a statistic that describes the diversity of a random variable, is the second-order moment over a domain and measures the variation of gray intensity. Based on these observations, an eye variance filter is developed. Applying the eye variance filter to a face region, an obvious response is obtained in the eye regions while the response in non-eye regions is relatively low. The objective of using the variance filter is to extract the two eye regions, possibly with some false detections; a cross-validation combining the other evidence is performed in the next section.

3.5.1. Construction of the eye variance filter

Let $I(x, y)$ be an eye image. The variance on a domain $\Omega$ is defined as

$\sigma_\Omega^2 = \frac{1}{A_\Omega}\iint_{\Omega}\bigl[I(x, y) - \bar{I}_\Omega\bigr]^2\,dx\,dy$

or, in discrete form,

$\sigma_\Omega^2 = \frac{1}{A_\Omega}\sum_{(x, y)\in\Omega}\bigl[I(x, y) - \bar{I}_\Omega\bigr]^2$,   (7)

where $A_\Omega$ and $\bar{I}_\Omega$ represent the area and the average gray intensity of the domain $\Omega$, respectively. Note that $\sigma_\Omega^2$ is independent of the spatial distribution of $I(x, y)$; it depends only on the gray-level changes. From the definition, the statistic $\sigma_\Omega^2$ has two properties. First, $\sigma_\Omega^2$ is rotation invariant on the domain $\Omega$. Second, $\sigma_\Omega^2$ reflects gray intensity variations rather than the exact shape on the domain. For an image $I(x, y)$, we define its variance image as

$I_v(i, j) = \sigma_{\Omega_{ij}}^2, \qquad \Omega_{ij} = \{(x, y) : (i-1)l + 1 \leq x \leq il,\ (j-1)l + 1 \leq y \leq jl\}$.   (8)
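Eq. (8) amounts to tiling the image into $l \times l$ blocks and recording the intensity variance of each block; a direct sketch (with the block size $l$ as a parameter, set to 4 as used in the next subsection) is:

```python
import numpy as np

def variance_image(I, l=4):
    """Eq. (8): I_v(i, j) is the variance (Eq. (7)) of the (i, j)-th lxl block."""
    h, w = (I.shape[0] // l) * l, (I.shape[1] // l) * l   # crop to multiples of l
    blocks = I[:h, :w].reshape(h // l, l, w // l, l)
    return blocks.var(axis=(1, 3))        # per-block variance over pixels
```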
To construct the eye variance filter, we select 20 eye images of size 28×28 with different orientations from different persons as a training image set; these correspond to the first two rows in Fig. 9.
Fig. 9. (a) Eye images, (b) non-eye images.
The centers of the eyeballs are manually aligned. Each image is divided into 4×4 non-overlapping subblocks, and the variance of each subblock is calculated. The resulting variance images, of size 7×7, are shown in Fig. 10. The eye variance filter $F_e$ is constructed by averaging the variance images:

$F_e = \frac{1}{N}\sum_{i=1}^{N} I_v^i$.

The constructed eye variance filter is shown in Fig. 11(a) and its 3D plot in Fig. 11(b).

3.5.2. Evaluation of the eye variance filter

To detect the potential eye windows using the eye variance filter, a correlation is calculated between the filter and each variance image block. The correlation is defined as

$R(I_v^i, F_e) = \dfrac{E\bigl[(m_{I_v^i} - E(m_{I_v^i}))(m_{F_e} - E(m_{F_e}))\bigr]}{\sqrt{D(m_{I_v^i})\,D(m_{F_e})}}$,   (9)

where $m_{I_v^i}$ and $m_{F_e}$ are the concatenated vectors of the variance image $I_v^i$ and the filter $F_e$, respectively, $I_v^i$ is an image block, and $E(\cdot)$ and $D(\cdot)$ denote the mathematical expectation and the variance of a random variable.

Fig. 9 shows the images used to test the capability of the developed eye variance filter $F_e$: Fig. 9(a) shows eye images with different orientations from different people, while Fig. 9(b) shows non-eye images. The correlation between the eye variance filter and each image is calculated, and the correlation results are plotted in Fig. 12, together with the chosen threshold value. It can be seen that all eye images score above the threshold, which means the proposed eye variance filter extracts all eye regions. However, the correlations of some non-eye images are also above the threshold; this is because the variance distribution of the eye is not unique.

To detect the potential eye windows in the face region, the image shown in Fig. 6(b) is transformed into the variance image shown in Fig. 13(a). To reduce sensitivity to the size of the face, the face is normalized using Eqs. (5) and (6) before convolution. Convolving the eye variance filter with the variance image gives the response shown in Fig. 13(b), in which the higher the intensity, the larger the response, i.e., the higher the possibility of an eye.
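The filter construction and the matching score of Eq. (9) reduce to an element-wise mean and a normalized correlation between vectorized images; a compact sketch (function names are illustrative):

```python
import numpy as np

def build_eye_filter(variance_images):
    """F_e: element-wise mean of the N training variance images."""
    return np.mean(variance_images, axis=0)

def correlation(block, F_e):
    """Eq. (9): normalized correlation between vectorized variance images."""
    a = block.ravel() - block.mean()
    b = F_e.ravel() - F_e.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

A block would then be accepted as a potential eye window when its correlation with $F_e$ exceeds the threshold chosen in Fig. 12.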
Fig. 10. Variance images for the images in Fig. 9; the intensity level represents the value of the variance.
Fig. 11. (a) Eye variance filter and (b) its 3D plot.
Using the threshold determined in Fig. 12, the potential eye windows are located and represented by rectangular boxes in Fig. 13(c). In determining potential eye windows with the eye variance filter, we aim to minimize the false acceptance rate while keeping a zero false rejection rate.
Fig. 12. The correlation between the eye filter and the testing images in Fig. 9.
3.6. Cross-validation

The previous three sections described the extraction of three cues from the face image. This section combines the three cues and generates a list of possible eye window pairs.
Fig. 13. (a) Variance face image, (b) response of the eye variance filter, (c) potential eye windows.
Fig. 14. Detected eye windows.

Fig. 15. Eye model.
A rule-based system is adopted to cross-validate the cues. The rules are as follows:

1. Both eye windows are located in the face skin region.
2. The eye direction should be close to the direction of the second principal axis $v_x$ calculated in Section 3.4.
3. The holes detected in Section 3.3 overlap with the potential eye windows detected by the eye variance filter in Section 3.5.
4. The actual eye windows are usually located in the upper half of the face skin region (because a head image is very rarely upside-down).

By applying these rules to cross-validate the three cues, the eye windows are located as shown in Fig. 14(a). The "*" marks the peak response of the eye variance filter, i.e., the probable eyeball location. One possible encoding of these rules as predicates is sketched below.
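In the sketch, the window representation (center pixels), the hole masks and the angular tolerance are assumptions made for illustration.

```python
import numpy as np

def valid_pair(w1, w2, skin_mask, v_x, holes, angle_tol_deg=20.0):
    """Cross-validate a candidate eye-window pair against the four rules.

    w1, w2: (row, col) window centers; holes: list of boolean hole masks;
    v_x: unit vector of the second principal axis from Section 3.4.
    """
    in_skin = bool(skin_mask[w1]) and bool(skin_mask[w2])        # rule 1
    d = np.subtract(w2, w1).astype(float)
    d /= np.linalg.norm(d)
    angle = np.degrees(np.arccos(np.clip(abs(d @ v_x), 0.0, 1.0)))
    along_eye_axis = angle < angle_tol_deg                       # rule 2
    hits_hole = any(hm[w1] or hm[w2] for hm in holes)            # rule 3
    half = skin_mask.shape[0] / 2
    upper = w1[0] < half and w2[0] < half                        # rule 4
    return in_skin and along_eye_axis and hits_hole and upper
```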
3.7. Locate the precise eye position

After locating the eye windows, precise eye detection proceeds using the variance projection function (VPF) [10]. The VPF has been shown to be orientation and scale invariant for eye detection. Suppose that the eye window is bounded by $[x_1, x_2]$ and $[y_1, y_2]$. Let $I(x, y)$ be the intensity of the pixel at location $(x, y)$, and let $V(x)$ and $H(y)$ be the averages of the vertical and horizontal integral projections of $I(x, y)$ on the intervals $[y_1, y_2]$ and $[x_1, x_2]$, respectively. The variance projection functions in the vertical direction, $\sigma_v^2(x)$, and the horizontal direction, $\sigma_h^2(y)$, are defined as follows:

$\sigma_v^2(x) = \dfrac{1}{y_2 - y_1}\sum_{y_i = y_1}^{y_2}\bigl[I(x, y_i) - V(x)\bigr]^2$,

$\sigma_h^2(y) = \dfrac{1}{x_2 - x_1}\sum_{x_i = x_1}^{x_2}\bigl[I(x_i, y) - H(y)\bigr]^2$.

An eye model, shown in Fig. 15, is used, with six landmarks defined on it. The eye model consists of three components, namely the iris, the upper eyelid and the lower eyelid, and the six landmarks are used to determine the positions of these three components: the iris is located using four of the landmarks, the upper eyelid is determined by three of them, and so on. Details of the algorithm can be found in Ref. [10]. Applying the VPF to the detected eye windows shown in Fig. 14(a), the detected iris position is consistent with the response of the eye variance filter. The detected iris location is marked by "*" in Fig. 14(a), and the corners of each eye are detected and marked in Figs. 14(b) and (c).
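The two projection functions transcribe directly into code; in the sketch below the window bounds are taken as inclusive pixel indices and the denominators follow the formulas as reconstructed above.

```python
import numpy as np

def variance_projections(I, x1, x2, y1, y2):
    """VPF: sigma_v^2(x) and sigma_h^2(y) over the window [x1,x2] x [y1,y2].

    I is a 2-D gray image indexed as I[y, x]; bounds are inclusive.
    """
    win = I[y1:y2 + 1, x1:x2 + 1].astype(float)
    V = win.mean(axis=0)                  # vertical integral projection V(x)
    H = win.mean(axis=1)                  # horizontal integral projection H(y)
    sigma_v = ((win - V) ** 2).sum(axis=0) / (y2 - y1)            # over y
    sigma_h = ((win - H[:, None]) ** 2).sum(axis=1) / (x2 - x1)   # over x
    return sigma_v, sigma_h
```

Abrupt changes in these functions indicate landmark positions, which guide the fitting of the eye model of Fig. 15 in [10].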
4. Results

The approaches used to extract each cue were discussed in Section 3. This section presents the experimental results. The first part presents the result of processing another face image by the proposed method, one not used in the previous sections. The second part uses more images to evaluate the proposed method. Finally, a comparison between the proposed method and existing methods is given in the third part. The proposed method is implemented in Matlab on a Pentium II-based computer. 930 face images of size 240×360 from the MIT AI laboratory are used to test the proposed method, and the detection accuracy is 92.5%.

4.1. Part I

A lady's face image, shown in Fig. 16(a), is used to test the proposed method. Using the improved snake algorithm, the head contour is detected and represented by the white line in Fig. 16(a). Applying the erosion and dilation processes to the head region gives the result shown in Fig. 16(b); it can be seen that more holes are detected in this image. Applying principal component analysis to the face skin region, the principal axes, and hence the direction of the eyes, are detected as shown in Fig. 16(c). Based on these two cues, there are three possibilities for the eye window pairs, created from the eyebrows, the eyes and the mouth. Applying the eye variance filter to the face skin region, the potential eye windows are located as shown in Fig. 16(d). By combining these three cues, the final eye windows are detected as shown in Fig. 16(e), where "*" is the detected position of the eyeball. Using the variance projection function, the corners are located as shown in Fig. 16(f). This example shows that by combining the three cues, the eye windows are accurately located.

4.2. Part II

This part presents the results of applying the proposed method to more face images from different persons with
different orientations. First, the four images with different orientations shown in Fig. 17(a) are selected. Applying the proposed method to these images, the three cues are extracted and displayed in Figs. 17(b)–(d). The final eye windows are then located, as shown in Fig. 17(e), while the eye corners are detected as shown in Fig. 17(f). This example shows that the eyeball and the eye corners are accurately detected in face images with different orientations. Another 12 face images from different persons with different orientations and hairstyles are shown in Fig. 18. Applying the proposed method to these images, the eye windows are located and marked by rectangles; the detected eye corners are also shown in Fig. 18. This example shows that the proposed method is insensitive to different hairstyles.

4.3. Part III

This part gives a qualitative comparison between existing eye detection methods and the proposed method. Five of the most recently developed methods are selected for comparison. The comparison focuses on the capability of each method to handle (1) frontal-view face images, (2) face images with rotation in depth, (3) face images with on-the-plane rotation and (4) different hairstyles, and (5) its ability to locate precise iris and eye corner positions. The comparison is summarized in Table 1.

4.3.1. Frontal-view images

All methods are able to detect eyes from frontal-view face images.

4.3.2. Face images with in-depth and on-the-plane rotation

When there is a rotation in depth, the anthropological standard may not be preserved; therefore, the Lam et al. method and the Feng et al. method are sensitive to
Fig. 16. Another example showing the eye detection process.
Fig. 17. Applying the proposed method on face images with different orientations.
the rotation in depth. Also, both methods assume that the face is upright; therefore, both methods allow only small variations of on-the-plane rotation.

As the retinal filter is designed for frontal images, the filter may need to be redesigned for other orientations. Hence, the Takacs et al. method is not able to handle either in-depth or on-the-plane rotation.

The Saber et al. and Jeng et al. methods are based on the geometrical relationship between the face and the facial features. This relationship is invariant to on-the-plane rotation, and therefore these two methods are invariant to on-the-plane rotation as well.
Fig. 18. Applying the proposed method on face images from different persons with different orientations and hairstyles.
Table 1
Comparison between existing methods and the proposed method (✓ = can handle, ✗ = cannot handle)

Method | Approach | Frontal view | Rotation in depth | On-the-plane rotation | Different hairstyles | Locates precise eye and eye corner locations
Lam et al. [2,17] | Snake + anthropological model | ✓ | ✗ | Limited range | Moderate | ✓
Takacs et al. [4] | Retinal filter banks implemented via a self-organizing feature map | ✓ | ✗ | ✗ | ✓ | ✓
Saber et al. [8] | Color information and face geometrical information | ✓ | Limited range | ✓ | ✓ | ✗
Jeng et al. [14] | Frontal face model + relative geometrical relation between facial features | ✓ | Limited range | ✓ | ✓ | ✗
Feng et al. [10] | Snake + anthropological model + variance projection function | ✓ | ✗ | Limited range | ✗ | ✓
The proposed method | Multi-cues | ✓ | ✓ | ✓ | ✓ | ✓
However, the relationship may not be preserved when there is a large rotation in depth; therefore, we classify these methods as "limited range". For the proposed method, all three cues are insensitive to face orientation as long as the eyes can be seen, so the proposed method is able to handle both in-depth and on-the-plane rotation.
4.3.3. Different hairstyles

The Feng et al. method uses a snake to detect the head boundary, which usually includes the hair; therefore, their method cannot handle different hairstyles. Although Lam et al. also employ the snake model for boundary detection, they detect the inner face boundary, in which hair is not included. However, the details are not discussed in their articles; therefore, we classify their method as "moderate". Theoretically, the Saber et al. and Jeng et al. methods are not sensitive to hairstyles, as face geometrical information is used; however, these features are sensitive to the performance of face detection as well as facial feature detection. For the Takacs et al. method, the retinal filter banks are insensitive to hairstyles, and therefore their method can handle different hairstyles. The proposed method, using multiple cues for cross-validation, is insensitive to hairstyle.

4.3.4. Locating precise eye and eye corner locations

The Lam et al., Takacs et al. and Feng et al. methods give precise iris and eye corner locations, while the Saber et al. and Jeng et al. methods provide only a roughly estimated eye location. The proposed method makes use of the variance projection function and the eye variance filter to locate the iris and eye corner positions.

5. Conclusions

A robust eye detection method is reported in this paper. The proposed method makes use of multiple cues extracted from a gray-level image to detect the eye windows. The precise iris and eye corner locations are then detected by the variance projection function and the eye variance filter. The proposed method is insensitive to the face orientation and the hairstyle of the person, and is capable of locating the precise positions of the iris and eye corners. The proposed method has been tested on 930 face images from the MIT AI laboratory, and the detection accuracy is 92.5%. Experimental results show that false detection is mainly due to the following situations:

• Parts of the eyes are covered by hair.
• When there is a rotation in depth (facing downwards), the eyes are almost closed.
• The eyebrows are detected instead of the eyes.

In view of the above limitations, future work will concentrate on improving the accuracy by developing additional cues and enhancing the cross-validation rules.

Acknowledgements

This project was supported by the Science Faculty Research Grant, Hong Kong Baptist University. The testing images were downloaded from the MIT AI laboratory ftp site "ftp.ai.mit.edu/pub/users/beymer". The authors would like to thank Dr. J.H. Lai for useful discussions and Ms. Joan Yuen for proofreading this manuscript.
References

[1] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (5) (1995) 705–740.
[2] K.M. Lam, H. Yan, An analytic-to-holistic approach for face recognition based on a single frontal view, IEEE Trans. Pattern Anal. Mach. Intell. 20 (7) (1998) 673–686.
[3] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71–86.
[4] B. Takacs, H. Wechsler, Detection of faces and facial landmarks using iconic filter banks, Pattern Recognition 30 (10) (1997) 1623–1636.
[5] M.S. Bartlett, T.J. Sejnowski, Viewpoint invariant face recognition using independent component analysis and attractor networks, Neural Inform. Process. Syst. 9 (1997) 817–823.
[6] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1) (1998) 23–38.
[7] K.K. Sung, T. Poggio, Example-based learning for view-based human face detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1) (1998) 39–51.
[8] E. Saber, A.M. Tekalp, Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions, Pattern Recognition Lett. 19 (1998) 669–680.
[9] M. Vezjak, M. Stephancic, An anthropological model for automatic recognition of the male human face, Ann. Hum. Biol. 21 (4) (1994) 363–380.
[10] G.C. Feng, P.C. Yuen, Variance projection function and its application to eye detection for human face recognition, Pattern Recognition Lett. 19 (1998) 899–906.
[11] H.A. Rowley, S. Baluja, T. Kanade, Rotation invariant neural network-based face detection, Proceedings of Computer Vision and Pattern Recognition, 1998.
[12] B. Moghaddam, A. Pentland, Probabilistic visual learning for object recognition, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 696–710.
[13] K. Matsuno, C. Lee, S. Kimura, S. Tsuji, Automatic recognition of human facial expressions, Proceedings of ICCV, 1995, pp. 352–359.
[14] S.-H. Jeng, H.Y.M. Liao, C.C. Han, M.Y. Chern, Y.T. Liu, Facial feature detection using geometrical face model: an efficient approach, Pattern Recognition 31 (3) (1998) 273–282.
[15] K. Sobottka, I. Pitas, Extraction of facial regions and features using colour and shape information, Proceedings of ICIP 3 (1996) 483–486.
[16] A.L. Yuille, D.S. Cohen, P.W. Hallinan, Feature extraction from faces using deformable templates, Proceedings of CVPR'89, 1989, pp. 104–109.
[17] K.M. Lam, H. Yan, Locating and extracting the eye in human face images, Pattern Recognition 29 (5) (1996) 771–779.
[18] X. Xie, R. Sudhakar, H. Zhuang, On improving eye feature extraction using deformable templates, Pattern Recognition 27 (1994) 791–799.
[19] A. Rosenfeld, E. Johnston, Angle detection on digital curves, IEEE Trans. Comput. 22 (1973) 875–878.
[20] P.C. Yuen, G.C. Feng, J.P. Zhou, A contour detection method: initialization and contour model, Pattern Recognition Lett. 20 (2) (1999) 141–148.
[21] P.C. Yuen, G.C. Feng, Automatic eye detection for human identification, Proceedings of the 10th Scandinavian Conference on Image Analysis (SCIA'97), Vol. 1, Finland, 1997, pp. 293–300.
[22] J. Serra, Image Analysis and Mathematical Morphology, Vol. 1, Academic Press, London, 1988.
About the Author: G.C. FENG received his M.Sc. degree in bio-mathematics from Zhongshan (Sun Yat-Sen) University, China, in 1988 and his Ph.D. degree in Computer Science from Hong Kong Baptist University in 1999. Currently, he is a lecturer in the Department of Mathematics, Zhongshan University. His current interests include pattern recognition and computer vision.

About the Author: P.C. YUEN received his B.Sc. degree in electronic engineering with first class honours in 1989 from City Polytechnic of Hong Kong, and his Ph.D. degree in Electrical and Electronic Engineering in 1993 from The University of Hong Kong. Currently, he is an Associate Professor in the Department of Computer Science, Hong Kong Baptist University. Dr. Yuen was a recipient of the University Fellowship to visit The University of Sydney in 1996, where he was associated with the Laboratory of Imaging Science and Engineering, Department of Electrical Engineering. In 1998, Dr. Yuen spent a six-month sabbatical leave in The University of Maryland Institute for Advanced Computer Studies (UMIACS), University of Maryland at College Park, where he was associated with the Computer Vision Laboratory, CFAR. His major research interests include human face recognition, signature recognition and medical image processing.