Optics and Lasers in Engineering 30 (1998) 305–314
An automatic human face recognition system
Haisong Liu*, Minxian Wu, Guofan Jin, Gang Cheng, Qingsheng He
Department of Precision Instruments, Tsinghua University, Beijing, 100084, People's Republic of China
Received 21 January 1998; accepted 26 March 1998
Abstract
In this paper, we develop an automatic human face recognition system in which several well-known technologies, such as the affine transform, histogram equalization, spatial encoding, and the hit-miss transform, are incorporated in a new form. After the necessary preprocessing steps, performed digitally by a computer, the input image is optically correlated with the stored database images consecutively by a shadow-casting correlator. A SHARP QA-1200 TFT computer projection panel is used as two spatial light modulators (SLMs), one for the input image and one for the database images. The optical structure is optimized for practical use given the limitations of the present devices. 200 face images have been stored in the computer database. The system can now reach a recognition accuracy of over 94% and a speed of 12 images/s. Its tolerance to various distortions has also been tested. © 1998 Elsevier Science Ltd. All rights reserved.
1. Introduction
Human face recognition is an active area of research spanning several disciplines such as image processing, pattern recognition, computer vision and neural networks. A critical survey of the existing literature on this technology has been presented by Chellappa [1]. Most research has concentrated on algorithms for the segmentation and feature extraction of human faces, which are generally realized on electronic computers. However, many commercial and law-enforcement applications of human face recognition need to be high-speed and real-time, such as passing people through customs quickly while ensuring security, and purely computer-based processing is too time consuming because of the intensive calculation involved. The optical system, with its inherent two-dimensional parallelism, is a promising way to realize real-time face recognition.
*Corresponding author. E-mail:
[email protected].
Li et al. [2] have applied an optical network to face recognition by using a photorefractive correlator. Javidi et al. [3] have used nonlinear joint transform correlators to implement optical neural networks for this task. These neural-network approaches have only been implemented on limited datasets. In this paper, we develop an optoelectronic hybrid system for human face recognition, combining the massive parallelism of the optical system with the programmability of the electronic computer. The optical module in this system is an incoherent shadow-casting correlator, on which the morphological hit-miss transform (HMT) can be optically implemented [4, 5]. The HMT is a strict template-matching process and can be regarded as an improvement to the pattern recognition ability of conventional correlation methods [6]. In a previous paper, we optically implemented the HMT for human face recognition [5]. However, the recognition process could not be realized automatically, and the tolerance to distortions of real-time faces was limited because several critical preprocessing steps had not been achieved at that time. In this paper, the whole system has been built and automatic recognition of real faces can be achieved. We describe the flow of the whole system, several critical preprocessing steps, the optimum design of the optical structure for practical use given the limitations of the present devices, and some related properties of the liquid crystal display (LCD) panel used as a light intensity modulator in the optical correlator. Experimental results and conclusions are given at the end of the paper.
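Since the HMT is the core matching operation used throughout the paper, the following minimal digital sketch (our own illustrative Python/NumPy code, not the authors' implementation, and not the optical realization) shows the standard binary definition that the correlator emulates: erode the image with a foreground structuring element, erode the complement with a background element, and intersect the two results.

```python
# Minimal digital sketch of the morphological hit-miss transform (HMT):
# HMT(A; B1, B2) = (A eroded by B1) AND (complement(A) eroded by B2).
# scipy.ndimage.binary_hit_or_miss(A, B1, B2) computes the same result.
import numpy as np
from scipy.ndimage import binary_erosion

def hit_miss(image, fg_se, bg_se):
    """Strict template match: the foreground SE must fit the object and the
    background SE must fit the complement at the same position."""
    image = image.astype(bool)
    hit = binary_erosion(image, structure=fg_se)
    miss = binary_erosion(~image, structure=bg_se)
    return hit & miss

# Toy example with hypothetical structuring elements: locate an isolated bright pixel.
A = np.zeros((7, 7), dtype=bool)
A[3, 3] = True
fg = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)  # must be object
bg = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=bool)  # must be background
print(np.argwhere(hit_miss(A, fg, bg)))  # -> [[3 3]]
```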
2. Flow of the automatic human face recognition system
Fig. 1 shows the flow diagram of the automatic human face recognition system. The input image of the real-time human face captured by a CCD camera is first preprocessed and encoded by computer software, then consecutively correlated with the database images stored in the computer memory in advance by an optical correlator, and finally the recognition results are given. The main difference between this optoelectronic hybrid system and computer-software-based systems is that an optical correlator is used to carry out the computationally intensive correlation task, so that the processing speed can be greatly improved. The key devices in the optical correlator are the real-time spatial light modulators (SLMs). Liquid crystal (LC) devices such as LCTVs and LCD panels can be used for this purpose. The preprocessing module (see the upper part of Fig. 1) includes illumination normalization, feature point extraction, posture normalization, and spatial encoding. When a real-time face image is picked up by the CCD camera, the original image is first normalized to a uniformly illuminated image by histogram equalization [7], a well-known digital image processing method used to correct illumination variations. In histogram equalization, the pixel values are transformed so that they have a uniform gray-scale distribution. Then a simple quadratic-differential-based algorithm is used to extract three reference points on the face: the two internal eye corners and the mouth center. We do not use more advanced feature-point extraction algorithms (refer to Ref. [1]) in order to keep the computational complexity low.
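As a rough illustration of the illumination-normalization step, the following hypothetical Python snippet equalizes the histogram of an 8-bit face image; the function name and the lookup-table formulation are ours, not taken from the paper.

```python
# Sketch of histogram equalization for an 8-bit grayscale face image (NumPy array).
import numpy as np

def equalize_histogram(img):
    """Map pixel values so the cumulative gray-level distribution becomes ~uniform."""
    hist = np.bincount(img.ravel(), minlength=256)     # gray-level histogram
    cdf = np.cumsum(hist).astype(np.float64)           # cumulative distribution
    span = cdf[-1] - cdf[0]
    if span == 0:                                      # flat image: nothing to equalize
        return img.copy()
    lut = np.round(255 * (cdf - cdf[0]) / span).astype(np.uint8)
    return lut[img]                                    # apply lookup table

# Usage: normalized = equalize_histogram(face_frame)  # face_frame: uint8 CCD image
```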
Fig. 1. Illustration of the automatic human face recognition system; the upper part illustrates the flow diagram of the preprocessing module and the left part illustrates the folded structure of the incoherent shadow-casting correlator.
Based on the three points, the face image is translated, scaled, and rotated through an affine transform [8] so that the reference points are brought into a specific spatial arrangement with constant distances. The posture-normalized image is then encoded by extensive complementary encoding [9] and displayed on a liquid crystal display (LCD) panel to be optically correlated with the database images consecutively by the optical correlator. The encoding step has two goals. One is to represent the gray-scale image with an encoded binary image, because the optical correlation between two binary images is more accurate than that between two gray-scale images. The other, more important, function is to improve the discrimination ability of the optical correlation, because the extensively complementary-encoded image incorporates both the foreground and the background information of the original image. In the extensive complementary encoding method, each pixel is represented by two cells. Pixels whose gray-scale level (GSL) is less than a chosen threshold are represented with the code shown in Fig. 2a, in which the left cell is bright (GSL = 255) and the right one is dark (GSL = 0). Pixels whose GSL is greater than the chosen threshold are represented with the code shown in Fig. 2b, in which the left cell is dark (GSL = 0) and the right one is bright (GSL = 255). Using extensive complementary encoding, the foreground and the background of an original gray-scale image are thus combined into a single encoded binary image.
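The encoding rule of Fig. 2 can be sketched in a few lines. The snippet below is a hypothetical digital implementation: each gray-scale pixel becomes two horizontally adjacent binary cells, bright-dark below the threshold and dark-bright at or above it; the default threshold of 128 and the horizontal cell layout are our assumptions.

```python
# Sketch of the extensive complementary encoding: each gray pixel -> two binary cells,
# so the encoded image carries both foreground and background information.
import numpy as np

def complementary_encode(img, threshold=128):
    """Return a binary (0/255) image of shape (H, 2*W) following the Fig. 2 codes."""
    dark = img < threshold                          # pixels coded bright-dark (Fig. 2a)
    encoded = np.zeros((img.shape[0], 2 * img.shape[1]), dtype=np.uint8)
    encoded[:, 0::2] = np.where(dark, 255, 0)       # left cell of each pixel
    encoded[:, 1::2] = np.where(dark, 0, 255)       # right cell of each pixel
    return encoded
```

The encoded input image and each encoded database image would then be written to the two halves of the LCD panel for correlation.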
Fig. 2. The extensive complementary encoding method.
In this way, the pattern recognition ability of the optical correlation can be improved. This is in fact the optical implementation of the morphological hit-miss transform using a single optical correlator [10, 11]. The most important unit in the optoelectronic hybrid system is the optical correlator. Coherent optical correlators are known to be sensitive to misalignment of the system components and to coherent noise [12]. Incoherent optical correlators are more suitable for practical use because of several advantages over their coherent counterparts: less expensive light sources such as LEDs and white-light sources, a wider variety of input objects such as CRT and TV screens, less strict quality requirements on the optical components, less critical alignment accuracy between the input and the database image, multi-channel redundancy that is easily exploited to suppress noise, and twice the space-bandwidth product of the coherent optical correlator for a given numerical aperture and input image size [13]. Furthermore, as a commercial product intended for projection display, the LCD panel used as the real-time SLM in the optical correlator is designed to operate with incoherent light, where the phase modulation capability is not an important issue, whereas in coherent systems the phase modulation is more important than the amplitude modulation. Although many papers [14-16] have proposed methods to implement complex modulation, amplitude-phase coupling remains a difficulty in coherent systems. The incoherent optical correlator can accommodate this characteristic since the amplitude can be controlled without regard to the phase. Brasher [13] and Takaki [17] have each used an incoherent spatially matched filtering system with a phase-only filter for pattern recognition. However, a phase-only filter has a lower pattern discrimination ability than a complex filter in the incoherent matched filtering system, and they had to use optimization algorithms to improve the discrimination ability of the phase-only filter. In this paper, we use an incoherent shadow-casting system to perform the optical correlation. The LCD panel is well suited to this system as a light intensity modulator without any modification to the panel or special optimization. Fig. 3 sketches an incoherent shadow-casting correlator, consisting of an incoherent diffusing light source, an input image, a database image, an imaging lens L1, and a CCD camera with lens L2, from left to right along the optical axis. Using the diffusing characteristic of the incoherent light, the input image is projected onto the database image along various directions. Imaging lens L1 collects the superimposed images of the input and the database image from each direction and superposes them all, so that a resultant correlation image is formed on an output plane behind L1.
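For readers who want to prototype the processing chain before building the optics, the intensity correlation that the shadow-casting correlator forms on the output plane can be mimicked digitally. The sketch below is our own (using SciPy's FFT-based convolution) and only imitates the optical operation for testing; in the real system the correlation is formed optically and only the peak (or, with a complemented code, the valley) is read from the CCD.

```python
# Digital analogue of the incoherent shadow-casting correlation between the
# encoded input image and one encoded database image.
import numpy as np
from scipy.signal import fftconvolve

def incoherent_correlation(encoded_input, encoded_db):
    a = encoded_input.astype(np.float64) / 255.0
    b = encoded_db.astype(np.float64) / 255.0
    # Cross-correlation = convolution with one image flipped in both axes.
    return fftconvolve(a, b[::-1, ::-1], mode='full')

def correlation_score(encoded_input, encoded_db):
    """Normalized peak value used to rank database images (hypothetical metric)."""
    c = incoherent_correlation(encoded_input, encoded_db)
    return c.max() / encoded_db.size
```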
Fig. 3. An incoherent shadow-casting correlator.
The axial position of the output plane is determined by the distance between the input and the database image and by the sizes of each image [5]. The simplest structure of an incoherent shadow-casting correlator uses only one imaging lens L1 and lets the resultant correlation image form directly on the CCD detecting plane. However, in order to make the radial size of the resultant correlation image smaller than that of the CCD detecting array, which is only 1/2 inch diagonally, the imaging lens L1 must have a short focal length for a given field of view. On the other hand, the aperture of the imaging lens L1 should be no less than the database image size, which is about 70 × 70 mm² in this paper, in order to collect all the superimposed images and obtain an accurate correlation result. A large aperture and a short focal length imply an imaging lens with a large numerical aperture (NA), and such a high-quality imaging lens is difficult to obtain. In this paper, we adopt a two-imaging-lens scheme, in which the former lens mainly performs the summation of the superimposed images from each direction and the latter acts as a relay to image the correlation result onto the CCD detecting plane. By this division of work, the requirements on both lenses can be lowered. The resultant correlation image is captured by the CCD camera and then stored in the frame grabber or read into the computer memory for later use. A proper threshold value is selected and set by the computer according to the requirements. For example, when recognition accuracy is the main concern and error tolerance is secondary, the threshold level should be higher to make the identification condition more strict and thus eliminate ambiguity, and vice versa, because recognition accuracy and error tolerance are inherently conflicting requirements in pattern recognition. A simple minimum-distance ranking algorithm is used to give the final recognition results.
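The decision stage can be summarized in a few lines of hypothetical code, reusing the correlation_score helper sketched above and interpreting the minimum-distance ranking as picking the closest, i.e. highest-correlation, database entry (our reading). Raising the acceptance threshold trades error tolerance for accuracy, exactly as described.

```python
# Hedged sketch of the decision stage: rank all database images by their
# correlation output and accept the best match only above a chosen threshold.
def recognize(encoded_input, database, threshold=0.8):
    """database: list of (person_id, encoded_image) pairs.
    Returns the best-matching person_id, or None if no score reaches the threshold."""
    scores = [(correlation_score(encoded_input, img), pid) for pid, img in database]
    scores.sort(key=lambda s: s[0], reverse=True)   # best correlation first
    best_score, best_id = scores[0]
    return best_id if best_score >= threshold else None
```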
3. Liquid crystal display panel used in this system
3.1. Basic parameters of the LCD panel and of the correlator
The liquid crystal display (LCD) panel used in this system is a SHARP QA-1200 model color computer/video projection panel, which uses a thin-film transistor (TFT) display system and an active-matrix TFT drive system. The panel has 640 × 3 pixels horizontally and 480 vertically (each pixel includes three parallel sub-pixels for R, G, and B, respectively). Each pixel has a center-to-center spacing (pixel pitch) of 0.267 mm horizontally by 0.27 mm vertically. The typical contrast ratio is 150:1. The panel can be connected directly to a PC without a frame grabber. When the input signal is IBM VGA with a 640 × 480 display size, the update rate is 60 Hz [18]. Its additional backlight panel QA-BL2 provides the uniform, diffusing light source required in the incoherent shadow-casting correlator. In order to fully demonstrate the speed advantage of the optical system over computer software, the more pixels in the images the better, because the optical system computes in parallel and increasing the pixel count does not reduce its processing speed, whereas a computer computes serially and increasing the pixel count greatly increases the time consumed. In this paper, images of 256 × 256 pixels are used. For the 0.267 × 0.27 mm² pixel pitch of this panel, the image size is 68.352 × 69.12 mm². Both the transversal and the longitudinal sizes of the optical system are essentially determined by this size. The former imaging lens L1 has a 180 mm focal length and a 100 mm aperture, which just accommodates the image size diagonally. The latter imaging lens L2 is a Cosmicar TV camera lens with a 50 mm focal length and an f/1.4 relative aperture. The distance between the input and the database image is about four times the focal length of lens L1, i.e. 720 mm, and the distance from the correlation plane to the CCD detecting array is about 300 mm. To shorten the overall length of the optical system, a folded geometry has been designed, as shown in the left part of Fig. 1. The preprocessed input image and the encoded database images are displayed side by side on the LCD panel. Using only one panel as two real-time SLMs for both images not only makes the structure more compact, but also ensures that the two images have the same pixel pitch and aspect ratio. A reflecting mirror is used to connect the two parts of the LCD panel into one optical system. This is different from other folded schemes in which two mirrors are used; the choice is made in order to shorten the distance between the two images as much as possible. Although the system thereby becomes an off-axis system, no significant negative impact on the recognition performance is observed. In addition, a single mirror can be adjusted more flexibly to avoid the double-image effect that often appears in double-mirror schemes. Furthermore, the structure of the correlator is also more compact with this choice.
3.2. Modulation characteristic of the LCD panel
The LCD panel used in this system serves only as a light intensity modulator. It is desirable that the intensity modulation be a high-contrast, linear function of the drive voltage. Three parameters can be adjusted to achieve such a performance: the orientations of the polarizer and the analyzer, and the bias voltage added to the video signal. In practical use, the best performance was achieved when the settings of the polarizer and analyzer were close to those originally set by the manufacturer. When the brightness control is set approximately to the midpoint of its range, the LCD panel operates in the linear region of the intensity modulation.
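As a quick sanity check, the image size and image-separation figures quoted above follow directly from the pixel count, the pixel pitch, and the L1 focal length; the short snippet below simply reproduces that arithmetic with values copied from the text.

```python
# Reproduce the geometry figures quoted in Section 3.1 (values from the text).
PIXELS = 256
PITCH_H, PITCH_V = 0.267, 0.27            # mm, SHARP QA-1200 pixel pitch
width, height = PIXELS * PITCH_H, PIXELS * PITCH_V
print(width, height)                       # 68.352 x 69.12 mm image size
FOCAL_L1 = 180.0                           # mm, focal length of lens L1
print(4 * FOCAL_L1)                        # 720 mm separation between the two images
```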
The grid structure of the electrodes on the LCD panel causes diffraction, which produces a convolution effect on the correlation image: the correlation peak is convolved into a small 2-D array of points. To weaken this effect and improve the correlation peak quality, the distance between the two images should be as short as possible. However, the shorter this distance, the larger the correlation distribution area. This problem has been solved by the two-imaging-lens scheme described in Section 2. The light transmissivity of the LCD panel is quite low because a polarizer and an analyzer are attached to the two sides of the panel, which significantly reduces the contrast ratio of the correlation image. Besides shortening the length of the optical system to reduce the light intensity loss, and adjusting the brightness control of the backlight panel and the contrast-ratio control of the LCD panel to achieve a high-contrast correlation image, another method used to improve the contrast ratio is to obtain a dark point (which we call a correlation valley) instead of a bright one (correlation peak) by simply complementing either the encoded input image or the encoded database image before the correlation operation [5].
3.3. Update rate limitation of the LCD panel in this system
Because we are interested in quickly scanning through the stored database images, we need to know the effective update rate of the LCD panel in this system. Although the LCD panel has an update rate of 60 Hz for normal use, it cannot reach such a high rate in this system, because it is restricted by the data exchange speed between the optical module and the electronic control module, such as the transmission speed limitation of the frame grabber when the computer reads the resultant correlation image from the CCD camera. An experiment was done to determine the highest update rate the LCD panel can reach in this system. An image in which all 256 × 256 pixels are dark (GSL = 0) and another image in which all 256 × 256 pixels are bright (GSL = 255) are displayed alternately at the position of the database image, while a uniformly bright image is kept at the position of the input image. The period of alternation is specified by the program. The CCD camera reads the gray-scale level of only one point on the correlation plane. Obviously, if the update rate of the LCD panel is suitable for this system, the GSL of the detected point will change regularly between 0 and 255; if the LCD panel updates much faster than the response time of the CCD camera and the frame grabber allows, the GSL of that point becomes irregular. The experiment has shown that scanning 100 database images takes about 14 s, so the average update rate that ensures the stability of the system is about 7 Hz. However, this value was obtained without the judgement and result-output subprograms included in the program. Since those parts need additional time in each loop of the scanning process, the actual time needed is longer than the tested value. Improving the processing speed depends mainly on improving the hardware (LCD panel, CCD camera, PC, and frame grabber). A multichannel scheme has been utilized for this purpose, in which four database images are correlated with
one input image simultaneously so that the LCD panel only needs to update 50 times for a database of 200 images [5].
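The paper does not spell out how the four channels are arranged on the panel; one plausible layout, sketched below as an assumption, tiles four encoded database images into a 2 × 2 mosaic so that a single LCD update serves four correlations (200 images in 50 updates).

```python
# Hypothetical four-channel layout: tile four equally sized encoded database
# images into one 2x2 mosaic frame for a single LCD update.
import numpy as np

def tile_four(db_images):
    """db_images: list of exactly four arrays with identical shapes."""
    top = np.hstack(db_images[0:2])
    bottom = np.hstack(db_images[2:4])
    return np.vstack([top, bottom])
```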
4. Experimental results
In our experiments, a dataset of 200 human face images has been built. When a new face is captured in real time, the processor automatically searches the dataset and identifies the corresponding person. The system can now reach a recognition accuracy of over 94% and a speed of 12 images/s when the four-channel structure is used. To test the robustness of the system to various distortions, noise disturbance, and information loss, an experiment was carried out using the face image of one of our co-authors, Cheng. Fig. 4a-d show his face images with different postures (head nodding, tilting, and scaling) and illuminations, (e) shows the histogram-equalized version of (d), and (f) and (g) show the images with 30% Gaussian noise and 40% blocking, respectively, both produced digitally from (e) with Adobe Photoshop 4.0. All of the above images are recognized as the same person by the system. Fig. 4h cannot be recognized because the distortion is too large. Fig. 4i is a face image of another student in our research group whose face is very similar to Cheng's, and the system distinguishes them as different persons. Fig. 5a-e show the face images of the author Liu with different facial expressions (seriousness, smile, laughing, and astonishment), and the system identifies them as the same person. Another experiment was done to test the significance of different facial features in recognizing human faces with this system. A black bar covering 25% of a test face (Fig. 4e) is placed over different parts of the image, swept vertically from left to right and horizontally from bottom to top, so that 14 test images are formed, as shown in Fig. 6a-n. Using each of these images as the input image to correlate with the database images, we can find whether the corresponding database image (Fig. 4e) is recognized or not, and thus determine which parts of a face play more important roles than the others.
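For completeness, a hedged sketch of how the occluded test images of Fig. 6 could be generated digitally is given below; the exact bar dimensions, step positions, and the 7 + 7 vertical/horizontal split are our assumptions, chosen only so that 14 variants result.

```python
# Generate occluded test images: a bar covering 25% of the face is swept
# vertically (left to right) and horizontally (bottom to top) across the image.
import numpy as np

def blocked_variants(img, n_vertical=7, n_horizontal=7):
    h, w = img.shape
    bar_w, bar_h = w // 4, h // 4              # bar covers 25% of the image area
    variants = []
    for i in range(n_vertical):                # vertical bar swept left to right
        x0 = round(i * (w - bar_w) / max(n_vertical - 1, 1))
        v = img.copy()
        v[:, x0:x0 + bar_w] = 0
        variants.append(v)
    for i in range(n_horizontal):              # horizontal bar swept bottom to top
        y0 = round((h - bar_h) - i * (h - bar_h) / max(n_horizontal - 1, 1))
        v = img.copy()
        v[y0:y0 + bar_h, :] = 0
        variants.append(v)
    return variants                            # 14 test images for the defaults
```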
Fig. 4. Face images in the robustness testing experiment.
Fig. 5. Face images with different facial expressions; (a) seriousness, (b) no expression, (c) smile, (d) laughing, and (e) astonishment.
Fig. 6. Face images with blocks on different parts of a face.
The experimental results show that regions containing facial features (mouth, nose, and eyes) are more important than the non-feature regions (Fig. 6a, g, h). Among those features, the two eyes play the most important role (Fig. 6b, f, m, n); the nose and the mouth show no noticeable difference in our experiment.
5. Conclusions
We have built an automatic human face recognition system. On the basis of our previous work, in which an incoherent shadow-casting correlator was used to optically implement the hit-miss transform for human face recognition, several well-known techniques, such as the affine transform and histogram equalization, have been incorporated into the preprocessing steps, by which the recognition process can be realized automatically and the tolerance to distortions of real-time faces can be improved. Many advanced algorithms for the feature extraction of human faces [1, 8, 19] and for distortion-invariant pattern recognition [20-22] may be incorporated into this system for further development. The system may also be adapted to other pattern recognition fields such as industrial on-line inspection and optical character recognition (OCR).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (69775008), the High Technology Research and Development Program of China (863-307-07-02) and the Cao Guangbiao High Technology Development Foundation.
References
[1] Chellappa R, Wilson CL, Sirohey A. Human and machine recognition of faces: a survey. Proc IEEE 1995;83(5):704–40.
[2] Li HS, Qiao Y, Psaltis D. Optical network for real-time face recognition. Appl Opt 1993;32:5026–35.
[3] Javidi B, Li J, Tang Q. Optical implementation of neural networks for face recognition by the use of nonlinear joint transform correlators. Appl Opt 1995;34:3950–62.
[4] Yuan S, Wu M, Jin G, Zheang X, Chen L. Programmable optical hit-miss transformation using an incoherent optical correlator. Opt Lasers Eng 1996;24(4):289–99.
[5] Liu H, Wu M, Jin G, Cheng G, He Q. Real-time optoelectronic morphological processor for human face recognition. Opt Eng 1998;37(1):151–7.
[6] Huang KS, Jenkins BK, Sawchuk AA. Binary image algebra and digital optical cellular image processor design. Comput Vision Graphics Image Processing 1989;45:295–345.
[7] Phillips PJ, Vardi Y. Efficient illumination normalization of facial images. Patt Recog Lett 1996;17:921–7.
[8] Akamatsu S, Sasaki T, Fukamachi H, Suenaga Y. A robust face identification scheme: KL expansion of an invariant feature space. Proc SPIE 1991;1607:71–84.
[9] Liu H, Wu M, Jin G, Cheng G, He Q. Optoelectronic morphological processor for industrial on-line inspection. Proc SPIE 1998;3306:141–8.
[10] Liu L. Morphological hit-or-miss transform for binary and gray-tone image processing and its optical implementation. Opt Eng 1994;33:3447–55.
[11] Yuan S, Jin G, Wu M, Yan Y. Optical implementation of morphological hit-miss transform using complementary encoding. Proc SPIE 1995;2564:336–42.
[12] van der Gracht J, Mait JN, Prather DW, Athale RA. Role of coherence in optical pattern recognition. Proc SPIE 1994;2237:152–63.
[13] Brasher JD, Johnson EG. Incoherent optical correlator and phase encoding of identification codes for access control or authentication. Opt Eng 1997;36(9):2409–16.
[14] Gregory DA, Kirsch JC. Full complex modulation using liquid crystal televisions. Appl Opt 1992;31:163–5.
[15] Amako J, Miura H, Sonehara T. Wave-front control using liquid-crystal devices. Appl Opt 1993;32:4323–9.
[16] Roberg D, Neto LG, Sheng Y. Full complex modulation spatial light modulator using two coupled-mode modulation liquid crystal televisions. Proc SPIE 1995;2490:407–15.
[17] Takaki Y, Ishida K, Kume Y, Ohzu H. Incoherent pattern detection using a liquid-crystal active lens. Appl Opt 1996;35(17):3134–40.
[18] SHARP QA-1200 colour computer/video projection panel operation manual, p. 27.
[19] Yuille A, Cohen D, Hallinan P. Feature extraction from faces using deformable templates. Proc IEEE Comput Soc Conf on Computer Vision and Pattern Recognition 1989:104–9.
[20] Vijaya Kumar BVK. Tutorial survey of composite filter designs for optical correlators. Appl Opt 1992;31:4773–801.
[21] Rahmati M, Hassebrook LG. Intensity- and distortion-invariant pattern recognition with complex linear morphology. Patt Recog 1994;27(4):549–68.
[22] Gualdron O, Arsenault HH. Improved invariant pattern recognition methods. In: Real-time optical information processing. New York: Academic Press, 1994:89–113.