Pattern Recognition 34 (2001) 1555–1564
Face detection and location based on skin chrominance and lip chrominance transformation from color images

Hongxun Yao*, Wen Gao

Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Da Zhi Street, Harbin City, Heilongjiang Province 150001, People's Republic of China
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, People's Republic of China

Received 15 May 2000; accepted 15 May 2000
Abstract

This paper investigates the color composition of color images, obtains the relationship between chrominance and color components, and establishes a type of coordinate transformation able to improve the chrominance of skin and lip. With these coordinates, a new method of human face detection and location based on skin chrominance and lip chrominance transformation from color images is presented. It is an effective and robust way to detect the positions of objects, not influenced by the pose of the objects or their complex background. The relatively stable chromatic information of color images is exploited to detect and locate objects; in addition, the intensity information of color images is applied to enhance the contrast ratio between two objects with tiny differences. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Skin-color model; Chrominance; Face detection; Principal component analysis
1. Introduction

Human face perception is currently an active research topic in computer vision. Detection and/or location of human faces is a prerequisite for face perception. It is the key to identity confirmation in security systems, object detection in video tele-conferencing, human face recognition, facial expression analysis, lip-reading and human–computer interaction technology. Robust and accurate facial feature analysis is a difficult object recognition problem, due to the large appearance differences between subjects and the large appearance variability of a specific subject caused by changes in pose, lighting, specularity and mouth opening.

Previous approaches to face detection can be classified into three categories. Category 1 locates a human face by first locating the eyes or some other
* Corresponding author. Tel.: +86-451-6416485; fax: +86-451-6413309.
E-mail addresses: [email protected] (H. Yao), [email protected] (W. Gao).
important facial features [1–4]. Once the features are identified, the overall location of the face is determined using facial geometric information. The main disadvantage is that this requires clear, high-resolution images, a condition often impossible to achieve in practice. Moreover, these facial features may change from time to time under different imaging conditions, i.e. face lighting, face scale and face orientation, so it is difficult to detect them robustly and reliably: a simple feature model is less reliable, while a complex feature model takes a much longer time. In the second type of approach, the face is examined as a whole, usually using principal component analysis [5–8] or neural networks [9–11]. The disadvantage lies in the complicated training calculation over a mass of data and in results that are not intuitive: when a detection or location result is wrong, we do not know which training image influenced the conclusion or how, and thus we do not know how to improve our experiments. Neither of the above approaches utilizes the color information of color images. In fact, color is a
0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(00)00090-X
very important physical feature of human faces. Hence the third approach, the skin-color model [12–14]. However, human skin color differs from person to person, and even for the same person at different times or when wearing clothes of different colors. Some researchers therefore locate the skin color region with the I component, one axis of the YIQ color representation. Unfortunately, in many cases the I component image is not good enough to distinguish skin color from its surroundings. This paper proposes an approach for human face detection and location based on skin chrominance and lip chrominance transformation from color images (covering only natural lip color, not coffee-like, gray-like, blue-like or black-like lipsticks), which improves the features of skin color and lip color so that they are convenient to detect. A mouth or lip, occupying a sizeable area of a face, should be exploited for its specific properties; moreover, the mouth and lip are significant for facial expression recognition and lip-reading. This method of chrominance transformation is an effective and robust way of detecting the positions of objects, not influenced by their pose or background. It can also overcome the influence of illumination variance or of different camera devices by on-line learning. It uses the stable chrominance information of skin and lip in color images to detect, determine and locate objects. At the same time, it applies intensity information to enhance the contrast ratio between two objects with tiny differences, in order to give prominence to details and to describe them. This approach is fast and stable. It is a prerequisite for most human face perception and identification systems, tele-conferencing, human face recognition, facial expression recognition and lip-reading systems, because real-time systems impose requirements on reliability and processing time. Additionally, the idea of the method, based on chrominance transformation, can be used in many other similar fields to detect or locate objects by their chrominance.
2. Chrominance transformation space

2.1. Human vision system and color space

A variety of spectral distributions of light can produce perceptions of color which are distinguishable from one another. The human retina has three different types of nerve cells: photoreceptor cells, bipolar cells and ganglion cells. Photoreceptor cells are classified as photoreceptor cone cells and photoreceptor bacilliform cells. There are about 7,000,000 photoreceptor cone cells, which perform the bright photoreceptor function, that is the color visual function, and are concentrated in the central hollow of the retina; each photoreceptor cone cell is connected with a bipolar cell. There are about 120,000,000 photoreceptor bacilliform cells, which perform the dark photoreceptor function and are distributed near the central hollow of the retina; more than a dozen photoreceptor bacilliform cells are connected with one bipolar cell. Bipolar cells act as a bridge from photoreceptor cells to ganglion cells, and the ganglion cells connect with the brain. There are thus three types of light-sensitive cells in the human eye, sensitive to three colors. These may be any three basic colors, and there are many possible choices. The only rule to be satisfied is that the third color chosen must not be obtainable by mixing the other two colors already chosen. A fourth color, however, would not fit, because it has a linear relationship with the other three: it can be mixed from the three colors in some proportion. This means that only three independent colors exist; color is a three-dimensional quantity. This is usually known as the three-basic-colors principle, and it is the reason we select red, green and blue as basic colors, for their mixtures cover a wide range of colors.

In order to obtain an expression that fits human visual properties, we can use the color space YUV or YIQ by recoding the values of R, G, B. In the color space YUV, we obtain the values of Y, U, V by

    Y = 0.299R + 0.587G + 0.114B,
    U = K_u(B − Y),                                  (2.1)
    V = K_v(R − Y),

where K_u and K_v are called compressive coefficients, usually K_u = 0.493, K_v = 0.877. Generally K_u² + K_v² = 1, so we may let K_u = sin θ, K_v = cos θ. The Y component represents the intensity of the color, while the two chromatic signals U (that is, B − Y) and V (that is, R − Y) are mutually orthogonal components, named chrominance signals. A color signal is thus composed of two parts: an intensity component and chrominance components. The chrominance signals U and V constitute a vector F in two-dimensional space.
The modulus C_F of the vector F represents the saturation of the color, reflecting the magnitude of the values R, G, B, and the phase angle θ represents the hue, reflecting the relative proportions of R, G, B. The modulus C_F and the phase angle θ are defined as follows:

    C_F = √(u² + v²),                                (2.2)

    θ = tan⁻¹(v/u).                                  (2.3)
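As a minimal illustration, Eqs. (2.2)–(2.3) can be computed for a single chrominance vector as below; the function name is ours, and atan2 (rather than a bare arctangent) is our choice to keep the correct quadrant.

```python
import math

def chrominance_polar(u, v):
    """Saturation and hue of a chrominance vector, per Eqs. (2.2)-(2.3).

    Saturation is the modulus C_F = sqrt(u^2 + v^2); hue is the phase
    angle theta = arctan(v/u), computed here with atan2 (radians).
    """
    c = math.hypot(u, v)      # modulus -> saturation
    theta = math.atan2(v, u)  # phase angle -> hue
    return c, theta

# A chrominance vector on the positive V axis: unit saturation, 90 deg hue.
c, theta = chrominance_polar(0.0, 1.0)
```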
The transformation from RGB space to YUV space can be written in matrix form:

    [Y]   [ 0.299  0.587  0.114] [R]
    [U] = [-0.147 -0.289  0.436] [G] .               (2.4)
    [V]   [ 0.615 -0.515 -0.100] [B]
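A sketch of Eq. (2.4) for a single pixel, in plain Python (the function name is ours; the coefficients come straight from the matrix above):

```python
def rgb_to_yuv(r, g, b):
    """Apply the linear RGB-to-YUV transform of Eq. (2.4) to one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b   # about 0.493 * (B - Y)
    v = 0.615 * r - 0.515 * g - 0.100 * b    # about 0.877 * (R - Y)
    return y, u, v

# A gray pixel has full intensity but (near) zero chrominance,
# since each chrominance row of the matrix sums to zero.
y, u, v = rgb_to_yuv(255, 255, 255)
```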
Fig. 1. Vector graph of color information in UV space.

Fig. 2. Relation of the axes of I, Q and U, V.
Fig. 1 shows the color distribution in the UV plane directly. The other color space representation, YIQ, is similar in principle to YUV. The component Y has the same meaning in the two spaces, i.e. the intensity of light, while I and Q are mutually orthogonal components representing chrominance signals. The chrominance signal I lies along the 123° (red) to 303° (blue) chromatic direction; in contrast, the chrominance signal Q lies along the purple to green–yellow chromatic direction, the colors human eyes distinguish most weakly. The aim is to take full advantage of the human color-distinguishing property: coding color information in YIQ yields the minimum redundant information. The relation of YIQ space to YUV space is shown in Fig. 2:

    I = V cos 33° − U sin 33°,
                                                     (2.5)
    Q = V sin 33° + U cos 33°.
Therefore, the transformation formula from RGB space to YIQ space is as follows:

    [Y]   [0.299  0.587  0.114] [R]
    [I] = [0.596 -0.274 -0.322] [G] .                (2.6)
    [Q]   [0.211 -0.523  0.312] [B]
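Eq. (2.6), and the 33° rotation of Eq. (2.5) that relates it to YUV, can be sketched as follows. The function names are ours, and the small tolerance in the cross-check absorbs the rounding of the published coefficients:

```python
import math

def rgb_to_yiq(r, g, b):
    """Apply the linear RGB-to-YIQ transform of Eq. (2.6) to one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

def uv_to_iq(u, v, angle_deg=33.0):
    """Rotate the (U, V) chrominance plane by 33 degrees, Eq. (2.5)."""
    a = math.radians(angle_deg)
    return (v * math.cos(a) - u * math.sin(a),
            v * math.sin(a) + u * math.cos(a))

# Cross-check on a saturated red pixel: rotating its (U, V) chrominance
# by 33 degrees should land close to its (I, Q) chrominance.
y, i, q = rgb_to_yiq(255, 0, 0)
u, v = -0.147 * 255, 0.615 * 255
i2, q2 = uv_to_iq(u, v)
```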
2.2. Skin chrominance space and lip chrominance space

Skin color, which carries relatively concentrated information in a human face color image, is more reliable than the features of the eyes or other facial organs, especially when detecting unclear images or images containing very small faces. If the feature of skin color is stable, in other words if the invariant component of skin color can be extracted, then the changes from person to person or
Fig. 3. Skin color and lip color clustered in RGB color space.
from bright to dark or across different backgrounds can be eliminated, and the idea of detecting human objects in color images by their skin color features becomes acceptable and feasible. Locating a face by skin color features has the advantages of rotation invariance and simplicity. Human face skin colors appear different for different races and different individuals, and even for the same person under different lighting sources or when wearing different clothes. However, they lie in a relatively narrow color space. We have investigated the distribution of face images under different video collecting devices and across different sexes, ages and skin colors, and found that human skin color and lip color are clustered in the RGB color space, as Fig. 3 shows. It is well known that different persons have different skin color and lip color appearances; our experiments reveal that human color appearances differ more in intensity than in chrominance. When the intensity is normalized out of the color representation, the differences in human skin color or lip color are greatly reduced. As we know, the values (R, G, B) of color images in RGB space contain not only chrominance but also
1558
H. Yao, W. Gao / Pattern Recognition 34 (2001) 1555}1564
intensity. The RGB components are strongly correlated, as shown in Fig. 4. If two pixels (R₁, G₁, B₁) and (R₂, G₂, B₂) in RGB space satisfy the proportion

    R₁/R₂ = G₁/G₂ = B₁/B₂,

then they are identical in chrominance but different in intensity. The intensity can be eliminated from the color space through normalization. The pure color with the intensity removed is called chrominance, which can be represented by the two vectors U and V. In order to remove the intensity information from the color, we normalize the values (R, G, B) by intensity. The normalized values form a triple (r, g, b), which can be calculated by the following formula:

    [r]   [1/(R+G+B)      0          0     ] [R]
    [g] = [    0      1/(R+G+B)      0     ] [G] .   (2.7)
    [b]   [    0           0     1/(R+G+B) ] [B]
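A minimal sketch of the normalization in Eq. (2.7); the zero-intensity guard is our addition, since the paper does not discuss black pixels:

```python
def normalize_chromaticity(r, g, b):
    """Eq. (2.7): divide out intensity so that only chrominance remains."""
    s = r + g + b
    if s == 0:                       # black pixel: chrominance undefined
        return 1.0 / 3, 1.0 / 3, 1.0 / 3
    return r / s, g / s, b / s

# Pixels with proportional (R, G, B) values differ only in intensity,
# so they map to the same (r, g, b); also r + g + b = 1, making b redundant.
bright = normalize_chromaticity(200, 120, 80)
dark = normalize_chromaticity(100, 60, 40)
```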
In fact, the above formula defines a mapping from RGB space to chromatic space, that is, a transformation from three-dimensional space to two-dimensional space: since r + g + b = 1, the component b is redundant after the normalization. Fig. 4 shows the (R, G, B) distribution of the object pixels, from which we can conclude that a given pixel's values of R, G, B are strongly interrelated. Furthermore, in Fig. 5 the value R acts as the horizontal axis and the values of R, G or B act as the vertical axis; the relation of the corresponding (R, G, B) values clearly suggests a linear relation. The correlation of R with itself is of course linear, a line of slope 1 through the origin; the correlation of G or B with R is likewise approximately linear, although sometimes the value G of an object tracks R more closely than B does, and sometimes the reverse.

Fig. 4. Relation of the (R, G, B) values of object pixels: the horizontal axis represents the sampling order, the vertical axis the RGB value.

Fig. 5. The relationship of (R, G, B) to R.

It follows that the problem of finding a skin chromatic space or lip chromatic space that is invariant across different people becomes the problem of finding an appropriate phase angle in the chromatic UV plane that concentrates the skin color or lip color along a certain axis. With such coordinates, we obtain the principal components of the skin color or lip color information. In other words, the skin chrominance coordinates (Ũ, Ṽ) are

    Ũ = V sin 24° + U cos 24°
      = 0.877(R − Y) sin 24° + 0.493(B − Y) cos 24°
      = 0.357(R − Y) + 0.450(B − Y)
      = 0.357(R − 0.299R − 0.587G − 0.114B)
        + 0.450(B − 0.299R − 0.587G − 0.114B)
      = 0.115R − 0.473G + 0.358B,
                                                     (2.8)
    Ṽ = V cos 24° − U sin 24°
      = 0.877(R − Y) cos 24° − 0.493(B − Y) sin 24°
      = 0.800(R − Y) − 0.200(B − Y)
      = 0.800(R − 0.299R − 0.587G − 0.114B)
        − 0.200(B − 0.299R − 0.587G − 0.114B)
      = 0.620R − 0.352G − 0.268B.

The transformation from RGB space to the skin chromatic space YŨṼ can therefore be written in matrix form:

    [Y]   [0.299  0.587  0.114] [R]
    [Ũ] = [0.115 -0.473  0.358] [G] .                (2.9)
    [Ṽ]   [0.620 -0.352 -0.268] [B]

So, similarly, for the lip chromatic space:

                                   [R]
    [Y]   [ 0.299  0.587  0.114]   [G] .             (2.10)
    [Ũ] = [-0.137 -0.298  0.435]   [B]

Fig. 6. Hue phase distribution histogram of face skin color.

Fig. 7 is the binary result of detecting skin color by thresholding the transformed chrominance image.
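The two enhancing transforms, Eqs. (2.9)–(2.10), and the thresholding behind the binary detection images of Figs. 7 and 8 can be sketched as below. The helper names are ours, and the threshold of 20.0 is purely illustrative; in practice thresholds are experimentally chosen values.

```python
def skin_chrominance(r, g, b):
    """Skin-enhancing chrominance pair (U~, V~) of Eqs. (2.8)-(2.9)."""
    u_s = 0.115 * r - 0.473 * g + 0.358 * b
    v_s = 0.620 * r - 0.352 * g - 0.268 * b
    return u_s, v_s

def lip_chrominance(r, g, b):
    """Lip-enhancing chrominance component of Eq. (2.10)."""
    return -0.137 * r - 0.298 * g + 0.435 * b

def binarize(image, chroma_fn, threshold):
    """Mark the pixels whose transformed chrominance exceeds a threshold.

    `image` is a list of rows of (R, G, B) tuples; the result is a 0/1
    mask like the binary detection images of Figs. 7 and 8.
    """
    return [[1 if chroma_fn(*px) >= threshold else 0 for px in row]
            for row in image]

# A skin-toned pixel scores high on V~, while a green pixel does not.
image = [[(200, 150, 120), (50, 120, 60)]]
mask = binarize(image, lambda r, g, b: skin_chrominance(r, g, b)[1], 20.0)
```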
To find and locate human faces, we first look for skin (including face, hand, neck and other skin-like regions) by applying the skin chrominance transformation, and then try to find the lip or mouth (or lip-color-like regions) by using the lip chrominance transformation. After that, we can obtain the position relation, containment relation and area relation between the skin color regions and lip color regions. Let the flag R_Skin mark a pixel in a skin color region and the flag R_Lip mark a pixel in a lip color region; detect the connected regions among all pixels flagged R_Skin and R_Lip, each connected region marked
Fig. 7. Comparison of different results by applying variable values in skin chrominance detection.
Fig. 8. Comparison of different results by applying variable values in lip chrominance detection.
by the same flag, respectively R_Skin-1, R_Skin-2, ... and R_Lip-1, R_Lip-2, ...; then calculate the area of each connected region:

    S_Skin-j = ∫∫_{(x,y)∈R_Skin-j} p(x, y) dx dy,
                                                     (2.11)
    S_Lip-j = ∫∫_{(x,y)∈R_Lip-j} q(x, y) dx dy,

where j = 1, 2, ..., and

    p(x, y) = 1 if (x, y) ∈ R_Skin-j, 0 if (x, y) ∉ R_Skin-j,
    q(x, y) = 1 if (x, y) ∈ R_Lip-j,  0 if (x, y) ∉ R_Lip-j.

If the following restrained conditions are satisfied, the pixels are the facial ones we are looking for.

Restrained condition 1:

    S_Skin-j ≥ α S_Lip-i,
                                                     (2.12)
    R_Skin-j ⊇ R_Lip-i,

where i, j run through each combination of R_Skin-j and R_Lip-i, and in general α ≥ 15. If a pair i, j satisfies the above restrained conditions, that region is probably the object region we are looking for. Next, smooth the chromatic images r(x, y) and g(x, y) over the region:

    r̄(x, y) = (1/n²) Σ_{j=−n}^{n} Σ_{i=−n}^{n} r(x + i, y + j),
                                                     (2.13)
    ḡ(x, y) = (1/n²) Σ_{j=−n}^{n} Σ_{i=−n}^{n} g(x + i, y + j).

We take the logarithmic histogram h(r, g) in order to normalize the size of the face. The energy of the face chromatic space, a one-peak function, is concentrated near the maximum value, which marks the basic chrominance of the human face. The logarithmic histogram has a similar shape even for different poses and sizes, showing the regular pattern of human face chrominance. So we get

    r_m = arg max_r h(r, g),
                                                     (2.14)
    g_m = arg max_g h(r, g).
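A sketch of the area computation of Eq. (2.11) and restrained condition 1, with discrete pixel counts standing in for the integrals; the function names and the per-pixel containment check are our formulation:

```python
def region_area(mask):
    """Eq. (2.11): area of a binary region mask, as a pixel count."""
    return sum(sum(row) for row in mask)

def condition_1(skin_mask, lip_mask, alpha=15):
    """Restrained condition 1, Eq. (2.12): the skin area must be at least
    alpha times the lip area, and the skin region must contain the lip."""
    contains = all(s >= l for s_row, l_row in zip(skin_mask, lip_mask)
                   for s, l in zip(s_row, l_row))
    return region_area(skin_mask) >= alpha * region_area(lip_mask) and contains

# A 4x4 skin patch with a single lip pixel inside it passes; swapping
# the roles fails both the area ratio and the containment test.
skin = [[1, 1, 1, 1] for _ in range(4)]
lip = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
```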
Let R_Face mark the region in the original state; calculate r_c, g_c according to the following formula:

    r_c = arg max_{(x,y)∈R_Face} (r̄(x, y) − r_m),
                                                     (2.15)
    g_c = arg max_{(x,y)∈R_Face} (ḡ(x, y) − g_m).

Formula (2.15) extracts the pixel chrominance with the maximum difference from the skin color. Calculate the pixels I(x, y) which satisfy the following formula:

    I(x, y) = 1 if (r̄(x, y) − r_c)² + (ḡ(x, y) − g_c)² < ε,
              0 otherwise.                           (2.16)

Other regions are eliminated efficiently by restrained condition 2.

Restrained condition 2:

    t₁ ≤ Σ_{(x,y)∈W} I(x, y) ≤ t₂.                   (2.17)

Formula (2.17) requires enough pixels in the detection window W to satisfy the above conditions. In normal cases, the width and height of a human face obey a given rule, that is, their proportion lies in a range. Calculate the width W_R and height H_R of the region R to obtain restrained condition 3.

Restrained condition 3:

    p₁ < W_R / H_R < p₂.                             (2.18)

In the above discussion, ε, t₁, t₂, p₁, p₂ and α are experimental values. According to the experiments, t₁ should be about one-twentieth of the number of the window's pixels, and t₂ about one-fourth of the number of the window's pixels; the skin area is more than 15 times the lip area. Fig. 9 shows the result of detection under the above restrained conditions. These are two results of our experiments; we have processed about 200 images from different channels with great differences in background.
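The per-pixel test of Eq. (2.16) and restrained conditions 2 and 3 can be sketched as below. The sample bounds (t1, t2, p1, p2, eps) are illustrative placeholders, not the paper's experimental values:

```python
def face_pixels(r_bar, g_bar, r_c, g_c, eps):
    """Eq. (2.16): mark pixels whose smoothed chromaticity lies within
    eps of the extracted skin chrominance (r_c, g_c)."""
    return [[1 if (r - r_c) ** 2 + (g - g_c) ** 2 < eps else 0
             for r, g in zip(r_row, g_row)]
            for r_row, g_row in zip(r_bar, g_bar)]

def condition_2(window_mask, t1, t2):
    """Restrained condition 2, Eq. (2.17): enough (but not too many)
    marked pixels inside the detection window."""
    n = sum(sum(row) for row in window_mask)
    return t1 <= n <= t2

def condition_3(width, height, p1=0.6, p2=1.4):
    """Restrained condition 3, Eq. (2.18): plausible face aspect ratio."""
    return p1 < width / height < p2

# One pixel close to (r_c, g_c) = (0.45, 0.30) survives; the other does not.
mask = face_pixels([[0.45, 0.20]], [[0.30, 0.50]], 0.45, 0.30, 0.01)
```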
2.4. Help from intensity information in detecting and describing image details

It may seem that objects can be detected or located by chromatic information alone. Does intensity information really contribute little to detecting objects? It does not: otherwise we could not watch black-and-white TV programs, whose signals are derived from the intensity of the color TV signal. Since we can, it is evident that objects can be segmented by the intensity signal Y transformed from the color signal RGB, and therefore the intensity signals can be utilized. Experiments reveal that our eyes are more sensitive to changes in intensity than to changes in chromatic information (including hue and saturation). This means that more details of an image appear through intensity (brightness), while our eyes cannot distinguish the colors of those tiny details; some details of an object can be obtained from strong intensity. Furthermore, two objects (or two parts of an object) become clearly distinguishable if the intensity contrast between them is magnified, because their details are then described clearly.

Human vision of intensity is determined by changes of intensity. Fig. 10 indicates the logarithmic relation between our subjective intensity sense and the actual intensity. At low intensity, distinguishing changes of intensity depends mainly on the photoreceptor bacilliform cells, and over the same range of intensity change fewer gradations are perceived; at high intensity, the photoreceptor cone cells mainly account for sensing intensity changes. Fig. 10 shows an approximately straight segment in the middle of both curves, indicating that sensitivity to intensity change is constant there, while in the higher or lower regions a much larger change of intensity is needed to sense a given change, as the flexure of the curves implies. From Fig. 10 we can also learn that the visual range over which different intensities can be distinguished becomes much smaller once our eyes have adapted to a certain average intensity. This property can be applied: we can
Fig. 9. The result of detection and location by the above restrained conditions.
Fig. 10. Perception characteristics of intensity in human eyes.
magnify the difference of the intensity (not the intensity itself), that is, magnify the contrast of two objects in an image where details are important; this creates an obvious visual difference between two objects which are otherwise difficult to distinguish. Fig. 11 is an example of bringing out details of the lip with a proper brightness coefficient: even though the chromatic difference is barely perceptible to our eyes, the result is distinguishable. Fig. 12 illustrates the separability of the two regions. Comparing intensity and chrominance, we find that the former is more effective for describing object details.
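The contrast magnification described above can be sketched as a linear stretch of intensity differences about a local mean; the gain and the 8-bit clipping range are illustrative choices of ours, not parameters from the paper:

```python
def stretch_contrast(y_values, gain=2.0):
    """Magnify intensity differences (not the intensities themselves)
    about the mean, clipping to the 8-bit range [0, 255]."""
    mean = sum(y_values) / len(y_values)
    return [min(255.0, max(0.0, mean + gain * (y - mean))) for y in y_values]

# Four nearly identical intensities become twice as far apart,
# making a subtle boundary (e.g. lip against skin) easier to see.
row = [118, 120, 122, 124]
stretched = stretch_contrast(row)
```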
Fig. 11. Example of improving the contrast of skin and lip with suitable intensity coefficients.
3. The results and conclusion

All our experiments achieve a rate of up to 26 frames per second on a Pentium-II 300 with a CPE-1000 image board and a JVC TK-1070 or MIMTRON MTV-3301CB color camera, using 24-bit color images. With this approach we have detected 187 faces in 60 images containing 196 persons against many complicated backgrounds, a correct detection rate of 95.4%. The method also performs well in real time on any image sequence containing a single person. The experiments reveal that (1) human skin colors cluster in a small region of a color space; (2) each type of object has its own chrominance; (3) an object can easily be detected using its chrominance information by transforming from R, G, B to chromatic coordinates that improve its chrominance; and (4) detecting a human face with skin chrominance and lip chrominance is effective and robust. In a word, unlike similar previous approaches based on skin models, which are less reliable when using I component images, our method enhances the skin or lip region in its chrominance images through transformations trained on the specific object's chrominance. We deduce two new coordinate types which enhance skin chrominance and lip chrominance, respectively, and based on them we present an approach for human face detection and location. These enhanced chrominance images overcome the inflexibility
Fig. 12. Separability with normalization value of r, g between skin and lip. The dark region shows distribution of lips, and the light region does that of skin.
of I component images. We also propose applying the approach to human–computer interaction systems and to detection applications in which a person is the main object, because face features extracted with chrominance have the pose-invariance property. Another advantage of this approach is that the features are relatively stable, not influenced by changes in the size of objects or by complex backgrounds. We further propose obtaining image details by improving the local contrast ratio, exploiting the special effect of intensity information on human vision, and distinguishing objects by constructing relatively obvious differences between local objects. With this method, a transformation based on the maximum distance between lip and skin enhances only lip pixels, so lip pixels can be extracted from the face (skin region) easily and obviously. This is very useful to many methods based on geometric models, such as deformable templates or "snakes": preprocessed by this enhancing transformation, the outlines of the objects of interest emerge clearly.
The algorithms and training methods are general, and can be applied to other views of the face, as well as to similar object and pattern recognition problems. In the future, we are going to study the relationship between different cameras and skin or lip chrominance, and to find out how large the shifts of skin chrominance and lip chrominance are among different persons.

4. Concluding summary

This paper investigates the color composition of color images, obtains the relationship between chrominance and color components, and establishes a set of coordinate transformation methods able to improve the chrominance of objects of interest. With these coordinates, a new method of human face detection and location based on skin chrominance and lip chrominance transformation from color images is presented. It is an effective and robust way to detect the positions of objects, not influenced by the pose of the objects or their complex background. The experiments show that human skin colors cluster into a small region of a color space and that objects of each type have their own chrominance. It is easy and effective to detect an object by using its chrominance information, transforming from R, G, B to chromatic coordinates that improve its chrominance. Detecting a human face with skin chrominance and lip chrominance is effective and robust, and the method is also effective for detecting other objects. The advantage of this chromatic enhancing method is that it takes relatively stable information from color images to detect or locate objects; thus it is not influenced by the pose of objects or by illumination. At the same time, the intensity information from color images is applied to enhance the contrast ratio between two objects with tiny differences.

References
[1] K.M. Lam, H. Yan, Locating and extracting the eye in human face images, Pattern Recognition 29 (5) (1996) 771–779.
[2] G. Chow, X.B. Li, Towards a system for automatic facial features detection, Pattern Recognition 26 (12) (1993) 1735–1799.
[3] K.C. Yow, R. Cipolla, Feature-based human face detection, Technical Report No. 249, University of Cambridge, 1996.
[4] D. Reisfeld, Y. Yeshurun, Robust detection of facial features by generalized symmetry, Proceedings of the 11th International Conference on Pattern Recognition, The Hague, The Netherlands, 1992, pp. A117–120.
[5] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, June 1991, pp. 586–591.
[6] D. Ying, N. Yasuaki, Face-texture model based on SGLD and its application in face detection in a color scene, Pattern Recognition 29 (6) (1996) 1007–1017.
[7] M. Kirby, L. Sirovich, Application of the Karhunen–Loeve procedure for the characterization of human faces, IEEE Trans. Pattern Analysis and Machine Intelligence 12 (1) (1990) 12–16.
[8] Gao Wen, Liu Mingbao, A hierarchical approach to human face detection in a complex background, Proceedings of the First International Conference on Multimodal Interface, Beijing, 1996.
[9] H.M. Hunke, Locating and tracking of human faces with neural networks, Technical Report CMU-CS-94-155, Carnegie Mellon University, 1994.
[10] H.A. Rowley, S. Baluja, Human face detection in visual scenes, Technical Report CMU-CS-95-158R, Carnegie Mellon University, 1995.
[11] A. Jacquin, A. Eleftheriadis, Automatic location and tracking of faces and facial features in video sequences, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, 1995, pp. 237–242.
[12] G. Yang, Human face detection in a complex background, Pattern Recognition 27 (1) (1994) 53–63.
[13] J. Yang, A. Waibel, Tracking human faces in real-time, Technical Report CMU-CS-95-210, Carnegie Mellon University, 1995, pp. 142–147.
[14] J. Yang, W. Lu, A. Waibel, Skin-color modeling and adaptation, Proceedings of the AVSP'97 Workshop, Rhodes, Greece, September 26–27, 1997, ESCA, pp. 45–54.

About the Author—HONGXUN YAO was born in Zhejiang, People's Republic of China on 3 November 1965, and received her B.E. and M.E. degrees in computer science from Harbin Shipbuilding Engineering Institute in 1987 and 1990, respectively. She was an assistant from 1990 to 1992 and a lecturer from 1993 to 1999 in the Department of Computer Science and Engineering of Harbin Institute of Technology. Since September 1997 she has been a Ph.D. candidate, and two years later became an associate professor, at the same university, working on human face perception and multimedia information processing. Her research interests lie in image processing, pattern recognition, multimedia technology and natural human–computer interfaces such as image databases, face recognition and lip-reading. She has published more than 20 papers and books.

About the Author—WEN GAO received his first Ph.D. degree in computer science from Harbin Institute of Technology in 1988, and his second Ph.D. degree in electronics engineering from the University of Tokyo, Japan in 1991. Since December 1991, he has been a professor in the Department of Computer Science at Harbin Institute of Technology. He was a visiting research fellow at the Institute of Medical Electronic Engineering, the University of Tokyo, a visiting professor at the Institute of Robotics, Carnegie Mellon University,
and a visiting professor at the Artificial Intelligence Laboratory, MIT. Currently, he is also a professor at the Institute of Computing Technology, Chinese Academy of Sciences, chairman of the steering committee for the China national Hi-Tech programme on intelligent computing systems, chief editor of the Chinese Journal of Computers, honorary professor of computer science at City University of Hong Kong, and guest professor at Tsinghua University, Xi'an Jiaotong University, the Chinese University of Science and Technology, etc. His research interests include multimodal user interfaces, multimedia data compression, computer vision and artificial intelligence. He has published more than 150 papers and books.