International Journal of Industrial Ergonomics 19 ( 1997) 147-159
Non-contact method for measuring facial skin temperature

Hirokazu Genno *, Atsuo Saijo, Hiroyuki Yoshida, Ryuuzi Suzuki, Masato Osumi

Mechatronics Research Center, SANYO Electric Co., Ltd., 1-18-13, Hashiridai, Hirakata, Osaka 573, Japan
Abstract

Because facial skin temperature varies with sensations, it is an effective means of objectively evaluating sensations. The present report proposes a method that continuously and automatically measures facial skin temperature using color and infrared cameras. With this method, the skin temperature range within an infrared image is superimposed over the skin color range within a color image to produce overlapping areas, from which the facial area is extracted. Low-luminosity areas are then extracted from the facial area in order to identify the eyes and eyebrows. Once the eyes and eyebrows are identified, the nose and forehead can be located from their positions relative to the eyes and eyebrows, so the skin temperature of the relevant facial areas can be measured from the infrared image values.
Relevance to the industry

The present method can be used to develop systems that evaluate sensations through facial skin temperature without placing stress on test subjects. Accidents caused by human error can be prevented, because such a system can estimate the stress and fatigue of operators in nuclear power plants.

Keywords: Skin temperature; Sensation; Color image; Infrared image; Recognition
1. Introduction

Technologies that objectively evaluate human sensations would be highly desirable from the standpoint of creating user-friendly products and comfortable environments. Various physiological measurements have been used successfully to objectively measure human sensations in the past, but the problem has been that the measurement procedures themselves have always affected the test subjects. This problem can never be resolved as long as there is
* Corresponding author. Fax: +81 720 44-2985; Tel.: +81 720 41-1286.
physical contact when measurements are taken. Among the various physiological quantities, skin temperature has the greatest potential for automatic measurement without physical contact, and it is thought to be an effective means of evaluating sensations because autonomic nerve activity associated with sensations causes skin temperature to vary. Accordingly, the authors turned to skin temperature on the face, which moves relatively little and is easy to measure, in developing a method of evaluating sensations through facial skin temperature (Genno et al., 1994), and at the same time concentrated on developing a method that continuously measures facial skin temperature automatically. The present study
0169-8141/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved. PII S0169-8141(96)00010-8
details this method for continuously measuring facial skin temperature automatically without physical contact.
2. System structure and operating principle

Fig. 1 shows the structure of the system. Images obtained from the infrared camera (IR-3000 made by Mitsubishi Electric) and the color camera (XC-009 made by Sony) are processed together in an image processor (MaxVideo 20 made by Datacube) controlled by a host EWS (SPARCstation 10 made by Sun). As shown in Fig. 2, the image input section comprises an infrared camera, a color camera and an optical system that matches the optical path lengths and the optical axes of both cameras. The infrared sensor housed in the infrared camera generates stronger or weaker electrical signals depending on the intensity of the infrared radiation from the target object, and these are displayed as a temperature distribution in 256 × 256-pixel color or monochrome images. The temperature resolution of the infrared camera is 0.2°C. In order to shoot images with the same field of vision using the infrared and color cameras together, the optical system first matches the optical axes of both cameras using calcium fluoride (CaF2)-coated reflecting mirrors that pass infrared light. The optical path lengths of the cameras are also matched to ensure that both cameras shoot images with the same angle of visual field, and the infrared camera sends out a video sync signal (60 Hz) that is input to the color camera to ensure that images are shot at the same instant. The optical system just described enables the infrared and color cameras
Fig. 2. Optical instrument in system structure.
to shoot exactly the same image at exactly the same instant. Fig. 3 shows the basic processing principle implemented in the image processor and the host EWS shown in Fig. 1. Here the skin temperature range contained in the infrared image is superimposed over the skin color range contained in the color image, and the overlapping areas are extracted to obtain the skin portions of the face and other areas. The low-luminosity range is then extracted from the extracted facial area in order to identify the eyes and eyebrows. Once the eyes and eyebrows are identified, the nose and forehead can be located from their positions relative to the eyes and eyebrows, so the skin temperature of the relevant facial areas can be measured from the infrared image values. The following describes in detail a method for extracting human forms as well as a method for measuring skin temperature at specific facial locations.
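As a rough illustration of this principle, the overlap of the two ranges amounts to a logical AND of a temperature mask and a skin-color mask. The sketch below assumes tiny hypothetical 4 × 4 images and an illustrative skin-temperature threshold; neither the values nor the threshold are taken from the paper's data.

```python
import numpy as np

# temp: per-pixel temperature from the IR camera [°C] (hypothetical values)
temp = np.array([
    [24.0, 24.5, 33.5, 34.0],
    [24.2, 33.8, 34.2, 33.9],
    [24.1, 33.6, 28.0, 24.3],
    [24.0, 24.1, 24.2, 24.0],
])
# skin_color: boolean mask from the color camera (True = skin-colored pixel)
skin_color = np.array([
    [False, False, True,  True],
    [False, True,  True,  True],
    [False, True,  True,  False],
    [False, False, False, False],
])

T_s = 30.0                               # illustrative skin-temperature threshold
skin_temp_mask = temp > T_s              # skin-temperature range in the IR image
face_mask = skin_temp_mask & skin_color  # overlap = extracted skin area

print(int(face_mask.sum()))              # prints 6: pixels judged to be skin
```

Note that the warm non-skin pixel at (2, 2) and the skin-colored but cool background pixels are both rejected; only pixels satisfying both conditions survive.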
3. Method for extracting human forms

Fig. 1. System structure.
In order to measure facial skin temperature without physical contact, the human form range must be extracted from the image input into the system. But since the human form is found in a variety of shapes and moves irregularly, computers have an extremely difficult time trying to extract human forms from
Fig. 3. Basic processing principle.
complex backgrounds. The present study therefore proposes a method that uses body temperature information obtained from the infrared image together with skin color information obtained from the color image to extract the face and human form with an extremely high degree of accuracy.
3.1. Extracting the human form candidate range using infrared images

In order to extract the human form using infrared and color images, the human form candidate range is first extracted using just the infrared image. In concrete terms, this means that the infrared image is divided into temperature ranges corresponding to the human form and to the ambience. The human form candidate range can be extracted if we define a temperature threshold Th that separates the human form from the ambience, and then eliminate the temperature range below Th from the infrared image. We can likewise extract the skin candidate range if we define a temperature
threshold Ts that separates human forms with exposed living skin from human forms without, and then eliminate the temperature range below Ts from the image. This method therefore demands that we find Th or Ts with a very high degree of accuracy. Generally, when background scenery that includes human forms is shot with an infrared camera and the temperature of each pixel in the image is displayed as a histogram, peak waveforms corresponding to the ambient, clothing and skin temperature bands appear, as typified in Fig. 4(b). Here it is relatively easy to determine Th and Ts. That is to say, Th is the temperature at which the differential value is 0 in the area between the individual peak waveforms for the ambience and clothing in the temperature histogram. Ts can be determined in a similar manner. If the ambient temperature is high, however, the individual peak waveforms for the ambience and clothing lie very close to each other in the histogram, as shown in Fig. 4(c), so that no temperature with a differential value of 0 can be found in
Fig. 4. IR image and temperature histogram (demonstration figure): (a) IR image; (b) temperature histogram at room temperature 24°C, with ambient, clothing and skin peaks and thresholds Th and Ts; (c) temperature histogram at room temperature 27°C, where the ambient and clothing peaks lie close together (δ = 0.8°C).
Fig. 5. IR image, temperature histogram and segmentation image (experimental result, room temperature 24°C): (a) IR image; (b) temp. histogram; (c) segmentation of human area; (d) segmentation of skin area.
Fig. 6. IR image, temperature histogram and segmentation image (experimental result, room temperature 27°C): (a) IR image; (b) temp. histogram; (c) segmentation of human area; (d) segmentation of skin area.
Fig. 7 panels: (a) color image; (b) infrared image; (c) extraction of nominees for skin area; (d) extraction of nominees for human area; (e) extraction of skin area; (f) labeling; (g) mask of human area; (h) extraction of human area.
this area. In such cases, a peak value that is detected relatively consistently for the ambient temperature was used as the standard, and a fixed value δ was added to find Th. When δ was determined experimentally, it turned out to be about 0.8°C in order to extract the human form without losing any of it. The ambient temperature at this time was 27 to 28°C, and a temperature with a differential value of 0 was present in the range between the individual peak waveforms for clothing and skin (Hirono et al., 1992). When the ambient temperature was increased beyond 28°C, the individual peak waveforms for the ambience and clothing overlapped in the temperature histogram, and it was no longer possible to extract the human form candidate range. From this we know that the present method for extracting the human form candidate range from infrared images is applicable only as long as the ambient temperature remains below 28°C. This is not a problem in terms of practical applications, because the ambient temperature in most office environments is kept below 28°C. On the other hand, the peak waveform corresponding to the temperature band of skin lies above 30°C, so the skin candidate range can be extracted even at a slightly higher ambient temperature. Experimental results showed that the skin candidate range could be extracted using this method as long as the ambient temperature remained below 30°C. Figs. 5 and 6 show the results of extraction at ambient temperatures of 24 and 27°C, respectively. We know from these figures that candidate ranges for the human form and for skin can be extracted with a high degree of accuracy. Fig. 7 shows the overall flow of the human form extraction method. The human form candidate range (Fig. 7(d)) and the skin candidate range (Fig. 7(c)) were extracted from the infrared image (Fig. 7(b)) using the method described above. Fig. 7(d) shows the setting for the temperature histogram used to determine the temperature threshold. Next, compression and expansion processing were used to remove noise from the infrared image shown in Fig. 7(d). Compression processing removed noise
points and noise lines, but this proved ineffective when the human form candidate range was distributed over a broad area. When expansion processing was applied after compression processing, only the human form candidate ranges reappeared, without any noise, so each range was labeled for later processing. Fig. 7(f) shows the differently labeled ranges displayed at various levels of luminosity for verification. At this point, we knew that we had extracted the human form range and the powered-up CRT range as human form candidate ranges.

3.2. Extracting the human form range by adding color information
The following describes a method that extracts the human form range by adding color information to the human form candidate range extracted as described above. In concrete terms, a determination is made as to whether any of the colors in the extracted skin candidate range is skin color. If one is, then that skin candidate range is deemed a skin range, and the human form candidate range containing the skin range is extracted as a human form range. This is explained in more detail below. First, compression and expansion processing are used to eliminate noise from the extracted skin candidate range (Fig. 7(c)). From this, only the skin candidate ranges reappear, without any noise. The color image (Fig. 7(a)) is then used to determine whether any of the colors in these skin candidate ranges is skin color. In order to speed up the determination, grid points are placed roughly once every 100 pixels in the image, and a skin color determination is made using only those pixels where a grid point and a skin candidate range overlap. In the skin color determination, the RGB values of the relevant pixels are converted to the XYZ and L*a*b* color specification systems recognized by the Commission Internationale de l'Éclairage (CIE), and then to the polar-coordinate form of the L*a*b* system - the
Fig. 7. Method of human image extraction.
L*C*ab h°ab color specification system. This color specification system is a perceptually uniform space, in which distances correspond closely to differences in human color perception. The values L*, C*ab and h°ab correspond to the three attributes of color: lightness, chroma and hue. When these three attributes satisfy the conditions of Eq. (1), the color is deemed skin color (Genno et al., 1990).

(1)
Fig. 7(e) shows the pixels that satisfy the criteria of Eq. (1) and are thus judged to be a skin range. Fig. 7(g) shows the human form candidate range that includes this skin range. At this point, the powered-up CRT range, which had been mistakenly extracted when the human form candidate range was determined using only the infrared image, was eliminated. Since the range shown in Fig. 7(g) is a human form range, this range was masked, and the resultant image was superimposed over the color image shown in Fig. 7(a) to finally extract the human form range (Fig. 7(h)). Fig. 8 shows an example of human form extraction. Although Fig. 8 shows a complex scene that includes, in addition to a person, a poster with a human form, a potted plant, a powered-up CRT and other objects, the human form alone was extracted with a high degree of accuracy. The fact that the poster with the human form was eliminated clearly demonstrates the strength of this method.
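The per-pixel color conversion chain (RGB → XYZ → L*a*b* → L*, C*ab, h°ab) can be sketched as below. The XYZ matrix and reference white are standard D65 values, not taken from the paper, and because the thresholds of Eq. (1) are not reproduced here, the bounds in `is_skin` are purely illustrative stand-ins.

```python
import math

def rgb_to_LCh(r, g, b):
    """r, g, b in [0, 1], assumed linear RGB with a D65 white point."""
    # linear RGB -> XYZ (sRGB primaries, D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883          # D65 reference white

    def f(t):
        # CIE L*a*b* companding function
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116

    L = 116 * f(y / yn) - 16                    # lightness L*
    a = 500 * (f(x / xn) - f(y / yn))
    b2 = 200 * (f(y / yn) - f(z / zn))
    C = math.hypot(a, b2)                       # chroma C*ab
    h = math.degrees(math.atan2(b2, a)) % 360   # hue angle h°ab
    return L, C, h

def is_skin(L, C, h):
    # Illustrative stand-in for Eq. (1): skin tones are moderately
    # light, moderately saturated, and orange-ish in hue.
    return 40 < L < 90 and 10 < C < 50 and 20 < h < 70
```

A warm flesh tone such as `rgb_to_LCh(0.8, 0.5, 0.4)` falls inside these bounds, while an achromatic gray has near-zero chroma and is rejected.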
Fig. 8: (a) Visible image.
4. Method for measuring skin temperature at specific facial locations

The human form range as well as the skin range for the face and other physical locations were successfully extracted using the method described above. The present study proposes a method for measuring the skin temperature at critical locations by locating the eyes and eyebrows, which are relatively easy to identify within the extracted facial range, and then using the position of those parts relative to the critical location.

4.1. Identifying facial locations
Since the face is recognized as skin area, the facial range can be extracted using the method described above simply by inputting scenes with faces into the system. Here an infrared image is used to set the temperature threshold so that only the skin range remains after the background is removed, and then the extraction process begins. Fig. 9(a) shows the visual image, and Fig. 9(b) shows the result of facial range extraction. From these, we know that the facial range can be extracted with a high degree of accuracy. Next the low-luminosity range is extracted from the facial range, and then the roughly rectangular shape resulting from the positional relationship of the eye and eyebrow ranges is used to identify these facial parts from within the low-luminosity range (Fig. 9(c)). The method for identifying eyes and eyebrows is described in more detail below.
Fig. 9. Recognition of eyes and eyebrows: (a) visual image; (b) extraction result; (c) recognition result.
Fig. 10. (a) Bright visual image; (b) luminosity histogram of the facial range.
Fig. 11. (a) Dark visual image; (b) luminosity histogram of the facial range.
Since eyes and eyebrows are contained within the relatively low-luminosity range of the facial range, they should be found among the extracted pixels whose luminosity falls below a certain threshold. The threshold must include the eye and eyebrow ranges, and must be set so that the eyes and eyebrows can be separated into four ranges for subsequent processing. Unfortunately, it is quite difficult to satisfy those conditions if the threshold is fixed at the same value for all visual images, because the luminosity of the images varies with the intensity of illumination. Fig. 10(a) shows a relatively bright visual image, while Fig. 11(a) shows a relatively dark one. It is clear from the luminosity histogram for the facial range in the bright visual image (Fig. 10(b)) that more pixels lie on the high-luminosity side. By contrast, the luminosity histogram for the facial range in the dark visual image (Fig. 11(b)) shows more pixels on the low-luminosity side. From this, it is clear that the aforementioned conditions cannot be satisfied if the luminosity threshold for extracting eyes and eyebrows is fixed for both figures. The luminosity threshold is therefore set using the P-tile method (Takagi and Shimoda, 1991), based on the fact that the ratio of the area of the eye and eyebrow ranges to that of the entire face is almost constant. In other words, pixels are counted from the low-luminosity side of the luminosity histogram for the facial range, and the threshold is set at the luminosity value of the pixel at the P% point. This method can set a threshold that satisfies the aforementioned conditions no matter how intense the illumination. For the test subjects shown in Figs. 9-11, P is about 6.2%. Next, the positional relationship of the eyes and eyebrows is used to identify these facial positions.
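Returning to the P-tile threshold just described, a minimal sketch follows. The 6.2% figure is the paper's; the luminosity values themselves are invented for illustration.

```python
import numpy as np

def p_tile_threshold(luminosities, p=6.2):
    """Threshold = luminosity of the pixel at the P% point, counted
    from the dark side of the facial-range luminosity distribution."""
    vals = np.sort(np.asarray(luminosities).ravel())
    k = max(0, int(len(vals) * p / 100.0) - 1)
    return vals[k]

# Hypothetical facial range: 6.2% dark (eye/eyebrow) pixels, rest bright skin
lum = np.concatenate([np.full(62, 30), np.full(938, 180)])
th = p_tile_threshold(lum, p=6.2)   # 30: the darkest 6.2% are captured
mask = lum <= th                    # eye/eyebrow candidate pixels
```

Because the threshold adapts to the histogram rather than being fixed, the same call works for both bright and dark images, which is exactly the property the text requires.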
First, the centers of gravity of the extracted candidate ranges for the respective facial parts are calculated. Then we search among these center-of-gravity coordinates for sets of four that lie in a roughly rectangular positional relationship. If the search yields a match, the four ranges are recognized as eyes and eyebrows; if multiple sets are found, the set whose four centers of gravity form the most rectangular shape is selected. The method described above successfully identified the eyes and eyebrows within the facial area. In Fig. 9(c), the four center-of-gravity positions identified for the eyes and eyebrows are marked with cross cursors.
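The centroid search might be sketched as follows, with a bounding-box-based rectangularity score standing in for whatever measure the authors used; the candidate coordinates are hypothetical.

```python
import itertools
import math

def rectangularity(pts):
    """Smaller is more rectangular: total distance from the corners of
    the axis-aligned bounding box to the nearest of the four points."""
    xs = sorted(p[0] for p in pts)
    ys = sorted(p[1] for p in pts)
    corners = [(xs[0], ys[0]), (xs[0], ys[-1]),
               (xs[-1], ys[0]), (xs[-1], ys[-1])]
    return sum(min(math.dist(c, p) for p in pts) for c in corners)

def find_eyes_and_brows(centroids):
    """Pick the four centroids forming the most rectangular arrangement."""
    return min(itertools.combinations(centroids, 4), key=rectangularity)

# Two eyes, two eyebrows and one noise blob (hypothetical (x, y) centroids):
cands = [(40, 60), (80, 60), (40, 50), (80, 50), (62, 90)]
four = find_eyes_and_brows(cands)   # the rectangular set, noise rejected
```

The perfect rectangle scores zero, so any set containing the noise blob loses the comparison.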
4.2. Measuring skin temperature at specific locations

Research by the authors to date strongly suggests that the location on the face where the temperature varies with stress is the nose, which is considered to be an extremity, and that forehead skin temperature can be used as a substitute for nose skin temperature during rest periods (Genno et al., 1996). The research also suggests that the skin temperature of the cheeks, chin, nose and ears can be used to evaluate thermal sensation (Genno et al., 1995). Since facial skin temperature can therefore be used to objectively evaluate human sensations, some means of continuously measuring skin temperature in these areas must be developed. Since the positions of these parts relative to the eyes and eyebrows identified with the method described above are almost constant, their temperature can be measured with relative ease simply by storing their positions relative to the eyes and eyebrows in the system prior to measurement. The system as it stands today has a 0.3-second cycle and can continuously measure the skin temperature at any five locations on the face automatically, without physical contact.
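Once the eyes and eyebrows are located, the measurement step reduces to indexing the IR image at stored relative offsets. The sketch below uses hypothetical coordinates and offsets; the actual stored positions are not given in the paper.

```python
import numpy as np

# Hypothetical 256x256 IR image, 33.0 °C everywhere except a warmer nose tip
temp = np.full((256, 256), 33.0)
temp[140, 128] = 34.5

# (row, col) centroids of the identified eyes (hypothetical)
left_eye, right_eye = (100, 110), (100, 146)
mid = ((left_eye[0] + right_eye[0]) // 2, (left_eye[1] + right_eye[1]) // 2)

# Offsets from the inter-eye midpoint, stored in the system beforehand
offsets = {"nose": (40, 0), "forehead": (-30, 0)}
readings = {name: temp[mid[0] + dr, mid[1] + dc]
            for name, (dr, dc) in offsets.items()}
```

Because the offsets are fixed relative to the eye positions, the same lookup keeps working as the face moves, which is what allows the system's 0.3-second measurement cycle.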
5. Conclusion

The method proposed in the present study enables continuous, non-contact skin temperature measurement at any location on the face without placing stress on test subjects. This method can be combined with a method that evaluates sensations based on facial skin temperature to create a sensory evaluation system that places absolutely no stress on test subjects. Unfortunately, the system has some drawbacks, such as large size and limited reliability, that result from processing with an EWS and image processing equipment. Therefore, the next step in this line of research will be to reduce the size of the system through hardware development, and to improve the system so that it is easier to operate.
Acknowledgements

This research was supported by MITI's project on 'Human Sensory Measurement Application Technology', and we would like to thank all those who contributed to this research.
References

Genno, H., Fujiwara, Y., Yoneda, H. and Fukushima, K., 1990. Human sensory perception oriented image processing in a color copy system. In: Proc. International Conference on Fuzzy Logic & Neural Networks, pp. 423-427.
Genno, H., Matsumoto, K., Suzuki, R. and Fukushima, K., 1994. Sensory estimations using facial skin temperature. IEEE Denshi Tokyo, 33: 212-214.
Genno, H., Matsumoto, K. and Fukushima, K., 1995. Evaluation of thermal sensation using facial skin temperature. Transactions of the Society of Instrument and Control Engineers, 21(7): 973-981.
Genno, H., Ishikawa, K., Kanbara, O., Kikumoto, M., Fujiwara, Y., Suzuki, R. and Osumi, M., 1996. Using facial skin temperature to objectively evaluate sensations. International Journal of Industrial Ergonomics, this issue.
Hirono, Y., Kihara, S., Saijo, A. and Kawata, H., 1992. Segmentation of human images by infrared and color camera for the care of console operators. In: Proc. Japan-U.S.A. Symposium on Flexible Automation, No. 2.
Takagi, M. and Shimoda, H., 1991. Image Analysis Handbook. Tokyo University Publisher's Association.