Design and implementation of a real-time eye tracking system


The Journal of China Universities of Posts and Telecommunications August 2013, 20(Suppl. 1): 1–5 www.sciencedirect.com/science/journal/10058885

http://jcupt.xsw.bupt.cn

PENG Yan1, ZHOU Tian2, WANG Shao-peng2, CHENG Du2

1. School of Management, Capital Normal University, Beijing 100048, China
2. College of Information Engineering, Capital Normal University, Beijing 100048, China

Abstract  Driver fatigue severely affects a driver's alertness and ability to drive safely, and fatigue-related problems are vital in the driving of trains, vehicles and airplanes; driver fatigue research is therefore important. In this paper we first study the impact of eye location on face recognition accuracy: with Haar-like features and an AdaBoost classifier, the face and eye areas can be detected quickly and accurately. For eye tracking, a cam-shift based mean-shift algorithm is used to track the eyes; this method automatically adjusts the size of the tracking window according to the driver's posture. The performance of our eye detection method is validated on an image database of more than 6 000 pictures. In addition, our real-time eye tracking system has been tested on a railway line segment in China, with 5 train drivers involved in the experiment. The validation shows that our eye detector achieves an overall 93% eye detection rate.

Keywords  Haar-like feature, AdaBoost classifier, mean-shift algorithm, eye location, eye tracking

1 Introduction

Although driving a vehicle seems like a simple activity to most people, distraction and lack of attention while driving can result in serious, life-threatening accidents. As a result, an electronic device to monitor the driver's awareness is needed; it should detect the driver's drowsiness online and activate an alarm system immediately [1]. The core of this research is to detect the driver's eyes and face quickly and exactly, and then to judge the driver's degree of fatigue from physiological indices [2]. In this paper, a real-time eye tracking system combining unified algorithms with an electronic device is proposed; it performs face and eye region detection followed by eye tracking. The remainder of this paper is organized as follows: a survey of typical existing eye tracking methods is presented in Sect. 2; the detailed algorithms of the proposed approach are presented in Sect. 3; in Sect. 4, the framework of the whole detection system is described in detail and experimental results are given; Sect. 5 concludes the paper.

Received date: 15-07-2013
Corresponding author: PENG Yan, E-mail: [email protected]
DOI: 10.1016/S1005-8885(13)60260-5

2 Brief review of eye detection and tracking algorithms

Eye recognition is based on face recognition [3]: before locating and detecting the eyes, we should first distinguish the face containing them; with the same method, the eyes are then located step by step.

2.1 Eye detection

2.1.1 AdaBoost learning algorithm

Adaptive boosting (AdaBoost) is an ensemble learning algorithm that can be used for classification or regression. When applied to detecting the human face or eyes, AdaBoost extracts a large number of simple one-dimensional features with particular discriminative power, and the system combines thousands of simple one-dimensional classifiers to obtain a highly accurate result. The traditional AdaBoost algorithm can be described as follows: suppose there is a training set of n samples (x1, y1),…,(xn, yn), where yi ∈ {−1, +1} (i = 1, 2,…, n) marks a sample as negative or positive. Each training sample has K simple features f_j(x_i) (1 ≤ j ≤ K; x_i is the ith training sample). The jth feature of training sample x_i forms the weak classifier h_j(x_i):

    h_j(x_i) = 1 if p_j f_j(x_i) < p_j θ_j, and 0 otherwise    (1)

where p_j sets the direction of the inequality sign and takes only the values 1 and −1, and θ_j is the threshold; there is one binary classifier for each input feature f_j(x_i). Training aims at choosing the weak classifiers that are better than the others, by analyzing the positive and negative samples, in order to obtain a combined strong classifier.
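The weight-update step of boosting is not spelled out in this paper; a common variant (the one used by Viola and Jones) can be sketched as follows, where labels and weak-classifier outputs are taken in {0, 1} for illustration:

```python
import math

def adaboost_round(weights, labels, predictions):
    """One boosting round: given current sample weights, true labels in {0, 1}
    and a weak classifier's predictions in {0, 1}, return the classifier's
    vote weight alpha and the re-normalized sample weights."""
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    err = min(max(err, 1e-10), 1 - 1e-10)       # keep the logarithm well-defined
    alpha = 0.5 * math.log((1 - err) / err)      # vote weight of this classifier
    beta = err / (1 - err)
    new_w = [w * (beta if y == h else 1.0)       # shrink weights of correct samples
             for w, y, h in zip(weights, labels, predictions)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

Samples that the current weak classifier gets wrong keep their weight while correct ones shrink, so the next round focuses on the hard samples; the alpha values weight the weak classifiers inside the combined strong classifier.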

2.1.2 Haar-like features

Haar-like features are digital image features used in object recognition. AdaBoost needs to extract a large number of simple one-dimensional features that can distinguish whether a region is a face or an eye. A weak feature chosen by AdaBoost is a rectangle consisting of 1 to 4 smaller ones. A simple rectangular Haar-like feature is defined as the difference of the sums of pixels of the areas inside the rectangles, which can be placed at any position and scale within the original image; this feature set is called the 2-rectangle feature. Viola and Jones also defined 4-rectangle features [4]. The values indicate certain characteristics of a particular area of the image. The features owe their name to their intuitive similarity with Haar wavelets and were used in the first real-time face detector. Fig. 1 shows the Haar-like features.

Fig. 1 Haar-like features
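As an illustration (not the paper's implementation), a 2-rectangle feature can be evaluated in constant time from an integral image, and thresholding it yields the weak classifier of Eq. (1); the rectangle layout below is an assumed left/right split:

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns; any rectangle sum then costs O(1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of pixels in the h-by-w rectangle whose top-left corner is (r, c)."""
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """2-rectangle Haar-like feature: left half minus right half of the window."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

def weak_classifier(f_value, parity, theta):
    """Eq. (1): output 1 when parity * f < parity * theta, else 0."""
    return 1 if parity * f_value < parity * theta else 0
```

The parity flips the inequality so the same threshold rule covers features that are large for positives and features that are small for positives.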

There are two ways to generate the rectangular characteristic matrix: a two-dimensional array, or coordinates. The two-dimensional array representation is a 20 × 20 array whose entries are −1 inside a black rectangle, 1 inside a white rectangle, and 0 otherwise.

2.2 Eye tracking

1) Mean-shift tracking algorithm

Mean-shift is a procedure for locating the maxima of a

density function given discrete data sampled from that function [5]. It is useful for detecting the modes of this density. Mean-shift is a nonparametric, gradient-based density estimation method. Its principle can be described as follows. Data analysis in computer vision usually takes place in a high-dimensional space; suppose a d-dimensional Euclidean space R^d with kernel function K_H(x). The probability density estimate at a point x in R^d is

    f̂(x) = (1/n) Σ_{i=1}^{n} K_H(x − x_i)    (2)

where

    K_H(x) = |H|^{−1/2} K(H^{−1/2} x)    (3)

In Eq. (3), H is a d × d bandwidth matrix; the bandwidth H improves the flexibility of the estimate. The mean-shift algorithm can be used for visual tracking: a simple tracker creates a confidence map in the new image based on the color histogram of the object in the previous image, and uses mean-shift to find the peak of the confidence map near the object's old position. Mean-shift is a variable step-size gradient ascent algorithm, also called an adaptive gradient ascent algorithm.

2) Cam-shift tracking algorithm

The cam-shift algorithm is a tracking method that improves mean-shift tracking using the target color as the feature. It can automatically resize the window according to the dimensions of the face when a person approaches, leaves or changes pose. Cam-shift is primarily designed to perform efficient head and face tracking in a perceptual user interface. It is based on an adaptation of mean-shift that, given a probability density image, finds the mean (mode) of the distribution by iterating in the direction of maximum increase in probability density. The primary difference between cam-shift and mean-shift is that cam-shift uses continuously adaptive probability distributions (that is, distributions that may be recomputed for each frame), while mean-shift is based on static distributions, which are not updated unless the target undergoes significant changes in shape, size or color [6].
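As an illustrative sketch of Eqs. (2)–(3) and of the mode-seeking procedure (a Gaussian kernel for the density estimate and a flat search window for mean-shift are our assumptions, not the paper's):

```python
import numpy as np

def kde(x, samples, H):
    """Eqs. (2)-(3): f(x) = (1/n) * sum_i K_H(x - x_i) with
    K_H(x) = |H|^(-1/2) K(H^(-1/2) x) and K a standard Gaussian."""
    x = np.asarray(x, dtype=float)
    H = np.asarray(H, dtype=float)
    d = x.size
    L_inv = np.linalg.inv(np.linalg.cholesky(H))      # plays the role of H^(-1/2)
    u = (np.asarray(samples, dtype=float) - x) @ L_inv.T
    k = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * (u * u).sum(axis=1))
    return np.abs(np.linalg.det(H)) ** -0.5 * k.mean()

def mean_shift_mode(points, start, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Variable step-size gradient ascent: move the estimate to the mean of the
    samples inside the current window until the shift becomes negligible."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        in_window = np.linalg.norm(points - x, axis=1) <= bandwidth
        if not in_window.any():
            break
        new_x = points[in_window].mean(axis=0)        # end of the mean-shift vector
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x
```

Each iteration moves the window by the mean-shift vector, whose length shrinks near a mode, which is why the procedure behaves as a variable step-size gradient ascent.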

3 Our experiments on eye detection and tracking

3.1 Experiments on eye detection

The result improves if more samples are provided for classifier learning. As for the relative quantities of positive and negative samples, the number of negative samples should be


more than the number of positive samples, ideally in the proportion 2:1. In most cases it is not necessary to meet this ratio strictly; a minor difference in the numbers of the two kinds of samples has very little influence on the detection accuracy of the classifier. We use 2 137 positive samples and 3 995 negative samples to train the classifier. The positive samples are provided by a database company; we collected the negative samples with the "Global Fetch" tool and from the website http://www.pdphoto.org. To guarantee the strength of the classifier, each positive sample should contain an open, complete and clear human eye, while negative samples should be background pictures without human eyes. Haar-like training requires all positive samples to have the same size, so all positive sample images are scaled to a uniform size with relevant software. The accepted training input is a .vec file (positive samples) and a .dat file (negative samples). After preparing all the sample files needed for training, we use the following command to create and train the cascade classifier used to detect human eyes:

"D:\ProgramFiles\OpenCV\bin\haartraining.exe" -data D:\Study\EyeDect\trainout -vec E:\Study\EyeDect\eyesamples\pos.vec -bg G:\store\negsamples.dat -npos 2137 -nneg 3995 -nonsym -mode ALL -w 20 -h 10

The parameter -data gives the storage path of the cascade classifier after training; the cascade classifier is also saved as an XML file in the parent of this path. -vec and -bg give the storage paths and file names of the positive and negative sample files. -npos and -nneg give the numbers of positive and negative samples. -sym [default]/-nonsym states whether the target object to be detected is vertically symmetric; if it is, the training speed of the classifier increases. -mode selects the Haar-like feature set used to train the classifier, with three values: BASIC, vertical features only; CORE, 45° rotated features only; ALL, both vertical and 45° rotated features. -w and -h are the width and height (in pixels) of the samples. The duration of classifier training is proportional to the number of cascade stages; if we do not set the number manually, the Haar training program automatically chooses a reasonable number according to the training samples.


The training result of the classifier is shown as follows:

Number of features used: 23739
Parent node: 13
XXX 1 cluster XXX
POS: 2003 2137 0.937295
NEG: 3744 0.000149952
BACKGROUND PROCESSING TIME: 130.26
Required number of stages achieved. Branch training terminated.
Total number of splits: 0
Cascade performance
POS: 2003 2137 0.937295
NEG: 3744 0.000138245
BACKGROUND PROCESSING TIME: 113.32

With these processes, the training of the classifier is complete. Finally, we use the test sample set to measure the performance of the classifier. Identification results for different kinds of given images are shown in Fig. 2.

Fig. 2 Identification results with our classifier: (a) Image 1; (b) Image 2; (c) Image 3; (d) Image 4; (e) Image 5
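For reference, a trained cascade evaluates its stages in sequence and rejects a window as soon as one stage's weighted vote falls below that stage's threshold; this early rejection is what makes detection fast on the many non-eye windows. A simplified pure-Python sketch of the decision logic (illustrative, not the trained XML classifier itself):

```python
def cascade_predict(window_features, stages):
    """Each stage is (weak_classifiers, stage_threshold), where a weak
    classifier is (feature_index, parity, theta, alpha).  The window is
    accepted only if every stage's weighted vote reaches its threshold."""
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feat_idx, parity, theta, alpha in weak_classifiers:
            f = window_features[feat_idx]
            h = 1 if parity * f < parity * theta else 0  # weak vote, as in Eq. (1)
            score += alpha * h
        if score < stage_threshold:
            return 0  # rejected early: most background windows exit here
    return 1  # accepted by all stages: eye detected
```

In practice the detector slides and scales this test over the whole image, so cheap early stages filter out almost all candidate windows before the expensive late stages run.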

3.2 Experiment on eye tracking

Our experiment uses driver-navigated video. We


locate the human face first and then detect the eyes; after that, we track the eyes with the cam-shift algorithm. The results of this experiment show that the tracking algorithm copes with differences in facial pose and with fast face movement. The experiment screenshots are shown in Fig. 3.

Fig. 3 Detection results of face and eye: (a) front; (b) right side; (c) left side

Product composition is shown in Fig. 4 and Fig. 5.

Fig. 4 Host

Fig. 5 Sensor

The product deployment is shown in Fig. 6.
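The adaptive-window behaviour of cam-shift described in Sect. 2.2 can be sketched in a single iteration: re-centre the search window on the centroid of a probability (back-projection) image, then re-size it from the zeroth moment. The size rule below is a simplified variant of cam-shift's, chosen for illustration:

```python
import numpy as np

def camshift_step(prob, window):
    """One cam-shift-style iteration on a probability image.
    window = (row, col, height, width).  The window is re-centred on the
    centroid (the mean-shift move) and re-sized from the zeroth moment, so it
    adapts when the face grows, shrinks or turns between frames."""
    r, c, h, w = window
    roi = prob[r:r + h, c:c + w]
    m00 = roi.sum()
    if m00 <= 0:
        return window  # nothing to track inside the window
    ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
    cy = r + (ys * roi).sum() / m00          # centroid row
    cx = c + (xs * roi).sum() / m00          # centroid column
    side = 2.0 * np.sqrt(m00 / prob.max())   # window size from the zeroth moment
    h2 = w2 = max(2, int(side))
    r2 = int(round(cy - h2 / 2.0))
    c2 = int(round(cx - w2 / 2.0))
    r2 = min(max(r2, 0), prob.shape[0] - h2)  # clamp to the image
    c2 = min(max(c2, 0), prob.shape[1] - w2)
    return (r2, c2, h2, w2)
```

Repeating this step each frame, with the probability image recomputed from the target's color histogram, gives the continuously adaptive behaviour that distinguishes cam-shift from plain mean-shift.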

4 System implementation

This system applies a driver eye-state detection algorithm that can detect the driver's face, eye region and eye state quickly and accurately. By acquiring video images of the driver's face (as well as other facial parts), the system can intelligently locate, detect, track and distinguish a series of biological features. The eye tracking algorithm achieves both accurate and real-time tracking.

4.1 Architecture of the integrated system

Host device: the host is an SBC embedded single-board computer. Five factors need to be considered:
1) Adaptability to the temperature and humidity of the environment.
2) Astigmatic design (SBC-based single board, wireless, and shockproof).
3) Sealed, dust-free design.
4) Power consumption.
5) Cost control.

Video device: the video images of the production terminal are collected by a near-IR camera, which should offer:
1) Easy installation.
2) Adjustable angle and view.
3) Anti-dither design.
4) Compact size, so as not to obstruct the driver's field of view or the cab layout.

Fig. 6 Hardware implementation of the real-time eye tracking system

Camera installation: the camera in this system is fixed about 20 degrees above (or below) the front of the driving position, with an effective imaging distance of 50 cm to 150 cm.

Device host: according to the specification, our device comes in three sizes (large, medium and small) in the pilot phase, so it can adapt to different installation environments and operating requirements. The host requires no installation setting or initialization; it runs automatically.

Main product functions:
1) Driver on-guard detection (leaving the driving position triggers a voice prompt and alert).
2) Driver state detection (driving-state activity detection with voice prompts).

4.2 Algorithm implementation in software

Besides the technology based on eye-condition detection and the PERCLOS fatigue algorithm, the product development draws on assistant technologies such as detection of facial information, detection of mouth information, and face characteristic matching. It has three main advantages, in


accuracy, practicability and operability. It belongs to the 'objective + direct' technical branch of fatigue-driving detection methods. Our algorithm mainly implements the eye detection module; in order to detect the driver's status accurately and in real time under multifarious conditions, it takes account of the restrictions of the hardware device.

4.3 System performance

We implement the algorithm on a PC with a Pentium M 1.86 GHz CPU in the OpenCV compile environment. It runs at a rate of 20 frame/s and detects train drivers quite robustly. There are 5 train drivers taking part in our experiments. Because the system will be used in China, the interface is in Chinese. Off-post warning: Fig. 7 shows the detection of on-post or off-post status; the video of the tester shows off-post status.
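The PERCLOS fatigue measure mentioned in Sect. 4.2 is the proportion of time the eyes are closed beyond some closure threshold over an observation window; the paper does not give its implementation, so the following is only an illustrative sketch (the P80 criterion and the openness ratio are assumptions):

```python
def perclos(eye_openness, closed_threshold=0.2):
    """Fraction of frames in which the eye is considered closed.
    eye_openness: per-frame openness ratio in [0, 1] (1 = fully open).
    The common P80 criterion treats the eye as closed when it is at least
    80 % shut, i.e. openness <= 0.2."""
    if not eye_openness:
        return 0.0
    closed = sum(1 for o in eye_openness if o <= closed_threshold)
    return closed / len(eye_openness)
```

A fatigue monitor would evaluate this over a sliding window of recent frames and raise an alert when the value stays above a calibrated limit.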

Fig. 7 Off-post warning ("Please go back to your driving seat!")

Posture warning: there is no meter board on the left, but the driver does not look straight ahead and is outside the posture sensitivity range. Fig. 8 shows the posture warning.

Fig. 8 Posture warning ("Please correct your driving posture!")

Based on individual differences and the self-learning process, with the equipment installed on trains working under the actual running environment, the statistics show the following detection accuracy rates:
1) Driver off-post warning: >99%.
2) Driver posture warning: 90%–95%.

5 Conclusions

This paper describes a series of algorithms for eye locating and eye tracking. In the eye locating part, we use face and eye detection based on Haar-like features and an AdaBoost classifier to obtain, rapidly and accurately, the minimum eye rectangle contained in the human face. For eye tracking, the cam-shift algorithm is used to track the eyes under the driver's different poses. The function of the product is demonstrated by its application and effect in the monitoring system, as well as by the performance analysis in the final test. Based on its use on practical locomotives, this algorithm plays an important role in driver status detection. There are still some shortcomings in the tracking system: for instance, the tracking result is affected by position information and by complicated lighting conditions, and it remains open whether the target can still be tracked when it is partly occluded.

References

1. Antonino A A, Enzo P S. Comparative study on photometric normalization algorithms for an innovative, robust and real-time eye gaze tracker. Journal of Real-Time Image Processing, Special issue, Aug 9, 2011: 13p
2. Marco J F, José M A, Arturo E. Driver drowsiness warning system using visual information for both diurnal and nocturnal illumination conditions. EURASIP Journal on Advances in Signal Processing, 2010, 2010(3): 19p
3. Gong X, Wang G Y. Realistic face modeling based on multiple deformations. The Journal of China Universities of Posts and Telecommunications, 2007, 14(4): 110−117
4. Wang Q, Yang J Y. Eye location and eye state detection in facial images with unconstrained background. Journal of Information and Computing Science, 2005, 1(5): 284−289
5. Wang P, Matthew B G, Qiang J. Automatic eye detection and its validation. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Jun 20−26, 2005, San Diego, CA, USA. IEEE Computer Society, 2005: 8p
6. John G A, Richard Y D X, Jesse S J. Object tracking using cam-shift algorithm and multiple quantized feature spaces. Proceedings of the 2003 Pan-Sydney Area Workshop on Visual Information Processing (VIP2003), Sydney, Australia. CRPIT, Vol 36. ACS: 5p