Eye movements during scene understanding for biometric identification
Usman Saeed, University of Jeddah, Jeddah 21589, Saudi Arabia
Keywords: Biometric identification, Eye movement, Scene understanding
Abstract

The human eye is rich in physical and behavioral attributes that can be used for biometric identification. Eye movement is a behavioral attribute that can be collected non-intrusively: typically a task oriented visual stimulus is presented to the subject, the eyes are tracked using a video camera, and the resulting movements are used for biometric identification. The most common visual stimuli employed are moving objects and free viewing. In this paper I experiment with a novel task oriented visual stimulus, namely scene understanding, in which the observers are instructed beforehand that they must perform a task based on the contents of the image/video that will be presented. A biometric identification system has been developed based on the eye movements extracted during scene understanding. A compact and easy to extract feature vector based on clustering of eye movements is proposed and tested on several publicly available databases with two classification schemes. The results presented in this paper, with a correct identification rate of 85.72%, are quite promising. Furthermore, I also provide comparative results by implementing three commonly used feature vectors for eye movements.
1. Introduction

The human eye provides several measures that can be used for biometric identification, the best performing among them being physiological attributes such as iris and retina patterns. However, the eye also exhibits a strong behavioral component, visible in the form of eye movements. Eye movements depict the viewing behavior of a person, i.e. what a person chooses to look at in a visual scene and how. Although the structure of the eye has an influence on eye movements, the features of the visual scene the person fixates on and the skills and techniques used to gather information are mainly behavioral attributes. Eye movements for biometric identification are typically extracted by presenting a visual stimulus and tracking the eye using a video camera. This setup is contactless and non-intrusive, and thus ideal for continuous authentication.

The movements of the eye are mainly classified into four categories: saccades, smooth pursuit, vergence, and vestibulo–ocular movements [21]. Saccades are rapid motions of the eyes used to quickly change the fixation point (the point of interest at which the eye is relatively stable). Saccades range in scale from small movements used during reading to much larger movements required for finding an object in a scene. Smooth pursuit movements are slower
than saccades and are used to keep the fovea aligned with a moving stimulus. Vergence movements align the fovea of each eye with targets located at different distances from the observer. Vestibulo–ocular movements compensate for head movements, thus stabilizing the eyes relative to the world.

The nature of the task being performed determines the type of eye movements produced. In eye movement research the tasks are generally classified as free viewing, reading, object pursuit and scene understanding. Therefore I shall present the literature review of biometric identification techniques categorized by the task employed to elicit the eye movements.

During free viewing, eye movements are highly dependent on the nature of the stimulus. Free viewing of images and videos has been employed to gather eye movements which are then used for biometric identification. In the case of image based stimuli [6,16,23], fixations are the most important, whereas with video stimuli [14], fixations, saccades and smooth pursuit movements have all been found significant.

Human eye movements during reading [22] exhibit certain distinctive characteristics: the average fixation duration is about 225–250 ms and the average saccade spans 8–9 character spaces. Another important characteristic, called regression, is that 10–15% of the time readers voluntarily move their eyes back to material that they have already read. For biometric identification the subjects are required to read a specific text while their eye movements are tracked, as in Holland and Komogortsev [9].

Human eyes mainly exhibit smooth pursuit movements while following moving objects, but for biometric identification [6,11,12,15]
and [18], two different types of moving objects have been employed. The first type moves smoothly across the screen, while the second jumps from point to point in a uniform or random pattern. The eye movements that are most prominent in this category are smooth pursuit movements when the object moves smoothly and saccades in the jumping object scenario.

Scene understanding is defined as gazing upon a visual stimulus to develop a basic understanding of the scene or to memorize its important features. The average fixation duration in scene understanding is longer than in reading, and the average saccade distance is also larger [22]. The task given to the observer, e.g. memorizing the scene for a later test, has also been found to have a strong impact on eye movements during scene understanding.

A comprehensive survey on the use of eye movements for biometric identification has been compiled by me in Saeed [24]. During the survey I was unable to find any study utilizing eye movements extracted from a scene understanding based task for biometric identification. Therefore, in this paper I have employed a scene understanding based visual stimulus and extracted the observers' eye movements for biometric identification. The observers in the databases utilized [8,20] were instructed beforehand that they must perform a task based on the contents of the image/video that will be presented to them. A biometric identification system has been developed, including pre-processing, feature extraction and classification. Several feature vectors based on clustering of eye movements have been extracted and tested using two different classification methods. The proposed system has been tested on three publicly available databases and the results presented in this paper, with a correct identification rate of 85.72%, are quite promising. Furthermore, three feature vectors commonly used for eye movements have also been implemented for comparison. Lastly, I have also presented a comparative analysis of our results with some recent studies which have employed different task oriented visual stimuli.

The rest of the paper is organized as follows. In Section 2 the proposed method is elaborated. In Section 3 the experiments and the results obtained are described. Finally, the paper concludes with remarks and future works in Section 4.

2. Proposed method

In this section I shall present the steps of the proposed system. First, depending on the database employed, various pre-processing steps are applied. Next, features pertinent to biometric identification are extracted, and finally classification methods are applied for biometric identification.

2.1. Pre-processing

The databases used in our study have been designed for human eye movement analysis under cognitive load, whereas the objective of this study is biometric identification using eye movements. Therefore certain pre-processing steps are required, which are described below; a sketch of the fixation labelling step follows the list.

• Some observers in the databases were not instructed to perform the scene understanding task and were free to view as they pleased. These observers were removed as they were not relevant to our study.
• The databases were arranged according to the objectives of their designers, which were mostly inclined towards eye movement analysis. Therefore the databases were rearranged to separate the videos according to the observer's identity.
• Next, data such as time stamps and pupil diameter, which are not employed for biometric identification in this study, were removed, leaving behind only the XY coordinates of the eye movements and the parameters identifying the dominant eye and fixations/saccades.
• Furthermore, only the XY coordinates of the dominant eye are used to reduce data complexity.
• In some of the databases employed, a parameter identifying each eye movement sample as a fixation or a saccade is present; this parameter is retained and used in the feature extraction phase. In other databases this parameter is not available, therefore the algorithm developed by Krassanakis et al. [17] was employed to identify fixations.
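Where this fixation/saccade flag is missing, the paper relies on the algorithm of Krassanakis et al. [17] (the EyeMMV toolbox). The snippet below is not that toolbox: it is only a minimal single-threshold dispersion sketch in Python that illustrates how raw gaze samples can be split into fixations and saccades; the threshold values and the I-DT-style logic are assumptions of the example, not the method used in the paper.

```python
import numpy as np

def label_fixations(xy, max_dispersion=25.0, min_samples=50):
    """Return a boolean mask that is True where a sample belongs to a fixation.

    xy: (n, 2) array of gaze coordinates in pixels. The dispersion threshold
    (pixels) and the minimum fixation length (samples) are illustrative only."""
    is_fix = np.zeros(len(xy), dtype=bool)
    start = 0
    while start + min_samples <= len(xy):
        window = xy[start:start + min_samples]
        dispersion = np.ptp(window[:, 0]) + np.ptp(window[:, 1])
        if dispersion <= max_dispersion:
            end = start + min_samples
            # Grow the window while the dispersion stays below the threshold.
            while end < len(xy):
                grown = xy[start:end + 1]
                if np.ptp(grown[:, 0]) + np.ptp(grown[:, 1]) > max_dispersion:
                    break
                end += 1
            is_fix[start:end] = True
            start = end
        else:
            start += 1
    return is_fix

# Samples not marked as fixations are treated as saccades in the feature extraction.
```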
2.2. Proposed features

The pre-processing steps result in the XY coordinates of the dominant eye and a parameter signifying whether the current sample is a fixation or a saccade. Using this information, each eye tracking recording is first represented by the raw XY coordinates of the eye movements, including both fixations and saccades, given by VRfs in Eq. (1).

V_Rfs = [x_1, y_1, x_2, y_2, ..., x_n, y_n]    (1)

Next, another vector VRf, given by Eq. (2), was created which contains only the raw XY coordinates of fixation points.

V_Rf = [x_f1, y_f1, x_f2, y_f2, ..., x_fn, y_fn]    (2)

From these, several feature vectors were extracted, which are described below.

(1) Clustered XY coordinates
It was observed that the raw XY coordinates contain a large amount of redundant information due to the fixations, when the eye does not move. Therefore k-means clustering [19] was applied to reduce this redundancy by retaining only the cluster centroids. The first feature vector VCfs was created by applying clustering to the raw XY coordinate vector VRfs consisting of both fixations and saccades. Thus VCfs, given by Eq. (3), contains the XY coordinates of the cluster centroids computed over both fixations and saccades.

V_Cfs = [x_c1, y_c1, x_c2, y_c2, ..., x_cn, y_cn]    (3)
The second feature vector VCf was created by applying clustering to the raw XY coordinate vector VRf consisting of only fixations. Thus VCf, given by Eq. (4), contains the XY coordinates of the cluster centroids computed over the fixations only.

V_Cf = [x_cf1, y_cf1, x_cf2, y_cf2, ..., x_cfn, y_cfn]    (4)
(2) Clustered XY coordinates with time parameter
One side effect of the clustering was that repeat fixations (when the observer returns to view the same point after some time) were combined into a single cluster. To avoid this and retain the independence of repeat fixations, a time parameter t was added to the raw XY coordinate vectors VRfs and VRf to create new vectors VRfst and VRft, given by Eqs. (5) and (6) respectively.
V_Rfst = [x_1, y_1, t_1, x_2, y_2, t_2, ..., x_n, y_n, t_n], where {t ∈ Z; t > 0}    (5)

V_Rft = [x_f1, y_f1, t_f1, x_f2, y_f2, t_f2, ..., x_fn, y_fn, t_fn], where {t ∈ Z; t > 0}    (6)

Next, clustering was applied and two feature vectors were extracted. The first feature vector VCfst, given by Eq. (7), was created by applying clustering to VRfst; it contains the XY coordinates of the cluster centroids of both fixations and saccades together with the time parameter.

V_Cfst = [x_c1, y_c1, t_c1, x_c2, y_c2, t_c2, ..., x_cn, y_cn, t_cn]    (7)

The second feature vector VCft, given by Eq. (8), was created by applying clustering to VRft; it contains the XY coordinates of the cluster centroids of the fixations only together with the time parameter.

V_Cft = [x_cf1, y_cf1, t_cf1, x_cf2, y_cf2, t_cf2, ..., x_cfn, y_cfn, t_cfn]    (8)
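To make the clustered features concrete, the following sketch builds the clustered feature vector for one recording: samples are arranged as (x, y) or (x, y, t) points, k-means is applied, and the centroid coordinates are concatenated. The use of scikit-learn and the toy data are assumptions of the example rather than the paper's implementation; the cluster count of 80 follows the setting reported in Section 3.2. Dropping the t column yields VCfs (Eq. (3)), and filtering to fixation samples first yields VCf/VCft.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_feature(x, y, t=None, n_clusters=80, seed=0):
    """Cluster the gaze samples of one recording and return the concatenated centroids."""
    columns = [x, y] if t is None else [x, y, t]
    points = np.column_stack(columns)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(points)
    return km.cluster_centers_.ravel()     # [xc1, yc1, (tc1,) xc2, yc2, (tc2,) ...]

# Toy recording: 2000 samples with a strictly positive integer time index.
x = np.random.rand(2000) * 1280
y = np.random.rand(2000) * 720
t = np.arange(1, 2001, dtype=float)
V_Cfst = clustered_feature(x, y, t)        # clustered XY plus time, as in Eq. (7)
V_Cfs = clustered_feature(x, y)            # clustered XY only, as in Eq. (3)
```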
Table 1. Details of databases.

Database | Number of observers | Number of stimuli classes | NETRs/observer | NETRs/stimulus class/observer
Hollywood | 12 | 12 activities | 1707 | ~150
Ucfsa | 12 | 9 actions | 150 | 4–20
IRCCyN/IVC | 30 | 5 distortion levels | 100 | 20
2.3. Features for comparison

In this section I shall describe the three commonly used feature vectors that I have implemented for comparison with the proposed feature vectors.

(1) Raw XY coordinates
Raw XY coordinates of eye movements have been used extensively in eye movement research; Kasprowski and Rigas [13] compared raw eye movements with several data pre-processing techniques and found that the raw eye movements outperform all the studied methods. Therefore the first comparative feature vectors, VRfs and VRf, consisting of the raw XY coordinates of eye movements, are given by Eqs. (1) and (2) respectively.

(2) Principal Component Analysis on eye movements
This approach is based on the feature vector proposed by Bednarik et al. [1] with some modifications. Firstly, the original authors tested several dimensionality reduction methods and found Principal Component Analysis (PCA) to be the most performant for eye movement data, therefore I restricted my study to PCA only. Secondly, the authors utilized several attributes of the eyes, some of which are based on eye physiology and others on eye movements. As this study is primarily focused on eye movements as a means of biometric identification, the feature vector VPCA was created by applying PCA to the raw XY coordinate vector VRfs.

(3) Metrics of saccades and fixations
This feature vector is based on the feature vector proposed by Holland and Komogortsev [9]. The authors extracted several metrics from saccades, fixations and scanpaths and studied their utility for person identification. Out of the numerous metrics studied, they found that fixation count (FC), average fixation duration (AFD), average vectorial saccade velocity (ASV) and average horizontal saccade amplitude (HSA) produced the best results. Therefore these metrics, together with one metric that did not perform very well in their study, the average vertical saccade amplitude (VSA), were combined to create a feature vector. The most likely reason behind the poor performance of VSA is that the original authors employed a reading task, which restricts the eye movements mainly along horizontal lines; I believe that due to the nature of the scene understanding task in our study, this metric may provide better results. Further details of the feature extraction process are available in Holland and Komogortsev [9]. The extracted metrics were then concatenated to form the feature vector VM given by Eq. (9).

V_M = [FC, AFD, ASV, HSA, VSA]    (9)
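As an illustration of this comparison feature, the sketch below computes VM from a recording whose samples carry a fixation/saccade flag. The metric definitions follow the plain meaning of the names (counts, durations, end-to-end velocities and amplitudes) under an assumed fixed sampling rate; they are not necessarily identical to the exact formulas of Holland and Komogortsev [9].

```python
import numpy as np

def metrics_feature(x, y, is_fixation, hz=500.0):
    """V_M = [FC, AFD, ASV, HSA, VSA] from a fixation/saccade labelled recording."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    is_fixation = np.asarray(is_fixation, dtype=bool)
    # Contiguous runs of identical labels correspond to individual fixations/saccades.
    change = np.flatnonzero(np.diff(is_fixation.astype(np.int8))) + 1
    runs = np.split(np.arange(len(is_fixation)), change)
    fixations = [r for r in runs if is_fixation[r[0]]]
    saccades = [r for r in runs if not is_fixation[r[0]]]

    FC = float(len(fixations))                                   # fixation count
    AFD = np.mean([len(r) / hz for r in fixations]) if fixations else 0.0
    vel, h_amp, v_amp = [], [], []
    for r in saccades:
        dx, dy = x[r[-1]] - x[r[0]], y[r[-1]] - y[r[0]]
        duration = len(r) / hz
        vel.append(np.hypot(dx, dy) / duration)                  # vectorial velocity
        h_amp.append(abs(dx))                                    # horizontal amplitude
        v_amp.append(abs(dy))                                    # vertical amplitude
    ASV = np.mean(vel) if vel else 0.0
    HSA = np.mean(h_amp) if h_amp else 0.0
    VSA = np.mean(v_amp) if v_amp else 0.0
    return np.array([FC, AFD, ASV, HSA, VSA])
```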
2.4. Classification

Support Vector Machines (SVM) [5] and Classification and Regression Trees (CART) [2] have been employed for classification as they have shown good results in previous studies [6,12].
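A sketch of this classification stage using scikit-learn is given below. The paper describes a CART with the Gini index and 10-fold cross-validation, and an RBF-kernel SVM tuned by the LIBSVM grid search [4] (see Section 3.2); the concrete parameter grids, the pruning parameter and the placeholder data here are assumptions of the example, not the paper's exact settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

# X: one feature vector per eye tracking recording, y: observer identity (placeholders).
X = np.random.rand(200, 160)
y = np.repeat(np.arange(10), 20)

# CART: Gini impurity; the tree size is selected by cross-validating a pruning parameter.
cart = GridSearchCV(DecisionTreeClassifier(criterion="gini"),
                    {"ccp_alpha": [0.0, 1e-4, 1e-3, 1e-2]}, cv=10).fit(X, y)

# SVM: scale features to [0, 1], RBF kernel, grid search over C and gamma.
svm = GridSearchCV(make_pipeline(MinMaxScaler(), SVC(kernel="rbf")),
                   {"svc__C": 2.0 ** np.arange(-5, 16, 2),
                    "svc__gamma": 2.0 ** np.arange(-15, 4, 2)}, cv=5).fit(X, y)

print(cart.best_score_, svm.best_score_)
```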
Table 2. Percentage correct identification rates for CART.

Feature vector | Hollywood DB | Ucfsa DB | IRCCyN/IVC DB
Proposed:
  VCf | 78.9444 | 77.2222 | 62.5333
  VCfs | 81.7444 | 77.2222 | 66.0000
  VCft | 80.1333 | 78.3333 | 63.6000
  VCfst | 82.3444 | 80.1111 | 66.0000
Comparative:
  VRf | 56.3646 | 56.4800 | 39.3259
  VRfs | 58.7662 | 56.7800 | 41.0188
  VPCA | 77.7434 | 76.8842 | 59.7800
  VM | 46.7682 | 43.1627 | 32.0000

Table 3. Percentage correct identification rates for SVM.

Feature vector | Hollywood DB | Ucfsa DB | IRCCyN/IVC DB
Proposed:
  VCf | 78.4889 | 75.1111 | 77.6667
  VCfs | 78.5467 | 75.6667 | 78.0667
  VCft | 84.8222 | 77.3333 | 84.0667
  VCfst | 85.7222 | 81.1111 | 84.7333
Comparative:
  VRf | 54.3601 | 52.0200 | 55.2145
  VRfs | 56.771 | 55.6689 | 55.7601
  VPCA | 75.6145 | 74.4778 | 77.1767
  VM | 45.6300 | 43.4433 | 51.5880
3. Experiments and results

In this section I will describe the databases that have been employed for the experiments, the experimental setup and the results obtained.

3.1. Databases

(1) Actions in the eyes [20]
The first eye movement database that I have employed consists of two distinct datasets. The first one utilizes the Hollywood-2 Movie Dataset as the stimulus. It contains videos of 12 activity classes: answering phone, driving a car, eating, fighting, getting out of a car, shaking hands, hugging, kissing, running, sitting down, sitting up and standing up. Most of the activity classes have 150 eye tracking recordings for each observer, but some have fewer. The second one uses the UCF Sports Action Dataset as stimulus, which was collected mostly from broadcast television channels and contains 9 sports action classes: diving, golf swinging, kicking, lifting, horseback riding, running, skateboarding, swinging and walking. The number of eye tracking recordings per person for each action varies from 4 to 20.

The stimulus videos were shown to 16 human volunteers (9 male and 7 female), split into two groups. The first, active group, consisting of 12 observers, had to solve an action recognition task, whereas the second, free viewing group, consisting of 4 observers, was not required to solve any specific task while being presented with the videos in the two datasets. Eye movements were recorded using an SMI iView X HiSpeed 1250 eye tracker with a sampling frequency of 500 Hz.
Table 4. Comparative results.

Ref. | Database / No. of subjects | Stimulus task | Device | Eye measures | Extracted features | Classifier | Results
Proposed method | Hollywood: 12, Ucfsa: 12, IRCCyN: 30 | Scene understanding | SMI iView | XY coordinate | Clustering of XY coordinates | CART and SVM | 85.72% identification rate with SVM
[12] | 37 | Moving object | Ober 2 | XY coordinate | XY coordinate | J48, random forest, naive Bayes and SVM | Identification rate 54% with SVM
[6] | Set A: 48, Set B: 79 | Moving object | Ober 2 | XY coordinate, eye difference, eye velocity | MFCC | Decision tree, KNN, Bayesian network and SVM | Identification rate Set A 93.6%, Set B 91.1% with SVM
[13] | 79 | Moving object | Ober 2 | XY coordinate | XY coordinate | Mel based, graph based, C4.5 and random forest | Best result of 86% identification rate with Mel based, no time interval, using raw dataset
[18] | 5 | Moving object | Tobii X120 | XY coordinate | Acceleration, geometric, muscle property | Neural networks and SVM | Identification rate of 82% with SVM
[15] | 68 | Moving object | Tobii X120 | XY coordinate | OPM using saccades | KNN, C4.5 | Error rate FAR 5.4%, FRR 56.6% with KNN
[16] | 59 | Moving object | EyeLink 1000 | XY coordinate | OPM using saccades | T-test with voting, Hotelling's T-square test | Best result of HTER 19%
[11] | EOG: 40, VOG: 40 | Moving object | EOG & VOG | Saccades | Amplitude, latency, accuracy, maximum angular velocity | KNN, linear and quadratic discriminant analysis, naive Bayes | Identification rate VOG 93%, EOG 97% with quadratic discriminant analysis
[23] | 15 | Free viewing of face images | CRS eye track | XY coordinate | Graph representation | GED with KNN | Error rate EER 30%
[3] | 112 | Free viewing of face images | Tobii 1750 | XY coordinate | Graph representation using fixations | Frobenius norm | EER 25%
[7] Deravi et al. (2011) | 3 | Free viewing of images | Webcam | X-Y coordinate, interocular distance, pupil diameter | X-Y coordinate, interocular distance, pupil diameter | KNN, SVM, LDC and Fisher classifier | Identification rate 100% with Fisher classifier
[14] | 17 | Free viewing of video | Tobii | XY coordinate | Histogram of angles | UBM with ML | Error rate EER 28.7%
[9] | 32 | Reading | EyeLink II | XY coordinate | Saccade, fixation, scanpath based features | GCDF | Error rate EER 27%
[1] | 12 | Reading, free viewing, moving object | Tobii ET-1750 | Pupil diameters, distance between the reflections, velocities, delta pupil diameters | FFT, PCA, FFT+PCA | KNN | Best result of 90% identification rate with FFT of distance between the reflections
[10] | Exp 1: 22, Exp 2: 32, Exp 3: 28 | Reading, free viewing, moving object, password | Tobii TX300, EyeLink 1000, PlayStation Eye | XY coordinate | Saccade, fixation, scanpath based features | GCDF | Best result of 26.5% EER for reading stimulus
(2) IRCCyN/IVC 2009 [8]
The second eye movement database employs 20 video sequences, referred to as SEQR, from the Video Quality Experts Group (VQEG) as stimulus. These 20 videos were selected based on the saliency and the spatial and temporal characteristics of the content. Next, packet loss was introduced into these videos such that the distortions appear either in a salient or a non-salient region. Four distorted video sequences were created per original, referred to as SEQS 0.4, SEQN 0.4, SEQS 1.2 and SEQN 1.2, where 0.4 and 1.2 relate to the error propagation length and the indices S and N denote the salient and the non-salient region, respectively.

The videos were presented on an LVM-401W full HD screen by TVlogic. The observers were seated at a distance of about 150 cm from the display. Thirty non-expert observers participated in the experiment (10 female, 20 male) with an average age of about 23 years. The observers were presented the 100 test sequences (20 original, 80 distorted) in a pseudo-random order. A 5-point impairment scale was used to assess the annoyance of the distortions in the sequences. The
observers were asked to assign one of the following adjectival ratings to each of the sequences: ‘Imperceptible (5)’, ‘Perceptible, but not annoying (4)’, ‘Slightly annoying (3)’, ‘Annoying (2)’, and ‘Very annoying (1)’. Details of the databases, such as number of stimuli classes, number of eye tracking recordings (NETRs) per observer and NETRs per stimulus class are given in Table 1.
3.2. Experimental setup

In the pre-processing step (Section 2.1), videos belonging to the 4 observers of the free viewing group were removed from the Actions in the Eyes database (Section 3.1), as they are not relevant to this study.

In the k-means clustering of the feature extraction step (Section 2.2), I experimented with 30 to 100 clusters per eye tracking recording and found 80 to provide the optimal results. Also, if a cluster lost all of its members during k-means clustering, a new cluster was created consisting of the one observation furthest from its current centroid.
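The empty-cluster rule described above can be made explicit with a small Lloyd-style k-means sketch. This is an illustration of the stated convention (re-seeding an empty cluster with the observation furthest from its current centroid), not the paper's actual implementation, and the toy data are placeholders.

```python
import numpy as np

def kmeans_singleton(points, k=80, n_iter=100, seed=0):
    """Lloyd's k-means where an empty cluster is re-seeded with the observation
    furthest from its currently assigned centroid (a 'singleton' rule)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Distances of every point to every centroid, then nearest-centroid assignment.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
            else:
                # Empty cluster: re-seed it with the point furthest from its own centroid.
                worst = d[np.arange(len(points)), labels].argmax()
                centroids[j] = points[worst]
                labels[worst] = j
                d[worst, j] = 0.0            # avoid picking the same point again
    return centroids, labels

# Example: cluster the (x, y, t) samples of one recording into 80 centroids.
gaze = np.column_stack([np.random.rand(4000) * 1280,
                        np.random.rand(4000) * 720,
                        np.arange(1, 4001, dtype=float)])
centroids, labels = kmeans_singleton(gaze, k=80)
```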
Please cite this article as: U. Saeed, Eye movements during scene understanding for biometric identification, Pattern Recognition Letters (2015), http://dx.doi.org/10.1016/j.patrec.2015.06.019
JID: PATREC
ARTICLE IN PRESS
[m5G;July 15, 2015;21:22]
U. Saeed / Pattern Recognition Letters 000 (2015) 1–6
In the features extracted for comparison (Section 2.3), PCA was applied to the raw XY coordinate vectors VRfs such that 90% of the variability was conserved. Thus, the size of the resultant feature vector varies with the database; e.g. in the Hollywood database the average size of the XY coordinate vectors VRfs is approximately 8000, and after PCA is applied the size of a feature vector in VPCA is approximately 300.

In the classification step (Section 2.4), the selection of the training and testing datasets was random and independent of the stimulus class. All the eye tracking recordings of each person in a database, regardless of the stimulus class, were combined, and the training and testing datasets were created by randomly selecting half of the recordings of a person for training and the remaining half for testing. The reason behind this choice was that the study is primarily focused on how the observer searches for information, not on what is being observed.

For the CART classifier, the Gini index was used as the optimization criterion and 10-fold cross-validation was used to select the optimal size of the classification tree. For the SVM classifier, scaling was first applied to map the data to the range [0, 1]. Next, an SVM was trained using a Radial Basis Function (RBF) kernel with parameters estimated using the grid search method proposed by Chang and Lin [4].
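Two of the choices described above can be sketched as follows: PCA retaining 90% of the variance for the VPCA comparison feature, and the random half/half split of each person's recordings irrespective of stimulus class. The placeholder data, the padding assumption and the variable names are illustrative, not from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# PCA feature for comparison: keep enough components to retain 90% of the variance.
# X_raw holds one row per eye tracking recording (the raw XY vector V_Rfs, padded or
# truncated to a common length so that the rows can be stacked).
X_raw = np.random.rand(100, 8000)                 # placeholder data
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_raw)                  # a few hundred dimensions on real data

# Random half/half split of each person's recordings, ignoring the stimulus class.
rng = np.random.default_rng(0)
recordings_by_person = {0: list(range(50)), 1: list(range(50, 100))}   # toy index sets
train, test = [], []
for person, indices in recordings_by_person.items():
    shuffled = rng.permutation(indices)
    half = len(shuffled) // 2
    train += [(i, person) for i in shuffled[:half]]
    test += [(i, person) for i in shuffled[half:]]
```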
3.3. Results

The results obtained for the various feature vectors are depicted as percentage correct identification rates in Table 2 for the CART classifier and in Table 3 for the SVM classifier. Certain trends can be observed in the results. Firstly, the feature vectors that include both fixations and saccades outperform the feature vectors with only fixations; this trend is visible for all the databases tested. Secondly, the inclusion of the time parameter t improves the recognition results as compared to the feature vectors without it. The best result of 85.72% correct identification rate was obtained with the SVM classifier for the feature vector VCfst, which contains the clustered XY coordinates of both fixations and saccades together with the time parameter.

With regard to the comparative features (Section 2.3), it can be observed that the clustering proposed in this study improves the recognition results in comparison to the raw feature vectors proposed by Kasprowski and Rigas [13]. Furthermore, the clustered features also outperform the PCA based features proposed by Bednarik et al. [1]; this performance gap is further increased by the inclusion of the time parameter. Finally, the worst results were obtained by the metrics of saccades and fixations proposed by Holland and Komogortsev [9]. The most probable reason for the poor performance is that the original study was conducted using a reading task, which contains considerably more saccades and fixations than the scene understanding task used in our study.

Both the CART and SVM classifiers exhibit similar performance except in the case of the IRCCyN/IVC database, where the CART classifier performs poorly as compared to the SVM. However, the feature vector trends mentioned above are still clearly noticeable. Table 4 depicts the results of the proposed study in comparison with some other recent studies based on different task oriented visual stimuli.

A direct comparison is not straightforward, as there exist major differences between the studies, such as the number of observers in the databases and the equipment used for eye movement tracking. Nevertheless, our study outperforms most comparable studies except three: the first two [6,13] employed the moving object task and the third [1] is based on a multi-task approach combining reading, free viewing and a moving object. I have already shown that under similar experimental conditions the proposed feature vectors exhibit better performance than the raw eye movement features proposed by Kasprowski and Rigas [13] and the PCA based features proposed by Bednarik et al. [1]. Two more studies provided better results, however they are not really comparable: the first [7] was tested on only 3 observers, and the second [11] employed specialized equipment (EOG & VOG), which renders a direct comparison of results irrelevant.

4. Conclusions

In this paper I have presented a biometric identification system using eye movements extracted while observers are presented with a scene understanding based visual stimulus. To the best of my knowledge, this is the first time eye movements from a scene understanding based task have been used for biometric identification. Several feature vectors based on clustering have been extracted and two classifiers, CART and SVM, have been adopted. The proposed system has been tested on several publicly available databases and the results presented in this paper are quite promising.

An observation that can be made from the results is that both fixations and saccades provide valuable information for person identification and outperform feature vectors with only fixations. However, fixations also contain redundant information that can be effectively removed by clustering to improve identification results. Furthermore, adding a time parameter to the feature vector before clustering further improves the identification results. With regard to future works, one of the major hindrances in this field of research is the lack of an eye movement database of considerable size extracted using a scene understanding task. Another future work may be to combine various tasks such as moving object, reading and scene understanding to improve identification results.

References

[1] R. Bednarik, T. Kinnunen, A. Mihaila, P. Fränti, Eye-movements as a biometric, Lect. Notes Comput. Sci. 354 (2005) 780–789.
[2] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, CRC Press, Boca Raton, FL, 1984.
[3] V. Cantoni, C. Galdi, M. Nappi, M. Porta, D. Riccio, GANT: gaze analysis technique for human identification, Pattern Recognit. 48 (2015) 1027–1038.
[4] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 1–27.
[5] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[6] N.V. Cuong, V. Dinh, L.S. Tung, Mel-frequency cepstral coefficients for eye movement identification, in: Proceedings of the IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), 2012, pp. 253–260.
[7] F. Deravi, S.P. Guness, Gaze trajectory as a biometric modality, in: F. Babiloni, A.L.N. Fred, J. Filipe, H. Gamboa (Eds.), BIOSIGNALS, SciTePress, 2011, pp. 335–341.
[8] U. Engelke, M. Barkowsky, P. Le Callet, H.-J. Zepernick, Modelling saliency awareness for objective video quality assessment, in: Proceedings of the International Workshop on Quality of Multimedia Experience (QoMEX), 2010, pp. 212–217.
[9] C. Holland, O.V. Komogortsev, Biometric identification via eye movement scanpaths in reading, in: Proceedings of the International Joint Conference on Biometrics (IJCB), 2011, pp. 1–8.
[10] C. Holland, O.V. Komogortsev, Biometric verification via complex eye movements: the effects of environment and stimulus, in: Proceedings of the IEEE Fifth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2012, pp. 39–46.
[11] M. Juhola, Y. Zhang, J. Rasku, Biometric verification of a subject through eye movements, Comput. Biol. Med. 43 (2013) 42–50.
[12] P. Kasprowski, The impact of temporal proximity between samples on eye movement biometric identification, in: Proceedings of the 12th International Conference IFIP TC8, 2013, pp. 77–87.
[13] P. Kasprowski, I. Rigas, The influence of dataset quality on the results of behavioral biometric experiments, in: Proceedings of the International Conference of the Biometrics Special Interest Group (BIOSIG), 2013, pp. 1–8.
[14] T. Kinnunen, F. Sedlak, R. Bednarik, Towards task-independent person authentication using eye movement signals, in: Proceedings of the Symposium on Eye-Tracking Research and Applications (ETRA '10), 2010, pp. 87–90.
[15] O.V. Komogortsev, S. Jayarathna, C.R. Aragon, M. Mahmoud, Biometric identification via an oculomotor plant mathematical model, in: Proceedings of the Symposium on Eye-Tracking Research & Applications, 2010, pp. 57–60.
[16] O.V. Komogortsev, A. Karpov, L. Price, C. Aragon, Biometric authentication via oculomotor plant characteristics, in: Proceedings of the IEEE/IAPR International Conference on Biometrics (ICB), 2012, pp. 1–8.
[17] V. Krassanakis, V. Filippakopoulou, B. Nakos, EyeMMV toolbox: an eye movement post-analysis tool based on a two-step spatial dispersion threshold for fixation identification, J. Eye Mov. Res. 7 (2014) 1–10.
[18] Z. Liang, F. Tan, Z. Chi, Video-based biometric identification using eye tracking technique, in: Proceedings of the IEEE International Conference on Signal Processing, Communication and Computing (ICSPCC), 2012, pp. 728–733.
[19] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.
[20] S. Mathe, C. Sminchisescu, Actions in the eye: dynamic gaze datasets and learnt saliency models for visual recognition, Technical Report, Institute of Mathematics at the Romanian Academy and University of Bonn, 2012.
[21] D. Purves, G.J. Augustine, D. Fitzpatrick, et al., Neuroscience, 2nd edition, Sinauer Associates, Sunderland, MA, 2001.
[22] K. Rayner, Eye movements and attention in reading, scene perception, and visual search, Q. J. Exp. Psychol. 62 (2009) 1457–1506.
[23] I. Rigas, G. Economou, S. Fotopoulos, Biometric identification based on the eye movements and graph matching techniques, Pattern Recognit. Lett. 33 (2012) 786–792.
[24] U. Saeed, A survey of automatic person recognition using eye movements, Int. J. Pattern Recognit. Artif. Intell. 28 (2014) 21 pages.