Biomedical Signal Processing and Control 47 (2019) 159–167
EOG-based eye movement detection and gaze estimation for an asynchronous virtual keyboard

Nathaniel Barbara*, Tracey A. Camilleri, Kenneth P. Camilleri

Department of Systems and Control Engineering, Faculty of Engineering, University of Malta, Msida MSD2080, Malta
Article history: Received 28 March 2018; received in revised form 13 June 2018; accepted 10 July 2018.

Keywords: Electrooculography; Gaze estimation; Eye movement detection; Saccades; Blinks; Virtual keyboard
Abstract

This work aims to develop a novel electrooculography (EOG)-based virtual keyboard with a standard QWERTY layout which, unlike similar state-of-the-art systems, allows users to reach any icon from any location directly and asynchronously. The saccadic EOG potential displacement is mapped to angular gaze displacement using a novel two-channel input linear regression model, which considers features extracted from both the horizontal and vertical EOG signal components jointly. Using this technique, a gaze displacement estimation error of 1.32 ± 0.26° and 1.67 ± 0.26° in the horizontal and vertical directions respectively was achieved, a performance which was also found to be generally statistically significantly better than the performance obtained using one model for each EOG component to model the relationship in the horizontal and vertical directions separately, as typically used in the literature. Furthermore, this work also proposes a threshold-based method to detect eye movements from EOG signals in real-time, which are then classified as saccades or blinks using a novel cascade of a parametric and a signal-morphological classifier based on the EOG peak and gradient features. This resulted in an average saccade and blink labelling accuracy of 99.92% and 100.00% respectively, demonstrating that these two eye movements could be reliably detected and discriminated in real-time using the proposed algorithms. When these techniques were used to interface with the proposed asynchronous EOG-based virtual keyboard, an average writing speed across subjects of 11.89 ± 4.42 characters per minute was achieved, a performance which has been shown to improve substantially with user experience.

© 2018 Elsevier Ltd. All rights reserved.
1. Introduction

Computers are nowadays regarded as being ubiquitous, generally requiring very little effort to use. However, individuals with mobility impairments, such as those diagnosed with Amyotrophic Lateral Sclerosis (ALS) or paralysed stroke patients, may be seriously challenged in their autonomy and control of such devices. Despite the limitations imposed by the different conditions, the eyes are typically the last organs to be affected and hence, an eye movement-based human–computer interface (HCI) system could provide an alternative communication channel to such intelligent systems, giving the individuals suffering from these conditions more independence and an enhanced quality of life [1]. In recent years, such eye-based HCIs have been widely developed using videooculography (VOG)-based techniques, which use cameras and image processing algorithms to track the user's ocular pose. Although VOG-based techniques yield a better resolution
than electrooculography (EOG)-based techniques, they are known to be computationally demanding, susceptible to lighting conditions, sensitive to the user's movements and also normally require an external illumination source. Alternative eye movement recording techniques include infrared reflection oculography, which is generally restricted to the recording of horizontal eye movements only, or the scleral search coil technique, which is semi-invasive as it requires the user to wear contact lenses with embedded coils [2]. EOG, on the other hand, can offer a good alternative to these techniques by capturing the electrical activity generated by the human eye, which can be regarded as behaving like an electric dipole, with the positive and negative poles at the cornea and retina respectively. This is known to give rise to a potential difference varying in the range of 0.4–1.0 mV, referred to as the corneo-retinal potential (CRP), which creates an electrical field. Specifically, EOG captures the electrical activity generated by the CRP non-invasively, using a set of gel-based electrodes attached to the face in peri-orbital positions around the eyes [2,3]. This work is concerned with the use of EOG signals to interact with a virtual keyboard application. State-of-the-art EOG-based
virtual keyboards typically require users to perform repetitive up, down, left, right and possibly oblique saccadic movements to hover over icons in discrete fixed-sized steps [1,4,5], or to make subsequent icon selections by performing eye movements originating from the centre of the screen towards a set of icons placed at the periphery to transcribe each character [6,7]. In contrast, the proposed virtual keyboard allows users to reach any icon from anywhere on the screen. Specifically, this is implemented by modelling the voltage–angle relationship of eye movements in EOG signals to allow the subject's saccadic angular displacement to be directly estimated, as opposed to simply detecting the direction of the saccade; thus, the subject can traverse from one target to a final target destination in one step, thereby eliminating the restriction of having to pass through intermediary locations or to repetitively originate eye movements from a central location on the screen. The proposed virtual keyboard is also controlled asynchronously, thus not requiring the eye movements to be performed within cued intervals [6,7]. Specifically, this is implemented by proposing a novel technique to detect the user's saccadic movements and distinguish them from blinks by processing the EOG signals in real-time. This also permits the user to perform specific blink sequences that are detected to address the Midas touch problem, an aspect which is typically neglected in the literature. In the literature, the voltage–angle relationship of eye movements in EOG signals is typically modelled by analysing the horizontal and vertical EOG components separately, specifically by adopting one model for each EOG component to model the relationship between the gaze angles and EOG potential in the horizontal and vertical directions separately [8–10]. The correctness of this method, however, depends on the assumption that the horizontal EOG signal component is only a function of the horizontal ocular displacement, and similarly the vertical EOG signal component is only a function of the vertical ocular displacement, which may not typically be the case in practice, for example due to misalignment between the horizontal and vertical EOG electrode pairs and the horizontal and vertical ocular dipole axes [11]. This was observed by Lee et al. [12], wherein each EOG component was represented by a linear model depending on both the horizontal and vertical displacements. Here, the dependence of each EOG component on the horizontal and vertical ocular displacements is further studied. Consequently, this work investigates whether both components ought to be used individually or jointly when estimating the gaze angles, specifically by proposing a two-channel input linear regression model, using features extracted from both EOG components jointly, and by comparing this against state-of-the-art methods comprising one model for each EOG component. On the other hand, the limited literature available with regard to asynchronous eye movement detection from EOG signals is typically restricted to the detection of blinks and a discrete set of saccadic movements of one particular displacement in four [1,4] or eight [3,5] directions only. This restricts users to hover over the screen in fixed-size steps in discrete directions only, as previously indicated. State-of-the-art blink and saccade detection techniques are typically based on amplitude and duration thresholds [3–5].
Template-matching based approaches are also typically used, particularly for blink detection [1,11], which however suffer from a long labelling delay as they have to wait for the acquisition of the entire EOG segment prior to labelling the eye movements. In view of this, we propose a novel technique which exploits the saccade and blink EOG peak-gradient feature distribution to distinguish between blinks and saccades, of any displacement and direction, from EOG signals in real-time. Thus, in summary, the main contributions of this work include: (i) a statistical analysis of the dependence of each EOG signal component on the horizontal and vertical ocular displacements, henceforth referred to as the dependence test; (ii) a two-channel
Fig. 1. EOG electrode configuration.
input regression model to estimate the ocular pose; (iii) a novel technique to detect and label eye movements by processing the EOG signals in real-time; and (iv) the combined use of these techniques to interact with a virtual keyboard. The rest of the paper is organised as follows: Section 2 focuses on the acquisition and processing of EOG signals, followed by Section 3, which presents the dependence test as well as the proposed two-channel input regression model. The proposed real-time eye movement detection and labelling technique is presented in Section 4, while Section 5 presents the ocular pose estimation and eye movement labelling performance obtained using the proposed techniques. Finally, an asynchronous EOG-based virtual keyboard is presented in Section 6 together with its performance as tested by a number of subjects. Section 7 concludes this paper.

2. Acquisition and processing of EOG signals

2.1. Acquisition of EOG signals

The acquisition of EOG signals was approved by the University Research Ethics Committee (UREC) at the University of Malta and, before each recording session, subjects had to provide their informed consent. Subjects were placed approximately 60 cm away from a 24 in. LCD monitor, with their head held immobile by means of ophthalmic chin and forehead rests. During these sessions, they were instructed to follow onscreen instructions, as will be discussed further in the sections which follow. The EOG electrode configuration was set as shown in Fig. 1, with two electrodes placed to the left and right of the respective outer canthi and another pair placed above and under the right eye. A ground ('G') and a reference ('R') electrode were also placed on the forehead and on the mastoid behind the left ear respectively, as shown. EOG data was recorded using the g.tec g.USBamp biosignal amplifier (g.tec medical engineering GmbH, Austria) with a sampling frequency of 256 Hz. The potential differences between the horizontally- and vertically-aligned electrodes were computed to yield what are referred to as the horizontal and vertical EOG components, denoted by EOG_h(t) and EOG_v(t) respectively:

EOG_h(t) = V_1(t) − V_2(t)    (1)

EOG_v(t) = V_3(t) − V_4(t)    (2)
Fig. 3. Diagrammatic representation of the dependence test.
Fig. 2. Manifestation of saccades (S) and blinks (B) in raw and pre-processed EOG signals.
where V_1(t), V_2(t), V_3(t) and V_4(t) denote the potentials recorded from electrodes 1–4 in Fig. 1 respectively.

2.2. Processing of EOG data

EOG data is likely to be contaminated by external noise such as electrical grid interference and high-frequency electromyographic (EMG) noise, particularly due to underlying facial muscle contractions such as during squinting or smiling. Another noise factor is the baseline drift effect, which refers to a low-frequency signal that interferes with the acquired EOG signals and which arises due to background signal interference or electrode polarization [3]. In this regard, the acquired EOG signals were bandpass filtered between 0 and 30 Hz, while a 50 Hz notch filter was also applied, to eliminate high-frequency noise and grid interference respectively. On the other hand, in order to mitigate the baseline drift effect, a difference signal was computed as follows:

ΔEOG_h(t) = EOG_h(t) − EOG_h(t − τ)    (3)

ΔEOG_v(t) = EOG_v(t) − EOG_v(t − τ)    (4)

where ΔEOG_h(t) and ΔEOG_v(t) represent the horizontal and vertical differenced EOG components respectively. In order to preserve the EOG potential displacements for saccadic events, τ was set equal to the duration of the largest saccade, during which the eye traverses saccadically from one angular pose to the opposite extreme pose. At a sampling frequency of 256 Hz, this was empirically found to be 25 samples, equivalent to τ = 0.0977 s. Therefore, EOG signals arising due to saccades (S) and blinks (B) are transformed into difference signals, ΔEOG, as shown in Fig. 2.
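For concreteness, the pre-processing chain of Eqs. (1)–(4) can be sketched in a few lines of Python. This is a minimal illustration only: the filter order, the notch quality factor and the function name `preprocess_eog` are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256    # sampling frequency in Hz, as used in the recordings
TAU = 25    # difference lag in samples (~0.0977 s), as quoted for Eqs. (3)-(4)

def preprocess_eog(v1, v2, v3, v4):
    """Form the horizontal/vertical EOG components and their difference signals.

    v1..v4 are the raw potentials of electrodes 1-4 in Fig. 1. The 30 Hz
    low-pass and 50 Hz notch follow the values quoted in the text; the filter
    order and notch Q are illustrative assumptions.
    """
    b_lp, a_lp = butter(4, 30, btype="low", fs=FS)
    b_n, a_n = iirnotch(50, Q=30, fs=FS)

    eog_h = filtfilt(b_n, a_n, filtfilt(b_lp, a_lp, np.asarray(v1, float) - v2))  # Eq. (1)
    eog_v = filtfilt(b_n, a_n, filtfilt(b_lp, a_lp, np.asarray(v3, float) - v4))  # Eq. (2)

    # Differencing to suppress baseline drift, Eqs. (3)-(4); the first TAU
    # samples have no (t - tau) reference and are simply zeroed here.
    d_h = np.concatenate((np.zeros(TAU), eog_h[TAU:] - eog_h[:-TAU]))
    d_v = np.concatenate((np.zeros(TAU), eog_v[TAU:] - eog_v[:-TAU]))
    return d_h, d_v
```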
3. EOG-based gaze displacement estimation

In this section, the dependence of each EOG signal component on the horizontal and vertical ocular displacements is investigated. Furthermore, the proposed two-channel input linear regression model to estimate the angular gaze displacements is also presented.

3.1. Dependence test

The aim behind this test was to investigate whether the horizontal and vertical EOG components are dependent on the horizontal and vertical ocular displacements jointly or otherwise.
Fig. 4. Cue nomenclature adopted for the dependence test.
tested by analysing whether the horizontal and vertical EOG components for a general oblique saccadic movement are equivalent to the respective horizontal and vertical EOG components for a pure (non-oblique) horizontal and a pure (non-oblique) vertical movement of the same angular displacement respectively; this is represented diagrammatically in Fig. 3. If this analysis demonstrates that the EOG signal components for oblique movements are different from those of corresponding pure movements, it follows that the horizontal and vertical EOG components are dependent on the horizontal and vertical ocular displacements jointly, which further implies that the two EOG signal components ought to be used jointly for estimating general angular ocular poses. For this analysis, the cue setup shown in Fig. 4 was used, with the horizontal and vertical inter-cue angular separation set to 5◦ . A set of saccadic movements originating from cue 0 to cues 1–28, each followed by the corresponding return movement back to cue 0, were recorded from six subjects. Furthermore, a visionbased eye gaze tracker, specifically the SensoMotoric Instruments (SMI) RED500 eye tracker, was used simultaneously to validate the required point of gaze (POG) of the subjects on the screen. The recorded EOG data was processed as discussed in Section 2.2, and the peak amplitudes corresponding to the different saccadic movements were extracted. In order to test for dependence, the peaks extracted from the horizontal EOG component of oblique saccadic movements were statistically compared to the peaks extracted from the horizontal component of pure horizontal movements for the same angular displacements using the two-sample t-test. For example, peaks acquired from the horizontal component of oblique saccades from cue 0 to cue 21 were compared to the peak amplitudes of pure horizontal saccades from cue 0 to cue 5; this procedure was repeated for all oblique movements. A similar exercise was carried out for the vertical EOG components.
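As a concrete illustration of this comparison, the per-cue statistical test amounts to a standard two-sample t-test; the sketch below uses hypothetical variable names and assumes the peak amplitudes of the oblique and pure saccades have already been grouped by destination cue.

```python
from scipy.stats import ttest_ind

def dependence_test(peaks_oblique, peaks_pure, alpha=0.05):
    """Compare EOG peak amplitudes of oblique saccades against those of pure
    saccades of the same angular displacement (e.g. the horizontal peaks of
    cue 0 -> 21 movements against those of cue 0 -> 5 movements)."""
    t_stat, p_value = ttest_ind(peaks_oblique, peaks_pure)
    return p_value, p_value < alpha   # True: significant difference at the 5% level
```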
Table 1. Dependence test results. Highlighted cells indicate p-values less than 0.05.

The results obtained are shown in Table 1, where cues P and O refer to the destination cues of pure and oblique saccades respectively. The highlighted cells indicate that there is a statistically significant difference between EOG peaks extracted from the horizontal and vertical components of oblique saccadic movements and peaks of pure horizontal and vertical movements respectively. This may be due to electrodes that are not aligned with the ocular dipole axes. Therefore, these results demonstrate that the horizontal and vertical EOG components ought to be used jointly when estimating the angular ocular pose.

3.2. Modelling the relationship between the EOG signal and angular gaze displacement

In this work, the relationship between the EOG signal and angular gaze displacement is modelled using two different techniques. Both techniques use the saccadic EOG peak amplitudes, P_h and P_v, extracted from the pre-processed horizontal and vertical EOG components respectively, to estimate the angular gaze displacement, θ̂_h and θ̂_v, in the horizontal and vertical directions respectively. The first technique, shown in Fig. 5a, is the state-of-the-art model [8–10] which comprises two linear regression models, M_x and M_y, that represent the relationship between the horizontal and vertical EOG components and the angular gaze displacement in the respective directions separately. On the other hand, based on the dependence test conclusions outlined in Section 3.1, an alternative single two-channel input linear regression model, M, is also proposed which, as shown in Fig. 5b, considers peaks extracted from both the horizontal and vertical EOG components jointly as its 2D input.

Fig. 5. Linear regression models with 1D and 2D inputs.

In general, linear regression models M_x, M_y and M involve linear combinations of fixed non-linear basis functions (BFs) of the input variables, which can be generally represented in the form [13]:

y(x, W) = W^T Φ(x)    (5)

where y represents a K-dimensional column vector output, x is a D-dimensional column vector input, W is an L × K matrix of parameters and Φ(x) is an L-dimensional column vector whose elements are the BFs φ_j(x) for j = 0, ..., L − 1, where L represents the total number of BFs considered [13]. Therefore, models M_x and M_y in Fig. 5a are represented as θ̂_h = W_x^T Φ(P_h) and θ̂_v = W_y^T Φ(P_v) respectively. The BFs φ_j(x_n) in Φ(x_n) considered in these models were set as Φ(x_n) = (φ_0(x_n), ..., φ_{L_1D}(x_n))^T, where [13]:

φ_j(x_n) = x_n^j    (6)

for j = 0, ..., L_1D, where L_1D is the model order. On the other hand, the proposed two-channel input linear regression model M of Fig. 5b is represented as (θ̂_h, θ̂_v)^T = W_xy^T Φ(x_n), where x_n = (P_h, P_v)^T. The BFs considered in this case were grouped according to the respective maximum order, L_2D; specifically, different sets of polynomial BFs of the 2D input vector x_n up to the L_2D-th degree were considered, for any integer L_2D > 0. The BFs considered for L_2D ∈ {1, 2, 3} are presented in Table 2, and a similar scheme applies for higher orders.

Table 2. The sets of BFs adopted in the proposed two-channel input linear regression model, grouped according to the respective maximum order L_2D, where x_n = (x_1n, x_2n)^T.

L_2D = 1: φ_0(x_n) = 1, φ_1(x_n) = x_1n, φ_2(x_n) = x_2n
L_2D = 2: the BFs for L_2D = 1, together with φ_3(x_n) = (x_1n)^2, φ_4(x_n) = (x_2n)^2, φ_5(x_n) = x_1n x_2n
L_2D = 3: the BFs for L_2D = 2, together with φ_6(x_n) = (x_1n)^3, φ_7(x_n) = (x_2n)^3, φ_8(x_n) = (x_1n)^2 x_2n, φ_9(x_n) = x_1n (x_2n)^2
Higher orders follow the same pattern.
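The following sketch shows how such a two-channel polynomial regression model can be fitted by ordinary least squares. It is an illustrative implementation under the paper's formulation, not the authors' code, and the helper names (`poly_basis_2d`, `fit_two_channel_model`) are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_basis_2d(X, order):
    """Design matrix Phi(x_n) for the 2D input x_n = (P_h, P_v): all monomials
    x_1n^a * x_2n^b with 0 <= a + b <= order, mirroring the BF sets of Table 2."""
    cols = []
    for degree in range(order + 1):
        for exps in combinations_with_replacement(range(2), degree):
            col = np.ones(X.shape[0])
            for idx in exps:
                col = col * X[:, idx]
            cols.append(col)
    return np.column_stack(cols)                      # shape (N, L)

def fit_two_channel_model(P, theta, order=2):
    """Least-squares estimate of W_xy in (theta_h, theta_v)^T = W_xy^T Phi(x_n).

    P:     (N, 2) saccadic peak amplitudes (P_h, P_v)
    theta: (N, 2) target angular displacements (theta_h, theta_v)"""
    W, *_ = np.linalg.lstsq(poly_basis_2d(P, order), theta, rcond=None)
    return W                                          # shape (L, 2)

def predict(W, P, order=2):
    return poly_basis_2d(P, order) @ W                # (N, 2) estimated gaze displacements
```

The maximum order L_2D would then be selected by cross-validating the estimation error of Eq. (12) over candidate orders, as described in Section 5.1.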
4. Real-time saccade and blink detection and labelling

Since we aim to develop an asynchronous EOG-based virtual keyboard, techniques are required to detect ocular events, specifically saccades and blinks, and distinguish between these two events
in real-time. It has been shown in the literature that there is a relationship between the angular saccadic displacement and the corresponding peak velocity [14]. In this regard, we represent saccades and blinks in the feature space of EOG peak, P_v, and EOG signal gradient, G_v, extracted from the pre-processed vertical EOG signal component which, as shown in Fig. 6, are generally characterised by two clusters, thus affording discrimination between these two events.

Fig. 6. (P_v, G_v) feature distribution corresponding to upward-going saccades and blinks.

4.1. Proposed algorithm

Based on the signal characteristics of the pre-processed horizontal and vertical EOG components, ΔEOG_h(t) and ΔEOG_v(t), a salient event threshold (SET) applied to both EOG components is used to detect the onset of any eye movement, here referred to as a salient event. An example of a particular oblique movement is presented in Fig. 7, where the instant at which the onset is detected is marked by 'A'. After an onset is detected, the peak amplitudes in the horizontal and vertical EOG components, P_h and P_v respectively, as well as the gradient of the vertical EOG signal component, G_v, are extracted.

Fig. 7. Detection of salient events from pre-processed EOG signals.

P_v and G_v are then used in a classifier to label the event either as a saccade or a blink. It is pertinent to point out that, since only upward-going saccades possess similar characteristics to blinks, whenever a salient event is detected and P_v < 0, the event is immediately labelled as a saccade. If the event is labelled as a saccade, the angular gaze displacement is then estimated from P_h and P_v. Conversely, if the event is labelled as a blink, it is suppressed and no new gaze displacement is estimated. At this stage, an input feature vector x = (P_v, G_v)^T is used to classify the event into K = 2 classes, C_k, where k = 1, 2 represent saccades and blinks respectively. Specifically, an optimal Bayes' classifier is used where the class prior probabilities are given by p(C_k) and the saccade and blink (P_v, G_v) distributions are modelled as bivariate Gaussians, denoted by N(μ_1, Σ_1) and N(μ_2, Σ_2) respectively, with μ_k and Σ_k representing the mean and covariance matrix respectively of the (P_v, G_v)^T distribution of class C_k, leading to the class posterior probabilities p(C_k|x) [15]:

p(C_1|x) = σ(x^T W x + w^T x + w_0)    (7)

p(C_2|x) = 1 − p(C_1|x)    (8)

where σ represents the logistic sigmoid function and the classifier weights are given by [15]:

W = (1/2)(Σ_2^{−1} − Σ_1^{−1})    (9)

w = Σ_1^{−1} μ_1 − Σ_2^{−1} μ_2    (10)

w_0 = (1/2)(μ_2^T Σ_2^{−1} μ_2 − μ_1^T Σ_1^{−1} μ_1) + (1/2) ln(|Σ_2|/|Σ_1|) + ln(p(C_1)/p(C_2))    (11)
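These weights follow directly from the fitted class means and covariances. The snippet below is a minimal numpy sketch of Eqs. (7) and (9)–(11) under the equal-prior assumption adopted in this work; the function names are illustrative.

```python
import numpy as np

def fit_quadratic_discriminant(X_saccade, X_blink):
    """Weights of Eqs. (9)-(11) from training (P_v, G_v) features of each class,
    assuming equal class priors so that the ln(p(C1)/p(C2)) term vanishes."""
    mu1, mu2 = X_saccade.mean(axis=0), X_blink.mean(axis=0)
    S1 = np.cov(X_saccade, rowvar=False)
    S2 = np.cov(X_blink, rowvar=False)
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)

    W = 0.5 * (S2i - S1i)                                            # Eq. (9)
    w = S1i @ mu1 - S2i @ mu2                                        # Eq. (10)
    w0 = 0.5 * (mu2 @ S2i @ mu2 - mu1 @ S1i @ mu1) \
         + 0.5 * (np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1])  # Eq. (11), equal priors
    return W, w, w0

def posterior_saccade(x, W, w, w0):
    """p(C1 | x) of Eq. (7), with C1 denoting the saccade class."""
    a = x @ W @ x + w @ x + w0
    return 1.0 / (1.0 + np.exp(-a))     # logistic sigmoid
```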
In this work, equal class prior probabilities, p(C_1) = p(C_2), were assumed and thus the term ln(p(C_1)/p(C_2)) in (11) vanishes. In general, the saccade and blink (P_v, G_v) distributions have some overlap, for example as shown in Fig. 6. Therefore, a threshold T_c is applied to the maximum posterior class probability p_max = max{p(C_1|x), p(C_2|x)} such that, if p_max > T_c, the salient event is instantly labelled according to the maximum a posteriori (MAP) decision rule as C_k* = arg max_{C_1,C_2} {p(C_1|x), p(C_2|x)}. Conversely, if p_max ≤ T_c, a reject option is applied and the salient event is not labelled using this classifier, but is labelled according to the morphology of the signal. Specifically, the presence or absence of a negative trough synonymous with the morphology of the blink in the pre-processed EOG signal, as shown in Fig. 2, is used to determine whether this event is a blink or not respectively. The presence of the trough is determined if a negative threshold B− in the vertical EOG component is exceeded within a time interval T_d after the occurrence of the peak, i.e. if the condition ΔEOG_v(t) ≤ B− is satisfied within T_d, the event is labelled as a blink, otherwise it is labelled as a saccade. Despite the choice of threshold T_c to ensure that a decision is taken only when there is sufficient posterior probability, there is still the possibility for a misclassification to occur at a posterior probability greater than T_c. To mitigate this, even events that are labelled by the MAP classifier, and which are therefore either passed to the regression model if labelled as saccades or suppressed if labelled as blinks, are tested with the trough detection method in the background. In the rare cases where the trough detector detects a labelling error, a correction is effected, specifically by reverting the regression or suppression action that was performed. The algorithmic cycle, as shown in Fig. 8, is restarted when both EOG components return to the baseline level, specifically when the
EOG component magnitudes fall below |αSET|, where 0 ≤ α ≤ 1. This is marked by 'B' in Fig. 7.

Fig. 8. Algorithm flowchart for saccade and blink detection and labelling from EOG signals in real-time.
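The overall detection and labelling cycle of Fig. 8 can be summarised, under the assumptions stated in the comments, by the following sketch. The buffered, batch-style processing and the parameter values shown are purely illustrative, and `posterior_saccade` refers to the earlier discriminant sketch.

```python
import numpy as np

# Illustrative parameter values (subject-specific in practice, see Section 5.2)
SET = 65e-6      # salient event threshold, in the 60-70 uV range quoted later
ALPHA = 0.05     # restart factor alpha, 0 <= alpha <= 1
T_C = 0.75       # reject-option threshold on the maximum posterior
B_NEG = -1e-4    # blink trough threshold B- (hypothetical value)
T_D = 60         # peak-to-trough window in samples (~0.2344 s at 256 Hz)

def label_salient_event(d_h, d_v, W, w, w0):
    """One cycle of the Fig. 8 algorithm over buffered difference signals
    d_h, d_v (the real system processes the signals sample by sample)."""
    above = np.where((np.abs(d_h) > SET) | (np.abs(d_v) > SET))[0]
    if above.size == 0:
        return None                          # no salient event in this buffer
    onset = above[0]                         # point 'A' in Fig. 7

    seg_h, seg_v = d_h[onset:], d_v[onset:]
    P_h = seg_h[np.abs(seg_h).argmax()]      # signed horizontal peak
    peak_idx = np.abs(seg_v).argmax()
    P_v = seg_v[peak_idx]                    # signed vertical peak
    G_v = np.abs(np.diff(seg_v)).max()       # vertical gradient feature

    if P_v < 0:                              # only upward-going saccades resemble blinks
        return "saccade", P_h, P_v

    p1 = posterior_saccade(np.array([P_v, G_v]), W, w, w0)
    if max(p1, 1.0 - p1) > T_C:              # confident: MAP label
        label = "saccade" if p1 >= 0.5 else "blink"
    else:                                    # reject option: blink-trough morphology
        label = "blink" if np.any(seg_v[peak_idx:peak_idx + T_D] <= B_NEG) else "saccade"
    return label, P_h, P_v

# The cycle restarts once |d_h| and |d_v| fall back below ALPHA * SET (point 'B').
```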
5. Results

In this work, a total of 600 saccadic movements of random angular displacements and 300 blinks were recorded from six subjects, who were asked to follow onscreen instructions to perform the different eye movements. The angular gaze displacements varied up to ±40° horizontally and ±20° vertically, spanning the whole space of a 24 in. LCD monitor. The EOG data collected was then pre-processed as discussed in Section 2.2 and any subject-related mistakes, such as blinking during periods when instructed to perform a saccade or vice versa, were manually discarded from further analysis, resulting in 480 saccades and 200 blinks.

5.1. Gaze displacement estimation performance

For this analysis, the peak amplitudes for different saccadic movements were divided into four folds, each with 120 trials. To compare the estimation models shown in Fig. 5, the root mean square error (RMSE) for each fold was computed, defined as:
E = √( (1/N) Σ_{j=1}^{N} [ (θ_{h,j} − θ̂_{h,j})² + (θ_{v,j} − θ̂_{v,j})² ] )    (12)
where N denotes the total number of trials in the fold, θ_{h,j} and θ_{v,j} represent the target horizontal and vertical angular gaze displacements respectively, while θ̂_{h,j} and θ̂_{v,j} denote the estimated horizontal and vertical angular gaze displacements respectively. Additionally, the mean absolute horizontal and vertical angular errors were also calculated.

Considering the modelling technique of Fig. 5a, i.e. having a separate model for each EOG component, the optimal polynomial orders L*_1D for models M_x and M_y were determined separately. It was established that there is no statistical difference in the model performance if model orders L_1D > 1 are used for either M_x or M_y; this was found to be the case across all six subjects. This agrees with the linear model often assumed in the literature [8–10]. Therefore, two-fold cross-validated RMSE and mean absolute angular errors in both horizontal and vertical directions were computed using first-order models, as tabulated in Table 3.

For the proposed two-channel input linear regression model, the optimal order L*_2D was determined by comparing the two-fold cross-validated RMSE for increasing model orders L_2D ∈ {1, 2, 3, 4} and establishing at which order there is no statistically significant improvement in performance. In this case, the optimal model order was found to be different for different subjects, ranging between L*_2D = 1 and L*_2D = 3. Using these subject-specific L*_2D, the two-fold cross-validated RMSE and mean absolute angular errors in both horizontal and vertical directions are shown in Table 3.

The performance obtained using the proposed two-channel input linear regression model was found to be statistically significantly better than that obtained using the 1D input models commonly considered in the literature, and this was consistently the case for most subjects. These results demonstrate that the observations of the dependence test in Section 3.1 may indeed be exploited to design a two-channel regression model with improved estimation performance.
5.2. Eye movement labelling performance

The parameters T_c, T_d, B−, SET and α required for the proposed algorithm in Section 4 were determined as follows. The value for T_c was set to 0.75, which was determined empirically by noting that, in the region where the (P_v, G_v) feature distributions of saccades and blinks overlap, the typical p_max values are less than this value. While higher values of T_c lead to lower classification errors using the MAP classifier, they would also lead to more frequently rejected events, each of which would have to wait to be processed by the trough detector to be labelled, resulting in higher system latencies. The value for the time interval T_d was based on the mean blink peak-to-trough interval in the pre-processed vertical EOG component. For a 256 Hz sampling frequency, this value was 60 samples, corresponding to 0.2344 s. Threshold B− was determined from a set of recorded blinks using:

B− = γ · med{E_train}    (13)

where γ is a constant in the range [0, 1], E_train represents a dataset containing the blink trough amplitude values obtained during a subject-specific training session, while med{E_train} represents the median value of E_train. The choice of the median statistic ensures that the threshold calibration is robust against outlier values, normally arising from subject-related mistakes during training data collection. The optimal value for γ, cross-validated over the six subjects, was found to be 0.4. The choice of the SET value affects the false event detection rate. Therefore, plots of accuracy versus false detection rate for various SET values were generated and the optimal SET value was chosen at the knee of the plot, to maximise the accuracy and minimise the false detection rate; this was found to be 60–70 μV for the given recording equipment. In the case of α, increasing its value will lead to higher false event detection rates and therefore lower values of α are desirable; however, lower α values will result in a longer mean duration for cycle restart. According to the literature [16], the inter-saccade fixation duration is typically 250 ms, and it is therefore desirable to ensure that the lowest α value chosen does not result in a restart cycle delay exceeding this value. An α value between 0.0 and 0.1 was found to meet these opposing requirements.
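As an illustration of Eq. (13), the subject-specific trough threshold can be calibrated as below; the function name is hypothetical, and γ simply denotes the scaling constant of Eq. (13).

```python
import numpy as np

def calibrate_blink_trough_threshold(train_trough_amplitudes, gamma=0.4):
    """B- = gamma * med{E_train}, Eq. (13). The trough amplitudes are negative,
    so the median (robust to outlier troughs caused by subject mistakes) scaled
    by gamma in [0, 1] yields a negative threshold closer to the baseline."""
    return gamma * np.median(train_trough_amplitudes)
```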
Table 3. Estimation performance obtained by 1D and 2D input models.

Subject | 1D RMSE/° | 1D horizontal error/° | 1D vertical error/° | 2D RMSE/° | 2D horizontal error/° | 2D vertical error/°
S1 | 3.13 | 1.46 | 1.92 | 2.47 | 0.96 | 1.57
S2 | 2.65 | 1.21 | 1.73 | 2.67 | 1.32 | 1.66
S3 | 3.22 | 1.26 | 2.23 | 3.09 | 1.25 | 2.05
S4 | 2.63 | 1.29 | 1.51 | 2.55 | 1.25 | 1.43
S5 | 3.10 | 1.72 | 1.65 | 2.93 | 1.75 | 1.39
S6 | 3.05 | 1.47 | 1.86 | 3.02 | 1.41 | 1.89
Average | 2.97 ± 0.26 | 1.40 ± 0.19 | 1.82 ± 0.25 | 2.79 ± 0.26 | 1.32 ± 0.26 | 1.67 ± 0.26
Table 4. 10-fold cross-validated saccade and blink labelling accuracy.

Subject | Saccade accuracy (%) | Blink accuracy (%)
S1 | 100.00 | 100.00
S2 | 100.00 | 100.00
S3 | 100.00 | 100.00
S4 | 100.00 | 100.00
S5 | 99.50 | 100.00
S6 | 100.00 | 100.00
Average | 99.92 | 100.00
Next, the real-time eye movement classification performance of the algorithm proposed in Section 4 was evaluated. Using the optimal parameter values, the 10-fold cross-validated event labelling accuracy obtained using the proposed algorithm for each subject is presented in Table 4; note that downward-going saccadic movements, i.e. saccades with P_v < 0, were not included in this performance analysis in order not to bias the results, as these events are labelled with certainty as saccades, as discussed in Section 4.
6. EOG-based virtual keyboard

The proposed two-channel input linear regression model as well as the real-time eye movement detection and labelling technique were then used to interface with a real-time EOG-based virtual keyboard, as shown in Fig. 9. Using a regression model to estimate the subject's gaze displacements allows users to reach any icon from anywhere on the screen, thus moving away from the standard, unnatural, discrete-step virtual keyboards found in the literature [1,4–7]. Furthermore, the real-time eye movement detection technique allows users to interact with the application asynchronously, i.e. without being restricted to perform eye movements within specific cued intervals [6,7]. The keyboard interface as well as the performance obtained when this was tested by 10 subjects are presented in this section.

6.1. Interacting with the virtual keyboard
Following standard QWERTY keyboard layouts, the proposed virtual keyboard organises letters, numbers, punctuation marks and other symbols on two 40-icon menus, as shown in Fig. 9, with the inter-icon horizontal and vertical angular separation being approximately 5°. The layout on both menus also includes action icons to toggle between both menus, delete the last transcribed character or word, emulate the pressing of the 'Enter' key and also to exit the application. A writing bar showing the transcribed characters is also displayed at the top. In order to type using the virtual keyboard, the user performs saccadic movements to reach the desired icon and selects it using a dwell-based validation technique, i.e. by fixating for a pre-specified period of time, which was set to 2 s. Specifically, whenever the subject's ith saccade is detected by the algorithm of Section 4, the corresponding horizontal and vertical angular gaze displacements, θ̂_{h,i} and θ̂_{v,i} respectively, are estimated using the proposed two-channel input regression model. These estimates are then used to determine the subject's new POG on the screen, which is mapped to the centroid of the nearest icon. Since a dwell-based validation technique was used, a green-coloured circular progress bar, centred over this icon, is displayed to provide visual feedback regarding the POG and the time left before the icon is selected and the corresponding character/action is typed/actuated. When the dwell-time elapses, the icon is highlighted in green to indicate to the user that it has been selected; this feedback is intended to reduce the subject's instinct to look at the writing bar after each icon selection to ensure that the correct action was taken. If, however, the POG is estimated to fall within the writing bar region, the system is temporarily paused, ignoring the subject's ocular activity for a period of 3 s. This pause permits the user ample time to verify that the correct character(s) were successfully transcribed and refocus their POG on the indicated icon. During this interval, an orange-coloured circular progress bar is displayed on the icon corresponding to the previous POG, indicating the time left before the system resumes its monitoring of the ocular activity.
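The mapping from an estimated displacement to the selected icon can be sketched as follows; this is an illustrative snippet with hypothetical names, not the authors' implementation, and it assumes that the POG and the icon centroids are expressed in the same angular coordinates.

```python
import numpy as np

DWELL_S = 2.0    # dwell time before an icon is selected

def update_pog(pog_deg, dtheta_h, dtheta_v, icon_centroids_deg):
    """Add the estimated saccadic displacement (in degrees) to the current
    point of gaze and snap the result to the centroid of the nearest icon."""
    pog = np.asarray(pog_deg, dtype=float) + np.array([dtheta_h, dtheta_v])
    nearest = np.linalg.norm(icon_centroids_deg - pog, axis=1).argmin()
    return icon_centroids_deg[nearest], nearest   # cursor position and icon index
```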
Fig. 9. Icon layout of the proposed virtual keyboard.
This application was also designed to address the Midas touch problem [7,10] by allowing users to deliberately pause and resume the system, thereby allowing the subject to gaze around and rest while the ocular activity is ignored. This pause-resume feature was implemented through double blink detection where, specifically, if two consecutive blink events are detected within a 1 s period, the system is toggled between its active and paused states. When the system is paused, the circular progress bar is changed to a solid red-coloured circular cursor to indicate the system's state and to serve as the cue for the user to refocus his/her POG to resume the system by performing another double blink gesture. This feature is also particularly useful if the subject's angular gaze displacements are not estimated correctly, resulting in the cursor falling on an undesired icon. The double blink gesture can also be used in such cases to pause the system such that, after focusing the POG on the solid red-coloured circular cursor, s/he performs a second double blink gesture to continue with the task from this location.

6.2. Results

The proposed virtual keyboard was tested by a total of 10 subjects, who were asked to transcribe two strings, 'HELLO' and 'GOOD DAY', each followed by the selection of the exit icon, in separate trials repeated three times. Subjects were also instructed to transcribe the two strings correctly and completely and hence to make use of the delete letter icon to correct any mistyped characters as required. Before using the proposed virtual keyboard application, the user had to perform a 200 s data recording session during which a total of 100 saccades and 50 blinks were collected. This data was used to estimate the system parameters, as detailed earlier in Section 5.

Two different performance measures were considered, namely the writing speed (WS) and the icon selection accuracy. The WS in characters per minute (cpm) [4–7] is defined as:

WS = (1/N) Σ_{j=1}^{N} [ 60 (|S_j| + 1) / D_j ]    (14)

where N represents the total number of trials, |S_j| represents the length of the particular string which was transcribed in trial j, while D_j denotes the time in seconds taken to finish trial j. Since subjects were asked to write the phrases correctly and completely, |S_j| = 5 and |S_j| = 8 for the strings 'HELLO' and 'GOOD DAY' respectively. The +1 included in the numerator represents the selection of the exit icon after typing each string. This measure has a theoretical maximum of 30 cpm, due to the 2 s dwell-time for icon selection. However, since the WS results documented in this work account for delays to traverse to the desired icons and delays spent to correct any mistakes due to system or user errors during typing, the quoted results provide practical values of the typical typing rate. The icon selection accuracy [5,6] is defined as:

Accuracy = (1/N) Σ_{j=1}^{N} [ 100 |S_{correct,j}| / |S_{selected,j}| ]    (15)

where N represents the total number of trials, |S_{correct,j}| represents the total number of intended and successfully selected icons in trial j and thus also accounts for any intentional selections of the delete letter icon, whereas |S_{selected,j}| denotes the total number of icons selected during trial j, which also accounts for all unintended icon selections as well as selections of the exit and delete letter icons.

The WS and icon selection accuracy results obtained by each of the 10 subjects are tabulated in Table 5, where it is pertinent to point out that Subject S1 had more experience with the application; consequently, the average results are computed over Subjects S2–S10.
Table 5. Average WS and accuracy results obtained when the proposed virtual keyboard was tested by 10 subjects.

Subject | WS/cpm | Accuracy/%
S1 | 20.78 ± 2.37 | 100.00
S2 | 12.70 ± 6.90 | 100.00
S3 | 12.74 ± 4.62 | 98.48
S4 | 11.31 ± 3.71 | 100.00
S5 | 12.07 ± 4.95 | 100.00
S6 | 11.00 ± 3.66 | 100.00
S7 | 7.82 ± 1.99 | 100.00
S8 | 12.65 ± 2.62 | 97.92
S9 | 16.35 ± 2.51 | 100.00
S10 | 10.36 ± 4.36 | 96.67
Average (S2–S10) | 11.89 ± 4.42 | 99.23
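For clarity, Eqs. (14) and (15) amount to the simple per-trial averages sketched below (hypothetical function names).

```python
def writing_speed_cpm(string_lengths, durations_s):
    """Eq. (14): mean of 60 * (|S_j| + 1) / D_j over trials; the +1 accounts for
    the selection of the exit icon at the end of each trial."""
    return sum(60.0 * (s + 1) / d for s, d in zip(string_lengths, durations_s)) / len(durations_s)

def selection_accuracy(correct_counts, selected_counts):
    """Eq. (15): mean percentage of intended and successfully selected icons per trial."""
    return sum(100.0 * c / t for c, t in zip(correct_counts, selected_counts)) / len(selected_counts)
```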
The performance achieved compares well with the WS and accuracy performance obtained by state-of-the-art EOG-based virtual keyboards [1,4–7]. These results also demonstrate that the developed application could be effectively controlled via simple eye movements captured using EOG. Good WSs and high accuracies were achieved because (i) the angular error in both vertical and horizontal directions is less than the inter-icon angular separation and (ii) if the estimated POG falls on an unwanted icon, the subject still has a 2 s dwell-time to validate the icon or otherwise. Furthermore, it is clear that the average WSs obtained are lower than the theoretical maximum WS of 30 cpm. However, this theoretical maximum is based on the assumption that the time taken, after successfully selecting a particular icon, to reach the next icon is negligible, which is not the case in practice. Other issues attributed to the slower average WSs include subject-related mistakes while using the application. For example, there were instances where subjects failed to perform a double blink gesture immediately after a mis-estimation occurred, so as to pause the system right away and allow their POG to refocus on the icon indicated by the red-coloured cursor before resuming. Instead, subjects repeatedly performed saccadic movements to the icon where their POG would have been mis-estimated, which resulted in the estimated POG moving further away from the desired icon. On the other hand, in cases where the double blink gesture was successfully actuated right after a mis-estimation occurred, some subjects were generally spending a substantial amount of time before performing a second double blink gesture to resume the system and continue with the task. Additionally, even though subjects were given feedback of the system's chosen icon by highlighting it in green, some of them were still sometimes confirming this by looking at the writing bar. These issues, however, are expected to diminish drastically with user familiarity with the system, allowing for higher WSs to be achieved. This is demonstrated by the substantially higher WS of the experienced user, S1, which averages 20.78 ± 2.37 cpm. Moreover, as user familiarity with the system increases, the WS performance could be further increased by reducing the dwell-time period, which would effectively reduce the time needed to successfully select each icon. In fact, when the same exercises were performed by Subject S1 using a revised dwell period of 1.5 s, an average WS of 29.84 ± 3.10 cpm was achieved. These results also show that, with user familiarity, a superior WS performance to that reported in the literature [1,4–7] could be obtained.

7. Conclusion

This work focused on the development of novel techniques to estimate the subject's ocular pose and to detect and label different eye movements from EOG signals in real-time, which were used to interact with an EOG-based virtual keyboard allowing users to reach any icon from anywhere on the screen directly
and asynchronously. The angular gaze displacements were estimated using a two-channel input linear regression model, using features extracted from both the horizontal and vertical EOG components jointly. Using this technique, a mean absolute angular error of 1.32 ± 0.26° and 1.67 ± 0.26° was obtained in the horizontal and vertical directions respectively, which was found to be generally statistically significantly better than that achieved using separate models for each EOG component, as is commonly considered in the literature. Additionally, this work proposed a novel algorithm to exploit the EOG peak and gradient features to distinguish between saccades and blinks from EOG signals in real-time. Using this technique, the saccade and blink labelling accuracy was 99.92% and 100.00% respectively. When these techniques were used to interface with the proposed virtual keyboard, an average writing speed across subjects of 11.89 ± 4.42 cpm was achieved; this performance has been shown to improve substantially with the user's familiarity with the system. Future work aims to develop techniques that adapt the dwell-time to user experience and incorporate a dictionary to help improve the writing speeds.

Acknowledgements

The research work disclosed in this publication is funded by the ENDEAVOUR Scholarship Scheme (Malta). The scholarship may be part-financed by the European Union – European Social Fund (ESF) under Operational Programme II – Cohesion Policy 2014–2020, "Investing in human capital to create more opportunities and promote the well-being of society". The authors would also like to thank JINS Company Limited for their support.

References

[1] A.B. Usakli, S. Gurkan, Design of a novel efficient human–computer interface: an electrooculagram based virtual keyboard, IEEE Trans. Instrum. Meas. 59 (2010) 2099–2108.
[2] W. Heide, E. Koenig, P. Trillenberg, D. Kömpf, D.S. Zee, Electrooculography: technical standards and applications, in: G. Deuschl, A. Eisen (Eds.), Recommendations for the Practice of Clinical Neurophysiology: Guidelines of the International Federation of Clinical Physiology (EEG Suppl. 52), Elsevier Science, 1999, pp. 223–240.
[3] A. Bulling, J.A. Ward, H. Gellersen, G. Tröster, Eye movement analysis for activity recognition using electrooculography, IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011) 741–753.
[4] W. Tangsuksant, C. Aekmunkhongpaisal, P. Cambua, T. Charoenpong, T. Chanwimalueang, Directional eye movement detection system for virtual keyboard controller, 5th Int. Conf. Biomed. Eng. (2012) 1–5.
[5] K. Yamagishi, J. Hori, M. Miyakawa, Development of EOG-based communication system controlled by eight-directional eye movements, Int. Conf. IEEE Eng. Med. Biol. Soc. (2006) 2574–2577.
[6] D.S. Nathan, A.P. Vinod, K.P. Thomas, An electrooculogram based assistive communication system with improved speed and accuracy using multi-directional eye movements, 35th Int. Conf. Telecommun. Signal Process. (2012) 554–558.
[7] N. Barbara, T.A. Camilleri, Interfacing with a speller using EOG glasses, IEEE Int. Conf. Syst. Man Cybern. (2016).
[8] O.V. Acuña, P. Aqueveque, E.J. Pino, Eye-tracking capabilities of low-cost EOG system, 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (2014) 610–613.
[9] H. Miyashita, M. Hayashi, K.-i. Okada, Implementation of EOG-based gaze estimation in HMD with head-tracker, 18th Int. Conf. Artif. Real. Telexistence (2013) 20–27.
[10] N. Itakura, K. Sakamoto, A new method for calculating eye movement displacement from AC coupled electro-oculographic signals in head mounted eye-gaze input interfaces, Biomed. Signal Process. Control 5 (2010) 142–146.
[11] A. Bulling, D. Roggen, G. Tröster, It's in your eyes – towards context-awareness and mobile HCI using wearable EOG goggles, Proc. 10th Int. Conf. Ubiquitous Comput. (2008) 84–93.
[12] K.R. Lee, W.D. Chang, S. Kim, C.H. Im, Real-time "eye-writing" recognition using electrooculogram, IEEE Trans. Neural Syst. Rehabil. Eng. 25 (2017) 37–48.
[13] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, 2006.
[14] M. Vidapanakanti, S. Kakarla, S. Katukojwala, M.U.R. Naidu, Analysis of saccadic eye movements of epileptic patients using indigenously designed and developed saccadic diagnostic system, 13th Int. Conf. Biomed. Eng. (2009) 431–434.
[15] K. Fukunaga, Introduction to Statistical Pattern Recognition, Computer Science and Scientific Computing, Elsevier Science, 2013.
[16] J.H. Goldberg, X.P. Kotval, Computer interface evaluation using eye movements: methods and constructs, Int. J. Ind. Ergon. 24 (1999) 631–645.