Codebook-based Electrooculography Data Analysis towards Cognitive Activity Recognition

P. Lagodzinski a,∗, K. Shirahama b,∗∗, M. Grzegorzek a,∗∗

a Department of Knowledge Engineering, University of Economics in Katowice, Bogucicka 3 Str., 40-226 Katowice, Poland
b Pattern Recognition Group, University of Siegen, Hoelderlinstr. 3, 57076 Siegen, Germany

Computers in Biology and Medicine (2017). doi:10.1016/j.compbiomed.2017.10.026
Received 11 April 2017; revised 18 October 2017; accepted 23 October 2017.
Abstract
With advances in mobile/wearable technology, people have increasingly started to use a variety of sensing devices to track their daily activities as well as their health and fitness in order to improve their quality of life. This work addresses the idea of eye movement analysis, which, due to its strong correlation with cognitive tasks, can be successfully utilized for activity recognition. Eye movements are recorded with an electrooculographic (EOG) system built into the frames of glasses, which can be worn more unobtrusively and comfortably than other devices. Since the obtained information is low-level sensor data expressed as a sequence of values sampled at constant intervals (100 Hz), the cognitive activity recognition problem is formulated as sequence classification. However, it is unknown what kind of features are useful for accurate cognitive activity recognition. To overcome this, a codebook approach is adopted, in which a sequence of recorded EOG data is described by the distribution of characteristic subsequences, called codewords, obtained by clustering a large number of subsequences. Furthermore, a statistical analysis of the codeword distribution discovers features that are characteristic of a certain activity class. Experimental results demonstrate good accuracy of the codebook-based cognitive activity recognition, reflecting the effective usage of the codewords.
Keywords: Ambient assisted living, Cognitive activity recognition, Electrooculography (EOG), Sequence classification, Codebook approach
∗ Principal corresponding author
∗∗ Corresponding author
Email addresses: [email protected] (P. Lagodzinski), [email protected] (K. Shirahama), [email protected] (M. Grzegorzek)
1. Introduction
Technological advances and the availability of various mobile/wearable devices allow us to continuously record sensor data related not only to our daily-life activities but also to our health. In addition to the commonly available GPS receivers tracking location and accelerometers/gyroscopes monitoring body movements, other state-of-the-art sensors can be utilized to capture physiological signals such as heart rate, blood pressure, respiration and/or the electrooculogram (EOG) [1, 2, 3, 4]. Such low-level sensor data are a very rich source from which to deduce high-level information about a person's physical behaviors, mental states (e.g., emotions), or cognitive awareness.

Cognition is a term describing the mental processes by which knowledge is acquired, including aspects such as reasoning, awareness, perception, knowledge, intuition and judgment [5]. Each person is characterized by his/her cognitive skills, which allow them to process the information received from the five senses. These cognitive skills can be trained by performing mind-challenging activities like playing games, solving puzzles, engaging in conversation, reading books or watching video. Recognizing such cognitive activities is of interest because it can be used, e.g., to help the elderly keep their minds sharp and to reduce the risk of age-related dementia.

Sensor-based activity recognition investigating cognitive processes such as attention, visual memory or learning usually involves brain activity sensing or the analysis of eye movements, which are also strongly correlated with human cognitive tasks [6]. Since accurate monitoring of brain activity requires rather invasive methods, such as electroencephalography (EEG), which suffers from noise generated by muscle movement, or functional magnetic resonance imaging, tracking eye movements with a lightweight, unobtrusive electrooculographic (EOG) system [7] seems more suitable for day-long data collection. EOG uses electrodes placed around the eyes to measure changes in the potential of a dipole between the cornea and retina caused by eye movement [3, 4].

Obtained EOG data form sequences representing values at constant time intervals, which indicates that cognitive activity recognition can be treated as sequence classification in the machine learning framework, where a classifier (recognition model) is trained that takes a data sequence as input and predicts its activity class. First, 'training sequences' that are already annotated with activity classes are analyzed; this enables the classifier to capture characteristics of sequences in each activity class. Then, the classifier is used to categorize 'test sequences' for which activity classes are unknown. Since raw, low-level sensor data usually carry a large amount of irrelevant information, one of the most important issues is to build the classifier on a sequence representation (feature) expressing relevant characteristics. Thus, obtaining a good feature is crucial for accurate sequence classification.

The literature presents a wide range of feature selection approaches developed for high-dimensional data and different application purposes [8, 9, 10, 11]. Many of the existing methods, like the mean and variance of values in a sequence or first-order derivatives, are based on prior knowledge and manual investigation [2, 12, 13], causing difficulties with the representation of detailed information and problems with statistical validation. There are also a number of methods for EOG data based on a heuristic approach utilizing information about basic eye movement types like saccades, fixations or blinks [3]. Nevertheless, the problem of relevant feature extraction and selection from EOG data for activity recognition has not been thoroughly investigated, which leaves room for further research.

The contributions of this paper are two-fold. First, instead of the above-mentioned hand-crafted features based on prior knowledge and manual investigation, we adopt a feature learning approach that extracts useful features by analyzing a large amount of data [14]. In particular, we apply a simple but effective feature learning approach, the codebook approach [15], to EOG data collected with smart eyewear [7] in order to recognize several cognitive activities (e.g., reading, watching television or drinking). In the codebook approach, a sequence is represented by a feature describing the distribution of characteristic subsequences, called codewords, obtained by clustering a large number of subsequences. This feature not only captures the detailed information in the sequence but also consists of statistically validated codewords. Second, we further extend the codebook approach with a statistical analysis of codewords to discover which of them are specific to a certain activity class. Through a series of experiments we obtain and present relevant parts of EOG sequences that can be associated with a given class of cognitive activities, demonstrating that the presented codebook approach can be useful not only for activity recognition but also as an analytical tool.

2. Related Work
Eye movement analysis has long attracted researchers investigating visual behaviors. Early studies focusing on recognizing objects perceived by human observers used Markov processes to model visual fixations [16]; the obtained fixations were sequenced into character strings, and the edit distance was applied to quantify the similarity between eye movement sequences. In [17], the authors investigated sequences of temporal fixations using discrete-time Markov chains to discover fixation clusters that can point out features attracting an observer's attention. Such information can be especially helpful during a training process; this was utilized by the method proposed in [18], where information about the dynamics of saccadic eye movements is used to evaluate the results of students' training in assessing tomography images. Means for automated eye-movement protocol analysis were proposed in [19], including sequence-matching and hidden-Markov-model-based methods that interpret eye movements with high accuracy and in significantly short computation time. Recently, the research community has become interested in human activity recognition utilizing the variety of sensors available in everyday devices like mobile phones, smart watches or fitness wristbands. However, eye trackers and information about eye movements, which are strongly correlated with cognitive aspects, have rarely been used to
track our daily-life activities. In [4, 20], the authors proposed a method that uses information on blinking frequency, eye movement and head motion as a combined feature for a J48 decision tree to distinguish activities like reading, talking and walking. Eye movement data obtained with EOG were utilized by the method proposed in [21] to estimate the number of words a user reads, using a simple valley detection algorithm to detect line breaks in the horizontal component of the EOG signal. A more advanced method, recognizing not only reading but also copying a text, taking handwritten notes, watching television and browsing the web, was proposed in [3]. The authors developed a large set of hand-crafted features describing eye movement data by capturing fundamental eye movement characteristics and dynamics. These features are string representations of saccade, fixation and blink features, which are ranked and evaluated using minimum-Redundancy-Maximum-Relevance (mRMR) feature selection [22] and a Support Vector Machine (SVM) classifier. The method proposed in [23] for interfacing with a speller utilized a thresholding algorithm to detect different saccadic eye movements: the authors defined five thresholds to distinguish near and far saccadic movements, fixations and blinks on the horizontal (EOGH) and vertical (EOGV) components of the EOG data, and classification is then performed by comparing peaks extracted from EOGH and EOGV with the thresholds. In [24], the authors proposed a simple approach to reading activity recognition based on the observation that reading text creates characteristic patterns in EOGH, depicting the smooth pursuit of text and the rapid transition from right to left when switching lines (repetitive large negative peaks). The peaks are found by applying minimum and maximum peak separation thresholds to the derivatives of a preprocessed EOGH signal. The authors stated that the number of peaks indicates the number of lines the user read, while the distance between them reflects the time the user needed to read one line of text. A method for recognizing reading tasks based on autoregressive features extracted from EOG data was proposed in [25]. The authors utilized four different autoregressive models, widely used for extracting features from biomedical signals, based on the assumption that data at any point are closely related to the few preceding data points. With this approach, they could extract several features from raw EOG signals, which were passed to a recurrent Elman neural network to automatically detect the reading activity. In [26], the authors presented a method to extract EOG features using a Linear Predictive Coding (LPC) model applied to spectral entropies of EOG signals, where the parameters were converted to LPC cepstral coefficients to obtain a more reliable and robust feature.

Compared to the above existing methods, our method does not focus on extracting features like the mean and variance of a subsequence, and does not operate on features developed from string representations of basic eye movements like saccades, fixations or blinks. Instead, subsequences sampled from EOG data are grouped into clusters, where each subsequence preserves the detailed information obtained with the EOG system, as it consists of raw signal values. Moreover, a large number of subsequences allows us to discover
statistically characteristic subsequences acting as codewords, which reduces the influence of noise contained in raw EOG data. This makes the codebook-based approach an easy-to-use analytical tool for investigating EOG data and searching for subsequences that can be successfully associated with a given activity class.

3. Cognitive Activity Recognition Using Codebook Approach
Figure 1 depicts the codebook-based approach to cognitive activity recognition using EOG and accelerometer data. Our method consists of three major steps. The first step is codebook construction (Fig. 1a), where subsequences are sampled from sequences and grouped into clusters based on their similarity; the codebook is then constructed as the set of codewords, obtained as cluster centers. The second step is codeword assignment (Fig. 1b), where a feature is extracted from a sequence by assigning each of its subsequences to the most similar codeword; the resulting feature is a histogram reflecting the frequency of each codeword. The last step is classifier training and test (Fig. 1c), where the obtained features are considered as points in a multi-dimensional space, so a classifier can be trained to discriminate between training sequences annotated with a certain activity class and the others. The dashed line in Figure 1c represents the boundary between "reading" and the other activities; given this boundary, the classifier can predict the activity class of a test sequence.

Figure 1: An overview of the codebook-based cognitive activity recognition method: (a) codebook construction, (b) codeword assignment, (c) classifier training/test.

3.1. Preprocessing

Raw data obtained with an EOG system are known to contain a large amount of noise introduced by muscle movements associated with facial mimicry.
For this reason, several noise reduction and approximation algorithms were investigated, as shown in Table 1. The best results were obtained with the Hampel filter [27], which detects and removes outliers from a given sequence. Good results were also obtained with the Butterworth filter and the moving average; however, the accuracy improvement of 0.4% in favor of the Butterworth filter does not justify the additional computational load it requires compared to the moving average. The other methods produced noticeably worse results: 72.3% and 76.3% for the peak envelope algorithm and the Fast Fourier Transform (FFT) approximation, respectively. Considering both recognition accuracy and computation time, the moving average was chosen for the experiments, as it is well suited to reducing random high-frequency noise in lower-frequency periodic waveforms while retaining a sharp step response.
Table 1: Mean accuracies of cognitive activity recognition obtained for datasets preprocessed using different noise reduction and approximation methods.

Method     Moving Average   Median   Hampel   Butterworth   Peak Envelope   FFT
Accuracy   83.6%            81.6%    85.6%    84.0%         72.3%           76.3%
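As an illustration of this preprocessing step, the following is a minimal sketch of the moving-average smoothing chosen above, assuming NumPy; the function name and the placeholder signal are our own choices, not part of the original implementation.

```python
import numpy as np

def moving_average(signal: np.ndarray, w_avg: int = 8) -> np.ndarray:
    """Smooth a 1-D sensor sequence with a sliding-window mean.
    Reduces random high-frequency noise while largely retaining the
    sharp step responses typical of saccadic EOG signals."""
    kernel = np.ones(w_avg) / w_avg
    # mode="same" keeps the output as long as the input.
    return np.convolve(signal, kernel, mode="same")

# Usage: denoise a raw horizontal EOG sequence sampled at 100 Hz
# (random data stands in for a recorded sequence here).
raw_eog_h = np.random.randn(3000)
smoothed = moving_average(raw_eog_h, w_avg=8)
```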
3.2. Codebook Construction

The codebook construction step obtains subsequences from each sequence by sliding a window of size w every s time points. Then, k-means clustering [28] is performed to group the subsequences into N clusters, so that subsequences within a cluster share similar characteristics. Because each subsequence consists of values at w time points and can be considered a vector in a w-dimensional space, the similarity between subsequences can be expressed simply by their Euclidean distance. In addition, a simple translation is applied to each subsequence so that the value at its first time point is zero; this allows us to robustly analyze the shapes of subsequences regardless of their starting values. The clustering result defines the codebook consisting of N codewords.

Since EOG data tend to show periodic characteristics (see Fig. 2), we also utilize an alternative representation of subsequences in the frequency domain. That is, instead of a subsequence consisting of the raw time points of the EOG sequence, a vector of its FFT coefficients is used to capture a different type of information about the subsequence. A codebook for this FFT-based representation can be obtained with the above-mentioned k-means clustering without modification.

K-means clustering is the standard method for codebook construction in the field of image/video classification, where the codebook approach originated [29]. Although the result may be affected by the randomly selected initial cluster centers, no significant difference is expected, because a large number of subsequences are collected and grouped, which reduces this variation.
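The construction step can be summarized in code. The sketch below assumes NumPy and scikit-learn's KMeans; the helper names are our own. For the FFT-based representation mentioned above, each translated subsequence would simply be replaced by the magnitudes of its FFT coefficients, e.g. np.abs(np.fft.rfft(sub)), before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_subsequences(seq: np.ndarray, w: int, s: int) -> np.ndarray:
    """Slide a window of size w every s time points; translate each
    subsequence so its first value is zero, making the shape analysis
    independent of the starting value."""
    subs = [seq[i:i + w] - seq[i] for i in range(0, len(seq) - w + 1, s)]
    return np.array(subs)

def build_codebook(sequences, w=64, s=8, n_codewords=128):
    """Cluster subsequences pooled from all sequences with k-means;
    the cluster centers are the codewords."""
    subs = np.vstack([extract_subsequences(q, w, s) for q in sequences])
    km = KMeans(n_clusters=n_codewords, n_init=10, random_state=0).fit(subs)
    return km.cluster_centers_        # array of shape (n_codewords, w)
```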
3.3. Codeword Assignment

For each sequence, subsequences are collected in the same way as during codebook construction. Then, for each subsequence, the most similar codeword is found based on the Euclidean distance, and the frequency of that codeword is incremented. The frequencies of the individual codewords constitute a histogram-type feature representing their distribution in the sequence. The final, probabilistic form of the feature is obtained by normalizing the frequency of each codeword so that the frequencies of all N codewords sum to one.

The Euclidean distance used to measure the similarity between subsequences and codewords is the most straightforward and intuitive similarity measure for time series. It is easy to implement and indexable with any access method. Moreover, the Euclidean distance is surprisingly effective compared to more complex approaches, especially for large datasets [30]. Computing the distance between two subsequences of length w takes only O(w) time. In comparison, Dynamic Time Warping (DTW), a very popular similarity measure that allows a nonlinear mapping between signals by minimizing the distance between them, takes O(w²). This difference becomes more visible as the number of subsequences to be clustered increases: for S (e.g., 1,000,000) subsequences, computing the distance between each pair of subsequences costs O(S²w) with the Euclidean distance, compared to O(S²w²) for DTW. Thus, the computation time of DTW is larger by a factor of w (e.g., w = 64). Research exists in which DTW achieves linear complexity through lower-bounding techniques [31, 32]; however, this may introduce pathological matchings between two time series and distort the true similarity.
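A minimal sketch of the assignment step, reusing extract_subsequences from the previous sketch; again, the names are illustrative rather than taken from the paper's implementation.

```python
import numpy as np

def encode_sequence(seq, codebook, w=64, s=8):
    """Represent a sequence as a normalized histogram of codeword
    frequencies (hard assignment to the nearest codeword)."""
    subs = extract_subsequences(seq, w, s)
    # Euclidean distance between every subsequence and every codeword.
    dists = np.linalg.norm(subs[:, None, :] - codebook[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)    # index of the most similar codeword
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()          # frequencies sum to one
```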
3.4. Classifier Training and Test

A binary classifier is trained to distinguish training sequences labeled with a certain activity from the other training sequences, where the former and the latter are referred to as 'positive' and 'negative' sequences, respectively. To gain high discrimination power, a variety of characteristic subsequences need to be considered using a large number of codewords, so each sequence is represented by a high-dimensional feature vector. For this reason, the proposed method uses a Support Vector Machine (SVM), which is known to be effective for high-dimensional data [33, 34]. Based on the margin maximization principle, the SVM places the classification boundary in the middle between positive and negative sequences, so that the generalization error is theoretically unrelated to the number of dimensions. Indeed, in the field of image/video classification, the SVM is the standard classifier in the codebook approach [29]. A trained SVM produces a score ranging from 0 to 1 for a test sequence, based on its distance to the classification boundary [35]; a larger value indicates that the test sequence is more likely to belong to the target activity class. Assuming that the experiment involves the recognition of A activities,
A SVMs perform the task, each built as a binary classifier for one activity. The activity of a test sequence is finally determined as the one with the highest SVM score.

An important part of using an SVM is its parameter setting. Our method uses the Radial Basis Function (RBF) kernel, which takes one parameter γ controlling the complexity of the classification boundary. The other SVM parameter, C, controls the penalty for misclassification. To obtain the best possible results, C and γ are set by a grid search using cross validation [36]. This approach to setting SVM parameters is used throughout all experiments.
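A sketch of the training/test step follows. The paper's experiments use LIBSVM [36]; the sketch below uses scikit-learn's SVC (a LIBSVM wrapper), and the exact parameter grid is our assumption, not the one reported in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_activity_classifiers(features, labels, activities):
    """One binary RBF-SVM per activity (one-vs-rest); C and gamma are
    chosen by grid search with cross validation."""
    param_grid = {"C": [2.0**k for k in range(-5, 16, 2)],
                  "gamma": [2.0**k for k in range(-15, 4, 2)]}
    classifiers = {}
    for activity in activities:
        y = (labels == activity).astype(int)       # one-vs-rest labels
        grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                            param_grid, cv=2)
        classifiers[activity] = grid.fit(features, y)
    return classifiers

def predict_activity(classifiers, feature):
    """The predicted activity is the one whose SVM yields the highest
    score (Platt-scaled probability of the positive class [35])."""
    scores = {a: clf.predict_proba(feature.reshape(1, -1))[0, 1]
              for a, clf in classifiers.items()}
    return max(scores, key=scores.get)
```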
3.5. Multiple Feature Representation

Since the device [7] used in the experiments is capable not only of capturing eye movements with an EOG sensor but also of delivering data from an accelerometer and a gyroscope, a proper method for fusing the extracted features is required. Assuming that each feature is a histogram based on an independently constructed codebook, there are two known approaches: early and late fusion [37]. Early fusion concatenates M features of N dimensions into a single feature of (M · N) dimensions; late fusion instead combines the outputs of M classifiers, each built on one feature, to obtain the final score. The simplicity and computational efficiency of early fusion make it suitable for the proposed method. In early fusion, SVM training and test are performed once on the concatenated features without any modification. Each of the M extracted features is normalized to sum to one, which ensures that each feature is treated equally in the newly created high-dimensional feature. Moreover, the simple concatenation used in early fusion still allows the classifier to consider correlations among dimensions of different features.
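Early fusion then amounts to a single concatenation; a minimal sketch (the variable names in the usage comment are assumed):

```python
import numpy as np

def early_fusion(histograms):
    """Concatenate M per-channel histograms, each normalized to sum to
    one, into a single (M * N)-dimensional feature vector."""
    return np.concatenate([h / h.sum() for h in histograms])

# Usage: fuse the histograms of the 4 EOG and 3 accelerometer channels.
# fused = early_fusion([h_eog_l, h_eog_r, h_eog_h, h_eog_v,
#                       h_acc_x, h_acc_y, h_acc_z])
```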
3.6. Investigating Codewords

The steps described above focus on obtaining effective features in order to accurately assign a sequence to the appropriate activity class. Such histogram-type features represent the distribution of individual codewords within a sequence. However, a deeper understanding of which codewords are characteristic for particular activity classes still requires investigation. For this reason, a simple statistical analysis involving probability and entropy computation is developed. By definition, entropy is the expected, average value of the information contained in each event [38], where the information is defined as the negative logarithm of the probability of the event. For a codeword c, its entropy over A activity classes is

H(c) = Σ_{i=1}^{A} P_i(c) · I_i(c) = − Σ_{i=1}^{A} P_i(c) · log₂ P_i(c),    (1)

where P_i(c) is the probability that the codeword c is included in sequences for activity i ∈ {1, . . . , A}, and I_i(c) = −log₂ P_i(c) is the corresponding information. If the probability is distributed equally between the A activities,
the entropy is high, indicating uncertainty, and the codeword is not particularly characteristic. In contrast, if the probability of a codeword is much higher for one activity than for the others, the entropy tends to zero and the codeword can be regarded as characteristic of that activity. Based on this, the desired codewords are those with high probability and low entropy. This dual description using probability and entropy is motivated by the simple fact that the probability alone cannot tell whether a given codeword is characteristic of a particular activity class, and, similarly, the entropy alone cannot tell which codeword has a higher probability of occurrence than the others. Only the combination of probability and entropy can indicate whether, and for which class, a given codeword is truly characteristic.
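Equation (1) translates directly into a few lines of code; this sketch assumes the per-activity probabilities of a codeword are given as a vector and normalizes them to sum to one.

```python
import numpy as np

def codeword_entropy(p: np.ndarray) -> float:
    """Entropy (Eq. 1) of a codeword from its per-activity probabilities."""
    p = p / p.sum()
    nz = p[p > 0]                 # convention: 0 * log2(0) = 0
    return float(-(nz * np.log2(nz)).sum())

# A codeword occurring almost exclusively in one class has near-zero
# entropy, i.e., it is characteristic of that class:
print(codeword_entropy(np.array([0.99, 0.005, 0.005])))  # ~0.09
# Equal probabilities give the maximal entropy log2(A):
print(codeword_entropy(np.array([1/3, 1/3, 1/3])))       # ~1.58
```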
4. Experimental Results
This section presents the experimental results of our codebook-based method on the cognitive activity recognition task. First, the dataset and implementation are briefly described. Then, different representations of subsequences are discussed, along with various combinations of feature vectors and their influence on recognition accuracy. Finally, an analysis of codewords is performed, describing their connection to certain activity classes.
4.1. Collecting a dataset

The dataset used throughout all experiments was collected with the smart eyewear JINS MEME [7], which has an EOG-based eye tracker, an accelerometer and a gyroscope built into a spectacle frame. Figure 2 shows example data we collected with the JINS MEME glasses. Four EOG sequences (EOGL, EOGR, EOGH, EOGV) and three sequences from the three-axis accelerometer (ACCX, ACCY, ACCZ) were collected, since preliminary experiments showed that EOG signals alone are not sufficient to effectively distinguish between several cognitive activities. EOGL (resp. EOGR) represents the difference between the electric potential at the left (NL) (resp. right (NR)) nose pad and the reference potential at the bridging part (BR) [39]. EOGH is defined as the difference between the electric potentials at the left nose pad (NL) and the right nose pad (NR), while EOGV is the average of EOGL and EOGR.

Data collection was performed in a controlled environment at the laboratory of the University of Siegen. One hundred adults, mostly university students, participated in experimental sessions in which their eye and head movements were recorded while performing the following three daily activities: 1. reading a printed page of text in the participant's native language, 2. drinking mineral water, and 3. watching a video. We decided to investigate these activities based on the definition of cognitive activities in Section 1 and the assumption that
they are common in everyday life but not complex. Moreover, reading is an activity broadly discussed in many well-known studies on activity recognition [21, 24, 25]. The drinking activity was chosen as a counterpoint, in order to check whether it can be distinguished from the other cognitive activities. A broader set of cognitive activities, such as reading, writing or tracking a moving object, as well as various levels of cognitive tasks, will be investigated in our further research. After each activity recording, participants took a short break to rest their eyes. The data for each activity are EOG and accelerometer sequences around 30 seconds long. Since these sequences are obtained at a high sampling rate of 100 Hz, they are considered to contain sufficient information to build an effective classifier. The EOG and accelerometer data were collected with the Bluetooth-based data streaming application of the smart eyewear [7]. In summary, the dataset consists of 100 data samples for each of the 3 activity classes, and each data sample consists of 7 sequences: 4 EOG and 3 accelerometer sequences.
Figure 2: Examples of accelerometer and EOG sequences obtained for reading activity.
Finally, the JINS MEME glasses used during the experiments were fitted with plain, non-prescription lenses. Since all participants were well-sighted and none of them needed prescription glasses, this had no influence on the results.
4.2. Implementation details

In the experiments, a codebook is constructed from subsequences of all 300 sequences of each sensor type (i.e., 100 sequences × 3 activities), which are preprocessed with a moving average filter with a sliding window of wavg = 8. Performance is evaluated using two-fold cross validation: one half of the sequences is used for training and the other half for testing, and then the roles of the two halves are swapped. The cognitive activity recognition performance is measured as the accuracy, i.e., the percentage of correctly classified sequences.

The method has three main parameters: the sliding window size w, the sliding stride s, and the number of codewords N. The parameter s controls how densely subsequences are sampled from a sequence; since denser sampling usually leads to better performance [40], s is set to 8 in all experiments. The parameters w and N are investigated over the combinations w ∈ {8, 16, 32, 64, 128} and N ∈ {8, 16, 32, 64, 128, 256, 512}, to avoid under- or over-estimating recognition accuracies. Although we investigated the optimal number of codewords by searching for the elbow point of the k-means clustering quality [41], no single number of codewords yielded significantly better results than the others; hence, we report results for different numbers of codewords. Finally, using N, the obtained subsequences are clustered to construct a codebook for each of the seven channels: EOGL, EOGR, EOGH, EOGV, ACCX, ACCY and ACCZ. Early fusion is applied to combine these features into a single high-dimensional feature.
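The evaluation protocol can be sketched as follows, reusing the helper functions from the sketches in Section 3; the two halves are assumed to be pre-shuffled, only a single sensor channel is shown (in the experiments, per-channel histograms are combined by early fusion), and all names are illustrative.

```python
import numpy as np
from itertools import product

window_sizes = [8, 16, 32, 64, 128]              # values of w
codebook_sizes = [8, 16, 32, 64, 128, 256, 512]  # values of N

def twofold_accuracy(sequences, labels, w, n_codewords, s=8):
    """Build the codebook from all sequences (unsupervised), extract
    features, then let each half serve once for training and once for
    testing; return the mean percentage of correct classifications."""
    codebook = build_codebook(sequences, w, s, n_codewords)
    X = np.array([encode_sequence(q, codebook, w, s) for q in sequences])
    half = len(sequences) // 2
    accs = []
    for train, test in [(slice(0, half), slice(half, None)),
                        (slice(half, None), slice(0, half))]:
        clfs = train_activity_classifiers(X[train], labels[train],
                                          np.unique(labels))
        preds = [predict_activity(clfs, x) for x in X[test]]
        accs.append(100.0 * np.mean(np.array(preds) == labels[test]))
    return np.mean(accs)

# for w, n in product(window_sizes, codebook_sizes):
#     print(w, n, twofold_accuracy(all_sequences, all_labels, w, n))
```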
4.3. Results of Cognitive Activity Recognition Using Codebook Approach

Accuracies were calculated for 35 results, each obtained by applying early fusion to the individual features extracted with one combination of sliding window size w and number of clusters N. All 35 results were computed on all available data using two-fold cross validation. Figure 3a depicts the accuracy distribution of these 35 results. The best accuracy of 86.6% was obtained for w = 128 and N = 64. We also observed that values of w higher than 128 decrease the recognition accuracy; this can be attributed to overfitting, where codewords that are very specific to training sequences are mistakenly regarded as useful for classifying test sequences. The accuracy distributions obtained with the codebooks built from FFT coefficients are shown in Figure 3b, where the best result of 96.6% was obtained (w = 128, N = 256). A feature built as a combination of the above-mentioned subsequence representations in both the time and frequency domains was also investigated, but it brought no significant improvement in recognition accuracy.
Figure 3: Accuracy distributions of cognitive activity recognition obtained with codebooks constructed on raw subsequences (a) and subsequences represented by FFT coefficients (b).

Figure 4: The mean accuracy (a) and standard deviation (b) of cognitive activity recognition using four-fold cross validation on randomly partitioned data.
In addition to the two-fold cross validation used in the experiments described above, four-fold cross validation was investigated as an alternative partitioning scheme. To assess the robustness of the proposed approach, the data were randomly assigned to the training and test sets. Figure 4 shows the mean accuracies and standard deviations obtained over 10 iterations of random assignment using four-fold cross validation. The results in Figure 4a show that the codebook-based cognitive activity recognition approach can reach an accuracy of 100%, and the accompanying small standard deviations in Figure 4b confirm the stability and consistency of the method.
4.4. Results of Codewords Investigation

In this section, we investigate which codewords are characteristic of particular activity classes. Table 2 presents exemplary results of the codeword investigation for 9 of the codewords obtained with the proposed method on subsequences of preprocessed data (w = 64, N = 128). The data in Table 2 contain several features with high probabilities and low entropies in every activity class, which is the desired situation, as a low entropy indicates that the probability in one activity class is significantly higher than in the other classes. For example, the fourth codeword C4 for the accelerometer sequence on the z axis, ACCZ, has a very high probability for the drinking activity. This coincides with the fact that one very characteristic action is taken during drinking, namely tilting the head back while taking a sip of water. Also, there are no eye movements that could be linked to this activity class, so the codewords for EOG sequences have relatively low probability values. Features corresponding to eye movements are more likely to occur for the reading activity, where we follow a line of text and switch between lines using our eyesight; these actions are captured, e.g., by codewords C10, C31 and C62 for all four EOG sequences. The last activity class, watching television/video, exhibits few characteristic codewords, only occasionally showing some connection to eye movements, e.g., codeword C33 for the EOGH sequence. Nevertheless, this can be justified as follows: usually we watch a video sitting on a couch placed at some distance from the screen, so in most situations the whole surface of the screen is within the field of view and there is no reason to make large eye movements. These characteristic features are depicted in Figure 5.
4.5. Scalability to a Larger Number of Activity Classes

In Sections 4.3 and 4.4 we validated the effectiveness of the codebook-based recognition method for three basic cognitive activities (reading, watching video and drinking) and demonstrated that the entropy-based investigation of codewords can provide knowledge about the characteristics of these activities. This encouraged us to perform experiments on a wider collection of activity classes. Specifically, three new classes of cognitive activities were introduced to the dataset: engaging in conversation, taking handwritten notes and sorting numbers. The conversation activity involved two subjects talking to each other face to face in a sitting position. For the note-taking activity, subjects were asked to take handwritten notes based on information that was provided on a whiteboard and presented orally by another subject. For the last activity, subjects had to place 30 shuffled cards with random numbers in ascending order; the cards were presented on a desk in such a way that the subjects could see all the numbers. In all cases subjects were advised to behave naturally. The new dataset consists of 300 sequences from each type of sensor: 150 sequences for the three basic activities and another 150 sequences collected for the new activity classes (50 sequences × 6 activities). With this
Table 2: Exemplary results of codewords investigation using probabilities and entropies.

Probability

Drinking
Sequence   C4     C10    C19    C21    C22    C31    C33    C60    C62
ACCX       0.10   0.03   0.25   0.04   0.08   0.03   0.25   0.04   0.05
ACCY       0.00   0.07   0.47   0.50   0.31   0.54   0.06   0.01   0.01
ACCZ       0.99   0.12   0.99   0.00   0.08   0.99   0.15   0.07   0.11
EOGL       0.14   0.03   0.05   0.02   0.17   0.02   0.06   0.05   0.00
EOGR       0.35   0.00   0.17   0.26   0.16   0.15   0.04   0.00   0.01
EOGH       0.01   0.03   0.09   0.02   0.00   0.18   0.00   0.02   0.02
EOGV       0.22   0.05   0.16   0.16   0.18   0.05   0.09   0.01   0.00

Reading
Sequence   C4     C10    C19    C21    C22    C31    C33    C60    C62
ACCX       0.80   0.83   0.41   0.83   0.70   0.96   0.37   0.75   0.76
ACCY       0.66   0.79   0.35   0.26   0.30   0.19   0.69   0.96   0.82
ACCZ       0.00   0.68   0.00   0.00   0.44   0.00   0.52   0.72   0.73
EOGL       0.56   0.90   0.78   0.85   0.00   0.90   0.78   0.89   0.99
EOGR       0.36   0.92   0.54   0.29   0.57   0.73   0.90   0.89   0.87
EOGH       0.94   0.94   0.87   0.95   0.00   0.66   0.00   0.96   0.94
EOGV       0.62   0.83   0.58   0.75   0.55   0.94   0.72   0.95   0.93

Watching video
Sequence   C4     C10    C19    C21    C22    C31    C33    C60    C62
ACCX       0.09   0.13   0.33   0.11   0.21   0.00   0.37   0.20   0.17
ACCY       0.32   0.13   0.17   0.23   0.38   0.26   0.23   0.02   0.15
ACCZ       0.00   0.19   0.00   1.00   0.46   0.00   0.32   0.19   0.15
EOGL       0.29   0.06   0.15   0.11   0.82   0.07   0.14   0.05   0.00
EOGR       0.28   0.07   0.27   0.44   0.26   0.10   0.05   0.09   0.10
EOGH       0.03   0.02   0.03   0.02   1.00   0.15   1.00   0.01   0.02
EOGV       0.15   0.10   0.25   0.07   0.25   0.00   0.18   0.02   0.06

Entropy
Sequence   C4     C10    C19    C21    C22    C31    C33    C60    C62
ACCX       0.90   0.77   1.55   0.78   1.12   0.00   1.56   0.97   0.97
ACCY       0.96   0.92   1.47   1.49   1.57   1.44   1.11   0.25   0.73
ACCZ       0.00   1.20   0.00   0.00   1.33   0.00   1.42   1.07   1.08
EOGL       1.38   0.53   0.91   0.69   0.00   0.54   0.95   0.57   0.08
EOGR       1.57   0.00   1.42   1.54   1.39   1.08   0.55   0.51   0.60
EOGH       0.35   0.38   0.64   0.31   0.00   1.25   0.00   0.27   0.35
EOGV       1.32   0.78   1.38   1.02   1.42   0.34   1.10   0.30   0.38
dataset, a series of experiments described in Section 4.3 was performed using four-fold cross validation. Figure 6 depicts the mean accuracies and standard deviations obtained over 10 iterations and shows that an accuracy of 97.4% can be obtained even in the six-class setting using the presented codebook approach (w = 128, N = 16). The low accompanying standard deviation of 0.6 for the best accuracy confirms the stability and robustness of the proposed method. Further experiments involving the entropy-based investigation of the codewords allowed us to extract features characteristic of the newly introduced activity classes (Figure 7).
Figure 5: Plots of characteristic codewords for particular activities in Table 2: C4, C19 and C31 (ACCZ) for drinking; C31 (ACCX), C60 (ACCY), C4, C10, C21, C60 and C62 (EOGH), C10, C31 and C62 (EOGL), C10 and C33 (EOGR), and C60 and C62 (EOGV) for reading; C21 (ACCZ) and C22 and C33 (EOGH) for watching TV.
Figure 6: The mean accuracy (a) and standard deviation (b) of six-class cognitive activity recognition using four-fold cross validation on randomly partitioned data.
For taking handwritten notes, features related to the accelerometer sequences on the x and y axes (ACCX, ACCY) as well as to the EOGH sequence have very high probabilities and low entropies. This coincides with the fact that during data collection the subject was asked to take notes based on information presented on a whiteboard, which triggered head movement, while the horizontal eye movement results from tracking consecutive lines of text. Investigation of the codewords characteristic of the activity of sorting numbers revealed that some features for the accelerometer sequence on the z axis, ACCZ, have very high probabilities with accompanying low entropies; in this case, instead of using their sight to search for cards with matching numbers lying on the desk, subjects were using head movements. Unfortunately, due to the lack of eye or head movement, no characteristic codeword was extracted for the last activity class of engaging in conversation. Nevertheless, the proposed codebook approach, together with the entropy-based analysis, can be useful for building a knowledge base of cognitive activities.

Figure 7: Plots of characteristic codewords extracted for the activities of taking handwritten notes (C9 and C12 on ACCX, C12 on ACCY, C26 on EOGH) and sorting numbers (C2, C7, C24 and C30 on ACCZ).

5. Conclusions
The experimental results presented in this paper show that the codebook approach can be successfully utilized for the cognitive activity recognition task. Applied to EOG and accelerometer data, the proposed method achieved high accuracy, predicting the proper activity class in 99.3% of cases without using prior knowledge or heuristics. Moreover, the entropy-based investigation of codewords proved to be an easy-to-use analytical tool for gaining knowledge about subsequences characteristic of a particular class of activities. This can be used to build a knowledge base on cognitive activities, since little is known so far about the characteristics of particular cognitive activity classes and they are still under investigation.

Regarding future work, the current implementation uses hard assignment, deterministically assigning each subsequence to the nearest, single codeword.
This lacks flexibility when dealing with uncertainty, i.e., when a subsequence shows similarity to more than one codeword. We plan to solve this issue with soft assignment, which smoothly assigns a subsequence to multiple codewords based on kernel density estimation [42], or with probabilistic assignment based on a Gaussian Mixture Model (GMM) [43, 34, 44]. Probabilistic assignment in particular can solve another problem, namely that the current simple histogram-based feature cannot precisely encode the distribution of codewords. To overcome this, probabilistic assignment represents a sequence with a very high-dimensional vector expressing the parameters of the probabilistic model (i.e., the GMM) used to encode the distribution of codewords. This enables accurate recognition with a simple and fast classifier such as a linear SVM.
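As a rough illustration of this planned direction, a kernel-density-style soft assignment could look like the sketch below; the Gaussian weighting and the bandwidth sigma are our assumptions, following the spirit of [42], not an implementation from this paper.

```python
import numpy as np

def soft_assign(subsequences, codebook, sigma=1.0):
    """Soft codeword histogram: each subsequence contributes to all
    codewords with Gaussian weights instead of only the nearest one."""
    d = np.linalg.norm(subsequences[:, None, :] - codebook[None, :, :],
                       axis=2)
    w = np.exp(-d**2 / (2.0 * sigma**2))      # kernel-density weights
    w /= w.sum(axis=1, keepdims=True)         # per-subsequence sum = 1
    hist = w.sum(axis=0)
    return hist / hist.sum()                  # normalized soft histogram
```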
Acknowledgement
Research and development activities leading to this article have been supported by the German Federal Ministry of Education and Research within the project "Cognitive Village: Adaptively Learning Technical Support System for Elderly" (Grant Number: 16SV7223K).

References
[1] M. Garbarino, M. Lai, D. Bender, R. W. Picard, S. Tognetti, Empatica E3 – a wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition, in: Proceedings of the Fourth International Conference on Wireless Mobile Communication and Healthcare – Transforming Healthcare Through Innovations in Mobile and Wireless Technologies (MOBIHEALTH), 2014, pp. 39–42. doi:10.1109/MOBIHEALTH.2014.7015904.

[2] O. D. Lara, M. A. Labrador, A survey on human activity recognition using wearable sensors, IEEE Communications Surveys & Tutorials 15 (3) (2013) 1192–1209. doi:10.1109/SURV.2012.110112.00192.

[3] A. Bulling, J. A. Ward, H. Gellersen, G. Troster, Eye movement analysis for activity recognition using electrooculography, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (4) (2011) 741–753. doi:10.1109/TPAMI.2010.86.

[4] S. Ishimaru, K. Kunze, K. Tanaka, Y. Uema, K. Kise, M. Inami, Smart eyewear for interaction and activity recognition, in: Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '15, ACM, New York, NY, USA, 2015, pp. 307–310. doi:10.1145/2702613.2725449.

[5] A. Bulling, D. Roggen, G. Troester, What's in the eyes for context-awareness?, IEEE Pervasive Computing 10 (2) (2011) 48–57. doi:10.1109/MPRV.2010.49.

[6] K. Kunze, M. Iwamura, K. Kise, S. Uchida, S. Omachi, Activity recognition for the mind: Toward a cognitive "quantified self", Computer 46 (10) (2013) 105–108. doi:10.1109/MC.2013.339.

[7] JINS MEME: The world's first wearable eyewear that lets you see yourself, https://jins-meme.com/en/, accessed: 2017-03-08.

[8] J. Garcia-Nieto, E. Alba, J. Apolloni, Hybrid DE-SVM approach for feature selection: Application to gene expression datasets, in: Proceedings of the Second International Symposium on Logistics and Industrial Informatics, 2009, pp. 1–6. doi:10.1109/LINDI.2009.5258761.

[9] B. C. Kuo, H. H. Ho, C. H. Li, C. C. Hung, J. S. Taur, A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (1) (2014) 317–326. doi:10.1109/JSTARS.2013.2262926.

[10] C. Persello, L. Bruzzone, Kernel-based domain-invariant feature selection in hyperspectral images for transfer learning, IEEE Transactions on Geoscience and Remote Sensing 54 (5) (2016) 2615–2626. doi:10.1109/TGRS.2015.2503885.

[11] J. W. Xu, K. Suzuki, Max-AUC feature selection in computer-aided detection of polyps in CT colonography, IEEE Journal of Biomedical and Health Informatics 18 (2) (2014) 585–593. doi:10.1109/JBHI.2013.2278023.

[12] T. Gu, L. Wang, Z. Wu, X. Tao, J. Lu, A pattern mining approach to sensor-based human activity recognition, IEEE Transactions on Knowledge and Data Engineering 23 (9) (2011) 1359–1372. doi:10.1109/TKDE.2010.184.

[13] R. W. Picard, E. Vyzas, J. Healey, Toward machine emotional intelligence: analysis of affective physiological state, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (10) (2001) 1175–1191. doi:10.1109/34.954607.

[14] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8) (2013) 1798–1828. doi:10.1109/TPAMI.2013.50.

[15] K. Shirahama, M. Grzegorzek, Emotion recognition based on physiological sensor data using codebook approach, Springer International Publishing, 2016, pp. 27–39. doi:10.1007/978-3-319-39904-1_3.

[16] S. S. Hacisalihzade, L. W. Stark, J. S. Allen, Visual perception and sequences of eye movement fixations: a stochastic modeling approach, IEEE Transactions on Systems, Man, and Cybernetics 22 (3) (1992) 474–481. doi:10.1109/21.155948.

[17] M. Elhelw, M. Nicolaou, A. Chung, G.-Z. Yang, M. S. Atkins, A gaze-based study for investigating the perception of visual realism in simulated scenes, ACM Transactions on Applied Perception 5 (1) (2008) 3:1–3:20. doi:10.1145/1279640.1279643.

[18] L. Dempere-Marco, X.-P. Hu, S. L. S. MacDonald, S. M. Ellis, D. M. Hansell, G.-Z. Yang, The use of visual search for knowledge gathering in image decision support, IEEE Transactions on Medical Imaging 21 (7) (2002) 741–754. doi:10.1109/TMI.2002.801153.

[19] D. D. Salvucci, J. R. Anderson, Automated eye-movement protocol analysis, Human-Computer Interaction 16 (1) (2001) 39–86. doi:10.1207/S15327051HCI1601_2.

[20] S. Ishimaru, K. Kunze, K. Kise, J. Weppner, A. Dengel, P. Lukowicz, A. Bulling, In the blink of an eye: Combining head motion and eye blink frequency for activity recognition with Google Glass, in: Proceedings of the 5th Augmented Human International Conference, AH '14, ACM, New York, NY, USA, 2014, pp. 15:1–15:4. doi:10.1145/2582051.2582066.

[21] K. Kunze, M. Katsutoshi, Y. Uema, M. Inami, How much do you read?: Counting the number of words a user reads using electrooculography, in: Proceedings of the 6th Augmented Human International Conference, AH '15, ACM, New York, NY, USA, 2015, pp. 125–128. doi:10.1145/2735711.2735832.

[22] H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8) (2005) 1226–1238. doi:10.1109/TPAMI.2005.159.

[23] N. Barbara, T. A. Camilleri, Interfacing with a speller using EOG glasses, in: Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016, pp. 1069–1074. doi:10.1109/SMC.2016.7844384.

[24] K. Huda, M. S. Hossain, M. Ahmad, Recognition of reading activity from the saccadic samples of electrooculography data, in: Proceedings of the 2015 International Conference on Electrical & Electronic Engineering (ICEEE), 2015, pp. 73–76. doi:10.1109/CEEE.2015.7428296.

[25] S. D'Souza, S. Natarajan, Recognition of EOG based reading task using AR features, in: International Conference on Circuits, Communication, Control and Computing, 2014, pp. 113–117. doi:10.1109/CIMCA.2014.7057770.

[26] Z. Lv, X. Wu, M. Li, A research on EOG feature parameters extraction based on linear predictive coding model, in: Proceedings of the Third International Conference on Bioinformatics and Biomedical Engineering, 2009, pp. 1–4. doi:10.1109/ICBBE.2009.5162234.

[27] S. V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, 4th Edition, John Wiley and Sons, 2008.

[28] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2011.

[29] Y. G. Jiang, J. Yang, C. W. Ngo, A. G. Hauptmann, Representations of keypoint-based semantic concept detection: A comprehensive study, IEEE Transactions on Multimedia 12 (1) (2010) 42–53. doi:10.1109/TMM.2009.2036235.

[30] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, E. Keogh, Querying and mining of time series data: Experimental comparison of representations and distance measures, Proceedings of the VLDB Endowment 1 (2) (2008) 1542–1552. doi:10.14778/1454159.1454226.

[31] C. A. Ratanamahatana, E. Keogh, Everything you know about dynamic time warping is wrong, in: Third Workshop on Mining Temporal and Sequential Data, 2004.

[32] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, E. Keogh, Searching and mining trillions of time series subsequences under dynamic time warping, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, ACM, New York, NY, USA, 2012, pp. 262–270. doi:10.1145/2339530.2339576.

[33] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998.

[34] K. Shirahama, M. Grzegorzek, Towards large-scale multimedia retrieval enriched by knowledge about human interpretation: Retrospective survey, Multimedia Tools and Applications 75 (2016) 297–331.

[35] H.-T. Lin, C.-J. Lin, R. C. Weng, A note on Platt's probabilistic outputs for support vector machines, Machine Learning 68 (3) (2007) 267–276.

[36] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (3) (2011) 27:1–27:27. doi:10.1145/1961189.1961199.

[37] C. Snoek, M. Worring, A. Smeulders, Early versus late fusion in semantic video analysis, in: Proceedings of the 13th Annual ACM International Conference on Multimedia, MM '05, 2005, pp. 399–402.

[38] T. Washio, E. Suzuki, K. M. Ting, A. Inokuchi (Eds.), Advances in Knowledge Discovery and Data Mining, 12th Pacific-Asia Conference, PAKDD 2008, Osaka, Japan, May 20–23, 2008, Proceedings, Vol. 5012 of Lecture Notes in Computer Science, Springer, 2008. doi:10.1007/978-3-540-68125-0.

[39] S. Kanoh, S. Ichi-nohe, S. Shioya, K. Inoue, R. Kawashima, Development of an eyewear to measure eye and body movements, in: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 2267–2270. doi:10.1109/EMBC.2015.7318844.

[40] E. Nowak, F. Jurie, B. Triggs, Sampling strategies for bag-of-features image classification, in: Proceedings of the 9th European Conference on Computer Vision, ECCV '06, 2006, pp. 490–503.

[41] D. T. Pham, S. S. Dimov, C. D. Nguyen, Selection of k in k-means clustering, Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 219 (1) (2005) 103–119.

[42] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, J. M. Geusebroek, Visual word ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (7) (2010) 1271–1283. doi:10.1109/TPAMI.2009.132.

[43] K. Shirahama, K. Uehara, Kobe University and Muroran Institute of Technology at TRECVID 2012 semantic indexing task, in: Proceedings of the TREC Video Retrieval Evaluation (TRECVID) 2012 Workshop, 2012, pp. 239–247.

[44] K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, in: Proceedings of the British Machine Vision Conference, BMVA Press, 2011, pp. 76.1–76.12. doi:10.5244/C.25.76.
Highlights

• Cognitive activity recognition based on EOG obtained from smart eyewear.
• Codebook approach to extract useful features for accurate activity recognition.
• Statistical analysis to discover subsequences characteristic to activities.
• Very accurate recognition on activities like reading, watching video and drinking.
Dr. Przemyslaw Lagodzinski is an Assistant Professor at the Department of Knowledge Engineering at the University of Economics in Katowice, Poland. He received the Diploma degree in Computer Science from the Silesian University of Technology, Gliwice, Poland, in 2003 and the PhD degree in Computer Science from the same university in 2010. From 2002 to 2015 he worked at the Silesian University of Technology as a Chief IT Specialist in the University Computer Center. He has published several papers in image processing. His current research interests focus on pattern recognition, image processing, machine learning and sensor-based human activity recognition. He is a member of ACM SIGGRAPH.

Dr. Kimiaki Shirahama received his B.E., M.E. and D.E. degrees in Engineering from Kobe University, Japan, in 2003, 2005 and 2011, respectively. After working as an assistant professor at the Muroran Institute of Technology, Japan, he has been working since 2013 as a postdoctoral researcher in the Pattern Recognition Group at the University of Siegen, Germany. From 2013 to 2015 his research was supported by a Postdoctoral Fellowship of the Japan Society for the Promotion of Science (JSPS), and it is now supported within a project of the German Federal Ministry of Education and Research (BMBF). His research interests include multimedia data processing, machine learning, data mining and sensor-based human activity recognition. He is a member of ACM SIGKDD, ACM SIGMM, the Institute of Image Information and Television Engineers in Japan (ITE), the Information Processing Society of Japan (IPSJ) and the Institute of Electronics, Information and Communication Engineers in Japan (IEICE).

Prof. Dr. Marcin Grzegorzek is Head of the Research Group for Pattern Recognition at the University of Siegen and Professor at the Department of Knowledge Engineering at the University of Economics in Katowice. He studied Computer Science at the Silesian University of Technology, did his PhD at the Pattern Recognition Lab at the University of Erlangen-Nuremberg, worked as a Postdoc in the Multimedia and Vision Research Group at the Queen Mary University of London and at the Institute for Web Science and Technologies at the University of Koblenz-Landau, and did his habilitation at the AGH University of Science and Technology in Kraków. He has published more than 100 papers in pattern recognition, image processing, machine learning and multimedia analysis. At present, he runs eight externally funded research projects, for instance the project CogAge (www.cognitive-village.de), which aims at developing a user-friendly support system for the elderly that applies machine learning algorithms for sensor-based health assessment.
None Declared