Pattern Recognition Letters 70 (2016) 52–58
Nonparametric discovery of movement patterns from accelerometer signals✩
Thuong Nguyen b,∗, Sunil Gupta a, Svetha Venkatesh a, Dinh Phung a
a Center for Pattern Recognition and Data Analytics, Deakin University, Waurn Ponds, Victoria 3216, Australia
b School of Computer Science and Information Technology, RMIT University, Melbourne 3000, Australia
Article info
Article history: Received 14 November 2014; Available online 26 November 2015
Keywords: Movement intensity; Activity recognition; Accelerometer; Bayesian nonparametric; Dirichlet process
Abstract
Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor body movement intensity levels from accelerometer data. We collect the data using an accelerometer in a realistic setting without any supervision. The ground truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. Because the model places Bayesian nonparametric priors over its parameters, it can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset, which includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Physical activity directly affects human physical and mental health. The 2007–2009 Canadian Health Measures Survey [1] notes that only 15% of adults meet the recommended physical activity guidelines. The survey identifies this as one of the main reasons for the rising prevalence of diseases such as obesity, diabetes, high blood pressure and cardiovascular disease. Asztalos et al. [2] further show a positive relationship between physical activity and mental health. Long-term activity monitoring can help improve interventions for these diseases. Moreover, it can provide guidelines for changing one's lifestyle to reduce the risk of these diseases.
Recent advances in wearable sensor technology provide the opportunity to measure human physical activity or movement directly instead of inferring it from surveys or human observation. The accelerometer is the most popular sensor for this task due to its small size and low energy consumption [3]. It is also widely used in the growing range of devices and applications that monitor physical activity for health monitoring and fitness assistance. Some examples are
✩ This paper has been recommended for acceptance by J. Laaksonen.
∗ Corresponding author. (Part of this work was done when Thuong was with Deakin University.) Tel.: +61 3 9925 9678.
http://dx.doi.org/10.1016/j.patrec.2015.11.003 0167-8655/© 2015 Elsevier B.V. All rights reserved.
Fitbit¹ devices and various mobile phone applications on iOS and Android for activity tracking. Although these devices and applications help the user keep track of her daily activities, most of them focus only on step counting. However, different intensity levels and their durations may have different effects on people's health [4]. The step count alone might not be sufficient to reflect all aspects of a user's activities. Thus, there is a need to detect the intensity levels and durations of the activities.
In this paper, we propose an approach to detect body movement intensity levels from accelerometer data. By investigating the public USC-HAD dataset [5], which includes labeled sequences of various activities, we find a feature that reflects the intensity of body movement. As this feature is continuous, we need to discretize it to obtain the activity levels. The levels can be modeled as a mixture of normal distributions and inferred from the data by clustering approaches, e.g. the Gaussian mixture model (GMM). These methods, however, require the number of levels to be specified in advance. This information is not always available and may vary among users. They also require the data to be aggregated into a single flat structure. As the data points in an activity sequence might be mutually correlated, this flat structure might not be adequate to model the hierarchical and grouped nature of the data. We address these problems by employing the hierarchical Dirichlet process (HDP) model [6].
¹ http://www.fitbit.com
T. Nguyen et al. / Pattern Recognition Letters 70 (2016) 52–58
The HDP model organizes the data into documents, which are usually bags of words. In our case, a document is a set of accelerometer signals obtained from an activity sequence. The HDP model can discover the clustering structures within the documents and also share the structures among them. Due to the use of Dirichlet process priors on the parameters, the model can automatically discover the number of activity levels from the data. To demonstrate the effectiveness of the discovered activity levels, we use them as features for clustering activity sequences. The activity clusters are consistent with the labels provided in the dataset.
Although it provides a good benchmark for analyzing human activity, the USC-HAD dataset was collected in a controlled setting under the observation of a researcher. We therefore collect a new dataset in our lab using the Sociometric badge [7]. We provide a badge to each participant to wear during working hours for three weeks. Motivated by the experience sampling framework [8], we use the Magpi application² running on mobile phones to collect the ground truth in as natural a way as possible. The participants answer a questionnaire about their latest activities through the application when they can spare some time. They can choose multiple labels for an activity sequence, which is a common scenario. We then obtain the activity sequence preceding the answer time and assign the chosen labels to it. We repeat the experiments using the HDP model on this data and obtain activity levels that are consistent with the labels provided by the users. We then use the mixing proportions over the activity levels of the sequences as features for multi-label classification. Classification with these features outperforms classification with fast Fourier transform (FFT) features. The proposed HDP model can handle not only univariate but also multivariate data.
We further use it to discover the activity patterns in a multivariate setting on two features: the mean and standard deviation of the signal magnitude. Using this setting, we can discover an extra type of activity compared to the univariate setting.
Our main contributions in this paper are:
(1) A new collection method for human activity data in a natural setting. The ground truth is collected using experience sampling on mobile phones.
(2) An extraction of physical activity levels using the HDP model. The number of levels is inferred automatically.
(3) A demonstration of the effectiveness of the extracted patterns for clustering and multi-label classification toward improved performance.
(4) An extraction of the activity patterns using the HDP model in a multivariate setting on two features: the mean and standard deviation of the signal magnitude.
The rest of this paper is organized as follows. Section 2 reviews the related work in activity data collection and recognition methods. Section 3 introduces the datasets used in this paper, including the public USC-HAD dataset and the Sociometric dataset that we collected in our research lab. Section 4 presents the HDP model for activity level detection. Section 5 reports our experimental results. Finally, Section 6 concludes the paper.
2. Related work
This section reviews the related work on accelerometer-based activity recognition. We focus on two aspects: the data acquisition and the recognition methods.
2.1. Data collection
The data collection setting affects the performance of activity recognition in many respects. The two main factors are the sensor setting, including the number and placement of sensors, and the label collection setting.
² https://www.magpi.com/
2.1.1. Multiple sensors vs single sensor
Early work in activity recognition uses multiple sensors to enhance the recognition. Bao and Intille [9] use five bi-axial accelerometers placed on different parts of the user's body (thigh, ankle, arm, wrist and hip). Olguín and Pentland [10] use three accelerometers worn at the right wrist, left hip, and chest. They examine the classification using four different combinations of these positions. Parkka et al. [11] use two accelerometers placed at the chest and wrist. Huynh et al. [12] use two accelerometers placed in the right hip pocket and on the right wrist. Atallah et al. [13] use six accelerometers worn at the chest, upper arm, wrist, hip, thigh, ankle and ear. Despite the high classification performance, such complicated systems are obtrusive for users performing real-life activities. Thus many recent studies have shifted to a single-sensor setting. Ravi et al. [14] use an accelerometer worn near the pelvic region. Karantonis et al. [15] use an accelerometer worn at the waist to classify activities and detect falls. A recent work [16] systematically examines the accelerometer setting for activity recognition, including the number of sensors and their placement. The work states that the recognition performance does not improve with more than two accelerometers, and that the hip and chest are the two best places for activity recognition using a single accelerometer. In this paper, we do not focus on fine-grained activity recognition, thus we use an accelerometer integrated in the Sociometric badge worn at one's chest. Wearing the badge is similar to wearing a name tag, thus it is less obtrusive than other positions. Our early work along this line of research has been reported in [28].
2.1.2. Label collection
The label collection method depends on the data collection setting. In a laboratory setting, the subjects are required to perform a particular activity during a particular time interval. The activity label can be easily recorded under the observation of a researcher. However, it is difficult to obtain the labels when the data is collected outside the laboratory without the supervision of researchers. In this setting, the labels are usually provided by the users themselves. For example, in [9], the users perform each of 20 different activities and annotate the activity along with its start and end time stamps. Recent studies utilize smart phones to collect labels. A typical approach is to use an application that lets users select the activity they perform and click a start and an end button before and after the activity, respectively [17]. However, this approach introduces a bias, as the users know the activity while they are performing it. In this paper, we collect the data in a totally natural way, as described in Section 3.2.
2.2. Recognition methods
A wide range of machine learning methods has been used for activity recognition. The most popular ones are supervised learning algorithms. Examples include decision trees [9,15], k-nearest neighbor (kNN) [9,14], naive Bayes [9,12,14], hidden Markov models (HMM) [10,12], and support vector machines (SVM) [12,14]. Cleland et al. [16] recently compare these algorithms and conclude that SVM is the best algorithm for this task. However, these methods are suitable for single-label data only. As each sequence in our data may have multiple labels, we use a multi-label classification algorithm. The features used for activity recognition range from the time domain to the frequency domain. Huynh et al. [12] use a topic model to extract the features for the task. Our approach can be seen as a nonparametric version of the topic model in [12]. A recent work [18] uses the HDP model for activity and routine recognition, but the features are extracted using the Dirichlet process mixture model (DPM).
[Fig. 1 plot. y-axis: Standard deviation (0 to 2); x-axis: Time (seconds), 0 to 250; segments numbered 1 to 12 mark the activities.]
Fig. 1. An example sequence containing multiple activities extracted from the USC-HAD dataset. The red line is the standard deviation of the signal magnitude obtained from the accelerometer readings. The activities are: 1. Sitting, 2. Standing, 3. Sleeping, 4. and 5. Elevator up and down, 6. Walking forward, 7. and 8. Walking left and right, 9. and 10. Upstairs and downstairs, 11. Running, 12. Jumping up. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
3. Data collection and feature extraction
3.1. USC-HAD dataset
The USC human activity (USC-HAD) dataset [5] was collected using the MotionNode sensing platform. Each MotionNode includes a triaxial accelerometer, a triaxial gyroscope, and a triaxial magnetometer. The sampling rate $f_s$ is 100 Hz. There are 14 participants in the collection. They perform 12 basic activities, each repeated 5 times, giving 840 recorded sequences in total. The activity labels are recorded by a nearby observer. In this paper, we only use the accelerometer signals for activity analysis. From the accelerometer readings, i.e. $a_{xi}$, $a_{yi}$, $a_{zi}$, we compute the signal magnitude as:
\[ E_i = \sqrt{a_{xi}^2 + a_{yi}^2 + a_{zi}^2} \tag{1} \]
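As an illustration of Eq. (1) and of the per-second standard deviation that this section uses as the intensity feature, the following is a minimal sketch rather than the authors' implementation. The function names, the use of non-overlapping windows, the population (rather than sample) standard deviation, and the dropping of a trailing partial window are all assumptions made for the example.

```python
import math
import statistics


def signal_magnitude(ax, ay, az):
    """Eq. (1): Euclidean norm of one triaxial accelerometer reading."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def windowed_std(samples, fs, window_s=1.0):
    """Standard deviation of the signal magnitude per window.

    samples:  list of (ax, ay, az) tuples
    fs:       sampling rate in Hz (100 for USC-HAD)
    window_s: window length in seconds (1 s in the text)

    Assumptions of this sketch: windows do not overlap, the population
    standard deviation is used, and a trailing partial window is dropped.
    """
    n = int(round(fs * window_s))
    mags = [signal_magnitude(*s) for s in samples]
    return [statistics.pstdev(mags[i:i + n])
            for i in range(0, len(mags) - n + 1, n)]


# Synthetic check: one still second followed by one "moving" second.
still = [(0.0, 0.0, 1.0)] * 100
moving = [(0.0, 0.0, 1.5 if i % 2 else 0.5) for i in range(100)]
features = windowed_std(still + moving, fs=100)
```

On this synthetic input the first window has zero deviation and the second has a clearly larger one, mirroring the sedentary versus active contrast described for Fig. 1.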
For every 1 s interval, we compute the standard deviation of the signal magnitude. Fig. 1 shows an example of a sequence that includes multiple activities. In this example, the standard deviation is closely related to the activity type: it is close to 0 when the subject performs sedentary activities (1–5), around 0.3–0.5 for the walking activities (6–10), and more than 1 for the running and jumping activities (11–12). Therefore, we use this feature to detect the movement intensity levels of the activities. The method to learn the levels is discussed in Section 4.
3.2. Sociometric dataset: a natural activity data collection
Although it provides a good benchmark for activity analysis, the USC-HAD dataset was collected in an unnatural setting. To confirm the results in a more natural setting, we collect a new dataset using Sociometric badges, with the ground truth collected using the experience sampling method.
3.2.1. The Sociometric badge
We collect a new dataset using the Sociometric badge [7] from 13 participants in our research lab during working hours for three weeks. The badge integrates an ADXL-330 triaxial accelerometer with a sampling frequency $f_s$ = 250 Hz. This sampling frequency is recommended by the badge manufacturer [19]. The signals are recorded in three axes and normalized using the absolute value of gravity |g| and the zero gravity point $g_0$ [19]. The signals in the USC-HAD dataset are not normalized, thus the value range might differ between the two datasets. The features are extracted from the signal magnitude computed using Eq. (1). For every 1 s interval, the badge generates a feature named consistency. The consistency values reflect the stability of the energy values. They range from 0 to 1, where 1 indicates no change in energy and 0 indicates the maximum variation in energy. Examples of consistency sequences are presented in Fig. 5.
We observe that the consistency feature has a similar correlation with the intensity of body movement as the standard deviation computed in the USC-HAD dataset. However, it runs in the opposite direction: its highest value reflects the most sedentary activities, while in the USC-HAD dataset the sedentary activities have standard deviation values close to 0. Although we would like to compute the standard deviation feature for the Sociometric data, this is not possible, as the Sociometric badge only provides its derived features without the original signal readings.
3.2.2. Experience sampling for ground-truth collection
The collection of ground truth by an observer, as in the USC-HAD dataset, has various drawbacks. First, the participant knows the activity to perform in advance, so she might be biased while performing it and the collected data might not be natural. In addition, it is difficult to obtain an exact segmentation of the activities in real situations. Therefore, we let the participants wear the badges during their daily activities. We record the data continuously for the whole working day and collect the ground truth of activities at certain points. Inspired by Larson and Csikszentmihalyi [8], we collect the ground-truth information using experience sampling, a popular method in social studies. In experience sampling, the participants are required to stop at certain times and answer a questionnaire about their experience. The goal is to record some of the ongoing attributes related to the subjects. A key point of this method is that the subjects should not know the sampling time in advance, so that they cannot anticipate the event and simply act naturally. Instead of taking notes on paper as in the original study, we use the Magpi application running on mobile phones. The subjects only need to open the application, choose the options in the questions, and submit the answer.
The activity data collected in this manner is more natural than the USC-HAD dataset. The survey question is "Which activity did you perform most in the last 10 minutes?" There are 6 options: sitting, standing, walking, playing sports, running, and not sure. The participants answer whenever they are willing to. As we fix the sampling duration to the 10 min preceding the answering time, we allow the users to choose more than one option. This reflects the general scenario in which the user performs multiple activities during a 10 min interval. For example, she might sit for 3 min, walk for 2 min and sit for the remaining 5 min. This leads to a multi-label activity dataset, i.e. an activity sequence might have multiple labels. Thus the ground truth is not as clean as that collected in a laboratory setting such as the USC-HAD dataset. During the 3 weeks, we receive 360 answers from all users. Of those, only 201 answers are valid; the others are discarded because of missing or corrupted data. The statistics of these answers are presented in Table 1. The options running and not sure are never chosen, hence we eliminate them from the analysis. We also count the co-occurrence of the activities in the collected sequences. Of these 201 sequences, 163 have a single label and 38 have two labels. No sequence has more than two labels.

Table 1
Statistics of answers for activity ground-truth collection.
Activity          Count   Percentage (of 201 sequences)
Sitting           158     78.60
Standing          19      9.45
Walking           54      26.87
Playing sports    8       3.98
Running           0       0.00
Not sure          0       0.00

Table 2
Statistics on activity co-occurrence in the recorded sequences. The diagonal entries are the numbers of single-label sequences.
            Sitting   Standing   Walking   Sports
Sitting     125       6          26        1
Standing    6         8          5         0
Walking     26        5          23        0
Sports      1         0          0         7

The statistics of the co-occurrence of the labels are
presented in Table 2. It can be seen from this table that sitting is the dominant activity. This is reasonable as the participants tend to provide the answer when they are free. Among the mixed sequences, sitting and walking co-occur most often. Although these statistics present an overview of the collected dataset, there is an important characteristic that they cannot show. Even though the ground truth is multi-label, some labels might still be missing, as we rely entirely on the users and they might not select all the activities that they performed during the sampling interval.

4. Framework
We provide a brief description of the Dirichlet process and the adaptation of the hierarchical Dirichlet process (HDP) model to infer the activity levels from accelerometer data.

4.1. Dirichlet process
The Dirichlet process (DP) has been used as a prior distribution over the parameters of mixture models [20]. Its generative process can be described by the Chinese restaurant metaphor [21], where each data point is a customer and each mixture component is a table. Assigning a data point to a component is equivalent to a customer choosing a table. The restaurant is assumed to have an infinite number of tables, and each table has an infinite number of seats. The first customer chooses an arbitrary table. The ith customer chooses an existing table with probability proportional to the number of customers sitting there, or a new table with probability proportional to a parameter α. This parameter controls how quickly the number of tables grows. Under this metaphor, the number of components can grow without bound if needed. Alternatively, a DP can be realized through a stick-breaking construction [22]:

\[ G = \sum_{k=1}^{\infty} \beta_k \delta_{\theta_k} \tag{2} \]

where $\theta_k \overset{iid}{\sim} H$, $k = 1, \ldots, \infty$, and $\beta = (\beta_k)_{k=1}^{\infty}$ are the weights constructed through the stick-breaking process $\beta_k = v_k \prod_{s<k} (1 - v_s)$ with $v_k \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$.

In a DP mixture (DPM) model, the conditional distribution of the component indicator $z_i$ of data point $x_i$ is:

\[ p(z_i = k \mid \cdot) \propto \begin{cases} n_k^{-i} \, f_k^{-i}(x_i \mid \cdot), & \text{existing } k \\ \alpha \, f_k^{-i}(x_i \mid \cdot), & \text{new } k \end{cases} \tag{3} \]

where $n_k^{-i}$ is the number of data points assigned to component k excluding $x_i$, and $f_k^{-i}(x_i \mid \cdot)$ is the predictive likelihood of observing $x_i$ under component k. Gibbs sampling is a typical approach for inference in the DPM [23]. By treating each data point in turn as the last point and sampling from the posterior in Eq. (3), the model can infer the number of mixture components automatically, as it can introduce new components if needed and remove unused ones.

4.2. Hierarchical Dirichlet process
The DPM model is typically used to model a single data set. However, sensor data is typically organized in multiple sets. In our activity data, the data points from an activity sequence can be seen as a set, and they have some mutual correlation inside the set. Data from multiple sets may also share some correlation. The hierarchical Dirichlet process (HDP) [6] is particularly suitable for this type of data. In HDP, each set of data points is modeled using a DPM, and these models are linked by sharing hyperparameters. The purpose is to exploit the shared statistics between the sets through a common set of mixture components. Each set has its own mixing proportions over the components. The dependency between the mixing proportions of the sets is specified by another DP.

Let us assume that there are J sequences indexed by $j = 1, \ldots, J$ and that the jth sequence has $N_j$ data points. Let $x_{ji}$ be the ith data point of the jth sequence. We model the distribution of activity data as a mixture of normal distributions. The mixture distribution $G_j$ of each sequence is connected with the other mixture distributions via a DP sharing the same base measure, $G_j \sim \mathrm{DP}(\alpha, G_0)$ for $j = 1, \ldots, J$, where $G_0$ is the shared base measure drawn from another DP, $G_0 \sim \mathrm{DP}(\gamma, H)$. It follows that the $G_j$'s and $G_0$ share the same support. Following the stick-breaking process in Eq. (2), the $G_j$'s and $G_0$ can be written as:

\[ G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\theta_k}, \qquad G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\theta_k} \tag{4} \]

respectively, where $\pi_j$ is the mixing proportion of the jth sequence. Using the stick-breaking process in [6], the generative process of the HDP model becomes:

\[ \begin{aligned} \theta_k &\sim H, \quad k = 1, \ldots, \infty \\ z_{ji} &\sim \pi_j \\ x_{ji} \mid z_{ji} &\sim f(x_{ji} \mid \theta_{z_{ji}}) \end{aligned} \tag{5} \]

where the likelihood function f and the base measure H are chosen depending on the data format. For example, for univariate data f is a univariate normal distribution and H is a normal-gamma distribution. For multivariate data, f is a multivariate normal distribution and H is a normal-Wishart distribution. We demonstrate these two settings in Sections 5.1 and 5.2, respectively. The generative process of the HDP model is presented in Fig. 2.

Fig. 2. A graphical representation of the HDP mixture model.

The inference of this model is usually performed using MCMC methods. Specifically, the sampling scheme integrates out the parameters $\pi_{1:J}$ and $\theta_{1:K}$ and iteratively samples the $z_{ji}$'s and $\beta_k$'s using the following conditional distributions [6].
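To make the generative process of Eqs. (2), (4) and (5) concrete, the following is a minimal sketch under stated assumptions, not the model code used in the paper. It truncates the infinite mixture at a fixed number of components, draws each sequence's proportions from a finite Dirichlet with parameters $\alpha\beta$ (the usual finite-dimensional counterpart of the second-level DP in [6]), and uses a hypothetical stand-in for the base measure H (uniform means with a fixed standard deviation). All function and parameter names are illustrative.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights as in Eq. (2):
    beta_k = v_k * prod_{s<k} (1 - v_s), with v_k ~ Beta(1, concentration).
    The leftover stick is folded into the last weight so the result sums to 1."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_hdp_data(num_seqs, seq_len, gamma=1.0, alpha=5.0,
                      truncation=10, seed=0):
    """Sample sequences from a truncated HDP mixture of normals."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)        # weights of G_0
    # theta_k ~ H. Assumption: H is replaced by a toy prior that draws a
    # mean uniformly from [0, 2] and fixes the standard deviation at 0.05.
    theta = [(rng.uniform(0.0, 2.0), 0.05) for _ in range(truncation)]
    proportions, data = [], []
    for _ in range(num_seqs):
        pi_j = dirichlet([alpha * b for b in beta], rng)  # weights of G_j
        z = rng.choices(range(truncation), weights=pi_j, k=seq_len)  # z_ji ~ pi_j
        x = [rng.gauss(*theta[k]) for k in z]             # x_ji ~ f(. | theta_z)
        proportions.append(pi_j)
        data.append(x)
    return beta, theta, proportions, data


beta, theta, proportions, data = generate_hdp_data(num_seqs=3, seq_len=50, seed=1)
```

Every sequence draws from the same component parameters `theta` but with its own proportions, which is the sharing structure that Eq. (4) expresses.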
Sampling $z_{ji}$ includes two parts:

\[ p(z_{ji} = k \mid \cdot) \propto \begin{cases} (\alpha\beta_k + n_k^{-ji}) \, f_k^{-i}(x_{ji}), & \text{old } k \\ (\alpha\beta_{new}) \, f_k^{-i}(x_{ji}), & \text{new } k \end{cases} \tag{6} \]

where $n_k^{-ji}$ is the number of data points belonging to component k excluding $x_{ji}$, $f_k^{-i}(x_{ji} \mid \cdot) = \int_{\theta_k} f(x_{ji} \mid \theta_k, x^{-ji}, z^{-ji}) \, h(\theta_k) \, d(\theta_k)$, and $h(\cdot)$ is the density function of H. When h and f are conjugate, the integral can be evaluated analytically, resulting in a 'collapsed' Gibbs sampling scheme. Sampling $\beta_{1:K}$ and $\beta_{new}$ directly is intractable. However, they can be sampled through auxiliary variables $m_{jk}$ as:

\[ q(\beta_{1:K}, \beta_{new} \mid z, m) \propto \beta_{new}^{\gamma - 1} \prod_{k=1}^{K} \beta_k^{\sum_j m_{jk} - 1} \tag{7} \]

where the $m_{jk}$'s are sampled as:

\[ q(m_{jk} = m \mid z, m^{-jk}, \beta) \propto s(n_{jk}, m) \, (\alpha\beta_k)^m \tag{8} \]

After finishing the Gibbs sampling steps, the parameters $\pi_{1:J}$ and $\theta_{1:K}$ can be estimated using the last Gibbs sample:

\[ \hat{\pi}_{jk} = \frac{\sum_{i=1}^{N_j} \mathbb{1}(z_{ji} = k) + \alpha}{N_j + \alpha K} \tag{9} \]

\[ \hat{\theta}_k = \mathrm{MAP}\Big[ h(\theta_k) \prod_{j,i \,:\, z_{ji} = k} f(x_{ji} \mid \theta_k) \Big] \tag{10} \]

A step-by-step summary of the collapsed Gibbs sampling algorithm for the HDP model is provided in Algorithm 1. In this algorithm, new mixture components can be introduced at step 8, and an existing component can be removed once it is no longer used (step 9). Therefore, the algorithm can infer the number of mixture components automatically. The parameter $\pi_{jk}$ in Eq. (9) represents the proportion that the kth component contributes to the jth activity sequence. Each $\pi_j$ is a K-dimensional vector, where K is automatically estimated from the data. To further demonstrate the effectiveness of our model, we use this vector as the feature vector for classifying or clustering the sequences.

Algorithm 1. Collapsed Gibbs sampling for HDP.
 1: Input: x = [x_ji], initK.
 2: Initialize z to a random integer matrix with values between 1 and initK.
 3: while (not converged) do
 4:     Sample m_{1:K} using Eq. (8).
 5:     Sample β_{1:K}, β_new using Eq. (7).
 6:     for j = 1 to J do
 7:         for i = 1 to N_j do
 8:             Sample z_ji using Eq. (6).
 9:     Remove inactive components.
10:     Set K = number of active components.
11: Estimate π̂_{1:J} using Eq. (9).
12: Estimate θ̂_{1:K} using Eq. (10).
13: Output: z, π̂_{1:J}, θ̂_{1:K}.

5. Experiments
We demonstrate the HDP model using two different settings: a univariate setting to detect the movement intensity from a single feature, and a multivariate setting on two features.

5.1. Univariate setting for movement intensity detection
In this setting, we infer the activity levels from a single feature (standard deviation for the USC-HAD dataset and consistency for the Sociometric dataset). The data includes multiple sequences. We treat each data point as a random number drawn from a univariate normal distribution with unknown mean (μ) and unknown variance (σ²):

\[ p(x \mid \mu, \sigma^2) = \left( \frac{\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\tau}{2} (x - \mu)^2 \right\} \tag{11} \]

where $\tau = 1/\sigma^2$ is the precision. This is the likelihood function f presented in Eq. (5). Under the Bayesian setting, we further place a normal-gamma distribution over the parameters μ and τ in the place of the base measure H in Eq. (5):

\[ \tau \mid \alpha_0, \beta_0 \sim \mathrm{Gamma}(\tau \mid \alpha_0, \beta_0) \tag{12} \]

\[ \mu \mid \mu_0, s_0, \tau \sim \mathcal{N}(\mu \mid \mu_0, (s_0 \tau)^{-1}) \tag{13} \]

where $\mu_0$, $\alpha_0$, $\beta_0$ and $s_0$ are the hyperparameters of the normal-gamma prior distribution.

Fig. 3. Visualization of standard deviation and the discovered activity levels. Each activity level is presented as a normal distribution.
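Because the normal likelihood of Eq. (11) and the normal-gamma prior of Eqs. (12) and (13) are conjugate, the predictive term $f_k^{-i}$ in Eq. (6) has a closed form: a Student-t density. The sketch below illustrates this with the standard normal-gamma update formulas, assuming that $\beta_0$ is a rate parameter (the text does not say whether it is a rate or a scale). It is not the authors' code; `assignment_probs` takes the counts $n_k^{-ji}$ as a separate argument, and all names are illustrative.

```python
import math


def posterior_params(xs, mu0, s0, a0, b0):
    """Normal-gamma posterior hyperparameters after observing xs.
    Assumption: b0 is the rate parameter of the Gamma prior on tau."""
    n = len(xs)
    if n == 0:
        return mu0, s0, a0, b0
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    s_n = s0 + n
    mu_n = (s0 * mu0 + n * mean) / s_n
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * ss + s0 * n * (mean - mu0) ** 2 / (2.0 * s_n)
    return mu_n, s_n, a_n, b_n


def predictive_logpdf(x, xs, mu0, s0, a0, b0):
    """Log of the Student-t posterior predictive density of x given the
    points xs already assigned to a component (parameters integrated out)."""
    mu_n, s_n, a_n, b_n = posterior_params(xs, mu0, s0, a0, b0)
    nu = 2.0 * a_n                                  # degrees of freedom
    scale2 = b_n * (s_n + 1.0) / (a_n * s_n)        # squared scale
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1.0) / 2.0 * math.log1p((x - mu_n) ** 2 / (nu * scale2)))


def assignment_probs(x, counts, components, beta, beta_new, alpha, prior):
    """Normalized form of Eq. (6) for one data point x.

    counts:     n_k^{-ji} for each existing component
    components: points currently assigned to each component (x excluded)
    beta:       global weights of the existing components
    The last entry of the result is the probability of a new component.
    """
    logw = [math.log(alpha * b + n) + predictive_logpdf(x, pts, *prior)
            for n, pts, b in zip(counts, components, beta)]
    logw.append(math.log(alpha * beta_new) + predictive_logpdf(x, [], *prior))
    top = max(logw)                                 # log-sum-exp for stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


prior = (0.0, 1.0, 2.0, 1.0)                        # (mu0, s0, alpha0, beta0)
components = [[0.0, 0.1, -0.1], [1.0, 1.1, 0.9]]
probs = assignment_probs(1.05, [3, 3], components, [0.4, 0.4], 0.2, 1.0, prior)
```

Working in log space before normalizing avoids underflow when a point lies far from every component, which matters when the same routine is called once per data point in every Gibbs sweep.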
5.1.1. Univariate setting on USC-HAD data
This dataset includes 840 sequences. We run Gibbs sampling for HDP for 110 iterations, including a burn-in period of 10 iterations. We report the results using the last Gibbs sample, which includes three mixture components. Each component is a normal distribution, as visualized in Fig. 3. In this figure, the signals of sedentary activities (e.g. sitting, standing, lying) belong to the first component, the signals of walking belong to the second component, and the signals of running and jumping belong to the third component. Therefore, these components can be seen as the activity intensity levels.
Using the mixing proportions $\pi_j$ as features, we can further cluster the activity sequences into groups. As the $\pi_j$'s are multinomial distributions, we use the Jensen–Shannon (JS) divergence as the distance between sequences, where the JS divergence is defined as:

\[ \mathrm{JS}(p, q) = \frac{1}{2} \left[ \mathrm{KL}(p \,\|\, m) + \mathrm{KL}(q \,\|\, m) \right] \tag{14} \]

where $m = \frac{1}{2}(p + q)$ and $\mathrm{KL}(p \,\|\, m) = \sum_i p_i \log \frac{p_i}{m_i}$ is the Kullback–Leibler divergence. The similarity between two sequences i and j is defined as:

\[ \mathrm{Similarity}(i, j) = e^{-\mathrm{JS}(\pi_i, \pi_j)} \tag{15} \]
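Eqs. (14) and (15) translate directly into code. The following is a minimal sketch with illustrative names, assuming natural logarithms and the usual convention that terms with $p_i = 0$ contribute nothing to the KL divergence.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Eq. (14), with m = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))


def similarity_matrix(proportions):
    """Eq. (15): S[i][j] = exp(-JS(pi_i, pi_j)) for every pair of sequences."""
    return [[math.exp(-js_divergence(a, b)) for b in proportions]
            for a in proportions]


# Three toy mixing-proportion vectors over three activity levels.
toy = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.3, 0.7]]
S = similarity_matrix(toy)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be handed to any clustering routine that accepts a precomputed similarity matrix; scikit-learn's `AffinityPropagation(affinity="precomputed")` is one such option, although the text does not say which implementation was used.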
where π i and π j are the mixing proportions of the sequences i and j, respectively. We use the affinity propagation (AP) algorithm [24] to cluster the sequences from the similarity matrix computed by Eq. (15). AP is also a nonparametric algorithm as it can discover the number of clusters automatically. From our data, the AP algorithm clusters the sequences
T. Nguyen et al. / Pattern Recognition Letters 70 (2016) 52–58
57
Table 3 Comparison of FFT features vs proportion features for multi-label classification on Sociometric dataset. Bold metrics indicate better performance, which is smaller for Hamming loss and larger for the others.
Fig. 4. Confusion matrix between the ground-truth activities and the clusters discovered by AP algorithm. The number on x-axis indicates the ground-truth activity: 1–5 are the walking activities; 6 is running; 7 is jumping; 8–12 are the sedentary activities (e.g. sitting, standing, lying). The value of a cell is the number of sequences belong to the activity assigned to the corresponding cluster.
into 5 groups. We visualize the confusion matrix between the activities and the clusters in Fig. 4. In this matrix, there are three main clusters of sequences: walking sequences in the first group; sedentary activities in the second group; running and jumping sequences in the third group. There are only some sequences that are clustered into separate groups. This result further confirms the strong correlation between the movement intensity and the activity patterns discovered by HDP model. 5.1.2. Univariate setting on Sociometric data As discussed in Section 3, we observe that consistency feature of the Sociometric data is equivalent to the standard deviation of magnitude of the USC-HAD dataset. Thus, we use the same univariate setting to detect the activity levels from the consistency feature. There are 201 activity sequences in this dataset. We run Gibbs sampling for HDP using 250 iterations including a burn-in period of 50 iterations. The algorithm discovers 4 components, each is a normal distribution. An illustration of the normal distributions learned by HDP and some activity sequences is presented in Fig. 5. As the consistency values are opposite to the standard deviation values used in Section 5.1.1, the consistency values are closer to 1 if the activity is more sedentary. The normal distributions in the left side of Fig. 5 reflect the body movement intensity. This correlation is illustrated in the examples of three activity sequences in the same figure. The consistency values of the first sequences are mostly close to 1 and all of them belong to the first component. Given that its label is sitting, we can consider the first component as the most sedentary level. The consistency values of the second sequence belong to three components from 1 to 3. Its label is walking. However, as the label might be missing, the user might have sat and walked but she only marked walking as the label. 
In the third sequence, with the label of playing sports, the consistency values mostly belong to the third and fourth components. Given that the participants were playing table tennis, which involves movement at multiple intensity levels, the mixture of two components in this sequence is reasonable. Thus these components represent high-intensity activities.

The mixing proportions (π_j's) can be used as features for further analysis of the data. As the ground-truth information obtained in our project consists of multiple labels, we employ the Mulan library [25] for multi-label classification. We obtain the baseline by running the Mulan classifier [25] using the FFT features derived from the consistency sequences. We evaluate the multi-label classification using the standard metrics (Hamming loss, accuracy, precision, recall) [26]. The comparison of our mixing proportion features with FFT features for the multi-label classification is presented in Table 3. This table shows that our features outperform FFT features in all metrics.

Fig. 5. Physical activity levels extracted by HDP (left) and the consistency values of three sequences. The labels of these sequences are sitting, walking, and playing sports, respectively. The inferred component indicators (z_ji's) are presented in the bottom part.

Metrics         FFT features       Proportion features
Hamming loss    0.1845 ± 0.0209    0.1357 ± 0.0188
Accuracy        0.6503 ± 0.0373    0.7464 ± 0.0328
Precision       0.7070 ± 0.0395    0.8166 ± 0.0359
Recall          0.7066 ± 0.0407    0.7755 ± 0.0360

5.2. Multivariate setting for activity clustering

We further use the HDP model to discover patterns from activity data in a multivariate setting. In particular, we use two features computed from accelerometer signals: the mean and standard deviation of signal magnitude. This bivariate case is a special case of the multivariate setting. We model each data point as a random draw from a multivariate normal distribution with unknown mean (μ) and unknown covariance (Σ):
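$$p(x \mid \mu, \Sigma) = \frac{\sqrt{|\Lambda|}}{2\pi} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Lambda(x-\mu)\right) \tag{16}$$
where Λ = Σ^{-1} is the precision matrix. This is the likelihood function f presented in Eq. (5). Using a Bayesian setting, we further impose a normal-Wishart distribution over (μ, Λ):
$$\Lambda \mid W, \nu \sim \mathcal{W}(\Lambda \mid W, \nu) \tag{17}$$
$$\mu \mid \mu_0, \lambda, \Lambda \sim \mathcal{N}\left(\mu \mid \mu_0, (\lambda\Lambda)^{-1}\right) \tag{18}$$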
where W, ν, μ0 and λ are the hyperparameters of the normal-Wishart prior distribution, which is the base distribution H in Eq. (5). We run the HDP for each user separately because the data scales differ among users. To avoid the effect of the feature scale on the covariance, we normalize both features by rescaling them to a unit range (subtracting the minimum from each value and dividing by max − min). For each user, we run Gibbs sampling for the HDP using 110 iterations, including a burn-in period of 10 iterations. We report the results using the last Gibbs sample from each user. Seven users have 4 activity atoms and the remaining 7 users have 5 activity atoms. We visualize the atoms of 2 users in Fig. 6. The activity atoms of the 7 users that have 4 atoms follow a typical pattern (Fig. 6(a)): the sedentary data points form the smallest atom; the walking data points form the second atom; the running and jumping data points form two separate atoms. An example of the users that have 5 atoms is presented in Fig. 6(b). For these users, two atoms represent the walking data points. We further evaluate the activity atoms by using the mixing proportion π_j (cf. Fig. 2) as the features for clustering the activity sequences. As in Section 5.1.1, we use the JS divergence as the distance
metric and the similarity of 2 sequences i and j is e^{−JS(π_i, π_j)}. We employ the AP algorithm for clustering the sequences. To evaluate the clustering performance, we divide the ground-truth labels into 4 groups: sedentary (including sitting, standing, lying, using elevator), walking, running and jumping. We then use these groups as the ground-truth to compute the standard clustering metrics: F-measure, normalized mutual information (NMI), Rand-index (RI) and purity [27, pp. 356–360]. The performance metrics of the 14 users are reported in Table 4. In this table, 4 out of 14 subjects have perfect clustering results (all metrics are equal to 1). Most of the remaining subjects have high performance, and only 3 out of 14 subjects have an F-measure less than 0.9. Overall, the mean values of the metrics show a good clustering, with all metrics greater than 0.9. This result shows that, using a bivariate setting over the mean and standard deviation of signal magnitude, the HDP model can distinguish between the running and jumping sequences, whereas they fall in the same group in the univariate setting of Section 5.1.1.

Fig. 6. Bivariate activity atoms of 2 users. Each atom is a 2D normal distribution plotted as an ellipse using its mean and covariance. In (a), user 1 has 4 atoms; in (b), user 10 has 5 atoms. (Panel titles: "Activity atoms of user 1" and "Activity atoms of user 10"; x-axis: signal magnitude; y-axis: standard deviation; both axes range from 0 to 1.)

Table 4
Clustering performance of USC-HAD data using the mixing proportions obtained from the bivariate setting HDP.

User    F-measure    NMI      RI       Purity
1       1.000        1.000    1.000    1.000
2       0.973        0.933    0.981    0.983
3       1.000        1.000    1.000    1.000
4       0.945        0.881    0.962    0.950
5       1.000        1.000    1.000    1.000
6       0.968        0.911    0.978    0.967
7       0.962        0.893    0.973    0.933
8       0.857        0.841    0.911    0.967
9       0.927        0.926    0.953    1.000
10      0.893        0.864    0.929    0.917
11      0.993        0.962    0.995    0.983
12      1.000        1.000    1.000    1.000
13      0.912        0.912    0.944    1.000
14      0.821        0.809    0.891    0.967
Mean    0.946        0.924    0.966    0.976

6. Conclusion

We have presented a method to infer physical activity levels from accelerometer data. We learned the activity levels using the hierarchical Dirichlet process (HDP) model, which can infer the number of levels automatically. Running the model on the USC-HAD dataset, which includes labeled activity sequences, we demonstrated the strong correlation between the extracted activity levels and the movement intensity of the activities. This correlation was further confirmed on a natural dataset collected in our lab. We then demonstrated the effectiveness of the extracted patterns for clustering and classifying the activity sequences toward improved performance. The HDP model was also illustrated in a multivariate setting using two features: the mean and standard deviation of the signal magnitude. This multivariate setting helped distinguish more types of activities than the univariate setting. The proposed method offers a novel approach for monitoring users' daily activities and maintaining a sufficient amount of activity in their daily behavior to improve their health. Although the data collected in this paper is natural, its ground-truth labels are noisy, which is a limitation. However, this is a trade-off between natural data and clean ground-truth. In the future, it would be interesting to investigate the relation between the activity levels and the energy expenditure.
References
[1] R.C. Colley, D. Garriguet, I. Janssen, C.L. Craig, J. Clarke, M.S. Tremblay, Physical activity of Canadian adults: accelerometer results from the 2007 to 2009 Canadian Health Measures Survey, Statistics Canada, 2011.
[2] M. Asztalos, I. De Bourdeaudhuij, G. Cardon, The relationship between physical activity and mental health varies across activity intensity levels and dimensions of mental health among women and men, Public Health Nutr. 13 (08) (2010) 1207–1214.
[3] C.V. Bouten, K. Koekkoek, M. Verduin, R. Kodde, J.D. Janssen, A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity, IEEE Trans. Biomed. Eng. 44 (3) (1997) 136–147.
[4] B.M. Duvivier, N.C. Schaper, M.A. Bremers, G. van Crombrugge, P.P. Menheere, M. Kars, H.H. Savelberg, Minimal intensity physical activity (standing and walking) of longer duration improves insulin action and plasma lipids more than shorter periods of moderate to vigorous exercise (cycling) in sedentary subjects when energy expenditure is comparable, PloS One 8 (2) (2013) e55542.
[5] M. Zhang, A.A. Sawchuk, USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors, in: Ubicomp, ACM, 2012, pp. 1036–1043.
[6] Y. Teh, M. Jordan, M. Beal, D. Blei, Hierarchical Dirichlet processes, J. Amer. Stat. Assoc. 101 (476) (2006) 1566–1581.
[7] D. Olguín, A. Pentland, Social sensors for automatic data collection, in: Americas Conference on Information Systems, 2008.
[8] R. Larson, M. Csikszentmihalyi, The experience sampling method, in: New Directions for Methodology of Social & Behavioral Science, 1983, pp. 41–56.
[9] L. Bao, S.S. Intille, Activity Recognition from User-Annotated Acceleration Data, Lecture Notes in Computer Science, 2004, pp. 1–17.
[10] D. Olguín, A. Pentland, Human Activity Recognition: Accuracy Across Common Locations for Wearable Sensors, Citeseer, 2006.
[11] J. Parkka, M. Ermes, P. Korpipaa, J. Mantyjarvi, J. Peltola, I. Korhonen, Activity classification using realistic data from wearable sensors, IEEE Trans. Inf. Technol. Biomed. 10 (1) (2006) 119–128.
[12] T. Huynh, M. Fritz, B. Schiele, Discovery of activity patterns using topic models, in: Ubicomp, ACM, 2008, pp. 10–19.
[13] L. Atallah, B. Lo, R. King, G.-Z. Yang, Sensor positioning for activity recognition using wearable accelerometers, IEEE Trans. Biomed. Circuits Syst. 5 (4) (2011) 320–329.
[14] N. Ravi, N. Dandekar, P. Mysore, M. Littman, Activity recognition from accelerometer data, in: Proceedings of AAAI, vol. 5, 2005, pp. 1541–1546.
[15] D.M. Karantonis, M.R. Narayanan, M. Mathie, N.H. Lovell, B.G. Celler, Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring, IEEE Trans. Inf. Technol. Biomed. 10 (1) (2006) 156–167.
[16] I. Cleland, B. Kikhia, C. Nugent, A. Boytsov, J. Hallberg, K. Synnes, S. McClean, D. Finlay, Optimal placement of accelerometers for the detection of everyday activities, Sensors 13 (7) (2013) 9183–9200.
[17] Y. Hattori, S. Inoue, G. Hirakawa, A large scale gathering system for activity data with mobile sensors, in: ISWC, 2011, pp. 97–100.
[18] F.-T. Sun, Y.-T. Yeh, H.-T. Cheng, C. Kuo, M. Griss, Nonparametric discovery of human routines from sensor data, in: PerCom, 2014, pp. 11–19.
[19] D.O. Olguín, B.N. Waber, T. Kim, A. Mohan, K. Ara, A. Pentland, Sensible organizations: technology and methodology for automatically measuring organizational behavior, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39 (1) (2009) 43–55.
[20] T. Ferguson, A Bayesian analysis of some nonparametric problems, Ann. Stat. 1 (2) (1973) 209–230.
[21] J. Pitman, Combinatorial Stochastic Processes, Lecture Notes in Mathematics, vol. 1875, Springer-Verlag, Berlin, 2006.
[22] J. Sethuraman, A constructive definition of Dirichlet priors, Stat. Sin. 4 (2) (1994) 639–650.
[23] R. Neal, Markov chain sampling methods for Dirichlet process mixture models, J. Comput. Graph. Stat. (2000) 249–265.
[24] B. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (2007) 972–976.
[25] G. Tsoumakas, I. Katakis, I. Vlahavas, Mining multi-label data, in: Data Mining and Knowledge Discovery Handbook, Springer, 2010, pp. 667–685.
[26] G. Tsoumakas, I. Katakis, Multi-label classification: an overview, Int. J. Data Warehousing Mining 3 (3) (2007) 1–13.
[27] C. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, vol. 1, Cambridge University Press, Cambridge, 2008.
[28] T. Nguyen, S. Gupta, S. Venkatesh, D. Phung, A Bayesian Nonparametric Framework for Activity Recognition Using Accelerometer Data, in: ICPR, 2014, pp. 2017–2022.