User-optimized activity recognition for exergaming

Bobak J. Mortazavi*, Mohammad Pourhomayoun, Sunghoon Ivan Lee, Suneil Nyamathi, Brandon Wu, Majid Sarrafzadeh
Computer Science Department, UCLA, Los Angeles, CA 90095, United States

* Corresponding author. E-mail addresses: [email protected] (B.J. Mortazavi), [email protected] (M. Pourhomayoun), [email protected] (S.I. Lee), [email protected] (S. Nyamathi), [email protected] (B. Wu), [email protected] (M. Sarrafzadeh).
http://dx.doi.org/10.1016/j.pmcj.2015.11.001
Keywords: Activity recognition; Exergames; Wireless health; Machine learning; User-optimization

Abstract
This paper presents SoccAR, a wearable exergame with fine-grain activity recognition; the exergame uses high-intensity movements as the basis for control. A multiple model approach was developed for a generalized, large, multiclass recognition algorithm, with a leave-one-subject-out cross-validation F Score greater than 0.9 using various features, models, and kernels for the underlying support vector machine (SVM). The exergaming environment provided an opportunity for user-specific optimization, where the expected movement can assist in better identifying a particular user's movements when they are incorrectly predicted; a single model SVM with a radial basis function kernel improved by 12.5% with this user optimization.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Exergaming, the use of exercise to control video games, or the incorporation of health information into video games, has been a growing field in combating the obesity epidemic. These games may address personal health [1], targeted therapies for rehabilitation [2], or exercise-level sports activities [3] captured by camera-based systems such as the Microsoft Kinect [4]. One particular realm of exergames uses body-worn sensors (e.g., accelerometers, gyroscopes, etc.) to allow users to control video games with body movements [5]. Such movements have been shown to generate moderate levels of physical activity [6,7]. While some systems use the mobile phone as the controller [8,9], more advanced mobile games need to be developed to allow for a wider range of movements. Systems need enhanced recognition algorithms to identify those movements [3], need to monitor various levels of intensity [10], and need to adapt to multiple users while preventing users from cheating the exercise [11]. Indeed, exergaming can be a tool to combat sedentary behavior. Sedentary behavior affects both adults [12] and children [13], leading to chronic conditions [14] that cause a financial burden on both individuals [15] and health care systems [16–18]. This work investigates developing such an exergame's recognition algorithm.
Many activity recognition problems concern themselves with setting up a model and then showing that this model is robust and general, either in a testing environment or in cross-validation [19]. However, it is possible to leverage specific information and context to improve classification results [20]. Indeed, multiple models or hierarchical classifiers often improve classification results, using structural information [19], conditional information [20], or expert knowledge [21]. Once such a system is proven to be generally accurate, however, user-specific optimization can become appropriate. Work in [22] introduces an incremental support vector machine, capable of retraining and optimizing as new data becomes available. [23] uses this method to update human recognition in video applications to improve accuracy.
Further, [19,24] use contextual information and a method called active learning to update the models. When to retrain, however, is an application-dependent problem.
This paper presents SoccAR, an augmented-reality soccer obstacle course exergame based upon the Temple Run platform, which incorporates a dynamic mobility game with detailed sports-type, fine-grain actions, first presented in [25]. That work focused on the design of an exergame, a multiple model approach to classifying fine-grain activity in a large, multiclass recognition problem, and a new mechanism for evaluating model effectiveness. This work extends [25] by building on the multiple model approach designed there, evaluating methods for optimizing model performance, and extending them for user optimization. By using the game engine to generate ground truth for each movement, the system can retrain models for a specific user, helping improve accuracy with increased usage. This work presents the retraining mechanism in both an online and offline setting and uses the prior work to introduce user-specific fine-grain activity recognition methods.

2. Related works

2.1. Sports exergames
Work in [2] presents a platform for developing an exergame to assist in stroke rehabilitation using accelerometers, representing a class of exergames with a specific goal and set of actions in mind. Work by [1] develops an entertainment-based exergame with multiplayer aspects to encourage competition. Similarly, work by [4] demonstrates a motion-based game for exercise and entertainment of adults, but uses a Microsoft Kinect that, while allowing freedom in a particular space, does not allow for a game that can be played anywhere and across any distance. Finally, a set of mobile games, such as those in [8,9], allows for gaming in any environment by using a mobile computing device (such as a smartphone) as the controller. So while these games allow for freedom of environment, they do not address a wide range of possible motions and exercises.

2.2. Fine-grain activity exergaming
Work in [3] describes the steps necessary to build an exergame that results in healthy and enjoyable gameplay with accurate motion classification. A tablet-based game is built from a list of soccer movements, which that paper defines as fine-grain, collected to develop an energetic and fast-paced game. A game-specific movement classification algorithm is created based upon a principal component analysis, hereafter known as Fine-Grain PCA. This algorithm classifies movements shown in [7] to produce exercise levels of intensity by guaranteeing a certain level of metabolic output. The classification algorithm presented in this work results in a more accurate gameplay experience than that of Fine-Grain PCA, as will be shown in Section 6.

2.3. Multiple model works
Work in [26] builds hierarchical hidden Markov models in order to better identify activities of daily living by using multiple models. While improving the accuracy of activity-of-daily-living recognition, particularly for similar movements, this structure is not set up for real-time processing. Work in [27] takes a multiple model approach to classifying patients with heart failure. The technique presented allows for a clustering of patients based upon contextual information (e.g., medical conditions and socioeconomic status). This work similarly adapts that approach.

2.4. Incremental learning
Work in [28] presents a feature-reduction approach to finding an optimal point between algorithmic accuracy and game latency. While this heuristic approach can, potentially, find a user-optimal point, the base algorithmic accuracy cannot improve greatly as the model is never changed. Work in [29] uses contextual information for co-training and retraining. All of these methods use context to create a subset of labeled data in order to reduce training time. This work builds upon these ideas, not necessarily to reduce training time, but to use active selections of training data to improve the accuracy of a given system by analyzing data in an online fashion and retraining a model when determined necessary.

3. Soccer exergame
This section describes the development of the SoccAR platform, initially presented in [25], which was a Temple Run-based obstacle course with soccer movements. A sensor platform captured actions from the user in any environment, and a computing device and display drove the game engine with which the user interacted, as shown in Fig. 1.

3.1. Head-worn display
Ultimately, what could make a game successful as a mobile game is the ability to display the gameplay environment to the user while leaving the hands and legs free. To enable this, head-worn displays are needed.
Fig. 1. Shimmer3 sensors (a, left) and Epson Moverio head-mounted display (b, right).

Table 1. Fine-grain movements for SoccAR.
Soccer: Shoot, Square pass, Through pass, Chip, Flick pass, Placed shot, Cut left, Cut right, Spin
Running: Shoot (while running), Run, Jump (in place), Jump (while running), Jab (while running), Sword Slash (while running), Cut left (while running), Cut right (while running), Spin (while running)
Misc.: Open Door, Unsheathe Sword, Draw Bow, Shoot Bow, Jab, Sword Slash, Slide, Walk
However, these displays must also let the user see the actual environment, so that they do not run into physical objects. SoccAR was implemented on a laptop and displayed on the Epson Moverio BT-100. The user was presented with the game in the small displays in the middle of the lenses.

3.2. Data collection
Data was collected from 16 individual volunteers. They were asked to wear four Shimmer [30] wireless inertial measurement unit (IMU) platforms, one attached to each wrist and one on top of each foot. Each sensor streamed data at 50 Hz and provided three-axis accelerometer (±6 g) and gyroscope (±250 deg/s) data, as well as an estimate of the orientation of the sensor, presented as four quaternions derived from the accelerometer and gyroscope readings. The data was streamed, via Bluetooth, to a computing device. With this sensor placement, a truly mobile range of motion data was collected for 26 different fine-grain movements, as well as various other random motions to help build a "no movement" class. For each move, ten repetitions were recorded. Since a Temple Run-style game was being adapted for this paper, the motions included soccer motions as well as newer activities to make an enjoyable mobile exergame. Table 1 shows the list of moves, both sports-related and general running motions. Fig. 2 shows a user playing the SoccAR game next to a screenshot of the game, whose design was further described in [25].

4. Multiple model approach
This section covers the multiple model approach originally introduced in [25], as the basis for the improvements introduced in this work in Section 5.

4.1. Training
Data were collected with an annotation process: as the data was recorded, the individual running the data collection marked the beginning and end of each movement. The midpoint of each movement was determined from this start and end information, and each movement was annotated with its midpoint. If the data is represented as a time series D(t), where time t starts at 0 and runs to the length of the recording, windows of size w were built around each annotated midpoint:

move(midpoint, w) = [ D(midpoint - w/2), \ldots, D(midpoint + w/2) ]    (1)

where w is the desired window size and move(midpoint, w) is the windowed move around the desired midpoint. Before extracting these movement windows from the training set, the data was filtered with a low-pass filter via a moving average.
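As a rough illustration of this preprocessing, the sketch below shows the moving-average low-pass filter and the window extraction of Eq. (1) for NumPy arrays. The function names, the (T x channels) array layout, and the boundary handling are assumptions of this sketch, not details of the original Matlab implementation.

```python
import numpy as np

def moving_average_filter(signal, filter_len=25):
    """Low-pass filter each channel with a simple moving average.
    A filter of 25 samples is one-fourth of the 100-sample (2 s) window,
    matching the heuristic described in the text."""
    kernel = np.ones(filter_len) / filter_len
    # Filter every channel (column) independently; 'same' keeps the length.
    return np.column_stack(
        [np.convolve(signal[:, c], kernel, mode="same") for c in range(signal.shape[1])]
    )

def extract_move_window(data, midpoint, w=100):
    """Eq. (1): window of length w centered on the annotated midpoint.
    `data` is a (T x channels) array sampled at 50 Hz."""
    start = max(0, midpoint - w // 2)
    return data[start:start + w, :]
```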
Fig. 2. User playing SoccAR (left), kicking a ball. Note the Shimmer sensors on each wrist and each foot. Screenshot of the game (right).

Table 2. Features extracted for fine-grain activity classification and their descriptions.
Minimum: Minimum value attained over window
Maximum: Maximum value attained over window
Sum: Sum of all values attained over window
Mean: Average of all values attained over window
Std. Dev: Standard deviation of all values attained over window
Skewness: Measure of asymmetry in the signal attained over window
Kurtosis: Measure of peakedness in the signal attained over window
Energy: Measure of frequency-domain energy in the signal attained over window
This moving average was found, heuristically, to be best at one-fourth the average window size. The average window size was two seconds, or 100 points. Finally, four sensors were used at the same time, and each move window contained the following information from each sensor:

SensorData_{loc}(t) = \langle G_x(t), G_y(t), G_z(t), A_x(t), A_y(t), A_z(t) \rangle    (2)

where loc is the sensor identification on the body. Four quaternion signals were also added to SensorData_{loc}, where the four quaternions were derived from the accelerometer and gyroscope readings, as designed by [30], using a standard attitude and heading reference system (AHRS). The magnitude of the acceleration was also calculated. If data of time length t is collected, each component of the vector above is a length-t time-series signal. Thus, the full data set at any point was:

MoveData(t) = \langle SensorData_{LA}(t), SensorData_{RA}(t), SensorData_{LL}(t), SensorData_{RL}(t), \lVert LA_a(t) \rVert, \lVert RA_a(t) \rVert, \lVert LL_a(t) \rVert, \lVert RL_a(t) \rVert \rangle    (3)

where LA is the left arm, RA the right arm, LL the left leg, and RL the right leg, respectively, and the final four channels are the magnitudes of acceleration of each sensor.
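A minimal sketch of how the 44-channel window of Eqs. (2)-(3) might be assembled is shown below, assuming NumPy arrays and a hypothetical `streams` dictionary keyed by sensor location; the original system assembled these channels in its Shimmer/Matlab streaming pipeline, so this is illustrative only.

```python
import numpy as np

SENSOR_LOCATIONS = ["LA", "RA", "LL", "RL"]  # left arm, right arm, left leg, right leg

def sensor_channels(gyro, accel, quat):
    """Eq. (2) plus quaternions: per-sensor channels
    <Gx, Gy, Gz, Ax, Ay, Az, q0, q1, q2, q3> as a (T x 10) array."""
    return np.hstack([gyro, accel, quat])

def move_data(streams):
    """Eq. (3): stack the four sensors' channels and append the acceleration
    magnitude of each sensor, giving 4 * 10 + 4 = 44 channels.
    `streams` maps location -> dict with (T x 3) 'gyro', (T x 3) 'accel',
    and (T x 4) 'quat' arrays (a hypothetical structure for this sketch)."""
    per_sensor = [sensor_channels(streams[loc]["gyro"],
                                  streams[loc]["accel"],
                                  streams[loc]["quat"]) for loc in SENSOR_LOCATIONS]
    magnitudes = [np.linalg.norm(streams[loc]["accel"], axis=1, keepdims=True)
                  for loc in SENSOR_LOCATIONS]
    return np.hstack(per_sensor + magnitudes)
```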
Once the data was windowed and filtered, the movements were supplied to a support vector machine (SVM) classifier for feature extraction and training. The SVM used in this work was LibSVM [31], interfaced with Matlab for offline validation with an RBF kernel and default parameter values, and Weka's [32] SMO implementation for the PUK kernel to be discussed, also interfaced with Matlab.

4.1.1. Feature extraction
For each of the 44 channels above (e.g., x-acceleration, y-acceleration, etc.) a set of features was extracted. The list of features is shown in Table 2. In particular, the energy of a signal was calculated as in [33,34]:

fft(move) = \{ x_i \mid i = 1, \ldots, k \}    (4)

Energy(move) = \frac{\sum_{i=1}^{k} |x_i|^2}{k}    (5)

where the fft of a move window results in k fft components, and the energy then follows from those components.
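The per-channel features of Table 2, with the energy of Eqs. (4)-(5), could be computed along the following lines. This is a sketch using NumPy and SciPy; the exact skewness/kurtosis conventions of the original Matlab implementation are not known.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def channel_features(x):
    """Table 2 features for a single windowed channel x (1-D array).
    Energy follows Eqs. (4)-(5): mean squared magnitude of the FFT components."""
    fft_mag = np.abs(np.fft.fft(x))
    return np.array([
        np.min(x), np.max(x), np.sum(x), np.mean(x), np.std(x),
        skew(x), kurtosis(x),
        np.sum(fft_mag ** 2) / len(fft_mag),   # Energy(move)
    ])

def window_features(window):
    """Concatenate the eight features over all 44 channels -> 352-element vector."""
    return np.concatenate([channel_features(window[:, c]) for c in range(window.shape[1])])
```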
Fig. 3. Forward selection wrapper algorithm flowchart for feature selection, where computing Θ + ω with the highest F Score is done in a cross-validation scheme on the training set.
Calculating and using all of these features, however, would likely result in over-fitting the data sets in training. In particular, [35] suggested that, for a linear SVM, the ratio of data to features should be 10:1. In this SoccAR work, the data set consists of 16 individuals and 26 movements, resulting in 416 samples (in this case the 10 copies of each move per user were not considered). As a result, following that suggestion, no more than about 42 features should be considered (50 were considered in this work for a multiclass classifier with a non-linear kernel, as an upper bound that accounts for the different kernel types). Once extracted, the selected features were normalized for use in an SVM. This normalization occurred on the training set, and the parameters of the normalization were applied to the testing set, so as to avoid potential pollution of the test set. The training and testing sets were derived in a leave-one-subject-out cross-validation (LOSOCV), detailed further below.

4.1.2. Feature selection
In order to select features, the validation framework must first be considered. Selected features must be taken from training sets only, and then applied to testing sets. A leave-one-subject-out cross-validation (LOSOCV) was used to evaluate the model performance. When all of the features from Table 2 were calculated over each of the channels of the data set, 352 features resulted for each sample window, as elements of a feature set denoted Ω. For each fold of the cross-validation, the training set ran another LOSOCV on itself to determine the effectiveness of each feature ω to be added to the set Θ (in other words, nesting LOSOCV evaluations and ranking features on each of the subsets), once again to avoid pollution in developing a ranking. The flowchart for this wrapper method is shown in Fig. 3. The features most often selected, and the positions in which they were selected, were averaged to come up with the defined feature set for the entire system. At each point where a training set was applied to an SVM, a normalization of features occurred. While a backward elimination approach would have been the best way to find the optimal subset of features, the duration and complexity of such an algorithm with 352 features precluded it from being run in this instance. Instead, a forward selection approach was employed as an appropriate approximation. Table 3 shows the top 50 features selected.
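A simplified sketch of the forward-selection wrapper of Fig. 3 is shown below using scikit-learn; it scores each candidate feature with a leave-one-subject-out cross-validation over the training subjects only. The nested ranking and averaging of selection positions described above are omitted for brevity, and the scoring choice (`f1_weighted`) is an assumption of this sketch.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def forward_select(X, y, groups, max_features=50):
    """Greedy forward selection (Fig. 3): at each step add the feature that
    maximizes the cross-validated F Score on the training subjects only."""
    selected, remaining = [], list(range(X.shape[1]))
    logo = LeaveOneGroupOut()
    while remaining and len(selected) < max_features:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
            score = cross_val_score(clf, X[:, cols], y, groups=groups,
                                    cv=logo, scoring="f1_weighted").mean()
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```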
Table 3. Top 50 selected features for the fine-grain classifier by forward selection and the wrapper method.
Features 1–10: Avg(RA, ∥A∥), Std(LL, Gy), Max(RA, Ay), Max(RA, ∥A∥), Energy(LA, Ax), Energy(RA, Ax), Energy(LL, Gy), Energy(RA, ∥A∥), Max(LA, Ay), Kurt(RA, ∥A∥)
Features 11–20: Energy(RL, Ay), Avg(LL, Ay), Energy(RL, Gy), Min(LA, Ax), Std(RA, Gz), Std(LA, Gy), Energy(LA, Gz), Std(LA, ∥A∥), Std(RA, Ay), Min(RA, ∥A∥)
Features 21–30: Energy(LA, Az), Energy(RA, Gz), Min(LL, ∥A∥), Energy(LL, Gx), Skew(RL, ∥A∥), Std(RA, ∥A∥), Std(RA, Ax), Energy(RL, ∥A∥), Std(RL, ∥A∥), Energy(LL, Gz)
Features 31–40: Std(LA, Ay), Std(LA, Ax), Energy(LL, Ax), Energy(LL, Ay), Skew(RA, ∥A∥), Std(LL, Ax), Sum(RA, ∥A∥), Std(LL, Az), Sum(LL, ∥A∥), Max(RA, Gz)
Features 41–50: Sum(LL, Ay), Std(LA, Gz), Min(LL, Ay), Std(LL, Gz), Std(LL, Ay), Std(RL, Ay), Std(RL, Gy), Min(RA, Gy), Std(RA, Gy), Std(LL, Gx)
Table 4. Fine-grain movements for SoccAR, grouped by cluster.
Cluster 1—Soccer: Shoot, Square pass, Through pass, Chip, Flick pass, Placed shot, Cut left, Cut right, Jump, Spin, Slide
Cluster 2—Misc.: Open Door, Draw arrow, Shoot arrow, Jab, Unsheathe Sword, Sword Slash, Walk
Cluster 3—Running: Run, Cut left (while running), Cut right (while running), Jab (while running), Jump (while running), Shoot (while running), Spin (while running), Sword Slash (while running)
Note that many of these features are structurally important for different movements, showing different limbs moving or not (for example, the right leg is stationary when drawing and shooting an arrow, versus a pass where the right leg is moving). With the spread of features across all limbs, the necessity of wearing all the available sensors and of combining general windowed results (e.g., mean) with structure-specific results (e.g., kurtosis/skewness) became apparent.
4.2. Classification
Once the appropriate features were selected and a model created, live testing of data was run for gameplay. Each incoming point from the four sensors was combined into one vector. This point was filtered with a moving average filter of 25 points, then put into a sliding window of 100 points, as determined in [25]. Once the features were extracted from this movement buffer, it was supplied to the classifier in order to determine the movement. In the multiple model approach this could be done as one set of feature extraction for each model used, in series, or with all possible features extracted at once in parallel, depending upon the computational power of the gaming device and the delay allowed.
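The streaming gameplay pipeline just described might look like the following sketch: a 25-point moving average on each incoming 44-channel sample, a 100-point sliding window, and a call to the trained classifier. The class and method names are hypothetical, and `feature_fn` stands in for the feature extraction of Section 4.1.1.

```python
from collections import deque
import numpy as np

class GameplayClassifier:
    """Sketch of the streaming pipeline of Section 4.2: filter each incoming
    44-channel sample with a 25-point moving average, keep a 100-point sliding
    window, and classify the windowed buffer. `feature_fn` and `model` are
    assumed to be the feature extractor and a trained classifier."""

    def __init__(self, model, feature_fn, window=100, filter_len=25):
        self.model = model
        self.feature_fn = feature_fn
        self.raw = deque(maxlen=filter_len)     # buffer for the moving average
        self.window = deque(maxlen=window)      # filtered sliding window

    def push(self, sample):
        """Add one 44-channel sample; return a predicted move once the window is full."""
        self.raw.append(np.asarray(sample, dtype=float))
        self.window.append(np.vstack(self.raw).mean(axis=0))  # moving-average point
        if len(self.window) < self.window.maxlen:
            return None
        features = self.feature_fn(np.vstack(self.window))
        return self.model.predict(features.reshape(1, -1))[0]
```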
4.3. Model generation
Generally, as the number of classes increases, algorithm performance tends to degrade [3,36]. The method presented here used information on similar movements to create clusters of smaller classification problems. In this work, expert knowledge regarding the similarity of movements was used to manually create the clusters. However, clustering algorithms such as K-Means or Expectation Maximization, run on the extracted features, could also have been used to group similar movements. Creating an appropriate distance metric for similarity of movements, however, is left for future work and discussion in Section 7. Fig. 4 shows the general flow chart of applying expert knowledge to the clustering, where each level L can have as many clusters N as needed to improve the accuracy. For this work L = 2 and N = 3. Table 4 shows the list of moves in their appropriate clusters. As SoccAR contains some soccer moves and some other moves adapted for obstacles in its Temple Run-style game, contextual knowledge of the movements yielded fairly distinct clusters. Cluster 1 consisted of most soccer-type actions. Cluster 3 consisted of movements that occur while running. Cluster 2 contained the remainder of the movements in a miscellaneous grouping. These also corresponded to different expected sensor readings: most of the soccer movements were primarily foot movements, most of the miscellaneous movements were arm movements, and the running cluster consisted of motions with a fairly consistent linear velocity. Thus, four models were generated: one model differentiated between the clusters, and the remaining three models classified movements within each cluster.
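A minimal sketch of this two-level scheme, assuming scikit-learn SVMs and a hand-built move-to-cluster mapping in the spirit of Table 4, is shown below; the real system used LibSVM/Weka and the kernels discussed in Section 4.5, so this is only illustrative.

```python
import numpy as np
from sklearn.svm import SVC

class MultipleModelClassifier:
    """Two-level scheme of Section 4.3: a top-level SVM predicts the cluster
    (soccer / misc. / running), then a per-cluster SVM predicts the specific
    move. Cluster assignments follow an expert grouping like Table 4; the
    label names used here are placeholders."""

    def __init__(self, move_to_cluster, **svm_kwargs):
        self.move_to_cluster = move_to_cluster          # e.g. {"Shoot": "soccer", ...}
        self.top = SVC(**svm_kwargs)
        self.per_cluster = {c: SVC(**svm_kwargs) for c in set(move_to_cluster.values())}

    def fit(self, X, y):
        y = np.asarray(y)
        clusters = np.array([self.move_to_cluster[m] for m in y])
        self.top.fit(X, clusters)                       # model 1: which cluster
        for c, model in self.per_cluster.items():       # models 2-4: which move
            mask = clusters == c
            model.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        clusters = self.top.predict(X)
        return np.array([self.per_cluster[c].predict(x.reshape(1, -1))[0]
                         for c, x in zip(clusters, X)])
```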
Fig. 4. Training with expert knowledge and clustering.
4.4. Model importance
An important consideration when creating these multiple models, and when analyzing the accuracy of classification in a gaming environment, is the ability to handle a certain level of error as long as the response is real-time (e.g., as long as a pass is detected, one might not care that the exact correct pass was recognized). [25] defined a new metric for measuring the accuracy of such classifiers. For any multiclass classification, the F Score will be used:

F = 2 \times \frac{P \times R}{P + R}    (6)

where P is the precision and R the recall. Most multiple model systems [24,26,37] still present classification results as an overall metric. For this work, the classification accuracy of each model was analyzed and a metric created to calculate the weighted accuracy of multiple models, since the performance and speed of the recognition algorithm are sometimes more important than the accuracy [28]. Take, for example, a situation in which a particular cluster contained two different passes to the left. Perhaps, in the context of a game, it was not as important to differentiate between the two passes as it was to identify that either pass had occurred. In that setting, the accuracy of the system in identifying that cluster (the higher-level model) was more important than the accuracy of the lower model. As such, different accuracy metrics were calculated. For a given level and model, the accuracy of the level, in terms of F Score, was calculated as:

F_l = \frac{\sum_{i=1}^{n} \alpha_i \lVert C_i \rVert F_{l_i}}{\sum_{i=1}^{n} \alpha_i \lVert C_i \rVert}    (7)

where F_l is the F Score for a certain level of multiple models (1 being the top, 2 the next, and so forth), F_{l_i} is the F Score for the particular cluster/model in this level (e.g., the F Score for the soccer cluster in Table 4), α_i is the weighted importance of this cluster in the range [0, 1], and ∥C_i∥ is the size of the cluster. If, for example, all the α_i were 1, then the F Score for the level would be the weighted average of the F Scores of the models in this level of the classifier.
If the F_l values for each level of a multiple-model scenario are considered, then the overall F Score is calculated as:

F_{mm} = \frac{\sum_{i=1}^{L} \beta_i F_i}{\sum_{i=1}^{L} \beta_i}    (8)

where F_{mm} represents the F Score for the multiple model scheme, β_i is the importance of each model level in the range [0, 1], and F_i is the F Score for the given level of the model. This work considered this top-level F Score more representative of the multiple model scheme. However, where all weights are 1, this scheme represents the same accuracy as an overall system: accurately determining the cluster, then accurately classifying within that cluster. For the purposes of comparison to other methods, that weighting is presented in Section 6. Similarly, feature ranking can be modified to show important features per cluster with this importance factor in mind, but such an overall ranking scheme is left for future work and discussion in Section 7.
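Eqs. (7) and (8), as reconstructed above, amount to weighted averages and can be computed directly; the sketch below assumes the per-cluster F Scores, cluster sizes, and weights are already available.

```python
def level_f_score(f_scores, cluster_sizes, alphas):
    """Eq. (7): weighted F Score of one level, weighting each cluster model by
    its importance alpha_i and its cluster size ||C_i||."""
    num = sum(a * c * f for a, c, f in zip(alphas, cluster_sizes, f_scores))
    den = sum(a * c for a, c in zip(alphas, cluster_sizes))
    return num / den

def multiple_model_f_score(level_f_scores, betas):
    """Eq. (8), read as a weighted average of the per-level F Scores
    (all weights equal to 1 reduces to the plain average used in Section 6)."""
    return sum(b * f for b, f in zip(betas, level_f_scores)) / sum(betas)
```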
4.5. PUK kernel and Weka SMO
For modeling this problem, an SVM with an RBF kernel was implemented using LibSVM [31]. A less commonly used kernel was also compared: the Weka SMO implementation of the SVM [32], called through Matlab, with a Pearson Universal Kernel (PUK) and the complexity parameter set to 100. The PUK kernel is a universal kernel based on the Pearson VII function. It was applied here to activity monitoring, as it has been shown to be a robust, universal kernel for support vector regression [38], for discriminating protein types [39], and for gesture recognition [40]. The kernel is based on the Pearson VII function for curve fitting:

f(x) = \frac{H}{\left[ 1 + \left( \frac{2(x - x_0)\sqrt{2^{1/\omega} - 1}}{\sigma} \right)^2 \right]^{\omega}}    (9)
where H is the peak height at the center x_0, x is the independent variable being fit, σ is the half-width of the curve, and ω is the tailing factor of the peak [38]. This curve fitting is particularly powerful because, by varying ω, it can fit different distributions, such as Gaussian and Lorentzian [38]. From this, the kernel function for an SVM can be defined over two vectors by:

K(x, y) = \frac{1}{\left[ 1 + \left( \frac{2\sqrt{\lVert x - y \rVert^2}\,\sqrt{2^{1/\omega} - 1}}{\sigma} \right)^2 \right]^{\omega}}    (10)

where the independent variable and the curve center are replaced by the Euclidean distance between the two vectors, and the height is simply set to 1. Work in [38] guarantees that this function satisfies the requirements of a kernel (it yields a real-valued symmetric matrix and satisfies Mercer's conditions).
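For reference, Eq. (10) can be plugged into an SVM as a custom Gram-matrix kernel; the sketch below uses scikit-learn rather than the Weka SMO used in this work, and the σ and ω values shown are placeholders rather than the settings used in the experiments (only the complexity parameter C = 100 is taken from the text).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import euclidean_distances

def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    """Eq. (10): Pearson VII universal kernel (PUK) of [38].
    sigma is the half-width and omega the tailing factor; the values here are
    placeholders, not the settings used in the reported experiments."""
    dist = euclidean_distances(X, Y)                       # ||x - y||
    scaled = (2.0 * dist * np.sqrt(2.0 ** (1.0 / omega) - 1.0)) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

# scikit-learn accepts a callable that returns the Gram matrix:
svm_puk = SVC(kernel=lambda X, Y: puk_kernel(X, Y, sigma=1.0, omega=1.0), C=100)
```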
5. User-centered activity recognition
Once the multiple model algorithm was designed and validated for general accuracy, this work addressed optimizing the algorithm to perform more accurately for each individual user of the system. The first approach, as previously discussed, was to dynamically adjust the number of features used while maintaining a certain expected level of accuracy, as in [28]. However, the exergaming environment provided a unique structure in which false positives and false negatives could be appropriately identified in an online fashion, since the game engine itself provided a ground truth label based upon knowledge of the expected movement. As a result, the system was calibrated for each particular user in order to enhance the predictive capabilities of the underlying recognition algorithm.

5.1. Retraining in gaming
Since each obstacle in the exergame prompted a specific movement, the system knew the movement expected from the user. As a result, the recognition algorithm could differentiate missed movements between mistakes the user made and movements the algorithm mispredicted. The algorithm assumed the user was not trying to cheat the exercise movements, an assumption that is further discussed in Section 7. The retraining algorithm for SoccAR is presented in Fig. 5. As each movement was generated by the user, the activity recognition algorithm detected and predicted the movement performed. In general settings, the system would have no way of knowing whether it generated the correct prediction.
Fig. 5. Classification and retrain flow chart for user-specific gaming.
Fig. 6. Classification and retrain flow chart for user-specific cross-validation.
However, in the exergaming environment, the game engine provides an expected-movement parameter. This was considered a ground truth label for further supervised training. Misses were caused either by the user performing the wrong action or by the algorithm mispredicting a correct movement. Each missed move was stored, with the label generated by the algorithm as well as the expected label of the movement prompted by the game. Before any adjustment was made to the algorithm, however, the cause of the miss needed to be differentiated through intelligent storage of mispredictions. This storage could be as simple as marking the movement with the ground truth and retraining the algorithm when a certain threshold of misses was hit (a heuristic dependent upon the game designer, the game length, computational and memory capacity, etc.). Such a method, however, might inadvertently retrain the algorithm on several unrelated mistakes made by the user. In this work, several steps were taken to mitigate these factors. First and foremost, the intensity of the movement was required to be above a certain level for moderate levels of exercise; each misprediction was thrown out if the intensity of the movement was not great enough. For movements that were not filtered out by this approach, a nearest-neighbor approach was considered, wherein, once the threshold number of stored movements was reached, the similarity of these misses was assessed through Euclidean distance. More complex similarity measures are left for further discussion in Section 7. When these movements were recorded and found to be mispredictions, they were fed back into the training algorithm along with the original training set to provide a user-specific training set. In this case, the model was intentionally biased toward the user; that was precisely the aim of the system, to increase its responsiveness to the individual as she/he played the game. As a result, the next time a given movement was prompted, the algorithm would be more likely to properly classify the user's movement.

5.2. Retraining offline
Validating the effectiveness of such a retraining strategy should also happen in an offline setting, to report accuracy. In particular, modifying the leave-one-subject-out cross-validation approach demonstrated the improvements possible through user-specific retraining. Consider a standard leave-one-subject-out cross-validation, in which each loop of the validation splits the data set between a test person and training people. The data of the training set, θ, along with the ground truth labels, were provided to a supervised machine learning algorithm. The test set, γ, was then validated with this model. Consider Fig. 6, which outlines the offline retraining cross-validation scenario. For each iteration of the loop, the initial training set θ was created as normal. Since 10 repetitions of each movement were collected for each person, the test person's data was split into two sets, each with half the data. The first set was the standard test set γ; the second set ρ helped build the incremental set.
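A sketch of this offline protocol is shown below, assuming scikit-learn estimators and NumPy arrays for the held-out subject; the paper splits the subject's ten repetitions per move in half, whereas this sketch uses a random half-split as a simplification.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score

def retrain_evaluation(model, X_train, y_train, X_test_person, y_test_person, seed=None):
    """Offline protocol of Fig. 6: split the held-out subject's data in half,
    score the baseline model trained on theta (the other subjects), then score
    a model retrained on Theta = theta union rho."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_test_person))
    half = len(idx) // 2
    gamma, rho = idx[:half], idx[half:]          # test half and incremental half

    baseline = clone(model).fit(X_train, y_train)
    f_base = f1_score(y_test_person[gamma], baseline.predict(X_test_person[gamma]),
                      average="weighted")

    X_inc = np.vstack([X_train, X_test_person[rho]])
    y_inc = np.concatenate([y_train, y_test_person[rho]])
    retrained = clone(model).fit(X_inc, y_inc)
    f_retrained = f1_score(y_test_person[gamma], retrained.predict(X_test_person[gamma]),
                           average="weighted")
    return f_base, f_retrained
```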
Table 5. Top 50 selected features for each model type.
Single model: f5, f6, f14, f15, f21, f22, f23, f24, f25, f26, f35, f44, f53, f55, f56, f57, f61, f62, f64, f65, f72, f74, f75, f76, f78, f79, f81, f84, f85, f86, f93, f95, f96, f101, f102, f103, f104, f105, f106, f107, f112, f115, f124, f125, f126, f130, f131, f133, f135, f136
Top-level model: f12, f14, f45, f51, f63, f74, f78, f93, f95, f122, f124, f125, f138, f146, f157, f185, f188, f195, f202, f211, f212, f232, f235, f241, f243, f244, f246, f251, f254, f256, f258, f282, f292, f295, f313, f315, f329, f330, f331, f332, f336, f340, f348, f350, f335, f351, f312, f205, f236, f206
Soccer model: f15, f16, f21, f22, f31, f44, f46, f49, f53, f55, f56, f64, f65, f72, f76, f79, f85, f86, f87, f89, f95, f101, f102, f104, f105, f112, f114, f115, f124, f126, f130, f131, f132, f133, f136, f142, f154, f155, f156, f180, f181, f182, f203, f204, f223, f231, f232, f233, f256, f261
Misc. model: f2, f4, f5, f6, f11, f12, f15, f21, f22, f23, f24, f25, f29, f35, f42, f43, f45, f48, f57, f60, f61, f62, f65, f66, f71, f75, f81, f85, f86, f93, f95, f101, f102, f103, f104, f105, f106, f107, f124, f126, f127, f130, f133, f134, f139, f141, f142, f143, f145, f146
Run model: f44, f49, f63, f91, f96, f106, f131, f134, f135, f143, f145, f151, f152, f181, f183, f202, f212, f232, f301, f306, f318, f324, f342, f338, f326, f85, f22, f32, f193, f191, f112, f102, f125, f350, f263, f273, f156, f126, f55, f124, f325, f346, f65, f105, f115, f123, f94, f5, f345, f341
Table 6. F Scores for the single multiclass classifier, each model of the multiple model method, and the overall multiple model method, for the RBF kernel in the SVM (by number of features).

Features:        1      5      10     15     20     25     30     40     50
Single model:    0.129  0.557  0.671  0.697  0.714  0.720  0.722  0.719  0.707
Top model:       0.877  0.980  0.983  0.984  0.987  0.987  0.987  0.986  0.979
Soccer model:    0.239  0.561  0.639  0.681  0.690  0.686  0.682  0.688  0.673
Misc. model:     0.604  0.976  0.989  0.993  0.994  0.994  0.994  0.990  0.986
Run model:       0.305  0.587  0.648  0.672  0.686  0.698  0.700  0.685  0.675
Multiple model:  0.490  0.790  0.831  0.850  0.857  0.858  0.857  0.855  0.845
Two training sets were created, the initial training set θ and the incremental training set Θ, where ρ ∪ θ = Θ. The testing set γ was then evaluated against each training set, and any difference in predictions showed the increase or decrease in accuracy from training toward a specific user.

6. Results
For proper evaluation of the algorithm's results, the related Fine-Grain PCA work [3] was compared. Further, both single and multiple model solutions were presented for the SVM with both the RBF and PUK kernels, in order to show the development of robust, fine-grain classifiers for large, multiclass problems, and also to compare the two kernel types against each other.

6.1. Features extracted
The top 50 features for each cluster and model type are listed in Table 5. Notice that, while many features are repeated, each model has its own set of selected features when compared to the single model feature ranking presented earlier in Table 3. Feature f1 is the minimum value of the X axis of the gyroscope of the left arm; f2 is the minimum value of the Y axis of the gyroscope. The minimum features for the left arm are the first 10 features (3-axis gyroscope, 3-axis accelerometer, 4-channel quaternions); the next 10 features are the maximums, and so on. After all the feature types are extracted for the left arm, the right arm begins. Features 1–80 are the left arm, 81–160 the right arm, 161–240 the left leg, 241–320 the right leg, 321–328 the magnitude of the left arm, 329–336 the magnitude of the right arm, 337–344 the magnitude of the left leg, and 345–352 the magnitude of the right leg.

6.2. Cross validation
A leave-one-subject-out cross-validation was run in order to test the robustness of the algorithms. The results of the system with the RBF (LibSVM in Weka [31]) and PUK kernels were also compared. Table 6 shows the results of running the multiple model scheme with the RBF kernel. For this application, the importance of each model was considered paramount; as a result, all of the α and β parameters were set to 1. The RBF model differentiated between the three types of movements with high accuracy. While the classification of certain motions within each cluster might be lower, this method allowed for high accuracy on many of the miscellaneous movements, as well as an overall performance gain over the single-model classifier. Table 7 shows the results of the same experiment with the PUK kernel. The results showed several important factors. The first was that the single model classification scheme was significantly improved. Further, the soccer model and the run model both saw significant improvements in classification. The overall multiple model classification scheme not only reached a high level of accuracy, with an F Score as high as 0.97, but outperformed the best versions of the previous models with only 10 features. Fig. 7 shows the mean F Scores across all users in the cross-validation for each examination type in response to the number of features used.
Fig. 7. Mean F Scores in cross-validation per feature of the single and multiple model SVM RBF and PUK kernels. The x-axis indicates the number of features used in the model while the y-axis indicates the mean F Score achieved.

Table 7. F Scores for the single multiclass classifier, each model of the multiple model method, and the overall multiple model method, for the PUK kernel in the SVM (by number of features).

Features:        1      5      10     15     20     25     30     40     50
Single model:    0.144  0.485  0.694  0.748  0.755  0.782  0.847  0.902  0.913
Top model:       0.595  0.843  0.963  0.981  0.994  0.996  0.995  0.996  0.995
Soccer model:    0.186  0.441  0.694  0.749  0.902  0.930  0.945  0.956  0.962
Misc. model:     0.447  0.858  0.941  0.963  0.972  0.972  0.985  0.996  0.993
Run model:       0.176  0.506  0.618  0.685  0.796  0.829  0.872  0.874  0.878
Multiple model:  0.343  0.668  0.828  0.868  0.937  0.951  0.963  0.968  0.969
Table 8. Highest achieved F Scores for the algorithm from [3] and the SVM algorithms.
Fine-grain PCA:      0.700
SVM (RBF)—single:    0.723
SVM (RBF)—multiple:  0.858
SVM (PUK)—single:    0.913
SVM (PUK)—multiple:  0.970
The RBF kernel was more accurate than the PUK kernel for a small number of features, but did not improve greatly with more features. Finally, these methods, at their best accuracy, were compared with the Fine-Grain PCA algorithm applied to this data set. As expected, the results of Fine-Grain PCA degrade from the almost 0.80 F Score reported in [3] to only 0.70, as shown in Table 8, since the number of similar movements increased.

6.3. Retraining
The offline retraining algorithm described in Section 5 was applied to the Fine-Grain PCA method and to the multiple model approach with the RBF and PUK kernels, and was also demonstrated with a general activity-of-daily-living system, shown in [7] to be insufficient for exergaming, to see if user-specific retraining would improve its predictive power. That scheme will be referred to as RDML [33]. Fig. 8 plots the results of the validation when tested on the test subset, alongside the results of retraining the model and re-testing, using the LibSVM RBF kernel. This kernel was chosen to show the effectiveness of the retraining with a widely used and generally strong kernel for activity recognition. Notice the significant and expected boost in results from retraining the model with the user-specific data. The expected biasing/polluting of the training set performed as desired, increasing the reliability of the algorithm. Table 9 shows the F Scores and their incremental improvement using an optimal selection of 25 features, which was at or near the peak performance for each SVM model while limiting the computational latency of the system in the real-time setting. The RDML method was applied along with the Fine-Grain PCA method. Each method designed for exergaming saw an improvement from user retraining, while RDML did not, further validating the need for exergaming-specific activity-recognition systems. Moreover, for each cluster in the multiple model approach, the SVM with PUK kernel was applied in the retraining scenario. The percentage improvement varied by cluster model, due primarily to the initially high F Score of each of those models in cross-validation. The percentage improvements imply that there is, perhaps, an upper bound beyond which the models cannot improve, but that those with room to improve do improve.
Fig. 8. Results of cross-validation on the test set and the retrained cross-validation using SVM with RBF Kernel. The x-axis indicates the number of features used in the model while the y-axis indicates the mean F Score achieved.
Fig. 9. Results of the LOSOCV for the entire set on an SVM with RBF kernel, and individual responses to the models by feature set size for users 1, 10, and 4. The x-axis indicates the number of features used in the model while the y-axis indicates the mean F Score achieved.

Table 9. Results of cross validation, retrained cross validation, and percentage improvement.

Method                   F Score   Retrained F Score   % Improvement
Single model SVM RBF     0.72      0.81                12.5%
Single model SVM PUK     0.76      0.89                17.1%
Top cluster SVM PUK      0.99      0.99                0%
Soccer model SVM PUK     0.90      0.94                4.4%
Misc. model SVM PUK      0.97      0.97                0%
Run model SVM PUK        0.78      0.84                7.7%
Fine-grain PCA           0.70      0.78                11.4%
RDML                     <0.05     <0.05               0%
In particular, the run model shows that while improvement is possible, the smaller percentage implies the need for more user-specific training data. Finally, consider the user who responded worst to the general model, person 4, shown in Fig. 9 using the RBF kernel: this user saw a percentage improvement of 7.7% at the number of features that gave the minimum improvement and 38.3% at the largest improvement, with a maximum achieved F Score of 0.82 in the retrain scenario.

7. Future work
The multiple model approach developed shows strong classification of fine-grain movements and the ability to improve the speed with which they are classified by reducing the number of features necessary for classification. However, in a multiple model scheme, what exactly is the overall number of features used? Is it the average number of features used per model? The maximum? The intersection? The union of all features used? What if features are extracted in parallel while the first-level model is being run? Depending upon the implementation of the real-time aspects of the work, this metric will require different representations and, as such, is left for future investigation. Further, different models can be developed for different clusters (e.g., SVMs might be stronger for cluster a whereas hidden Markov models might be stronger for cluster b). How the clusters are selected, and how the models are trained (perhaps per user), are all important questions.
The optimization of such algorithms, particularly their hyperparameters, should be studied further, either through cross-validation or a grid search. The user experience of such gameplay should be tied to its long-term ability to combat obesity, to accurate recognition with minimal perceived latency during continuous activity, and to enjoyment; this has been left as future work for a prolonged clinical study that can validate effectiveness over a significant period of use. Users might enjoy something the first time they play it but find it cumbersome after months of usage. For example, a larger number of features might take up to a second to extract, but the game design might be able to hide this latency by prompting for movements earlier and animating the beginning of the expected movement before it is actually recognized. Finally, for user retraining, many considerations must be taken into account, in terms of storage on the system for multiple models, saving models over multiple usages for a specific user, and cheating-prevention techniques, so that the user is unable to eliminate the need for exercise-based movements by retraining the algorithm on non-standard moves. In particular, the algorithm presented in this paper assumes the user is not trying to cheat the system by repeating movements of minimal intensity for the algorithm to retrain on. Methods for identifying cheating beyond simply thresholding on intensity have been investigated [11] and might be applied to this work. This can also be extended beyond the gaming environment by introducing a form of "warm-up" for any sports environment, where the user is taken through a dedicated set of tasks before his/her activity is then monitored generally thereafter. Further, the similarity of each movement could be assessed with advanced nearest-neighbor techniques based, for example, on locality-sensitive hashing, or with further unsupervised techniques where movements are clustered together to truly identify the similar and distinct movements that need to be added to the algorithm. The possibilities listed all provide opportunities for unique research extending beyond this work.

8. Conclusion
This paper presents a multiple model approach to classifying movements in a large, multiclass exergame environment with fine-grain motions. By using expert knowledge to identify similar movements, clusters and submodels per cluster result in an algorithm that more accurately classifies movements with fewer features. First, the creation of features that maintain the structure of movements, in combination with features that generalize activity over an entire window in a time series, results in better classification of these non-repetitive movements. Using only 25 features, the multiple model approach with a PUK kernel in an SVM achieves an F Score of 0.95, versus a single model, traditional approach using an RBF kernel, which achieves an F Score of 0.72. Multiple models are able to cluster together similar movements for better classification. By removing the movements that would have caused false positives or false negatives in the single model approach, the classifier is able to separate these movements and classify them against each other, finding different features that help produce a more accurate total model. Finally, once the models are selected, a retraining scenario is presented in order to improve the accuracy of the system, tailored to the specific user of that system.
The recognition algorithm presented, along with the model-importance factors, allows systems and results to be adapted based upon accuracy, importance, and real-time responsiveness. The retraining mechanism, developed in both an online and offline setting, improves single model classification by up to 17% and also improves the submodels per cluster of movements where improvement was still possible. Tracking this improvement, as well as addressing user experience over a long-term trial, will address limitations of the system design and of the algorithm. This retraining, in combination with the dynamic optimization approach, results in a large improvement in prediction per user.

References
[1] T. Park, I. Hwang, U. Lee, S.I. Lee, C. Yoo, Y. Lee, H. Jang, S.P. Choe, S. Park, J. Song, Exerlink: enabling pervasive social exergames with heterogeneous exercise devices, in: Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, ACM, 2012, pp. 15–28.
[2] G. Alankus, A. Lazar, M. May, C. Kelleher, Towards customizable games for stroke rehabilitation, in: Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, 2010, pp. 2113–2122.
[3] B. Mortazavi, S. Nyamathi, S. Lee, T. Wilkerson, H. Ghasemzadeh, M. Sarrafzadeh, Near-realistic mobile exergames with wireless wearable sensors, IEEE J. Biomed. Health Inform. 18 (2) (2014) 449–456. http://dx.doi.org/10.1109/JBHI.2013.2293674.
[4] K. Gerling, I. Livingston, L. Nacke, R. Mandryk, Full-body motion-based game interaction for older adults, in: Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems, ACM, 2012, pp. 1873–1882.
[5] A. Whitehead, H. Johnston, N. Nixon, J. Welch, Exergame effectiveness: what the numbers can tell us, in: Proceedings of the 5th ACM SIGGRAPH Symposium on Video Games, ACM, 2010, pp. 55–62.
[6] O.D. Lara, M.A. Labrador, A survey on human activity recognition using wearable sensors, IEEE Commun. Surv. Tutor. 15 (3) (2013) 1192–1209.
[7] B. Mortazavi, N. Alsharufa, S.I. Lee, M. Lan, M. Sarrafzadeh, M. Chronley, C.K. Roberts, MET calculations from on-body accelerometers for exergaming movements, in: 2013 IEEE International Conference on Body Sensor Networks (BSN), IEEE, 2013, pp. 1–6.
[8] A.P. Macvean, Developing adaptive exergames for adolescent children, in: Proceedings of the 11th International Conference on Interaction Design and Children, ACM, 2012, pp. 339–342.
[9] A. Koivisto, S. Merilampi, K. Kiili, Mobile exergames for preventing diseases related to childhood obesity, in: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, ACM, 2011, p. 29.
[10] N. Alshurafa, W. Xu, J. Liu, M.C. Huang, B. Mortazavi, C. Roberts, M. Sarrafzadeh, Designing a robust activity recognition framework for health and exergaming using wearable sensors, IEEE J. Biomed. Health Inform. (2014).
[11] N. Alshurafa, J.-A. Eastwood, M. Pourhomayoun, S. Nyamathi, L. Bao, B. Mortazavi, M. Sarrafzadeh, Anti-cheating: Detecting self-inflicted and impersonator cheaters for remote health monitoring systems with wearable sensors, in: BSN, 2014, pp. 92–97.
[12] S. Wu, A. Green, Projection of chronic illness prevalence and cost inflation, 2000.
[13] F.W. Booth, C.K. Roberts, M.J. Laye, Lack of exercise is a major cause of chronic diseases, Compr. Physiol. (2012).
[14] R. Guthold, T. Ono, K.L. Strong, S. Chatterji, A. Morabia, Worldwide variability in physical inactivity: a 51-country survey, Am. J. Prev. Med. 34 (6) (2008) 486–494.
[15] N.A. Garrett, M. Brasure, K.H. Schmitz, M.M. Schultz, M.R. Huber, Physical inactivity: direct cost to a health plan, Am. J. Prev. Med. 27 (4) (2004) 304–309.
[16] E.A. Finkelstein, O.A. Khavjou, H. Thompson, J.G. Trogdon, L. Pan, B. Sherry, W. Dietz, Obesity and severe obesity forecasts through 2030, Am. J. Prev. Med. 42 (6) (2012) 563–570.
[17] P.A. Heidenreich, J.G. Trogdon, O.A. Khavjou, J. Butler, K. Dracup, M.D. Ezekowitz, E.A. Finkelstein, Y. Hong, S.C. Johnston, A. Khera, et al., Forecasting the future of cardiovascular disease in the United States: a policy statement from the American Heart Association, Circulation 123 (8) (2011) 933–944.
[18] American Diabetes Association, et al., Economic costs of diabetes in the US in 2012, Diabetes Care 36 (2013) 1033–1046; Diabetes Care 36 (6) (2013) 1797.
[19] J. Hu, N.V. Boulgouris, Fast human activity recognition based on structure and motion, Pattern Recognit. Lett. 32 (14) (2011) 1814–1821.
[20] J. Xu, Y. Sun, Z. Wang, W. Kaiser, G. Pottie, Context guided and personalized activity classification system, in: Proceedings of the 2nd Conference on Wireless Health, ACM, 2011, p. 12.
[21] L. Chen, C.D. Nugent, H. Wang, A knowledge-driven approach to activity recognition in smart homes, IEEE Trans. Knowl. Data Eng. 24 (6) (2012) 961–974.
[22] G. Cauwenberghs, T. Poggio, Incremental and decremental support vector machine learning, Adv. Neural Inf. Process. Syst. (2001) 409–415.
[23] A. Lu, D. Thompson, J. Baranowski, R. Buday, T. Baranowski, Story immersion in a health video game for childhood obesity prevention, Games Health: Res. Dev. Clin. Appl. (2012).
[24] C. Sminchisescu, A. Kanaujia, D. Metaxas, Conditional models for contextual human motion recognition, Comput. Vis. Image Underst. 104 (2) (2006) 210–220.
[25] B. Mortazavi, M. Pourhomayoun, S. Nyamathi, B. Wu, S. Lee, M. Sarrafzadeh, Multiple model recognition for near-realistic exergaming, in: 2015 IEEE International Conference on Pervasive Computing and Communications (PerCom 2015), St. Louis, USA, 2015.
[26] T. Duong, D. Phung, H. Bui, S. Venkatesh, Efficient duration and hierarchical modeling for human activity recognition, Artificial Intelligence 173 (7) (2009) 830–856.
[27] M. Pourhomayoun, N. Alshurafa, B. Mortazavi, H. Ghasemzadeh, K. Sideris, B. Sadeghi, M. Ong, L. Evangelista, P. Romano, A. Auerbach, A. Kimchi, M. Sarrafzadeh, Multiple model analytics for adverse event prediction in remote health monitoring systems, in: 2014 IEEE Conference on Healthcare Innovation Point-of-Care Technologies (EMBS HIPT), IEEE, 2014.
[28] B. Mortazavi, S.I. Lee, M. Sarrafzadeh, User-centric exergaming with fine-grain activity recognition: A dynamic optimization approach, in: 2014 ACM Joint International Conference on Pervasive and Ubiquitous Computing (UbiComp 2014), UbiComp Workshop on Smart Health Systems and Applications, ACM, 2014.
[29] M. Stikic, K. Van Laerhoven, B. Schiele, Exploring semi-supervised and active learning for activity recognition, in: 12th IEEE International Symposium on Wearable Computers (ISWC 2008), IEEE, 2008, pp. 81–88.
[30] Shimmer3 wireless sensor platform. http://www.shimmersensing.com/images/uploads/docs/Shimmer3_Spec_Sheet_-_Jan_2014.pdf [Online; last accessed 31.03.14].
[31] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27:1–27:27. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[32] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl. 11 (1) (2009) 10–18.
[33] N. Ravi, N. Dandekar, P. Mysore, M. Littman, Activity recognition from accelerometer data, in: Proceedings of the National Conference on Artificial Intelligence, Vol. 20, AAAI Press/MIT Press, 2005, p. 1541.
[34] L. Bao, S. Intille, Activity recognition from user-annotated acceleration data, Pervasive Comput. (2004) 1–17.
[35] L.S. Prichep, A. Jacquin, J. Filipenko, S.G. Dastidar, S. Zabele, A. Vodencarevic, N.S. Rothman, Classification of traumatic brain injury severity using informed data reduction in a series of binary classifier algorithms, IEEE Trans. Neural Syst. Rehabil. Eng. 20 (6) (2012) 806–822.
[36] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2001.
[37] J. Sun, X. Wu, S. Yan, L.-F. Cheong, T.-S. Chua, J. Li, Hierarchical spatio-temporal context modeling for action recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), IEEE, 2009, pp. 2004–2011.
[38] B. Üstün, W.J. Melssen, L.M. Buydens, Facilitating the application of support vector regression by using a universal Pearson VII function based kernel, Chemometr. Intell. Lab. Syst. 81 (1) (2006) 29–40.
[39] G. Zhang, H. Ge, Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins, Comput. Biol. Chem. 46 (2013) 16–22.
[40] A. Doswald, F. Carrino, F. Ringeval, Advanced processing of sEMG signals for user independent gesture recognition, in: XIII Mediterranean Conference on Medical and Biological Engineering and Computing 2013, Springer, 2014, pp. 758–761.