Field evaluation of a random forest activity classifier for wrist-worn accelerometer data



Journal of Science and Medicine in Sport 20 (2017) 75–80

Contents lists available at ScienceDirect

Journal of Science and Medicine in Sport journal homepage: www.elsevier.com/locate/jsams

Original research

Toby G. Pavey a,b,∗, Nicholas D. Gilson b, Sjaan R. Gomersall b, Bronwyn Clark c, Stewart G. Trost a

a School of Exercise and Nutrition Sciences, Queensland University of Technology, Australia
b School of Human Movement and Nutrition Sciences, The University of Queensland, Australia
c School of Public Health, The University of Queensland, Australia

Article info

Article history: Received 5 November 2015; received in revised form 16 May 2016; accepted 16 June 2016; available online 23 June 2016

Keywords: Accelerometer; Random forest classifier; Physical activity; Wrist

Abstract

Objectives: Wrist-worn accelerometers are convenient to wear and are associated with greater wear-time compliance. Previous work has generally relied on choreographed activity trials to train and test classification models. However, evidence of validity in free-living contexts is only starting to emerge. Study aims were: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions.

Design: Twenty-one participants (mean age = 27.6 ± 6.2 years) completed seven lab-based activity trials, and 16 of them completed a 24 h free-living trial.

Methods: Participants wore a GENEActiv monitor on the non-dominant wrist. Classification models recognising four activity classes (sedentary, stationary+, walking, and running) were trained using time and frequency domain features extracted from 10-s non-overlapping windows. Model performance was evaluated using leave-one-out cross-validation. Models were implemented using the randomForest package within R. Classifier accuracy during the 24 h free-living trial was evaluated by calculating agreement with concurrently worn activPAL monitors.

Results: Overall classification accuracy for the random forest algorithm was 92.7%. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively, for the laboratory protocol. Agreement with the activPAL data (stepping vs. non-stepping) during the 24 h free-living trial was excellent and, on average, exceeded 90%. The ICC for stepping time was 0.92 (95% CI = 0.75–0.97). However, sensitivity and positive predictive values were modest. Mean bias (activPAL minus random forest) was −10.3 min/d (95% LOA = −46.0 to 25.4 min/d).

Conclusions: The random forest classifier for wrist accelerometer data yielded accurate group-level predictions under controlled conditions, but was less accurate at identifying stepping versus non-stepping behaviour in free-living conditions. Future studies should conduct more rigorous field-based evaluations using observation as a criterion measure.

© 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Physical activity (PA) has an inverse relationship with many health outcomes, including coronary heart disease, diabetes, cancers and depression, and with all-cause mortality.1 Further, there is increasing evidence that prolonged sedentary behaviour (SB) is associated with increased risk of chronic illnesses, including cardiovascular disease, cancer, diabetes and obesity, and with mortality in mid-age and older adults.2,3 As measurement of these behaviours becomes more refined, their relationship with health outcomes can be specified more clearly,4 and accurate measurement of PA and SB will improve the evaluation of future health promotion intervention strategies.

Several methods are currently available to assess PA and SB in free-living conditions. These include subjective measures, such as self-reported PA and sitting time, and objective measures, including heart rate monitoring and pedometers. In particular, the use of accelerometer-based motion sensors for the measurement of PA and SB has rapidly increased.5 Accelerometers can be used to derive a broad range of outcomes related to PA and SB, including physical activity type, energy expenditure, posture, and time spent in sedentary, light, moderate and vigorous intensity activities.

∗ Corresponding author. E-mail address: [email protected] (T.G. Pavey).

http://dx.doi.org/10.1016/j.jsams.2016.06.003 1440-2440/© 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.


Traditional approaches to classification of PA and SB using accelerometers involve regression-based ‘cut-points’, where the relationship between measured energy expenditure and accelerometer counts is modelled using linear regression techniques.6 Another commonly used technique is the receiver operating characteristic (ROC) curve, which can determine cut-point thresholds by evaluating sensitivity (true positives) and specificity (true negatives) for intensity categories.7 These methods allow time spent at different PA intensities (e.g. sedentary, light, moderate and vigorous) to be classified. However, inaccuracies with cut-point methods are well documented, particularly in predicting PA intensity across differing activities.8–10 For example, where cut-points are developed from locomotion activities, activities in which upper body movements are performed in the absence of ambulation are likely to be misclassified and underestimated.6,9 As a result, the cut-point method misclassifies PA across all intensities approximately 30% of the time, with vigorous activities misclassified approximately 50% of the time.11 Further, this method has produced an abundant and often conflicting set of published cut-points, making comparisons across studies difficult.9

A developing alternative approach to estimating PA and SB is machine learning or pattern recognition, which includes approaches such as decision trees, artificial neural networks and hidden Markov models.12–14 The resulting algorithms learn to recognize complex patterns or features in the acceleration signal and make predictions about the type and/or the intensity of PA.15 The feasibility of a pattern recognition approach has been established for accelerometers worn at the waist.16,17 However, algorithms for estimating PA from wrist data using pattern recognition techniques are only starting to emerge.13,18 Wrist-worn accelerometers are convenient to wear and offer the likelihood of better compliance with wear time requirements, as they can be worn continuously, without the need to remove them when changing clothes, showering or sleeping.19 Data from NHANES show compliance with waist-worn protocols ranging from 40 to 70% (2003–2006), compared with 70 to 80% for wrist-worn protocols (2011–2012).20

The GENEActiv is a widely used tri-axial accelerometer for measurement of PA and SB. Intensity threshold cut-points for data reduction have previously been established in adults.21,22 However, the limitations of cut-point thresholds described above apply, with cross-validation of the adult cut-points showing classification accuracy of approximately 50%.23 Pattern recognition techniques have previously been applied to the GENEActiv monitor, providing high accuracy for the recognition of lab-based activities in four activity classes (sedentary, household, walking and running).24 Further, previous work assessing pattern recognition techniques at both waist and hip locations has relied on choreographed activity trials to train and test classification models. Thus, validity in free-living contexts remains an open question.

The need to move beyond regression-based cut-points is clear. Pattern recognition techniques need to be established and refined, and then assessed in free-living conditions, for the wrist-worn location, where greater wear-time compliance is likely. Consequently, the aims of this study were to: (1) train and test a random forest activity classifier for wrist accelerometer data; and (2) determine if models trained on laboratory data perform well under free-living conditions.

2. Methods

Participants for calibration and free-living validation were part of an ongoing study assessing the validity of the GENEActiv for the measurement of SB.19 Fifty-seven participants were recruited from the University of Queensland, Australia, by convenience sampling, including word of mouth and an online university newsletter. Those who showed interest received an information sheet explaining the study and the eligibility criteria, and an invitation to join the study. To be eligible, participants had to be over the age of 18 years and ambulatory. Eligible participants provided written informed consent prior to enrolling in the study. Ethical clearance was obtained from the Medical Research Ethics Committee of the University of Queensland (#2013000870). For this calibration and validation study, data from 21 participants were randomly selected.

The protocol involved a single testing session of approximately 45 min for each participant. Self-reported demographics were collected, the GENEActiv monitor was placed on the non-dominant wrist, and the activPAL was attached to the thigh. Participants then performed various lying, sitting, standing and moving activities for approximately 30 min. The first activity for each participant was (1) lying still, followed in random order by (2) sitting still; (3) standing still; (4) sitting active (either working on a laptop or sorting papers into storage); (5) standing active (either washing dishes or cleaning windows); (6) walking at own pace; and (7) running at own pace. Each activity was undertaken for three minutes, one after the other. Before the start of the session, a research assistant explained and demonstrated each activity to be completed. During the session, one research assistant timed the activities while another instructed the participant on the next activity. For the purpose of developing a simple classifier for field-based research, the activity trials were categorised into four distinct activity classes: sedentary (lying or sitting still), stationary+ (sitting active, standing still, standing active), walking (self-paced overground walking), and running (self-paced running).
Sixteen participants then agreed to complete a 24-h free-living evaluation trial by continuing to wear the GENEActiv and activPAL after the single testing session (day one) and returning both monitors two days later (day three of the protocol). Day two provided a full 24-h monitoring period.

The GENEActiv (Activinsights Ltd., Cambridgeshire, UK) is a tri-axial, ±6 g seismic acceleration sensor, which is small (36 mm × 30 mm × 12 mm), lightweight (16 g) and water resistant, and offers a near-body temperature sensor to help confirm wear and non-wear time. GENEActiv validity studies have demonstrated strong correlations for criterion validity (r = 0.79–0.98) against indirect calorimetry for physical activity and sedentary behaviour.21,25 The GENEActiv was configured with a sampling frequency of 30 Hz. The selected sampling frequency should fulfil the Nyquist criterion, which specifies that the sampling frequency must be at least twice the highest frequency of the movement under investigation. The frequency of the majority of daily PA movements ranges between 0.3 and 3.5 Hz; hence the selected sampling frequency of 30 Hz, well above the 7 Hz minimum this implies, was more than adequate. Moreover, previous research has shown that sampling frequencies >10 Hz are not associated with greater classification accuracy.24

The activPAL device (Version 3, PAL Technologies Ltd., Glasgow, UK) is a thigh-worn inclinometer accelerometer, which continuously records posture and movement (time spent sitting/lying, standing or stepping). The device was sealed with a nitrile finger cot and a layer of Opsite film (Smith & Nephew) and attached to the skin with an additional transparent film (Tegaderm™ Roll, 3M™) in order to provide a waterproof barrier. The attachment was made to the right thigh (midline on the anterior aspect). The activPALs were initialised using the default settings. The activPAL has been shown to have high accuracy as a measure of posture (sit/lie as opposed to upright) and motion.26 Participants’ activPAL and GENEActiv data were considered valid if they reported wearing the devices for all waking hours, with less than 30 min of removal over the 24 h monitored period.


Raw tri-axial acceleration signals from the middle two minutes of each activity trial (the first and last 30 s of each trial were omitted) were parsed and segmented into 10 s non-overlapping time windows for feature extraction. Features were selected from the extensive list of time and frequency domain features described by Liu et al.12 and included the following for each axis of movement: mean, SD, 10th, 25th, 50th, 75th and 90th percentiles, MAD, lag-one autocorrelation, signal power, dominant frequency (0.25–3.0 Hz), dominant frequency magnitude (0.25–3.0 Hz), and entropy (0.25–3.0 Hz).

A random forest is an ensemble or collection of multiple decision tree models.27 Each tree is grown from a bootstrap sample of the training dataset, and each node is split using the best among a randomly selected subset of explanatory variables or features. The class predictions generated by each tree in the forest are aggregated, and the final model prediction (i.e., activity type) is based on the majority vote. A random forest model was selected because: (1) very little pre-processing of the data is required, as the features do not need to be normalized; (2) feature selection procedures are not required because the algorithm effectively does this on its own; and (3) each tree within the forest is independently grown to maximum depth using a randomly selected subset of features, making the forest resistant to overfitting the training data.28

Models were trained, tuned, and cross-validated using the “randomForest” and “caret” packages within R.29 The train function within the “caret” package was used to tune candidate values for the number of trees (ntree) and the number of features to consider for splitting at every node (mtry). In the current study, accuracy was optimized with 100 trees and six features randomly sampled as candidates at each split. Model cross-validation was completed using leave-one-out cross-validation (LOOCV).
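The windowing and feature-extraction step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R code used in the study. It computes only a subset of the listed features (MAD, signal power and entropy are left out because their exact definitions are not given here), and the function names and the (n_samples, 3) input layout are assumptions made for the example.

```python
import numpy as np

FS = 30          # sampling frequency in Hz (from the text)
WINDOW_S = 10    # window length in seconds (from the text)


def segment(signal, fs=FS, window_s=WINDOW_S):
    """Split an (n_samples, 3) array into non-overlapping windows."""
    n = fs * window_s
    n_windows = signal.shape[0] // n          # drop any incomplete tail
    return signal[: n_windows * n].reshape(n_windows, n, signal.shape[1])


def axis_features(x, fs=FS, band=(0.25, 3.0)):
    """Illustrative subset of per-axis features for one window."""
    centred = x - x.mean()
    denom = np.sum(centred ** 2)
    # Lag-one autocorrelation; defined here as 0 for a constant signal.
    lag1 = np.sum(centred[:-1] * centred[1:]) / denom if denom > 0 else 0.0
    # Magnitude spectrum of the mean-removed signal.
    spectrum = np.abs(np.fft.rfft(centred))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak = np.argmax(spectrum[in_band])       # largest in-band component
    return {
        "mean": x.mean(),
        "sd": x.std(ddof=1),
        **{f"p{q}": np.percentile(x, q) for q in (10, 25, 50, 75, 90)},
        "lag1_autocorr": lag1,
        "dominant_freq": freqs[in_band][peak],
        "dominant_freq_mag": spectrum[in_band][peak],
    }


def window_features(window):
    """Concatenate the per-axis features of one (n, 3) window."""
    feats = {}
    for i, axis in enumerate("xyz"):
        for name, value in axis_features(window[:, i]).items():
            feats[f"{axis}_{name}"] = float(value)
    return feats
```

With 30 Hz data, each 10 s window holds 300 samples, so the spectrum has a resolution of 0.1 Hz across the 0.25–3.0 Hz band.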
In LOOCV, the prediction model is trained on data from all of the participants except one, who is “held out” and used as the test dataset. The process is repeated until all participants have served as the test dataset, and the accuracy results are averaged. Accuracy was evaluated by calculating sensitivity, specificity, positive predictive value, negative predictive value, and balanced accuracy for each activity class (sedentary, stationary+, walking and running).

Classifier performance during the 24 h free-living trial was evaluated by examining agreement between the random forest classifier and the activPAL monitor. For this comparison, the time-stamped event data from the activPAL output were converted to 10 s epochs and aligned with the 10 s time windows scored by the random forest classifier. For each 10 s epoch of activPAL data, activity type was calculated by multiplying the numeric value assigned to each activity type (1 = sit/lie, 2 = stand, 3 = step) by the proportion of the 10 s interval spent in that activity and summing the products. The resulting values were rounded to the nearest integer. Thus, for 10 s epochs with multiple activPAL codes (e.g. a combination of standing and stepping), the assigned activity type represented the code recorded for the majority of the 10 s interval. This approach provided a realistic assessment of performance under free-living conditions and contrasted with the approach of other field-evaluation studies, which excluded segments with mixed activity types from the analysis.13,18 Because the stationary+ activity class comprised dynamic sitting and standing without ambulation, it was not possible to directly align this activity class with the activPAL’s sitting or standing categories.
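The epoch-level scoring of the activPAL event data described above can be illustrated with a short Python sketch. The numeric codes and the 10 s epoch length come from the text; the function name, the input format (seconds spent in each activity within one epoch) and the handling of exact ties are assumptions made for the example.

```python
import math

CODES = {"sit/lie": 1, "stand": 2, "step": 3}   # numeric codes from the text
EPOCH_S = 10                                    # epoch length in seconds


def epoch_code(seconds_by_activity, epoch_s=EPOCH_S):
    """Time-weighted activPAL code for one epoch, rounded to an integer."""
    total = sum(seconds_by_activity.values())
    if not math.isclose(total, epoch_s):
        raise ValueError("durations must sum to the epoch length")
    weighted = sum(
        CODES[activity] * seconds / epoch_s
        for activity, seconds in seconds_by_activity.items()
    )
    # Round half up; how exact ties are resolved is an assumption,
    # as the text does not state it.
    return math.floor(weighted + 0.5)


# 3 s standing + 7 s stepping: 2 * 0.3 + 3 * 0.7 = 2.7, scored as stepping.
mixed_epoch = epoch_code({"stand": 3, "step": 7})
```

For a mixture of two adjacent codes, such as standing and stepping, the rounded value coincides with the majority code. Taken literally, a mixture of sit/lie and stepping can yield the intermediate code 2; under a stepping vs. non-stepping dichotomy such an epoch is still scored as non-stepping.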
Thus, activity class predictions from the random forest classifier and the activPAL were dichotomised as stepping (activPAL stepping; random forest walking and running) or non-stepping (activPAL sit/lie or stand; random forest sedentary or stationary+) behaviour. Agreement between the random forest classifier and activPAL was evaluated by calculating sensitivity, specificity, positive predictive value, negative predictive value, and balanced accuracy. In addition, agreement between random forest and activPAL predicted time spent (min) in stepping was evaluated using the intraclass correlation coefficient (ICC) and Bland–Altman plots.

3. Results

All 21 participants (mean age = 27.6 ± 6.2 years, 62% male) completed the seven laboratory-based activity trials wearing a GENEActiv accelerometer, while 16 participants completed the free-living assessment. These 16 participants did not differ from the larger sample with respect to physical activity levels and demographics.

Averaged over all activity classes, classification accuracy for the random forest classifier during LOOCV was 92.7 ± 0.07%, with a weighted Kappa of 0.88 ± 0.12, for the laboratory results. Table 1 presents the confusion matrix for the activity type predictions. Recognition accuracy for sedentary, stationary+, walking, and running was 80.1%, 95.7%, 91.7%, and 93.7%, respectively, for the laboratory results. Most of the 20% of sedentary trials that were misclassified were classified as stationary+. Table 1 shows that sensitivity, specificity and positive predictive value (PPV; precision) were high. When predictions were dichotomised as “stepping vs non-stepping”, recognition accuracy for stepping was 93.5%. Sensitivity, specificity, PPV, NPV, and balanced accuracy were 93.5%, 98.9%, 97.2% and 96.2%, respectively for the laboratory results.

Table 1
Confusion matrix for classification of activities in the lab-based protocol (including sensitivity, specificity, PPV, NPV, and balanced accuracy).

Activity class       Sedentary   Stationary+   Walk   Run
Sedentary            0.80        0.19          0      0
Stationary+          0.02        0.95          0.01   0
Walk                 0           0.07          0.91   0
Run                  0           0.05          0.01   0.93
Sensitivity          0.80        0.96          0.92   0.94
Specificity          0.98        0.89          0.99   0.99
PPV                  0.88        0.92          0.94   0.98
NPV                  0.97        0.94          0.99   0.99
Balanced accuracy    0.89        0.93          0.95   0.97

PPV, positive predictive value; NPV, negative predictive value.

Table 2 presents agreement with the activPAL data (stepping vs. non-stepping) during the 24 h free-living evaluation. Overall agreement exceeded 90% on average. The random forest classifier exhibited high specificity for activPAL-measured non-stepping, but sensitivity and PPV were modest, and the Kappa statistic was moderate.30 Fig. 1 presents a scatterplot of the random forest classifier and activPAL daily stepping time. Points on the line of identity indicate that the random forest estimate of stepping time was identical to that of the activPAL.
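The agreement statistics used for the stepping vs. non-stepping comparison can be computed from epoch-level labels as in the Python sketch below. The function name is hypothetical, unweighted Cohen's kappa is assumed for the Kappa statistic, and the code assumes that both classes occur in the reference and in the predictions. The epoch counts in the example are invented purely for illustration and are not study data.

```python
def binary_agreement(reference, predicted):
    """Agreement statistics for two equal-length sequences of booleans
    (True = stepping). `reference` plays the role of the activPAL."""
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)              # both stepping
    tn = sum((not r) and (not p) for r, p in pairs)  # both non-stepping
    fp = sum((not r) and p for r, p in pairs)
    fn = sum(r and (not p) for r, p in pairs)
    n = tp + tn + fp + fn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sens + spec) / 2,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


# Hypothetical day of 1000 epochs in which stepping is rare (60 epochs):
# 30 true positives, 30 false negatives, 30 false positives, 910 true negatives.
reference = [True] * 60 + [False] * 940
predicted = [True] * 30 + [False] * 30 + [True] * 30 + [False] * 910
stats = binary_agreement(reference, predicted)
```

In this invented example, overall accuracy is 0.94 even though sensitivity and PPV are both 0.50 and kappa is about 0.47. When non-stepping epochs dominate the day, high overall accuracy and specificity can therefore coexist with modest sensitivity and PPV, which is the pattern described for the free-living comparison.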
Stepping time estimates from the random forest classifier were strongly correlated with those from the activPAL, with an ICC of 0.92 (95% CI = 0.75–0.97). When the participant with approximately 250 min per day of stepping was excluded from the dataset, the ICC for stepping time was 0.72 (95% CI = 0.31–0.91). Supplementary Fig. 1 displays a Bland–Altman plot depicting the mean bias and limits of agreement (LOA) for activPAL and random forest classifier measured stepping time. Mean bias (activPAL minus random forest) was −10.3 min/d (95% LOA = −46.0 to +25.4 min/d), indicating that, at the group level, the random forest model tended to overestimate stepping time relative to the activPAL. Supplementary Fig. 2 displays a Bland–Altman plot in which the participant with approximately 250 min per day of stepping was excluded, and shows that any systematic bias was an artifact related to this outlier.

Table 2
Accuracy, sensitivity, specificity, PPV, NPV, balanced accuracy, and kappa for random forest classifier measured stepping and non-stepping in free-living contexts.

                     Mean (SD)
Accuracy             93.7 (2.4%)
Sensitivity          53.8 (11.5%)
Specificity          96.3 (1.4%)
PPV                  47.7 (11.7%)
NPV                  96.9 (1.7%)
Balanced accuracy    75.1 (5.7%)
Kappa                0.47 (0.10)

PPV, positive predictive value; NPV, negative predictive value.

Fig. 1. Random forest predicted vs activPAL predicted stepping time (AP, activPAL; RF, random forest classifier).

4. Discussion

The aims of this study were to train and test a random forest activity classifier for wrist accelerometer data, and to determine if a model trained on laboratory data performed well under free-living conditions. To our knowledge, this is one of only a few studies to assess the performance of a PA classifier for wrist-worn accelerometer data in free-living conditions.13,18 The random forest classifier exhibited high overall accuracy, sensitivity, and specificity in the laboratory-based cross-validation, but considerably lower sensitivity and positive predictive value for stepping versus non-stepping behaviours under free-living conditions.

The classification accuracy of 93% during LOOCV is similar to that of previous wrist-based pattern recognition studies conducted in laboratory-based settings. Zhang and colleagues reported accuracy of 96% with four activity classes (sedentary, household, walking and running), using a decision tree method and LOOCV.24 Ellis and colleagues reported accuracy of 88% across eight activities (including household activities, stairs, walking and running), using a random forest classifier.
The slightly lower accuracy of 88% can be attributed to the inclusion of a stair activity, which we did not include and which was a source of misclassification with walking.31 Our classification accuracy and those previously reported for the wrist are comparable to those reported for accelerometers worn at the waist,24,31,32 suggesting that, in a laboratory environment, positioning the monitor at either the waist or the wrist will yield similar results.

The lower classification accuracy for sedentary activities (80%) was notable in the present study. Misclassification occurred through stationary+, suggesting some common patterning features for wrist-worn data from non-ambulant movements. Staudenmayer et al. assessed a range of data reduction models (including linear regression, decision trees, and a random forest classifier) for the assessment of PA and SB using a wrist-worn ActiGraph.8 For the outcomes ‘sedentary’ or ‘not sedentary’ and ‘locomotion’ or ‘not locomotion’, the random forest classifier’s performance was 96% and 99%, respectively. However, ‘standing’ was included in the sedentary class. Performance in free living was investigated using direct observation, but only in two people for 2 h, compared with our sample of 16 for 24 h. It was suggested that the results were promising, but only for one of the two participants.

Our free-living results for stepping and non-stepping versus the activPAL showed high overall accuracy and specificity. However, the propensity for the RF classifier to misclassify activPAL stepping epochs as non-stepping resulted in substantially lower sensitivity for the combined walking and running class, and moderate agreement based on Kappa. Because our classifier was trained on continuous self-paced walking and running trials performed in the laboratory, it is likely that slow-paced walking or lifestyle activities with brief intermittent periods of ambulatory movement were misclassified as non-stepping. This highlights the challenges of using lab-based activity recognition algorithms to assess free-living PA behavior. Gyllensten and Bonomi have shown reduced accuracy for waist-worn accelerometer data when assessed in daily living, suggesting that acceleration features obtained in the lab differ from those in free living.32 Measurement at the wrist may be particularly challenging given the number of arm movements throughout the day, providing greater inter-person variability, as highlighted by our Bland–Altman limits of agreement.

Our findings are consistent with the results of two recently published studies evaluating the performance of machine learning activity recognition algorithms trained on free-living accelerometer data. Sasaki et al.13 compared activity recognition rates of classifiers trained on laboratory and free-living tri-axial accelerometer data in older adults.
In the laboratory setting, the classifier trained on structured activity trials exhibited high recognition accuracy (87–96%) for locomotion, sedentary activities, household activities, recreational activities, and standing. However, when applied to data collected under free-living conditions, recognition accuracy decreased substantially (49–55%). When the models were trained on free-living accelerometer data, recognition accuracy improved to 58–69%; however, none of the algorithms were deemed sufficiently accurate for assessment of free-living activity behavior. Most recently, Ellis and colleagues18 established the feasibility of training a machine learning activity recognition algorithm on free-living accelerometer data using wearable camera images collected over a 7-d monitoring period as ground truth labels. Balanced accuracy (the average of sensitivity and specificity) for recognition of sitting, sitting in a motor vehicle, walking/running, and standing was 88% and 83% for the hip and wrist algorithms, respectively. However, sensitivity (0.57), specificity (0.98), and balanced accuracy (0.78) for recognition of walking/running at the wrist were comparable to the levels achieved by our lab-based RF classifier for recognition of stepping activity. Moreover, 32% of the minutes coded as standing were misclassified as walking/running. Therefore, although there is consistent evidence that classifiers trained on free-living data outperform algorithms trained on laboratory-based data, the relatively small improvements in accuracy achieved by free-living classifiers and the modest recognition rates observed for key activities such as standing and walking suggest that more research is needed to develop more accurate activity recognition algorithms for free-living accelerometer data.
Although use of the activPAL for free-living comparison limited assessment to stepping and non-stepping only, it did provide valuable insights on agreement. This is the first study to conduct a free-living evaluation of a machine learning algorithm for the GENEActiv worn on the wrist, and the choice of the activPAL provided a means of evaluating performance over a full 24 h period. Subsequent studies in this area need to build on these data using more comprehensive assessments. Future studies should develop and test classifiers with more rigorous field-based evaluations based on direct observation or video assessments. However, it has been suggested that there is no ‘gold standard’ for activity type classification in real life.33 Attaining ground truth may be difficult in the free-living context, where movements can be transient and therefore difficult for observers to label. Nevertheless, Lyden and colleagues successfully used direct observation over three days in 13 participants as a criterion measure when assessing machine learning methods for a hip-placed monitor,34 although this required extensive researcher training. Other options for capturing ground truth include PA diaries, use-of-time tools, or wearable cameras (e.g. the SenseCam).18 However, these may be limited to less frequent, longer-duration activities.34

The strengths of the study were the use of a random forest classifier, which was trained on a variety of lab-based movements and then assessed on free-living data. Further, we introduced the stationary+ activity class to assess movement without ambulation. This activity class is particularly applicable to specific populations that may sit or stand a lot (e.g. office, transport and factory workers).
A further justification for a stationary+ category is the consistent finding that a single monitor placed at the wrist or hip cannot reliably differentiate sitting and standing in free-living or simulated free-living contexts.8,24 As previously discussed, the use of the activPAL limited assessment to stepping and non-stepping only, but a binary outcome has the advantage of providing performance indicators that are easier to interpret.33 It should be acknowledged that merging into two classes (stepping and non-stepping) simplifies the problem and that accuracy would likely be lower if the four classes were evaluated separately. However, physical activity interventions typically promote ambulatory activities such as walking, so evaluating the performance of a classifier that differentiates stepping from non-stepping has strong relevance to the field, especially as cut-point methods applied to the wrist cannot differentiate output (g units) recorded during sitting and standing with arm movements from that recorded during walking/running. A dichotomous approach to classifier evaluation has also been used by others.8,33 We were also limited to a small number of participants with a narrow demographic for training and free-living assessment, which affects generalisability to specific populations (e.g. older adults).

5. Conclusions

A random forest classifier trained and cross-validated on lab-based activity data provided high overall accuracy for wrist-worn data, which was similar to accuracy estimates previously obtained at the waist. However, this method is less accurate at identifying stepping versus non-stepping behaviour in free-living conditions. These findings highlight the challenges of applying machine learning to activity monitor assessment in free-living conditions, and the need to train prediction models using free-living data.

6. Practical implications

• Activity recognition algorithms trained on lab-based data are less accurate at identifying stepping versus non-stepping behaviour in free-living conditions.
• Accuracy of the random forest classifier for wrist-worn data was similar to waist-worn estimates.
• Further examination of random forest classifiers using more rigorous field-based assessments is required.


Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jsams.2016.06.003.

