Performance comparison between wrist and chest actigraphy in combination with heart rate variability for sleep classification


Accepted Manuscript

Md Aktaruzzaman, Massimo Walter Rivolta, Ruby Karmacharya, Nello Scarabottolo, Luigi Pugnetti, Massimo Garegnani, Gabriele Bovi, Maurizio Ferrarin, Roberto Sassi

PII: S0010-4825(17)30259-7
DOI: 10.1016/j.compbiomed.2017.08.006
Reference: CBM 2746
To appear in: Computers in Biology and Medicine
Received Date: 23 February 2017
Revised Date: 3 August 2017
Accepted Date: 3 August 2017

Please cite this article as: M. Aktaruzzaman, M.W. Rivolta, R. Karmacharya, N. Scarabottolo, L. Pugnetti, M. Garegnani, G. Bovi, M. Ferrarin, R. Sassi, Performance comparison between wrist and chest actigraphy in combination with heart rate variability for sleep classification, Computers in Biology and Medicine (2017), doi: 10.1016/j.compbiomed.2017.08.006. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT


Performance comparison between wrist and chest actigraphy in combination with heart rate variability for sleep classification

Md Aktaruzzaman (a, b, ∗), Massimo Walter Rivolta (a), Ruby Karmacharya (a), Nello Scarabottolo (a), Luigi Pugnetti (c), Massimo Garegnani (c), Gabriele Bovi (c), Maurizio Ferrarin (c), Roberto Sassi (a)

(a) Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
(b) Department of Computer Science and Engineering, Islamic University, Kushtia, Bangladesh
(c) IRCCS S. Maria Nascente, Fond. Don Carlo Gnocchi Onlus, Milan, Italy

Abstract

The concurrent use of actigraphy and heart rate variability (HRV) for quantifying sleep efficiency is still a matter of investigation. This study compared chest (CACT) and wrist (WACT) actigraphy (actigraphs positioned on the chest and wrist, respectively) in combination with HRV for automatic sleep vs wake classification. Accelerometer and ECG signals were collected during polysomnographic studies (PSGs) including 18 individuals (25 to 53 years old) with no previous history of sleep disorders. Then, an experienced neurologist performed sleep staging on PSG data. Eleven features from HRV and accelerometry were extracted from series of different lengths. A support vector machine (SVM) was used to automatically distinguish sleep and wake. We found 7 minutes to be the optimal signal length for classification, while maximizing specificity (wake detection). CACT and WACT provided similar accuracies (78% chest vs 77% wrist), higher than that yielded by HRV alone (66%). The addition of HRV to CACT slightly reduced the accuracy, while improving specificity (from 33% to 51%, p < 0.05). On the contrary, the concurrent use of HRV and WACT did not provide statistically significant improvements over WACT. Then, a subset of features (3 from HRV + 1 from actigraphy) was selected by reducing redundancy using a strategy based on Spearman’s correlation and area under the ROC curve. The reduced set of features with the SVM classifier gave only slightly reduced classification performances, which did not differ from those of the full sets of features. The study opens interesting possibilities in the design of wearable devices for long-term monitoring of sleep at home.

∗ Corresponding author: Md Aktaruzzaman, while performing this study, was with the Dipartimento di Informatica, Università degli Studi di Milano, Italy. He is now with the Department of Computer Science and Engineering, Islamic University, Kushtia 7003, Bangladesh. Phone: +880 71 62201 (ext 2313). Email address: [email protected] (Md Aktaruzzaman)

Preprint submitted to Computers in Biology and Medicine, August 3, 2017

Keywords: Sleep scoring; Heart rate variability; Actigraphy; SVM classifier; Wearable sensors

1. Introduction

Sleep is a dynamic process that varies from day to day. It plays a significant role in the genesis and onset of several pathologies such as cardiac, neurological and metabolic disorders [1]. The quality of sleep strongly influences natural processes like learning, memorization and concentration [2]. Today, mobile health applications represent the new paradigm of tele-home care monitoring that combines the standard telemedicine approach with the latest Internet of Things (IoT) concept [3]. In this context, the development of a new generation of sensors has made it possible to aggregate different types of sensing units in a single device. Typical examples are wearable T-shirts with actigraphy (ACT) and electrocardiographic (ECG) sensors or smartwatches with skin conductance and ACT. In this work, we explored the possibility of assessing the quality of sleep of healthy subjects in their day-to-day lives by analysing signals which could be easily integrated in a single wearable device. In particular, we considered a combination of ECG and ACT collected at the chest.

The standard assessment of sleep is done professionally in a sleep laboratory using whole-night polysomnography (PSG), which requires signals from multiple sensors like electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG). Unfortunately, the use of PSG for sleep staging faces some serious limitations such as the need for expensive equipment, a special sleep laboratory, and trained personnel, which make PSG unsuitable for monitoring large populations. Besides these, the normal sleep behavior of a person can even be affected by the new environment (typically a dedicated room inside the hospital). These limitations of PSG have motivated the pursuit of alternative methods for long-term evaluation of sleep at home. Many researchers [4, 5, 6, 7] have proposed sleep staging from heart rate variability (HRV, i.e., the variation in duration of time intervals between successive heart beats), which has the advantages of being low cost and noninvasive. However, sleep staging from HRV alone is still far from widespread practical implementation due to its lower accuracy [7, 8, 9].

When one is interested only in sleep quality (rather than in estimating each sleep phase), another alternative is the use of ACT, typically recorded at the wrist using a watch-like device. The main advantages of activity monitors based on ACT are their small size, very low price and the fact that they are comfortable, noninvasive devices. The use of ACT for distinguishing sleep and wakefulness was first validated by Kripke et al. [10] in 1978. While already used in clinical practice, it is still an active field of research [11, 12, 13, 14]. However, its accuracy decreases as sleep becomes more fragmented [15]. Being based only on acceleration data, it might underestimate wake (i.e., wake is classified as sleep), particularly during low-movement activities such as sitting quietly, or lying in bed while reading or watching television.

In this proof-of-concept study, we were interested in verifying whether the combination of chest ACT (CACT; actigraph placed on the chest) with HRV performs better than, or at least similarly to, wrist ACT (WACT; actigraph placed on the wrist) with or without HRV. The rationale underlying this question is that, if they provide similar performances, then all sensors can be integrated in a single device capable of collecting both ECG and ACT (a chest belt or a sensorized T-shirt). This might facilitate the use of such systems by inexperienced users, as well as simplify long-term sleep monitoring at home. While integrated devices with somewhat similar characteristics do exist for fitness or well-being purposes, to the best of our knowledge, no relevant data have been published with respect to sleep staging. Of note, in infants, several studies have already investigated the use of ACT positioned at locations other than the wrist for assessing sleep quality [16, 6, 17, 12, 18]. Finally, the computational burden on these integrated sensors must be kept as small as possible, to limit the drain on the batteries and extend their operational time. With this in mind, we also investigated the possibility of reducing the number of operations required (i.e., the number of features to be estimated) while obtaining acceptable accuracy.


2. Materials and Methods

A brief overview of the analyses performed in this study is represented by the block diagram in figure 1.

[Block diagram. Inputs: ECG, Wrist Accelerometry, Chest Accelerometry. Stages: Preprocessing, Best Combination of Epochs, Feature Extraction & Selection, Classification.]

Figure 1: Diagram of the analysis performed.

2.1. Data collection

A cohort of 18 volunteers (age between 25 and 53 years, females: 8), with no previous history of sleep disorders, was specifically enrolled for this study. Initially, potential participants were identified through a call for volunteers in the research centers involved in the study, and afterwards the recruitment continued by “word of mouth”. All participants signed an informed consent, adhering to the principles outlined in the Helsinki Declaration. The presence of sleep disorders was first assessed simply by interviewing the volunteers at enrollment (any subject reporting known or suspected sleep disorders, or conditions which might affect sleep, e.g., anxiety or stress, was not considered further for the study). Then, after PSG, data were analyzed by an expert neurologist and, in case sleep alterations were detected, volunteers and their data were excluded from the rest of the study (no subject needed to be excluded for this reason, while 3 recordings were excluded for low signal quality, e.g., electrode detachments).

PSG data were collected at home for one night using a simplified setting composed of 1 bipolar EOG, 2 unipolar ECG and 2 unipolar EEG, recorded using the Xltek Trex HD ambulatory EEG/PSG (Natus Medical Incorporated, USA). All the signals were acquired at a sampling rate of 200 Hz. The bipolar EOG was recorded by placing the two electrodes on the lower-outer edge of the right eye and on the upper-outer edge of the left eye, respectively. The two unipolar ECG signals were obtained by placing two electrodes on the left upper chest. Then, a single bipolar signal was derived by computing their difference. The bipolar ECG lead was approximately aligned with the standard lead III. Finally, EEG leads were located on the right frontopolar position (Fp2) and on the right mastoid (M2) to obtain the bipolar derivation Fp2-M2.

Acceleration data were collected simultaneously with two triaxial accelerometers (GENEActiv, Activinsights Limited, UK) worn at the chest and the wrist. Each accelerometer contained a 3D MEMS sensor with 12-bit resolution, an output range of ±8 g, and a sampling rate of 50 Hz (using the standard hardware filtering settings of the device). The device has recently been validated for sleep vs wake classification [19], while worn at the wrist. The two accelerometers and the Xltek Trex HD ambulatory EEG/PSG were manually synchronized using the internet clock time before each PSG study.

Proper instructions about the usage of the devices were given to the subjects by two expert neurologists at the Don Gnocchi Hospital, Milan, Italy, so that they could perform the recordings themselves at home. Each volunteer collected the devices from the hospital on the day of the study, and then went home after the electrodes were placed. The devices were then brought back to the hospital by the volunteer the next morning. Data were downloaded and the sleep scoring was manually assessed by the same neurologists on each 30-s epoch, according to the guidelines of the AASM [20]. The scoring was carried out using Natus SleepWorks ver. 6.3 (Natus Medical Incorporated, USA). The description of each individual and her/his sleep characteristics (wake and sleep duration) during the night of recording is shown in Table 1.

2.2. Preprocessing of ECG and Accelerometry

Heart beat locations and labels were determined on the bipolar ECG lead using the gqrs algorithm, freely available from Physionet [21]. The algorithm, based on a strategy similar to the classical one by Pan & Tompkins [22], located the QRS complexes of each ECG signal, and the inter-beat time interval series (RR series) was then computed. RR intervals corresponding to an instantaneous heart rate below 30 bpm or above 180 bpm were likely due to low signal quality or beat misdetection and were therefore removed from the series. RR intervals were further discarded at the time of feature extraction when falling outside the range from Q1 − 3 × (Q3 − Q1) to Q3 + 3 × (Q3 − Q1), where Qi was the ith quartile of the RR series under analysis (i.e., an RR segment). Segments with more than 50% of RR intervals detected as artifacts were not considered further in this study. Consequently, when a segment was discarded, the corresponding series of the acceleration signal was discarded as well. An example of RR series during a single WAKE and SLEEP epoch is shown in figure 2a.

We computed the vector magnitude (VM) of the triaxial acceleration signals as follows:

\[ \mathrm{VM}_j = \sqrt{X_j^2 + Y_j^2 + Z_j^2}, \tag{1} \]

where j was the time index and X_j, Y_j, Z_j the three components of the acceleration vector. An early study [23] reported that human activity frequency in adults lies below 20 Hz, and almost all of the signal energy is limited to 5 Hz. Moreover, most of the commercially available actigraphs employed in sleep studies use frequency content between 0.25 and 3 Hz [24]. Therefore, the VM signal was bandpass filtered in the range 0.25 to 3 Hz (3rd order Butterworth; zero-phase forward and reverse filtering) to remove slow gravitational drifts and vibrations not related to subject movements. Filtered vector magnitudes of wrist and chest triaxial accelerations during wake and sleep (one epoch) for an individual are shown in figure 2b and 2c, respectively.

No.      Age [y]    Gender   TIB [h]       TST         TWT         SE
1        51         M        5.75          573         117         0.83
2        46         F        6.63          661         134         0.83
3        28         F        8.72          870         177         0.83
4        25         F        7.04          732         113         0.87
5        32         M        5.90          600         108         0.85
6        29         M        6.92          435         396         0.52
7        30         M        8.43          840         171         0.83
8        25         F        8.28          683         311         0.69
9        26         F        7.12          742         112         0.87
10       29         M        8.08          774         196         0.80
11       52         M        6.82          666         152         0.81
12       53         F        7.73          708         220         0.76
13       26         M        8.53          714         309         0.70
14       45         M        7.46          603         292         0.67
15       27         F        8.81          611         446         0.58
16       25         M        7.08          699         151         0.82
17       25         M        7.19          705         158         0.82
18       26         F        7.83          717         223         0.76
Median:  29                  7.33          702         174         0.82
IQR:     (26-45)             (6.93-8.28)   (611-732)   (134-292)   (0.70-0.83)

Table 1: Age, gender and sleep statistics for each individual included in the study population. Here, TIB: time in bed; TST: total sleep time, as number of 30-s epochs of SLEEP; TWT: total wake time, as number of 30-s epochs of WAKE; SE: sleep efficiency, computed as the ratio between sleep time and total bed time, i.e., TST/(TST + TWT).

[Figure 2 plots: (a) RR [s], (b) Wrist VM [g], (c) Chest VM [g], each versus Time [s] from 0 to 60 s, with the WAKE epoch followed by the SLEEP epoch.]

Figure 2: Examples of collected signals, during a single epoch of WAKE, followed by a single epoch of SLEEP: (a) RR series; (b) VM of wrist actigraphy; (c) VM of chest actigraphy.
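The artifact-rejection rules of section 2.2 and equation (1) can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the study: the function names are hypothetical, the quartiles are taken from `statistics.quantiles` with the inclusive method (the quartile estimator is not specified in the text, so this is an assumption), and the 0.25 to 3 Hz Butterworth filtering is omitted to keep the example free of external dependencies.

```python
import math
import statistics


def remove_implausible_rr(rr_s, min_bpm=30.0, max_bpm=180.0):
    """Drop RR intervals (seconds) whose instantaneous heart rate
    falls outside [min_bpm, max_bpm]."""
    return [rr for rr in rr_s if rr > 0 and min_bpm <= 60.0 / rr <= max_bpm]


def quartile_fence(rr_segment, k=3.0):
    """Keep RR values inside [Q1 - k*(Q3 - Q1), Q3 + k*(Q3 - Q1)].

    Returns the retained values and the fraction of rejected values.
    Assumption: quartiles from statistics.quantiles(method="inclusive").
    """
    q1, _, q3 = statistics.quantiles(rr_segment, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [rr for rr in rr_segment if low <= rr <= high]
    return kept, 1.0 - len(kept) / len(rr_segment)


def segment_is_usable(rejected_fraction, max_rejected=0.5):
    """A segment is dropped when more than 50% of its RR intervals
    were flagged as artifacts."""
    return rejected_fraction <= max_rejected


def vector_magnitude(x, y, z):
    """Equation (1): sample-by-sample Euclidean norm of the three axes."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]
```

In this sketch the heart-rate limits are applied to the whole RR series, while the quartile fence is meant to be applied segment by segment at feature-extraction time, mirroring the order described in the text.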


2.3. Features extraction

The performance of a classifier depends on how uniquely the objects of a class can be represented by the selected features. Thus, feature extraction is a critical step of any classification problem. Many features have been reported [7, 9, 13] for the classification of SLEEP vs WAKE based on either HRV or actigraphy. Among them, we selected those which proved effective in previous studies [7], while avoiding evident redundancy and keeping the dimension of the feature set small. We thus extracted a set of 7 features from the RR series and a set of 4 features from each filtered VM signal. The features were extracted from the beginning to the end of each recording, considering bed time only, on segments of optimal length (as determined in section 2.5).

2.3.1. HRV features

From the RR series, we computed three traditional time domain parameters and one frequency domain index [25], one regularity measure, one scaling feature and a newly introduced feature, which is sensitive to the effects of non-linearity, non-stationarity, and non-gaussianity in a series [26].

The time domain features included the average RR (MeanRR), the standard deviation (SDNN) and the root mean square of successive differences (RMSSD). In order to account for the inter-subject variability of the heart rate, MeanRR was computed on a transformed RR series: the original RR series was normalized by subtracting its average and dividing by its standard deviation. Since most of the samples belonged to SLEEP (∼70−80%), the average and standard deviation of RR values were heavily weighted by the SLEEP data. In this way, the normalized RR values of all subjects were approximately 0 during SLEEP and differed from 0 during WAKE.
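As an illustration of the three time domain features, the sketch below normalizes the RR series of one recording and then computes MeanRR on the normalized values and SDNN and RMSSD on the values in seconds. The function names are hypothetical, and two details are assumptions not stated in the text: the normalization uses the mean and population standard deviation of the whole recording, and SDNN is the sample standard deviation.

```python
import math
import statistics


def normalize_rr(rr_recording):
    """Z-score the RR series of one recording (subtract the mean,
    divide by the standard deviation).

    Assumption: population standard deviation over the whole recording.
    """
    mu = statistics.fmean(rr_recording)
    sd = statistics.pstdev(rr_recording)
    return [(rr - mu) / sd for rr in rr_recording]


def time_domain_features(rr_segment, rr_segment_normalized):
    """Return (MeanRR, SDNN, RMSSD) for one segment.

    MeanRR uses the normalized values; SDNN and RMSSD use the
    RR values in seconds.
    """
    mean_rr = statistics.fmean(rr_segment_normalized)
    sdnn = statistics.stdev(rr_segment)
    diffs = [b - a for a, b in zip(rr_segment, rr_segment[1:])]
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    return mean_rr, sdnn, rmssd
```

In practice `rr_segment_normalized` would be the slice of the normalized recording that matches `rr_segment`, so that MeanRR is close to 0 in SLEEP segments and departs from 0 in WAKE segments.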

In the frequency domain, we computed the power in the low frequency (LF: 0.04−0.15 Hz) and high frequency (HF: 0.15−0.4 Hz) bands using parametric spectral analysis based on autoregressive (AR) models. The ratio LF/HF was then considered as representative of the sympatho-vagal balance of the autonomic nervous system regulation: a quantity known to differ between sleep and wake [5]. We did not directly include the normalized values of LF and HF in the feature set, because of their algebraic redundancy (or large correlation) with the LF/HF ratio in sleep studies [27].

The regularity measure used in this study was Sample Entropy (SampEn). Typically applied to physiological time series, SampEn is assessed by comparing small patterns of length m, constructed from the given series, within a tolerance of mismatch r (i.e., the maximum absolute difference between the corresponding elements of any two patterns is r), and then repeating these comparisons for extended patterns of length m + 1. To the best of our knowledge, there is no universal indication about the values of m and r for estimating SampEn on RR series. Here, we considered m = 1 and r = 0.2 × STD (STD being the standard deviation of the series). This selection of parameters was inspired by our previous studies [26, 7], where this combination was identified as the best one for short series (number of samples ≤ 300, equivalent to about 5 to 6 minutes of HRV series).
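The pattern-matching definition of SampEn translates almost directly into code. The following is a straightforward quadratic-time sketch with the parameters quoted above (m = 1, r = 0.2 × STD). The function name is hypothetical; the use of the population standard deviation and of "less than or equal to r" as the matching rule are assumptions.

```python
import math
import statistics


def sample_entropy(series, m=1, r_factor=0.2):
    """Sample Entropy: -ln(A / B), where B counts pairs of matching
    patterns of length m and A pairs of matching patterns of length m + 1.

    Two patterns match when the largest absolute difference between
    corresponding elements does not exceed r = r_factor * STD.
    """
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        count = 0
        # The same n - m starting points are used for both lengths,
        # and self-matches (i == j) are excluded.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined: no matching patterns found
    return -math.log(a / b)
```

A perfectly periodic series yields 0 (every match of length m extends to length m + 1), whereas irregular series give larger values.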

The scaling feature was estimated using detrended fluctuation analysis (DFA). DFA provides a scaling exponent which describes the correlation properties of a signal that is not necessarily stationary. In this study, we considered only the short-term DFA scaling exponent (DFAα1), with the scale n (box size, in beats) varying from 4 to 11.
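A compact sketch of how a short-term DFA exponent can be computed over box sizes 4 to 11 is given below. It follows the usual DFA recipe (integrate the mean-removed series, remove a linear trend in each box, regress log F(n) on log n). The function names are hypothetical, and the use of non-overlapping boxes with the trailing incomplete box discarded is an assumption; the study may differ in such details.

```python
import math


def _linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha1(rr, scales=range(4, 12)):
    """Short-term DFA scaling exponent over box sizes 4..11.

    Note: a constant series gives zero fluctuation and is not handled.
    """
    mean_rr = sum(rr) / len(rr)
    # Step 1: integrate the mean-removed series (the "profile").
    profile, total = [], 0.0
    for value in rr:
        total += value - mean_rr
        profile.append(total)

    log_n, log_f = [], []
    for n in scales:
        n_boxes = len(profile) // n
        t = list(range(n))
        squared_residuals = 0.0
        # Step 2: remove a local linear trend in each non-overlapping box.
        for b in range(n_boxes):
            box = profile[b * n:(b + 1) * n]
            slope, intercept = _linear_fit(t, box)
            squared_residuals += sum(
                (y - (slope * k + intercept)) ** 2 for k, y in zip(t, box)
            )
        # Step 3: root-mean-square fluctuation at this scale.
        fluctuation = math.sqrt(squared_residuals / (n_boxes * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))

    # Step 4: the exponent is the slope of log F(n) versus log n.
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha
```

Uncorrelated series give exponents near 0.5 (somewhat higher at such short scales), while strongly correlated, random-walk-like series give values well above 1.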

The last parameter extracted from the RR series was the probability of agreement (ProbAgree) [26]. ProbAgree provides information about the presence of non-gaussianity, non-stationarity or non-linearity in the statistical process (of which the RR series is a realization). It is quantified by assessing how much the numerical estimate of SampEn (i.e., the SampEn of the original series) agrees with the SampEn obtained from synthetic series generated by an AR model (previously identified on the original RR series). Here, it was computed by generating a set of 200 synthetic series. The value of ProbAgree ranges from 0.0 to 0.5.

2.3.2. Acceleration features

The most extensively used feature for discriminating WAKE and SLEEP from acceleration data is the activity count (AC), which measures the amount of movement (or activity) found in the series [28]. Some researchers directly used this metric with a threshold [29, 28], while others included it in the computation of their parameters [13, 30]. There are three common ways of computing AC [24]: i) time above threshold; ii) zero-crossing; and iii) digital integration. Here, we considered the zero-crossing technique (using a threshold of 5.2 milli-g applied to the absolute value of the filtered VM), which counts how many times the signal crosses a predefined threshold [30].
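The zero-crossing count can be illustrated as follows, together with the simple summary statistics (mean, standard deviation, maximum) of the absolute filtered VM that accompany it in the acceleration feature set. This is a sketch under stated assumptions: the function names are hypothetical, and counting both upward and downward crossings of the 5.2 milli-g threshold is an assumption, since the text does not say which direction is counted.

```python
import statistics


def activity_count(filtered_vm, threshold_g=0.0052):
    """Count how many times |VM| crosses the threshold (5.2 milli-g).

    Assumption: both upward and downward crossings are counted.
    """
    above = [abs(v) > threshold_g for v in filtered_vm]
    return sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)


def acceleration_features(filtered_vm):
    """Return (AC, MeanVM, STDVM, MaxVM) computed on the absolute
    value of the filtered vector magnitude (in g)."""
    abs_vm = [abs(v) for v in filtered_vm]
    return (
        activity_count(filtered_vm),
        statistics.fmean(abs_vm),
        statistics.stdev(abs_vm),
        max(abs_vm),
    )
```

The same function would be applied separately to the chest and wrist signals, giving four features per sensor location.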

Inspired by the works of [12, 13, 30], in addition to AC we selected three other features: the mean (MeanVM), standard deviation (STDVM) and maximum value (MaxVM) of the absolute value of the filtered VM. These four features were extracted from both chest and wrist accelerometry.

2.4. Classification

The manual staging of sleep (performed on the PSG data at the Don Gnocchi Hospital) was considered as the gold standard in this study. According to the American Academy of Sleep Medicine (AASM) [20], an overnight sleep can be divided into epochs (1 epoch = 30 s), and each epoch can be categorized as WAKE, non-rapid eye movement sleep (NREM1, NREM2 and NREM3), or rapid eye movement sleep (REM). It is worth noting that NREM1 is a transitional sleep stage, and it is typically either discarded or merged into wake. Here, we followed the second strategy, inspired by [31]. Thus, we grouped REM, NREM2 and NREM3 into the SLEEP class, and wake and NREM1 into the WAKE class, leading to a binary classification problem.
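The grouping of the five stages into two classes amounts to a lookup table. The sketch below also derives the sleep efficiency of Table 1, TST/(TST + TWT), from the binary labels. The stage label strings and function names are hypothetical choices made for illustration.

```python
# Mapping from scored stage to binary class (NREM1 is merged into WAKE).
STAGE_TO_CLASS = {
    "WAKE": "WAKE",
    "NREM1": "WAKE",
    "NREM2": "SLEEP",
    "NREM3": "SLEEP",
    "REM": "SLEEP",
}


def to_binary(hypnogram):
    """Convert a list of per-epoch stages into SLEEP/WAKE labels."""
    return [STAGE_TO_CLASS[stage] for stage in hypnogram]


def sleep_efficiency(binary_labels):
    """SE = TST / (TST + TWT), with both counted in 30-s epochs."""
    return binary_labels.count("SLEEP") / len(binary_labels)
```

For instance, 573 SLEEP epochs and 117 WAKE epochs (subject 1 in Table 1) give 573/690, which rounds to the tabulated value of 0.83.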

We used a support vector machine (SVM) with a linear kernel to classify SLEEP vs WAKE using the features described in sections 2.3.1 and 2.3.2. Since the SVM learning algorithm is affected by class imbalance (which would bias the classifier towards SLEEP, the most common state in our dataset), the training was performed using equal proportions of SLEEP and WAKE samples (a “balanced” dataset) obtained by random sampling of segments of optimal length, while testing was performed using the original proportions.

In this study, we employed two techniques to assess the performance of the classifier. First, in the preliminary phase, when using only features related to HRV, we used k-fold cross validation with k = 10. Briefly, SLEEP and WAKE samples from all individuals were divided into 10 equal subsets. One of the ten subsets was held out to test the classifier, while the other 9 were used to randomly sample a training set with a balanced proportion of SLEEP and WAKE epochs. The classifier was trained on this balanced training set and tested on the held-out fold. This procedure was repeated for each fold, iteratively. Second, for the main analyses of the paper, we used a Leave-One-subject-Out approach (LOO), in which the test set was composed of the features of a single subject, and the training set was built using equal proportions of SLEEP and WAKE from the remaining subjects. The procedure was then repeated, iteratively excluding each of the subjects in the dataset.
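The leave-one-subject-out scheme with a balanced training set can be sketched as an index generator. The classifier itself is left out. The function names are hypothetical, and balancing by randomly downsampling the larger class to the size of the smaller one is an assumption about how "equal proportions" were obtained.

```python
import random


def balanced_indices(labels, candidates, rng):
    """Randomly pick equally many SLEEP and WAKE samples among candidates.

    Assumption: the larger class is downsampled to the smaller one.
    """
    sleep = [i for i in candidates if labels[i] == "SLEEP"]
    wake = [i for i in candidates if labels[i] == "WAKE"]
    n = min(len(sleep), len(wake))
    return sorted(rng.sample(sleep, n) + rng.sample(wake, n))


def leave_one_subject_out(subject_ids, labels, seed=0):
    """Yield (held_out_subject, train_indices, test_indices).

    The test set keeps the original class proportions of the held-out
    subject; the training set is balanced across the other subjects.
    """
    rng = random.Random(seed)
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        rest = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, balanced_indices(labels, rest, rng), test
```

Each yielded split would be used to train a linear SVM on the training indices and to evaluate it on the test indices, producing one set of performance values per subject.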

Although both techniques are valid methodologies for assessing the performance of a classifier, in the context of sleep studies the second is preferred, as it leads to a proper understanding of how the method would generalize to a new subject. For this reason, we employed LOO to assess the performance of the methods considered in this paper. However, we preliminarily used k-fold validation to select the best combination of epochs to use in our classification scheme (see section 2.5). In fact, k-fold validation typically displays a lower variability of the estimates and provided a smoother variation of the performance metrics (which in turn facilitated the selection of the number of epochs).

The performance of the classifier was quantified with four parameters: accuracy (Acc), sensitivity (Se), specificity (Sp), and the reliability parameter Cohen’s Kappa (K) [32]. Their averages and standard deviations were computed across folds or subjects. Here, Se and Sp refer to the correct recognition of SLEEP and WAKE, respectively.

2.5. Segment length selection

HRV metrics typically require longer periods of time to provide reliable estimates (due to the relatively small number of beats in any 30 s, when compared to ACT samples). Therefore, we determined the best combination of nearby epochs that provided the highest specificity using only HRV data. Segments of RR series of different lengths (3 to 15 epochs) were constructed by starting with 1 epoch before and 1 after the current epoch, and increasing the number of surrounding epochs up to 14 (7 epochs before, 7 epochs after). For each RR segment, HRV parameters were computed and used to train the linear SVM. The classification performances were assessed using a k-fold cross validation scheme (see section 2.4), to have a lower variability of the estimates. The label for the entire segment (i.e., WAKE or SLEEP) was that of the current epoch (we did not exclude segments composed of both WAKE and SLEEP epochs, as this is the situation actually encountered in real applications). We preferred to maximize specificity given the fact that HRV in this study was meant mainly as an addition to actigraphy (the latter having typically a lower capability of detecting WAKE [16]).

2.6. Correlation analysis and features selection

As part of the main goal of this study, we wanted to reduce the number of features involved in the classification problem (to reduce the computational cost, and hence the energy consumption of the device). We performed a two-step procedure. First, we determined which features were correlated with each other. In order to take into account the different proportions of SLEEP and WAKE samples, random sampling was used to determine the Spearman’s correlation coefficient between the features. Briefly, 30 subsets of segments, with equal proportions of SLEEP and WAKE (approximately 3000 epochs each), were extracted randomly from the entire set. Then, the correlation matrix was computed on each of these 30 subsets and the median correlation was determined for each pair of features. Features were considered correlated for values above 0.7 (features that were at least moderately correlated did not enter the reduced set together). Second, we selected the features displaying the largest area under the ROC curve (AUC), while not being correlated with each other. We repeated the same procedure for HRV and ACT features. The AUC was computed for each feature separately, using the same random sampling algorithm described previously, to avoid the bias imposed by the unbalanced proportions of SLEEP and WAKE samples.

We tested the SVM classifier in five different situations: i) only chest features; ii) only wrist features; iii) only HRV features; iv) chest features + HRV; and v) wrist features + HRV. The first two cases were meant to set initial reference values of performance for SLEEP vs WAKE classification. The same five classification tasks were performed again after reducing the dimension of the feature set, as described previously in section 2.6.

The methods discussed in this study were compared with two published algorithms. For ACT, we considered the algorithm proposed by Sadeh [30], which identifies SLEEP and WAKE based on a linear combination of features derived from the activity counts computed for 11 distinct and successive minutes (of which the current one is in the middle). With respect to HRV, we considered the set of features proposed in [16, 6] and used the linear SVM as classifier. It is composed of the average RR values for nine 30-s epochs (8 preceding plus the current one, for a total of 9 features).

Feature                 AUC     Correlation (ρ)

HRV Feature
→ Mean RR               0.63
→ SDNN [s]              0.59    RMSSD: 0.85
RMSSD [s]               0.54    SDNN: 0.85
LF/HF                   0.51    DFAα1: 0.93
→ SampEn                0.62
ProbAgree               0.51
DFAα1                   0.52    LF/HF: 0.93

Accelerometer Feature
Chest AC                0.61    Mean 0.92
Chest MeanVM [g]        0.64    STD 0.83, AC 0.92
Chest STDVM [g]         0.71    Mean 0.83, Max 0.93
→ Chest MaxVM [g]       0.72    STD 0.93
Wrist AC                0.69    Mean 0.83, STD 0.73
Wrist MeanVM [g]        0.73    ≥ 0.83 with all wrist features
→ Wrist STDVM [g]       0.74    ≥ 0.73 with all wrist features
Wrist MaxVM [g]         0.73    Mean 0.89, STD 0.96

Table 2: AUC and Spearman’s correlation. The symbol → indicates the features contained in the reduced set.

2.7. Statistical Analysis

The accuracies of the classifiers were compared using two standard statistical tests. In the LOO validation scheme, the two classifiers being compared were both trained on the same 17 subjects and then tested using the data from the subject left out. As a consequence, 18 values of accuracy (and of Se, Sp and Cohen’s K) were available for each classifier. First, we employed the McNemar test [33, 34], which compares the actual outcomes of the classifiers (and not just the accuracies). The McNemar test was performed after excluding each of the subjects. The final p value was obtained as the median of the 18 p values, as suggested by [35]. Second, we compared the accuracies of the two classifiers using a paired t-test, which has a low type II error. Given the fact that several comparisons between classifiers were performed, a Bonferroni correction for multiple comparisons was applied. The difference between accuracies was considered statistically significant when p < 0.05 (after Bonferroni correction) in both statistical tests. Specificities were similarly compared using a paired t-test.
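To make the performance measures of section 2.4 (Acc, Se, Sp and Cohen’s K) and the McNemar comparison concrete, the sketch below computes them from predicted and reference labels. It is an illustration under assumptions, not the study’s code: the function names are hypothetical, the McNemar test uses the chi-square approximation with continuity correction (the variant used is not stated), and applying the Bonferroni factor after taking the median of the per-subject p values is likewise an assumption.

```python
import math
import statistics


def binary_metrics(y_true, y_pred, positive="SLEEP"):
    """Return (Acc, Se, Sp, Cohen's K); Se refers to SLEEP, Sp to WAKE.

    Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    acc = (tp + tn) / n
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    # Agreement expected by chance, from the marginal totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (acc - p_chance) / (1.0 - p_chance)
    return acc, se, sp, kappa


def mcnemar_p_value(y_true, pred_a, pred_b):
    """McNemar test on the epochs where only one classifier is correct.

    Assumption: chi-square approximation (1 degree of freedom) with
    continuity correction.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(a == t and b != t for t, a, b in triples)
    only_b = sum(a != t and b == t for t, a, b in triples)
    if only_a + only_b == 0:
        return 1.0
    chi2 = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def combined_p_value(per_subject_p, n_comparisons):
    """Median of the per-subject p values, then Bonferroni correction."""
    return min(1.0, statistics.median(per_subject_p) * n_comparisons)
```

With 18 held-out subjects, `mcnemar_p_value` would be evaluated once per subject and the 18 values passed to `combined_p_value`, together with the number of classifier comparisons performed.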


3. Results

The proportions of WAKE and SLEEP are naturally unbalanced during a night of sleep of a healthy subject. This is confirmed in our dataset, where 76.9% of the total epochs belong to SLEEP and only 23.1% to WAKE, as Table 1 shows.

With respect to the analysis described in section 2.5, where all the HRV features were used to classify SLEEP vs WAKE, the mean accuracy, sensitivity and specificity for different combinations of the number of epochs considered are depicted in figure 3. The combination of 7 preceding and 6 following epochs was found to be one of the most suitable, with a specificity of 0.63 and an accuracy of 0.66 (with K: 0.23). So, the remainder of the study was performed on segments of data encompassing 7 epochs preceding the current one and 6 following it.

[Figure 3 plot. Panels: (a) Accuracy, (b) Se, (c) Sp. Axes: Epochs Before vs Epochs After.]

Figure 3: Test accuracy, sensitivity and specificity for each combination of epochs (before and after the current one) as assessed using a 10-fold cross validation (differently from the rest of the analysis performed in the paper, where leave-one-subject-out was employed). Only HRV-related features were considered.
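The segmentation chosen above (7 epochs before and 6 after the current one, 14 epochs in total) can be illustrated with a short sketch. It assumes that epochs lacking a full context at the start or end of the night are simply skipped, which is not stated in the text; the names `epoch_segments` and `pooled_segment` are hypothetical.

```python
def epoch_segments(n_epochs, n_before=7, n_after=6):
    """Yield (current, start, stop) so that epochs[start:stop] is the segment
    surrounding the epoch `current`. Epochs without a full context are skipped
    (an assumption of this sketch)."""
    for current in range(n_before, n_epochs - n_after):
        yield current, current - n_before, current + n_after + 1


def pooled_segment(per_epoch_samples, start, stop):
    """Concatenate the samples (e.g., RR intervals) of the epochs in [start, stop)."""
    return [x for epoch in per_epoch_samples[start:stop] for x in epoch]


# Example: a recording of 16 epochs, each holding two RR intervals (in seconds).
rr_by_epoch = [[0.90 + 0.01 * i, 0.95 + 0.01 * i] for i in range(16)]
for current, start, stop in epoch_segments(len(rr_by_epoch)):
    segment = pooled_segment(rr_by_epoch, start, stop)
    print(current, stop - start, len(segment))
```

Features would then be computed on each pooled segment and assigned to the label of the current epoch.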

3.1. Classification and Statistical Analysis

The average performances obtained in classifying SLEEP vs WAKE using the linear SVM for various sets of features are shown in figure 4 and listed in Table 3. Statistical comparisons, along with the corresponding p-values, are reported in Table 4. CACT and WACT provided an equivalent accuracy (0.78 vs 0.77, p > 0.05) but WACT had higher specificity (i.e., correct WAKE detection; 0.50 vs 0.33; p < 0.05). Both had higher accuracy than HRV features alone (p < 0.05).

[Figure 4 plot. Y axis: Performance (Acc, Se, Sp, K). Feature sets shown: Chest, Wrist, HRV, Chest+HRV, Wrist+HRV, Sadeh94, Sadeh94+Wrist, Lewicke04.]

Figure 4: Average and standard deviation (across subjects) of accuracy, sensitivity, specificity and Cohen’s K obtained using the full set of features.
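For reference, the four metrics reported here can be computed from the epoch-by-epoch labels as in the sketch below. It follows the convention used in the text (sensitivity refers to SLEEP, specificity to WAKE) and assumes that both classes are present in the reference labels; the function name `sleep_wake_metrics` is hypothetical.

```python
def sleep_wake_metrics(truth, pred, sleep="SLEEP", wake="WAKE"):
    """Accuracy, sensitivity (SLEEP), specificity (WAKE) and Cohen's K."""
    tp = sum(1 for t, p in zip(truth, pred) if t == sleep and p == sleep)
    fn = sum(1 for t, p in zip(truth, pred) if t == sleep and p == wake)
    tn = sum(1 for t, p in zip(truth, pred) if t == wake and p == wake)
    fp = sum(1 for t, p in zip(truth, pred) if t == wake and p == sleep)
    n = tp + fn + tn + fp
    acc = (tp + tn) / n
    se = tp / (tp + fn)   # fraction of SLEEP epochs correctly detected
    sp = tn / (tn + fp)   # fraction of WAKE epochs correctly detected
    # Agreement expected by chance, from the marginal class frequencies.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e)
    return {"acc": acc, "se": se, "sp": sp, "kappa": kappa}


truth = ["SLEEP"] * 8 + ["WAKE"] * 2
pred = ["SLEEP"] * 7 + ["WAKE"] + ["WAKE", "SLEEP"]
print(sleep_wake_metrics(truth, pred))
```

In the paper these values are computed per subject and then averaged; the sketch handles a single recording.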

When comparing CACT and WACT to Sadeh94 (obtained from ACT at wrist), both displayed higher accuracy than the latter in this dataset (p < 0.05). However, CACT had a lower capability of detecting WAKE (lower specificity, p < 0.05). The full set of HRV features had the same performance as the 9 lagged values of mean heart rate, suggested by Lewicke [6], when considering a linear SVM for classification (the McNemar and paired t-test disagree in Table 4 due to the very large inter-subject variability of Lewicke04). These results suggest that the HRV features we considered are comparable in discriminating power with other sets proposed in the literature. Also, CACT underestimated WAKE, likely due to the smaller number of movements of the torso compared to the wrist.

Method                   Acc           Se            Sp            K
Chest                    0.78 ± 0.08   0.89 ± 0.05   0.33 ± 0.16   0.23 ± 0.14
Wrist                    0.77 ± 0.08   0.82 ± 0.08   0.50 ± 0.19   0.30 ± 0.18
HRV                      0.66 ± 0.07   0.69 ± 0.12   0.53 ± 0.22   0.17 ± 0.15
Chest+HRV                0.70 ± 0.07   0.75 ± 0.12   0.51 ± 0.20   0.21 ± 0.13
Wrist+HRV                0.76 ± 0.08   0.81 ± 0.09   0.54 ± 0.19   0.31 ± 0.17
Sadeh1994                0.70 ± 0.07   0.76 ± 0.13   0.52 ± 0.18   0.27 ± 0.14
Sadeh1994+HRV            0.65 ± 0.09   0.65 ± 0.13   0.58 ± 0.22   0.17 ± 0.16
Lewicke2008              0.55 ± 0.24   0.52 ± 0.38   0.54 ± 0.34   0.08 ± 0.13
Sadeh1994+Lewicke2008    0.63 ± 0.17   0.64 ± 0.24   0.53 ± 0.24   0.15 ± 0.16
Chest+HRV-subset         0.73 ± 0.08   0.81 ± 0.10   0.43 ± 0.18   0.22 ± 0.15
Wrist+HRV-subset         0.78 ± 0.08   0.85 ± 0.07   0.48 ± 0.18   0.30 ± 0.18

Table 3: Performance metrics (average ± standard deviation across subjects) for the various feature sets and methods considered in the classification of SLEEP vs WAKE.

Adding HRV to CACT slightly reduced the accuracy (from 0.78 to 0.70, p < 0.05) but increased the correct detection of WAKE (specificities differ in a statistically significant manner, p < 0.05). However, adding HRV to WACT had no clear effect. This was further confirmed by adding either our HRV parameters (the full set), or those suggested by Lewicke04, to Sadeh94 (based on ACT collected at wrist). The combined methods seemed to suggest a slightly reduced accuracy and an increased specificity, but not in a statistically significant manner (p > 0.05). The addition of HRV to CACT led to performances identical to

those of Sadeh94 (Acc: 0.70, Se: 0.75, Sp: 0.51 and K: 0.21 vs Acc: 0.70, Se: 0.76, Sp: 0.52, K: 0.27), with an analogous capability of detecting WAKE (p > 0.05). Interestingly, in this latter case, the addition of HRV compensated for the low specificity of ACT at chest with respect to wrist.

Regarding the feature selection strategy described in section 2.6, the correlation analysis showed that SDNN was highly correlated with RMSSD (ρ = 0.85), and LF/HF was also highly correlated with DFAα1 (ρ = 0.93). As reported in Table 2, the three HRV features with the largest AUC were: MeanRR (AUC: 0.63), SampEn (AUC: 0.62) and SDNN (AUC: 0.59), and they were not correlated with each other. On the other hand, ACT features

displayed different discriminating power according to the site of recording. For Chest, MaxVM had the largest AUC (0.72) while, for Wrist, STDVM had the largest discriminative power (AUC: 0.74; this was the largest value obtained among the features considered). Most ACT features were highly correlated with each other (ρ ≥ 0.73). We then built two different reduced sets of features: one containing HRV plus CACT and the other HRV plus WACT. In particular, they were: Chest+HRV (MeanRR, SDNN, SampEn and MaxVM) and Wrist+HRV (MeanRR, SDNN, SampEn and STDVM).

Methods compared

Acc. (McNemar)

Acc. (t-test)

Sp. (t-test)

p = 0.507

p≈1

p = 0.00∗

p < 0.001∗

p = 0.006∗

p = 0.029∗

p < 0.001∗

p = 0.014∗

p≈1

p < 0.001∗

p = 0.019∗

p = 0.005∗

p = 0.002∗

p = 0.026∗

p≈1

p < 0.001

p≈1

p≈1

Chest vs Chest+HRV

p < 0.001∗

p = 0.014∗

p = 0.002∗

Wrist vs Wrist+HRV

p = 0.963

p≈1

p≈1

Chest+HRV vs Sadeh94

p = 0.250

Chest vs Wrist Chest vs HRV Wrist vs HRV Chest vs Sadeh94 Wrist vs Sadeh94


HRV vs Lewicke04



p≈1

p≈1



p = 0.131

p = 0.961



p < 0.001

p≈1

p≈1

Chest+HRV vs Chest+HRV-subset

p < 0.001∗

p≈1

p = 0.841

Wrist+HRV vs Wrist+HRV-subset

p = 0.297

p = 0.860

p≈1

Sadeh94 vs Sadeh94+HRV


Sadeh94 vs Lewicke04+Sadeh94

p = 0.011

Table 4: Pairwise statistical comparisons of the performance of the methods considered for the classification of SLEEP vs WAKE. A star (∗) marks statistically different performance (p < 0.05 after Bonferroni correction for repeated comparison).
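The feature selection results reported above combine a per-feature AUC with a correlation analysis. The exact procedure is described in section 2.6, which is not reproduced here, so the sketch below is only one plausible reading: features are ranked by AUC and a feature is kept only if it is weakly correlated with those already selected. The threshold `max_abs_corr`, the use of Pearson correlation and all function names are assumptions of this example.

```python
from math import sqrt


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    value = wins / (len(pos) * len(neg))
    return max(value, 1.0 - value)  # discriminating power, whatever the direction


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def select_features(features, labels, max_abs_corr=0.7, n_keep=3):
    """Greedy selection: highest AUC first, skipping features that are highly
    correlated with an already selected one (hypothetical threshold)."""
    ranked = sorted(features, key=lambda name: auc(features[name], labels), reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(features[name], features[s])) < max_abs_corr for s in selected):
            selected.append(name)
        if len(selected) == n_keep:
            break
    return selected


# Toy data: "b" is discriminative but nearly redundant with "a".
labels = [0, 0, 0, 1, 1, 1]
features = {"a": [1, 2, 3, 4, 5, 6], "b": [1, 2, 4, 3, 5, 6], "c": [5, 1, 3, 2, 6, 4]}
print(select_features(features, labels, n_keep=2))
```

On real data the labels would be SLEEP/WAKE per epoch and the dictionary would hold the HRV and ACT features of one recording site.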

The use of the reduced sets of features, with the SVM classifier, gave only slightly reduced performances, which did not differ in a statistically significant manner (p > 0.05) with respect to the corresponding full sets of features (for Chest+HRV, Acc: 0.73, Se: 0.81, Sp: 0.43, K: 0.22; for Wrist+HRV, Acc: 0.78, Se: 0.85, Sp: 0.48, K: 0.30). Also in this case, the performances achieved for both Chest+HRV and Wrist+HRV were nearly identical, as shown in figure 5.

[Figure 5 plot. Y axis: Performance (Acc, Se, Sp, K). Feature sets shown: Chest+HRV, Chest+HRV-subset, Wrist+HRV, Wrist+HRV-subset.]

Figure 5: Average and standard deviation (across subjects) of accuracy, sensitivity, specificity and Cohen’s K obtained using the full set vs the subset of features for the combination of heart rate variability and actigraphy features.

4. Discussion

While ACT features from both wrist and chest, fed into a linear SVM, provided equivalent accuracy, their capability of detecting SLEEP and WAKE was different. Indeed, CACT showed a higher sensitivity and lower specificity than those of WACT. Due to the different position of the sensor, mild movements of the wrist are undetected at chest, leading to two opposite situations. During SLEEP, CACT classification is more resistant to spurious movement of the arms, increasing sensitivity. On the contrary, during WAKE, the lower sensitivity at chest to peripheral movements does not allow a proper recognition and limits specificity. Given these results, in principle, CACT could be suitable for those applications where WAKE is erroneously detected due to pathological conditions that might fool WACT (e.g., nocturnal tremors or sleep myoclonus in Parkinson’s or Alzheimer’s disease, but this hypothesis should be further investigated).


The result seems to be in line with what was obtained by Lamprecht et al. on infants [18], in which it was found that the addition of CACT to WACT did not produce any relevant classification improvement. Another study in which a similar position for the actigraph was investigated is the one by Sazonov et al. [12]. They performed sleep classification on infants by placing an actigraph on the upper part of the diaper and obtained a higher recognition of wake with respect to our findings (0.41 vs 0.33). However, the accuracy reported therein was comparable with what we achieved in our adult population with CACT (0.75 vs 0.78). These contrasting results might be due to the nature of the population (infants vs adults), the different proportion of sleep samples (0.65 vs 0.77) or the position of the actigraph, which was closer to the center of mass.

Once compared to the results obtained by Sadeh’s algorithm on our dataset, CACT and WACT showed better accuracy. While both also displayed higher sensitivity, CACT proved less capable of correctly classifying wake (lower specificity). The finding seems to suggest that other features based on the acceleration signal, like the ones we selected in this work rather than only the activity count, might be suitable for SLEEP vs WAKE classification. A similar conclusion can be derived from [12, 19]. Furthermore, they might allow the use of shorter segments of data (7 minutes in this work against the 11 minutes of Sadeh’s method).

The performances obtained in this study using ACT were comparable (or only slightly inferior) to those reported by other researchers using linear classifiers [13]. For example, Sadeh’s [30] and Sazonov’s [12] methods achieved very similar results on infants, but using more complex actigraphy-based features and a larger amount of data for training the classifier (76.2% and 75.2%, respectively [13]). On the other hand, non-linear classifiers might lead to higher performance for the same classification task [13, 36]. A linear SVM is preferable as it can provide results which generalize better (it is less prone to overfitting than a non-linear SVM). Also, it is less computationally intensive than an SVM with a non-linear kernel. For these reasons, and given the fact that the focus was on the comparison between chest and wrist actigraphy (with and without the addition of HRV), in this work, we employed a linear SVM.

HRV provided inferior performance with respect to both wrist and chest ACT, but its specificity was comparable to that of WACT. While this could be attributed, in the first instance, to the features we selected, when we tested the HRV features which proved effective in a similar study [6] (9 consecutive RR mean values, computed on the current and previous 8 epochs) with the linear SVM classifier on our population, results were not significantly different from ours. This seems to confirm the idea that, at least in the setting we considered, HRV-based features proved less effective for SLEEP/WAKE classification than ACT. Possibly, non-linear classifiers might be necessary to improve the performance of HRV-based sleep assessment, as other previous studies [8, 7] hint.

However, when HRV was combined with WACT or CACT, two different situations arose. First, the addition of HRV to CACT led to an increase in specificity, which was now comparable to what was obtained with Sadeh’s algorithm. In practice, the added information brought by HRV, related to the activity of the autonomic nervous system, seems to have compensated for the movements undetected at chest during WAKE. Second, the combination of HRV and WACT did not provide statistically significant improvements to the classification performance. This result is not completely in line with the findings of other researchers [37, 38] who showed that the addition of HRV-related features to wrist actigraphy led to better specificity. Possible reasons for the disagreement might be the size of the populations (very small in [37]), the use of a larger set of HRV features (in [38]), or the different classifiers used. In order to exclude the possibility that this result was due to the features employed, we used the linear SVM to classify SLEEP/WAKE based on a set composed of Sadeh’s score, for actigraphy, and of the 9 averaged values of RR over the current and previous 8 epochs, for HRV, as suggested by [6]. This feature set also did not show better performance than what was obtained by Sadeh’s algorithm alone. Overall, it seems that the activity at wrist does not benefit, in this population, from the addition of HRV. As discussed before, the effect of a non-linear classifier in separating the HRV features and improving the overall accuracy needs to be verified further.

The feature selection strategy discarded, by construction, 7 out of 11 features. The classification performances obtained using the reduced set of features were very similar to those obtained with the full sets, suggesting that the extra features were not adding much to the overall classification problem. Clearly, it cannot be excluded that the small population (18 subjects) might have limited the addition of meaningful features whose discriminative power would be detectable only on a larger population.

The features which emerged, among the ones computed on ACT, were MaxVM at chest and STDVM at wrist. A previous study [12] also employed the maximum value of acceleration (even if a value for each of the current and 8 previous epochs was computed, instead of a single one as here) when performing activity-based SLEEP/WAKE identification in infants, placing the actigraph over the sacral region on the diaper. The result suggests that large movements of the torso region might well characterize WAKE, while the same is not necessarily true when the actigraph is at the wrist. With respect to the HRV-related features, the ones which were selected by the feature selection procedure were MeanRR, SDNN and SampEn. Among these three, the one most used in similar works [7, 8, 16, 6] is MeanRR. This might be associated with the fact that an augmented vagal control occurring from WAKE to SLEEP produces an increment in MeanRR (or reduction of HR) [5]. Such vagal increase is detected by SDNN as well, which diminished during SLEEP (except REM sleep). Here, as in [6], SDNN was found to be more relevant than RMSSD, probably because of its sensitivity to variation of total power of the RR series between SLEEP and WAKE. Finally, SampEn, which already proved effective in discriminating SLEEP and WAKE in previous studies [9, 7], also increased during SLEEP.

This study supports the effectiveness of using surrounding epochs to improve the classification accuracy of SLEEP vs WAKE, independently of the feature set used. It identified 7 preceding and 6 following epochs as the best combination (the one which provided maximum Sp). This finding is in line with previous studies where more than one epoch was found to be necessary for a proper identification of SLEEP and WAKE [7, 30]. The number of preceding epochs dictated by HRV nearly matched what was reported by Lewicke et al. [6] and Sazonov et al. [12] (7 preceding epochs here and 8 there). Here, as in Sadeh et al. [30], better performances were obtained when also considering epochs after the current one (6 following epochs here and 11 there).

The proportion of SLEEP and WAKE epochs in the dataset we considered was about 8:2. It is well known that, with an unbalanced proportion of samples, a classifier could be biased towards the more represented class [7, 13]. Training a classifier with the original proportion of samples gives higher average accuracy by increasing the true recognition of SLEEP and reducing the true recognition of WAKE (an example is the different specificity obtained in [13] when training the classifier with the original proportion of data vs a non-biased loss function). Different solutions can be adopted to minimize this problem. For example, one could change the loss function of the classifier [13] or plan a balanced training. Here, similarly to [6], we performed the training of the classifier using a subset of the total dataset, by randomly selecting a balanced proportion of SLEEP and WAKE samples for the subjects included in the training set. Then, we used the “natural” proportion of samples for testing. This solution has the drawback of using only part of the data in the training set. On the other hand, redefining the loss function would require the implementation of complex optimization algorithms.
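The balanced-training idea discussed above can be sketched as follows. The example assumes that, in each leave-one-subject-out fold, the majority class of the training subjects is randomly undersampled to the size of the minority class, while the left-out subject is tested on all of its epochs; the classifier itself is omitted, and the names `balanced_training_indices` and `leave_one_subject_out` are hypothetical.

```python
import random


def balanced_training_indices(labels, subjects, test_subject, seed=0):
    """Indices of a class-balanced random subset of the training epochs."""
    rng = random.Random(seed)
    train = [i for i, s in enumerate(subjects) if s != test_subject]
    sleep = [i for i in train if labels[i] == "SLEEP"]
    wake = [i for i in train if labels[i] == "WAKE"]
    n = min(len(sleep), len(wake))  # size of the less represented class
    return sorted(rng.sample(sleep, n) + rng.sample(wake, n))


def leave_one_subject_out(labels, subjects, seed=0):
    """Yield (test_subject, balanced train indices, natural-proportion test indices)."""
    for test_subject in sorted(set(subjects)):
        train_idx = balanced_training_indices(labels, subjects, test_subject, seed)
        test_idx = [i for i, s in enumerate(subjects) if s == test_subject]
        yield test_subject, train_idx, test_idx


# Toy example: 3 subjects, 5 epochs each, with a 4:1 SLEEP/WAKE ratio.
subjects = ["s1"] * 5 + ["s2"] * 5 + ["s3"] * 5
labels = (["SLEEP"] * 4 + ["WAKE"]) * 3
for subject, train_idx, test_idx in leave_one_subject_out(labels, subjects):
    print(subject, len(train_idx), len(test_idx))
```

As noted in the text, this choice discards part of the SLEEP epochs during training; a class-weighted loss would be the alternative.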

4.1. Limitations of the study

There are three main limitations in this study. First, the complex procedure connected to data collection resulted in a small population of subjects enrolled. This forced us to keep the number of features small and to limit the complexity of the classifier. Second, the population was composed only of healthy adults and, therefore, the results might not be equivalent for infants or elderly people as well as for subjects affected by sleep disorders. Third, while the classification results were estimated using a leave-one-subject-out procedure, for the selection of the optimal segment’s length we employed k-fold validation. The fact that the segment’s length was selected on the same subjects to which it was ultimately applied might somehow have biased the results. However, the possible bias is limited by the fact that only HRV features were used in the segment’s length selection procedure.

5. Conclusion

In this study, we compared the performance of chest and wrist actigraphy for automatic SLEEP/WAKE identification. We further studied whether the combination of HRV and activity detection increased the performance obtained from ACT alone. In particular, we were interested in verifying whether the addition of HRV to chest actigraphy was capable of compensating for the possible lower performances of ACT at chest. In fact, at chest, both signals can be easily collected.

Our results showed that wrist and chest ACT performed similarly in terms of accuracy, but CACT had a reduced capability in recognizing WAKE. However, when we combined HRV and CACT, the specificity increased and the performances were statistically indistinguishable, on our population, from standard algorithms, like the one proposed by Sadeh [30]. The finding was confirmed also when using a reduced set of features, with reduced computational cost. Therefore, our work supports the idea that a single wearable device, including HRV and CACT sensors, could be used effectively to monitor sleep patterns in healthy individuals. The result could open interesting possibilities in wearable devices’ design, facilitating long term monitoring of sleep at home.

In future studies, it would be interesting to test more complex classifiers, such as artificial neural networks or SVM with different kernels, and to extend the analysis to other populations (for example, to cohorts of subjects suffering from sleep disorders). Furthermore, increasing the population size would justify the investigation of using a larger number of features (in particular for HRV).

Conflicts of interest

None declared.

Authors’ contribution

All authors made substantial contributions to the submitted manuscript. In particular, each contributed as follows:

• Study conception and design: Aktaruzzaman, Rivolta, Scarabottolo, Bovi, Ferrarin and Sassi;
• Data acquisition: Rivolta, Karmacharya, Bovi, Ferrarin and Sassi;
• Data analysis and interpretation: Aktaruzzaman, Rivolta, Karmacharya, Pugnetti, Garegnani and Sassi;
• Drafting of the manuscript: Aktaruzzaman, Rivolta, Karmacharya, Scarabottolo, Pugnetti, Garegnani, Bovi, Ferrarin and Sassi.

Funding source

This study was partially supported by the project: “SMARTA - Sistema di Monitoraggio Ambientale con Rete di sensori e Telemonitoraggio indossabile a supporto di servizi di salute, prevenzione e sicurezza per l’Active Aging” funded by Regione Lombardia, Milan, Italy.


Acknowledgements

The authors thank Mr Giovanni Scalera from Fond. Don Carlo Gnocchi

Onlus, Milan, Italy, for his contribution to data collection and preparation of the revised version of the manuscript.

Abbreviations

• AASM - American Academy of Sleep Medicine
• AC - activity count
• Acc - accuracy
• ACT - actigraphy
• AR - autoregressive
• AUC - area under the curve
• CACT - chest actigraphy
• Chest AC - chest activity count
• Chest MaxVM - maximum value of chest VM
• Chest MeanVM - average of chest VM
• Chest STDVM - standard deviation of chest VM
• DFA - detrended fluctuation analysis
• DFAα1 - short term DFA scaling exponent
• ECG - electrocardiogram
• EEG - electroencephalogram
• EMG - electromyogram
• EOG - electrooculogram
• Fp2 - right frontopolar
• HF - high frequency
• HRV - heart rate variability
• K - Cohen’s Kappa
• LF - low frequency
• LF/HF - sympatho-vagal balance
• LOO - leave-one-subject-out
• M2 - right mastoid
• Mean RR - average RR
• NREM - non-rapid eye movement
• ProbAgree - probability of agreement
• PSG - polysomnographic study
• REM - rapid eye movement
• RMSSD - root mean square of successive differences
• ROC - receiver operating characteristic
• RR - inter-beat time interval
• SampEn - sample entropy
• SE - sleep efficiency
• Se - sensitivity
• SDNN - standard deviation of normal-to-normal intervals
• Sp - specificity
• SVM - support vector machine
• TIB - time in bed
• TST - total sleep time
• TWT - total wake time
• VM - vector magnitude
• WACT - wrist actigraphy
• Wrist AC - wrist activity count
• Wrist MaxVM - maximum value of wrist VM
• Wrist MeanVM - average of wrist VM
• Wrist STDVM - standard deviation of wrist VM

References

[1] J. M. Mullington, M. Haack, M. Toth, J. M. Serrador, H. K. Meier-Ewert, Cardiovascular, inflammatory, and metabolic consequences of sleep deprivation, Prog Cardiovasc Dis 51 (4) (2009) 294–302. doi:10.1016/j.pcad.2008.10.003.

[2] W. D. Killgore, Effects of sleep deprivation on cognition, Prog Brain Res 185 (2010) 105–29. doi:10.1016/B978-0-444-53702-7.00007-5.

[3] B. M. Silva, J. J. Rodrigues, I. de la Torre Díez, M. López-Coronado, K. Saleem, Mobile-health: A review of current state in 2015, J Biomed Inform 56 (2015) 265–272. doi:10.1016/j.jbi.2015.06.003.

[4] A. J. Welch, P. C. Richardson, Computer sleep stage classification using heart rate data, Electroencephalogr Clin Neurophysiol 34 (2) (1973) 145–152. doi:10.1016/0013-4694(73)90041-2.

[5] E. Vanoli, P. B. Adamson, Ba-Lin, G. D. Pinna, R. Lazzara, W. C. Orr, Heart rate variability during specific sleep stages: A comparison of healthy subjects with patients after myocardial infarction, Circulation 91 (7) (1995) 1918–1922. doi:10.1161/01.CIR.91.7.1918.

[6] A. Lewicke, E. Sazonov, M. J. Corwin, M. R. Neuman, S. A. C. Schuckers, Sleep versus wake classification from heart rate variability using computational intelligence: Consideration of rejection in classification models, IEEE Trans Biomed Eng 55 (1) (2008) 108–118. doi:10.1109/TBME.2007.900558.

[7] M. Aktaruzzaman, M. Migliorini, M. Tenhunen, S. L. Himanen, A. M. Bianchi, R. Sassi, The addition of entropy-based regularity parameters improves sleep stage classification based on heart rate variability, Med Biol Eng Comput 53 (5) (2015) 415–425. doi:10.1007/s11517-015-1249-z.

[8] M. Xiao, H. Yan, J. Song, Y. Yang, X. Yang, Sleep stages classification based on heart rate variability and random forest, Biomed Signal Process Control 8 (6) (2013) 624–633. doi:10.1016/j.bspc.2013.06.001.

[9] M. Oswaldo Mendez, M. Matteucci, V. Castronovo, L. Ferini-Strambi, S. Cerutti, A. M. Bianchi, Sleep staging from heart rate variability: time-varying spectral features and hidden Markov models, Int J Biomed Eng Technol 3 (2010) 246–263. doi:10.1504/IJBET.2010.032695.

[10] D. F. Kripke, D. J. Mullaney, S. Messin, V. G. Wyborney, Wrist actigraphic measures of sleep and rhythms, Electroencephalogr Clin Neurophysiol 44 (5) (1978) 674–676. doi:10.1016/0013-4694(78)90133-5.

[11] C. P. Pollak, W. W. Tryon, H. Nagaraja, R. Dzwonczyk, How accurately does wrist actigraphy identify the states of sleep and wakefulness?, Sleep 24 (8) (2001) 957–65. doi:10.1093/sleep/24.8.957.

[12] E. Sazonov, N. Sazonova, S. Schuckers, M. Neuman, CHIME Study Group, Activity-based sleep-wake identification in infants, Physiol Meas 25 (5) (2004) 1291–1304. doi:10.1088/0967-3334/25/5/018.

[13] J. Tilmanne, J. Urbain, M. V. Kothare, A. V. Wouwer, S. V. Kothare, Algorithms for sleep-wake identification using actigraphy: a comparative study and new results, J Sleep Res 18 (1) (2009) 85–98. doi:10.1111/j.1365-2869.2008.00706.x.

[14] A. Kosmadopoulos, C. Sargent, D. Darwent, X. Zhou, G. D. Roach, Alternatives to polysomnography (PSG): A validation of wrist actigraphy and a partial-PSG system, Behav Res Methods 46 (4) (2014) 1032–1041. doi:10.3758/s13428-013-0438-7.

[15] B. Sivertsen, S. Omvik, O. E. Havik, S. Pallesen, B. Bjorvatn, G. H. Nielsen, S. Straume, I. H. Nordhus, A comparison of actigraphy and polysomnography in older adults treated for chronic primary insomnia, Sleep 29 (10) (2006) 1353–1358. doi:10.1093/sleep/29.10.1353.

[16] A. T. Lewicke, E. S. Sazonov, S. A. C. Schuckers, Sleep-wake identification in infants: Heart rate variability compared to actigraphy, in: Conf Proc IEEE Eng Med Biol Soc, Vol. 1, IEEE, 2004, pp. 442–445. doi:10.1109/IEMBS.2004.1403189.

[17] K. So, T. M. Adamson, R. S. C. Horne, The use of actigraphy for assessment of the development of sleep/wake patterns in infants during the first 12 months of life, J Sleep Res 16 (2) (2007) 181–187. doi:10.1111/j.1365-2869.2007.00582.x.

[18] M. L. Lamprecht, A. P. Bradley, T. Tran, A. Boynton, P. I. Terrill, Multisite accelerometry for sleep and wake classification in children, Physiol Meas 36 (1) (2015) 133–147. doi:10.1088/0967-3334/36/1/133.

[19] V. T. van Hees, S. Sabia, K. N. Anderson, S. J. Denton, J. Oliver, M. Catt, J. G. Abell, M. Kivimaki, M. I. Trenell, A. Singh-Manoux, A novel, open access method to assess sleep duration using a wrist-worn accelerometer, PLoS One 10 (11) (2015) e0142533. doi:10.1371/journal.pone.0142533.

[20] C. Iber, S. Ancoli-Israel, A. L. Chesson, S. F. Quan, et al., The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology, and Technical Specifications, 1st Edition, American Academy of Sleep Medicine, Westchester, IL, U.S.A., 2007.

[21] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. Peng, H. E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet components of a new research resource for complex physiologic signals, Circulation 101 (23) (2000) e215–e220. doi:10.1161/01.CIR.101.23.e215.

[22] J. Pan, W. J. Tompkins, A real-time QRS detection algorithm, IEEE Trans Biomed Eng 32 (3) (1985) 230–236. doi:10.1109/TBME.1985.325532.

[23] D. P. Redmond, F. W. Hegge, Observations on the design and specification of a wrist-worn human activity monitoring system, Behav Res Methods 17 (6) (1985) 659–669. doi:10.3758/BF03200979.

[24] S. Ancoli-Israel, R. Cole, C. Alessi, M. Chambers, W. Moorcroft, C. P. Pollak, The role of actigraphy in the study of sleep and circadian rhythms, Sleep 26 (3) (2003) 342–392. doi:10.1093/sleep/26.3.342.

[25] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, Heart rate variability: standards of measurement, physiological interpretation and clinical use, Circulation 93 (5) (1996) 1043–1065. doi:10.1161/01.CIR.93.5.1043.

[26] M. Aktaruzzaman, R. Sassi, Parametric estimation of sample entropy in heart rate variability analysis, Biomed Signal Process Control 14 (2014) 141–147. doi:10.1016/j.bspc.2014.07.011.

[27] R. L. Burr, Interpretation of normalized spectral heart rate variability indices in sleep research: A critical review, Sleep 30 (7) (2007) 913–919. doi:10.1093/sleep/30.7.913.

[28] J. Paquet, A. Kawinska, J. Carrier, Wake detection capacity of actigraphy during sleep, Sleep 30 (10) (2007) 1362–1369. doi:10.1093/sleep/30.10.1362.

[29] R. J. Cole, D. F. Kripke, W. Gruen, D. J. Mullaney, J. C. Gillin, Automatic sleep/wake identification from wrist activity, Sleep 15 (5) (1992) 461–469. doi:10.1093/sleep/15.5.461.

[30] A. Sadeh, K. M. Sharkey, M. A. Carskadon, Activity-based sleep-wake identification: an empirical test of methodological issues, Sleep 17 (3) (1994) 201–207. doi:10.1093/sleep/17.3.201.

[31] J. M. Kortelainen, M. O. Mendez, A. M. Bianchi, M. Matteucci, S. Cerutti, Sleep staging based on signals acquired through bed sensor, IEEE Trans Inf Technol Biomed 14 (3) (2010) 776–785. doi:10.1109/TITB.2010.2044797.

[32] J. Cohen, A Coefficient of Agreement for Nominal Scales, Educ Psychol Meas 20 (1) (1960) 37–46. doi:10.1177/001316446002000104.

[33] S. L. Salzberg, On comparing classifiers: Pitfalls to avoid and a recommended approach, Data Min Knowl Discov 1 (1997) 317–327. doi:10.1023/A:1009752403260.

[34] T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Comput 10 (7) (1998) 1895–1923. doi:10.1162/089976698300017197.

[35] K. Buza, Classification of gene expression data: A hubness-aware semisupervised approach, Comput Meth Programs Biomed 127 (2016) 105–113. doi:10.1016/j.cmpb.2016.01.016.

[36] G. Orellana, C. M. Held, P. A. Estevez, C. A. Perez, S. Reyes, C. Algarin, P. Peirano, A balanced sleep/wakefulness classification method based on actigraphic data in adolescents, in: Conf Proc IEEE Eng Med Biol Soc 2014, IEEE, 2014, pp. 4188–4191. doi:10.1109/EMBC.2014.6944547.

[37] W. Karlen, C. Mattiussi, D. Floreano, Improving actigraph sleep/wake classification with cardio-respiratory signals, in: Conf Proc IEEE Eng Med Biol Soc 2008, 2008, pp. 5262–5265. doi:10.1109/IEMBS.2008.4650401.

[38] S. Devot, R. Dratwa, E. Naujokat, Sleep/wake detection based on cardiorespiratory signals and actigraphy, in: Conf Proc IEEE Eng Med Biol Soc 2010, 2010, pp. 5089–5092. doi:10.1109/IEMBS.2010.5626208.

Highlights:

• Context of the study is automatic sleep vs wake classification with wearable sensors.
• We tested chest and wrist actigraphy in combination with heart rate variability (HRV).
• A support vector machine (SVM) was used to automatically distinguish sleep and wake.
• The addition of HRV to chest actigraphy led to better detection of wake epochs.
• HRV and chest actigraphy can be easily embedded in a single compact wearable sensor.