Design and validation of a computer-based sleep-scoring algorithm


Journal of Neuroscience Methods 133 (2004) 71–80

Rhain P. Louis a, James Lee b, Richard Stephenson a,b,∗

a Department of Physiology, University of Toronto, 1 Kings College Circle, Toronto, Ont., Canada M5S 1A1
b Department of Zoology, University of Toronto, 25 Harbord Street, Toronto, Ont., Canada M5S 3G5

Received 14 May 2003; received in revised form 18 September 2003; accepted 19 September 2003

Abstract

A computer-based sleep scoring algorithm was devised for the real time scoring of sleep–wake state in Wistar rats. Electroencephalogram (EEG) amplitude (µVrms) was measured in the following frequency bands: delta (δ; 1.5–6 Hz), theta (Θ; 6–10 Hz), alpha (α; 10.5–15 Hz), beta (β; 22–30 Hz), and gamma (γ; 35–45 Hz). Electromyographic (EMG) signals (µVrms) were recorded from the levator auris longus (neck) muscle, as this yielded a significantly higher algorithm accuracy than the spinodeltoid (shoulder) or temporalis (head) muscle EMGs (ANOVA; P = 0.009). Data were obtained using either tethers (n = 10) or telemetry (n = 4). We developed a simple three-step algorithm that categorizes behavioural state as wake, non-rapid eye movement (NREM) sleep, or rapid eye movement (REM) sleep, based on thresholds set during a manually-scored 90-min preliminary recording. Behavioural state was assigned in 5-s epochs. EMG amplitude and ratios of EEG frequency band amplitudes were measured, and compared with empirical thresholds in each animal. STEP 1: EMG amplitude greater than threshold? Yes: “active” wake, no: sleep or “quiet” wake. STEP 2: EEG amplitude ratio (δ × α)/(β × γ) greater than threshold? Yes: NREM, no: REM or “quiet” wake. STEP 3: EEG amplitude ratio Θ²/(δ × α) greater than threshold? Yes: REM, no: “quiet” wake. The algorithm was validated with one, two and three steps. The overall accuracy in discriminating wake and sleep (NREM and REM combined) using step one alone was found to be 90.1%. Overall accuracy using the first two steps was found to be 87.5% in scoring wake, NREM and REM sleep. When all three steps were used, overall accuracy in scoring wake, NREM and REM sleep was determined to be 87.9%. All accuracies were derived from comparisons with unequivocally-scored epochs from four 90-min recordings as defined by an experienced human rater. The algorithms were as reliable as the agreement between three human scorers (88%).

© 2003 Elsevier B.V. All rights reserved.

Keywords: Algorithm; Wistar rats; Sleep; Staging; Electroencephalogram; Electromyogram

∗ Corresponding author. Tel.: +1-416-978-3491; fax: +1-416-978-8532. E-mail address: [email protected] (R. Stephenson).
0165-0270/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.jneumeth.2003.09.025

1. Introduction

Computer-based systems have been employed by many investigators for automating the laborious task of scoring sleep–wake state over an extended period of time. Such systems have been used in a variety of species, including humans (Penzel and Conradt, 2000), mice (Van Gelder et al., 1991), dogs (Horner et al., 1995) and rats (Clark and Radulovacki, 1988; Gottesmann et al., 1977; Hamrahi et al., 2001; Itowi et al., 1990; Neckelmann et al., 1994; Robert et al., 1996; Ruigt et al., 1989). The algorithms all exploit well-known characteristics of the electroencephalogram (EEG) and electromyogram (EMG) during different vigilance states (see Robert et al., 1996). During wakefulness, low voltage, high frequency EEG activity predominates, accompanied by a high, yet variable, level of EMG activity. During non-rapid eye movement (NREM) sleep, the EEG is predominantly comprised of high amplitude, low frequency “slow waves” in the delta band (1.5–6 Hz in rats) and sleep spindles in the alpha band (10.5–15 Hz in rats), along with a low to moderate level of EMG activity. During rapid eye movement (REM) sleep, the EEG is comprised of very regular waves with a dominant frequency of 6–10 Hz (theta band in rats), with very low, intermittent, or absent EMG activity. Placement of the EEG electrodes over the frontal cortex facilitates recording of alpha and delta activities and a frontal derivation therefore facilitates the identification


of NREM sleep (Schwierin et al., 1999). Theta activity is robust when electrodes are placed over the medial aspect of the parietal cortex (Robert et al., 1996) near the hippocampus (Huber et al., 2000), and many sleep scoring algorithms depend heavily on a strong theta signal in conjunction with EMG, electrooculogram (EOG) and ponto-geniculo-occipital (PGO) wave data for clear identification of REM sleep. However, in many experimental situations the use of instrumentation positioned over the skull may preclude the use of medial electrodes and thereby impair the recording of theta activity, possibly jeopardizing the overall accuracy of a theta-dependent algorithm. Furthermore, it may sometimes be desirable to minimize the overall level of instrumentation, and an algorithm that can be used without relying on theta, EOG and PGO wave activity may be desirable in this situation. We have developed and validated an algorithm for on-line identification of wakefulness, NREM and REM states in rats using a single bipolar EMG electrode and a single bipolar EEG electrode. To maximize the utility and flexibility of the system, the algorithm was designed with three steps. It was validated as a one-step algorithm, which relies solely on EMG; a two-step algorithm, which uses EEG and EMG but does not require theta-band activity; a two-step algorithm, which uses EEG only; and a three-step algorithm, which uses EMG and EEG activity. The one-step algorithm was designed to discriminate wakefulness and sleep (NREM sleep and REM sleep combined), the two-step algorithm discriminates wakefulness, NREM sleep and REM sleep, and the three-step algorithm distinguishes “active” and “quiet” wakefulness, NREM sleep and REM sleep. The algorithms were validated and the reliability and reproducibility of visual scoring and computer scoring were compared. In addition, comparisons were made between recordings obtained from tethered animals and telemetered animals.

2. Methods

2.1. Subjects
Fourteen male Wistar rats (Charles River Labs, Saint-Constant, PQ, Canada) with a mean weight (±S.E.) of 420.1 ± 32.5 g at the time of surgery were used. Animals were singly housed and maintained on a 12:12 light-dark cycle. Water and standard laboratory rat chow (Lab Diet, PMI Nutrition International, Brentwood, MO) were available ad libitum.

2.2. Surgery
All procedures were performed in accordance with the guidelines established by the Canadian Council on Animal Care and were approved by the animal care committee at the University of Toronto. Rats were instrumented with either a tethered recording apparatus (n = 10) or a radio transmitter (n = 4). The radio transmitter (model TL10M3-F50-EET, Data Sciences International, St. Paul, MN) was implanted into the peritoneal cavity and EEG and EMG electrodes were tunnelled under the skin to the head and neck, respectively. The tethered rats were instrumented with a 9-channel headpiece socket (Ginder Scientific, Ottawa, ON), which was attached via gold-plated pins to multistranded stainless-steel electrodes and secured to the skull with cranioplastic cement (Plastics One, Roanoke, VA). The rats were instrumented under ketamine (80 mg/kg) and xylazine (10 mg/kg) anaesthesia. A single bipolar cortical EEG electrode was implanted in the skull using stainless steel wires connected to miniature stainless steel screws (size 080). A left frontal electrode was implanted 2 mm anterior to bregma and 2 mm lateral to the midline. A right parietal electrode was implanted 4 mm posterior to bregma and 4 mm lateral to the midline. All signals were referenced to a ground electrode implanted over the left parietal cortex 4 mm anterior and 4 mm lateral to bregma. A right frontal anchoring screw was implanted 2 mm anterior to bregma and 2 mm lateral to the midline. The EMG was recorded using bipolar multistranded stainless steel electrode wires embedded into the levator auris longus muscle in the neck. Three rats from the tethered group each had additional bipolar EMG electrodes implanted into the temporalis (head) and spinodeltoid (shoulder) muscles. At the completion of the surgery, all rats were given analgesic (buprenorphine, 0.015 mg/kg) and antibiotic (Penlong; 0.1 ml). Rats were allowed to recover in their home cages for at least 7 days before any recordings were made.

2.3. Tethered recordings
Rats were recorded in their home cage (dimensions 44.5 cm × 24.1 cm × 19.0 cm), which was enclosed within a grounded Faraday cage (dimensions 67.3 cm × 63.5 cm × 94 cm). The headpiece socket was secured to a counter-balanced tethered cable with swivelling commutator (Plastics One, Inc., Roanoke, VA), which permitted free movement of the animal. EEG and EMG signals were amplified (2000×; CWE, Inc., Ardmore, PA) and filtered using a high-pass corner frequency of 1 Hz and a low-pass corner frequency of 1000 Hz.

2.4. Telemetered recordings
Radio transmitters were calibrated prior to implantation using a 30 Hz, 1 mV sine wave signal generator (CAL-830 Microvolt Calibrator, CWE, Inc.). During recordings, rats were placed inside their home cages, which were situated directly over a radio receiver (PhysioTel RPC-1, Data Sciences International). The signal from the receiver was passed to an adapter (PhysioTel Multiplus Analog Adaptor Model DL-10, Data Sciences International), which demultiplexed and amplified the radio signal.


Table 1
Codified sleep scoring rules

Wake
  Unequivocal: Low-amplitude, mixed-frequency EEG activity; Sustained EMG activity
  Equivocal: Low-amplitude, mixed-frequency EEG activity; Transient or absent EMG activity

NREM
  Unequivocal: High-amplitude, low-frequency EEG activity; Low-level EMG activity
  Equivocal: High-amplitude, low-frequency EEG activity; Transient EMG activity; Mixed-amplitude EEG activity; Low/moderate-level EMG activity

REM
  Unequivocal: Sawtooth-pattern EEG activity; Flat EMG activity
  Equivocal: Sawtooth-pattern EEG activity; Transient EMG activity

Transitional: Any epochs which include rapid changes of state that occur entirely within the course of a single epoch

Unclassified: Epochs that do not correspond to one of the described states, such as movement artifacts or off-scale data

Rules were developed to ensure that epochs were consistently scored by human raters. Unclassified and transitional epochs were excluded from analyses. EMG legend: low-level: a sustained level of baseline activity without spikes. Flat: a flat EMG and/or dominant ECG (when present). Sustained: variable amplitude spikes seen. Transient: one spike in an epoch. EEG legend: high frequency: a dense and “thick” EEG signal. Sawtooth: a prominent low amplitude jagged zig–zag pattern. Low frequency: individual spikes can be discriminated in the trace, long wavelength baseline oscillations greater than or equal to 1 s⁻¹. Spindles: waxing and waning of the amplitude in mid frequency spikes, time course of 1 or a few seconds.

For both tethered and telemetered rats, the EEG and EMG analog signals were then passed to a Pentium 4 PC running LabVIEW software (National Instruments Corp., Austin, TX) for on-line analysis, Maclab/8 hardware (ADInstruments, Inc., Grand Junction, CO) for visual display and a Vetter model 820 videocassette recorder (A.R. Vetter Co., Rebersburg, PA) for backup recording. The Vetter also recorded a video image of the rat to facilitate a direct correlation between behaviour and electrophysiology. All tethered and telemetered rats were undisturbed for at least 1 h prior to recording, and were not handled in the 12 h prior to the recording.

2.5. Visual scoring
The EEG and EMG signals were recorded for visual scoring at 100 samples s⁻¹. A standardized set of scoring rules (Table 1 and Fig. 1) was used to ensure consistency in scoring across three raters. Data were classified as unequivocal (definite), equivocal (probable), transitional and unclassified epochs, based on the rater’s confidence in the concordance of the data with the aforementioned standardized visual scoring rules. Unequivocal and equivocal epochs were further subdivided into wake, NREM sleep and REM sleep scores. Thus, there were eight possible scores for each epoch.

In order to analyze the accuracy of the algorithm, matrices of concordance were constructed showing the agreement between raters and between the computer and the raters. The analysis used epochs that were unequivocally scored visually as wake, NREM sleep or REM sleep. State-specific accuracy was defined as the number of epochs scored as a given state by both the computer and the human rater, divided by the number of epochs scored as that state by the human rater (×100%). Overall accuracy was determined by taking the mean of the state-specific accuracies. A measure of rater confidence in the scores assigned (“confidence index”) was determined by taking the number of epochs that each rater scored unequivocally and dividing that by the total number of epochs scored (×100%).

Fig. 1. Examples of epochs that correspond to the states described in the codified sleep scoring rules (see Table 1). Data were obtained from a single rat during the time period ZT2.5–4, but the traces are representative of traces obtained at all times of the day.

2.6. On-line computer analysis
On-line data analysis and computer scoring was performed by a purpose-built program written in LabVIEW format. Data were analyzed in 5-s epochs at a sampling speed of 400 samples s⁻¹. The EEG and EMG signals were subjected to simple amplitude analysis in the time domain. Raw signals were first bandpass filtered using a Chebyshev filter. The EMG was bandpass filtered from 10–100 Hz. The high cut-off was dictated by the transmitters but was applied to the tethered animals for consistency. The EEG signal was filtered into the following five frequency bands of interest: delta (δ; 1.5–6 Hz), theta (Θ; 6–10 Hz), alpha (α; 10.5–15 Hz), beta (β; 22–30 Hz), and gamma (γ; 35–45 Hz). The choice of boundaries of the EEG frequency bands was based on preliminary power spectral analyses of EEG activity during wake, NREM sleep and REM sleep and concurs with previous studies in rats (Corsi-Cabrera et al., 2001). Amplitudes (µVrms) of the filtered data (EMG and each EEG band) were then recorded. The EMGrms amplitude was used as the input for step one of the algorithm. The EEG amplitudes (µVrms) for each frequency band were combined into the ratios (δ × α)/(β × γ) and Θ²/(δ × α), representing the input variables for steps two and three (respectively) of the algorithm (see below). A TTL pulse generated by LabVIEW at the start of each epoch was recorded on a Maclab channel to aid in the temporal synchronization of visual and computer epochs.
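The amplitude analysis described in Section 2.6 can be illustrated with a minimal Python sketch. It is not the authors' LabVIEW program: the function names are hypothetical, the Chebyshev band-pass filtering is assumed to have been applied already, and only the epoch length (5 s), sampling rate (400 samples s⁻¹) and the two ratio formulas are taken from the text.

```python
import math

EPOCH_SECONDS = 5        # epoch length stated in Section 2.6
SAMPLE_RATE_HZ = 400     # on-line sampling rate stated in Section 2.6
SAMPLES_PER_EPOCH = EPOCH_SECONDS * SAMPLE_RATE_HZ  # 2000 samples per epoch


def rms(samples):
    """Root-mean-square amplitude of one epoch of already band-filtered samples."""
    if not samples:
        raise ValueError("epoch is empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def split_into_epochs(signal, samples_per_epoch=SAMPLES_PER_EPOCH):
    """Cut a signal into consecutive full-length epochs; a partial tail is dropped."""
    n_full = len(signal) // samples_per_epoch
    return [signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
            for i in range(n_full)]


def nrem_ratio(delta, alpha, beta, gamma):
    """NREM-ratio (delta x alpha) / (beta x gamma) from band rms amplitudes."""
    return (delta * alpha) / (beta * gamma)


def rem_ratio(theta, delta, alpha):
    """REM-ratio theta^2 / (delta x alpha) from band rms amplitudes."""
    return theta ** 2 / (delta * alpha)


# Illustrative check: a unit-amplitude 10 Hz sine sampled for one full epoch
# has an rms amplitude close to 1/sqrt(2).
sine_epoch = [math.sin(2 * math.pi * 10 * n / SAMPLE_RATE_HZ)
              for n in range(SAMPLES_PER_EPOCH)]
print(round(rms(sine_epoch), 4))
```

Both ratios are dimensionless, which is consistent with the stated reason for preferring ratios to raw amplitudes. A band amplitude of zero would raise a ZeroDivisionError in this sketch; how the original program handled that case is not described.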

2.7. Preliminary recordings
The design of the algorithm was based on preliminary recordings made in six rats. In each rat, over 1000 epochs were recorded continuously during a 90 min period from zeitgeber time (ZT) 2.5 to ZT4 (ZT0 is defined as the time of “lights on”). An additional recording at the same ZT was made in two of the six rats 1 month later. Rats are relatively inactive during this time of day but usually spend some time in each of the three vigilance states (Borbély, 1978). Eleven recordings spanning the light-to-dark (ZT10.5–13.5) and dark-to-light (ZT22.5–1.5) transition periods were made in three rats to test for an effect of circadian time on the EEG and EMG amplitudes and accuracy of the algorithm. The eleven recordings were all scored using thresholds obtained from recordings made from ZT2.5–4. An analysis of different muscle locations was also undertaken to determine the optimal site for EMG implantation. A comparison of state-specific EMG amplitudes in three muscles (in the neck, head and shoulder) was made in fourteen recordings from three rats.

2.8. Validation recordings
Eight additional rats were used to validate the algorithm. Four rats were instrumented with transmitters, four with tethers. Two 90-min recordings were made from each rat (>1000 epochs each); the second recording (Day 2) was made at least 2 days after the first one (Day 1). Each tethered Day 1 recording was visually scored by raters 1 (RL), 2 (RS) and 3 (JL). Raters 1 and 2 were both experienced, while rater 3 was a novice. Each rater was blind to the scores of the other raters and the computer. Unmarked Day 1 recordings were rescored by rater 1 two months after the initial scorings to determine the intra-rater variability over time. All tethered Day 2 recordings and all telemetered recordings were scored by rater 1 alone.

2.9. Basis for design of the sleep scoring algorithm
The algorithm functions by comparing incoming data with threshold values. The thresholds are derived from preliminary recordings and are specific to each rat. The algorithm was designed as a three-step process (Fig. 3). EMGrms amplitude provided a good discrimination between wakefulness and sleep and, except where noted, was set as the first step. Ratios of EEG frequency band amplitudes were chosen which maximized discrimination between the different vigilance states. Ratios were used instead of raw amplitudes as this maintained internal consistency between rats. The ratio (δ × α)/(β × γ) provided good discrimination between NREM sleep and the other states, and, except where noted, was set as the second step (NREM-ratio). The REM-ratio Θ²/(δ × α) was effective in identifying REM sleep and was set as the third step. The choice of these criteria was supported by statistical analyses (see Section 3).

2.10. Threshold determination
Raw EEG and EMG amplitude data for each unequivocally-scored epoch from the Day 1 recordings were entered into an Excel spreadsheet (Microsoft Corp., Redmond, WA) along with the corresponding visual scores (wake, NREM sleep and REM sleep). The epochs were sorted by visually scored state and an iterative procedure was used to find the optimal threshold (i.e. maximal discrimination of the target state with minimal erroneous scoring of the other states). This was repeated for each of the three steps in turn (EMGrms, (δ × α)/(β × γ) and Θ²/(δ × α)). Sample plots are illustrated in Fig. 2. Values above the threshold in each step were scored as “active” wake (step 1), NREM sleep (step 2) or REM sleep (step 3). Values below threshold were passed on to the subsequent step for further analysis (see Fig. 3). The thresholds determined from the Day 1 recording in each rat were applied to the Day 2 recordings to examine the effectiveness of using the thresholds from one recording on subsequent data and to validate the algorithm.

Fig. 2. For each step of the algorithm, the data were separately plotted as wakefulness (open circles), NREM sleep (filled triangles) and REM sleep (crosses). The threshold was determined to be the point where the separation between states was maximal, as determined by visual examination. Thresholds are indicated by the vertical lines. Plot A (EMGrms amplitude) illustrates all epochs. Plot B (NREM ratio amplitude) only shows those epochs which were below threshold in Plot A. Plot C only shows those epochs that were below threshold in both Plots A and B. Only the epochs that were scored unequivocally by Rater 1 from a 90-min recording are plotted.

2.11. Validation protocols
Identical data were used in each of the following analyses.
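The threshold search of Section 2.10 is described only as an iterative spreadsheet procedure, so the following Python sketch is one possible reading rather than the authors' method. It assumes a brute-force scan over midpoints between adjacent observed values, and it assumes a particular objective (the mean of the fraction of target epochs above threshold and the fraction of other epochs at or below it). The function name and the example EMG values are hypothetical.

```python
def best_threshold(values, is_target):
    """Scan candidate thresholds and return (threshold, score).

    values    -- one measurement per unequivocally-scored epoch
                 (for example EMG rms amplitude for step 1)
    is_target -- parallel booleans, True where the visual score is the
                 state that the step should detect (for example wake)

    The score is an assumed objective: the mean of the fraction of target
    epochs above the threshold and the fraction of non-target epochs at or
    below it.
    """
    targets = [v for v, t in zip(values, is_target) if t]
    others = [v for v, t in zip(values, is_target) if not t]
    if not targets or not others:
        raise ValueError("need epochs from both the target and other states")

    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    if not candidates:
        raise ValueError("all values are identical; no threshold separates them")

    best = None
    for thr in candidates:
        hit = sum(v > thr for v in targets) / len(targets)
        reject = sum(v <= thr for v in others) / len(others)
        score = (hit + reject) / 2
        if best is None or score > best[1]:
            best = (thr, score)
    return best


# Hypothetical EMG rms values (arbitrary units) with visual scores.
emg = [30, 28, 35, 12, 10, 11, 9, 14, 13]
wake = [True, True, True, True, False, False, False, False, False]
threshold, score = best_threshold(emg, wake)
print(threshold, score)
```

In the three-step scheme, the same search would be repeated for the NREM-ratio and the REM-ratio, each time using only the epochs that fell below the preceding threshold, as in Fig. 2.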

2.11.1. One-step algorithm (EMGrms only)
EMG amplitude tended to be higher during wakefulness than during sleep (Figs. 1 and 4). In the one-step algorithm (representing the first step in multi-step algorithms), if EMGrms amplitude was greater than threshold, the state was defined as wake, and sub-threshold amplitude was defined as sleep.

2.11.2. Two-step algorithm (EMGrms + NREM-ratio)
In step 1, wakefulness was determined as described above (supra-threshold EMGrms amplitude). For epochs with sub-threshold EMGrms amplitude, the “NREM-ratio” (δ × α)/(β × γ) was computed and compared with its corresponding threshold (step 2). Epochs in which the NREM-ratio was above the threshold in step 2 were classified as NREM sleep and the remainder were classified as REM sleep.

2.11.3. Two-step algorithm (EEG only: NREM-ratio + REM-ratio)
The first step of the algorithm (EMGrms) was omitted, and the overall accuracy was measured using only the NREM-ratio (step 1) and the REM-ratio (step 2). This algorithm would be useful in cases where the EMG signal is poor or unavailable.

2.11.4. Three-step algorithm (EMGrms + NREM-ratio + REM-ratio)
As before, the first step involved an analysis of EMGrms amplitude. However, in this algorithm the EMGrms amplitude threshold was raised to a higher value so that epochs with a relatively large EMGrms amplitude were defined as “active” wake. Sub-threshold EMGrms amplitudes were passed on to step two. Step two was as described for the two-step algorithm where epochs with NREM-ratio values greater than threshold were classified as NREM and epochs with sub-threshold NREM-ratio values were passed on to the third step. The third step in the algorithm utilized the “REM-ratio” Θ²/(δ × α). Epochs with a suprathreshold REM-ratio were scored as REM sleep and the remainder were classified as “quiet” wake. The subdivision of wake into “active” and “quiet” categories was subjective so the two were pooled and a quantitative analysis of the sub-states was not attempted.

2.11.5. Three-step algorithm (EMG-first versus NREM-first)
As above, except that the order of the first two steps of the three-step algorithm was reversed, setting the NREM-ratio as the first step and EMGrms as the second step (NREM-first algorithm).

Fig. 3. Flow chart of the decision-making process involved in our algorithm. At each step, a value above threshold leads to a definitive state assignment, while values below threshold are passed on to a successive step for further analysis. A value below threshold in the third step is classified as “wake”, which allows the recovery of waking epochs with relatively low EMGrms amplitudes.

Fig. 4. Variables used in the algorithm, EMGrms, NREM-ratio ((δ × α)/(β × γ)), REM-ratio (Θ²/(δ × α)), and their values in wake, NREM sleep and REM sleep. Asterisks (∗) denote significant differences from respective values in the other two states (ANOVA; P < 0.05). Values are grand medians (±95th and 5th percentiles) of the median values from seven recordings from seven rats.

2.12. Statistical analyses
Parametric data were subjected to one-way ANOVA, and when appropriate, data were further analyzed for multiple comparisons using Tukey’s post hoc test. Non-parametric data were analyzed using Kruskal–Wallis one-way ANOVA on ranks followed by Student–Newman–Keuls post hoc tests to identify pairwise differences. The analysis of the effect of time of day on the scoring accuracy was performed using a nonparametric Mann–Whitney Rank Sum test. Comparisons of the accuracy of the algorithm between data from tethered and telemetered rats were analyzed using two-tailed unpaired Student’s t-tests. A significance level of P < 0.05 was used in all comparisons.
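The EMG-first three-step decision process of Section 2.11.4 (Fig. 3) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' LabVIEW code: the `Thresholds` container, the function name and the numerical values in the example are hypothetical, and real thresholds are specific to each rat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Per-animal thresholds (hypothetical container; values are rat-specific)."""
    emg: float
    nrem_ratio: float
    rem_ratio: float


def score_epoch(emg_rms, delta, theta, alpha, beta, gamma, thr):
    """Assign a state to one 5-s epoch using the EMG-first three-step scheme.

    Inputs are rms amplitudes for the EMG and for each EEG band.
    """
    # Step 1: supra-threshold EMG amplitude is scored as "active" wake.
    if emg_rms > thr.emg:
        return "active wake"
    # Step 2: supra-threshold NREM-ratio is scored as NREM sleep.
    if (delta * alpha) / (beta * gamma) > thr.nrem_ratio:
        return "NREM"
    # Step 3: supra-threshold REM-ratio is scored as REM sleep;
    # everything left over is recovered as "quiet" wake.
    if theta ** 2 / (delta * alpha) > thr.rem_ratio:
        return "REM"
    return "quiet wake"


# Hypothetical thresholds and epochs, chosen only to exercise each branch.
thr = Thresholds(emg=20.0, nrem_ratio=10.0, rem_ratio=2.0)
print(score_epoch(35.0, 6.0, 5.0, 5.0, 5.0, 4.0, thr))   # active wake
print(score_epoch(8.0, 20.0, 6.0, 8.0, 4.0, 2.0, thr))   # NREM
print(score_epoch(5.0, 6.0, 12.0, 4.0, 5.0, 3.0, thr))   # REM
print(score_epoch(10.0, 6.0, 5.0, 5.0, 5.0, 4.0, thr))   # quiet wake
```

Stopping after the first test gives the one-step algorithm, and treating every epoch that fails the second test as REM gives the EMG-based two-step algorithm. Swapping the first two tests gives the NREM-first variant of Section 2.11.5.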

3. Results

3.1. Preliminary recordings

3.1.1. Effect of time of day
Recordings made across the light–dark and dark–light transition periods did not differ significantly (P = 0.20) with respect to overall accuracy using the three-step algorithm.

3.1.2. Comparison of muscles for EMG recordings
The levator auris longus muscle in the neck yielded a significantly greater overall accuracy than either the spinodeltoid (shoulder) or temporalis muscle (P = 0.009). These data confirmed that the neck muscle provides the best EMG signal for discriminating wake and sleep, and neck EMG electrodes were used in all subsequent studies.

3.1.3. Tethered versus telemetered groups
There was no significant difference (P = 0.92) in the overall accuracy using tethers (n = 4 rats) and telemetry (n = 4 rats), so data from the two groups were pooled for all further analyses.

3.2. Reliability of visual scoring

3.2.1. Rater confidence
The degree of confidence that each rater had in their assigned scores was estimated using a “confidence index” (percentage of epochs scored unequivocally). Rater 1’s confidence index was 68.7%, while rater 2’s confidence index was 40.7% and rater 3’s confidence index 72.6%. There was no apparent relationship between confidence index and sleep scoring experience, as rater 1 was the most experienced, and rater 3 the least experienced (scoring experience: rater 1 > rater 2 > rater 3; confidence index: rater 3 > rater 1 > rater 2).

3.2.2. Inter-rater agreement
Inter-rater agreement between raters 1 and 2 (both experienced raters) was found to be 94.1% over all epochs scored unequivocally (n = 1412) in one recording from each of four tethered rats. Agreement on visual scoring of all epochs (equivocal and unequivocal) between raters 1 and 2 on the same recordings was 83.4% overall (n = 3499 epochs). When data from rater 3 (novice rater) was considered, agreement with rater 1 was 84.0% over all unequivocal epochs from the same four recordings (n = 2260 epochs), and agreement with rater 2 was 82.8% (n = 1379 epochs). A total of 1141 epochs were scored unanimously and unequivocally by all three raters. Of those epochs, 1004 were scored as the same state by all three raters, leading to an agreement of 88.0% (1004 of 1141 epochs). Comparisons between all three raters are shown in Table 2.

Table 2
Matrix illustrating concordance between scores defined unequivocally by three raters

                   Rater 2                  Rater 3
                   Wake   NREM   REM        Wake   NREM   REM
Rater 1   Wake     433    0      3          433    27     3
          NREM     70     632    0          41     1513   16
          REM      19     0      255        5      81     141
Rater 3   Wake     530    12     6
          NREM     57     562    66
          REM      18     1      127

Data from four tethered rats.

3.2.3. Intra-rater agreement
To permit an examination of intra-rater variability, a visual rescoring of four unmarked recordings was performed 2 months after the initial scorings by rater 1. 99.5% (2404 of 2415) of unequivocally scored epochs were classified consistently in both instances. When equivocal data were included, 89.0% (3295 of 3663) of all epochs were scored as the same state in both instances.

3.3. Basis for design of the sleep scoring algorithm
The waking EMGrms amplitude was significantly greater than the EMGrms amplitudes in NREM sleep (2.1 times larger) and REM sleep (2.6 times larger). For each EEG frequency band, amplitude in one of the states (wake, NREM sleep, REM sleep) was found to significantly differ from the corresponding value in the other two states (Fig. 5; P < 0.05). However, the magnitude and reliability of the state-specific differences was considerably enhanced by combination of the EEG frequency band amplitudes into ratios (compare Figs. 4 and 5). The NREM-ratio was found to be statistically greater in NREM sleep than in wake (3.2 times larger) and REM sleep (4.4 times larger), and the REM-ratio was statistically greater in REM sleep than in wake (3.5 times larger) or NREM sleep (4.3 times larger) (all P < 0.05; Fig. 4).

Fig. 5. Plot of the mean ± S.D. of the median values of the relative EEG amplitudes (%) of frequency bands from seven rats. For each epoch, the amplitude of each frequency band was expressed as a percentage of the total (all frequency bands summed). Asterisks (∗) denote states that are significantly different from the other two states (ANOVA; P < 0.05). Axes: mean amplitude (% of total) versus frequency band (delta, theta, alpha, beta, gamma) for wake, NREM and REM.

3.4. Validations
For each epoch, the sleep–wake state assigned by computer was compared with that of rater 1.

3.4.1. One-step algorithm (EMGrms only)
Concordance between computer and rater 1 over 6138 unequivocally-scored epochs from the eight rats was 90.1% (Table 3 and Fig. 6). When all epochs were analyzed (i.e. including those assigned an equivocal visual score), overall concordance was reduced to 77.4% (n = 7920).

Table 3
Matrix of concordance between the computer and rater 1 when the one-stage algorithm was used to score sleep–wake state in pooled data from eight rats

             Computer: one-step algorithm
Rater 1      Wake     Sleep    Total
Wake         975      190      1165
Sleep        175      4798     4973
Total        1150     4988     6138

Only epochs that were scored unequivocally by the rater were used in the analysis. Thresholds obtained from Day 1 recordings were used to score these Day 2 recordings.

3.4.2. Two-step algorithm (EMGrms + NREM-ratio)
Overall concordance between the computer and rater 1 was 87.5% (n = 6138 unequivocal epochs). The algorithm scored 83.7% of wake epochs, 93.9% of NREM sleep epochs, and 84.8% of REM sleep epochs correctly (Table 4 and Fig. 6). Using all epochs concordance fell to 75.7% (n = 7920).

Table 4
Matrix of concordance between the computer and rater 1 when the two-stage algorithm was used to score sleep–wake state in pooled data from eight rats

             Computer: two-step algorithm
Rater 1      Wake     NREM     REM      Total
Wake         975      39       151      1165
NREM         174      3942     82       4198
REM          1        117      657      775
Total        1150     4098     890      6138

Only epochs that were scored unequivocally by the rater were used in the analysis. Thresholds obtained from Day 1 recordings were used to score these Day 2 recordings.

3.4.3. Two-step algorithm (EEG only: NREM-ratio + REM-ratio)
Overall concordance between computer and rater 1 in scoring unequivocally-defined epochs using only the EEG ratios was low, at 74.0% (n = 6138). When all epochs (equivocal + unequivocal) were examined, concordance fell further to 69.4% (n = 7920).

3.4.4. Three-step algorithm (EMGrms + NREM-ratio + REM-ratio)
Overall concordance between the computer and rater 1 for unequivocally scored epochs (n = 6138) was 87.9%. The computer scored 91.8% of wake epochs, 92.5% of NREM sleep epochs, and 79.4% of REM sleep epochs correctly (Table 5 and Fig. 6). When all epochs (equivocal + unequivocal) were analyzed, overall concordance fell to 78.4% (n = 7920).

Table 5
Matrix of concordance between rater 1 and the computer when the three-stage algorithm was used to score sleep–wake state in pooled data from eight rats

             Computer: three-step algorithm
Rater 1      Wake     NREM     REM      Total
Wake         1070     35       60       1165
NREM         267      3882     49       4198
REM          69       91       615      775
Total        1406     4008     724      6138

Only epochs that were scored unequivocally by the rater were used in the analysis. Thresholds obtained from Day 1 recordings were used to score these Day 2 recordings.

Fig. 6. Overall accuracies of the one, two and three step algorithms (EMG-first) in scoring only unequivocally-defined epochs. Data from eight rats.

3.4.5. Three-step algorithm (EMG-first versus NREM-first)
There was no significant difference between tethers and telemetry in terms of overall accuracy when the NREM-first version of the algorithm was used (P = 0.08) so the data were pooled. Overall concordance between computer and rater 1 was 87.5% (n = 6138 unequivocal epochs) using the NREM-first algorithm. Concordance fell to 79.3% when all epochs (equivocal + unequivocal) were examined (n = 7920). There was no significant difference between overall accuracies of the NREM-first and the EMG-first versions of the three-step algorithm (P = 0.82).
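As a worked check of the accuracy definitions in Section 2.5, the Python sketch below recomputes the state-specific and overall accuracies from the Table 5 counts. The function names are hypothetical. The orientation of the matrix (rows are rater 1 scores, columns are computer scores) is an assumption, reconstructed so that each row sums to the rater 1 totals of 1165, 4198 and 775 epochs.

```python
def state_accuracies(matrix):
    """State-specific accuracy (%) for each rater-scored state.

    matrix[rater_state][computer_state] holds an epoch count. Accuracy for a
    state is the count on the diagonal divided by the number of epochs the
    human rater scored as that state.
    """
    return {state: 100.0 * row[state] / sum(row.values())
            for state, row in matrix.items()}


def overall_accuracy(matrix):
    """Overall accuracy (%): the unweighted mean of the state-specific accuracies."""
    accuracies = state_accuracies(matrix)
    return sum(accuracies.values()) / len(accuracies)


# Counts from Table 5 (three-step algorithm): rows are rater 1, columns computer.
TABLE_5 = {
    "Wake": {"Wake": 1070, "NREM": 35, "REM": 60},
    "NREM": {"Wake": 267, "NREM": 3882, "REM": 49},
    "REM": {"Wake": 69, "NREM": 91, "REM": 615},
}

for state, accuracy in state_accuracies(TABLE_5).items():
    print(state, round(accuracy, 1))          # 91.8, 92.5 and 79.4
print("overall", round(overall_accuracy(TABLE_5), 1))  # 87.9
```

Because the overall figure is an unweighted mean over states, the comparatively rare REM epochs carry the same weight as the far more numerous NREM epochs. The simple proportion of matching epochs in the same table would be higher (5567 of 6138 epochs, about 90.7%).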


4. Discussion

We have developed and validated a simple and versatile three-step algorithm that can be used to score sleep–wake state in the Wistar rat. The algorithm can be used with EMG signals alone, or with EMG and EEG together. EMG amplitude provides good discrimination between sleep and wakefulness, whereas an EEG-only two-step algorithm was found to be unsatisfactory. Combined use of EMG and EEG allowed good discrimination between wakefulness, NREM sleep and REM sleep.

A computer-based system can be considered to be a good substitute for human scoring when its accuracy approaches the agreement between human raters (Li et al., 2003). In the present study, concordance between the three-step computer algorithm and rater 1 (87.9%) was similar to the concordances among three human raters (88%) when comparisons were made using only unequivocally-defined epochs. It was noted that the concordance between computer and rater 1 always decreased when all data (equivocally and unequivocally scored epochs) were included. However, this cannot be taken to imply a lower accuracy on the part of the computer since the origin of the error (computer or human rater) is unknown. Thus, validation of the computer can only be accomplished when the comparator (visual scoring) has a reasonably high level of reliability (i.e. using only those epochs that were scored unequivocally). Since the three human raters disagreed 12% of the time when scoring unequivocal epochs, the inter-rater agreement of 88% can be considered to be the maximal scoring accuracy that the algorithm can be expected to attain (Li et al., 2003). We therefore conclude that the reliability of the three-step algorithm is similar to that of human scorers, and as such is a viable substitute for real-time human scoring of sleep–wake state in Wistar rats.
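To make the distinction between the two concordance figures concrete, the following sketch computes percent agreement once over all epochs and once over only those the rater flagged as unequivocal. The function name, the single-letter state codes and the ten example epochs are invented for illustration and are not data from the study.

```python
def concordance(computer, rater, include=None):
    """Percent of epochs on which computer and rater assign the same state.

    include: optional list of booleans; only epochs flagged True are compared.
    """
    if include is None:
        include = [True] * len(rater)
    pairs = [(c, r) for c, r, keep in zip(computer, rater, include) if keep]
    if not pairs:
        raise ValueError("no epochs selected")
    agree = sum(1 for c, r in pairs if c == r)
    return 100.0 * agree / len(pairs)

# Made-up scores for ten 5-s epochs (W = wake, N = NREM, R = REM).
rater       = ["W", "W", "N", "N", "N", "R", "R", "W", "N", "N"]
computer    = ["W", "N", "N", "N", "N", "R", "W", "W", "N", "R"]
# Hypothetical flags: True where the rater considered the epoch unequivocal.
unequivocal = [True, False, True, True, True, True, True, True, True, False]

all_epochs = concordance(computer, rater)
unequivocal_only = concordance(computer, rater, unequivocal)
print(all_epochs, unequivocal_only)  # 70.0 87.5
```

In this toy example two of the three disagreements fall on equivocal epochs, so restricting the comparison to unequivocal epochs raises the agreement, mirroring the pattern reported above.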
Our system is simple, and the high degree of accuracy indicates that additional measurements, such as EOG (Gottesmann et al., 1977), hippocampal EEG (Gandolfo et al., 1988; van Luijtelaar and Coenen, 1984; Winson, 1976), or PGO waves (Benington et al., 1994) are not needed for accurate online scoring of the three basic sleep–wake states. The concordance between the visual scoring and computer scoring in this study is similar to that in prior studies, which range from 71% (Neckelmann et al., 1994) to 95% (Karasinski et al., 1994) in studies that used at least two human raters. Of 12 previous studies that utilized more than one rater in the validation analysis, nine showed accuracies of approximately 90%. Some of those accuracies are cited with respect to only the epochs where the visual scores were agreed upon by more than one rater (consensus epochs) (Bergmann et al., 1987; Goeller and Sinton, 1989; Karasinski et al., 1994; Robert et al., 1996; Ruigt et al., 1989; Van Gelder et al., 1991) whereas in other studies a single rater was used (Benington et al., 1994; Gandolfo et al., 1988; Hamrahi et al., 2001; Neckelmann et al., 1994; van Luijtelaar and Coenen, 1984). Differences in the way that scoring accuracy was reported make it difficult to compare the accuracy of the present algorithm with previously published ones. The present algorithm compares well with the stated overall accuracies of five studies that did not use consensus epochs; 71% (Neckelmann et al., 1994), 91% (Benington et al., 1994), 93% (van Luijtelaar and Coenen, 1984) and 94% (Hamrahi et al., 2001).

We confirmed that EMGrms amplitude alone would suffice for scoring sleep–wake state. This is dependent on there being a consistent difference in muscle tone between sleeping and waking states. Since EMG amplitudes did not differ consistently between NREM sleep and REM sleep (see Fig. 4), the EMG-only algorithm is unable to distinguish NREM sleep and REM sleep. REM sleep is marked by a lower level of baseline EMG activity, but it is accompanied by phasic muscle twitches. NREM sleep, while a relatively quiescent state, has a higher level of basal muscle tone than that occurring during REM sleep. Hence, the amplitudes of EMG activities, averaged over 5 s epochs, tended to be similar during NREM and REM sleep.

Once the initial wake-sleep separation was made, the NREM-ratio ((δ × α)/(β × γ)) further categorized the sleep epochs as NREM sleep or REM sleep. The relatively high level of overall accuracy (87.5%) confirms the utility of the two-step algorithm in situations where theta activity may be compromised, such as experiments leading to lateral hypothalamic damage (Jurkowlaniec et al., 1989). However, it was found that the two-step algorithm incorrectly scored 13.0% (151 of 1165) of wake epochs as REM sleep (Table 4). The three-step algorithm was designed to recover these incorrectly scored wake epochs by exploiting the differences in the EEG frequency spectra between REM sleep and wake.
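The decision sequence can be sketched in a few lines. This is an illustrative reading of the three steps as stated in the abstract, not the authors' code. The class names, the `steps` parameter and every numeric value below are hypothetical. In the study the thresholds were set empirically for each animal.

```python
from dataclasses import dataclass

@dataclass
class Epoch:
    """Amplitudes (e.g. in microvolts rms) for one 5-s epoch."""
    emg: float
    delta: float
    theta: float
    alpha: float
    beta: float
    gamma: float

@dataclass
class Thresholds:
    """Per-animal thresholds (hypothetical container)."""
    emg: float
    nrem_ratio: float
    rem_ratio: float

def nrem_ratio(e: Epoch) -> float:
    # NREM-ratio: (delta x alpha) / (beta x gamma)
    return (e.delta * e.alpha) / (e.beta * e.gamma)

def rem_ratio(e: Epoch) -> float:
    # REM-ratio: theta^2 / (delta x alpha)
    return e.theta ** 2 / (e.delta * e.alpha)

def score_epoch(e: Epoch, t: Thresholds, steps: int = 3) -> str:
    """Apply the first one, two or three steps (EMG-first ordering)."""
    if e.emg > t.emg:
        return "wake"    # step 1: "active" wake
    if steps == 1:
        return "sleep"   # EMG alone cannot separate NREM from REM
    if nrem_ratio(e) > t.nrem_ratio:
        return "NREM"    # step 2
    if steps == 2:
        return "REM"     # two-step version: everything left is REM
    if rem_ratio(e) > t.rem_ratio:
        return "REM"     # step 3
    return "wake"        # "quiet" wake

# Invented thresholds and epochs, for illustration only.
thr = Thresholds(emg=10.0, nrem_ratio=4.0, rem_ratio=1.0)
active = Epoch(emg=25, delta=12, theta=14, alpha=8, beta=7, gamma=6)
nrem = Epoch(emg=3, delta=40, theta=15, alpha=12, beta=6, gamma=4)
rem = Epoch(emg=2, delta=10, theta=30, alpha=6, beta=6, gamma=5)
quiet = Epoch(emg=5, delta=12, theta=8, alpha=8, beta=7, gamma=6)

print([score_epoch(e, thr) for e in (active, nrem, rem, quiet)])
print(score_epoch(quiet, thr, steps=2))
```

With these made-up values the low-EMG "quiet" wake epoch is scored as REM by the two-step version and as wake once the third step is applied, which is the kind of error the third step is meant to recover.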
Epochs that were not scored as “active” wake (step 1) or NREM sleep (step 2) were further analyzed, and classified as “quiet” wake or REM sleep. This approach led to a better discrimination of sleep and wake. Only 5.2% (60 of 1165) of wake epochs were incorrectly scored as REM sleep, down from 13% in the two-step algorithm (Table 5). The detection of wake was therefore more accurate using the three-step algorithm (91.8% accuracy) than the two-step algorithm (83.7% accuracy). However, this advantage was offset slightly by a small decrease in the accuracy of REM detection (reduced from 84.8 to 79.4%). Overall accuracy using the three-step approach was 87.9%. In situations where the detection of REM is of the utmost importance, the two-step algorithm allows the scoring of REM without measuring theta-band activity, at the cost of a decreased accuracy in scoring wakefulness.

We confirmed the utility of setting the NREM-ratio as the first step of the three-step algorithm, and relegating EMGrms to the second step. This approach was prompted by our observations during separate sleep-deprivation experiments that the rats would sometimes adapt to the unrelenting deprivation by augmenting the EMG signal to “fool” the computer into scoring sleep as wakefulness. The rats did this by sleeping with pieces of rat chow in their jaws, or adopting a posture that maintained a high level of neck muscle tone. Changing the first step of the algorithm to the NREM ratio solves this problem, as the EEG characteristics of NREM sleep are not subject to manipulation by the animals to the same extent as the EMG.

The EMG signal can sometimes be of poor quality, so we examined the efficacy of using only the EEG (NREM- and REM-ratio) to score sleep–wake state. We found that this led to a low overall accuracy of 74%, including very poor identification of wake (47%). The incorrectly scored wake epochs were mostly scored as REM sleep (34%), although some were incorrectly scored as NREM sleep (19%). This low accuracy in scoring wakefulness may be attributable to a positive correlation that was observed between theta amplitude and EMG amplitude in unequivocally-defined wake epochs (Pearson product moment correlation; r = 0.215, P < 0.00001, n = 1165). Hence, “active” wake epochs tended to be associated with a supra-threshold REM ratio, and were therefore scored as REM sleep. This confirms the need to include an EMG amplitude criterion in the algorithm.

The use of transmitters for recording electrophysiological data has its benefits and its drawbacks. Advantages include reduced movement-related artefacts, increased mobility for the animal, and a possible decrease in stress when using telemetry. Drawbacks include limitations in the total amount of data that can be collected due to battery depletion, and the high cost of the equipment. However, overall accuracies using tethers and telemetry were found not to be significantly different.
In summary, we have presented a sleep-scoring algorithm which allows the real-time scoring of sleep–wake state using a computer-based analysis of EEG and EMG data in 5-s epochs. The algorithm can be implemented with one, two or three steps, depending on the needs of the study. The accuracy and reliability of the algorithm approaches that of human scorers.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors thank Dr. Richard Horner for his helpful discussions regarding sleep-scoring algorithms.

References

Benington JH, Kodali SK, Heller HC. Scoring transitions to REM sleep in rats based on the EEG phenomena of pre-REM sleep: an improved analysis of sleep structure. Sleep 1994;17:28–36.
Bergmann BM, Winter JB, Rosenberg RS, Rechtschaffen A. NREM sleep with low-voltage EEG in the rat. Sleep 1987;10:1–11.
Borbély AA. Effects of light on sleep and activity rhythms. Prog Neurobiol 1978;10:1–31.
Clark FM, Radulovacki M. An inexpensive sleep-wake state analyzer for the rat. Physiol Behav 1988;43:681–3.
Corsi-Cabrera M, Perez-Garci E, Del Rio-Portilla Y, Ugalde E, Guevara MA. EEG bands during wakefulness, slow-wave, and paradoxical sleep as a result of principal component analysis in the rat. Sleep 2001;24:374–80.
Gandolfo G, Glin L, Lacoste G, Rodi M, Gottesmann G. Automatic sleep–wake scoring in the rat on microcomputer APPLE II. Int J Biomed Comput 1988;23:83–95.
Goeller CJ, Sinton CM. A microcomputer-based sleep stage analyzer. Comput Methods Programs Biomed 1989;29:31–6.
Gottesmann C, Kirkham PA, LaCoste G, Rodrigues L, Arnaud C. Automatic analysis of the sleep-waking cycle in the rat recorded by miniature telemetry. Brain Res 1977;132:562–8.
Hamrahi H, Chan B, Horner RL. On-line detection of sleep–wake states and application to produce intermittent hypoxia only in sleep in rats. J Appl Physiol 2001;90:2130–40.
Horner RL, Brooks D, Kozar LF, Tse S, Phillipson EA. Immediate effects of arousal from sleep on cardiac autonomic outflow in the absence of breathing in dogs. J Appl Physiol 1995;79:151–62.
Huber R, Deboer T, Tobler I. Topography of EEG dynamics after sleep deprivation in mice. J Neurophysiol 2000;84:1888–93.
Jurkowlaniec E, Trojniar W, Ozorowska T, Tokarski J. Differential effect of the damage to the lateral hypothalamic area on hippocampal theta rhythm during waking and paradoxical sleep. Acta Neurobiol Exp (Warsz) 1989;49:153–69.
Karasinski P, Stinus L, Robert C, Limoge A. Real-time sleep–wake scoring in the rat using a single EEG channel. Sleep 1994;17:113–9.
Li C, Radulovacki M, Carley DW. An automated pontine-wave detection system. Sleep 2003;26:613–8.
Neckelmann D, Olsen OE, Fagerland S, Ursin R. The reliability and functional validity of visual and semiautomatic sleep/wake scoring in the Moll-Wistar rat. Sleep 1994;17:120–31.
Penzel T, Conradt R. Computer based sleep recording and analysis. Sleep Med Rev 2000;4:131–48.
Robert C, Karasinski P, Natowicz R, Limoge A. Adult rat vigilance states discrimination by artificial neural networks using a single EEG channel. Physiol Behav 1996;59:1051–60.
Ruigt GS, Van Proosdij JN, Van Wezenbeek LA. A large scale, high resolution, automated system for rat sleep staging. II. Validation and application. Electroencephalogr Clin Neurophysiol 1989;73:64–71.
Schwierin B, Achermann P, Deboer T, Oleksenko A, Borbely AA, Tobler I. Regional differences in the dynamics of the cortical EEG in the rat after sleep deprivation. Clin Neurophysiol 1999;10:869–75.
Van Gelder RN, Edgar DM, Dement WC. Real-time automated sleep scoring: validation of a microcomputer-based system for mice. Sleep 1991;14:48–55.
van Luijtelaar EL, Coenen AM. An EEG averaging technique for automated sleep–wake stage identification in the rat. Physiol Behav 1984;33:837–41.
Winson J. A simple sleep stage detector for the rat. Electroencephalogr Clin Neurophysiol 1976;41:179–82.