Multi-DoF continuous estimation for wrist torques using stacked autoencoder



Biomedical Signal Processing and Control 57 (2020) 101733

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Yang Yu, Chen Chen, Xinjun Sheng ∗, Xiangyang Zhu

State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China

Article info

Article history: Received 7 March 2019; received in revised form 7 October 2019; accepted 13 October 2019

Keywords: Human machine interface; Simultaneous and proportional control; Stacked autoencoder

Abstract

Human machine interfaces (HMIs) based on surface electromyography (sEMG) promise an intuitive and noninvasive way to interact with peripheral equipment such as prostheses, exoskeletons, and robots. Recent advances in machine learning, especially in deep learning algorithms, have made it possible to construct complicated mapping functions. In this study, we construct a stacked autoencoder-based deep neural network (SAE-DNN) to continuously estimate multiple degrees-of-freedom (DoFs) kinetics of the wrist from sEMG signals. During the experiments, high-density sEMG signals and multi-DoF wrist torques were acquired simultaneously under the guidance of a visual feedback system, with eight healthy subjects and one amputee recruited. The estimation performance of SAE-DNN was compared with that of two commonly used conventional regressors, linear regression (LR) and support vector regression (SVR). The results demonstrate the feasibility of this scheme and the significant superiority of SAE-DNN over LR and SVR, with higher R² values across all DoFs (SAE-DNN: 0.829 ± 0.050, LR: 0.757 ± 0.075, SVR: 0.751 ± 0.079). The outcomes of this study provide a perspective and a feasible scheme for simultaneous and proportional control. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Interaction between humans and machines can be realized through a human machine interface (HMI), in which motor-related information extracted from the neuromuscular system is used to control peripheral equipment. The interfacing can be achieved at the level of the brain [1,2], peripheral nerves [3,4], and muscles [5,6]. For clinical and commercial applications, however, myoelectric control based on surface electromyography (sEMG) has received a great deal of attention for its noninvasiveness, higher accuracy and relative robustness. In the past decades, myoelectric control has been extensively applied in prostheses [7], exoskeletons [8], stroke rehabilitation [9], and robot control [10].

Conventional myoelectric control approaches can be roughly divided into two categories: direct control and pattern recognition (PR). In direct control [11], a pair of antagonistic muscles drives the two opposite directions of a single degree of freedom (DoF). A co-contraction strategy is employed to switch between DoFs, thus yielding multi-DoF functionality. Although reliable control is achieved, this kind of control paradigm

∗ Corresponding author. E-mail address: [email protected] (X. Sheng). https://doi.org/10.1016/j.bspc.2019.101733 1746-8094/© 2019 Elsevier Ltd. All rights reserved.

remains cumbersome and nonintuitive on account of its sequential switching mode and discrete motions [12]. Hence, there is a large gap between reality and ideal conditions for HMI in terms of functionality and control smoothness. To solve this problem, PR-based control methods have been proposed and extensively investigated in the academic community [13]. Nevertheless, the robustness of PR-based strategies remains an open issue, since they are likely to be influenced by electrode shift, impedance variation, and long-term use. Furthermore, the control output of this scheme is a sequence of predefined motion classes, which contradicts the fact that natural human control is proportional and coordinates multiple DoFs.

To overcome these limitations, research efforts have been devoted to continuously and simultaneously decoding motor-related multi-DoF information from electromyography (EMG) signals, also known as simultaneous and proportional control (SPC). Ortiz-Catalan et al. [14] proposed a PR-based simultaneous prosthetic control strategy by constructing particular classifier topologies, but the simultaneous motions are still predefined. As an alternative, regression techniques, such as linear regression (LR) [15], multilayer perceptron (MLP) [16], support vector regression (SVR) [17], and kernel ridge regression (KRR) [18], are employed to construct the complicated mappings between EMG signals and limb kinematics or kinetics. Generally, the kinematics or kinetics are recorded as the target labels and combined with EMG signals to train the regressor. An alternative approach to


Y. Yu, C. Chen, X. Sheng et al. / Biomedical Signal Processing and Control 57 (2020) 101733

obtain target data is to instruct subjects to accomplish specific tasks following a cue and then use the cue as the label [19]. Additionally, Jiang et al. [20,21] proposed a DoF-wise nonnegative matrix factorization (NMF) algorithm, which needs only DoF-wise activations rather than specific kinematic data. On the basis of an EMG generative model, the DoF-wise NMF algorithm factorizes recorded EMG signals into a synergy matrix and a force function matrix. According to synergy theory, the weights of the synergy matrix are determined by spinal cord circuitries, whereas the force function matrix is related to supraspinal motor commands. Moreover, Lin et al. [22] introduced sparseness constraints that make it possible to extract the basis information for multiple DoFs concurrently from arbitrary contractions. The NMF-based approach benefits from low computational cost and semi-supervision, but its inherent linearity makes it susceptible to nonlinear factors [20], even if transformations in feature space can reduce their influence to some extent [18].

In recent years, the rapid development of artificial intelligence (AI), particularly the advent of deep learning methods, has led to remarkable achievements in robotics and image processing [23]. In the field of bioelectrical signal processing, an increasing number of studies based on deep learning algorithms have been proposed and validated for EMG data analysis, mainly focusing on PR of EMG signals. In these investigations, deep learning is regarded as a powerful tool for feature selection and feature learning [24]. Geng et al. [25] recognized gestures from instantaneous sEMG images with a convolutional neural network (CNN) and tested its performance on several open-access datasets. Additionally, Muhammad et al. [26] reported that stacked sparse autoencoders outperform linear discriminant analysis (LDA) in multi-day gesture recognition.

Notably, few investigations have addressed multi-DoF estimation with deep learning algorithms. A strong correlation between wrist kinematics and the features encoded by an autoencoder has been demonstrated in [27], which reveals the potential of stacked autoencoders for estimating multi-DoF wrist torques. More importantly, these methods are well suited to analyzing nonstationary signals and constructing complicated mappings, which is crucial for multi-DoF continuous estimation based on sEMG. Consequently, in this paper, we propose a stacked autoencoder-based deep neural network (SAE-DNN) to continuously estimate multi-DoF wrist torques and compare its performance with LR and SVR. To the best of our knowledge, this is the first study that validates the feasibility of SAE-DNN for predicting multi-DoF wrist torques continuously and simultaneously.

The remainder of this paper is organized as follows. Section 2 introduces the experimental protocol, data processing and the specific implementation of SAE-DNN. The estimation performance of SAE-DNN, LR and SVR is presented in Section 3. Section 4 discusses the feasibility, advantages and limitations of SAE-DNN in estimating multi-DoF wrist torques, as well as future work. Finally, conclusions are drawn in Section 5.

2. Methods

2.1. Experimental studies

2.1.1. Subjects

Eight able-bodied subjects (1 female, 7 males, aged 22 to 26, all right-handed) with no reported neuromuscular disorders were recruited and performed a series of specific wrist motions under isometric contraction. Additionally, we recruited a left-hand transradial amputee (male, 65 years old, right-hand dominant, cosmetic prosthesis user) who had been amputated for 10 years. Before the motion tasks, the subjects were informed of the details of the experiment, and all content and procedures conformed to the Declaration of Helsinki.

2.1.2. Experimental protocol

We focused on the wrist joint for simultaneous and continuous estimation, since it plays an important role in manipulation with the human hand in daily life and, with three DoFs (i.e., wrist flexion/extension, radial/ulnar deviation, and pronation/supination), is physiologically simpler than joints with complex structures such as the shoulder. Fig. 1 shows the data acquisition platform, with which sEMG signals and multi-DoF wrist torques were recorded concurrently. During an experiment session, a customized multi-DoF torque measurement device [28] was used to record wrist torques, with a torque transducer mounted in each DoF (Fig. 1 (B)). The analog torque signals were amplified and sampled by a data acquisition board (PXIe-6363, NI, US) at a sampling frequency of 1000 Hz, and then transmitted to software that controlled a target in a guided interface (refer to Fig. 1 (A) and (C)). In Fig. 1 (C), the rotation of the target and its shifts along the horizontal and vertical lines correspond to the activation of the wrist in pronation/supination (DoF1), flexion/extension (DoF2) and radial/ulnar deviation (DoF3), respectively. Thus, subjects could achieve various combinations of wrist DoFs by observing the movement of the target.

For sEMG signals, 192 monopolar channels were recorded with an 8 × 24 high-density electrode array (ELSCH064NM3, OT Bioelettronica, Italy, 10 mm inter-electrode distance), and then amplified and sampled by an amplifier (EMGUSB2+, OT Bioelettronica, Italy) at 2048 Hz. As the reference, a ground grip was placed on the wrist of healthy subjects, and a single disposable channel was placed on the end of the stump of the amputee subject.
The sEMG and torque signals were synchronized by an Arduino development board, which sent a trigger signal when torque recording started. Although 192 channels were measured, we used only 16 of them to reduce computational complexity. As illustrated in Fig. 2 (A), the red dots represent the channels selected for processing, while the white dots are unused. The relative locations of these 16 channels on the forearm in the axial and radial directions are depicted in Fig. 2 (B) and (D), respectively. For healthy subjects, these channels were distributed approximately uniformly around the forearm, at one third of the forearm length, proximal to the elbow. To obtain high-quality signals, every channel was filled with conductive paste and the subject's skin was cleaned with alcohol wipes to remove dead skin. During an experiment session, the subject sat in a chair with the forearm placed in the torque measurement device and supported by an armrest.

To imitate realistic motions as closely as possible, subjects were instructed to accomplish a series of wrist movements, involving single-DoF activations and activations combining two or three DoFs. The wrist activations are listed in detail in Table 1. The contraction tasks comprise seven sessions, organized by the activated wrist DoFs. In each session, subjects were instructed to accomplish the tasks for 2 or 5 trials. Each trial lasted 30 seconds, during which subjects performed the corresponding contraction task repeatedly at a frequency of no more than 2 Hz. At the beginning of the experiment, the maximum torques exerted by the subject in each direction of the three DoFs were recorded. The full movement range of the virtual target was then set to 80% of the maximum torques, by which the torque signals of each subject were normalized.
The actual wrist activations in the different DoFs were represented by the movement of the target in the guided interface, including single-DoF activations as well as combinations of multiple DoFs. For the amputee subject, the experiment followed a mirrored bilateral training strategy [29] (refer to Fig. 2 (C)), and only two DoFs (pronation/supination and flexion/extension) were considered. We recorded the sEMG signals from his amputated side, while the wrist torques recorded from the contralateral side during symmetric movements were used as targets.

Fig. 1. Data acquisition platform. (A) The schematic of the EMG and torque signal recording system. (B) The customized multi-DoF wrist torque measurement device. (C) A visual guided interface that indicates the torques in the three wrist DoFs to the subject. DoF1, DoF2 and DoF3 correspond to pronation/supination, flexion/extension, and radial/ulnar deviation, respectively.

Fig. 2. The electrode setup of the experiment. (A) The channels selected for processing among all electrodes. (B) The location of the targeted channels along the axial direction. (C) The experimental setup of the amputee subject with a mirrored bilateral training strategy. (D) The relative location of the selected channels on the forearm of healthy participants.
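The torque normalization mentioned in the protocol can be illustrated with a short sketch. It is not the authors' code: the function name `normalize_torque`, the array layout, and the choice to scale the positive and negative directions of each DoF by their own maxima are assumptions. Only the 80% factor comes from the text, and the numbers in the example are made up.

```python
import numpy as np

RANGE_FRACTION = 0.8  # from the text: full target range = 80% of the maximum torque

def normalize_torque(torque, max_pos, max_neg):
    """Scale torques so that +/-1 corresponds to the full target range.

    torque  : (n_samples, n_dofs) recorded torques
    max_pos : (n_dofs,) maximum torque in the positive direction of each DoF
    max_neg : (n_dofs,) maximum torque magnitude in the negative direction
    The layout is hypothetical; the source only states the 80% rule.
    """
    torque = np.asarray(torque, dtype=float)
    # Pick the direction-specific full-range value for every sample.
    full_range = np.where(torque >= 0.0,
                          RANGE_FRACTION * np.asarray(max_pos, dtype=float),
                          RANGE_FRACTION * np.asarray(max_neg, dtype=float))
    return torque / full_range

# Made-up numbers, for illustration only.
example = normalize_torque([[0.8, -0.4], [0.4, 0.0]],
                           max_pos=[1.0, 2.0], max_neg=[1.0, 1.0])
```

With this convention a value of 1 means the target has reached the edge of its movement range in that direction, which keeps the regression targets of different subjects on a comparable scale.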

Table 1. Detailed information of wrist contraction tasks in the experiment.

Session | DoFs | Task | Description | Trials
1 | DoF1 | 1 | Make the target rotate around the center of the circle, alternating between pronation and supination | 5
2 | DoF2 | 1 | Move the target left and right, alternating between flexion and extension | 5
3 | DoF3 | 1 | Move the target up and down, alternating between radial and ulnar deviation | 5
4 | DoF1 & DoF2 | 1 | Move the target left and right with counterclockwise rotation | 2
4 | DoF1 & DoF2 | 2 | Move the target left and right with clockwise rotation | 2
4 | DoF1 & DoF2 | 3 | Rotate the target in the left part of the horizontal line | 2
4 | DoF1 & DoF2 | 4 | Rotate the target in the right part of the horizontal line | 2
5 | DoF1 & DoF3 | 1 | Move the target up and down with counterclockwise rotation | 2
5 | DoF1 & DoF3 | 2 | Move the target up and down with clockwise rotation | 2
5 | DoF1 & DoF3 | 3 | Rotate the target in the upper part of the vertical line | 2
5 | DoF1 & DoF3 | 4 | Rotate the target in the lower part of the vertical line | 2
6 | DoF2 & DoF3 | 1 | Move the target along the principal diagonal line | 2
6 | DoF2 & DoF3 | 2 | Move the target along the secondary diagonal line | 2
6 | DoF2 & DoF3 | 3 | Move the target in a clockwise circle | 2
6 | DoF2 & DoF3 | 4 | Move the target in a counterclockwise circle | 2
7 | DoF1 & DoF2 & DoF3 | 1 | Move the target in a counterclockwise circle with counterclockwise rotation | 2
7 | DoF1 & DoF2 & DoF3 | 2 | Move the target in a clockwise circle with clockwise rotation | 2
7 | DoF1 & DoF2 & DoF3 | 3 | Rotate the target clockwise and counterclockwise in the second quadrant | 2
7 | DoF1 & DoF2 & DoF3 | 4 | Rotate the target clockwise and counterclockwise in the fourth quadrant | 2
7 | DoF1 & DoF2 & DoF3 | 5 | Rotate the target clockwise and counterclockwise in the first quadrant | 2
7 | DoF1 & DoF2 & DoF3 | 6 | Rotate the target clockwise and counterclockwise in the third quadrant | 2

2.2. Data processing

The workflow for signal processing is illustrated in Fig. 3, which shows the main procedures for training the regression models and estimating multi-DoF torques from sEMG signals.

Fig. 3. Schematic of processing for sEMG and torque signals. LR: linear regression, SVR: support vector regression, SAE-DNN: stacked autoencoder-based deep neural network.

Interference was inevitably introduced into the raw signals during acquisition, e.g., power line interference, movement artifacts and high-frequency noise. Moreover, the non-stationarity of EMG signals makes it very hard to map them directly to torques, so we extracted the muscle activation in every channel to represent signal intensity. The sEMG signals were first passed through a 4th-order Butterworth bandpass filter (20–500 Hz) to remove low-frequency movement artifacts and high-frequency noise. Then, a comb notch filter was used to eliminate power line interference at 50 Hz and its multiples. Next, the signals were full-wave rectified and fed into a lowpass filter with a cutoff frequency of 4 Hz to obtain the muscle activations. For the torque signals, a 4th-order Butterworth lowpass filter (4 Hz) was adopted to reduce high-frequency noise, and interpolation was then applied to match the sampling frequency of the sEMG signals. In this study,

three models were used to predict multi-DoF torques: LR, SVR and SAE-DNN.

2.3. Regression models

The objective of regression techniques is to model the relationship between input and output, here the multi-channel sEMG signals and the multi-DoF wrist torques, respectively. Let the input of the regression models be $X \in \mathbb{R}^{M \times N}$ and the output be $Y \in \mathbb{R}^{D \times N}$, where M is the dimension of the sEMG features across all channels, N is the number of samples, and D is the number of wrist DoFs. The estimate of the regression model is denoted by $\hat{Y}$ and the mapping function by $f(\cdot)$.
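The preprocessing chain of Section 2.2 can be sketched with SciPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the filters are applied causally with `sosfilt`, the order of the 4 Hz envelope filter is assumed to be 4, the comb notch is approximated by a cascade of `iirnotch` filters at the 50 Hz harmonics below 500 Hz (with an assumed quality factor), and linear interpolation is used for the torque. The function names and the random input arrays are placeholders.

```python
import numpy as np
from scipy import signal

FS_EMG = 2048.0     # sEMG sampling rate given in the text (Hz)
FS_TORQUE = 1000.0  # torque sampling rate given in the text (Hz)

def emg_envelope(raw, fs=FS_EMG, mains=50.0, notch_q=30.0):
    """raw: (n_samples, n_channels) sEMG -> muscle activation of the same shape."""
    # 4th-order Butterworth band-pass, 20-500 Hz.
    sos_bp = signal.butter(4, [20.0, 500.0], btype="bandpass", fs=fs, output="sos")
    x = signal.sosfilt(sos_bp, raw, axis=0)
    # Stand-in for the comb notch: one notch per mains harmonic (50, 100, ..., 450 Hz).
    for f0 in np.arange(mains, 500.0, mains):
        b, a = signal.iirnotch(f0, notch_q, fs=fs)
        x = signal.lfilter(b, a, x, axis=0)
    # Full-wave rectification followed by a 4 Hz low-pass (order 4 is assumed).
    sos_lp = signal.butter(4, 4.0, btype="lowpass", fs=fs, output="sos")
    return signal.sosfilt(sos_lp, np.abs(x), axis=0)

def torque_to_emg_rate(torque, n_emg, fs_t=FS_TORQUE, fs_e=FS_EMG):
    """Low-pass the torque at 4 Hz and interpolate it onto the sEMG time base."""
    sos_lp = signal.butter(4, 4.0, btype="lowpass", fs=fs_t, output="sos")
    smooth = signal.sosfilt(sos_lp, torque, axis=0)
    t_torque = np.arange(smooth.shape[0]) / fs_t
    t_emg = np.arange(n_emg) / fs_e
    return np.column_stack([np.interp(t_emg, t_torque, smooth[:, d])
                            for d in range(smooth.shape[1])])

# Random placeholders: 2 s of 16-channel sEMG and 3-DoF torque.
rng = np.random.default_rng(0)
raw_emg = rng.standard_normal((4096, 16))
raw_torque = rng.standard_normal((2000, 3))
envelopes = emg_envelope(raw_emg)
targets = torque_to_emg_rate(raw_torque, n_emg=envelopes.shape[0])
```

Second-order sections are used for the Butterworth filters because a 4 Hz cutoff at a 2048 Hz sampling rate is numerically delicate in transfer-function form.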


Fig. 4. The structure of an autoencoder and of the stacked autoencoder-based deep neural network. (A) Autoencoder. (B) Stacked autoencoder-based deep neural network (SAE-DNN). The SAE-DNN in this study is composed of five layers: input layer, feature I layer, feature II layer, fully connected layer, and output layer. The input of the neural network is depicted as 'Envelops', i.e., the preprocessed sEMG signals representing sEMG intensities.

2.3.1. Linear regression model

In the linear regression approach, the mapping function $f(\cdot)$ is assumed to be linear:

$$\hat{Y} = W^{T} X + w_0 \qquad (1)$$

where each column of $W \in \mathbb{R}^{M \times D}$ contains the weights corresponding to the elements of the feature vector, $w_0$ is the bias that compensates for a possible offset, and $\hat{Y}$ is the estimate of the wrist torques. For convenience of calculation, we merge $W$ and $w_0$ so that $w_0$ forms the first row of the merged $W$ (the first column of $W^{T}$). Correspondingly, $X$ is augmented with a row of N ones. The mean squared error is used to evaluate the estimation accuracy of the regression model, so the optimization minimizes the loss function $l(w)$:

$$l(w) = \frac{1}{N} \sum_{t=1}^{N} \left\| y_t - w^{T} x_t \right\|_2^2 \qquad (2)$$

The optimal $W$ is obtained by setting the partial derivative of $l(w)$ with respect to $w$ to zero, and the result is shown in Eq. (3). For each DoF, we constructed a regression model separately.

$$W = (X X^{T})^{-1} X Y^{T} \qquad (3)$$

2.3.2. Support vector regression

Support vector machine (SVM) is a very popular supervised learning algorithm for both classification and regression, first introduced by Vapnik and colleagues [30,31]. The fundamental idea of SVR is to map the input data X into a high-dimensional space with a nonlinear transformation so that linear estimation can be performed there. By constructing a loss function and minimizing it with the Lagrange multiplier method, the mapping function between input and output can be written as Eq. (4).

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \hat{\alpha}_i) K(x_i, x) + b \qquad (4)$$

where $K(x_i, x)$ is a kernel function that maps low-dimensional data into a high-dimensional space without knowledge of the specific transformation, $\alpha_i$ and $\hat{\alpha}_i$ are Lagrange multipliers, and b is the bias constant determined by the Karush-Kuhn-Tucker (KKT) conditions [32]. Details of the SVR algorithm are described in [33].

2.3.3. Stacked autoencoder-based DNN

The concept of the autoencoder was first proposed by Rumelhart et al. [34] in 1986 and has since been widely applied in image processing, face recognition, and natural language processing. The typical structure of an autoencoder is shown in Fig. 4 (A), where Q is the input vector, P is the encoded features, and the decoder output is the reconstruction of Q. Similar to an MLP, an autoencoder is a feedforward neural network with a single hidden layer between the input and output layers. Generally, the encoder maps high-dimensional input data to low-dimensional representations (refer to 'Code' in Fig. 4 (A)), and the decoder reconstructs the data from the codes, with the optimization goal that the output approximates the input as closely as possible.

In this study, we constructed an SAE-DNN to map EMG intensities to multi-DoF torques. Hinton et al. [35] proposed an effective layer-by-layer training strategy that makes the training of deep autoencoders faster and helps avoid poor local minima. The training process contains two stages, pretraining and fine-tuning. During pretraining, the input data are first fed into an autoencoder, and the encoded features in its hidden layer are then provided as the input of the next autoencoder. For all autoencoders, the transfer functions of the encoder and decoder are the logistic sigmoid function and a linear transfer function, respectively. After the weights are initialized through pretraining, fine-tuning is carried out with the back-propagation algorithm, which aims to reduce the discrepancy between the output and the target. The structure of the proposed DNN is illustrated in Fig. 4 (B); it is composed of five layers: an input layer, two hidden layers of autoencoders, a fully connected layer, and an output layer. The input of the network is the processed sEMG intensities. The numbers of neurons in these five layers are 16 (or 8 for estimation with fewer channels), 10, 6, 10 and 3 (or 2 for two-DoF estimation), respectively. The fully connected layer is responsible for mapping the encoded features to wrist torques.
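The two-stage training of the SAE-DNN (greedy pretraining of the autoencoders, then fine-tuning with back-propagation) can be sketched in plain NumPy. The layer sizes 16-10-6-10-3 and the sigmoid encoder with linear decoder follow the text. Everything else is an assumption made for illustration: the sigmoid activation of the fully connected layer, the linear output layer, full-batch gradient descent, the learning rate, the epoch counts, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, epochs=200, lr=0.1):
    """Fit one autoencoder (sigmoid encoder, linear decoder) by gradient descent.

    X: (n_samples, n_in). Returns encoder weights, encoder bias, encoded features.
    """
    n, n_in = X.shape
    W_e, b_e = rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
    W_d, b_d = rng.normal(0.0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W_e + b_e)            # code
        err = (H @ W_d + b_d - X) / n         # gradient of the reconstruction loss
        dH = (err @ W_d.T) * H * (1.0 - H)    # back-propagate through the sigmoid
        W_d -= lr * (H.T @ err)
        b_d -= lr * err.sum(axis=0)
        W_e -= lr * (X.T @ dH)
        b_e -= lr * dH.sum(axis=0)
    return W_e, b_e, sigmoid(X @ W_e + b_e)

def forward(X, layers):
    """Return the activations of every layer; the last layer is linear."""
    acts = [X]
    for k, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if k == len(layers) - 1 else sigmoid(z))
    return acts

def fine_tune(X, Y, layers, epochs=300, lr=0.1):
    """Full-batch back-propagation on the squared error between output and target."""
    n = X.shape[0]
    for _ in range(epochs):
        acts = forward(X, layers)
        delta = (acts[-1] - Y) / n
        for k in reversed(range(len(layers))):
            W, b = layers[k]
            grad_W, grad_b = acts[k].T @ delta, delta.sum(axis=0)
            if k > 0:  # propagate with the old W before it is updated
                delta = (delta @ W.T) * acts[k] * (1.0 - acts[k])
            layers[k] = (W - lr * grad_W, b - lr * grad_b)
    return layers

def mse(X, Y, layers):
    return float(np.mean((forward(X, layers)[-1] - Y) ** 2))

# Synthetic stand-ins for sEMG envelopes (16 channels) and 3-DoF torques.
X = rng.random((500, 16))
Y = X[:, :3] + 0.5 * X[:, 3:6]

# Stage 1: greedy layer-wise pretraining of the two autoencoders (16 -> 10 -> 6).
W1, b1, H1 = pretrain_autoencoder(X, 10)
W2, b2, _ = pretrain_autoencoder(H1, 6)

# Stage 2: append the fully connected layer (10) and output layer (3), then fine-tune.
layers = [(W1, b1), (W2, b2),
          (rng.normal(0.0, 0.1, (6, 10)), np.zeros(10)),
          (rng.normal(0.0, 0.1, (10, 3)), np.zeros(3))]
mse_before = mse(X, Y, layers)
layers = fine_tune(X, Y, layers)
mse_after = mse(X, Y, layers)
```

Only the encoders of the pretrained autoencoders are kept; their decoders serve to shape the codes during pretraining and are discarded before fine-tuning.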

2.4. Performance evaluation

To evaluate the estimation performance of the different regression techniques, we used the coefficient of determination (R²) as a metric of the agreement between predicted and actual torques. R² is widely applied to compare the similarity of two time series, here the estimated torques and the real torques. The larger the R², the closer the estimated torques are to the real torques. For the multi-DoF case, R² is the mean value over the DoFs. R² is defined in Eq. (5).

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (5)$$

where $\hat{y}_i$ and $y_i$ are the i-th samples of the estimated and real torques in a specific wrist DoF, respectively, $\bar{y}$ is the mean of the real torques, and n is the total number of samples of recorded torques. To test the generalization performance of the regression models, we implemented five-fold cross-validation, with 80% of the data in each session used to train the regressor and the rest to test its performance.
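As a worked illustration of Eqs. (3) and (5), the sketch below fits the closed-form linear regressor and scores it with R² under five-fold cross-validation. The helper names, the contiguous fold layout, and the synthetic noise-free data are assumptions chosen for the example; the source does not describe how samples were assigned to folds within a session.

```python
import numpy as np

def add_bias_row(X):
    """Augment X (M, N) with a leading row of ones so the bias is absorbed into W."""
    return np.vstack([np.ones((1, X.shape[1])), X])

def fit_lr(X, Y):
    """Closed-form least squares, Eq. (3): W = (X X^T)^{-1} X Y^T with augmented X."""
    Xa = add_bias_row(X)
    return np.linalg.solve(Xa @ Xa.T, Xa @ Y.T)   # shape (M + 1, D)

def predict_lr(W, X):
    return W.T @ add_bias_row(X)                  # shape (D, N)

def r2_per_dof(Y, Y_hat):
    """Eq. (5), evaluated separately for every DoF (row of Y)."""
    ss_res = np.sum((Y_hat - Y) ** 2, axis=1)
    ss_tot = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return 1.0 - ss_res / ss_tot

def five_fold_r2(X, Y, n_folds=5):
    """Mean R^2 over DoFs for each fold; folds are contiguous blocks (an assumption)."""
    all_idx = np.arange(X.shape[1])
    scores = []
    for test_idx in np.array_split(all_idx, n_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        W = fit_lr(X[:, train_idx], Y[:, train_idx])
        Y_hat = predict_lr(W, X[:, test_idx])
        scores.append(r2_per_dof(Y[:, test_idx], Y_hat).mean())
    return np.array(scores)

# Synthetic, noise-free linear data: M = 16 features, D = 3 DoFs, N = 1000 samples.
rng = np.random.default_rng(0)
X = rng.random((16, 1000))
Y = rng.standard_normal((3, 16)) @ X + np.array([[0.1], [-0.2], [0.3]])
fold_scores = five_fold_r2(X, Y)
```

Solving for all columns of W at once gives the same weights as fitting one model per DoF, because the least-squares problem decouples across the columns of $Y^{T}$.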

2.5. Statistical analysis

We considered three factors that might affect the estimation of multi-DoF wrist torques. These factors (independent variables) were the regression method (LR, SVR and SAE-DNN), the number of DoFs estimated simultaneously (two or three), and the number of channels involved in processing (8 or 16). The dependent variable was estimation accuracy, i.e., the R² value of each DoF or the R² value averaged over the estimated DoFs. Prior to significance analysis, all data were confirmed to be normally distributed with the Kolmogorov-Smirnov test (p > 0.05). A three-way ANOVA was adopted to analyze the effect of the three factors on estimation accuracy. The significance level was set to 0.05, i.e., a p-value below 0.05 was considered to indicate a significant difference. According to the three-way ANOVA, there was no interaction among the three factors, nor between any two of them, while each factor on its own produced differences in the dependent variable. For regression methods, the groups were compared pair-wise with Bonferroni adjustment; the significance threshold was adjusted to 0.017 (0.05/3), since the three methods give three pairwise comparisons. For the number of DoFs estimated simultaneously and the number of channels involved in processing, we performed significance analysis with a two-tailed, paired-samples t-test to assess the impact of these two factors on estimation accuracy. Note that the R² values averaged across DoFs for each subject were used when analyzing significant differences in overall estimation accuracy (the 'AVE' column in Figs. 5 and 7), while single-DoF R² values were used when comparing the regression methods within each DoF (the remaining columns in Figs. 5 and 7).
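The Bonferroni-adjusted pairwise comparison can be sketched with SciPy as follows. The per-subject scores are random placeholders, not data from the study, and using a paired t-test for the pairwise step is an assumption, since the text names only the adjustment. The three-way ANOVA is not reproduced here because SciPy has no direct routine for it.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05
METHODS = ("LR", "SVR", "SAE-DNN")

def pairwise_bonferroni(scores, alpha=ALPHA):
    """scores: dict mapping method name -> per-subject R^2 array (same subject order)."""
    names = list(scores)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    threshold = alpha / len(pairs)          # 0.05 / 3 for three methods
    results = {}
    for a, b in pairs:
        # Paired test, two-tailed by default.
        _, p_value = stats.ttest_rel(scores[a], scores[b])
        results[(a, b)] = (float(p_value), bool(p_value < threshold))
    return threshold, results

# Placeholder values drawn at random for eight hypothetical subjects.
rng = np.random.default_rng(0)
fake_scores = {m: rng.uniform(0.5, 0.9, size=8) for m in METHODS}

# Normality check per method: Kolmogorov-Smirnov test against a fitted normal.
ks_p = {m: float(stats.kstest(v, "norm", args=(v.mean(), v.std(ddof=1))).pvalue)
        for m, v in fake_scores.items()}
threshold, comparisons = pairwise_bonferroni(fake_scores)
```

Dividing the significance level by the number of comparisons keeps the family-wise error rate at or below 0.05 across the three pairwise tests.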

3. Results

3.1. Comparison of estimation performance of different regression models

To validate the feasibility and effectiveness of SAE-DNN in estimating multi-DoF wrist torques, we compared its estimation performance with that of a linear approach, LR, and a nonlinear method, SVR. Additionally, the regression techniques were compared pair-wise, with the two other factors (the number of channels and the number of DoFs simultaneously estimated) fixed, in order to test for statistically significant differences.

3.1.1. Performance of 3-DoF simultaneous estimation

Fig. 5 illustrates the R² values of each DoF and the average values over all DoFs when 3 DoFs are simultaneously estimated. Higher R² values indicate better estimation performance of a regression technique. According to the number of channels involved in processing, the results are divided into two cases, i.e., 8 channels in the left part and 16 channels in the right part of Fig. 5. For the 8-channel case, the average R² values across all DoFs for LR, SVR and SAE-DNN are 0.547 ± 0.047, 0.539 ± 0.046, and 0.664 ± 0.051, respectively. For the 16-channel case, the average R² values for LR, SVR and SAE-DNN are 0.631 ± 0.045, 0.621 ± 0.045, and 0.740 ± 0.065, respectively. SAE-DNN outperforms LR and SVR in every DoF regardless of the number of channels (8 or 16). According to the statistical analysis, there is a significant difference (p < 0.05) between SAE-DNN and LR or SVR in all cases, marked with asterisks in Fig. 5. Among the three DoFs, flexion/extension is estimated best, with R² values of 0.756 ± 0.055 for the 8-channel case and 0.813 ± 0.059 for the 16-channel case.

The predicted wrist torques of each DoF for Sub 1, under 16-channel, 3-DoF simultaneous estimation, are shown in Fig. 6. The R² values of DoF1, DoF2 and DoF3 are 0.783, 0.868 and 0.856, respectively, in this case. In Fig. 6, the blue solid and red dashed lines represent the predicted and recorded torques, respectively. The time periods along the horizontal axis correspond to the testing data of the different activation tasks and cover all sessions and tasks listed in Table 1. During 0–90 s, only one DoF is activated at a time, while 90–234 s contains the two-DoF combinations. The remaining section shows the scenarios in which three DoFs are activated at the same time.

Fig. 5. R² values of the three regression techniques for each DoF and the averaged values across all DoFs with three DoFs simultaneously estimated. The blue, red and gray bars represent the estimation results of LR, SVR and SAE-DNN, respectively. A significant difference (p < 0.05) between two approaches, if present, is marked with an asterisk above their bars. The left part shows the estimation results of the 8-channel case and the right part those of the 16-channel case.

3.1.2. Performance of 2-DoF simultaneous estimation

For simultaneous estimation of two DoFs (pronation/supination and flexion/extension), we trained and tested the regression models using only the data related to these DoFs. The R² values of each DoF and the average values across DoFs for able-bodied subjects are shown in Fig. 7. SAE-DNN predicts wrist torques better than LR and SVR (8 channels: LR: 0.688 ± 0.067, SVR: 0.677 ± 0.074, SAE-DNN: 0.786 ± 0.038; 16 channels: LR: 0.757 ± 0.075, SVR: 0.751 ± 0.079, SAE-DNN: 0.829 ± 0.050). Besides, SAE-DNN has a smaller variance across subjects and DoFs, suggesting better stability and overall performance. The statistical analysis also indicates a significant difference between SAE-DNN and LR or SVR (p < 0.05).

For the amputee subject, we calculated the Pearson correlation coefficient (R) between recorded and predicted torques in every single DoF, as well as the overall estimation performance across DoF1 and DoF2, as shown in Fig. 8. Similarly, SAE-DNN outperforms LR and SVR, with R equal to 0.58 and 0.68 for the 8-channel and 16-channel settings, respectively. The estimation results of SAE-DNN for Sub 1 are shown in Fig. 9, in which 16 channels are used for processing and DoF1 & DoF2 are simultaneously estimated. The training and testing datasets contain single-DoF activations of DoF1 and DoF2 as well as their combinations. A remarkable estimation result is obtained, with R² values of 0.926 and 0.926 for DoF1 and DoF2, respectively.

Fig. 6. The estimation results of SAE-DNN for one subject (Sub 1) when 3 DoFs are simultaneously estimated. The blue solid and red dashed lines refer to predicted and recorded torques, respectively. All combinations of the different DoFs are illustrated in this figure. For example, during 90 to 135 s, DoF1 and DoF2 are activated whereas DoF3 is not. The R² values of DoF1, DoF2 and DoF3 for the data shown are 0.783, 0.868 and 0.856, respectively.

Fig. 7. R² values of the three regression techniques for each DoF and the averaged values across all DoFs with two DoFs (flexion/extension and pronation/supination) simultaneously estimated. The blue, red and gray bars represent the estimation results of LR, SVR and SAE-DNN, respectively. A significant difference (p < 0.05) between two approaches, if present, is marked with an asterisk above their bars. The left part shows the estimation results of the 8-channel case and the right part those of the 16-channel case.

Fig. 8. Pearson correlation coefficient (R) of the three regression techniques for each DoF and the averaged values across all DoFs with DoF1 & DoF2 simultaneously estimated. The blue, red and gray bars represent the estimation results of LR, SVR and SAE-DNN, respectively. The left part shows the estimation results of the 8-channel case and the right part those of the 16-channel case.

Fig. 9. The estimation results of SAE-DNN from one subject (Sub 1) when only 2 DoFs are simultaneously estimated. The solid and dashed lines refer to predicted and recorded torques, respectively. The predicted torques of all combinations of the different DoFs are illustrated in this figure. For example, from 60 to 110 s, DoF1 and DoF2 are both activated. The R² values for the data shown are 0.926 for both DoF1 and DoF2.

Fig. 10. The R² values of the three regression techniques with 8 and 16 channels involved. The blue and red bars represent 2-DoF and 3-DoF simultaneous estimation, respectively, and an asterisk indicates a significant difference (p < 0.05) between the two situations.

Fig. 11. The R² values of the three regression techniques with 2 DoFs and 3 DoFs simultaneously estimated. The blue and red bars represent 8 and 16 channels involved in processing, respectively, and an asterisk indicates a significant difference (p < 0.05) between the two situations.

3.2. Comparisons between 2-DoF and 3-DoF simultaneous estimation

While DoF3 (radial/ulnar deviation) contributes to the manipulation of a prosthesis, DoF1 (pronation/supination) and DoF2 (flexion/extension) play a much more important role [36]. Thus, it is reasonable and necessary to investigate the estimation performance of SAE-DNN when DoF3 is excluded, which would also reduce the structural complexity of the prosthesis. The comparisons of 2-DoF and 3-DoF simultaneous estimation under the 8/16-channel setups with different regression techniques are presented in Fig. 10. Consistent with our assumption that predicting only two DoFs might be easier, the results indicate that, for all regression techniques, 2-DoF estimation performs better than 3-DoF estimation. Further, according to the statistical analysis, there are significant differences (p < 0.05) between 2-DoF and 3-DoF estimation over the different combinations of channels and regression models. Among these situations, the SAE-DNN based estimation approach using 16 channels performs best, with R² values of 0.829 ± 0.050 when predicting DoF1 & DoF2 simultaneously.

3.3. Effects of the number of involved channels

The channels used for predicting wrist torques are distributed uniformly in two spatially parallel circles. The stability of the data acquisition system and the processing time are two factors with a great impact on the design of the sensor system, especially the number and distribution of sensors. Fewer sensors could contribute to higher stability and shorter processing time, which is significant for practical application. Thus, it is necessary and important to investigate the effects of the number of channels involved in regression. The results of two circumstances, i.e., using all 16 channels and using the 8 odd-numbered channels (refer to the serial numbers in Fig. 2 (A)), are illustrated in Fig. 11. The R² values of the 16-channel setup are much higher than those with 8 channels involved. In addition, the differences are significant in all circumstances, indicating the necessity of involving enough channels for SAE-DNN.
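The reduction from 16 channels to the 8 odd-numbered channels can be sketched as a simple column selection on the feature matrix. This is only an illustration under stated assumptions: one feature value per channel, channels stored in the order of their serial numbers (1 to 16), and a hypothetical helper named `select_channels`. The actual feature layout used in this study is not specified here.

```python
def select_channels(features, channel_ids):
    """Keep only the given channels from each sample.

    features: list of samples, each a list with one value per channel,
              ordered by channel serial number (assumption).
    channel_ids: 1-based channel serial numbers to keep.
    """
    indices = [c - 1 for c in channel_ids]  # convert to 0-based indices
    return [[sample[i] for i in indices] for sample in features]


# Odd-numbered channels 1, 3, ..., 15 out of 16.
ODD_CHANNELS = list(range(1, 17, 2))

# Toy sample in which each feature value equals its channel number.
toy_features = [list(range(1, 17))]
reduced = select_channels(toy_features, ODD_CHANNELS)
```

The same regressor can then be trained on `reduced` instead of the full feature matrix to reproduce an 8-channel versus 16-channel comparison of the kind shown in Fig. 11.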

4. Discussion

4.1. Error analysis

The experimental results of the proposed approach for estimating multi-DoF wrist torques are encouraging, since the averaged R² value exceeds 0.82 when radial/ulnar deviation (DoF3) is excluded and is up to 0.74 when all three DoFs are included. However, several factors limit further improvement of the estimation performance of the proposed method. Among these factors, force translation and surface crosstalk might be the two main sources of error. Force translation is generated when multiple DoFs are activated simultaneously to accomplish complex tasks, due to the mechanical connection between different DoFs in the wrist torque recording equipment (refer to Fig. 1 (B)). Thereby, forces in a specific DoF inevitably transfer to another DoF. Since the recorded wrist torques are used as targets for SAE-DNN, estimation performance decreases dramatically when DoF3 is involved. This issue can be partially solved by optimizing the wrist torque measurement device in order to reduce the mechanical coupling between different DoFs. In clinical application, surface crosstalk is an intractable and inevitable issue. More selective sensor placement on target muscles is a possible way to reduce the adverse impact of crosstalk. Nevertheless, more selective sensor placement places higher requirements on data acquisition, which is very difficult in practical usage. The proposed SAE-DNN method is less sensitive to surface crosstalk than the traditional site-function-wise method, since quite impressive estimation results are obtained even though multi-DoF (2 & 3 DoFs) wrist torques are simultaneously estimated.

4.2. Comparisons of SAE-DNN based estimator with other regression techniques

To validate the feasibility and effectiveness of the proposed SAE-DNN method, we compared it with LR and SVR, which have been investigated in many works [15,17,18] on myoelectric continuous estimation. Most currently available control strategies of commercial prostheses are based on a finite state machine, in which EMG signals from a pair of antagonistic muscles are recorded and each muscle provides movement of a DoF in one direction. Strong robustness is one of the most distinct characteristics of this control strategy. However, this scheme is still far from the intuitive and natural control of a human being. In contrast, continuous estimation at multiple DoFs provides a feasible path towards intuitive and natural control because it predicts the kinetics of multiple DoFs at the same time. Moreover, SAE-DNN performs significantly better than LR and SVR regardless of the number of channels involved and the number of DoFs estimated simultaneously. A possible explanation is that the structure of SAE enables the network to learn more representative features and to reduce nonlinearity, similar to applications in image feature extraction [37].

However, some limitations remain for the proposed SAE-DNN estimator. Firstly, the structure of SAE-DNN is quite complicated. Several parameters, such as the number of neurons in each layer and the type of transfer function, must be selected and optimized, which increases the burden of constructing a well-performing network. Secondly, the performance of the deep learning algorithm appears to decline as the amount of data decreases (refer to Fig. 11). Moreover, training on a large dataset would consume more processing time than traditional regression techniques such as LR and SVR. Finally, the regression techniques (LR, SVR and SAE-DNN) used in this study are data-driven black-box models, which neither build a relationship with physiological information nor explain the models in an intuitive manner.

4.3. Applications and future work

4.3.1. Applications

The applications of this study could be extensive and multifaceted, including clinical practice and potential uses in intelligent HMI. For clinical applications, EMG-driven prostheses and exoskeletons might be two of the most widely used scenarios. The motor intentions of amputees could be recognized by mapping sEMG signals into multi-DoF continuous movements, thus reconstructing amputees’ motor functions. Unlike PR-based prosthetic control schemes, SAE-DNN can realize more intuitive, natural and accurate control of prostheses. In the rehabilitation field, there is a great demand for motor function recovery, particularly for patients with neurological or musculoskeletal diseases. Through this method, multi-DoF motor information is extracted, which could be used to control an exoskeleton. As for intelligent HMI, this technology might potentially be applied in robot remote control and robot demonstration. Previous studies [10,38,39] reported similar applications and validated their feasibility in robot control.

4.3.2. Future work

Currently, the SAE-DNN based estimator has achieved an encouraging estimation performance and significantly outperforms the LR and SVR methods. Even so, many aspects remain to be improved. First of all, more experiments on amputees should be conducted in order to further test the performance of SAE-DNN in clinical application, though the estimation performance for the able-bodied and amputee subjects in this study shows its potential. Secondly, while it is reasonable to implement an offline analysis as the first step in developing a new control approach [40], several studies [41,19] show the importance and necessity of conducting an online experiment to evaluate the real performance of an SPC estimator. Therefore, the performance of SAE-DNN in an online setting needs to be verified in future work.
Thirdly, the movement of each subject in this study approximates rhythmic contraction; thus, the performance of SAE-DNN in randomized movement should be further investigated. Last but not least, it is essential to gain insight into the black-box model behind the high estimation performance. One possible solution is to investigate the relation between the features encoded by SAE and motor unit (MU) characteristics. This approach is made feasible by high-density EMG acquisition, which enables sEMG signals to be decomposed into MU action potential trains.

5. Conclusion

In this study, we proposed an SAE-DNN based scheme for multi-DoF continuous estimation of wrist torques from multi-channel sEMG signals, in which the SAE and a feedforward neural network are responsible for unsupervised feature extraction and for mapping features into wrist torques, respectively. Further, the estimation performance of SAE-DNN was compared with LR and SVR, and the results indicate that this scheme obtains superior estimation performance over these traditional regression techniques. In conclusion, the feasibility and effectiveness of SAE-DNN are demonstrated in estimating multi-DoF wrist torques, with superior performance over two previously adopted regressors. It may pave the way to myoelectric continuous decoding of multi-DoF motor intentions in a variety of applications, including robotics and entertainment, and not limited to prosthetics.

Acknowledgments

This work is supported in part by the China National Key R&D Program (Grant No. 2018YFB1307200), the National Natural Science Foundation of China (Grant No. 51620105002), and the Science and Technology Commission of Shanghai Municipality (Grant No. 18JC1410400).

References

[1] J.R. Wolpaw, N. Birbaumer, D.J. McFarland, G. Pfurtscheller, T.M. Vaughan, Brain–computer interfaces for communication and control, Clin. Neurophysiol. 113 (6) (2002) 767–791.
[2] M.J. Khan, K.-S. Hong, Hybrid EEG-fNIRS-based eight-command decoding for BCI: application to quadcopter control, Front. Neurorobot. 11 (2017) 6.
[3] K.-S. Hong, N. Aziz, U. Ghafoor, Motor-commands decoding using peripheral nerve signals: a review, J. Neural Eng. 15 (3) (2018) 031004.
[4] J. del Valle, X. Navarro, Interfaces with the peripheral nerve for the control of neuroprostheses, in: Int. Rev. Neurobiol., Vol. 109, Elsevier, 2013, pp. 63–83.
[5] K. Englehart, B. Hudgins, A robust, real-time control scheme for multifunction myoelectric control, IEEE Trans. Biomed. Eng. 50 (7) (2003) 848–854.
[6] M.A. Oskoei, H. Hu, et al., Support vector machine-based classification scheme for myoelectric control applied to upper limb, IEEE Trans. Biomed. Eng. 55 (8) (2008) 1956–1965.
[7] A. Fougner, Ø. Stavdahl, P.J. Kyberd, Y.G. Losier, P.A. Parker, Control of upper limb prostheses: terminology and proportional myoelectric control-a review, IEEE Trans. Neural Syst. Rehabil. Eng. 20 (5) (2012) 663–677.
[8] J. Chen, X. Zhang, Y. Cheng, N. Xi, Surface EMG based continuous estimation of human lower limb joint angles by using deep belief networks, Biomed. Signal Proces. Control 40 (2018) 335–342.
[9] S. Balasubramanian, E. Garcia-Cossio, N. Birbaumer, E. Burdet, A. Ramos-Murguialday, Is EMG a viable alternative to BCI for detecting movement intention in severe stroke? IEEE Trans. Biomed. Eng. 65 (12) (2018) 2790–2797.
[10] M. Ison, C.W. Antuvan, P. Artemiadis, Learning efficient control of robots using myoelectric interfaces, in: Robotics and Automation (ICRA), 2014 IEEE Int. Conf. on, IEEE, 2014, pp. 2880–2885.
[11] A.D. Roche, H. Rehbaum, D. Farina, O.C. Aszmann, Prosthetic myoelectric control strategies: a clinical perspective, Curr. Surg. Rep. 2 (3) (2014) 44.
[12] A.J. Young, L.H. Smith, E.J. Rouse, L.J. Hargrove, A comparison of the real-time controllability of pattern recognition to conventional myoelectric control for discrete and simultaneous movements, J. Neuroeng. Rehabil. 11 (1) (2014) 5.
[13] E. Scheme, K. Englehart, Electromyogram pattern recognition for control of powered upper-limb prostheses: state of the art and challenges for clinical use, J. Rehabil. Res. Dev. 48 (6).
[14] M. Ortiz-Catalan, H. Bo, R. Branemark, Real-time and simultaneous control of artificial limbs based on pattern recognition algorithms, IEEE Trans. Neural Syst. Rehabil. Eng. 22 (4) (2014) 756–764.
[15] J.M. Hahne, M.A. Schweisfurth, M. Koppe, D. Farina, Simultaneous control of multiple functions of bionic hand prostheses: Performance and robustness in end users, Sci. Robot. 3 (19) (2018), eaat3630.


[16] S. Muceli, D. Farina, Simultaneous and proportional estimation of hand kinematics from EMG during mirrored movements at multiple degrees-of-freedom, IEEE Trans. Neural Syst. Rehabil. Eng. 20 (3) (2012) 371–378.
[17] A. Ameri, E.N. Kamavuako, E.J. Scheme, K.B. Englehart, P.A. Parker, Support vector regression for improved real-time, simultaneous myoelectric control, IEEE Trans. Neural Syst. Rehabil. Eng. 22 (6) (2014) 1198–1209.
[18] J.M. Hahne, F. Biessmann, N. Jiang, H. Rehbaum, D. Farina, F. Meinecke, K.-R. Müller, L. Parra, Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control, IEEE Trans. Neural Syst. Rehabil. Eng. 22 (2) (2014) 269–279.
[19] N. Jiang, I. Vujaklija, H. Rehbaum, B. Graimann, D. Farina, Is accurate mapping of EMG signals on kinematics needed for precise online myoelectric control? IEEE Trans. Neural Syst. Rehabil. Eng. 22 (3) (2014) 549–558.
[20] N. Jiang, K.B. Englehart, P.A. Parker, Extracting simultaneous and proportional neural control information for multiple-DoF prostheses from the surface electromyographic signal, IEEE Trans. Biomed. Eng. 56 (4) (2009) 1070–1080.
[21] N. Jiang, H. Rehbaum, I. Vujaklija, B. Graimann, D. Farina, Intuitive, online, simultaneous, and proportional myoelectric control over two degrees-of-freedom in upper limb amputees, IEEE Trans. Neural Syst. Rehabil. Eng. 22 (3) (2014) 501–510.
[22] C. Lin, B. Wang, N. Jiang, D. Farina, Robust extraction of basis functions for simultaneous and proportional myoelectric control via sparse non-negative matrix factorization, J. Neural Eng. 15 (2) (2018) 026017.
[23] G.-Z. Yang, J. Bellingham, P.E. Dupont, P. Fischer, L. Floridi, R. Full, N. Jacobstein, V. Kumar, M. McNutt, R. Merrifield, et al., The grand challenges of Science Robotics, Sci. Robot. 3 (14) (2018), eaar7650.
[24] A. Phinyomark, E. Scheme, EMG pattern recognition in the era of big data and deep learning, Big Data Cognit. Comput. 2 (3) (2018) 21.
[25] W. Geng, Y. Du, W. Jin, W. Wei, Y. Hu, J. Li, Gesture recognition by instantaneous surface EMG images, Sci. Rep. 6 (2016) 36571.
[26] M. Zia ur Rehman, S. Gilani, A. Waris, I. Niazi, G. Slabaugh, D. Farina, E. Kamavuako, Stacked sparse autoencoders for EMG-based classification of hand motions: A comparative multi day analyses between surface and intramuscular EMG, Appl. Sci. 8 (7) (2018) 1126.
[27] I. Vujaklija, V. Shalchyan, E.N. Kamavuako, N. Jiang, H.R. Marateb, D. Farina, Online mapping of EMG signals into kinematics by autoencoding, J. Neuroeng. Rehabil. 15 (1) (2018) 21.

[28] L. Pan, Z. Yang, D. Zhang, A structurally decoupled mechanism for measuring wrist torque in three degrees of freedom, Rev. Sci. Instrum. 86 (10) (2015) 104301.
[29] J.L. Nielsen, S. Holmgaard, N. Jiang, K.B. Englehart, D. Farina, P.A. Parker, Simultaneous and proportional force estimation for multifunction myoelectric prostheses using mirrored bilateral training, IEEE Trans. Biomed. Eng. 58 (3) (2010) 681–688.
[30] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[31] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the fifth annual workshop on Computational learning theory, ACM, 1992, pp. 144–152.
[32] H.W. Kuhn, A.W. Tucker, Nonlinear programming, in: Traces and emergence of nonlinear programming, Springer, 2014, pp. 247–258.
[33] P.-H. Chen, C.-J. Lin, B. Schölkopf, A tutorial on nu-support vector machines, Appl. Stoch. Model. Bus. 21 (2) (2005) 111–136.
[34] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533.
[35] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[36] D.J. Atkins, D.C. Heard, W.H. Donovan, Epidemiologic overview of individuals with upper-limb loss and their reported research priorities, JPO: J. Prosthet. Orthot. 8 (1) (1996) 2–11.
[37] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (Dec) (2010) 3371–3408.
[38] P.K. Artemiadis, K.J. Kyriakopoulos, EMG-based control of a robot arm using low-dimensional embeddings, IEEE Trans. Robot. 26 (2) (2010) 393–398.
[39] P.K. Artemiadis, K.J. Kyriakopoulos, An EMG-based robot control scheme robust to time-varying EMG signal features, IEEE Trans. Inf. Technol. Biomed. 14 (3) (2010) 582–588.
[40] T. Kapelner, I. Vujaklija, N. Jiang, F. Negro, O.C. Aszmann, J. Principe, D. Farina, Predicting wrist kinematics from motor unit discharge timings for the control of active prostheses, J. Neuroeng. Rehabil. 16 (1) (2019) 47.
[41] I. Vujaklija, A.D. Roche, T. Hasenoehrl, A. Sturma, S. Amsuess, D. Farina, O.C. Aszmann, Translating research on myoelectric control into clinics-are the performance assessment methods adequate? Front. Neurorobot. 11 (2017) 7.