Information Fusion 57 (2020) 15–26
Multi-sensor fusion for body sensor network in medical human–robot interaction scenario

Kai Lin a,∗, Yihui Li a, Jinchuan Sun b, Dongsheng Zhou c, Qiang Zhang a

a School of Computer Science and Technology, Dalian University of Technology, Dalian, China
b Lanzhou Institute of Physics, China Academy of Space Technology, Lanzhou, China
c Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, China

Keywords: Body sensor network; Multi-sensor fusion; Medical human–robot interaction; Neural network; Fusion decision

Abstract

With the development of sensor and communication technologies, body sensor networks (BSNs) have become an indispensable part of smart medical services by monitoring the real-time state of users. With the introduction of smart medical robots, BSNs are no longer attached only to users; they are also responsible for data acquisition and multi-sensor fusion in medical human–robot interaction scenarios. In this paper, a hybrid body sensor network architecture based on multi-sensor fusion (HBMF) is designed to support the most advanced smart medical services; it combines various sensor, communication, robot, and data processing technologies. The infrastructure and system functions are described in detail and compared with other architectures. In particular, a multi-sensor fusion method based on an interpretable neural network (MFIN) for BSNs in the medical human–robot interaction scenario is designed and analyzed to improve the performance of fusion decision-making. Compared with current multi-sensor fusion methods, our design guarantees both the flexibility and the reliability of the service in the medical human–robot interaction scenario.

1. Introduction

In recent years, body sensor networks (BSNs) have been gradually promoted in the fields of intelligent healthcare and medical assistance. Relying on sensing and communication technologies, BSNs can reliably collect a wide variety of physiological, psychological, and activity information from users to support offline diagnosis or provide medical advice [1,2]. Although this alleviates the shortage of medical resources, BSNs alone cannot provide real-time online feedback services for users. Fortunately, with the rapid development of intelligent robots, the mode of smart medical service has changed from "user–doctor" to "user–robot–doctor". The introduced intelligent robots perform simple diagnosis and treatment, which handles emergencies more quickly and reduces the workload of medical staff. This requires BSNs to collect the real-time state of robots in addition to the state of users.

In medical human–robot interaction scenarios, the interaction between the human, the robot, and the environment must be considered comprehensively. Before performing a task, the robot must accurately identify the surrounding environment and the state of the served user. Therefore, multi-sensor fusion is used to obtain the real-time human–robot–environment situation so as to ensure the efficiency and safety of medical services. Multi-sensor fusion is widely used in BSNs and needs to be redesigned for medical human–robot interaction scenarios.

Due to cost and technical constraints, sensors tend to have low acquisition accuracy and are susceptible to interference. Sensor data are also multi-source and heterogeneous, which increases the difficulty of understanding the perceptual data. With the increasing complexity of medical applications, multi-sensor fusion becomes a nontrivial task that directly impacts the performance of activity monitoring applications [3]. Adam et al. [4] argued that the processes of molecular interaction and fusion in the biological immune system can produce analogous behavior for the use of data in an artificial context, which provides a biologically inspired approach to multi-sensor fusion. Wen et al. [5] found that sensors exhibit a latent structural influence mode in multi-sensor fusion. Carol et al. [6] proposed a biosensor data management framework covering data collection and decision-making, which optimizes data transmission effectively and reduces the amount of collected data without destroying data integrity. Liu et al. [7] designed a multi-sensor data fusion system combined with the Storm architecture to solve the data loss problem in large-scale data processing. These studies have promoted the development of multi-sensor fusion in BSNs, but they focus only on humans and the environment while ignoring the robot.



∗ Corresponding author. E-mail addresses: [email protected] (K. Lin), [email protected] (Y. Li), [email protected] (J. Sun), [email protected] (D. Zhou), [email protected] (Q. Zhang).

https://doi.org/10.1016/j.inffus.2019.11.001 Received 14 March 2019; Received in revised form 12 October 2019; Accepted 9 November 2019 Available online 11 November 2019 1566-2535/© 2019 Elsevier B.V. All rights reserved.


In this paper, we focus on multi-sensor fusion for BSNs in the medical human–robot interaction scenario. We first propose a hybrid medical human–robot interaction architecture for supporting stable, safe, and efficient medical services. Then, multi-sensor fusion for medical human–robot interaction is designed to meet the requirements of real-time healthcare based on three aspects: data correction, scene reconstruction, and fusion decision. The main contributions of this paper are summarized as follows.

1) We propose a hybrid body sensor network architecture based on multi-sensor fusion (HBMF), which combines current advanced human–robot interaction technologies and supports the most advanced intelligent medical services.

2) We use data correction, scene reconstruction, and fusion decision to design a unique multi-sensor fusion method that meets the requirements of healthcare applications with human–robot interaction.

3) We compare our design with related works. The results reflect the advantages of our design in efficiency and safety for healthcare applications with human–robot interaction.

The rest of the paper is organized as follows. Section 2 presents related works. Section 3 describes the infrastructure of the HBMF architecture. A multi-sensor fusion framework for medical human–robot interaction is proposed in Section 4. Section 5 proposes a multi-sensor fusion method based on an interpretable neural network. A functional comparison between several systems is given in Section 6. Section 7 concludes the paper.

2. Related work

Intelligent robots are introduced to implement smart medical services with BSNs. The related research can be classified into three categories: medical human–robot interaction, body sensor networks, and multi-sensor fusion.

2.1. Medical human–robot interaction

Human–robot interaction is a developing field based on artificial intelligence, robotics, natural language understanding, and the social sciences. In recent years, perception in medical human–robot interaction scenarios has been roughly divided into two aspects: human–robot collaborative environment perception and human intention perception. For human–robot collaborative environment perception, Kuehn et al. [8] developed the concept of an artificial Robot Nervous System (aRNS), which aims to unify different perception modes so that robots can react to perceived stimuli like humans. Zaraki et al. [9] presented a robotic social perception system that enables robots to achieve true perception of different surrounding environments. Lin et al. [10] proposed a human–robot–environment interactive reasoning mechanism and designed an object sorting robot system. Barbagallo et al. [11] developed a human–robot collaboration solution based on Kinect that combines body detection and voice commands to establish a safe moving space for the robotic arm. Truong et al. [12] presented a proactive social motion model that enables mobile service robots to perceive complex dynamic environments. Lin et al. [15] proposed a localization method (LNM) based on neighbor relative RSS (NR-RSS) and a Markov-chain prediction algorithm for precise positioning in smart buildings. Rezaee et al. [13] designed a robot modeled by electric charge and proposed an obstacle avoidance technology based on a behavioral structure. In terms of human intention perception, Ji et al. [14] designed a method of recognizing human motion based on three-dimensional convolutional neural networks, which realizes the recognition of human motion in surveillance video. Chen et al. [16] developed a cognitive information measurement theory for measuring dynamic information based on the mailbox principle. Liang et al. [17] proposed a locality-constrained affine subspace coding method for feature coding on depth maps and realized human motion recognition based on a single depth feature. Du et al. [18] presented a human skeleton data recognition method based on a hierarchical recurrent neural network to realize the classification and recognition of human motion. Francesco et al. [19] developed a computational model of action intention understanding, which uses motor prediction to transform action intention understanding into a process of active inference and hypothesis testing. Chen et al. [20] proposed label-less learning for emotion cognition (LLEC) to exploit a large amount of unlabeled data. Harish et al. [21] presented an adaptive neural intention estimator to reason about the motion intention of human upper limb movement. Elisabeta et al. [22] designed a motion and emotion recognition system for robot-assisted treatment of autistic children. Ma et al. [23] proposed a deep weighted fusion method for audio-visual emotion recognition, which improved the accuracy of natural language understanding. Chen et al. [24] designed a wearable affective robot from the perspective of hardware and algorithms. These studies enable robots to provide services to users in medical environments.

2.2. Body sensor networks

BSNs are responsible for collecting the real-time state of users. Ding et al. [25] proposed a body sensor network based on tonoarteriography (TAG) for unobtrusive blood pressure measurement. Carlo et al. [26] developed a wireless unobtrusive monitoring system for continuous measurement of the user's temperature. Hou et al. [27] presented a system for human gait analysis based on BSNs to reflect the body's physiological functions, mental state, and physical state. Yeh et al. [28] introduced an IoT-based healthcare system which uses BSNs to simultaneously achieve system efficiency and robustness. Wang et al. [29] presented a quantized compressed sensing (QCS) architecture to reduce the energy consumption of communication in BSNs, and proposed a rapid QCS (rapQCS) algorithm to combat the computational complexity of the configuration procedure in the quantized compressed sensing architecture. Zhou et al. [30] formulated a mathematical optimization problem which jointly considers network topology design and cross-layer optimization in BSNs. Sasikala et al. [31] designed a routing protocol based on security-aware trusted clusters to reduce information loss in BSN systems. Zhang et al. [32] proposed a random number generation method based on electromyogram signals to secure the data acquired from BSNs for rehabilitation. Shoaib et al. [33] designed a method of representing IMU data with deep neural networks to remove motion artifacts in free-mode BSNs. Chen et al. [34] proposed a medical AI framework based on data width evolution and self-learning for skin disease recognition. Although these studies have advanced the development of BSNs, their objects are limited to humans.

2.3. Multi-sensor fusion

Multi-sensor fusion aims to fuse sensory data to achieve more accurate and comprehensive perception. Attiq et al. [35] proposed a multi-sensor image fusion strategy based on a genetic algorithm. Fang et al. [36] presented a decision-making algorithm for uncertain fusion based on grey relation and DS evidence theory, which can solve the uncertainty problems caused by the inconsistency of sensors and complex monitoring environments. Ammar et al. [37] proposed a track-to-track fusion (T2TF) algorithm based on the information filter framework to address issues such as the correlation of the estimates, transmission shortcomings, and high complexity cost. Lin et al. [38] proposed an AI-driven data-analytics-based spectrum allocation (ADASA) algorithm to analyze high-dimensional data and improve spectrum utilization in heterogeneous wireless networks. Shi et al. [39] designed an electronic health system based on multi-sensor fusion algorithms, which has good classification accuracy. Zeng et al. [40] developed a multi-sensor fusion algorithm based on factor graphs to deal with complex data provided asynchronously by different sensors or with non-linear output signals. Alberto et al. [41] proposed a multi-sensor fusion algorithm based on adaptive fingerprints for accurate indoor tracking. Wei et al. [42] presented an algorithm based on the weighted sum of sensor outputs, which improves the quality of fusion with high-conflict sensors.


Fig. 1. Architecture of HBMF system.

Wang et al. [43] proposed a multi-sensor image fusion algorithm based on multi-resolution analysis, which uses curvelet transforms to integrate the information of visible and infrared images. Chen et al. [44] proposed an Edge Cognitive Computing (ECC) architecture that fuses cognitive computing and edge computing to provide dynamic and elastic storage and computing services. Lin et al. [45] designed an overlapping and hierarchical social clustering model (OHSC) for big data analysis based on social relationships. Gallego et al. [46] introduced two fusion models based on a scalability discussion and MapReduce experiments. In this paper, we analyze several classic fusion algorithms and introduce deep learning for feature extraction to improve the performance of multi-sensor fusion.

3. The infrastructure of the HBMF architecture

3.1. HBMF architecture

In this work, we design the HBMF architecture to support the most advanced smart medical services; it consists of BSNs, sensors, robots, medium- and long-distance communication devices, computing and storage devices, etc. The HBMF architecture is divided into four layers: the sensor-integrated terminal layer, the network transport layer, the information cognitive layer, and the interactive application layer. The sensor-integrated terminal layer includes a variety of front-end devices, such as BSNs, sensors, and robots. These devices are deployed directly in the human–robot interaction environment to provide smart medical services. In particular, BSNs are equipped on users and robots to monitor their state in real time, and different types of sensors are deployed to obtain the necessary environmental information. This layer generates the original sensory data for multi-sensor fusion. The network transport layer includes the various medium- and long-distance communication devices involved in the space–terrestrial integrated network, such as satellites, 4G/5G base stations, and gateways, and meets the data transmission requirements of large-scale systems. It is responsible for transmitting original sensory data from front-end devices in the sensor-integrated terminal layer to the devices in the upper information cognitive layer. The information cognitive layer is made up of high-performance computers, clusters, and private or public clouds. It achieves an accurate understanding of the medical human–robot interaction environment by performing online multi-sensor fusion. For large-scale medical applications, the interactive application layer is usually operated by a data center that supports network virtualization.

Fig. 1 shows the architecture of the HBMF system. It can be observed that the data sources for multi-sensor fusion are generated by humans, robots, and the environment. Among them, the data from humans and robots are obtained by BSNs, while the data from the human–robot interactive environment are generated by deployed sensors. These data sources produce multi-source, multi-modal data for multi-sensor fusion, which enables the HBMF to realize a feedback mechanism that ensures the safety of the human and the robot during medical human–robot interaction. In the rest of this section, the specific functions of the HBMF architecture are described in detail.
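To make the four-layer decomposition concrete, the sketch below traces a single sensor reading through the HBMF layers using plain Python data structures. The layer names follow the text; the SensorReading class, its field names, and the simple hand-off functions are illustrative assumptions for this sketch, not part of the HBMF specification.

```python
# A minimal sketch of how one sensor reading might flow through the four
# HBMF layers. Class and function names are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SensorReading:
    source: str        # "H-BSN", "R-BSN", or "E-sensor"
    modality: str      # e.g. "heart_rate", "gyroscope", "temperature"
    value: float
    timestamp: float


def terminal_layer_collect() -> List[SensorReading]:
    """Sensor-integrated terminal layer: produce raw readings."""
    return [
        SensorReading("H-BSN", "heart_rate", 72.0, 0.0),
        SensorReading("R-BSN", "gyroscope", 0.13, 0.0),
        SensorReading("E-sensor", "temperature", 22.5, 0.0),
    ]


def transport_layer_forward(readings: List[SensorReading]) -> List[SensorReading]:
    """Network transport layer: relay readings upward (serialization,
    retransmission, and security checks would live here)."""
    return list(readings)


def cognitive_layer_fuse(readings: List[SensorReading]) -> Dict[str, float]:
    """Information cognitive layer: fuse multi-source data into a situation
    estimate (a trivial per-source average stands in for real fusion)."""
    grouped: Dict[str, List[float]] = {}
    for r in readings:
        grouped.setdefault(r.source, []).append(r.value)
    return {src: sum(vals) / len(vals) for src, vals in grouped.items()}


def application_layer_decide(situation: Dict[str, float]) -> str:
    """Interactive application layer: turn the fused situation into feedback."""
    return "no action required" if situation.get("H-BSN", 0.0) < 100 else "alert"


if __name__ == "__main__":
    print(application_layer_decide(
        cognitive_layer_fuse(transport_layer_forward(terminal_layer_collect()))))
```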

3.2. Specific functions of HBMF

3.2.1. Sensor-integrated terminal layer

The sensor-integrated terminal layer is the bottom layer of the HBMF architecture. It provides the necessary hardware to support the whole system and is aimed at data collection, real-time monitoring, environment awareness, and behavioral awareness. The most advanced front-end devices are used to perform user-oriented tasks directly. BSNs on humans (H-BSNs) consist of a variety of biomedical devices distributed inside, on the surface of, or close to the human body, such as pacemakers, implantable defibrillators, bionic ears and eyes, swallowable devices, wearable sensors, and so on.


Table 1. Common sensors for H-BSNs and R-BSNs.

H-BSNs: EEG, ECG, heart rate, body temperature, blood pressure, oxygen saturation
R-BSNs: air pressure, gravity, light, magnetic, electricity
H-BSNs & R-BSNs: acceleration sensor, gyroscope, position

Table 2. Common sensors for E-sensors.

Environmental monitoring: temperature, humidity, gas, illuminance, voice, water detection
Security & protection: smoke, human movement, gravity, human body presence

H-BSNs are used to monitor the state of the human; they are generally wireless and aim to accomplish control and monitoring without affecting the user's comfort. Similarly, BSNs on robots (R-BSNs) are used to monitor the state of robots. Robots have motions similar to humans, such as walking and grabbing, but produce no biological information. In addition, robots and humans have many unshared features, such as the electric power of robots and the body temperature of humans. Therefore, the sensors that make up H-BSNs and R-BSNs are not exactly the same. Table 1 lists some of the commonly used sensors for H-BSNs and R-BSNs.

Sensors deployed in the medical human–robot interaction environment (E-sensors) accurately measure environmental information and support smart medical services by testing, recording, and storing as much sensory data as possible. In the HBMF architecture, E-sensors mainly monitor the environment surrounding humans and robots to support better decisions for medical human–robot interaction. Table 2 lists some E-sensors according to their functions.
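For illustration, the two tables above can be kept in software as a small configuration mapping that the terminal layer uses to route readings. The dictionary below simply transcribes Tables 1 and 2; the variable names and the lookup helper are assumptions made only for this sketch.

```python
# Sensor taxonomy transcribed from Tables 1 and 2; the variable names and
# helper function are illustrative, not part of the HBMF design.
BSN_SENSORS = {
    "H-BSNs": ["EEG", "ECG", "heart rate", "body temperature",
               "blood pressure", "oxygen saturation"],
    "R-BSNs": ["air pressure", "gravity", "light", "magnetic", "electricity"],
    "H-BSNs & R-BSNs": ["acceleration sensor", "gyroscope", "position"],
}

E_SENSORS = {
    "environmental monitoring": ["temperature", "humidity", "gas",
                                 "illuminance", "voice", "water detection"],
    "security & protection": ["smoke", "human movement", "gravity",
                              "human body presence"],
}


def sensor_groups(sensor: str) -> list:
    """Return every group (BSN category or E-sensor function) using a sensor."""
    groups = []
    for table in (BSN_SENSORS, E_SENSORS):
        for group, sensors in table.items():
            if sensor in sensors:
                groups.append(group)
    return groups


print(sensor_groups("gravity"))  # appears in both R-BSNs and security & protection
```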

3.2.2. Network transportation layer

The main function of the network transportation layer is to transmit the data obtained by the sensor-integrated terminal layer to the information cognitive layer securely and reliably. In addition, the network transportation layer provides security monitoring, traffic monitoring, heterogeneous network convergence, resource management, storage management, remote control, and so on. The network transportation layer provides communication services between heterogeneous devices, and both traditional and emerging network technologies are applied in this layer. Different devices usually use different communication modes. For example, considering the wearability of body sensors, H-BSNs and R-BSNs must use the smallest possible number of devices and ensure high accuracy and robustness under constraints of physical dimensions, weight, bio-compatibility, and ergonomics [3]; only low-rate data transmission is required for both H-BSNs and R-BSNs. The traditional network technologies introduced include 3G/4G, IPv6, Wi-Fi/WiMAX, Bluetooth/ZigBee, etc. Although 4G technology has completely surpassed 3G in communication quality, its capacity is still limited, which prevents it from achieving the expected communication performance in large-scale medical services. To make the system run more smoothly, 5G technology is introduced to provide high-speed, wide-area communication. IPv6 provides a unique identity for each device. WiMAX has a higher transmission rate and a larger communication range than Wi-Fi, which enables high-speed communication between front-end devices and the data center. Bluetooth and ZigBee enable short-distance data exchange for BSNs, E-sensors, and robots.

3.2.3. Information cognitive layer

The information cognitive layer is responsible for executing multi-sensor fusion, which transforms the complex data into the information needed by robots. To realize effective multi-sensor fusion, high-performance cloud computing and storage are adopted to collect and process streams from H-BSNs and R-BSNs, which enables large-scale data sharing and collaboration between devices. In addition, the cloud infrastructure is attached to a back-end decision support system that makes decisions for robots based on the multi-sensor fusion. Because the computing capacity and mobility of the cloud are mutually constrained, inappropriate task allocation schemes can lead to low quality of experience (QoE) and high cost; Youn et al. [47] proposed task allocation schemes over a co-located cloud in a mobile environment to keep a balance between QoE and cost. Lin et al. [48] designed a green video transmission (GVT) algorithm for improving video transmission performance in mobile cloud networks. Chen et al. [49] proposed a mobile cloudlet-assisted service mode for achieving flexible cost–delay tradeoffs between the remote cloud service mode and the mobile cloudlet service mode in terms of task scheduling. Functions of the information cognitive layer include multi-sensor fusion, data storage, data classification, equipment management, QoS service, and spatial information management. Specifically, the technologies related to multi-sensor fusion are pattern recognition, neural networks, feature extraction, artificial intelligence, and so on.

3.2.4. Interactive application layer

The main function of the interactive application layer is to establish a direct interaction between the user and the robot and to intuitively present the feedback mechanism of the HBMF architecture. Considering that both humans and robots are usually not stationary, a task switching function is necessary to ensure the simplicity and integrity of the application, in addition to the task decomposition function, the decision informing function, and the security authentication function. The task decomposition and switching functions control task execution and collaboration for BSNs, robots, and E-sensors. The decision informing function mainly sends the user's instructions or the system's feedback decisions to front-end devices by using text, voice, and so on. In addition, it guides users through remote assistance if robots cannot handle an event. In order to ensure safe medical services, the interactive application layer also provides a security authentication function supported by facial and voice recognition.

4. Multi-sensor fusion framework

In this section, we first categorize the services of the medical human–robot interaction scenario and analyze their corresponding data sources in detail. Then, a multi-sensor fusion framework for medical human–robot interaction is designed to provide safe and effective robotic decision-making in the medical environment.

4.1. Data source analysis of the medical human–robot interaction scenario

Intelligent medical robots can replace medical staff to complete basic healthcare, including nursing patients, assisted diagnosis, rehabilitation status tracking, and even surgical assistance.


Fig. 2. Services in medical human–robot interaction scenario.

Considering that robots serve users directly and make decisions without intervention, they need to perform tasks correctly and effectively and keep both themselves and the users safe. Thanks to the development of sensor technology, the sensors deployed in medical human–robot interaction environments provide rich sensory data for making correct human–robot decisions. The sensory data source analysis is given from the perspective of system services. Fig. 2 shows the system services provided in the medical human–robot interaction scenario.

4.1.1. Sensory data for user classification

Users are first divided into three categories: healthy users who can live normally, sub-healthy users with limited activity, and unhealthy users who have lost their self-care ability. The same user may move between these three categories due to factors such as injury and rehabilitation, which requires H-BSNs to monitor and send the user's real-time state to the medical robot. Typical state information includes body temperature, heart rate, sound, blood pressure, breathing, etc. Robots provide different medical services according to the classification of users. For healthy users, robots only need to provide basic living assistance and remind users to eat healthy food, exercise moderately, sleep on time, etc., which requires the sensory data generated from users' bodies and their living environment. For sub-healthy and unhealthy users, robots perform more medical-related assistive functions. In particular, in order to complete complex human–robot interactions, the collection of sensory data is more comprehensive and requires higher precision and reliability. For all users, robots should have the ability to serve them and respond to any need according to the users' physical and psychological state, such as sudden illness, falling, lack of drug storage, etc.

4.1.2. Sensory data for healthcare services

Healthcare is the most basic service provided by medical robots based on human–robot interaction. For example, if a robot detects that the user it serves has a high temperature, it can provide water to the user and ask whether to contact a doctor or take medicine. After getting instructions, the robot responds to the user's requests as much as possible. If the user's health continues to deteriorate so that the robot cannot handle it, the robot contacts the hospital to request manual assistance. The human body state is one of the most important data sources for healthcare services with robot decision-making. The sensory data from users can be divided into dynamic features and static features. The dynamic features include electrical parameters (EMG, skin electricity, brain electricity, ECG) and physical parameters (muscle strength, body temperature, breathing, eye movement); the static features include biochemical parameters (ultrasound, urine, sweat, blood) and omics parameters (metabolism, protein, transcription, gene). A more accurate medical human–robot interaction decision can be obtained with a more comprehensive state of the user. In addition, the user's instructions and environmental conditions are also needed to provide safe and efficient healthcare services.

4.1.3. Sensory data for human–robot dialogue

Human–robot dialogue is necessary to improve the quality of the robots' service to users. The modes of human–robot dialogue include the direct control interface, speech recognition, gesture recognition, limb action recognition, etc.


The direct control interface allows the robot to accept limited commands from a software interface, which guarantees accuracy but lacks flexibility. Speech recognition depends on natural language and enables the robot to understand the user's intention by interpreting the user's utterances. Motion recognition requires real-time perception and further data processing to obtain accurate information about the human motion trajectory and the surrounding environment. By using somatosensory devices, human–robot dialogue can be realized without touching. Considering both flexibility and accuracy, multi-mode compatible human–robot dialogue can realize better medical services. Robots collect environment and user information (face, gestures, movements, sounds, and rhythms) to form a comprehensive understanding from flexible medical human–robot interaction and enhance the user experience.

4.1.4. Sensory data for security guarantee

To protect the safety of both users and robots, robots must avoid dangerous actions during medical human–robot interaction. Accurately capturing the surrounding environment information (obstacle location, space size, ground conditions, and so on) and the state of the robot (electricity, motion range, hardware conditions, and so on) can limit the robot's motion to a safe range. Secondly, robots must be able to judge emergencies and help users get out of trouble, which requires them to discover changes in the surrounding environment, such as temperature, humidity, light intensity, air composition, air pressure, etc. For example, robots can detect the occurrence and intensity of a fire based on the perceived temperature and carbon dioxide content, and then perform fire suppression or help users escape from the fire scene. In addition, robots can appropriately adjust electrical equipment to increase comfort and safety according to the users' surrounding environment, for example by providing lighting to avoid collisions.

4.2. Multi-sensor fusion framework in the medical human–robot interaction scenario

4.2.1. Overview of the multi-sensor fusion framework

Based on the above data source analysis, a comprehensive multi-sensor fusion framework is proposed to meet the requirements of medical human–robot interaction. Considering the multimodality and complexity of the sensory data, this framework adopts a distributed fusion mode. According to the data sources in the medical human–robot interaction scenario, sensory data is divided into two categories: user data (including the user's physical and psychological data) and non-user data (including robot data and environmental data). Sensory data needs to be pre-processed before being fused, e.g., data cleaning, correction, dimensionality reduction, and feature extraction. User physical data is further divided into dynamic data and static data, which are mainly fused based on the results of feature extraction. In contrast, user psychological data mainly comes from human–robot dialogue, which requires more complex data processing, such as natural language processing, to extract and fuse multiple semantic features. The results obtained from a single dialogue method can be directly involved in behavioral decision-making; for example, the user can express his intention only with speech commands, and the robot executes according to the command. In most cases, however, a single dialogue method cannot accurately express the user's intention, and the necessary multi-mode compatible human–robot dialogue requires the decision-level fusion contained in the framework. On the other hand, the non-user data guarantees the safety of medical services and is divided into behavioral environment data and behavioral restriction data. Behavioral environment data includes the environmental information surrounding users and robots, such as temperature, obstacles, humidity, and so on. Similar to user physical data, it undergoes feature-level fusion and then decision-level fusion with user data to determine the behavior of robots. The behavioral restriction data includes electricity, motion range, and so on, which independently affect the behavior of robots according to different data indicators. For example, the remaining power affects the running time of robots, and the range of motion affects the robots' swing. In addition, the impact of behavioral restriction data is independent of the behavioral decision.

In summary, the multi-sensor fusion framework consists of three stages: the preprocessing stage, the feature learning stage, and the fusion decision stage. Data preprocessing is an indispensable stage of multi-sensor fusion; appropriate preprocessing not only makes the fusion result more accurate but also improves fusion efficiency. Feature learning is the necessary step after data preprocessing and is divided into four parts according to the types of services: multi-modal user data fusion, human–robot dialogue and intention understanding, user classification, and path and action planning. The technologies required for each service are given as follows:

1) Multi-modal user data fusion: interpretable neural networks, cross-modal association learning, and a dual-driven model by knowledge and data.
2) Human–robot dialogue and intention understanding: human action recognition, speech recognition.
3) User classification: interactive knowledge atlas modeling.
4) Path and action planning: path planning based on multi-modal sensing, task planning of humanoid operation.

The fusion decision stage is divided into two steps: the first fuses the results from the four parts of the feature learning stage, and the second makes a decision to obtain the final action plan. Common fusion and decision techniques include expert systems, D-S evidence theory, fuzzy set theory, the Pignistic probability distance, etc.; a sketch of the D-S combination rule is given after this paragraph. Fig. 3 shows the overview of the multi-sensor fusion framework.
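As a concrete illustration of the fusion decision stage, the sketch below applies Dempster's rule of combination to two hypothetical sensor reports over a small frame of discernment. The hypothesis names and mass values are invented for the example and are not taken from the paper.

```python
# Minimal Dempster-Shafer combination of two basic probability assignments
# (BPAs). Frame of discernment and mass values are illustrative only.
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two BPAs whose focal elements are frozensets of hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Two hypothetical sensor reports about the user's state.
NORMAL, FALL = frozenset({"normal"}), frozenset({"fall"})
EITHER = NORMAL | FALL                    # ignorance: "normal or fall"

m_accelerometer = {FALL: 0.6, NORMAL: 0.1, EITHER: 0.3}
m_camera        = {FALL: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = dempster_combine(m_accelerometer, m_camera)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 3))
```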

4.2.2. Preprocessing stage

The preprocessing stage transforms raw data into a state more suitable for multi-sensor fusion, which directly affects the performance of fusion. Because the data are multi-source and heterogeneous, collecting data from different sources raises problems at different levels, such as data integrity, authoritativeness, dimensional inconsistency, noise, field redundancy or multi-index values, etc. Preprocessing mainly involves compensating for data integrity through different filling methods, removing data irrelevant to the final decision, smoothing noise, identifying and correcting outlier data points, clustering, Gaussian mixture models, and so on. Data preprocessing executes data cleaning, data integration, data conversion, and data reduction according to the task requirements.

Data cleaning techniques include filling missing data, smoothing noise, identifying outliers, and correcting inconsistencies. Missing data can be filled in two ways: experts manually supplement the missing values, or the values are inferred from the existing data. The first way is unsuitable for unsupervised preprocessing of large-scale datasets; regression and decision trees are the most common and reliable methods to compensate for missing data. Noise smoothing technologies include binning (smoothing data by examining neighborhood values), regression (using a fitted data function), and clustering (detecting outliers by clustering). Clustering-based denoising simultaneously performs clustering and outlier detection and operates on datasets with linear time complexity.

The most important issue in data integration is detecting data redundancy, which is caused by the correlation between different data. Commonly used redundancy analysis methods include the Pearson product-moment correlation coefficient, the chi-square test, and the covariance of numerical attributes. In addition, entity identification and data value conflict handling are also involved in the data integration process. Data conversion includes standardization, discretization, and sparse processing of the data to achieve a form suitable for learning. The dimensions of data with different characteristics may differ, which produces significant numerical differences.


Fig. 3. The overview of multi-sensor fusion framework.

Data standardization can solve this problem: it scales the data so that it falls into a specific range for comprehensive analysis. Common normalization methods include maximum-minimum normalization, Z-score standardization, and log transformation; a minimal sketch is given below. Effective discretization reduces the time and space overhead and improves clustering performance and noise immunity. Sparse processing is mainly applied to discrete data, which facilitates the rapid convergence of the model and improves its anti-noise ability. Data reduction greatly reduces the size of the dataset while maintaining the integrity of the original data, and subsequent analysis of the reduced dataset is more efficient. Data reduction methods include data cube aggregation, attribute subset selection, dimension reduction, numerical reduction, discretization, and concept hierarchies.
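The following sketch illustrates two of the normalization methods named above (min-max and Z-score) together with a simple outlier check on a toy sensor stream; the data values and the threshold of two standard deviations are arbitrary assumptions made only for the example.

```python
# Toy preprocessing sketch: min-max scaling, Z-score standardization, and a
# naive distance-to-mean outlier check. Values and threshold are illustrative.
import numpy as np

readings = np.array([36.4, 36.6, 36.5, 36.7, 41.2, 36.6])  # e.g. body temperature

# Maximum-minimum normalization: map values into [0, 1].
min_max = (readings - readings.min()) / (readings.max() - readings.min())

# Z-score standardization: zero mean, unit variance.
z_score = (readings - readings.mean()) / readings.std()

# Crude outlier flagging: points far from the mean in Z-score units.
outliers = np.abs(z_score) > 2.0

print("min-max:", np.round(min_max, 3))
print("z-score:", np.round(z_score, 3))
print("outlier mask:", outliers)
```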

4.2.3. Feature learning

Data feature learning mechanisms include anomaly data processing, normalization, principal component analysis, and deep neural networks. Feature learning is critical to the performance of the subsequent multi-sensor fusion. For example, in the process of human–robot dialogue, robots can capture human emotions through action and speech analysis to achieve an accurate understanding of the user's intentions. Fig. 4 shows the methods of data feature learning for understanding the user's intentions.

Since captured human motion data forms a high-dimensional and unstructured data sequence, it is difficult to perform motion recognition directly on the raw data, and the accuracy of recognition cannot be guaranteed. To solve this problem, segmentation and coding methods are used to encode the sensory data, the mapping between human action sequences and motion patterns is achieved through dynamic time warping algorithms (see the sketch below), and key frame extraction and support vector machines are used to understand the user's motion. For speech recognition, considering that valuable speech information may suffer interference in the human–robot interaction scenario, speech enhancement techniques based on a deep generative network model and a supervised learning model are used to optimize the speech information and obtain a basic spectral pattern reconstruction. Multi-dimensional speech information extraction techniques based on signal processing and pattern recognition are adopted to identify the user's intention. In addition, since pattern recognition methods based on statistical learning require a large number of labeled samples, the application of such recognition methods in speech understanding is limited. Speech understanding technology based on a transfer learning mechanism is therefore introduced to accommodate cases with limited or unlabeled samples.
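To make the mapping step concrete, here is a minimal dynamic time warping (DTW) sketch that aligns a captured motion sequence with a stored motion template. The two toy sequences are invented for illustration; a production system would operate on encoded multi-dimensional motion features rather than scalars.

```python
# Classic dynamic-programming DTW between two 1-D sequences.
# Sequences here are toy stand-ins for encoded motion features.
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return the DTW alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])


template = np.array([0.0, 0.2, 0.9, 0.8, 0.1])   # stored "raise arm" pattern
captured = np.array([0.0, 0.1, 0.3, 0.9, 0.7, 0.1])

print("DTW cost:", round(dtw_distance(template, captured), 3))
```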

Robot motion planning is derived from behavioral data such as the current position, time, battery capacity, and target position. A behavior mapping model from input data to behavioral sequences is constructed based on deep convolutional neural networks. Combining the mapping model with learning methods, an intelligent behavior planning method is developed to meet the requirements of the medical scenario. On the other hand, the premise of autonomous robot task planning is to acquire experience from human manipulation data. First, vision-based task analysis and decomposition are used to analyze human manipulation data and obtain geometric and kinematic constraints on human manipulation postures. Then, a task-oriented manipulation planning method is used to build a trajectory for the robot based on a specific typical task. Finally, through deep learning, transfer learning, and reinforcement learning, human experience is applied to the robots.

4.2.4. Fusion decision

In order to improve the execution efficiency of multi-sensor fusion in medical human–robot interaction scenarios, it is necessary to find the most suitable fusion strategy based on the characteristics of the sensory data and to adjust the fusion strategy in time to obtain accurate fusion decisions. Based on the previous data source analysis, the following three fusion mechanisms are selected:

• Cross-domain fusion: Cross-domain fusion mainly focuses on cross-domain knowledge migration and the fusion of different feature spaces, which addresses the decline in fusion capability caused by multi-modal data whose source and target domains lie in different feature representation spaces. It is able to support fusion decisions based on the multi-source data generated in medical human–robot interaction scenarios.

• Incremental classifier fusion: Since medical robots introduce additional information, the large amount of data or its dynamic growth leads to a significant increase in convergence overhead, which cannot meet the requirements of real-time fusion decision-making. Incremental classifier fusion can optimize complementary modal data by co-clustering the multi-modal data and reach decision results more quickly.

• Multi-sensor fusion with incomplete data: This mechanism mainly deals with partially lost original perceptual data, which the traditional fusion mechanism cannot process. Although incomplete data can simply be deleted so that the fusion decision is made only on the remaining complete data, the loss of valuable information contained in the incomplete data affects the accuracy and comprehensiveness of the fusion decision. Therefore, a fusion mechanism based on incomplete data is necessary for medical human–robot interaction scenarios.


Fig. 4. Data feature learning for understanding user’s intentions.

5. Multi-sensor fusion method based on an interpretable neural network

Although great progress has been made in data processing, neural networks rely on large-scale labeled data and suffer from black-box problems, which makes them unable to process small-sample data and multimodal data with different time scales and data structures. Traditional neural networks therefore struggle to support fusion decision-making in medical human–robot interaction scenarios. We propose a multi-sensor fusion method based on an interpretable neural network (MFIN), which uses artificial intelligence algorithms with strong learning ability to achieve high-quality decision-making in intelligent medical human–robot interaction scenarios. The MFIN method aims to ensure that robots make the right decisions based on the extraction, understanding, and fusion of modal features. In this section, robotic decision-making is decomposed into three processes: feature extraction and understanding, data fusion, and behavior decision. Feature extraction and understanding perform single-modal feature extraction and cross-modal association learning between multimodal states. Data fusion is achieved through the interpretation of neural networks. The behavioral decision is made based on deep reinforcement learning. Fig. 5 shows the process of the MFIN method.

5.1. Cross-modal association learning

Since different sensory data from the medical environment are complementary, the MFIN method needs to build a joint representation for multimodal data by establishing a deep neural network on top of the single-modal representations. The joint representation of the modalities is obtained by fusing the semantic information of each modality. According to the analysis of data sources in Section 4, the user physical data is divided into dynamic feature data and static feature data. Unlike static feature data, the spatial and temporal properties of dynamic feature data change continuously, so a method that determines spatiotemporal attributes can be used to distinguish static feature data from dynamic feature data; we define them as Us and Ud, respectively.

The traditional method for processing dynamic feature data is the two-stream convolutional network (TSCN), which adopts two convolutional neural networks to learn the temporal and spatial properties of successive image frames, respectively. Simonyan et al. [50] constructed a TSCN for video modal learning and made a breakthrough. However, since convolutional neural networks are not good at learning temporal properties, a TSCN usually requires a large amount of time-attribute information. Compared with convolutional neural networks, recurrent neural networks have better time-series fitting performance, so it is more effective to build a hybrid neural network that combines a convolutional neural network and a recurrent neural network to learn the spatial and temporal characteristics of dynamic physical feature data, respectively. More formally, this paper takes $U_d(0, T)$ as the dynamic feature information from $t = 0$ to $t = T$ and divides it into T equal time slots. The sequences $x_i = x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im}$ $(i = 0, 1, 2, \ldots, T)$ are defined as the input at time i, where $x = x_1, x_2, \ldots, x_m$ represents the input of m modalities. After learning by the convolutional neural network, the feature information $y_i$ $(i = 0, 1, \ldots, T)$ at each moment is obtained by $y = f_c(x)$, where $f_c(x)$ is the learning function of the convolutional neural network. As input to the recurrent neural network, $y_i$ is mapped to the output $H_i$ $(i = 0, 1, \ldots, T)$ by computing the activations of the units in the network with the following equations, applied recursively from $i = 1$ to $i = T$ [51]:

$$
\begin{aligned}
z_i &= \sigma(W_z \cdot [h_{i-1}, y_i] + b_z) \\
f_i &= \sigma(W_f \cdot [h_{i-1}, y_i] + b_f) \\
c_i &= f_i \ast c_{i-1} + z_i \ast \tanh(W_c \cdot [h_{i-1}, y_i]) \\
o_i &= \sigma(W_o \cdot [h_{i-1}, y_i] + b_o) \\
h_i &= o_i \ast \tanh(c_i)
\end{aligned}
\tag{1}
$$

where $y_i$ and $h_i$ are the input and hidden sequences at the i-th time step, and $z_i$, $f_i$, $c_i$, $o_i$ are the activation vectors of the input gate, forget gate, memory cell, and output gate, respectively. $W$ is the weight matrix, $b_\alpha$ is the bias term of gate $\alpha$, and $\sigma$ is the sigmoid function defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$. It can be seen that $H_T$ is a feature matrix composed of n vectors, and it is also one of the inputs to the next step. A minimal sketch of one such recurrent step is given below.

On the other hand, in order to perform cross-modal fusion, the relationships between the various modalities need to be predicted.
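The sketch below writes out one recurrent step of Eq. (1) in NumPy so that the gating structure is explicit; the dimensions, the random weights, and the toy input sequence are assumptions made only to obtain a runnable example.

```python
# One step of the recurrence in Eq. (1), written out in NumPy.
# Sizes and random parameters are illustrative; a real model would learn W, b.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16                      # CNN feature size and hidden size

# Parameters act on the concatenation [h_{i-1}, y_i].
W_z, W_f, W_c, W_o = (rng.standard_normal((d_hid, d_hid + d_in)) * 0.1
                      for _ in range(4))
b_z, b_f, b_o = (np.zeros(d_hid) for _ in range(3))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def step(h_prev, c_prev, y_i):
    """Map (h_{i-1}, c_{i-1}, y_i) to (h_i, c_i) following Eq. (1)."""
    concat = np.concatenate([h_prev, y_i])
    z_i = sigmoid(W_z @ concat + b_z)            # input gate
    f_i = sigmoid(W_f @ concat + b_f)            # forget gate
    c_i = f_i * c_prev + z_i * np.tanh(W_c @ concat)
    o_i = sigmoid(W_o @ concat + b_o)            # output gate
    h_i = o_i * np.tanh(c_i)
    return h_i, c_i


h, c = np.zeros(d_hid), np.zeros(d_hid)
for y_i in rng.standard_normal((5, d_in)):       # y_1..y_5 from the CNN stage
    h, c = step(h, c, y_i)
print(h.shape)                                    # (16,)
```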


Fig. 5. Execution process of the MFIN method.

This process is performed after obtaining $y_i$: the probability matrix P is obtained by counting the correlations between the various modalities as follows:

$$
P_{n \times n} =
\begin{pmatrix}
p_{y_1 y_1} & p_{y_1 y_2} & p_{y_1 y_3} & \cdots & p_{y_1 y_m} \\
p_{y_2 y_1} & p_{y_2 y_2} & p_{y_2 y_3} & \cdots & p_{y_2 y_m} \\
p_{y_3 y_1} & p_{y_3 y_2} & p_{y_3 y_3} & \cdots & p_{y_3 y_m} \\
\vdots      & \vdots      & \vdots      & \ddots & \vdots      \\
p_{y_m y_1} & p_{y_m y_2} & p_{y_m y_3} & \cdots & p_{y_m y_m}
\end{pmatrix}
\tag{2}
$$

where $p_{ij}$ represents the relevance between two modalities. Each element on the main diagonal indicates the relevance between a modality and itself and is set to zero.

After obtaining the single-modal representation of the dynamic feature data, a deep neural network must be constructed for cross-modal association learning. The processing objects of traditional convolutional neural networks are limited to Euclidean data, so they cannot process high-dimensional data for multimodal learning. A graph convolutional network can learn the feature information and structure information of nodes end-to-end at the same time, establish the logical relationships between nodes, and deal with irregular and complex data. The input of the graph convolutional network consists of two parts: the feature matrix and the adjacency matrix. The feature matrix is the matrix $H_T$, and the adjacency matrix comes from the matrix P by setting a threshold $t_g$: every element $p_{ij}$ ($i \neq j$) is set to 0 if its value is less than $t_g$ and to 1 otherwise. This threshold turns all values of the matrix P into 0 or 1, producing the adjacency matrix $P^A$, where $P^A_{ij} = 1$ indicates that nodes i and j are neighbors (a sketch of this thresholding, together with the squash function, is given at the end of this subsection). The hidden layers of the graph convolutional network can be calculated by $G_k = f_g(G_{k-1}, P^A)$, where $G_0$ is $H_T$ and $f_g(x, y)$ is a nonlinear activation function, such as the ReLU (Rectified Linear Unit) function. After several iterations, a graph $G_d$ with a clearer relationship between the modalities is obtained and stored as the learning result of the dynamic feature data and as part of the decision basis.

Although convolutional neural networks effectively represent the semantic information of a single static feature modality, such representations are difficult to keep learning and blending as input to deeper neural networks. This is because convolutional neural networks require a large number of learning samples, which makes multimodal data processing inefficient. Moreover, the output of a convolutional neural network is scalar, and much important feature information may be lost during the convergence process. To solve these problems, Hinton et al. [52] proposed the concept of the capsule network (CapsNet) and proved its excellent performance in image recognition. Its basic unit is the capsule, a set of neurons whose inputs and outputs are in vector form. Each element in the vector is a parametric representation of a physical feature, and two adjacent capsule layers are connected by a dynamic routing algorithm to select parameters effectively. More specifically, Us is divided into n capsules according to the modalities, where n represents the number of static modalities. Each capsule consists of several feature vectors [52]. This paper defines $\hat{u}_j = \{\hat{u}_{j1}, \hat{u}_{j2}, \ldots\}$ $(j = 1, 2, \ldots, n)$ as the input vectors of the j-th capsule and activates them through the squash function to obtain the output vectors of the bottom capsules $\hat{v}_j$, that is, $squash(\hat{u}_j) \rightarrow \hat{v}_j$, where the squash function compresses the length of the vectors to no more than 1 while maintaining their direction. Then $\hat{v}_j$ is transformed into $\hat{w}_k$, the capsule inputs of the next layer, based on the dynamic routing algorithm, where $\hat{w}_k$ refers to the representation of higher-level features. After several iterations, the correlation between data of different modalities is measured by the length of the vectors in the capsules; the vectors with length close to 1 are selected as the learning result of the static modalities and defined as Cs. Thus, CapsNet generates in each capsule layer a vector representation containing the direction and spatial information of instances, and it uses a dynamic routing algorithm instead of a pooling layer to avoid the loss of valid information. Although CapsNet was not designed for multimodal data fusion, the rationality of this idea can be judged by analyzing the static feature data and the characteristics of CapsNet. It extracts the feature information of each static feature data modality and stores the result in a capsule in the form of a vector to represent the features of a particular modality. For executing cross-modal association learning, each capsule sends information to a higher-level capsule. CapsNet can adaptively select relevant static features for the input samples to complete the corresponding classification/reduction tasks with better robustness. Fig. 6 shows the methods of cross-modal association learning for dynamic and static feature data.

In terms of human–robot dialogue, dialogue based on single-modal recognition has become relatively mature. However, due to the limitations of single-modal recognition and its low accuracy in some scenarios, multimodal compatible human–robot dialogue is introduced to effectively coordinate multi-modal data. In order to learn and train single-modal feature data with complex features, the MFIN introduces a random forest regression method, which has a stable structure, high operational efficiency, and low data requirements, and is not prone to over-fitting.
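The sketch below shows the two small building blocks described above: thresholding the modality-correlation matrix P into the adjacency matrix $P^A$, and the squash nonlinearity used by CapsNet [52] to keep capsule output lengths below 1. The example matrix and the threshold value $t_g = 0.5$ are assumptions for illustration; the exact form of $f_g$ and of the routing procedure is not specified here.

```python
# Adjacency construction from the modality-correlation matrix P, plus the
# CapsNet squash nonlinearity. Matrix values and t_g are illustrative only.
import numpy as np


def adjacency_from_p(P: np.ndarray, t_g: float) -> np.ndarray:
    """Binarize P: off-diagonal entries >= t_g become 1, the rest 0."""
    A = (P >= t_g).astype(float)
    np.fill_diagonal(A, 0.0)          # a modality is not its own neighbour
    return A


def squash(u: np.ndarray) -> np.ndarray:
    """Scale vector u so its length stays below 1 while keeping its direction."""
    norm_sq = float(np.dot(u, u))
    return (norm_sq / (1.0 + norm_sq)) * u / (np.sqrt(norm_sq) + 1e-9)


# Toy correlation matrix for three modalities (diagonal already zero).
P = np.array([[0.0, 0.7, 0.2],
              [0.7, 0.0, 0.6],
              [0.2, 0.6, 0.0]])
print(adjacency_from_p(P, t_g=0.5))

u = np.array([3.0, 4.0])              # capsule pre-activation, length 5
print(np.linalg.norm(squash(u)))      # ~0.96, i.e. compressed below 1
```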

5.2. Interpretable neural networks

The information of each modality in cross-modal relational learning is fused in the multimodal representation process, with a certain degree of intersection and correlation. The MFIN realizes multi-sensor fusion at different levels by utilizing the advantages of interpretable neural networks.


Fig. 6. Cross-modal association learning for dynamic and static feature data.

Table 3 Functional comparison.

Feature-relational reasoning is an uncertain reasoning process. The MFIN uses a Bayesian network to express the correlation between features based on probability; it has a strong ability to deal with uncertainty and can effectively perform relational reasoning between features. Feature representation is the basis of feature-relationship reasoning: it is the abstraction of features and forms the nodes of the Bayesian network. According to the above analysis of the graph neural network and CapsNet, both have excellent performance in feature representation. Specifically, the graph neural network performs well on graph-structured data for representing feature logic relationships, while CapsNet uses its capsule hierarchy to categorize and transfer feature information and realize a hierarchical representation of features. More formally, since $G_d$ and $C_s$ are composed of vectors, $G_d$ is converted into capsule form, defined as $C_d$. $C_s$ and $C_d$ are used simultaneously as input to the CapsNet. Each layer of capsules $C_i$ $(i = 1, 2, \ldots, l)$ in the CapsNet represents a different hierarchy of features, where l is the number of iterations of the CapsNet. Converting $C_i$ into the graph form $G_i$ as input to the graph convolutional network yields the atlas $\tilde{G}_i$, where $\tilde{G}_i$ represents the feature logic relationships at layer i. Next, samples are drawn from $\tilde{G}_i$ and trained with a Bayesian network, which yields the relationships between features expressed as a probability distribution, defined as $F_b$.

5.3. Deep reinforcement learning

A feasible fusion decision is obtained by training the multi-layer interpretable neural network. Since the output of the neural network is based on probability, there may be multiple feasible fusion decisions, from which the best solution needs to be screened out. To handle the state uncertainty caused by the motion of the robot, the MFIN adopts a deep reinforcement learning method based on the deterministic policy gradient (DDPG). It combines the policy gradient method with the Q-learning algorithm and consists of a policy network (actor) and a value network (critic) for outputting and judging actions, respectively [53]. DDPG has the advantages of high-dimensional data processing, real-time control, and fusion efficiency for robot behavior decisions. More specifically, the policy network and the value network are first initialized randomly [53]. $F_b^t$ is input into the policy network as the state parameter $s_t$ to obtain the corresponding action $a_t$ through the policy function $\mu(s_t)$, and the policy $\mu$ is executed continuously to obtain the subsequent actions. Then $Q^\mu(a_t, s_t) = F_v\{Q^\mu(a_{t+1}, s_{t+1})\}$ is used in the value network to obtain the Q value of each time point recursively. The policy $\mu$ with the maximum expected Q value is taken as the optimal solution. A minimal actor–critic sketch follows.
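To illustrate the actor–critic split described above, here is a compact PyTorch sketch of one DDPG-style update on a random batch. The state and action dimensions, the random transitions, and the omission of target networks and a replay buffer (both part of the full DDPG algorithm [53]) are simplifications made only for brevity.

```python
# Minimal actor-critic update in the spirit of DDPG; target networks and the
# replay buffer of the full algorithm are omitted to keep the sketch short.
import torch
import torch.nn as nn

state_dim, action_dim = 16, 4            # e.g. fused feature F_b and robot action

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

# A random transition batch stands in for experience collected by the robot.
s = torch.randn(32, state_dim)
a = torch.randn(32, action_dim)
r = torch.randn(32, 1)
s_next = torch.randn(32, state_dim)

# Critic step: regress Q(s, a) toward r + gamma * Q(s', mu(s')).
with torch.no_grad():
    target_q = r + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), target_q)
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor step: deterministic policy gradient, i.e. maximize Q(s, mu(s)).
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

print(float(critic_loss), float(actor_loss))
```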

6. System advantages and functional comparison

In this section, the HBMF system is compared with existing medical systems, including the emotion communication system (ECS) [54], the edge-based architecture BodyEdge [55], the cloud-enabled SaaS (Software as a Service) system BodyCloud [56], and the IoT-based human services framework BSN-Care [57]. Table 3 gives a detailed functional comparison of the five systems, where + and − respectively indicate whether or not a system has a certain function.


• Robot assistance: Compared with the other systems, the HBMF system enables robots to perform more composite medical assistance work, including healthcare services, human–robot interaction, and security guarantees.

• Data source: The fusion decision-making of the HBMF system involves multiple data sources covering users, the environment, and robots, which is more objective and comprehensive than the other systems.

• System autonomy: Compared with the other systems, the HBMF system has a more complete strategy for making decisions without human intervention, including understanding the user's intention, caring for users, crisis monitoring, and security guarantees. Robots are able to respond to emergencies autonomously.

• Robustness: Compared with the other systems, the HBMF's robustness is at a medium level. This is because it requires more complex data collection and processing, which is nevertheless necessary to achieve accurate and secure smart medical services.

7. Conclusion

In this paper, the HBMF architecture based on human–robot interaction is designed as an innovative smart medical service framework to overcome the shortcomings of traditional multi-sensor fusion methods in medical applications. The four-layer HBMF has adaptability and scalability for medical human–robot interaction scenarios. In addition, the data sources involved in smart medical services are analyzed in detail and a suitable multi-sensor fusion framework is designed. Finally, combined with the latest artificial intelligence technology, the MFIN method is proposed to guarantee the quality of medical services with human–robot interaction by improving fusion decision-making.

Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was partially supported by the Liaoning Province Higher Education Innovative Talent Support Program, the Fundamental Research Funds for the Central Universities under grant no. DUT19JC22, the Program for the Liaoning Distinguished Professor, the Program for Dalian High-level Talent Innovation Support under grant no. 2017RD11, and the Science and Technology Innovation Fund of Dalian under grant no. 2018J12GX036.

References

[1] Y. Zhang, R. Gravina, H. Lu, M. Villari, G. Fortino, PEA: Parallel electrocardiogram-based authentication for smart healthcare systems, J. Netw. Comput. Appl. 117 (2018) 10–16.
[2] G. Fortino, R. Giannantonio, R. Gravina, P. Kuryloski, R. Jafari, Enabling effective programming and flexible management of efficient body sensor network applications, IEEE Trans. Hum.-Mach. Syst. 43 (1) (2012) 115–133.
[3] R. Gravina, P. Alinia, H. Ghasemzadeh, G. Fortino, Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges, Inf. Fusion 35 (2017) 68–80.
[4] A. Knowles, J. Timmis, R. de Lemos, S. Forrest, H. McCracken, Artificial immune systems for data fusion: a novel biologically inspired approach, in: 2008 11th International Conference on Information Fusion, 2008, pp. 1–7.
[5] W. Dong, A. Pentland, Multi-sensor data fusion using the influence model, in: International Workshop on Wearable and Implantable Body Sensor Networks (BSN'06), 2006, p. 4.
[6] C. Habib, A. Makhoul, R. Darazi, C. Salim, Self-adaptive data collection and fusion for health monitoring based on body sensor networks, IEEE Trans. Ind. Inf. 12 (6) (2016) 2342–2352.
[7] L. Yan, Z. Shuai, C. Bo, Multi-sensor data fusion system based on apache storm, in: 2017 3rd IEEE International Conference on Computer and Communications (ICCC), 2017, pp. 1094–1098.
[8] J. Kuehn, S. Haddadin, An artificial robot nervous system to teach robots how to feel pain and reflexively react to potentially damaging contacts, IEEE Rob. Autom. Lett. 2 (1) (2016) 72–79.

7. Conclusion In this paper, the HBMF architecture based on human–robot interaction is designed as an innovative smart medical service framework to overcome shortcomings of traditional multi-sensor fusion methods in medical applications. The four-layer HBMF has adaptability and scalability for medical human–robot interaction scenarios. In addition, the data sources involved in smart medical services are analyzed in detail and a suitable multi-sensor fusion framework is designed. Finally, combined with the latest artificial intelligence technology, the MFIN method is proposed to guarantee the quality of medical services with human– robot interaction by improving fusion decision-making. Declaration of Competing Interests The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgment This work was partially supported by the Liaoning Province Higher Education Innovative Talent Support Program, and by Fundamental Research Funds for the Central Universities under grant no. DUT19JC22. Program for the Liaoning Distinguished Professor, Program for Dalian High-level Talent Innovation Support under grant no. 2017RD11, and the Science and Technology Innovation Fund of Dalian under grant no. 2018J12GX036. References [1] Y. Zhang, R. Gravina, H. Lu, M. Villari, G. Fortino, PEA: Parallel electrocardiogram-based authentication for smart healthcare systems, J. Netw. Comput. Appl. 117 (2018) 10–16. [2] G. Fortino, R. Giannantonio, R. Gravina, P. Kuryloski, R. Jafari, Enabling effective programming and flexible management of efficient body sensor network applications, IEEE Trans. Hum.-Mach. Syst. 43 (1) (2012) 115–133. [3] R. Gravina, P. Alinia, H. Ghasemzadeh, G. Fortino, Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges, Inf. Fusion 35 (2017) 68–80. [4] A. Knowles, J. Timmis, R. de Lemos, S. Forrest, H. McCracken, Artificial iimmune systems for data fusion: a novel biologically inspired approach, in: 2008 11th International Conference on Information Fusion, 2008, pp. 1–7. [5] W. Dong, A. Pentland, Multi-sensor data fusion using the influence model, in: International Workshop on Wearable and Implantable Body Sensor Networks (BSN’06), 2006, p. 4. [6] C. Habib, A. Makhoul, R. Darazi, C. Salim, Self-adaptive data collection and fusion for health monitoring based on body sensor networks, IEEE Trans. Ind. Inf. 12 (6) (2016) 2342–2352. [7] L. Yan, Z. Shuai, C. Bo, Multi-sensor data fusion system based on apache storm, in: 2017 3rd IEEE International Conference on Computer and Communications (ICCC), 2017, pp. 1094–1098. [8] J. Kuehn, S. Haddadin, An artificial robot nervous system to teach robots how to feel pain and reflexively react to potentially damaging contacts, IEEE Rob. Autom. Lett. 2 (1) (2016) 72–79. 25


[39] G. Shi, C. Geng, H. Liu, H. Su, Y. Jin, S. Sun, The human body characteristic parameters extraction and disease tendency prediction based on multi-sensing fusion algorithms, in: 2016 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2016, pp. 126–130.
[40] Q. Zeng, W. Chen, J. Liu, H. Wang, An improved multi-sensor fusion navigation algorithm based on the factor graph, Sensors 17 (3) (2017) 641.
[41] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Alvarez, G. Conti, Adaptive fingerprinting in multi-sensor fusion for accurate indoor tracking, IEEE Sens. J. 17 (15) (2017) 4983–4998.
[42] P. Wei, H.E. Ball, D.T. Anderson, Multi-sensor conflict measurement and information fusion, Signal Process., Sens./Inf. Fusion, Target Recognit. XXV 9842 (2016) 98420F.
[43] Z. Wang, W. Wang, B. Su, Multi-sensor image fusion algorithm based on multiresolution analysis, Int. J. Online Eng. 14 (6) (2018).
[44] M. Chen, W. Li, G. Fortino, Y. Hao, L. Hu, I. Humar, A dynamic service migration mechanism in edge cognitive computing, ACM Trans. Internet Technol. (TOIT) 19 (2) (2019) 30.
[45] K. Lin, J. Luo, L. Hu, M. Hossain, A. Ghoneim, Localization based on social big data analysis in the vehicular networks, IEEE Trans. Ind. Inf. 13 (4) (2017) 1932–1940.
[46] S. Ramírez-Gallego, A. Fernández, S. García, Big data: tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce, Inf. Fusion 42 (2018) 51–61.
[47] M. Chen, Y. Hao, C.F. Lai, D. Wu, Y. Li, K. Hwang, Opportunistic task scheduling over co-located clouds in mobile environment, IEEE Trans. Serv. Comput. 11 (3) (2016) 549–561.

[48] K. Lin, J. Song, J. Luo, W. Ji, M. Hossain, A. Ghoneim, Green video transmission in the mobile cloud networks, IEEE Trans. Circuits Syst. Video Technol. 27 (1) (2017) 159–169.
[49] M. Chen, Y. Hao, C. Lai, D. Wu, Y. Li, K. Hwang, Opportunistic task scheduling over co-located clouds in mobile environment, IEEE Trans. Serv. Comput. (2018) 549–561.
[50] K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, Adv. Neural Inf. Process. Syst. (2014) 568–576.
[51] Z. Wu, X. Wang, Y.G. Jiang, H. Ye, X. Xue, Modeling spatial-temporal clues in a hybrid deep learning framework for video classification, IEEE Trans. Multimedia (2015).
[52] S. Sabour, N. Frosst, G.E. Hinton, Dynamic routing between capsules, Adv. Neural Inf. Process. Syst. (2017) 3856–3866.
[53] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Wierstra, Continuous control with deep reinforcement learning, arXiv:1509.02971 (2015).
[54] C. Min, Z. Ping, G. Fortino, Emotion communication system, IEEE Access 5 (99) (2017) 326–337.
[55] P. Pasquale, A. Gianluca, G. Raffaele, C. Giuseppe, F. Giancarlo, L. Antonio, An edge-based architecture to support efficient applications for healthcare industry 4.0, IEEE Trans. Ind. Inf. (2018) 1–1.
[56] G. Fortino, D. Parisi, V. Pirrone, G.D. Fatta, BodyCloud: a SaaS approach for community body sensor networks, Future Gener. Comput. Syst. 35 (2014) 62–79.
[57] S.S. Kale, D.S. Bhagwat, A secured IoT based webcare healthcare controlling system using BSN, in: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018, pp. 816–821.
