An interactive and low-cost full body rehabilitation framework based on 3D immersive serious games

Journal of Biomedical Informatics 89 (2019) 81–100

Danilo Avola a, Luigi Cinque b, Gian Luca Foresti a, Marco Raoul Marini b,⁎




a Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
b Department of Computer Science, Sapienza University of Rome, Via Salaria 113, 00198 Rome, Italy

ARTICLE INFO

ABSTRACT

Keywords: Rehabilitation; Serious games; Body modeling; Immersive Virtual Reality (IVR); Deep learning; Time-of-Flight (ToF) camera

Strokes, surgeries, or degenerative diseases can impair motor abilities and balance. Long-term rehabilitation is often the only way to recover, as completely as possible, these lost skills. To be effective, this type of rehabilitation should follow three main rules. First, rehabilitation exercises should be able to keep the patient's motivation high. Second, each exercise should be customizable depending on the patient's needs. Third, the patient's performance should be evaluated objectively, i.e., by measuring the patient's movements with respect to an optimal reference model. To meet these requirements, in this paper an interactive and low-cost full body rehabilitation framework for the generation of 3D immersive serious games is proposed. The framework combines two Natural User Interfaces (NUIs), for hand and body modeling, respectively, and a Head Mounted Display (HMD) to provide the patient with an interactive and highly detailed Virtual Environment (VE) for playing stimulating rehabilitation exercises. The paper presents the overall architecture of the framework, including the environment for the generation of the pilot serious games and the main features of the used hand and body models. The effectiveness of the proposed system is shown on a group of ninety-two patients. In a first stage, a pool of seven rehabilitation therapists evaluated the results of the patients on the basis of three reference rehabilitation exercises, confirming a significant gradual recovery of the patients' skills. Moreover, the feedback received from the therapists and patients who used the system has pointed out remarkable results in terms of motivation, usability, and customization. In a second stage, by comparing the current state-of-the-art in the rehabilitation area with the proposed system, we have observed that the latter can be considered a concrete contribution in terms of versatility, immersivity, and novelty. In a final stage, by training a Gated Recurrent Unit Recurrent Neural Network (GRU-RNN) with healthy subjects (i.e., a baseline), we have also provided a reference model to objectively evaluate the degree of the patients' performance. To estimate the effectiveness of this last aspect of the proposed approach, we have used the NTU RGB+D Action Recognition dataset, obtaining results comparable with the current literature in action recognition.

1. Introduction

Motor abilities and balance control can be impaired by a wide range of adverse events, including strokes, head traumas, degenerative diseases, natural aging processes, and others. However, regardless of the specific event, the human body, at any age, tends to recover these lost skills as much as possible [1–3]. In this context, rehabilitation exercises play a key role, since they allow patients to maximize their chances of recovery. To be effective, these exercises should follow three main rules [4,5]. First, to keep the patient's motivation high, the exercises should offer stimulating, dynamic, changing, and sometimes funny interactive environments. Moreover, each exercise should be provided with competitive stimuli, such as specific goals or scoring mechanisms.

Second, a therapist should be able to easily customize the rehabilitation exercises depending on the patients' needs. In fact, during the whole rehabilitative process, a crucial aspect regards the continuous fitting of all the parameters of an exercise with respect to the patients' progress. Moreover, each patient is different from another and can require a unique set of interactions, parameters, and goals. Third, rehabilitation exercises should have mechanisms to measure the patients' performance. In fact, different therapists could have different opinions about the patients' progress, especially when the latter is barely perceivable. To address these aspects, recent years have seen the development of different vision-based systems that combine advanced Human-Computer Interaction (HCI) techniques and VEs. Initially, vision-based systems to support the different human

⁎ Corresponding author. E-mail address: [email protected] (M.R. Marini).

https://doi.org/10.1016/j.jbi.2018.11.012 Received 10 January 2018; Received in revised form 18 September 2018; Accepted 21 November 2018 Available online 03 December 2018 1532-0464/ © 2018 Elsevier Inc. All rights reserved.


activities, including rehabilitation, were quite uncomfortable and limited. In [4], for example, the authors proposed a virtual glove for hand rehabilitation. Their system was composed of a set of RGB cameras surrounding the patient's hand, whose synchronized video streams were processed to track the hand movements. The latter were associated with a numerical hand model by which different features could be computed, including geometrical parameters, joint dimensions, hand shape, and others. Although complex, the system proposed by the authors already included several modern features, such as versatility and customization. Moreover, it can be considered one of the first NUIs in rehabilitation, i.e., interfaces able to capture free hand or body movements without using physical constraints (e.g., mechanical gloves, bodysuits equipped with sensors) or other haptic interfaces [6]. The NUI concept is very important in the rehabilitation area, since more and more works highlight how these interfaces, jointly with serious games based on VEs, can improve the recovery of lost skills in a wide range of neuromotor impairments [7–10]. In addition, NUIs provide a more natural interaction with systems, thus promoting long-term rehabilitation in terms of frequency, quality, and duration.

The last two decades have seen an increasing use of NUIs, and other advanced devices, in rehabilitation [11–13]. A typical example is represented by Time-of-Flight (ToF) cameras, which produce a depth map of the observed scene. Unlike common RGB images, in a depth map each pixel represents the distance between the camera itself and the corresponding point in the scene [14,15]. Usually, these cameras are used to support body modeling and body tracking, thus enabling users to interact with VEs. Another example of NUI is represented by 3D cameras. Unlike ToF cameras, these devices are composed of two lenses to simulate human binocular vision, thus reproducing depth information. Recently, this kind of technology has been widely used to support devices able to model and track the human hand [16]. A last example is provided by HMDs, i.e., special helmets equipped with small displays in front of each eye. These devices are usually equipped with a wide range of sensors (e.g., accelerometers, gyroscopes) which allow users to see the correct part of the scene in relation to the head position [17]. The use of one, or more, of these devices, together with highly detailed and interactive VEs, can lead to an effective improvement in body and hand rehabilitation [18–20]. Despite this, current solutions are still severely limited. In particular, further investigation is needed to make available rehabilitation systems able to provide a more interactive and immersive experience. In addition, further efforts should be made to provide these systems with reference models to support the automatic analysis of the patients' performance.

To address the different challenges of current motor rehabilitation, in this paper an interactive and low-cost full body rehabilitation framework for the generation of 3D immersive serious games is described. The main aim of this work is to present original solutions in terms of higher motivation, engaging interaction, exercise customization, and rehabilitative results. Moreover, by training a GRU-RNN with healthy subjects (i.e., a baseline), a reference model to objectively evaluate the degree of the patients' performance is also provided [21].
The presented framework uses a ToF camera to acquire and model the body's movements [22,23], an IR stereo camera to acquire and model the hand's gestures [16,24], and an HMD to immerse patients inside the VEs [25]. Two different human models are implemented: a stickman (i.e., a skeleton model) to process the body's movements, and a bubbleman (i.e., a volumetric model composed of spheres) to manage the interaction with the VEs. Notice that only the body model is processed by the GRU-RNN since, currently, the framework is focused on motor abilities and balance. We are currently working to enrich the framework with functionalities and serious games designed for hand rehabilitation. The framework proposed in this paper inherits and extends some of the main features described in [26–28]. The first two works show two previous rehabilitation systems, while the third presents a system for the recognition of sign language and semaphoric hand gestures.

In conclusion, the advantages of the proposed framework, and the differences with respect to the three works just reported, can be summarized as follows:

• With respect to the work described in [26], the framework proposed in this paper provides more engaging serious games, significant improvements in hand and body modeling, a more advanced editor for the customization of the rehabilitation exercises, and a more complete interactive experience. Moreover, in [26], the serious games were non-immersive, the patients' data were not collected, and no deep learning algorithm was used to process and compare patients' data with a previously trained reference model aimed at automatically providing an evaluation of their performance;
• The work shown in [27] presents, among others, three main weaknesses with respect to the framework we propose. First, a less advanced editor for the customization and rendering of the serious games. Second, a less advanced body modeling in which, moreover, the whole model could not be placed inside the VEs. Third, the use of a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to perform the patients' evaluation process [28]. Notice that, as reported in [29], LSTM-RNNs and GRU-RNNs can have similar performance, but their behavior depends on the data used. In our experience, the input provided by the stickman model is better suited to GRU-RNNs. Moreover, due to their simplified internal structure and one fewer gate, GRU-RNNs are faster during the learning stage. Finally, they better accommodate the data augmentation strategy sometimes required for the quick development of serious games based on long sequences of body gestures [30];
• From the work reported in [28], the framework we propose inherits the basic implementation of the hand model in relation to the reference space. In the near future, we will also integrate the hand gesture classification strategy to support complex free-hand rehabilitation exercises.

The effectiveness of the proposed system is shown on a group of ninety-two patients. On the basis of three pilot serious games, a pool of seven rehabilitation therapists has confirmed the benefits of the proposed method. Moreover, the comparisons of the presented framework both with the current related systems and with the current state-of-the-art in action recognition (by using the NTU RGB+D Action Recognition dataset [71]) have highlighted its versatility, efficacy, and novelty. In addition, the feedback received from the therapists and patients about the use of the system has underlined its usefulness in terms of motivation, usability, and customization. Finally, the comparison between the therapists' evaluations and the trained reference models, for each rehabilitative exercise, has confirmed how a deep learning approach can be a reliable support during the rehabilitation processes.

The paper is structured as follows. Section 2 discusses the state-of-the-art about immersive and non-immersive interactive VEs in rehabilitation. Moreover, it points out the role of deep learning in this kind of systems. Section 3 presents the overall design of the framework, including the logical pipeline, the body models, and the GRU-RNN architecture. Section 4 shows the test environment and the developed serious games. Section 5 reports the evaluation of the framework and its comparison with both related works and the current literature in action recognition. Section 6 concludes the paper and provides future improvements. Finally, Appendix A outlines implementation details, including the physical architecture, the adopted devices, and the VR platform.

2. Related work

In this section, a possible classification of the different HCIs applied in Virtual Reality (VR) is reported. In general, an HCI can be distinguished from another by means of different features, including input type, sensor type, naturalness degree, and others.


Thus, a way to identify HCIs is to classify them depending on the type of interaction between human and machine. In this context, these interfaces can be divided into touch-oriented and touchless-oriented. The first class consists of all those interfaces, usually equipped with servomotors, sensors, markers, or other components, which must come into contact with users (touched or worn) to activate an interaction process [32,33]. In rehabilitation, typical examples of this class are the haptic gloves. In these interfaces, sensors and mechanical units placed on a wearable glove are used to reconstruct a numerical model of the hand and to enable its interaction with VEs designed for hand rehabilitation. In the last twenty years, the study of this type of gloves has been a hot topic that has produced a wide range of remarkable tools, including the Rutgers Master II-ND glove [34], the CyberGrasp glove [35], and many others [36–38]. In addition to the hands, touch-oriented interfaces, e.g., armbands, body suits, and stabilometric platforms, are currently studied for the rehabilitation of different body skills. In [39], for example, the Myo armband [40], supported by an augmented reality game, is used to perform home-based neurorehabilitation for children with cerebral palsy. In [41], instead, a stabilometric platform and visual feedback are used for the rehabilitation of patients affected by brain injuries. Touch-oriented interfaces undoubtedly have several advantages. For example, they have a high spatial accuracy, since positions are acquired by the physical sensors placed on the interfaces. Moreover, for the same reasons, they do not suffer from occlusions or misinterpretation of the poses [42]. Despite this, these interfaces have several drawbacks that make them unsuitable for various rehabilitation aims. For example, wearable interfaces designed for hand or body rehabilitation are hardly customizable and often have a high cost. In addition, touch-oriented interfaces tend to limit the naturalness and spontaneity of movements, which are key factors for an effective motor rehabilitation [43,44]. For these reasons, touchless-oriented interfaces, i.e., vision-based systems, are becoming increasingly important in the rehabilitation area, especially in those tasks regarding motor abilities and balance control.

Recent years have seen a wide diffusion of low-cost devices designed to support a large number of vision-based applications. Among these devices, the MS Kinect (V1 or V2) [23], the Leap Motion Controller (LMC) [24], and the Oculus Rift (CV1 or DK2) [25] can be considered the most popular, thanks to their compromise between accuracy and adaptability. Even in motor rehabilitation, the use of one, or more, of these devices (or similar ones) is becoming a standard practice for the development of advanced hand or body rehabilitation frameworks. In [45,46], and [47], for example, three different full body rehabilitation systems based on the MS Kinect are reported. In the first, a serious games framework designed to study and support hand and leg rehabilitation for post-stroke patients is proposed. In the second, instead, a framework for the objective measurement of functional performance in patients affected by multiple sclerosis is presented. Finally, in the last, a motion rehabilitation and evaluation framework designed to enable patients to conduct rehabilitation training by themselves is described. In [48], a VR-based serious game for hand rehabilitation that uses an LMC is reported. Other similar works are discussed in the very recent literature [49,50].
Undoubtedly, touchless devices have had to face different problems in HCI [51,52], including body recognition, body modeling, and body tracking. Regarding HMDs, or similar technologies, some works are exploring the use of Immersive Virtual Reality (IVR) for rehabilitation purposes [27,53,54]. These studies follow the success of several VR-based systems in the rehabilitation area [55–57]. However, further efforts must still be made to obtain versatile and effective IVR-based systems able to support hospital and home rehabilitation, at low cost, with a high level of accuracy, reliability, and customization.

Regarding Machine Learning (ML) techniques, as reported in [58], they were initially used in healthcare to support intelligent data analysis for medical diagnosis. More advanced techniques, i.e., Deep Learning (DL) approaches, are nowadays considered a new trend for data examination also in other healthcare areas, including medical image analysis [59] and neuroscience [60].

Despite this, DL approaches in motor rehabilitation are still very limited. Indeed, in the very recent literature, only the work proposed in [27] has a performance evaluation algorithm (i.e., an LSTM-RNN) similar to the one we propose (i.e., a GRU-RNN). This means that the use of DL algorithms to study patients' movements performed during serious games is an approach that still requires further investigation. In conclusion, the framework described in this paper aims to provide a further contribution to the current state-of-the-art in terms of effectiveness, customization, automatic evaluation, and versatility. Moreover, the proposed system is low-cost and does not require dedicated devices or expensively equipped rooms [61].

3. Framework design

The proposed framework allows therapists to create customized exercises for rehabilitation. As previously reported, the system reported in this paper inherits and extends some of the main features described in [26–28]. In particular, compared to the first work, all the basic functionalities have been redesigned, extended, and improved, including serious games and customization, hand and body modeling, immersivity of the VEs, and data collection. Moreover, a DL strategy has been introduced to automatically evaluate patients' performance. With respect to the second work, different significant improvements have been made, including third-person interactive modalities, body modeling customization, and serious games rendering. Moreover, GRU-RNNs are used, instead of LSTM-RNNs, to speed up the training stage and the data augmentation strategies. Finally, from the last work, the framework we propose inherits the basic implementation of the hand model in relation to the reference space, thus enabling, in the present version, serious games based on hand touch and, in a future version, complex free hand gesture rehabilitation exercises.

In the current version of the framework, specific features are supported by an interpretive system, i.e., without any programming skill, therapists can easily insert data about exercises and patients, thus allowing the framework to generate the source code for the creation of customized VEs. As shown in Fig. 1, the proposed system is composed of three different subsystems: user, game development, and server. The first subsystem consists of front-end functions and interfaces. It allows therapists to create customized exercises and allows patients to perform them in VEs. In addition, the subsystem allows therapists to design actions and validation rules for the rehabilitation exercises. The second subsystem manages the creation of the VEs. Serious games are described by the eXtensible Markup Language (XML) [62] and stored in different files. In particular, a file contains information about the VE data, the objects inserted in the VE, and the associated rules (a minimal, hypothetical sketch is shown below). Notice that, in this stage, the subsystem manages, at the same time, two different functionalities: exercise building and VR setting. Regarding the exercise evaluation, a scoring method and a DL approach, based on a previously trained GRU-RNN, are deployed. In the first case, therapists are supported by some tools to verify the poses and movements of the patients; in the second case, the framework autonomously provides a rank of the patients' performance. The last subsystem uses the network to share and store data remotely. The main aim of this submodule is to provide, when possible, a support for home rehabilitation by which therapists can check patients' progress also at a distance.
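As a concrete illustration of such an XML description, the following is a minimal sketch parsed with Python's standard ElementTree module; the element and attribute names are invented for illustration, since the paper does not specify the actual schema:

import xml.etree.ElementTree as ET

# Hypothetical serious-game description: VE data, objects, and rules.
# All tag and attribute names below are illustrative only.
GAME_XML = """
<seriousGame name="plank_walk">
  <environment terrain="mountain" plankLength="2.0" plankWidth="0.6"/>
  <object type="plank" spawn="0,0,0" interactive="false"/>
  <rules>
    <endCondition type="distanceCovered" value="2.0"/>
    <endCondition type="maxMistakes" value="3"/>
  </rules>
</seriousGame>
"""

def load_game(xml_text):
    """Parse a serious-game description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "environment": dict(root.find("environment").attrib),
        "objects": [dict(o.attrib) for o in root.findall("object")],
        "end_conditions": [dict(c.attrib)
                           for c in root.find("rules").findall("endCondition")],
    }

print(load_game(GAME_XML))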
Finally, a main feature of the framework is represented by the VE management, in which the user's Point-Of-View (POV) corresponds to a binocular virtual camera controlled by the head's movements. Concerning the user interaction, in Fig. 2 the framework configuration is shown. The patient, equipped with an HMD combined with an IR stereo camera, is placed in front of the ToF camera. The interaction is entirely touchless: the whole input for the framework is represented by the body movements of the patient. The therapist can manage, in real time, the main parameters of the exercise (e.g., game speed) and, at the same time, can observe the patient's progress.


Fig. 1. Overall design of the framework, in which three main subsystems can be identified: User, Game Development, and Server. The first deals with the interaction between a user (i.e., a patient) and a loaded serious game. In particular, in this subsystem, the therapist can tune some game parameters (e.g., object features, speeds, difficulty levels). Moreover, the interaction process and the GRU-RNN evaluation can be monitored in real time. Finally, also the evolution of the rehabilitation performance can be observed over time. The second deals with the development of the rehabilitative exercises by the therapists. In this submodule, a complete platform (see Appendix A) is made available for the prototyping of the customized games. Finally, the last submodule deals with both the management of the storage units (i.e., the serious games and patients' evaluation repositories) and the management of the monitoring unit of the patients (to make the framework available both locally and remotely).

3.1. Logical pipeline of the framework

Fig. 3 shows the high-level pipeline of how the framework works in the runtime stage. Each module has been designed for a specific task, presented below:

• ToF camera: this module (connected to a ToF device) acquires a continuous stream of depth maps from the rehabilitation environment. These maps are processed, in real time, to distinguish, at each time instant, the background (i.e., static objects and structures in the environment) from the foreground (i.e., the patient's body). The foreground is further processed to compute the skeleton and bubbleman data, by which to interact with the VE and measure the patient's movements.
• IR stereo camera: this module (connected to an IR stereo device) acquires a continuous stream of grayscale stereo images of the near-infrared light spectrum of the patient's hands. This information is processed to compute the tracking of the hands' model, including finger data.
• HMD: this module (connected to an HMD device) receives the input streams from the other modules of the framework. It deals with a wide range of main tasks, including visualization, management of the coordinates of the user's head position (by combining accelerometer, gyroscope, magnetometer, and IR marker data), data fusion (i.e., virtual data and patient's movements), and VR features management (i.e., VE and serious game parameters).
• Movement tracking: the main aim of this module is to manage the data related to the patient's movements (i.e., body and hands). Moreover, it processes and coordinates this information to allow its integration within the VE (in terms of 3D spatial coordinates).
• Exercise development: it corresponds to the preliminary setup phase. The therapist deals with the development of the VE and, when each aspect is designed, each element of the scenario can be managed by an editor. In the runtime phase, the system creates the VE of the exercise within a 3D working space.
• Environment interaction: this module manages the interaction between the human model in the 3D environment and the real user's movements. In particular, it:
  – Associates the camera movements and orientation with the user's head coordinates;
  – Associates the user's 3D model movements and orientations with the user's body movements;
  – Associates the hand and finger model movements and orientations with the user's ones.
  The module also manages the events occurring inside a scene. It calculates the behavior that the system has to assume for each triggered event associated with the interactive objects. The most common operations are: checking the end conditions of an event; checking the interaction conditions of an event; checking the 3D virtual coordinates of the patient with respect to the real body.
• Evaluation (GRU): this module, supported by a trained GRU-RNN, evaluates, in real time, the patient's performance.
• Exercise storage: this module deals with the "Serious Games DB" (see Fig. 1). It manages the storing and retrieval processes related to the rehabilitative exercises developed by the therapists. The same exercise can be stored with one or more customized parameters.
• 3D rendering: this module builds the graphical environment. In particular, it manages shapes, textures, shadows, and the HMD 3D effect computation. The latter operation is computationally complex due to the multi-view rendering: there are two cameras, one per eye, with different POVs and, for each of them, the repetition of the rendering process (for each frame) is required.

The logical pipeline just reported has been suitably designed to be abstracted from any specific technology. In fact, within the schema, we only require that a generic device is able to produce a specific input (e.g., a depth map), regardless of how this input is generated. This allows the framework to be updated while maintaining its overall design.
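To make the modular data flow concrete, the following Python sketch mirrors the runtime loop implied by the pipeline; every class and method name here is a hypothetical placeholder for the corresponding module, not the framework's actual API:

# Hypothetical runtime loop mirroring the logical pipeline (Fig. 3).
# ToF, IR stereo, HMD, tracker, etc. are illustrative placeholders.

class RuntimeLoop:
    def __init__(self, tof, ir_stereo, hmd, tracker, interaction,
                 renderer, evaluator):
        self.tof, self.ir_stereo, self.hmd = tof, ir_stereo, hmd
        self.tracker = tracker            # movement tracking module
        self.interaction = interaction    # environment interaction module
        self.renderer = renderer          # 3D rendering module
        self.evaluator = evaluator        # trained GRU-RNN wrapper

    def step(self):
        depth_map = self.tof.acquire()            # body: depth stream
        ir_frames = self.ir_stereo.acquire()      # hands: IR stereo stream
        head_pose = self.hmd.head_pose()          # head: HMD sensors
        # Foreground/background separation, stickman and bubbleman update.
        body, hands = self.tracker.update(depth_map, ir_frames, head_pose)
        # Collision checks, event triggers, end-condition checks.
        events = self.interaction.update(body, hands)
        # Real-time similarity score against the healthy reference model.
        score = self.evaluator.evaluate(body)
        # Two per-eye renders are sent back to the HMD for display.
        self.hmd.display(self.renderer.render(body, hands, events), score)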


Fig. 2. Framework configuration. During the setting of the system, different main requirements must be respected to obtain the measurements, related to the rehabilitative exercises, with a high level of accuracy. The distances and the angle reported in the figure refer to a set of specific devices (see Appendix A). In general, they depend on several factors, including the saturation and the optics of the ToF and RGB sensors, respectively. The human shape corresponds to the patient, whose interaction with the serious game is entirely touchless. The therapist (not shown in the figure) can manage the main parameters of the exercise (e.g., object features, speeds, difficulty levels) in real time, while the patient is playing.

Fig. 3. Logical pipeline of the framework. The input of the framework is acquired by the following modules: ToF camera, IR stereo camera, and HMD. The data is conveyed to a separate module (movement tracking), whose aim is to process and track it. The computed information is conveyed to another module (environment interaction), whose aim is to coordinate and merge the whole data. To support the entire elaboration, the following modules are required: exercise development, exercise storage, 3D rendering, and evaluation. The latter represents a core of the framework, since it contains the trained GRU-RNN, by which to evaluate the patient's performance. Finally, the output, in the form of visualization and interaction, is conveyed to the HMD module.


Fig. 4. Body modeling: (a) original RGB image, (b) stickman, (c) bubbleman. The models (b) and (c) are built by using the depth map stream acquired with the ToF camera. The depth information contained within the maps facilitates the distinction between background and foreground (i.e., the patient), thus enabling the construction of the models.


3.2. Hand and body modeling

The modeling of the body is supported by the ToF camera. The device acquires the depth maps of its Field of View (FoV), which are subsequently processed to generate the models of the detected persons. The proposed framework provides two kinds of body model: a stickman (i.e., a skeleton) and a bubbleman. The first follows the approach described in [91], while the second is a modification of the algorithm proposed in [1] (see below). In Fig. 4, an original subject (a), the linked stickman model (b), and the linked bubbleman model (c) are shown, respectively.

The stickman model is used to represent the patient's body and consists of twenty-five joints [23]. On each joint, a set of features is computed, including speed, angle, and others. The features are made available to the therapists, who can use them as support to evaluate the patients' improvements. In addition, the collected features are also used as input of the GRU-RNN to automatically evaluate the patients' performance (see Section 3.3). In other words, the stickman model must be considered the reference model to check, manually or automatically, the patients' status. Differently, the bubbleman model consists of one thousand spheres that entirely cover the patient's body. This model is only used to identify the collisions between a subject and an object inside the VE. In other words, the bubbleman must be considered a volumetric avatar, whose aim is to highlight the movements and interactions of the patient with the VE and with the objects contained in it. Notice that, thanks to several factors, including the adopted devices, the accuracy in the implementation of the framework, and the optimization of the imported modeling algorithms, the interaction of both models with the VEs occurs in real time. To provide a complete overview of the mentioned models, some details follow below:

• The stickman model is composed of:
  – Twenty-five joints for the body provided by the ToF camera, of which nineteen internal joints and six external joints (i.e., extremities), plus twenty-six joints for the hand provided by the IR stereo camera. Working on pivot joints, i.e., joints in common between the hand and body models, a merged model has been obtained, thus enabling a more advanced coordinate tracking of the body's movements and the hand's articulation.
  – Twenty sticks, i.e., arcs that connect the joints corresponding to the body's parts, and nineteen sticks that connect the joints corresponding to the hand's parts.
• The bubbleman model is composed of:
  – A maximum of one thousand centroid objects, composed of one thousand 3D coordinates, which represent the centroids of an unordered cloud of spheres that composes the body's model.
  – A maximum of one hundred centroid objects, composed of one hundred 3D coordinates, which represent the centroids of a set of unordered spheres that composes the hand's model.

Details regarding the stickman model can be found in [23] and [26], where the basic notions about the generation of a body's skeleton are reported. The bubbleman model, instead, represents an evolution of the modeling algorithm described in [1]. In the latter, the authors propose an efficient sampling method to distribute uniform spherical samples on a sub-domain of a 3D space based on a minimum distance criterion between samples. According to the work reported in [69], the modifications introduced in the algorithm can be summarized as follows:

• Inside each sub-cube identified by the original algorithm, a set of different layers is computed. The set is established according to the following parameters: the minimum and maximum distance of the pixels from the camera, and the radius of the spheres.
• The set of spheres is arranged inside each layer by computing the minimum non-overlapped distance between a sphere and its neighbors.

Instead of using only one sensor, the positional information is derived by combining data coming from the ToF camera, the IR stereo camera, and the HMD positional tracking, thus computing the user's position as follows:

Pos_i^{Head} = \sqrt{(PosX_{i_{ToF}}^{Head} - PosX_{i_{HMD}}^{Head})^2 + (PosY_{i_{ToF}}^{Head} - PosY_{i_{HMD}}^{Head})^2 + (PosZ_{i_{ToF}}^{Head} - PosZ_{i_{HMD}}^{Head})^2}

Pos_i^{Body} = (PosX_{i_{ToF}}^{Body}, PosY_{i_{ToF}}^{Body}, PosZ_{i_{ToF}}^{Body})

Pos_i^{Hand} = (PosX_{i_{IRsc}}^{Hand}, PosY_{i_{IRsc}}^{Hand}, PosZ_{i_{IRsc}}^{Hand})    (1)


where, for each joint i in each body part (i.e., head, body, hand), the X, Y, and Z positions, indicated with Pos, are chosen among the ToF camera, the IR stereo camera, and the HMD positional tracking. The result is the position of the head coordinates calculated according to a Euclidean distance in the 3D space. This means that the head position of the human model is weighted by the position computed by the ToF camera and by the HMD positional tracking. To avoid mismatches between these two devices, we accurately calibrated the anchor of the camera on the model. A similar approach is shown in [27]; however, unlike that work, the method we present can customize the volumetric dimensions of the 3D human model.
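As a minimal illustration of Eq. (1), the following Python sketch computes the three quantities from per-joint coordinate arrays. Note that, as written, the head term of Eq. (1) is a per-joint Euclidean distance between the ToF and HMD estimates (used to weigh and align the two devices); the sketch reproduces the equation literally, under our own naming assumptions, and leaves out the calibration step described above:

import numpy as np

def fuse_positions(head_tof, head_hmd, body_tof, hand_irsc):
    """Eq. (1): all inputs are (n_joints, 3) arrays of X, Y, Z values."""
    # Head: per-joint Euclidean distance between ToF and HMD estimates.
    head = np.sqrt(((head_tof - head_hmd) ** 2).sum(axis=1))
    body = body_tof     # body joints taken from the ToF camera
    hand = hand_irsc    # hand joints taken from the IR stereo camera
    return head, body, hand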

3.3. VE creation for serious games

Usually, rehabilitation consists of incremental exercises that, over time, can be customized. The proposed framework, as previously reported, allows therapists to modify existing serious games or to create new ones. The development of the VEs is based on two main aspects: exercise type and customization. The system makes different sets of prebuilds and presets available to the therapists, thus maintaining a high level of customization freedom. At the same time, several limits are imposed to avoid mistakes and to make the building process easy; the regulation of polling spawns and the use of non-negative integers for speed and weight are some examples of these limitations. The system can manage the VEs through three functions: 3D space creation, interactive object deployment, and environment customization values. The first deals with the terrain and the surrounding scenario. This function is linked to the exercise customization and also to the visual impact for the user. In addition, with this function it is possible to insert customized parameters for each interactive element of the scene. Environmental modifications can also be automatically managed by the engine that drives the VE, e.g., if the therapist wants to create a very long walk for a specific exercise, the system can create a proportional area to host it. The interactive object deployment is designed to be extremely customizable. The framework allows all types of simple geometric forms, plus some prebuilt objects (e.g., conic obstacles), to be added to the environment. In addition, the framework allows the behavior of each of them to be scheduled, including spawn position, dimensions, performed movements, speed, and events when collisions occur. The customization phase also allows defining the starting point of the user's model and setting some other values before saving the exercise. These parameters can be summarized as follows:

• End conditions: points to collect, number of mistakes, time elapsed, and others;
• User proportions: dimensions of the user inside the entire environment;
• Back-end monitoring functions: recording session, viewing bubbleman or stickman, enabling remote monitoring, and others.

During the building process, two suites (see Appendix A), for the management of the 3D environment and the customization of the parameters, respectively, are shown to the therapist to support the manipulation of the objects in the scene. These editors are populated with some presets to expand the customization possibilities.

3.4. Methods for the evaluation of the patient's performance

As previously reported, in this paper two types of evaluation approaches are used. The first consists of a simple scoring method, while the second is based on a GRU-RNN. In the scoring method, the patient collects a set of "points" (as in a video game) during the execution of the serious game. When an action is correctly performed, or a mistake occurs, the score value increases or decreases, respectively. Finally, a summary of the values defines the result obtained by the patient at the end of an entire rehabilitative session. More details about the calculation of the score values are provided in Section 4. This method is considered effective in the following cases:

• The therapist can assist the rehabilitation and can evaluate the exercise execution alongside the machine;
• The rehabilitation does not require an accurate judgement of the neuromotor movements that the patient performs;
• The required computational power is low.

To increase the effectiveness of the proposed system, we have introduced a DL-based component that evaluates actions and poses. This comparison method has been inspired by the work reported in [27]. Although outside the scope of the present paper, the DL component has also been used to evaluate the effectiveness of the proposed system on the NTU RGB+D Action Recognition dataset [71], obtaining results comparable with the current literature in action recognition (see Section 5). The method is based on a GRU-RNN [70] and a set of discriminative features. The GRU-RNN has been chosen since a rehabilitation exercise can be seen as a complex body gesture. GRU-RNNs have performance similar to LSTM-RNNs, i.e., the networks used in [27], but their behavior depends on the data used. In our experience, the input provided by the stickman model is better suited to GRU-RNNs. Moreover, due to their simplified internal structure and one fewer gate (Fig. 5), GRU-RNNs are faster during the learning stage. Finally, GRU-RNNs better accommodate the data augmentation strategy [92], sometimes required for the quick development of serious games based on long sequences of body gestures [30]. Regarding the features, they have been inherited from the work reported in [27], i.e., positions of the knees, average speed values of the knees, and others. Moreover, other features (not included in [27]) have been implemented:

• Positions and angles of all internal joints;
• Positions and accelerations of all external joints;
• Distances between related pairs of joints (e.g., knees).

It is very important to observe that these features can be considered "general features", which are selected and customized according to the specific serious game. The data involved in the evaluation phase are related to the exercise that the patient is executing. In fact, for each exercise, specific data from body parts are obtained and trained. The therapist can also populate the dataset for custom exercises, but in that case the results are not guaranteed, since a high number of instances is needed to obtain reliable results.
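To make the feature set concrete, the following is a minimal sketch, under our own naming and indexing assumptions, of how such general features (inter-joint distances and joint angles) could be computed from a single 25-joint stickman frame:

import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def frame_features(joints, pairs, triples):
    """joints: (25, 3) stickman frame; pairs/triples: index tuples
    selecting the distances and angles relevant to a given exercise."""
    dists = [np.linalg.norm(joints[i] - joints[j]) for i, j in pairs]
    angles = [joint_angle(joints[a], joints[b], joints[c])
              for a, b, c in triples]
    return np.array(dists + angles)

# Example: knee-to-knee distance and the ankle-knee-hip angle, using
# hypothetical joint indices (the actual indexing depends on the SDK).
KNEE_L, KNEE_R, ANKLE_L, HIP_L = 13, 17, 14, 12
features = frame_features(np.random.rand(25, 3),
                          pairs=[(KNEE_L, KNEE_R)],
                          triples=[(ANKLE_L, KNEE_L, HIP_L)])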

Fig. 5. GRU scheme. In the reported architecture, r and u are the reset and update gates. The first determines how to combine the new input with the previous memory, and the second defines how much of the previous memory to keep. A and Ā are the activation and the candidate activation.


Fig. 6. Deep learning architecture. From left to right: a set of features (e.g., positions and angles of internal joints) is extracted from the skeleton model of the patient. A complete gesture (i.e., the whole execution of an exercise) can be considered as a sequence of instances (from 1 to n is the sequence length). The sequence is processed by the network to provide the patient's status (i.e., output). Notice that, in our context, the deep learning architecture is not used to perform a classification task, but to quantify how much an exercise has been correctly performed in comparison to an optimal reference model (i.e., the model created by the healthy subjects).

Even if not a focus of the present paper, some technical considerations on the implemented GRU-RNN architecture are needed. First of all, for each serious game, the network is trained by using data augmentation strategies to face the possible lack of training data during the definition of a novel exercise [30,92]. In our context, the augmentation strategies are extremely simplified; in fact, they only regard the generation of range values starting from the real values acquired during the training stage. Notice that the task we propose is not a classification task but a quantification task. This means that overfitting phenomena can be significantly limited, since the aim of the proposed network is only to calculate the degree of correctness on the basis of the same class. The dataset is composed of actions and poses performed by healthy persons with no mobility defects. The results are provided as a percentage value that denotes the health status of the patient, i.e., the distance between the performance of the patient and that of the healthy persons. In particular, the prediction function, at the softmax layer [72], provides a similarity score for each exercise. The score is the percentage of the status with respect to the healthy persons. Therefore, a score of 100% means that the patient performs as a healthy subject, while a score of 80% means that the patient has lost a certain percentage of motor ability (i.e., 20%) with respect to the specific serious game. In Fig. 6, the entire evaluation process is shown. The sensors acquire information from the patient's body, while the features are extracted from the skeleton and provided to the GRU-RNN. Notice that the DL architecture proposed in this paper is a 2-stacked GRU-RNN. This choice derives from preliminary empirical tests in which we observed that such a number of layers can be considered a suitable compromise between accuracy and training time. The normalized output is used to provide the percentage of similarity related to the patient's status.
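As an illustration of this architecture, a minimal PyTorch sketch follows; the hidden size, the two-way softmax whose "healthy" probability is read as the similarity percentage, and the jitter-based stand-in for the range-value augmentation are our own illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class ExerciseEvaluator(nn.Module):
    """2-stacked GRU over per-frame stickman features; the softmax
    output is read as the similarity w.r.t. the healthy baseline."""
    def __init__(self, n_features, hidden_size=128):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)   # healthy vs. deviant

    def forward(self, x):
        # x: (batch, seq_len, n_features); e.g. the thirty equidistant
        # instances per interval used for the exercises of Section 4.
        _, h = self.gru(x)              # h: (num_layers, batch, hidden)
        probs = torch.softmax(self.head(h[-1]), dim=-1)
        return probs[:, 0]              # P(healthy): 0.83 -> "83%"

def augment(sequences, jitter=0.02, copies=4):
    """Simplified range-value augmentation: new sequences are sampled
    within a small range around the recorded healthy executions."""
    noisy = [s + jitter * torch.randn_like(s)
             for s in sequences for _ in range(copies)]
    return sequences + noisy

model = ExerciseEvaluator(n_features=40)
score = model(torch.randn(1, 30, 40))   # e.g. tensor([0.83])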

4. Test environment and serious game definition

In this section, simple scenarios, pilot serious games, and test environments are presented. Each proposed rehabilitation exercise has been recommended by expert therapists. Considering the customization possibilities, a set of three different typologies of psycho-motor exercises has been selected to evaluate the effectiveness rate. These exercises were selected according to the main disease to treat: stroke.

4.1. Rehabilitation exercises

The therapist can see the bubbleman during the exercises (Fig. 7). The therapist can also see the stickman; anyway, as previously reported, the latter model is mainly used to compute the features.

Fig. 7. Body modeling: (a) original RGB image, (b) bubbleman of the patient in a virtual environment, (c) bubbleman of the patient in a virtual environment with an interactive object.

4.1.1. Plank walk

The first exercise is designed to test the patients' equilibrium. In this scenario, the user has to walk on a wooden plank between two peaks without "falling down". It is not necessary to cover the entire route to complete the exercise. In our specific case, the length of the plank was around two meters. The IR stereo camera is not required: only the ToF camera and the positional sensors of the HMD are used. Each step of the user is reproduced as a proportional shift of the virtual model in the scene. During the execution of the exercise, the therapist can monitor the entire body of the patient. It is important to notice that, in this specific case, there is also a dizziness effect caused by the height: the user can feel himself 20 m above the terrain on a 60 cm wide plank. This fact increases both the motivational factor and the self-control in a completely unusual situation, thus improving the focus on the body's movements. According to the personalization characteristics of the proposed system, the therapist can customize the distance to cover, the thickness of the plank, and the distance between the plank and the ground. However, we used the same measures for every execution during the tests. The score has been calculated as follows:

S_{e1} = \frac{\frac{1}{t} + d}{r} - midE    (2)

where t is the time in seconds needed to cover a distance d in meters, midE is the middle value of equilibrium of the patient during the entire exercise, and r is the number of repetitions of the exercise. The value midE is the middle value of the barycenter movement, i.e., the sum of the distances between the barycenter starting value and its position at each frame. This "decreasing factor" and the starting value are proportionally calculated according to the following equation:

SV = d + et

decF = \frac{\left((BC_x - cBC_x)^2 + (BC_y - cBC_y)^2 + (BC_z - cBC_z)^2\right)^{1/2}}{SV}, \quad \text{only if } cBC \text{ shift} > threshold

midE = \sum decF    (3)

where SV is the starting value, d is the distance, et is the expected time to complete the task, decF is the decreasing factor calculated at each step (frame), BC is the barycenter 3D point according to the "optimal position" provided by the therapist, cBC is the current barycenter position, and threshold is the shift tolerance provided by the therapist. The features selected for the DL module are divided into intervals of thirty equidistant instances (as for the other two exercises) and can be summarized as follows:

• Barycenter position obtained by the stickman;
• Distances between the spine and the feet, knees, hips, shoulders, and shoulder center joints (in 3D space), obtained by the stickman;
• Angles between ankles, knees, and hips, as well as angles between knees, hips, and spine, obtained by the stickman.

The barycenter position is obtained by calculating the shift of the shoulder_center, spine, and hip_center joints on the X and Z axes from the neutral position. In Fig. 8, the example of the hip_center calculation in three different positions (a), (b), and (c) is shown. The projection of the hip should be in the middle between the knees in the rest position and on the support knee when a leg is raised. The other joints are calculated as well. A match between steps is needed to compare the walks; therefore, some gait principles are applied [73]. Using Dynamic Time Warping (DTW), the relative maximums (indicating the steps) are associated and, consequently, the data sequences are created. In Fig. 9, the egocentric and side views are shown.

Fig. 8. Hip_center calculation: (a) projections of the hip_center joint on the X and Z axes at rest position (rest – projection of the hip_center in the middle between the ankles), (b) a step executed during exercise 1 (projection of the hip_center on the support ankle), (c) a step executed during exercise 2 in the optimal case (projection of the hip_center on the support ankle). As shown, it is in the middle of the knees for the first pose and over (or strictly adjacent to) the support knee during a step in cases (b) and (c). Accordingly, this data can contribute to identifying a patient's wrong pose.

Fig. 9. Screenshot of the Plank Walk exercise from both egocentric and side views.
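As an illustration of Eqs. (2) and (3), a minimal Python sketch of the plank-walk scoring follows; the function reflects our reading of the formulas above (barycenter trajectories as numpy arrays) and is not the framework's actual code:

import numpy as np

def plank_walk_score(t, d, r, bc_opt, bc_frames, et, threshold):
    """Eqs. (2)-(3): S_e1 = ((1/t + d) / r) - midE.

    t: elapsed time (s); d: covered distance (m); r: repetitions;
    bc_opt: optimal barycenter 3D point set by the therapist;
    bc_frames: (n_frames, 3) current barycenter per frame;
    et: expected completion time; threshold: tolerated shift.
    """
    sv = d + et                                   # starting value
    shifts = np.linalg.norm(bc_frames - bc_opt, axis=1)
    dec_f = shifts[shifts > threshold] / sv       # decreasing factors
    mid_e = dec_f.sum()                           # Eq. (3)
    return (1.0 / t + d) / r - mid_e              # Eq. (2)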

4.1.2. Single stance

This exercise is created to test the patients' equilibrium and reflexes. It consists of a potentially infinite straight path with obstacles that the user has to avoid. The patient can raise the right leg at the right time instant to perform an obstacle jump. The system calculates whether the bubbleman mesh collides with the obstacle; if it happens, a mistake counter is increased. The therapist can monitor the patient's movements. He can also set some parameters that customize the environment and the entire exercise, such as the speed of the run, the number of obstacles, the distance between them, whether they have to appear on only one side of the route, their height, and the number of allowed mistakes. Thanks to the VR features, the distance between the obstacles and the POV of the user is easier to define than with a non-VR vision system. The score is calculated as follows:

S_{e2} = \frac{(d - m)}{r}    (4)

where t is the time needed to cover a distance d, m is the number of mistakes made by the patient in each run, and r is the number of repetitions of the exercise. The features selected for the DL module can be summarized as follows:

• Stickman: barycenter position;
• Stickman: distances between all joints and the spine joint (in 3D space).

The barycenter is calculated according to the method used for the first exercise. Since they depend on the distances between each joint and the spine joint, the features are parametric, and it does not matter where the patient is in the scene. In Fig. 10, the egocentric and side views before (a) and after (b) the obstacle jump action are shown.

4.1.3. Hands and fingers interaction

To test the upper limbs, we have created an exercise for hands and fingers. Both are captured by the IR stereo camera and reproduced by a 3D model inside the VE. The user must complete two different tasks: in the first, the user has to deflect balls thrown at him/her from different directions and at different speeds; the second consists in touching a sphere in front of him/her with the right fingers. The first task is focused on testing reaction speed, the second on testing the precision of well-known neuro-motor movements. In the first task, the user can deflect the arriving balls with the entire hand (and part of the arm) without any movement restriction. In these tasks, the system can identify mistakes without bubbleman mesh collision. In the first task, if there is no deflection action (the user misses the coming ball), a mistake counter is increased. In the second task, the fault is calculated over time: if the user does not touch the ball with the right fingertip within the time limit, a mistake counter is increased. For each task there are various customizations: in the first, it is possible to set the speed limit of the balls, the wideness of the angle from which the balls are launched, the number of balls to deflect, the number of maximum allowed mistakes, and the balls' colors. In the second, the therapist can set the ball dimension, the hand and finger sequence, the time limit, and the number of allowed mistakes. The score can be calculated as follows:

S_{e31} = \frac{1}{hp} - m    (5)

S_{e32} = \frac{(t - e)}{r}    (6)

where S_{e31} and S_{e32} are the scores for the first and the second task of the exercise, hp is the distance between the center of the hand and the center of the sphere minus the radius of the sphere when the collision occurs, t is the time spent by the patient to complete the exercise, e is the number of errors detected (wrong finger), and r is the number of repetitions. The features selected for the deep learning technique are the following:

• IR stereo camera: movement smoothness for each finger.

Temporal and accuracy scores cannot be easily calculated due to a too wide range of variables. Therefore, according to the therapists, the most suitable function is to calculate the smoothness of the movements of the patient's fingers during the exercise execution. In particular, according to [74,75], the DLJ and LDLJ formulas are the only valid measurements for calculating movement smoothness:

DLJ = -\frac{(t_2 - t_1)^5}{v_{peak}^2} \int_{t_1}^{t_2} \left| \frac{d^2 v(t)}{dt^2} \right|^2 dt    (7)

LDLJ = -\ln |DLJ|    (8)

where t is the time, t_1 and t_2 are the start and end times of the analyzed movement, and v_{peak} = \max_{t \in [t_1, t_2]} v(t) is the peak of speed. In Fig. 11 (a) and (b), the egocentric and side views of the fingers exercise are shown. The final score of each exercise is calculated according to a percentage value, where 100% is the perfect score (no errors) and 0 points or negative values are equivalent to 0%.
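As an illustration of Eqs. (7) and (8), a minimal Python sketch of a discrete DLJ/LDLJ computation over a sampled speed profile follows; the discretization (finite differences over a uniformly sampled v(t)) is our own illustrative choice:

import numpy as np

def dlj_ldlj(v, dt):
    """Discrete DLJ/LDLJ (Eqs. (7)-(8)) for a movement whose speed
    profile v is uniformly sampled every dt seconds."""
    duration = dt * (len(v) - 1)            # t2 - t1
    accel = np.gradient(v, dt)              # dv/dt
    jerk = np.gradient(accel, dt)           # d^2 v / dt^2
    integral = np.trapz(jerk ** 2, dx=dt)   # integral of |d^2v/dt^2|^2
    dlj = -(duration ** 5 / v.max() ** 2) * integral
    return dlj, -np.log(abs(dlj))

# Example: a smooth bell-shaped speed profile sampled at 100 Hz.
t = np.linspace(0, 1, 101)
v = np.sin(np.pi * t)
print(dlj_ldlj(v, dt=0.01))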


Fig. 10. Screenshot of the Single Stance exercise from both egocentric and side views, before (a) and after (b) the obstacle jump.

• Customization level: level of customization for environments and serious games; • Usefulness of functionalities: how much appropriate the im-

used. We also measured the height and the distance from the spine base of the elbow of a user in rest position. Then, we made the same measurements after asking the user to lift the stretched arm in other position. We registered the first and the second couple of values. While the user was performing these movements, the system captured data from the body and proportionally moved the 3D human model. We computed the same distances, in meters, in the VE. Finally, we compared the respective results. We obtained a difference of 0.08 m in the first position and 0.02 m in the second position. The same operation was repeated for the leg, considering the right ankle and the spine base. We obtained similar results: 0.03 m for the first and 0.02 m for the second. According to the judgements of the therapists, these calibration values were considered totally acceptable. To evaluate the effectiveness and the usability of the system, we also performed tests with two categories of users: a group of therapists and a group of patients. The first was composed by 7 expert physiotherapists with essential computer science knowledge and skill. They were trained from a developer on how to use the system and customize the exercises. After this phase, they created a new exercise and modified two existing ones. Their experience was reported following a graded scale of personal opinions (Table 1). In Table 1, the labels of the columns represent the following meaning:

• • •

plemented functions are to support the development and monitoring of the rehabilitation exercises; Scoring method accuracy: the degree of precision of the scoring method; Deep learning accuracy: the degree of precision of the GRU-RNN based evaluation; General opinion: overall judgement on the entire framework.

The collected results can be considered very significant. We observed that, at least initially, some therapists tended to prefer the old rehabilitation methods. In fact, during the first look, they seemed intimidated by the use of the framework. However, after a few minutes they began to become more and more familiar with the proposed framework. They also underlined the high accuracy of both evaluation methods (i.e., scoring and GRU-RNN). Summarizing, they have highly appreciated the scoring method since it can be considered as a tool for the real-time monitoring. Some perplexities about the total autonomy of the GRU-RNN (even if they have confirmed its high accuracy). We used a similar approach to evaluate the performance with patients. In a pool of > 120 available neuro-motorial disease affected persons, 92 of them were selected (with consent of the ethics review board) to participate as test users. The average age was 40 years. Only



Fig. 11. Screenshot of the finger Interaction exercise from both egocentric and side view, while the user is touching the target with the left thumb (a) and the left index (b).

Only one of them was very young (28 years) and one very old (78 years). Half of them were female and half male. We grouped them in two ways: by severity of the disease and by age. The severity was evaluated by the 7 experts before proceeding with the experiments. Each therapist adapted one of the developed exercises to the rehabilitation needs of one of his/her patients. The judgement parameters for patients can be summarized as follows:

• Motivational factor: how much interest the patient has in doing the exercises;
• Comfort level: level of comfort felt during run-time. It includes hardware and software: the first concerns the comfort in wearing the HMD, the second the use of the immersive VR;
• Movement correlation precision: each movement of the patient is captured and reproduced inside the virtual environment; this parameter denotes how accurate and reactive the system is in correlating real movements with virtual ones;
• General opinion: general judgement of the entire framework.

The values reported in Table 2 show an overall appreciation of the framework. Notice that Table 2 shows personal judgments based on relevant needs; the results are rounded values for each group. The motivational factor is one of the most important features of our system. The low score in comfort level is due to rather constrictive equipment (i.e., the HMD worn over the face and connected with cables to the terminal) and a few latency problems. Another problem is related to the computational load caused by fast movements of the user, principally linked to the combination of fast POV changes and hardware limits. In Graph 1, instead, the rehabilitation progress of each group over one month is reported. We can highlight the general linearity of each curve. In fact, the framework is able to truly involve the patient: the motivational factor denotes an important improvement in enticing the patient to maintain consistency in exercising.

Table 1. Collected results from the therapists' opinions. Each parameter has been rated on a graded scale from 0 to 10, where a higher value corresponds to "better".

Therapist | Ease of use | Customization level | Usefulness of functionalities | Scoring method accuracy | Deep learning accuracy | General opinion
Therapist 1 | 8 | 8 | 6 | 9 | 8 | 8
Therapist 2 | 6 | 7 | 7 | 10 | 7 | 7
Therapist 3 | 5 | 7 | 6 | 9 | 7 | 7
Therapist 4 | 6 | 9 | 6 | 10 | 7 | 8
Therapist 5 | 7 | 6 | 8 | 10 | 8 | 8
Therapist 6 | 7 | 8 | 7 | 8 | 9 | 9
Therapist 7 | 10 | 7 | 8 | 10 | 9 | 9


Table 2. Collected results from the patients' opinions, grouped by disease severity. Each parameter has been rated on a graded scale from 0 to 10, where a higher value corresponds to "better". These results are the rounded median values for each group.

Group | Motivational factor | Comfort level | Precision of movement correlation | General opinion
Group 1: light disease | 9 | 6 | 7 | 8
Group 2: middle-light disease | 7 | 6 | 9 | 8
Group 3: middle disease | 9 | 7 | 8 | 9
Group 4: middle-high disease | 10 | 6 | 9 | 9
Group 5: high disease | 9 | 7 | 8 | 8
Group 6: critical disease | 9 | 8 | 8 | 8


Table 3. Collected results from the patients' opinions, grouped by age. Each parameter has been rated on a graded scale from 0 to 10, where a higher value corresponds to "better". These results are the rounded median values for each group.

Age | Motivational factor | Comfort level | Precision of movement correlation | General opinion
< 40 | 9 | 6 | 8 | 9
40–55 | 10 | 7 | 9 | 8
56–65 | 9 | 6 | 9 | 9
> 65 | 9 | 7 | 8 | 9

The same analysis was performed on the age grouping (Table 3 and Graph 2). The entire case study is supported by a further test regarding the usability of the framework: the System Usability Scale (SUS) [76]. The result is quite satisfying, with a score of 87, which is above average.

5.1. Deep learning module evaluation

Since the scoring method is not comparable with the current literature, only the deep learning method is examined here. We populated the dataset with sequences of 24 healthy subjects and obtained 50 instances of each exercise, i.e., 1200 gestures for each class. To obtain more accurate results with respect to a single net, a 2-stacked GRU architecture was deployed. During the training, we obtained the best trade-off between time and accuracy by using 4 layers and 600 epochs for each net. This configuration was selected after testing the system with different numbers of layers; Graphs 3 and 4 show the comparison between the accuracies obtained while training the system with different numbers of epochs at layers 5 and 6. As reported, the best result is obtained with 4 layers and, thanks to the high efficiency of GRU cells and the low number of features, the time needed for training with 4 layers can be considered acceptable: for each net, the system employed about 1 day to train the model. Concerning the other network parameters, a learning rate of 0.001 and a batch size of 5 were the best values for providing optimal results with the considered data sequences. As mentioned before, we set this deep learning module to provide results on a percentage scale, using the distance from the perfect execution. We asked the therapists to do the same, watching the exercise execution of each patient and the related score (according to the proposed scoring method), on a rounded scale. In Table 4 the average values of the therapists and of the system are shown, where the average is calculated over each exercise and each patient, grouped according to severity (see Table 2). The subjective results collected in Tables 1–3 and Graphs 1 and 2 underline the efficacy of the system on young subjects, where the rehabilitation curve grows faster than in older subjects.
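For concreteness, the sketch below shows a stacked-GRU classifier with the hyper-parameters reported above (4 GRU layers, learning rate 0.001, batch size 5, 600 epochs). It is an illustrative Python/Keras reconstruction, not the authors' TensorFlowSharp code; the hidden size, feature dimension, and number of classes are placeholders.

```python
import tensorflow as tf

NUM_FEATURES = 75  # placeholder: e.g., 25 skeleton joints x 3 coordinates
NUM_CLASSES = 3    # placeholder: one class per reference exercise

def build_gru_net(num_layers=4, hidden_units=128):
    """Stacked-GRU classifier over variable-length skeleton sequences."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(None, NUM_FEATURES)))
    for i in range(num_layers):
        # All but the last GRU layer return full sequences so layers can be stacked.
        model.add(tf.keras.layers.GRU(hidden_units,
                                      return_sequences=(i < num_layers - 1)))
    model.add(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_gru_net()
# model.fit(x_train, y_train, batch_size=5, epochs=600)
```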


Graph 1. Rehabilitation progress collected over 4 weeks on all patients, grouped by disease severity. Groups range from light disease (Group 1) to critical disease (Group 6). The y-axis reports the general percentage of health condition (median for each group), as scored by a therapist. As shown, the rehabilitation progress of each group can be considered linear, but it is slightly faster in lighter disease cases. 76 patients were involved in this study.


Graph 2. Rehabilitation progress collected over 4 weeks on all patients, grouped by age. Groups range from young (< 40) to old (> 65). The y-axis reports the general percentage of health condition (median for each group), as scored by a therapist. As shown, the rehabilitation progress of younger people is faster, due to the combination of stronger bodies and the higher motivational factor provided by the system. 76 patients were involved in this study.


Concerning the improvements of the patients, for each characteristic, such as reaction time, speed, strength, endurance, range of movement, and coordination, the growth curves are proportional to the ones shown in Graph 1. It is also possible to compare hardware components; however, the literature has already identified Microsoft Kinect V2, the depth + RGB camera we used, as one of the best low-cost devices among ToF cameras [5,77,78] and a better device than structured-light ones [79,80]. Concerning the results shown in Table 4, the system is considered quite accurate according to the therapists' opinion. In fact, the differences between the system's and the therapists' scores are between 0 and 10 percentage points, and the therapists underline that this gap is negligible for evaluating the rehabilitation progress of a patient. The therapists mainly appreciated the system for the following characteristics:

1. The full customization of the environment, the exercises, and the scoring method;
2. The possibility to choose between a volumetric and a skeletal model of the patient;
3. The possibility to ask for support from the deep learning module in analyzing the health status of the patient;
4. The possibility to remotely manage almost everything.

They consider the proposed system one of the most accurate and complete tools for rehabilitation monitoring. Finally, we introduced another comparison, related to the curve of the results collected during the rehabilitation period. We selected four patients with similar neuro-motorial diseases and severity, according to the therapists. Their mean age was around 45 years; two were female and two male. As shown in Graph 5, patients 1 and 2 were assisted by our system for four weeks during their rehabilitation exercises, and their improvement curves grew faster than those of the other two patients. In particular, we measured an average of 13% more recovered motor function in comparison with the patients who did not use the proposed system.

A clarification is necessary on the statistical significance of the reported results. We know that the sample size could be considered limited, especially if compared with other computer vision areas (e.g., event recognition). Anyway, different aspects must be taken into account. First of all, in the rehabilitation field it is very difficult to manage very large datasets, for different reasons including the homogeneity of the sample, the number of available patients compliant with the experimental protocol, and others. In addition, the key works referenced in the current state-of-the-art involve numbers of patients entirely comparable with the one we propose: in rehabilitation, even significant results on a limited set of patients can be considered an appreciable starting point for the development of very innovative methods. Another aspect to consider is that, in our approach, the training stage is performed by using a data augmentation strategy, which means that the produced optimal models can be considered trustworthy, as shown by the reported performance. Finally, to address this issue, the proposed method is also compared with a set of works known in the current literature (see below), thus demonstrating its effectiveness and accuracy.
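The augmentation operations are not detailed in the text; typical choices for skeleton sequences are spatial scaling, positional jitter, and temporal resampling, as in the following hypothetical sketch (all parameter values are assumptions of ours):

```python
import numpy as np

def augment_sequence(seq, rng):
    """seq: (frames, joints, 3) skeleton sequence; returns a perturbed copy."""
    out = seq * rng.uniform(0.95, 1.05)                # random global scaling
    out = out + rng.normal(0.0, 0.01, size=out.shape)  # small positional jitter (meters)
    # Temporal resampling: randomly stretch/compress the length by up to 10%.
    t_new = np.linspace(0, len(seq) - 1, int(len(seq) * rng.uniform(0.9, 1.1)))
    idx = np.clip(np.round(t_new).astype(int), 0, len(seq) - 1)
    return out[idx]

rng = np.random.default_rng(0)
seq = np.zeros((120, 25, 3))            # e.g., 120 frames of 25 Kinect joints
print(augment_sequence(seq, rng).shape)
```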

Table 4. Comparison between the therapists' average scores and the system's scores over all exercises, for each group of patients. The groups are divided according to disease severity, from light (Group 1) to critical (Group 6). Each cell reports: therapist estimation \ system estimation.

Therapist | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6
Therapist 1 | 90\94 | 90\86 | 85\88 | 75\68 | 50\40 | 15\21
Therapist 2 | 95\94 | 85\86 | 85\88 | 70\68 | 50\40 | 25\21
Therapist 3 | 95\94 | 95\86 | 85\88 | 70\68 | 40\40 | 25\21
Therapist 4 | 90\94 | 90\86 | 80\88 | 60\68 | 45\40 | 20\21
Therapist 5 | 90\94 | 90\86 | 85\88 | 65\68 | 40\40 | 20\21
Therapist 6 | 85\94 | 80\86 | 80\88 | 70\68 | 45\40 | 30\21
Therapist 7 | 100\94 | 80\86 | 80\88 | 70\68 | 50\40 | 30\21

5.2. Comparisons with related systems

It is possible to compare the proposed system with other frameworks developed for the same goal. Unlike other computer vision application areas, in which it is easy to compare algorithms and methods, in our case there are many factors that should be considered, since different approaches can be used to achieve the same result. Therefore, a comparison between similar full or partial body rehabilitation systems based on NUIs is proposed. In particular, we selected 8 systems [13,27,45,46,81–84], each one powered by specific sensors and algorithms.

Graph 3–4. GRU accuracy based on the number of layers involved. Graph 3 (left) shows the accuracy when the number of epochs is fixed. Graph 4 (right), instead, shows the accuracy when the epochs are increased for layers 5 and 6.


Graph 5. Rehabilitation progress collected over 4 weeks on 4 patients. Patients 1 and 2 were assisted by our system. Each one suffers from a neuro-motorial disease of the right leg. The vertical axis indicates the health status level of the leg according to a therapist's personal scale, from 0 (completely immobilized) to 1 (perfectly working). Over time, the assisted patients show better improvements, i.e., they are recovering faster than the others. This study was conducted with 72 patients.


In Table 5, comparable parameters between the systems are shown. Each of them presents some limits. As previously mentioned, the systems with touchless technology are less precise than the others, but more hygienic and versatile. In terms of comparison, the lower quality of data can derive from numerous factors, such as a more complex type of information, a slower translation of signals, a lower data rate, and a potentially higher error rate. On the contrary, the ease of use is maximized by the fact that the user can simply interact with the system without any control device in his/her hand or on his/her body.


This simplifies the interaction at the expense of computation. As shown in Table 5, four works able to capture the entire body were selected. Despite being more versatile than the ones dedicated to single body parts, they are more limited than the proposed system. We also have to consider that identifying an entire body is harder than identifying its single parts, and the margin of error is higher too. The proposed fusion of data from the ToF and the IR stereo camera strongly increases the precision of hand tracking despite the use of a single ToF device.

Table 5. Comparison between the proposed system's features and those of related systems.

System | Realtime system | Exercise personalization | Full body detection | Sensor type | Touchless | Calibration | Immersion
Proposed System | Yes | Yes | Yes | Kinect v2 | Yes | Not needed | Full
Avola et al. [27] | Yes | Yes | Yes | Kinect | Yes | Not needed | No
Shiratuddin et al. [81] | Yes | No | No | Kinect (hands only) | Yes | Not needed | No
Saini et al. [45] | Yes | No | Yes | Kinect | Yes | Not needed | No
Sosa et al. [46] | Yes | No | Yes | Kinect | Yes | Needed | No
García-Martínez et al. [82] | Yes | No | No | 3D tracking pose estimation system + custom tactical controller | No | Not needed | No
Meng et al. [83] | Yes | No | No | Myo Armband | No | Not needed | Partial
Vonach et al. [13] | Yes | No | Yes | Wearable | No | Not needed | Full
Robitaille et al. [84] | Yes | No | No | Wearable and vision | Partial | Not needed | Full


We obtained a full body tracking system with results comparable to those of systems tracking single body portions. Referring again to Table 5, we can note that the proposed system, unlike all the others, is able to provide VR environments, and therefore immersion for the user, without using full-body wearable sensors. In the literature there are numerous works [85] that use VR to rehabilitate patients, but each of them presents at least one of the limits treated in this discussion. The main categories have already been explored through the examined works [13,27,45,46,81–84]. Another comparison is necessary due to the introduction of a deep learning algorithm. The most suitable direct competitor of the proposed framework is [27], given its high similarity. According to the presented results, the average number of percentage points over all the exercises is around 3.1 in both cases, which means that the accuracy is approximately maintained. However, by developing an LSTM equivalent to the one in [27] and comparing the efficiency with the same hardware on the same exercises, we obtained a reduction of 3% in training time. When comparing the method with systems designed for different aims, we obtained the results collected in Table 6, which is retrieved from [86]. The latter presents a system for recognizing actions performed by humans and captured by RGB-D cameras; however, that method is focused on the raw depth data, so we performed our comparisons against the skeleton-based methods. Accuracy tests were performed on the NTU RGB + D dataset [71] and, for a fair comparison, the same evaluation protocol used in [71] was applied. It is important to underline that the values of Percentage of Correct Parts (PCP) and Percent of Detected Joints (PDJ) are related to the skeleton recognition algorithm and the ToF sensor, meaning that this evaluation concerns pose estimation accuracy only. Graph 6 shows the details of the results obtained on the NTU RGB + D dataset. We can observe that none of the values is under 50% of accuracy. The actions that require a static pose are the most difficult to recognize, due to both occlusions and the proximity of the skeleton's joints. Nevertheless, also these results show the accuracy of the proposed approach. Notice that the architecture of the GRU-RNN has been designed and analyzed to obtain good results according to the length of the gestures. The overall accuracy on each class can be considered satisfactory.
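The per-class values plotted in Graph 6 can be obtained with a computation of this kind (an illustrative sketch of ours, assuming arrays of ground-truth and predicted class IDs):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly recognized samples for each action class."""
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(per_class_accuracy(y_true, y_pred))  # {0: 0.5, 1: 1.0, 2: 0.5}
```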

Table 6. Accuracy comparison of the proposed and other skeleton-based methods on the NTU RGB + D dataset.

Method | Best performance
Lie Group [87] | 52.76%
HBRNN [88] | 63.97%
2 Layer RNN [71] | 64.09%
2 Layer LSTM [71] | 67.29%
Part-aware LSTM [71] | 70.27%
ST-LSTM [89] | 76.10%
ST-LSTM + Trust Gate [89] | 77.70%
JTM [90] | 75.20%
Proposed method | 78.64%

6. Conclusions and future developments

The proposed work consists in the development of a complete and innovative rehabilitation framework. It can manage touchless interaction and exploits VR to improve the user experience. At the same time, it is designed to simplify the therapist's work, allowing a high degree of customization. It also stores the patients' information during their rehabilitation process.

Graph 6. Accuracy for each action on the NTU RGB + D. The label of each action has been replaced with the associated ID.


The therapists can monitor the progress in real-time. The system requires a ToF camera, an IR stereo camera, and an HMD with positional sensors. Tests with users show that the system is robust, quite easy to use, and useful for its goal. However, some limits are still present: when a therapist wants to develop an exercise, the operation is not yet immediate and needs a training phase; concerning the patient, part of the equipment is still wearable and can sometimes produce a slight sickness feeling due to latency problems. Tests with real therapists provided good results in terms of interaction and effectiveness.

Improvements of the system are planned. For example, the dimension of the bubbleman model spheres could be reduced until they reach the dimension of a single pixel. This operation could improve the user's 3D model and provide better feedback during run-time. In addition, it could also be possible to extend the customization parameters, thus allowing therapists to import interactive objects or complex VEs. Concerning concrete applications, the hardware cost and the usability can promote the introduction of the proposed framework in hospitals as well as health facilities. According to the therapists' judgements, the proposed system can be considered innovative and accurate enough to offer a new paradigm for future rehabilitation treatments. Forthcoming HMDs will be lighter and easier to wear, thus providing better solutions to the current hardware limits. Moreover, faster machines will reduce latency and improve the overall performance for generating more accurate models. Finally, the new frontiers of unsupervised deep learning techniques will be able to provide unexpected results also in this fascinating and useful application field.

Conflict of interest

We have no conflict of interest to declare.


Appendix A

The hardware configuration used for the development environment is composed of an Intel i7 5930k CPU, 16 GB of 2400 MHz DDR4 RAM, an NVIDIA GTX 970 4 GB video card, and an OCZ RevoDrive 3 X2 240 GB drive. The machine configuration is principally designed for VR features, since the Oculus Rift DK2 requires high computational performance and data rate for a smooth experience. However, the overall computational power is higher than the minimum suggested for reasonable working conditions according to the combined requirements of the Oculus, the LMC, and the Kinect (working together at the same time). The OS environment is Windows 10 Professional 64-bit Creators Update, equipped with Unity3D version 2017.2. The deep learning library used for the testing phase is TensorFlowSharp, linked with Unity3D. It is one of the most suitable libraries for managing deep learning algorithms within current VEs: it shares the programming language (C#) with Unity3D and includes the features of the classical version of TensorFlow, such as GRU cells.

1. Physical architecture

In Fig. 1A, the physical architecture of the framework is shown. It is composed of two layers: (a) Platform, (b) Presentation & Application. Details about the Presentation & Application layer are reported below; in particular, considerations about hardware and Software Development Kit (SDK) configuration, input acquisition, hand and body modeling, VE building, serious game rules, and rehabilitation evaluation criteria are reported. In Fig. 2A, the system's class diagram is shown. Its main parts are the interfaces, the deep learning module, the core engine, and the input manager. The first is related to the GUI and the interactive functions; the deep learning module works with the linked library to provide its related functions, i.e., the train and predict operations; the core engine is the exercise manager and the general helper for the entire system; finally, the input manager directly communicates with the different SDKs of the devices.

Fig. 1A. Physical architecture of the framework.

Fig. 2A. Class diagram of the framework.


1.1. Oculus Rift DK2

As previously mentioned, an HMD equipped with head tracking sensors is one of the best choices for providing a feeling of immersion. In this sense, the Oculus Rift DK2 is a complete device. It is composed of a multi-sensor positional system, a High-Definition (HD) display, a pair of biconvex lenses, and an external case with strap and sponge. The device provides an enjoyable 3D effect through the combination of display and lenses: the screen is divided into a left part and a right part, and the images from the left and right cameras inside the environment are shown in the respective sections; the lenses then converge the images into a 3D vision effect. The positional multi-sensor is composed of a combination of an accelerometer, a gyroscope, and a magnetometer, plus an IR sensor. It provides a very fast and accurate identification of the head's movements, and the conic IR capture area is large enough to avoid losing track of the patient's movements during the rehabilitation exercises. The device is also relatively cheap, considering its tracking precision and display resolution. The latency between the head's movements and the visual response on the display is one of the most important characteristics of the device: if the latency is too high (over 20 ms), e.g., when a complex environment rendering occurs, the patient could suffer from nausea or similar effects. For this reason, the scenarios are optimized to avoid high visual depth, high-quality shadows, or detailed surfaces, and we also used high-performance hardware to further reduce this motion sickness effect. Finally, the Oculus Rift DK2 SDK is completely supported and integrated inside the Unity3D platform. Concerning freedom of movement, this HMD is still wired and needs to be framed into the FoV of the IR sensor, which means that the user must stay inside a specific area. Usually, the cable does not create any problem while a user is performing an exercise; in fact, the wire can become an obstacle only when a user turns around 360 degrees, and serious games that require this kind of action are unusual. A user with neck injuries and limited head movements can enjoy the HMD effect too, because the 3D vision alone is sufficient for a quite immersive experience. Notice that not all subjects are suitable for the use of this device, since it can cause motion sickness in users with visual impairments.

1.2. MS Kinect V2

For tracking the movements of the entire body, the system uses the Microsoft Kinect (MS Kinect) V2 device, one of the most popular NUIs for human body detection. Since the first version, this device has been aimed at supporting the skeleton reconstruction of the human body [63] by using depth maps [51]. The first version of the device, i.e., MS Kinect V1, used IR cameras and structured light technology for the 3D reconstruction of shapes. The current version uses ToF cameras, thus providing an improvement in surface recognition. ToF imaging is the process of measuring the depth of a scene by quantifying the changes that an emitted light signal encounters when it bounces back from objects; given the measured round-trip time Δt of the signal and the speed of light c, the depth is d = c·Δt/2.

1.3. Leap motion controller

Some specific neuromotor serious games for hands and fingers require high precision during the motion recognition phase. MS Kinect V2, in this context, has different limits: distance from the device, occlusions, and the small size of the focused objects (i.e., fingers) are factors that decrease its accuracy. The Leap Motion Controller (LMC) can help to avoid some of these problems. This device identifies the patients' hands and tracks their movements within a ∼150-degree field of view. It also allows reproducing a 3D virtual model of the hands in real-time, thus improving the immersion effect. In the proposed framework, all the data structures and the features of the hand model have been inherited from the work described in [28]. The LMC is equipped with two IR cameras that identify the shapes of the hands and their positions in 3D space. The device can be placed frontally on an HMD, thus enabling the interaction process between the hand model and the visor. Notice that, in this unusual placement, the reference plane of the LMC is translated; this has to be managed while running the rehabilitation exercises and during the interaction processes.
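As an illustration of this management step, the sketch below maps a point from the LMC reference frame to world coordinates through the HMD pose and a fixed mounting offset (the offset value, names, and API are our assumptions, not the framework's code):

```python
import numpy as np

def lmc_to_world(p_lmc, hmd_rotation, hmd_position,
                 mount_offset=np.array([0.0, 0.0, 0.08])):
    """Map a point from the LMC frame to world coordinates.

    p_lmc: 3D point in the controller frame (meters).
    hmd_rotation: 3x3 rotation matrix of the headset in world coordinates.
    hmd_position: headset position in world coordinates.
    mount_offset: fixed translation from the HMD origin to the LMC; here a
    hypothetical 8 cm forward offset for a visor-mounted controller.
    """
    return hmd_position + hmd_rotation @ (mount_offset + np.asarray(p_lmc))

# Example: identity headset pose, fingertip 20 cm in front of the controller.
p = lmc_to_world(np.array([0.0, 0.0, 0.20]), np.eye(3), np.zeros(3))
print(p)  # [0.   0.   0.28]
```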
1.4. Unity 3D platform

Unity 3D (and the linked 3D rendering engine) is one of the most popular platforms for the creation of advanced VEs. It is compatible with several devices and third-party software (plug-ins and libraries), including the devices used for the proposed framework and the related SDKs. The platform, moreover, supports the NVIDIA PhysX 3™ [64] and Box2D physics engines for the reproduction of physical effects. In addition, Unity3D has a highly optimized graphic pipeline for both DirectX [65] and OpenGL [66]. The features of this platform fully support the customization of the VEs designed for the rehabilitation exercises. Notice that, due to its versatility, the platform has been used in a wide range of application areas in which high performance was required [67,68]. Fig. 3A shows the Graphical User Interface (GUI), in Unity 3D, used to support the therapists during the definition of the serious games. In particular, the screenshot represents the management environment used by the therapists for building the rehabilitation exercises. The environment has been obtained by starting from the factory back-end panel of Unity 3D and adding several prefabs, GUI modifications, and extensions specifically designed to support the development of the exercises. By using the environment, the therapists can modify an existing exercise or create a new one. With regard to the interaction rules, each object has a set of features and properties that can be defined, modified, and deleted. For managing the general settings of the games, a specific extension of the factory back-end panel has been developed, thus providing therapists with a simple and effective interface to set the patients' requirements, the VE parameters, the serious game features, and more.
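To give a flavor of the information such an interface manipulates, the hypothetical structure below groups typical settings of a serious game (field names and values are ours; the actual framework manages them inside Unity3D/C#):

```python
from dataclasses import dataclass, field

@dataclass
class ExerciseConfig:
    """Hypothetical bundle of the settings a therapist can tune for a serious game."""
    name: str
    target_joints: list = field(default_factory=lambda: ["right_elbow", "right_wrist"])
    repetitions: int = 10
    difficulty: float = 0.5          # 0 = easiest, 1 = hardest
    scene: str = "garden"            # name of the VE to load
    scoring_weights: dict = field(default_factory=lambda: {"speed": 0.4, "accuracy": 0.6})

cfg = ExerciseConfig(name="Single Stance", repetitions=15, difficulty=0.3)
print(cfg)
```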



Fig. 3A. GUI, in Unity 3D, for the development of serious games. It contains setting commands and components to support the therapists during the definition of VEs, rehabilitation exercises, and interaction rules.

References

[1] G. Placidi, D. Avola, M. Ferrari, D. Iacoviello, A. Petracca, V. Quaresima, M. Spezialetti, A low-cost real time virtual system for postural stability assessment at home, Comput. Methods Programs Biomed. 117 (2) (2014) 322–333.
[2] T. Hachaj, M.R. Ogiela, The adaptation of GDL motion recognition system to sport and rehabilitation techniques analysis, J. Med. Syst. 40 (6) (2016) 137–146.
[3] S. Liang, K.-S. Choi, J. Qin, W.-M. Pang, Q. Wang, P.-A. Heng, Improving the discrimination of hand motor imagery via virtual reality based visual guidance, Comput. Methods Programs Biomed. 132 (2016) 63–74.
[4] G. Placidi, D. Avola, D. Iacoviello, L. Cinque, Overall design and implementation of the virtual glove, Comput. Biol. Med. 43 (11) (2013) 1927–1940.
[5] A.E.F. Da Gama, T.M. Chaves, L.S. Figueiredo, A. Baltar, M. Meng, N. Navab, V. Teichrieb, P. Fallavollita, MirrARbilitation: a clinically-related gesture recognition interactive tool for an AR rehabilitation system, Comput. Methods Programs Biomed. 123 (2016) 105–114.
[6] B. Wodlinger, J.E. Downey, E.C. Tyler-Kabara, A.B. Schwartz, M.L. Boninger, J.L. Collinger, Ten-dimensional anthropomorphic arm control in a human brain-machine interface: difficulties, solutions, and limitations, J. Neural Eng. 12 (1) (2014) 1–17.
[7] P. Jorissen, M. Wijnants, M. Lamotte, Dynamic interactions in physically realistic collaborative virtual environments, IEEE Trans. Visual Comput. Graphics 11 (6) (2005) 649–660.
[8] M.C.R. Harrington, Empirical evidence of priming, transfer, reinforcement, and learning in the real and virtual trillium trails, IEEE Trans. Learn. Technol. 4 (2) (2011) 175–186.
[9] J.M.I. Zannatha, A.J.M. Tamayo, Á.D.G. Sánchez, J.E.L. Delgado, L.E.R. Cheu, W.A.S. Arévalo, Development of a system based on 3D vision, interactive virtual environments, ergonometric signals and a humanoid for stroke rehabilitation, Comput. Methods Programs Biomed. 112 (2013) 239–249.
[10] K.E. Laver, S. George, S. Thomas, J.E. Deutsch, M. Crotty, Virtual reality for stroke rehabilitation, Cochrane Database Syst. Rev. 9 (2011) 1465–1858.
[11] D. González-Ortega, F.J. Díaz-Pernas, M. Martínez-Zarzuela, M. Antón-Rodríguez, A Kinect-based system for cognitive rehabilitation exercises monitoring, Comput. Methods Programs Biomed. 113 (2) (2014) 620–631.
[12] K. Kim, M.Z. Rosenthal, D.J. Zielinski, R. Brady, Effects of virtual environment platforms on emotional responses, Comput. Methods Programs Biomed. 113 (3) (2014) 882–893.
[13] E. Vonach, C. Gatterer, H. Kaufmann, VRRobot: robot actuated props in an infinite virtual environment, IEEE Virtual Reality (VR), Los Angeles, CA, 2017, pp. 74–83.
[14] R. Horaud, M. Hansard, G. Evangelidis, C. Menier, An overview of depth cameras and range scanners based on time-of-flight technologies, Mach. Vis. Appl. 27 (7) (2016) 1005–1020.
[15] S. Foix, G. Alenyà, C. Torras, Lock-in Time-of-Flight (ToF) cameras: a survey, IEEE Sens. J. 11 (2011) 1–11.
[16] J. Guna, G. Jakus, M. Pogacnik, S. Tomazic, J. Sodnik, An analysis of the precision and reliability of the leap motion sensor and its suitability for static and dynamic tracking, Sensors 14 (2) (2014) 3702–3720.
[17] P.R. Desai, P.N. Desai, K.D. Ajmera, K. Mehta, A review paper on Oculus Rift, a virtual reality headset, Int. J. Eng. Trends Technol. 13 (4) (2014) 175–179.
[18] J.H. Crosbie, S. Lennon, M.D.J. McNeill, S.M. McDonough, Virtual reality in the rehabilitation of the upper limb after stroke: the user's perspective, Cyberpsychol. Behavior 9 (2) (2006) 137–141.
[19] B. Liu, Z. Wang, G. Song, G. Wu, Cognitive processing of traffic signs in immersive virtual reality environment: an ERP study, Neurosci. Lett. 485 (1) (2010) 43–48.
[20] S. Invitto, C. Faggiano, S. Sammarco, V. De Luca, L.T. De Paolis, Haptic, virtual interaction and motor imagery: entertainment tools and psychophysiological testing, Sensors 16 (3) (2016) 394–411.
[21] Y. Boxuan, F. Junwei, L. Jun, Residual recurrent neural networks for learning sequential representations, Information 9 (3) (2018) 1–14.
[22] O. Wasenmuller, D. Stricker, Comparison of Kinect v1 and v2 depth images in terms of accuracy and precision, Asian Conference on Computer Vision (ACCV), 2016, pp. 34–45.
[23] Microsoft Kinect for XBOX ONE, Microsoft, http://www.xbox.com/it-IT/xbox-one/accessories/kinect-for-xbox-one, 2018.
[24] Leap Motion, Leap Motion Inc, https://www.leapmotion.com/product/vr, 2018.
[25] Oculus Rift DK2, Oculus VR, https://www.oculus.com/en-us/dk2/, 2018.
[26] D. Avola, M. Spezialetti, G. Placidi, Design of an efficient framework for fast prototyping of customized human–computer interfaces and virtual environments for rehabilitation, Comput. Methods Programs Biomed. 110 (3) (2013) 490–502.
[27] D. Avola, L. Cinque, G.L. Foresti, M.R. Marini, D. Pannone, VRheab: a fully immersive motor rehabilitation system based on recurrent neural network, Multimedia Tools Appl. (2018) 1–28.
[28] D. Avola, M. Bernardi, L. Cinque, G.L. Foresti, C. Massaroni, Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures, IEEE Trans. Multimedia (2018).
[29] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[30] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, Deep Learning and Representation Learning Workshop (NIPS), 2014, pp. 1–9.
[32] L. Dipietro, A.M. Sabatini, P. Dario, Evaluation of an instrumented glove for hand-movement acquisition, J. Rehabil. Res. Dev. 40 (2) (2003) 179–190.
[33] L.K. Simone, D.G. Kampler, Design considerations for a wearable monitor to measure finger posture, J. Neuro Eng. Rehab. 5 (2) (2005) 1–10.
[34] M. Bouzit, G. Burdea, G. Popescu, R. Boian, The Rutgers Master II: new design force-feedback glove, IEEE/ASME Trans. Mechatronics 7 (2) (2002) 256–263.
[35] CyberGrasp System, CyberGrasp Glove, http://www.cyberglovesystems.com/cybergrasp/, 2018.
[36] X. Luo, T. Kline, H.C. Fischer, K.A. Stubblefield, R.V. Kenyon, D.G. Kamper, Integration of augmented reality and assistive devices for post-stroke hand opening rehabilitation, IEEE Annual International Conference on Engineering in Medicine and Biology, 2005, pp. 6855–6858.
[37] L. Connelly, Y. Jia, M.L. Toro, M.E. Stoykov, R.V. Kenyon, D.G. Kamper, A pneumatic glove and immersive virtual reality environment for hand rehabilitative training after stroke, IEEE Trans. Neural Syst. Rehabil. Eng. 18 (5) (2010) 551–559.
[38] L. Connelly, M.E. Stoykov, Y. Jia, M.L. Toro, R.V. Kenyon, D.G. Kamper, Use of a pneumatic glove for hand rehabilitation following stroke, IEEE Annual International Conference on Engineering in Medicine and Biology, 2009, pp. 2434–2437.
[39] C. Munroe, Y. Meng, H. Yanco, M. Begum, Augmented reality eyeglasses for promoting home-based rehabilitation for children with cerebral palsy, ACM/IEEE International Conference on Human Robot Interaction, 2016, p. 565.
[40] S. Rawat, S. Vats, P. Kumar, Evaluating and exploring the Myo Armband, International Conference on System Modeling & Advancement in Research Trends (SMART), 2016, pp. 115–120.
[41] K. Pokorná, Use of stabilometric platform and visual feedback in rehabilitation of patients after the brain injury, Prague Medical Report 107 (4) (2006) 433–442.
[42] I. Díaz, J.J. Gil, M. Louredo, A haptic pedal for surgery assistance, Comput. Methods Programs Biomed. 116 (2) (2013) 97–104.
[43] C.G. Burgar, P.S. Lum, P.C. Shor, H.F.M.V. der Loos, Development of robots for rehabilitation therapy: the Palo Alto Stanford experience, J. Rehabil. Res. Dev. 37 (6) (2000) 663–673.
[44] L.E. Kahn, P.S. Lum, W.Z. Rymer, D.J. Reinkensmeyer, Robot-assisted movement training for the stroke-impaired arm: does it matter what the robot does? J. Rehabil. Res. Dev. 43 (5) (2006) 619–630.
[45] S. Saini, D.R.A. Rambli, S. Sulaiman, M.N. Zakaria, S.R. Mohd Shukri, A low-cost game framework for a home-based stroke rehabilitation system, International Conference on Computer & Information Science (ICCIS), 2012, pp. 55–60.
[46] G.D. Sosa, J. Sánchez, H. Franco, Improved front-view tracking of human skeleton from Kinect data for rehabilitation support in Multiple Sclerosis, Symposium on Signal Processing, Images and Computer Vision (STSIVA), 2015, pp. 1–7.
[47] W. Pei, G. Xu, M. Li, H. Ding, S. Zhang, A. Luo, A motion rehabilitation self-training and evaluation system using Kinect, International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), 2016, pp. 353–357.
[48] M. Alimanova, S. Borambayeva, D. Kozhamzharova, N. Kurmangaiyeva, D. Ospanova, G. Tyulepberdinova, G. Gaziz, A. Kassenkhan, Gamification of hand rehabilitation process using virtual reality tools: using leap motion for hand rehabilitation, IEEE International Conference on Robotic Computing (IRC), 2017, pp. 336–339.
[49] O. Postolache, F. Lourenço, J.M. Dias Pereira, P. Girão, Serious game for physical rehabilitation: measuring the effectiveness of virtual and real training environments, IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2017, pp. 1–6.
[50] W.J. Li, C.Y. Hsieh, L.F. Lin, W.C. Chu, Hand gesture recognition for post-stroke rehabilitation using leap motion, International Conference on Applied System Innovation (ICASI), 2017, pp. 386–388.
[51] L.A. Schwarz, A. Mkhitaryan, D. Mateus, N. Navab, Human skeleton tracking from depth data using geodesic distances and optical flow, Image Vis. Comput. 30 (3) (2012) 217–226.
[52] E. Kollorz, J. Penne, J. Hornegger, A. Barke, Gesture recognition with a Time-of-Flight camera, Int. J. Intell. Syst. Technol. Appl. 5 (3/4) (2008) 334–343.
[53] E. Biffi, C. Maghini, A. Marelli, E. Diella, D. Panzeri, A. Cesareo, C. Gagliardi, G. Reni, A.C. Turconi, Immersive virtual reality platform for cerebral palsy rehabilitation, Workshop on ICTs for Improving Patients Rehabilitation Research Techniques (REHAB), 2016, pp. 85–88.
[54] S. Viñas-Diz, M. Sobrido-Prieto, Virtual reality for therapeutic purposes in stroke: a systematic review, Neurologia 31 (4) (2016) 255–277.
[55] R.M.E. Moreira da Costa, L.A. Vidal de Carvalho, The acceptance of virtual reality devices for cognitive rehabilitation: a report of positive results with schizophrenia, Comput. Methods Programs Biomed. 73 (3) (2004) 173–182.
[56] P. Wang, I.A. Kreutzer, R. Bjärnemo, R.C. Davies, A web-based cost-effective training tool with possible application to brain injury rehabilitation, Comput. Methods Programs Biomed. 74 (3) (2004) 235–243.
[57] J.-R. Wu, M.-L. Wang, K.-C. Liu, M.-H. Hu, P.-Y. Lee, Real-time advanced spinal surgery via visible patient model and augmented reality system, Comput. Methods Programs Biomed. 113 (3) (2014) 869–881.
[58] I. Kononenko, Machine learning for medical diagnosis: history, state of the art and perspective, Artif. Intell. Med. 23 (1) (2001) 89–109.
[59] J. Ker, L. Wang, J. Rao, T. Lim, Deep learning applications in medical image analysis, IEEE Access 6 (2018) 9375–9389.
[60] A.H. Marblestone, G. Wayne, K.P. Kording, Toward an integration of deep learning and neuroscience, Front. Comput. Neurosci. 10 (2016) 94.
[61] A. Knight, S. Carey, R. Dubey, An interim analysis of the use of virtual reality to enhance upper limb prosthetic training and rehabilitation, ACM International Conference on Pervasive Technologies Related to Assistive Environments, 2016, pp. 1–4.
[62] XML, https://www.w3.org/XML/, 2018.
[63] T. Gonzalez-Sanchez, D. Puig, Real-time body gesture recognition using depth camera, Electron. Lett. 47 (12) (2011) 697–698.
[64] NVidia, http://www.nvidia.it/object/nvidia-physx-it.html, 2018.
[65] DirectX, Microsoft Support, https://support.microsoft.com/it-it/kb/179113, 2018.
[66] OpenGL, https://www.opengl.org/, 2018.
[67] A. Rodríguez, et al., A VR-based serious game for studying emotional regulation in adolescents, IEEE Comput. Graph. Appl. 35 (1) (2015) 65–73.
[68] G. Aranyi, et al., Subliminal cueing of selection behavior in a virtual environment, Presence 23 (1) (2014) 33–50.
[69] A. Toet, A morphological pyramidal image decomposition, Pattern Recogn. Lett. 9 (4) (1989) 255–261.
[70] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint, 2014.
[71] A. Shahroudy, J. Liu, T.T. Ng, G. Wang, NTU RGB+D: a large scale dataset for 3D human activity analysis, arXiv preprint arXiv:1604.02808, 2016.
[72] W. Byeon, T.M. Breuel, F. Raue, M. Liwicki, Scene labeling with LSTM recurrent neural networks, Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3547–3555.
[73] A. Switonski, A. Michalczuk, H. Josinski, A. Polanski, Dynamic time warping in gait classification of motion capture data, Proceedings of World Academy of Science, Engineering and Technology, 2013, pp. 1–6.
[74] S. Balasubramanian, A. Melendez-Calderon, E. Burdet, A robust and sensitive metric for quantifying movement smoothness, IEEE Trans. Biomed. Eng. 59 (8) (2012) 2126–2136.
[75] S. Balasubramanian, A. Melendez-Calderon, A. Roby-Brami, E. Burdet, On the analysis of movement smoothness, J. Neuroeng. Rehab. 12 (1) (2015) 112.
[76] J. Brooke, SUS: a retrospective, J. Usab. Stud. 8 (2) (2013) 29–40.
[77] D. Webster, O. Celik, Experimental evaluation of Microsoft Kinect's accuracy and capture rate for stroke rehabilitation applications, IEEE Haptics Symposium (HAPTICS), Houston, 2014, pp. 455–460.
[78] H. Gonzalez-Jorge, B. Riveiro, E. Vazquez-Fernandez, J. Martínez-Sánchez, P. Arias, Metrological evaluation of Microsoft Kinect and Asus Xtion sensors, Measurement 46 (6) (2013) 1800–1806.
[79] M. Samir, E. Golkar, A.A.A. Rahni, Comparison between the Kinect™ V1 and Kinect™ V2 for respiratory motion tracking, IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, 2015, pp. 150–155.
[80] L. Yang, L. Zhang, H. Dong, A. Alelaiwi, A. El Saddik, Evaluating and improving the depth accuracy of Kinect for Windows v2, IEEE Sens. J. 15 (8) (2015) 4275–4285.
[81] M.F. Shiratuddin, A. Hajnal, A. Farkas, K.W. Wong, G. Legradi, A proposed framework for an interactive visuotactile 3D virtual environment system for visuomotor rehabilitation of stroke patients, International Conference on Computer & Information Science (ICCIS), Kuala Lumpur, 2012, pp. 1052–1057.
[82] S. García-Martínez, F. Orihuela-Espina, L.E. Sucar, A.L. Moran, J. Hernández-Franco, A design framework for arcade-type games for the upper-limb rehabilitation, International Conference on Virtual Rehabilitation (ICVR), Valencia, 2015, pp. 235–242.
[83] Y. Meng, C. Munroe, Y.N. Wu, M. Begum, A learning from demonstration framework to promote home-based neuromotor rehabilitation, 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), New York, 2016, pp. 1126–1131.
[84] N. Robitaille, P.L. Jackson, L.J. Hébert, C. Mercier, L.J. Bouyer, S. Fecteau, C.L. Richards, B.J. McFadyen, A virtual reality avatar interaction (VRai) platform to assess residual executive dysfunction in active military personnel with previous mild traumatic brain injury: proof of concept, Disab. Rehab.: Assistive Technol. 12 (2017) 758–764.
[85] P. Rego, P.M. Moreira, L.P. Reis, Serious games for rehabilitation: a survey and a classification towards a taxonomy, 5th Iberian Conference on Information Systems and Technologies, Santiago de Compostela, 2010, pp. 1–6.
[86] P. Wang, W. Li, Z. Gao, C. Tang, P.O. Ogunbona, Depth pooling based large-scale 3D action recognition with convolutional neural networks, IEEE Trans. Multimedia 20 (5) (2018) 1051–1061.
[87] R. Vemulapalli, F. Arrate, R. Chellappa, Human action recognition by representing 3D skeletons as points in a Lie group, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 588–595.
[88] Y. Du, W. Wang, L. Wang, Hierarchical recurrent neural network for skeleton based action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[89] J. Liu, A. Shahroudy, D. Xu, G. Wang, Spatio-temporal LSTM with trust gates for 3D human action recognition, European Conference on Computer Vision, 2016.
[90] P. Wang, Z. Li, Y. Hou, W. Li, Action recognition based on joint trajectory maps using convolutional neural networks, Proceedings of the ACM on Multimedia Conference, 2016, pp. 102–106.
[91] J. Han, L. Shao, D. Xu, J. Shotton, Enhanced computer vision with Microsoft Kinect sensor: a review, IEEE Trans. Cybern. 43 (5) (2013) 1318–1334.
[92] A. Mikołajczyk, M. Grochowski, Data augmentation for improving deep learning in image classification problem, International Interdisciplinary PhD Workshop (IIPhDW), 2018, pp. 117–122.
