I-SUPPORT: a Robotic Platform of an Assistive Bathing Robot for the Elderly Population
A. Zlatintsi^a,*, A. C. Dometios^a, N. Kardaris^a, I. Rodomagoulakis^a, P. Koutras^a, X. Papageorgiou^a, P. Maragos^a, C. S. Tzafestas^a, P. Vartholomeos^b, K. Hauer^c, C. Werner^c, R. Annicchiarico^d, M. G. Lombardi^d, F. Adriano^d, T. Asfour^e, A. M. Sabatini^f, C. Laschi^f, M. Cianchetti^f, A. Güler^g,1, I. Kokkinos^h,1, B. Klein^i, and R. López^j

a Institute of Communication and Computer Systems (ICCS), National Technical University of Athens (NTUA), Greece
b OMEGA Technology, Athens, Greece
c Bethanien Hospital, Germany
d Fondazione Santa Lucia, Rome, Italy
e Karlsruhe Institute of Technology (KIT), Germany
f Scuola Superiore Sant'Anna (SSSA), Pisa, Italy
g Department of Computing, Imperial College London, UK
h Department of Computer Science, University College London, UK
i Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
j Robotnik Automation, SLL, Valencia, Spain

* Corresponding author: Athanasia Zlatintsi, School of ECE, National Technical Univ. of Athens (NTUA), 15773 Athens, Greece, e-mail: [email protected]
1 During this work I. Kokkinos and R. A. Güler were working at the French Institute for Research in Computer Science and Automation (INRIA), France.
Abstract
In this paper we present a prototype integrated robotic system, the I-Support bathing robot, that aims at supporting new aspects of assisted daily-living activities in a real-life scenario. The paper focuses on describing and evaluating key novel technological features of the system, with the emphasis
on cognitive human-robot interaction modules and their evaluation through a series of clinical validation studies. The I-Support project as a whole
has envisioned the development of an innovative, modular, ICT-supported service robotic system that assists frail seniors to safely and independently complete an entire sequence of physically and cognitively demanding bathing tasks, such as properly washing their back and their lower limbs. A variety of innovative technologies have been researched and a set of advanced modules for sensing, cognition, actuation and control have been developed and seamlessly integrated to enable the system to adapt to the abilities of the target population. These technologies include: human activity monitoring and recognition, adaptation of a motorized chair for safe transfer of the elderly in and out of the bathing cabin, a context awareness system that provides full environmental awareness, as well as a prototype soft robotic arm and a set of user-adaptive robot motion planning and control algorithms. This paper focuses in particular on the multimodal action recognition system, developed to monitor, analyze and predict user actions in real time with a high level of accuracy and detail; the recognized actions are then interpreted as robotic tasks. In the
same framework, the analysis of the human actions that became available through the project's multimodal audio-gestural dataset has led to the successful modelling of human-robot communication, achieving an effective and natural interaction between users and the assistive robotic platform. In order to evaluate the I-Support system, two multinational validation studies
were conducted under realistic operating conditions at two clinical pilot sites. Some of the findings of these studies are presented and analysed in the paper, showing good results in terms of: (i) high acceptability regarding the system's usability by this particularly challenging target group, the elderly end-users, and (ii) overall task effectiveness of the system in different operating modes.
Keywords: Human-Robot Communication, Assistive Human-Robot Interaction (HRI), Bathing robot, Multimodal dataset, Audio-gestural command recognition, Online validation with elderly users
1. Introduction
Advanced countries with well organized and modern health care systems tend to become aging societies, according to the World Health Organization's research on health and ageing [1]. The percentage of the population with special needs for nursing attention (including people with disabilities) is significant and expected to grow. Health care experts are called upon to support these people during the performance of Activities of Daily Living (ADLs) such as dressing, eating and showering, which imposes a great financial burden on both the families [2] and the caregivers [3]. Great research effort has been spent over the last decades [4, 5, 6, 7] on studying and classifying the functional disabilities of older adults and associating the latter with basic factors of morbidity and mortality.
Personal care (showering or bathing), which is crucial for a person's hygiene, is among the first ADLs to become difficult in an older adult's life [7], and ADL difficulties in bathing or showering represent the strongest predictor of subsequent institutionalization in older adults [8]. Older adults require assistance in bathing or showering more frequently than for any other ADL [9]. As bathing is a highly intimate ADL, the wish to remain independent of personal bathing assistance from caregivers for as long as possible is, however, not unusual in older adults [10]. An assistive bathing robot may thus represent an opportunity for older adults with bathing disability to take care of themselves in bathing, to preserve their privacy, and to reduce the burden on caregivers.
During the last decades, an enormous number of socially interactive robots have been developed, making Human-Robot Interaction (HRI) a particularly active and motivating research field. The robotics community is attempting to tackle this challenge of unattended nursing by developing flexible and modular assistive devices that aim to cover the needs for support of everyday tasks involved in the care of frail people in both in-house [11] and clinical environments. Specifically, devices intended for this purpose either involve static physical interaction [12, 13, 14] or are mounted on mobile platforms [15, 16]. The focus of these solutions lies on a specific body part (e.g., the head) and on supporting disabled people in the performance of ADLs with rigid manipulators. Furthermore, the literature reveals two commercial solutions, presented in [17, 18], which provide a safe, independent showering experience and are equipped with soaping and water rinsing systems. On the downside, both of these solutions completely lack physical interaction with the user and therefore lack some basic functionalities of the bathing sequence, such as scrubbing and wiping the user.
Direct physical interaction with frail seniors raises a multitude of issues, including safety, reliability of the human-robot interaction and adaptability to the user's needs and preferences. Moreover, human body parts are curved and deformable, and their size and shape differ considerably from one person to another. Unexpected body-part motion may also occur during the operation of the robot, increasing the risk of undesirable and possibly harmful contact between the human and the robot. Therefore, there are augmented human perception requirements, not only in terms of sensorial information adequacy and accuracy but also in terms of perception algorithms.
The first requirement cannot be addressed with simple proximity sensors and monocular cameras, since both of these solutions give poor sensorial feedback. Additionally, cameras intended for visual capturing with RGB data during the shower process could raise ethical issues in terms of privacy and comfort, especially when an elderly person's mental health deteriorates. On the contrary, the use of cheap depth or stereo vision cameras can provide rich information with good accuracy [19, 20] and fulfill the requirements of a great variety of applications [21], without necessarily requiring the use of RGB data. The second requirement has actually led HRI to extend into other research areas [22]. One such area concerns the development of multimodal perception interfaces, required to facilitate natural human-robot communication, including not only visual sensors, but also audio or inertial sensors [23]. The concept behind these algorithms is to design interaction techniques that enhance the communication, making it natural and intuitive, and enabling robots to understand, interact with and respond to human intentions intelligently. For a review, we refer the reader to [24, 25, 26, 27, 28, 29].
Recently, researchers in computer vision have proposed approaches based on Deep Learning (DL) techniques, which have produced very detailed results on human perception and specifically on body-part segmentation. The first fully-convolutional neural network (CNN) implementing semantic segmentation was presented in [30], and many more works [31, 32, 33, 34] followed with detailed results, which are able to segment the human body parts at pixel level or to obtain a sparse pose of the body parts [35, 36] with close to real-time performance.
Direct interaction of a robotic device with the environment is a research subject that the robotics community has been addressing for many years. However, interaction with a human being is a much more delicate action and is considered risky to be executed by a rigid robotic manipulator, even one equipped with the most sophisticated force/impedance control schemes. On the other hand, the advantage of soft robots [37] lies in their inherent or structural compliance, which gives them the ability to actively interact with the environment and undergo large deformations [38]. The term soft robotics is not only used to state that the devices are made of soft materials, but also to underline the shift from robots with rigid links (even hyper-redundant ones [39]) to bio-inspired continuum robots. Many continuum manipulators have already been presented [40] with different types of actuation, e.g., tendon-based [41, 42] or a combination with pneumatic chambers [43, 44, 45]. The repertoire of actuation is important not only for the motion dexterity and shape [46] of a soft robot but also for stiffening [47, 48] and compliance, two properties that are crucial especially for physical interaction with a human.
Contributions and overview
The I-Support project envisioned, and during its course accomplished, the development and integration of an innovative, modular, ICT-supported robotic system that supports frail older adults' motion abilities. It successfully assists them to safely and independently complete various physically and cognitively demanding bathing tasks, such as properly washing their back and their lower limbs. The main contributions of the work presented in this paper relate to the development and seamless integration of novel cognitive human-robot interaction technologies and to the evaluation of these technologies, as individual modules as well as an integrated assistive robotic platform as a whole, through a series of clinical validation studies in realistic scenarios and under real operating conditions. These technologies include: human activity monitoring and recognition, adaptation of a motorized chair for safe transfer of the elderly in and out of the bathing cabin, a context awareness system that provides full environmental awareness, as well as a prototype soft robotic arm and a set of user-adaptive robot motion planning and control algorithms. Key features of this assistive robotic platform, described in the paper, are supported by our state-of-the-art pipeline for multimodal modeling and learning, which aims to enhance the human-robot communication, making it natural, intuitive and easy to use, and addressing aspects of smart assistive HRI.

An important contribution of the work presented in this paper also concerns a new dataset that includes audio commands, gestures, which are an integral part of human communication [49], and co-speech gesturing data, which are still quite limited in HRI [50], as well as a suite of tools used for data acquisition. The end goal of our work is to support and maximize safety, reliability and self-confidence for the frail users, as well as to enable intuitive and transparent HRI which incorporates reliable user intention recognition and real-time adaptation of the reactive and supportive robotic system's behavior. To that end, two multinational validation studies were conducted at two European clinical pilot sites for the evaluation of the various functionalities of the I-Support system in the bathroom environment of the selected care facilities. The validation studies included elderly subjects and were carried out under realistic conditions, showing good results and high acceptability by this particularly challenging target group.
The remainder of this paper is structured as follows: in Sec. 2 we describe the overall architecture of the I-Support system. In Sec. 3 we go into more detail regarding the developed online audio-gestural recognition system for human-robot communication, while in Sec. 4 we describe the adaptive robotic motion planning method. The dataset collected during the course of the project for modelling and offline evaluation is presented in Sec. 5. Finally, in Sec. 6 two multinational validation studies conducted in two European hospitals are described, showcasing promising results regarding both the system performance and the user satisfaction.
2. System Architecture
Figure 1: Installation of the I-Support system in a clinical environment for experimental validation. The devices constituting the overall system are presented: (a) Amphiro b1 water flow and temperature sensor; (b) general view of the system showing the motorized chair, the soft robotic arm and the installation of the Kinect sensors (for audio-gestural communication); (c) air temperature, humidity and illumination sensors by CubeSensors; (d) smartwatch for user identification and activity tracking.
The I-Support system is depicted in Fig. 1(b), presenting an installation in a clinical bathroom environment. The safety and operational requirements emerging from two use cases were taken into account. These use cases comprise two washing tasks that are demanding in terms of mobility and force exertion for the elderly, i.e., washing the back and the legs (lower limbs). Next, we describe the main modules of the system, namely: the motorized chair, the human-robot interaction module and the sensors used for communication, the context awareness system, the soft robotic arm, and the overall software and process architecture.
2.1. Motorized chair
A motorized chair has been employed inside the shower to effectively assist the older adults during sit-to-stand and stand-to-sit tasks and for safe transfer from the exterior to the interior space of the shower cabin. The design approach was to adapt a commercially available chair, due to the reduced cost in comparison to custom designs. The selected chair was adapted accordingly in order to provide 3-DOF motion. More specifically, a lateral translation adjusts the proximity of the user to the robot and a vertical translation regulates the height of the chair from the ground. Additionally, the rotational DOF around its axis allows for easy accessibility in different bathroom environments.
2.2. Human-Robot Interaction
For the purpose of human-robot communication and perception, the I-Support system was equipped with Kinect V2 RGB-D cameras. These sensors are frequently used for visual analysis in assistive robotics [51, 52, 53] as they are inexpensive, reliable and simple to waterproof. They are also easy to mount on different surfaces and to integrate software-wise. Three Kinect sensors were placed in the bathroom as shown in Fig. 1(b). This multi-view setup was designed so as to be able to deal with the two main technological tasks of the I-Support project, namely: a) audio-gestural command and action recognition and b) body pose estimation for robotic manipulation, considering various constraints (i.e., the size of the bath cabin, the size and the placement of the chair, and the soft-arm robot base). For body pose estimation, two different data streams, RGB and depth, are required for the analysis of the tasks under investigation (i.e., washing the legs and washing the back). These two streams should be registered, in other words they have to be calibrated and aligned, to allow simultaneous processing. For the audio-gestural and action recognition task, it is essential to have the RGB stream in High Definition (HD) format, in order to retain the region of interest (i.e., hand gestures or face) with acceptable resolution.
To tackle the above-mentioned tasks, we employed three Kinect V2 sensors that can provide all required visual and audio information. These sensors can capture Full HD RGB video at 30 fps (frames per second), while the depth information is recorded using the infrared camera embedded in the Kinect in standard resolution. The color stream is captured in BGRA format at 32 bpp (bits per pixel), resulting in an uncompressed image of about 8 MB, while the depth information is stored at 16 bpp (ca. 800 KB). For the audio/spoken command recognition task we experimented with the audio stream captured by the built-in multi-array microphone of the Kinect sensors. Specifically, each Kinect incorporates an array of four individual microphones, and the raw audio information can be captured at 16000 Hz with 32-bit resolution. Figure 2 shows examples of the data streams that are acquired from the Kinect sensors.

Figure 2: Data streams acquired by Kinect 1 and 3: RGB (top), depth (2nd row) and log-depth (3rd row) frames from a selection of gestures ("Temperature Up", "Scrub Legs"), accompanied by the corresponding German spoken command waveforms (4th row) and spectrograms (bottom row): "Roberta, Wärmer", "Roberta, Wasch die Beine".
In the early stages of the system integration process, the placement of the Kinect sensors inside the bathroom was a challenging task. The constraints on the camera placement were the capturing of all significant tasks involving HRI, while at the same time satisfying the space limitations imposed by a conventional bathroom in nursing homes. After extensive experimentation with all possible positions, the resulting sensor set-up was able to capture the necessary information for both tasks, i.e., information on the user's back or legs for robotic manipulation, as well as the hand gestures performed for communication with the robot. Two of the sensors (Kinect sensors 1 and 2) were placed inside the bath cabin, in order to capture the legs and the back of the user during the different tasks, while a third camera was placed outside the cabin, in order to capture the gestures performed by the user during the task of washing the back.
Specifically, during the task of washing the legs, Kinect 2 recorded the user's legs (including registered RGB and depth in SD resolution) for body pose estimation and visual tracking of the robot, while Kinect 1 was used by the audio-gestural and action recognition module. In addition to the streams in SD resolution, sensor 1 also recorded the color stream (RGB) in Full HD resolution. In this configuration no video information was captured by Kinect 3. During the task of washing the back, Kinect 1 recorded the back of the user (the captured information includes RGB and depth in SD resolution), while Kinect 3 recorded the color stream, used for gesture recognition, in Full HD. In this case, Kinect 2 was not required to capture any data.
2.3. Context Awareness System (CAS)
In order to achieve a proper operational flow during a showering task, it is important for the system not only to be able to perceive and communicate with the user, but also to have full environmental awareness. The latter can be achieved with the aid of extra sensorial data coming from different types of sensors. To begin with, Amphiro sensors, depicted in Fig. 1(a), can provide useful data to the system regarding water flow and temperature. Additionally, air temperature, humidity and illumination are environmental conditions indicative of the user's safety and comfort. These values are obtained by sensors manufactured by CubeSensors, shown in Fig. 1(c).
A smartwatch similar to the one presented in Fig. 1(d) is integrated for user identification and activity tracking purposes [54]. The identification process includes a data acquisition step in which the user's personal data (e.g., gender, age, size) and preferences (e.g., showering duration, scrubbing patterns, body parts to avoid, etc.) are stored in a database. The identification step implemented with the smartwatch includes bar-code scanning, through which the personalization data are passed to the system and taken into account for the bathing procedure. Furthermore, activity tracking algorithms are employed to recognize falls or periods of inactivity, which are indicative of emergency situations. The above-mentioned data are available not only to the I-Support system but also to the nursing staff via an Android application, for monitoring and acting when emergency situations emerge.
2.4. Soft Robotic Arm
A soft arm has been developed to assist elderly people in bathing tasks. Soft manipulators can be considered intrinsically safe thanks to the actuation technologies they are made of. One of their main features is their compliant body, which can deform passively to adapt to environmental changes, thus reducing the complexity of active control. The technical requirements, in terms of desired reachable workspace and expected movements, have been guaranteed by a hybrid actuation strategy, as summarized in [43]. Here, we briefly recap the arrangement of the actuators in the modules (Fig. 3). Two different actuation technologies are combined based on their working principle, in order to achieve the desired motion behavior:

• Flexible fluidic actuators, which enable elongation and omni-directional bending, exhibiting low accuracy;

• Tendon-driven mechanisms, which can shorten the mechanism and provide (redundant) omni-directional bending with high movement resolution.

As a result, the system is capable of elongation/contraction and bending movements, exploring a workspace compatible with different activation patterns, while specific antagonistic activation sequences provide stiffness capabilities as well.

Figure 3: Construction stage of the soft robotic manipulator. The modular design provides the flexibility to modify the robot's characteristics according to the motion requirements.
lP repro of
249
252
In order to accommodate the hardware devices mentioned above, along with the required real-time processing of the data, we designed and implemented a system that uses the Linux operating system, in order to be able to handle multiple, fully synchronized Kinect sensors in modular configurations. With this architecture, the acquisition and the processing of the data are carried out using the Robot Operating System (ROS), with a different Linux machine for each camera. For the ROS setup we used a master-client network approach, where a master computer controls all the other hosts to which the various sensors are connected.

Integration of the software nodes was accomplished in three stages: (i) unit testing of ROS-free software (i.e., mathematical functions, non-ROS libraries, etc.), (ii) unit testing of ROS nodes and creation of ROS packages, and (iii) integration of all packages and testing that they all work together according to the specifications (topics, services, actions, loop rates). It was demonstrated that the ROS network set-up and data connectivity function successfully. The process of remotely launching the ROS nodes was almost entirely automated through nested launch files.
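As an illustration of this master-client configuration, the following minimal sketch (not the actual I-Support source code) shows a rospy node that could run on one of the per-camera client machines; the client finds the master through the ROS_MASTER_URI environment variable, and the topic name used here is an assumed example.

#!/usr/bin/env python
# Minimal sketch of a per-camera client node (illustrative only).
# The client machine points to the master via ROS_MASTER_URI; the topic
# name "/kinect1/hd/image_color" is an assumed example.
import rospy
from sensor_msgs.msg import Image

def on_frame(msg):
    # Frames arriving here would be forwarded to the recognition pipeline.
    rospy.loginfo("RGB frame %dx%d at t=%.3f",
                  msg.width, msg.height, msg.header.stamp.to_sec())

if __name__ == "__main__":
    rospy.init_node("kinect_client", anonymous=True)
    rospy.Subscriber("/kinect1/hd/image_color", Image, on_frame, queue_size=1)
    rospy.spin()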
2.5.1. Operational flow and Finite State Machine (FSM)
The operational flow of I-Support is composed of the robotic task sequence of the washing activities. It also includes the initialization, the termination and the error handling actions of the I-Support system. The robotic task sequence for the showering process was derived by conducting a survey about the senior users' preferences regarding shower activities, based on questionnaires that were answered by the seniors and by the caregivers. The outcome of the survey indicated that the critical body areas that should be washed by the I-Support system are the back of the user, the private parts, and the legs of the user. For the pilot studies the consortium decided to test and validate the washing of the legs and of the back of the user. Furthermore, the information gathered from the end-users indicated that the washing activities should be broken down into washing, rinsing and scrubbing. It also indicated that the seniors would like to use a small set of commands that allows them to intuitively interact with the robotic system and synthesize a complete sequence of shower activities, including actions such as start, stop, pause and repeat the procedure. To this end, the following minimal set of audio-gestural commands was defined: {wash-back, wash-legs, scrub-back, scrub-legs, rinse-back, rinse-legs, start, halt, stop, repeat}.
The entire I-Support process is modeled as a sequence of states and is supervised by a finite state machine (FSM), which has been developed and modeled as a directed graph. Each node of the graph corresponds to a state (for example washing, rinsing, scrubbing), and each directed edge corresponds to an event that triggers a change of state and, optionally, some associated action (for example, the user requests through audio-gestural commands to repeat or stop the washing). The FSM manages the showering process by monitoring the progress of each state and the generation of the state sequence until the termination of the process. The actual sequence of states is not fixed; instead, it is flexible and varies depending on the external triggers.
The triggering events are generated either by the HRI module (see Sec. 3), by the robotic motion planner or by the Chair-Controller; based on the triggering events, different state sequences can be synthesized by the end-user. For example, an indicative state sequence is: IDLE, CHAIR-IN, DOUSING, SCRUBBING, SCRUBBING-PAUSED, RINSING, WASHING-ENDED, CHAIR-MOVING-OUT. In between these states (which represent washing activities) there are intermediate states that represent robot kinematic transitions to a starting pose required for the initiation of the next activity. The described sequence is depicted in Fig. 4. Any emergency situation is handled by the emergency-stop state, to which all states can transition. The washing-ended state is also accessible from all states in order to facilitate the handling of errors, such as the inability of the user to complete the washing procedure or an abrupt wish to terminate the process.
The FSM was implemented in ROS using the Smach package, which is a powerful and scalable Python-based library for hierarchical state machines. The Smach package does not depend on ROS and can be used in any Python project; the executive Smach stack, however, provides seamless integration with ROS, including smooth actionlib integration and a Smach viewer to visualize and introspect state machines. Implementation-wise, all states are stored in a generic state machine container. They are initialized and, when triggered by an event, they execute the action code stored in the state. Each state has a number of possible outcomes associated with it. An outcome is a user-defined string that describes how a state finishes, and the transition to the next state is specified based on the outcome of the previous state.
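To make the Smach-based implementation more concrete, the following minimal sketch shows how two of the washing states could be registered in a state machine container with outcome-based transitions. It is an illustrative simplification, not the full I-Support FSM of Fig. 4; the state names, outcomes and transitions are reduced examples.

import rospy
import smach

class Dousing(smach.State):
    def __init__(self):
        smach.State.__init__(self, outcomes=['douse_success', 'halt'])
    def execute(self, userdata):
        rospy.loginfo('Executing DOUSING')
        # ... trigger the soft-arm dousing action and monitor HRI events ...
        return 'douse_success'

class Scrubbing(smach.State):
    def __init__(self):
        smach.State.__init__(self, outcomes=['scrub_success', 'halt'])
    def execute(self, userdata):
        rospy.loginfo('Executing SCRUBBING')
        return 'scrub_success'

def build_fsm():
    sm = smach.StateMachine(outcomes=['WASHING_ENDED'])
    with sm:
        # Each user-defined outcome string is mapped to the next state.
        smach.StateMachine.add('DOUSING', Dousing(),
                               transitions={'douse_success': 'SCRUBBING',
                                            'halt': 'WASHING_ENDED'})
        smach.StateMachine.add('SCRUBBING', Scrubbing(),
                               transitions={'scrub_success': 'WASHING_ENDED',
                                            'halt': 'WASHING_ENDED'})
    return sm

if __name__ == '__main__':
    rospy.init_node('isupport_fsm_sketch')
    outcome = build_fsm().execute()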
Figure 4: Example of the I-Support FSM state diagram for the "washing back" application (with rinsing).

The FSM was personalized: depending on
the user, it could be automatically modified in order to meet the particular needs and requirements (i.e., size, gender, preferences, medical condition). This was achieved by interfacing the FSM with the Context Awareness System (CAS) and the personalised information stored in the I-Support system database. The FSM also performed the launching of the ROS nodes (depending on the current state) and the monitoring (running, success, fail) of each active node.
2.6. Safety validation
I-Support is a hardware device that comes into physical contact with humans, either on purpose during scrubbing and washing tasks or accidentally as it describes trajectories in the shower workspace. Furthermore, it is an electrical device that operates in a humid and wet environment subjected to jets of water. It might operate in a healthcare environment, such as a hospital, where the levels of acoustic and electromagnetic noise should be kept at a minimum. Hence, it was necessary to take actions for thorough hazard mitigation and for a comprehensive safety validation of the I-Support system during the design process.
The methodology that was adopted to address the safety issues involved the following steps: (i) analysis of the international safety standards and regulations and selection of the most relevant standard, (ii) classification of the I-Support device according to that standard, (iii) hazard analysis of the I-Support system (identification of hazards, identification of causes associated with hazards, determination of the degree of risk), (iv) hardware and software design decisions to mitigate hazards (such as upper limits on acceleration and speed, minimization of weight, use of soft materials, DC voltages below 25V, EC compliance, emergency stops, controlled system shutdown, etc.), (v) testing and validation of the I-Support components and of the integrated system by an external certified safety engineer, and (vi) safety validation of the complete I-Support installation in the operating environment where the pilot studies took place.
ISO 13482 was selected as the most relevant safety guideline to be followed throughout the project development. The I-Support robotic solution was classified as a personal care robot of the restraint-free assistance robot type (intended to provide more ease and comfort in daily life for independent living). Hence, the comprehensive description of the hazard-risk analysis, the subsequent design decisions, and the official safety validation results all complied with the ISO 13482 safety requirements. The hazard analysis, risk analysis and design decisions are presented in detail in [D2.3, Annex 1, http://www.i-support-project.eu/dissemination/]. Finally, an external safety officer was employed to perform systematic tests and inspections and to verify that the hardware and software implementation of all components, as well as the overall installation in the healthcare environment, comply with the ISO 13482 standards and safety requirements.
3. Audio-Gestural HRI
Gesture recognition is a visual task that can aid human-robot interaction (HRI) and relies on visual processing methods similar to those used in action classification and pose estimation. For the I-Support system, we implemented a gesture recognition module built around a vocabulary of gestures, which serve the role of visual non-verbal commands. For robustness, we supplement gesture recognition with the additional perceptual task of spoken command recognition. The two modalities can work either independently or in fusion for enhanced performance. Within this context of assistive robotics, and by exploring state-of-the-art approaches from automatic speech recognition and visual action recognition, we have developed an intelligent interface that multimodally recognizes actions and commands by fusing the unimodal information streams to obtain the optimum multimodal hypothesis.
3.1. Visual processing
Gesture recognition allows the interaction of the elderly users with the robotic platform through a predefined set of gestural commands. For this task, we have employed state-of-the-art computer vision approaches for feature extraction, encoding, and classification. Our gesture and action classification pipeline (see Fig. 5) employs Dense Trajectories [55] along with the popular Bag-of-Visual-Words (BoVW) framework. The main concept consists of sampling feature points from each video frame on a regular grid and tracking them through time based on optical flow. Specifically, the employed descriptors are: the Trajectory descriptor, HOG [56], HOF [57] and Motion Boundary Histograms (MBH) [56]. As depicted in Fig. 6, a non-linear transformation of depth using the logarithm (log-depth) enhances edges related to hand movements and leads to richer dense trajectories on the regions of interest, close to the result obtained using the RGB stream.
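The log-depth transformation itself is a simple pixel-wise operation; the sketch below shows one plausible implementation (the scaling to an 8-bit image is our assumption, introduced so that standard RGB-based dense trajectory code can be reused on the transformed stream).

import numpy as np

def log_depth(depth_mm):
    # Map a 16-bit Kinect depth frame (millimetres) to an 8-bit log-depth image.
    d = depth_mm.astype(np.float32)
    d[d == 0] = np.nan                      # zero encodes 'no measurement'
    logd = np.log1p(d)                      # compress far-range differences
    rng = np.nanmax(logd) - np.nanmin(logd) + 1e-6
    logd = (logd - np.nanmin(logd)) / rng
    return np.nan_to_num(255.0 * logd).astype(np.uint8)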
The features were encoded using BoVW with K = 4000 clusters, forming the representation of each video: each trajectory was assigned to the closest visual word and a histogram of visual word occurrences was computed. For classification, non-linear SVMs were adopted using the χ2 kernel [56], and the different descriptors were combined in a multi-channel approach, accomplishing an accuracy of 81% and 84% for the tasks washing the legs and washing the back, respectively. Since multiclass classification problems were considered, a one-against-all approach was followed and the class with the highest score was selected, accomplishing classification results of up to 83% and 85% for the two tasks.

Figure 5: Visual gesture classification pipeline.

Figure 6: Comparison of dense trajectory extraction over the RGB (top), depth (middle) and log-depth (bottom) clips of the gesture "Scrub Back".
Journal Pre-proof
Figure 7: Spoken command recognition module for HRI, integrated in ROS, including always-listening mode and real time performance.
405
3.2. Audio processing
In addition to gesture command recognition, we have also developed a
407
spoken command recognition module [58] (Fig. 7) that detects and recognizes
408
commands provided by the user freely, at any time, among other speech and
409
non-speech events, possibly infected by environmental noise and reverbera-
410
tion. The employed features for acoustic modeling were 39 Mel-Frequency
411
Cepstral Coefficients (MFCCs) with their first- and second-order derivatives
412
extracted every 30ms from overlapping windows of 25ms duration. We target
413
robustness via a) denoising of the far-field signals by delay-and-sum beam-
414
forming, b) global Maximum Likelihood Linear Regression (MLLR) adapta-
415
tion of the acoustic models to the speaker-microphone channels of the tar-
416
geted environment, and c) combined command detection/recognition. In
417
order to build our models we performed offline classification experiments of
418
pre-segmented commands, based on a task-dependent grammar of 23 Ger-
419
man spoken commands, accompishing accuracies of 76% and 68% for the two
Jou
rna
406
23
Journal Pre-proof
tasks. For a more detailed analysis and further results regarding the offline
421
experiments of the audio-gestural HRI module we refer the reader to [59].
422
4. Real-time Robotic motion Adaptation
lP repro of
420
The execution of the robotic tasks, instructed by the user or the carer via
424
HRI commands, relies on a detailed and adaptive robotic motion planning
425
method. This method was developed during the project and is based on
426
enriched visual information. In particular, Point-Cloud data provided by the
427
Kinect sensors, which are mounted on the shower room as shown in Fig. 1 (b),
428
are used as an input together with accurate body part recognition performed
429
either as a pixel-wise [31] area coverage of each body part, or by using more
430
structural [60] skeleton information as depicted in Fig. 8. The latter deep
431
learning methods highly enhance the environment perception skills of the
432
system and are able to provide robust input to the motion planning, remain-
433
ing at the same time unaffected by the changes of the environment conditions
434
(e.g. illumination) of different shower rooms.
rna
423
The motion planning method initially proposed in [61] provides a solu-
436
tion to the problem of defining the motion behavior of a robotic manipulators
437
end-effector, operating over a curved deformable surface (e.g. the users body
438
part). Such surfaces characterize the human body parts, which are system-
439
atically or randomly moving and deforming (e.g. due to users breathing
440
motion). The main goal of the motion behavior task is the online calculation
441
of the reference pose for the end-effector, in order for the robotic manipulator
442
to be compliant with the body part and simultaneously execute predefined
443
surface tasks (e.g. scrubbing the users back). Due to the motion control
Jou
435
24
lP repro of
Journal Pre-proof
Figure 8: Example of the visual user perception algorithm in two instances: (a) handsup and (b) relaxed. The visual information is used as an input for the motion planning method.
complexity of the soft robotic manipulator described in Sec. 2.4, the motion
445
planning method is structured to be model free. Therefore, it can be ad-
446
justed to any robotic manipulator, provided that all the robot’s workspace
447
and velocity constraints are taken into account.
rna
444
The two scenarios, i.e. washing or scrubbing the back or the legs, con-
449
sidered in the I-Support project are actually the most challenging for the
450
elderly in terms of mobility limitations. These scenarios were considered
451
during the development of the motion planning method as shown in Fig. 9.
452
In the back washing scenario the area pixel-wise visual information is used,
453
whereas in the legs washing scenario we use the skeleton human recognition.
454
These recognition techniques are combined with the Point-Cloud information
455
obtained from the Kinect sensors in order to achieve accurate reference pose
456
estimation.
Jou
448
25
lP repro of
Journal Pre-proof
Figure 9: Two scenarios were considered during the development of the motion planning method: (a) Back washing and (b) Legs washing. In (a) area pixel-wise visual information was used, whereas in (b) the skeleton information was adequate for the completion of washing tasks for the legs. The red spheres depict the result of the skeleton recognition for the right leg and the green sphere depicts the target position that the robot has to reach as a result of the motion planning method.
Another important aspect of the motion planning method is the straight-
458
forward combination with other motion planning techniques, which allow
459
for the incorporation of imitation learning techniques. More specifically in
460
[62] we integrated a leader-follower framework of motion primitives (CC-
461
DMP) [63] with a vision-based motion planning method to adapt the ref-
462
erence path of a robots end-effector and allow the execution of washing ac-
463
tions. This system incorporates clinical carers expertise by producing mo-
464
tions, which are learned by demonstration, using data from the publicly
465
available KIT whole-body motion database [64]. This integration accom-
466
plishes to make the robotic washing actions more human-friendly and more
467
acceptable by the elderly users.
Jou
rna
457
26
Journal Pre-proof
5. Audio-gestural data collection
lP repro of
468
In order to be able to build accurate models for learning in our audio-
470
gestural communication module, we conducted a systematic collection of
471
multisensory data, where various experiments were designed with the in-
472
volvement and guidance of the clinical partners. During the whole process
473
of designing the data collection experiments, the clinical partners continu-
474
ously contributed with valuable feedback in order to take into account the
475
specific capabilities and needs of the elderly end-users. The experiments
476
contained multiple scenarios, e.g. for entering and exiting the shower, wash-
477
ing/scrubbing the back or the legs, for stopping or repeating a procedure, for
478
changing the temperature of the water etc. Figure 10 shows some samples
479
of the designed gestures for various bathing tasks.
rna
469
480
Jou
Figure 10: Sample gestures for the bathing HRI task.
The dataset includes data recordings in a more strict and guided con-
481
text (where the commands are predefined) and recordings of freestyle audio-
482
gestural commands while introducing various dialogue features for the inter-
483
action. Specifically, three different sessions for data collection were defined, 27
Journal Pre-proof
namely recordings of: a) 33 audio-gestural commands, performed simulta-
485
neously, b) spoken commands only and c) gesture commands only. For the
486
spoken commands we provided a short and a long version that is preceded
487
by a system activation keyword, the female name “Roberta” that exists in
488
both German and Italian language, where the actual validations with the
489
elderly users were conducted. For the gesture commands we provided more
490
than one gestures in order to include variability and thus, consider factors
491
such as a) physiological aspects of the elderly, b) intuitiveness and natural-
492
ness, c) the cognitive capacity of the elderly as well as d) the design of a
493
system that could recognize smaller or larger variations of the same com-
494
mand. Those commands, in a second phase, and for the validations with
495
the elderly end-users were narrowed down, taking into account the results
496
of the first recognition experiments (recognizability and discriminability by
497
the machine algorithms employed) and after consulting the clinical partners
498
regarding what is most suitable for the elderly end-users.
lP repro of
484
Data collection experiments were performed using the Kinect sensors in-
500
tegrated in a ROS environment and synchronized using software trigger-
501
ing, while an integrated annotation and acquisition web-interface that facil-
502
itates on-the-fly temporal ground-truth annotation for fast acquisition [65]
503
was used. For carrying out the recordings supplementary material has been
504
provided with the predefined spoken commands and the videos of the prede-
505
fined gestures. For the freestyle experiments pictures have been assembled
506
in order to guide the user so as to perform the various tasks using natural
507
language and gestures.
Jou
rna
499
28
Journal Pre-proof
5.1. Audio-Gestural Development Dataset
lP repro of
508
509
We have recorded visual data from 23 users (eight females and fifteen
510
males, aged 23-35) while performing predefined gestures, and audio data from
511
8 users while uttering predefined spoken commands in German (the users
512
were non-native German speakers, having only some beginner’s course). The
513
total number of commands for each task was: 25 and 27 gesture commands
514
for washing the legs and the back, respectively, and 23 spoken commands
515
for the core bathing tasks, i.e. washing/scrubbing/wiping the back or legs,
516
for changing base settings of the system, i.e. temperature, water flow and
517
spontaneous/emergency commands. A background model was also recorded,
518
including generic motions or gestures that are actually performed by humans
519
during bathing; so as to be able to reject out-of-vocabulary gestures/motions
520
as background actions.
521
5.2. Extended Audio-Gestural Dataset
The data have been collected from 12 native German speakers (three
523
females and nine males, aged 18-30), having three iterations/repetitions each.
524
In more detail, the extended dataset includes:
rna
522
• Calibration recording (checker-board pattern, no human subject).
526
• Human subject is given textual/visual description of the action to be
527
528
529
530
Jou
525
performed (e.g. “instruct the system to dry the legs”); for both washing positions (back/legs).
• Human subject is shown a video of predefined gestures; for both washing positions.
29
Journal Pre-proof
• Background model including random “negative” gestures.
532
• Wash/wiping motions in wash-back/legs position, including four differ-
533
lP repro of
531
ent wiping styles each (horizontal/vertical, small/big circles). • Speech commands read in wash-back/legs position.
535
Complementary high-precision VICON MX motion capture data using 10
536
cameras have been recorded for a subset of the subjects, due to the efforts
537
involved for calibrating the VICON system given the occlusions caused by the
538
I-Support setup. Specifically, seven human subjects have been recorded in
539
two recording sessions each, during the above-described recording protocol,
540
yielding a total of 1.7 TB of data in 1636 Rosbags that are available from
541
the KIT Motion Database for the purposes of the project (with VICON data
542
being available for three of the subjects).
543
6. Experimental Evaluation
rna
534
Two multinational validation studies were conducted in two European
545
pilot sites: a) Fondazione Santa Lucia (FSL) Hospital in Rome, Italy and
546
b) Bethanien Hospital in Heidelberg, Germany; including target population,
547
outcomes and indicators to evaluate the I-Support system for addressing all
548
evaluation activities. The intention of validating the system multinationally
549
is to increase the value of the users’ subjective assessments and especially
550
to evaluate the acceptability of the system by people with different habits
551
and social-ethical background. Both studies included the installation and
552
the evaluation of the various functionalities of the I-Support system in the
Jou
544
30
Journal Pre-proof
bathroom environment of the selected care facilities. For the conduction of
554
the validations both pilot sites obtained approval from the Ethics Committee.
555
lP repro of
553
The studies were realized in two rounds at each pilot site:
556
1. The first pilot testing consisted of evaluation, in dry conditions, of
557
well-defined functionalities of the first prototype of the bathing robot
558
system and was intended to obtain early feedback. This evaluation
559
round focused on a stable but limited intelligence prototype that in-
560
cluded human-robot interaction (HRI) using audio-gestural commands.
561
Feedback received from this testing round was used into the design and
562
the development process.
2. The second pilot testing round consisted of mainly summative evalua-
564
tion activities and focused on the fully functional and intelligent sys-
565
tem, thus the showering activities with water (i.e. rinsing, scrubbing),
566
including the shared control functionality of the I-Support system, the
567
integrated learning and cognition strategies, as well as the full context
568
awareness. Additionally, a water pouring scenario was also evaluated
569
examining various operation modes and also the end-users’ general sat-
570
isfaction and preferences for various controllers for the robotic motion.
571
The user group of the I-Support bathing robot is defined as persons with
572
difficulties in bathing activities, as evaluated by the bathing item of the
573
Barthel Index (0 pt. = “patient can use a bath tub, a shower, or take
574
a complete sponge bath only with assistance or supervision from another
575
person”) [66], which represents a clinically well-established index to evaluate
576
an individual’s functional disabilities in ADLs. According to this user group
Jou
rna
563
31
Journal Pre-proof
definition, participants of the evaluation studies only included persons with
578
difficulty in their ability to perform bathing activities on their own.
579
580
lP repro of
577
The two use cases evaluated included the interaction of the I-Support system with two regions of the body:
581
• Distal region that comprises lower thighs from knee joints downwards
582
and feet. Note that for washing these body parts, the user has to bend
583
forward with a high risk of losing postural control.
584
• Back region expanding from the cervical spine to the tailbone. In this case, it is practically impossible for the users to reach their back.
586
During the validations, the Kinect sensors were installed in the bathrooms
587
of the two hospital sites as shown in Fig. 1(b); incorporating the required
588
adjustments regarding their positions and angles depending on the available
589
space of each room in order to be able to monitor the elderly and the robotic
590
soft arms. Based on this data the perception unit reconstructed a 3D model
591
of the elderly and the robot and provided feedback to the system controller.
592
Then the system controller generated motion control commands and tool
593
commands and guided autonomously the soft arms to wash, scrub and rinse
594
the specific body parts.
595
6.1. Multimodal Fusion and Online A-G Command Recognition System In-
597
Jou
596
rna
585
tegration
For the online validations and in order to evaluate the System Perfor-
598
mance of the audio-gestural communication of the user with the robot, our
599
online Audio-Gestural (A-G) multimodal action recognition system was used,
600
developed in [65] using the Robotic Operating System (ROS), see Fig. 11. 32
Journal Pre-proof
The online A-G system enables the interaction between the user and the
602
robotic soft arms and thus, monitors, analyzes and predicts the user’s ac-
603
tions, giving emphasis to the command-level speech and gesture recognition.
604
Always-listening recognition is applied separately for spoken commands and
605
gestures, combining at a second level their results. A late fusion scheme is
606
used, where the individual multi-class results are combined, encoding this
607
way inter-modality agreement, using a simple rule that was found quite ef-
608
fective and fast among other approaches [58].
rna
lP repro of
601
609
610
611
612
Jou
Figure 11: Online Audio-Gestural Command Recognition System.
The overall system comprises two separate sub-systems:
1. The spoken command recognizer, a single node that performs alwayslistening speech recognition of predefined command phrases that accompany the gestures.
33
Journal Pre-proof
2. The gesture recognizer consisting of two nodes: the activity detector
614
that performs temporal localization of segments with visual activity
615
and the gesture classifier.
lP repro of
613
616
The spoken command recognition module works as described in Sec. 3.2.
617
The activity detector processes the RGB or the depth stream in a frame-by-
618
frame basis and determines the existence of visual activity in the scene, using
619
a thresholded activity “score” value. The output from both sub-systems are
620
combined at the fusion node producing a single result. Specifically, the imple-
621
mented fusion node receives the N-best results announced by the individual
622
recognizers, checking for available results every 0.5 secs by receiving periodi-
623
cally messages in order to synchronize the recognizers. A waiting period T is
624
also defined, during which the node waits to combine the incoming messages.
625
If this period expires, it is assumed that one modality either may not have
626
been activated by the user, or failed to detect the given input. In such cases,
627
the node announces single-modality results.
The A-G recognition system’s grammars (for both spoken and gesture
rna
628
command recognition) and functionalities were adapted to the specific bathing
630
tasks, delivering recognition results as ROS messages to the system’s finite
631
state machine (FSM) that: (1) decided the action to be taken after each
632
recognized command, (2) controlled the various modules and (3) managed
633
the dialogue flow by producing the right audio feedback to the user, for more
634
details see Sec. 2.5.
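The sketch below illustrates, under assumed state and command names (not the project's actual grammar), how such a finite state machine can map each recognized command to a robot action, an audio prompt, and the next dialogue state.

```python
# Hypothetical dialogue FSM: (state, command) -> (robot action, audio feedback, next state)
TRANSITIONS = {
    ("idle",    "wash_legs"): ("start_washing_legs", "Washing your legs now.", "washing"),
    ("washing", "stop"):      ("pause_motion",       "Pausing.",               "paused"),
    ("paused",  "repeat"):    ("resume_motion",      "Continuing.",            "washing"),
    ("washing", "halt"):      ("go_to_rest_pose",    "We are finished.",       "idle"),
}

class DialogueFSM:
    def __init__(self, state="idle"):
        self.state = state

    def on_command(self, command):
        """Decide the action, produce audio feedback and update the dialogue state."""
        action, feedback, next_state = TRANSITIONS.get(
            (self.state, command), ("ignore", "Sorry, I did not understand.", self.state))
        self.state = next_state
        return action, feedback
```

For example, calling `DialogueFSM().on_command("wash_legs")` would return the washing action together with its audio prompt and switch the dialogue into the (assumed) "washing" state.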
6.2. Experimental Setup and Protocol

The proposed experimental setups aimed to:
                      Validation Round I                    Validation Round II
Heidelberg
  # users             29 naive German-speaking patients     25 naive German-speaking patients
  mean age±SD         81.4±7.7 years                        77.9±7.9 years
  MMSE (max. 30 pt.)  25.3±3.1                              25.6±3.1
  evaluated commands  7 A-G commands                        7 A-G commands
  scenario            "Legs" and "Back" position            "Back" position; gesture-only and audio-gestural scenario
FSL
  # users             25 naive Italian-speaking patients    25 naive Italian-speaking patients
  mean age±SD         67.4±8.9 years                        69.4±7.5 years
  MMSE (max. 30 pt.)  27.6±2.5                              24.3±3.5
  evaluated commands  7 A-G commands                        7 A-G commands
  scenario            "Legs" and "Back" position            "Legs" position

Table 1: Overview of the validation setups for the two rounds at the two pilot sites.
1. Objectively evaluate the effectiveness of the bathing system's individual modules. The experimental protocols were therefore designed to provide as much data as possible for statistical analysis, given the frailty of the users and their limitations.
2. Assess the acceptability of the overall system by the elderly subjects. To accomplish this, the experiments were conducted under realistic conditions and included typical procedures that the users might follow during their bathing routine with the I-Support system.
3. Assess the ability of the users to complete a bathing procedure, according to their preferences, using the HRI components of the system.
4. Assess the acceptability of the system by the caregiving personnel and its ability to monitor the effectiveness and safety of the bathing procedure.
Vocabulary of A-G commands
English              Italian                German
Wash legs            Lava le gambe          Wasch meine Beine
Wash back            Lava la schiena        Wasch meinen Rücken
Scrub back           Strofina la schiena    Trockne meinen Rücken
Stop (pause)         Basta                  Stop
Repeat (continue)    Ripeti                 Noch einmal
Halt                 Fermati subito         Wir sind fertig

Table 2: Validation Round I: The audio-gestural commands that were included in the two bathing scenarios. All commands were preceded by the keyword "Roberta".
ID    Distal Region Command (Modality)    Back Region Command (Modality)
1     Wash Legs (A)                       Wash Back (A-G)
2     Stop (A)                            Halt (A-G)
3     Repeat (A)                          Scrub Back (A-G)
4     Halt (A)                            Stop (A-G)
5     Wash Legs (A-G)                     Repeat (A-G)
6     Halt (A-G)                          Halt (A-G)
7     Halt (A-G)                          Halt (A-G)

Table 3: Validation Round I: The sequence of Audio (A) and Audio-Gestural (A-G) commands performed by the participants in the validation experiments.
6.2.1. Validation Round I

During the experiments, we simulated the two bathing scenarios under dry conditions at the two pilot sites: (1) FSL Hospital and (2) Bethanien Hospital. Validation Round I followed the same evaluation study design at both pilot sites, in order to obtain comparable evaluation results. For the HRI experiments at each site, potential I-Support users were recruited based on the following main inclusion criteria: (1) dependency in bathing activities, as assessed by the bathing item of the Barthel Index [66], and (2) no severe cognitive impairment, as assessed by a score of >17 on the Mini-Mental State Examination (MMSE) [67]. Recruitment yielded a significant number of participants at both sites: 25 (mean age±SD: 67.4±8.9 years) and 29 (mean age±SD: 81.4±7.7 years), respectively. Table 1 gives an overview of the setups for the two validation rounds at the two pilot sites, where naive denotes elderly users having no experience with the system beyond a few minutes of training prior to their interaction.
The experimental protocol included a set of 7 audio (A) or audio-gestural (A-G) commands, in Italian and in German (see Table 2), in sequences that simulated realistic interaction flows for both tasks. Table 3 shows the sequence of the A and A-G commands as performed in both validation experiments. Prior to the actual testing phase, all commands were introduced to the participants by the clinical test administrator; during the experiment, the participants were guided on how to interact with the robot by showing the audio commands written on posters and by demonstrating the audio-gestural commands, instructing them to simply read or mimic them. The administrator could also intervene whenever the flow changed unexpectedly after a system failure. Additionally, a technical supervisor handled the PCs and annotated the recognition results of the system on the fly.
6.2.2. Validation Round II
A second set of validation experiments was carried out at the same pilot sites, aiming at an enhanced user experience. At Bethanien Hospital, the experiments were also designed to accompany a clinical sub-study. For the purpose of these validation studies, some of the gestural commands were redesigned based on feedback from the previous tests, in order to become easier and more intuitive for the elderly users. To address the lack of data for the newly introduced gestures, a small-scale data collection with healthy subjects was carried out prior to the beginning of the studies.
Validation Round II at FSL: 25 naive Italian-speaking patients tested the system (mean age±SD: 69.4±7.5 years). The experimental protocol in this case, too, included a set of 7 audio-gestural commands, for which Italian audio models were developed. The participants were seated in the "legs" position and the robot responded to each command with (a) audio feedback and (b) by simulating the appropriate action with the soft arm for the commands "Wash Legs", "Scrub Legs", "Stop", "Repeat" and "Halt". The exact scenario is shown in Table 4. For each participant, the administrator filled in a report sheet according to the user's performance and the system's command recognition.
Validation Round II at Bethanien: A set of 25 elderly patients was selected (mean age±SD: 77.9±7.9 years). The experimental protocol included 7 commands, shown in Table 5, to which each patient was briefly introduced. The patients were first asked to perform only the gestural part of the 7 commands (gesture-only (G) experiment) and then both the audio and the gestural part of the commands at the same time (audio-gestural (A-G) experiment). During all experiments, the participant was seated in the "back" position. After a command was performed, a short break took place to give the I-Support system the opportunity to respond to the command. In case of successful recognition, the system responded with an appropriate audio response and soft-arm action. The maximum system response time was about 3 to 5 seconds for the G and the A-G experiments, respectively.
A-G command                                        Soft-arm motion & feedback
1. –                                               the soft-arm goes to a pre-defined initial position
2. Wash my legs = "Roberta, Lava le gambe"         audio feedback is provided and the soft-arm performs a washing movement
3. Increase temperature = "Roberta, Più calda"     audio feedback is provided, the soft-arm continues the washing movement
4. Lower temperature = "Roberta, Più fredda"       audio feedback is provided, the soft-arm continues the washing movement
5. Stop = "Roberta, Fermati"                       audio feedback is provided, the soft-arm's movement is paused
6. Scrub my legs = "Roberta, Strofina le gambe"    audio feedback is provided, the soft-arm performs a scrubbing movement
7. Stop = "Roberta, Fermati"                       audio feedback is provided, the soft-arm's movement is paused
8. Repeat = "Roberta, Ancora"                      audio feedback is provided, the soft-arm resumes the scrubbing movement
9. Halt = "Roberta, subito"                        audio feedback is provided and the soft-arm goes to its rest position

Table 4: Validation Round II (FSL): Commands tested in the audio-gestural human-robot interaction scenario. The left column indicates the audio commands and the right column the audio feedback given by the system and the corresponding simulated action of the soft-arm.
If the system did not recognize the command correctly within this time interval, or the command was performed incorrectly by the participant, the test administrator asked the participant to repeat the command once more (i.e., a maximum of 14 attempts per experiment [2 attempts × 7 commands]).
Audio Command
1. Wash back = "Roberta, wasch meinen Rücken"
2. Higher temperature = "Roberta, wärmer"
3. Lower temperature = "Roberta, kälter"
4. Scrub back = "Roberta, schrubbe meinen Rücken"
5. Repeat = "Roberta, noch einmal"
6. Stop = "Roberta, stop"
7. Halt = "Roberta, wir sind fertig"

Table 5: Validation Round II (Bethanien): Commands tested in the audio-gestural human-robot interaction scenario. The audio commands are listed above; the corresponding gesture for each audio-gestural command is illustrated in the original table (Gesture Command column, not reproduced here).
6.3. Audio-Gestural Validation Experiments and Results

6.3.1. Validation Results Round I
Multimodal recognition was evaluated in terms of: (1) the multimodal Command Recognition Rate (CRR), defined as the number of commands correctly recognized by the system divided by the number of commands correctly performed by the user; (2) accuracy; and (3) the user performance/learning rate, in order to correlate, as far as possible, the system's performance with the user's experience. User performance was evaluated as the percentage of well-performed commands relative to the total number of commands tested. A gestural command was rated as not well-performed when the movement of the gesture was not performed as intended, including movement errors that seriously affected the trajectory of the gesture. An audio-gestural command was rated as not well-performed if the gesture was not performed as intended (as described above) and/or if the phrasing of the audio command was wrong. In this validation round, demanding acoustic and visual conditions were encountered, along with large variability in the performance of the A-G commands by the participants, which rendered the recognition task rather challenging. Table 6 shows the obtained CRR (%) and accuracy (%) results, which reach up to 84% and 80% for the two washing tasks, averaged across 25 (FSL) and 29 (Bethanien) users, respectively. The deviation between the two sites is due to the lower age and higher cognitive level of the FSL patients, as well as to slight differences in lighting and camera placement.
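For clarity, the sketch below computes the CRR and the user performance from per-trial annotations of the kind collected by the technical supervisor; the field names and data layout are illustrative assumptions, not the project's actual annotation format.

```python
def evaluate(trials):
    """Each trial is a dict with two boolean flags (assumed names):
    'well_performed' - the user executed the command as intended;
    'recognized'     - the system output matched the intended command."""
    performed = [t for t in trials if t["well_performed"]]
    # CRR: commands correctly recognized / commands correctly performed by the user.
    crr = 100.0 * sum(t["recognized"] for t in performed) / max(len(performed), 1)
    # User performance: well-performed commands / all commands tested.
    user_perf = 100.0 * len(performed) / max(len(trials), 1)
    return crr, user_perf
```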
Gesture recognition was expected to be more challenging while bathing the legs, due to occlusions of the hands by the robot and/or the chair. The users experienced only a limited number of false alarms (three in total, as measured at FSL), which were considered annoying, since the system triggered a response without an "actual" input.
             System Performance %                         User Performance %
             CRR %                 Accuracy %             Speech            Gestures
             L      B      Av.     L      B      Av.      L       B        L       B
FSL          80     87     83.5    86     73     79.5     98      99       81      78
Bethanien    85     74     79.5    67     77     72       91      90       84      71

Table 6: Validation Round I Results: Average Audio-Gestural Command Recognition Results; System Performance (CRR%) and User Performance (%) averaged across 25 and 29 users at the FSL and Bethanien Hospitals, respectively. (L stands for legs, B for back and Av. for average.)
Regarding the user performance, the participants performed the spoken commands successfully (over 98% and 90% accuracy for the two sites), while the average performance of the gestures was satisfactory (between 70% and over 80% for the two tasks), considering the quick training provided by the administrator. Finally, we note that the results of both modalities were somewhat degraded when the user performed the A-G commands simultaneously, due to the increased cognitive load.
Figure 12 shows indicative curves of how the users performed each gesture command on their first attempt after the short training, as measured for the two tasks at the FSL validation. We note that initially (gesture ID 5) the users either were not familiar with this type of communication or their concentration level was low, since they had been performing only spoken commands up to that point. There was, however, a tendency toward an increasing learning rate, meaning that during the experiments the users became more familiar with the multimodal commands and executed them more accurately, indicating the intuitiveness of this HRI modality. Especially for commands such as "Halt", which was repeated several times (IDs 4, 6, 7) during the washing sequence, the command performance of the users reached levels higher than 90%.
Figure 12: Validation Round I Results (FSL): Recognition statistics per command. The vertical red lines distinguish the command sequences (see the corresponding IDs in Table 3) for the tasks "washing the legs" and "washing the back".
This observation is highly important, since we can conclude that simple combinations of spoken and gestural commands are both memorable and suitable for an elderly user's communication with an assistive robotic system.
6.3.2. Validation Results Round II
During Validation Round II at FSL, we put the emphasis on the overall system integration, such as the control flow of the bathing procedure, where the system performance in terms of CRR reached up to 83.5% for the legs position. The recognition performance of the individual audio and gesture modalities was 82.9% and 48.7%, respectively, indicating, as in the previous experiments, that the synergy between complementary modalities can actually enhance the overall results.
Figure 13: Validation Round II Results (FSL): User Performance (%) and System Performance (%) rates per command, showing the percentage of well-performed commands, i.e., commands performed as intended (User Performance), and the percentage of system recognitions (System Performance) for each of the following showering task commands: Wash Legs (WL), Temperature Up (TU), Temperature Down (TD), 1st Stop (S 1), Scrub Legs (SL), 2nd Stop (S 2), Repeat (R), Halt (H). Note: audio-gestural commands were considered well-performed when both the audio and the gestural command were performed as intended.
Figure 13 reports results on User Performance and System Performance (%, per command), i.e., the rates of well-performed commands and of commands recognized by the system. Since a second attempt was made when a command was not well performed and/or was not recognized, the reported percentages refer, for both User and System Performance, to the rates of well-performed or recognized commands at the first attempt and when the first and the possible second attempts are summed. Regarding user performance, since in this validation the commands were audio-gestural, a command is considered well-performed when both the audio and the gestural part were performed as intended. Regarding system performance, the percentages show the system's recognition rates regardless of the user's performance. Indeed, we noticed that even when a command was not well-performed, the system was often able to recognize it.
The results for user and system performance are quite different and thus deserve careful discussion. As for user performance, the results show a wide range of rates of well-performed commands, with a minimum value of 40% ("Wash Legs" command, 1st attempt) and a maximum value of 88.5% (second "Stop" command, 1st + 2nd attempt). Most of the time, the users had more difficulty in performing the gestural command than the audio one, so that a second attempt was usually required due to insufficient performance. Interestingly, the "Stop" command, which is performed twice throughout the whole showering session, presents different rates between the first (on average 51%) and the second (on average 88.25%) attempt. This might reflect the difficulty of performing gestural commands and the need for repetition in order to learn them better. Regarding the system performance, the recognition rates show a narrower range, with a minimum value of 72% (for the commands "Wash Legs", "Temperature Up", and "Repeat", 1st attempt) and a maximum value of 100% (for the commands "Scrub Legs" and the first/second "Stop", 1st + 2nd attempt).
Overall System Usability: During Validation Round II, the participants completed the System Usability Scale (SUS) to evaluate their subjective perception of the overall usability of the I-Support bathing robot. The SUS is a well-established, reliable and valid 10-item scale, which can be quickly and easily administered to determine the user-perceived usability (effectiveness, efficiency, and satisfaction) of technical systems [68]. Its items are scored on a 5-point Likert-type scale ranging from "strongly agree" to "strongly disagree". The combined scores of the individual SUS items are converted into a total SUS score ranging from 0 to 100, with a higher score indicating better usability.
Figure 14: Percentage (%) distribution of participants’ ratings in the different SUS score categories.
SUS scores can be classified as "worst imaginable" (0–25 points), "poor" (25–39 points), "acceptable" (39–52 points), "good" (52–73 points), "excellent" (73–85 points), and "best imaginable" (85–100 points) perceived usability [69].
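As a reminder of how the 0-100 score is obtained, the sketch below applies Brooke's standard SUS scoring rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5) and maps the total to the adjective categories of [69]; the handling of scores falling exactly on a category boundary is an assumption.

```python
def sus_score(responses):
    """responses: list of 10 Likert answers (1-5), item 1 first; returns a 0-100 score."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd vs. even items (standard SUS rule)
    return total * 2.5

def sus_category(score):
    """Map a SUS score to the adjective categories quoted above."""
    bins = [(25, "worst imaginable"), (39, "poor"), (52, "acceptable"),
            (73, "good"), (85, "excellent")]
    for upper, label in bins:
        if score <= upper:   # boundary handling is a choice, not specified in the text
            return label
    return "best imaginable"
```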
The SUS score across participants (n = 25) averaged 63.8±12.1 points, indicating an overall "good" usability of the I-Support system tested during the validation experiments. The SUS scores ranged from a minimum of 42.5 points to a maximum of 87.5 points, suggesting relatively homogeneous SUS results. When assigning the scores to the different rating categories, as can be seen in Fig. 14, most of the participants gave rather positive feedback on the I-Support usability. Specifically, 12% of the participants rated the I-Support system as "acceptable", 48% as "good", 28% as "excellent", and 12% as "best imaginable".
Validation Round II results at Bethanien: The user performance and the system performance were rated and captured through standardized observation by the clinical test administrator, using a report sheet for each modality. The user performance of the G and the A-G commands was rated using the same assessment strategy for both modalities. Table 7 shows a detailed analysis of the system performance, where we observe significantly higher CRR (%) results for the A-G experiment compared to the G-only experiment. The two inner columns show the monomodal results for the audio and gesture modalities, obtained separately during the multimodal (A-G) scenario. The last column, which shows the fusion (A-G) results, confirms that the use of both modalities can actually enhance the results. From the observations we made during the validation, we assume that the users probably paid much more attention to the gestural than to the audio commands.
Finally, Fig. 15 shows results for the system and the user performance per command, where we note that the per-gesture performance for the gesture-only experiment reached a CRR of up to 68.7%. Regarding the individual per-command results, the system achieved a maximum value of 83% for the "Wash" command, while the performance of the users reached a maximum value of 66% for "Halt". In this case, too, we observe that the system successfully recognized various commands that were not well-performed by the users (e.g., "Wash" 83% vs. 34% and "Repeat" 56% vs. 26% for System and User Performance, respectively), which indicates the importance of building good models for learning, as well as of designing models that are able to recognize smaller or larger variations of the same command.
                    Gesture-only      Audio-Gestural
                                      audio      gesture      fusion
Without training    59.6%             79.5%      48.0%        86.2%

Table 7: Validation Round II Results (Bethanien): Average System Performance results (CRR%) for the G-only and the A-G experiments, and monomodal results obtained during the multimodal A-G experiment.

Figure 15: Validation Round II Results (Bethanien): System Performance (CRR%) and User Performance (%) results per command.

Water Pouring Scenario at Bethanien
The main goal of the water pouring scenario, conducted at Bethanien, was to study the ability of the elderly to control the showering process using different operation modes of the robotic soft-arm, which provide different amounts of assistance during bathing of the upper back region. Specifically, the three operation modes to be evaluated were:
• Autonomous operation: The soft-arm automatically executed the motions needed to provide water pouring for the full coverage of the upper back region. In this mode, the participant had no control of the robot motion.
• Shared control: The participant could use an input device to issue simple motion commands (i.e., left vs. right and up vs. down), which were translated into a number of high-level discrete commands for the I-Support control system (i.e., the soft-arm moved left/right or up/down), while the system provided assistance in terms of audio signals indicating that: (1) the participant's command was recognized and (2) the command was successfully executed, meaning that the participant could issue the next motion command. Further assistance was provided by ensuring that the upper back region was not exceeded (i.e., robot motions were restrained to remain within the standardized target body area shown in Fig. 16); a minimal sketch of this kind of command handling is given after the list. In this mode, the participant had predominant, but not full, control over the robot motion.
• Tele-manipulation: The participant could issue motion commands (i.e., up vs. down and left vs. right) using the input device, similar to the shared control mode. In this mode, however, the system neither provided the audio signals for operating assistance nor constrained the robot motion to the upper back region. Consequently, the participant had full control of the robot motion.
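The following is a minimal sketch of how a shared-control layer of this kind can be realized: discrete left/right/up/down commands move a target index over the 2 x 3 grid of points of Fig. 16, the motion is clipped so the water stream never leaves the standardized upper back region, and audio cues acknowledge recognition and completion. The grid size, cue texts and function names are illustrative assumptions, not the actual I-Support controller.

```python
# Hypothetical shared-control layer for discrete soft-arm commands.
GRID_ROWS, GRID_COLS = 2, 3          # six target points on the upper back (Fig. 16)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class SharedController:
    def __init__(self, row=0, col=GRID_COLS - 1):    # assumed start at the top right
        self.row, self.col = row, col

    def issue(self, command, say, move_arm):
        """Execute one discrete command with workspace clipping and audio cues."""
        if command not in MOVES:
            return
        say("Command recognized.")                    # cue (1): command recognized
        dr, dc = MOVES[command]
        # Clip to the standardized target body area (the shared-control assistance).
        self.row = min(max(self.row + dr, 0), GRID_ROWS - 1)
        self.col = min(max(self.col + dc, 0), GRID_COLS - 1)
        move_arm(self.row, self.col)                  # drive the soft-arm to the target point
        say("Done, you can give the next command.")   # cue (2): command executed
```

In the tele-manipulation mode, the same command mapping would be used without the clipping and without the audio cues.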
Figure 16: Standardized target body area (upper back region) with the six target points for which the soft-arm provided water pouring. The red-colored cross represents the starting and final position; the black arrows indicate the path the water stream took on the participant's back region.

Training and Comparison of Input Devices: The main question during the first stage of this study concerned the user satisfaction and acceptability of the input device for the elderly users of the I-Support system. A motion-tracking input method, based on technologies available in any smartwatch as presented in Sec. 2.3, was compared against a more typical button input device. For the first input method, a motion-tracking hand-wearable device was constructed that was strapped on the external side of the palm
(Fig. 17), containing an Inertial Measurement Unit (IMU) (with 3-axis accelerometers, 3-axis gyroscopes and a magnetometer), a micro-controller, and a Bluetooth transmitter for wireless operation. The device tracked the motion of the user's hand, which was transmitted to a central controller and translated by it into the desired motion command for the control of the soft-arm. A thimble with an embedded pressure sensor acted as an activation switch for the tracker device (Fig. 17(b)): only while the user pressed the thimble was the tracker activated and the motion of the hand recorded and translated into the desired motion command.

Figure 17: (a) Motion-tracking hand-wearable device. (b) Thimble with the embedded pressure sensor for activation.
The second input device was a commercial waterproof computer keyboard, with which the user could issue a motion command for the soft-arm by pressing the appropriate arrow button (e.g., upward movement = up-arrow button).
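To make the mapping from the two input devices to the discrete motion commands concrete, the sketch below gives one possible implementation: the wearable tracker emits a command only while the thimble is pressed, and the keyboard simply maps arrow keys to the same command set. The dead-zone threshold, key names, units and axis conventions are assumptions for illustration.

```python
import math

def hand_motion_to_command(dx, dy, pressed, deadzone=0.05):
    """Translate a thimble-gated hand displacement (assumed metres, palm frame)
    into one of the discrete soft-arm commands, or None."""
    if not pressed:                        # thimble not pressed: tracker inactive
        return None
    if math.hypot(dx, dy) < deadzone:      # ignore small, unintentional motions
        return None
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

# Waterproof-keyboard alternative: arrow key name -> the same discrete command set.
KEY_TO_COMMAND = {"Up": "up", "Down": "down", "Left": "left", "Right": "right"}

def key_to_command(key):
    return KEY_TO_COMMAND.get(key)
```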
Both input methods were introduced to the participants as options for controlling the soft-arm motion. The participants were initially trained in both options with a short computer game, which required catching a red cube with a user-controlled green cube. If the red cube was caught, it would randomly jump to another field on the "game board", and the participant was instructed to catch it again as many times and as fast as he/she could for a period of 1 min. The participants were then asked which of the two controllers they found easier and would therefore prefer to use for controlling the motion of the soft-arm during the water pouring scenario.
After the training on the two controllers, and independently of the participants' cognitive status, all participants (100%) stated that (1) providing motion commands was much easier with the computer keyboard than with the motion-tracking hand-wearable device, and (2) they preferred to use the computer keyboard for controlling the soft-arm in the water pouring scenario. Therefore, the subsequent water pouring scenario was performed by all participants with the waterproof computer keyboard.
Water Pouring Experiments: The second stage of this study was conducted inside the showering cabin under real water pouring conditions. Initially, the participant (wearing a swimsuit or swimming trunks) was seated on the motorized chair and the water temperature was set according to his/her preferences. After that, the operation modes were tested in the following order: (1) autonomous operation, (2) shared control, and (3) tele-manipulation.

The test administrator explained that in the first, autonomous operation mode (duration 1 min), the soft-arm would provide water pouring fully automatically for the upper back region, following the predefined path shown in Fig. 16, with the starting point at the top right.
After the autonomous operation was completed, the participant was introduced to the shared control mode (duration 2 min). In this mode, the participant controlled the motion of the soft-arm's water stream on his/her own, using the controller chosen after the training game, while the system also provided audio signals as previously described. In addition, it was explained that further assistance would be provided by restricting the robot motion to the standardized target body area. Finally, the participant was instructed to cover the entire upper back region (i.e., all six target points) as fast as possible using the shared control mode.
Operation Mode            Task Effectiveness [%] (mean ± SD)
Autonomous operation      100.0 ± 0.0
Shared control            79.4 ± 18.2
Tele-manipulation         64.4 ± 19.4

Table 8: Task effectiveness in the water pouring scenario with the different operation modes.

In the last round of testing, the participant used the tele-manipulation mode (duration 2 min), in which he/she also controlled the motion of the soft-arm's water stream on his/her own, using the same controller as before and trying to cover the entire back region as fast as possible. The participant now had full motion control over the soft-arm, and the system did not provide any further assistance.
In the water pouring scenario, task effectiveness with the different operation modes was assessed by a measure of area coverage, defined as the percentage of the predefined target body area covered with water during the standardized time period (e.g., 3 out of 6 target points covered with water corresponds to 50% coverage).
Maximum task effectiveness was achieved for all participants when the autonomous operation mode was used, indicating a very good and reliable system performance. Task effectiveness substantially decreased with the shared control mode and even more with the tele-manipulation mode, as compared to the fully autonomous mode (see Table 8). Fourteen participants (66.7%) in the shared control mode and 19 participants (90.5%) in the tele-manipulation mode did not achieve the maximum possible coverage.
Figure 18: Percentage (%) distribution of participants' ratings in the different SUS score categories.

Our results indicate that the autonomous operation mode of the robotic soft-arm of the bathing robot is highly effective and reliable in providing water pouring for a predefined body area. When the participants were given more control over the soft-arm, task effectiveness gradually decreased with the lower assistance provided by the bathing robot. These findings suggest that full system autonomy provides the preferred mode of operation for this group of the elderly population. A more detailed analysis of the differences in task effectiveness (and also in user satisfaction) between the different operation modes falls within the scope of another publication focusing further on the findings of the clinical validation study [70].

Overall System Usability: Finally, the participants completed the
System Usability Scale (SUS) to evaluate their subjective perception of the overall usability of the I-Support bathing robot system.

The SUS score across the participants who completed the water pouring scenario (n = 22) averaged 60.7 ± 23.0 points, indicating an overall "good" usability of the I-Support system tested during the validation experiments. The SUS scores ranged from a minimum of 0 points to a maximum of 100 points, suggesting a large heterogeneity in the SUS results. However, when assigning the scores to the different rating categories (Fig. 18), most of the participants gave rather positive feedback on the I-Support usability. More than 81.8% of the participants rated the I-Support system between "acceptable" and "best imaginable", while less than 18.2% rated it between "poor" and "worst imaginable".
7. Conclusion
This paper presents the I-Support robotic platform, a human-centered robotic bathing system for smart assisted living, which provides assistance to the frail elderly population so that they can safely and independently complete an entire sequence of bathing tasks, such as washing their back and their lower limbs. To achieve this target, advanced modules of cognition, sensing, context awareness and actuation were developed during the course of the project and seamlessly integrated into the robotic system, including a functional soft-arm prototype and the adaptation of a cost-effective robotic chair. Additionally, we contribute a new multimodal audio-gestural dataset and a suite of tools used for data acquisition, which have been used for the development and modeling of our high-accuracy, real-time, state-of-the-art multimodal action recognition module for the analysis, monitoring and prediction of user actions. We experimentally validated the I-Support system in two clinical validation studies conducted at two European pilot sites, which showcased very good system performance, thus achieving effective and natural interaction and communication, through audio-gestural commands, between the users and the assistive robotic platform. Regarding the task effectiveness in the water pouring scenario, the results indicate that the robotic soft-arm is highly effective and reliable and that full system autonomy is preferred by the elderly population. Finally, the two validation studies also demonstrated high acceptance of the overall system usability by the elderly end-users.
Acknowledgment
This research work was supported by the EU under the project I-SUPPORT with grant H2020-643666. The authors would like to thank Dr. Vincenzo Genovese and Mariangela Manti (SSSA), Holger Roßberg and Dr. Inga Schlömer (FRA-UAS), Dr. Vassilis Pitsikalis (NTUA), and all students/researchers who were involved in the data collection process.
References
[1] World Health Organization, World report on ageing and health, World Health Organization, 2015.
[2] S. H. Zarit, K. E. Reever, J. Bach-Peterson, Relatives of the impaired elderly: correlates of feelings of burden, The Gerontologist 20 (1980) 649–655.
[3] S. McFall, B. H. Miller, Caregiver burden and nursing home admission of frail elderly persons, Journal of Gerontology 47 (1992) 73–79.
[4] D. D. Dunlop, S. L. Hughes, L. M. Manheim, Disability in activities of daily living: patterns of change and a hierarchy of disability, American Journal of Public Health 87 (1997) 378–383.
[5] S. Katz, A. Ford, R. Moskowitz, B. Jackson, M. Jaffe, Studies of illness in the aged: The index of ADL: a standardized measure of biological and psychosocial function, JAMA 185 (1963) 914–919.
[6] M. Porta, A Dictionary of Epidemiology, Oxford University Press, 2014.
[7] J. C. Millán-Calenti, J. Tubío, S. Pita-Fernández, I. González-Abraldes, T. Lorenzo, T. Fernández-Arruty, A. Maseda, Prevalence of functional disability in activities of daily living (ADL), instrumental activities of daily living (IADL) and associated factors, as predictors of morbidity and mortality, Archives of Gerontology and Geriatrics 50 (2010) 306–310.
[8] S. Ahluwalia, T. Gill, D. Baker, T. Fried, Disaggregating activities of daily living limitations for predicting nursing home admission, Health Services Research 50 (2015) 560–578.
[9] J. Wiener, R. Hanley, R. Clark, J. Van Nostrand, Measuring the activities of daily living: Comparisons across national surveys, Journal of Gerontology 45 (1990) 229–237.
[10] S. Ahluwalia, T. Gill, D. Baker, T. Fried, Perspectives of older persons on bathing and bathing disability: A qualitative study, Journal of the American Geriatrics Society 58 (2010) 450–456.
[11] C. Balaguer, A. Gimenez, A. Huete, A. Sabatini, M. Topping, G. Bolmsjo, The MATS robot: service climbing robot for personal assistance, IEEE Robotics & Automation Magazine 13 (2006) 51–58.
[12] T. Hirose, S. Fujioka, O. Mizuno, T. Nakamura, Development of hair-washing robot equipped with scrubbing fingers, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2012), pp. 1970–1975.
[13] Y. Tsumaki, T. Kon, A. Suginuma, K. Imada, A. Sekiguchi, D. Nenchev, H. Nakano, K. Hanada, Development of a skincare robot, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2008), pp. 2963–2968.
[14] M. Topping, An overview of the development of Handy 1, a rehabilitation robot to assist the severely disabled, Artificial Life and Robotics 4(4) (2000) 188–192.
[15] M. Hillman, K. Hagan, S. Hagan, J. Jepson, R. Orpwood, The Weston wheelchair mounted assistive robot - the design story, Robotica 20 (2002) 125–132.
[16] B. J. F. Driessen, H. G. Evers, J. A. v Woerden, Manus - a wheelchair-mounted rehabilitation robot, Proc. of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 215 (2001) 285–290.
[17] ADL-Solutions, Oasis seated shower system, 2016. http://www.adl-solutions.com/Oasis-Seated-Shower-System-for-Optimal-Accessible-Bathing-Tempe-AZ.html.
[18] RoboticsCare, Robotics Care Poseidon, 2016. http://www.roboticscare.com/robotics-care-poseidon/.
[19] L. Yang, L. Zhang, H. Dong, A. Alelaiwi, A. El Saddik, Evaluating and improving the depth accuracy of Kinect for Windows v2, IEEE Sensors Journal 15 (2015) 4275–4285.
[20] K. Khoshelham, S. O. Elberink, Accuracy and resolution of Kinect depth data for indoor mapping applications, Sensors 12 (2012) 1437–1454.
[21] S. Pillai, S. Ramalingam, J. J. Leonard, High-performance and tunable stereo reconstruction, in: Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA-2016), pp. 3188–3195.
[22] M. A. Goodrich, A. C. Schultz, Human-Robot Interaction: a survey, Found. Trends Human-Computer Interaction 1 (2007) 203–275.
[23] K. Davis, E. Owusu, V. Bastani, L. Marcenaro, J. Hu, C. Regazzoni, L. Feijs, Activity recognition based on inertial sensors for ambient assisted living, in: Proc. Int'l Conf. on Information Fusion (FUSION-2016).
[24] R. Kachouie, S. Sedighadeli, R. Khosla, M.-T. Chu, Socially assistive robots in elderly care: A mixed-method systematic literature review, Int'l Jour. Human-Computer Interaction 30 (2014) 369–393.
[25] D. Johnson, R. Cuijpers, J. Juola, E. Torta, M. Simonov, A. Frisiello, M. Bazzani, W. Yan, C. Weber, S. Wermter, N. Meins, J. Oberzaucher, P. Panek, G. Edelmayer, P. Mayer, C. Beck, Socially assistive robots: a comprehensive approach to extending independent living, Int'l Journal of Social Robotics 6 (2014) 195–211.
[26] E. Efthimiou, S.-E. Fotinea, T. Goulas, A.-L. Dimou, M. Koutsombogera, V. Pitsikalis, P. Maragos, C. Tzafestas, The MOBOT platform - showcasing multimodality in human-assistive robot interaction, in: Proc. Int'l Conf. on Universal Access in Human-Computer Interaction, Vol. 9738 of Lecture Notes in Computer Science, Springer Int'l Publishing, 2016, pp. 382–391.
[27] F. Rudzicz, R. Wang, M. Begum, A. Mihailidis, Speech interaction with personal assistive robots supporting aging at home for individuals with Alzheimer's disease, ACM Trans. Access. Comput. 7 (2015) 1–22.
[28] A. Kötteritzsch, B. Weyers, Assistive technologies for older adults in urban areas: A literature review, Cognitive Computation 8 (2016) 299–317.
[29] A. Zlatintsi, I. Rodomagoulakis, V. Pitsikalis, P. Koutras, N. Kardaris, X. Papageorgiou, C. Tzafestas, P. Maragos, Social Human-Robot Interaction for the elderly: Two real-life use cases, in: Proc. Int'l Conf. on Human-Robot Interaction (HRI-17), March 2017.
[30] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2015), pp. 3431–3440.
[31] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, arXiv preprint arXiv:1606.00915 (2016).
[32] L.-C. Chen, Y. Yang, J. Wang, W. Xu, A. L. Yuille, Attention to scale: Scale-aware semantic image segmentation, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2016).
[33] F. Xia, P. Wang, L. Chen, A. L. Yuille, Zoom better to see clearer: Human part segmentation with auto zoom net, CoRR abs/1511.06881 (2015).
[34] X. Liang, X. Shen, J. Feng, L. Lin, S. Yan, Semantic object parsing with graph LSTM, in: Proc. European Conf. on Computer Vision, Springer, 2016, pp. 125–143.
[35] S.-E. Wei, V. Ramakrishna, T. Kanade, Y. Sheikh, Convolutional pose machines, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2016), pp. 4724–4732.
[36] Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields, arXiv preprint arXiv:1611.08050 (2016).
[37] C. Laschi, B. Mazzolai, M. Cianchetti, Soft robotics: Technologies and systems pushing the boundaries of robot abilities, Science Robotics 1 (2016).
[38] L. Wang, F. Iida, Deformation in soft-matter robotics: A categorization and quantitative characterization, IEEE Robotics & Automation Magazine 22 (2015) 125–139.
[39] G. S. Chirikjian, J. W. Burdick, A hyper-redundant manipulator, IEEE Robotics & Automation Magazine 1 (1994) 22–29.
[40] I. D. Walker, Continuous backbone "continuum" robot manipulators, ISRN Robotics 2013 (2013).
[41] F. Renda, C. Laschi, A general mechanical model for tendon-driven continuum manipulators, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2012), pp. 3813–3818.
[42] M. D. Grissom, V. Chitrakaran, D. Dienno, M. Csencits, M. Pritts, B. Jones, W. McMahan, D. Dawson, C. Rahn, I. Walker, Design and experimental testing of the OctArm soft robot manipulator, in: Defense and Security Symposium, Int'l Society for Optics and Photonics, 2006, pp. 62301–62301.
[43] M. Manti, A. Pratesi, E. Falotico, M. Cianchetti, C. Laschi, Soft assistive robot for personal care of elderly people, in: Proc. IEEE Int'l Conf. on Biomedical Robotics and Biomechatronics (BioRob-2016), pp. 833–838.
[44] Y. Ansari, M. Manti, E. Falotico, Y. Mollard, M. Cianchetti, C. Laschi, Towards the development of a soft manipulator as an assistive robot for personal care of elderly people, Int'l Journal of Advanced Robotic Systems 14(2) (2017).
[45] Festo, Bionic Motion Robot, 2017. https://www.festo.com/group/en/cms/12747.htm.
[46] B. Bardou, P. Zanne, F. Nageotte, M. De Mathelin, Control of a multiple sections flexible endoscopic system, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2010), pp. 2345–2350.
[47] M. Manti, V. Cacucciolo, M. Cianchetti, Stiffening in soft robotics: A review of the state of the art, IEEE Robotics & Automation Magazine 23 (2016) 93–106.
[48] M. Mahvash, P. E. Dupont, Stiffness control of a continuum manipulator in contact with a soft environment, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2010), pp. 863–870.
[49] D. McNeill, Hand and Mind: What Gestures Reveal about Thought, Univ. of Chicago Press, 1996.
[50] J. Cassell, A framework for gesture generation and interpretation, in: Computer Vision in Human-Machine Interaction, Cambridge Univ. Press, 1998.
[51] T. L. Chen, M. Ciocarlie, S. Cousins, P. M. Grice, K. Hawkins, K. Hsiao, C. C. Kemp, C. King, D. A. Lazewatsky, A. E. Leeper, H. Nguyen, A. Paepcke, C. Pantofaru, W. D. Smart, L. Takayama, Robots for humanity: using assistive robotics to empower people with disabilities, IEEE Robotics & Automation Magazine 20 (2013) 30–39.
[52] D. Vasquez, P. Stein, J. Rios-Martinez, A. Escobedo, A. Spalanzani, C. Laugier, Human Aware Navigation for Assistive Robotics, Springer International Publishing, Heidelberg, pp. 449–462.
[53] Y. Gu, H. Do, Y. Ou, W. Sheng, Human gesture recognition through a Kinect sensor, in: Proc. IEEE Int'l Conf. on Robotics and Biomimetics (ROBIO-2012), pp. 1379–1384.
[54] V. Genovese, A. Mannini, A. M. Sabatini, Smartwatch step counter for slow and intermittent ambulation, IEEE Access 5 (2017) 13028–13037.
[55] H. Wang, C. Schmid, Action recognition with improved trajectories, in: Proc. IEEE Int'l Conf. on Computer Vision (ICCV-13).
[56] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, C. Schmid, Evaluation of local spatio-temporal features for action recognition, in: Proc. British Machine Vision Conf. (BMVC-09).
[57] I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR-08).
[58] I. Rodomagoulakis, N. Kardaris, V. Pitsikalis, E. Mavroudi, A. Katsamanis, A. Tsiami, P. Maragos, Multimodal human action recognition in assistive human-robot interaction, in: Proc. Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP-16).
[59] A. Zlatintsi, I. Rodomagoulakis, P. Koutras, A. C. Dometios, V. Pitsikalis, C. S. Tzafestas, P. Maragos, Multimodal signal processing and learning aspects of Human-Robot Interaction for an assistive bathing robot, in: Proc. Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP-2018), Calgary, Canada.
[60] Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2017).
[61] A. C. Dometios, X. S. Papageorgiou, A. Arvanitakis, C. S. Tzafestas, P. Maragos, Real-time end-effector motion behavior planning approach using on-line point-cloud data towards a user-adaptive assistive bath robot, in: Proc. Int'l Conf. on Intelligent Robots and Systems (IROS-2017).
[62] A. C. Dometios, Y. Zhou, X. S. Papageorgiou, C. S. Tzafestas, T. Asfour, Vision-based online adaptation of motion primitives to dynamic surfaces: Application to an interactive robotic wiping task, IEEE Robotics and Automation Letters 3 (2018) 1410–1417.
[63] Y. Zhou, M. Do, T. Asfour, Coordinate change dynamic movement primitives - a leader-follower approach, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2016), pp. 5481–5488.
[64] C. Mandery, O. Terlemez, M. Do, N. Vahrenkamp, T. Asfour, Unifying representations and large-scale whole-body motion databases for studying human motion, IEEE Transactions on Robotics 32 (2016) 796–809.
[65] N. Kardaris, I. Rodomagoulakis, V. Pitsikalis, A. Arvanitakis, P. Maragos, A platform for building new human-computer interface systems that support online automatic recognition of audio-gestural commands, in: Proc. ACM Multimedia Conf. (ACM MM-16).
[66] F. Mahoney, D. Barthel, Functional evaluation: The Barthel Index, Maryland State Medical Journal 14 (1965) 61–65.
[67] M. F. Folstein, S. E. Folstein, P. R. McHugh, "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician, Journal of Psychiatric Research 12 (1975) 189–198.
[68] J. Brooke, SUS: a "quick and dirty" usability scale, in: Usability Evaluation in Industry, Taylor & Francis, London, 1986.
[69] A. Bangor, P. Kortum, J. Miller, Determining what individual SUS scores mean: Adding an adjective rating scale, Journal of Usability Studies 4 (2009) 114–123.
[70] C. Werner, A. Dometios, P. Maragos, C. Tzafestas, J. Bauer, K. Hauer, Evaluating the task effectiveness and user satisfaction with different operation modes for an assistive bathing robot in older adults with bathing disability, submitted to Assistive Technology (2019).
Highlights:
- I-Support bathing robot: a prototype integrated robotic system aiming at supporting new aspects of assisted daily-living activities in a real-life scenario.
- A variety of innovative technologies and a set of advanced modules of sensing, cognition, actuation and control have been developed.
- Multimodal action recognition system to monitor, analyze and predict user actions with a high level of accuracy and detail in real time, interpreted as robotic tasks.
- Analysis of human actions through the project's multimodal audio-gestural dataset led to the successful modelling of human-robot communication, achieving an effective and natural interaction between users and the assistive robotic platform.
- Two validation studies at two clinical pilot sites, conducted under realistic operating conditions, showed high acceptability regarding the system usability by the elderly end-users.
Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
Dr. Athanasia Zlatintsi (F) received the Ph.D. degree in Electrical and Computer Engineering from NTUA in 2013 and her M.Sc. in Media Technology from the Royal Institute of Technology (KTH, Stockholm, Sweden) in 2006. Since 2007 she has been a researcher at the Computer Vision, Speech Communication & Signal Processing Group (CVSP) at NTUA, working on different projects, funded by the European Commission and the Greek Ministry of Education, in the areas of music and audio signal processing and multimodal Human-Computer/Robot Interaction. Her research contributions include the development of efficient algorithms (using nonlinear methods) for the analysis of the structure and the characteristics of musical signals, as well as for the detection of saliency in audio signals in general and for the creation of audio and music summaries.