
I-SUPPORT: A Robotic Platform of an Assistive Bathing Robot for the Elderly Population

A. Zlatintsi^a,*, A. C. Dometios^a, N. Kardaris^a, I. Rodomagoulakis^a, P. Koutras^a, X. Papageorgiou^a, P. Maragos^a, C. S. Tzafestas^a, P. Vartholomeos^b, K. Hauer^c, C. Werner^c, R. Annicchiarico^d, M. G. Lombardi^d, F. Adriano^d, T. Asfour^e, A. M. Sabatini^f, C. Laschi^f, M. Cianchetti^f, A. Güler^g,1, I. Kokkinos^h,1, B. Klein^i, and R. López^j

a Institute of Communication and Computer Systems (ICCS), National Technical University of Athens (NTUA), Greece
b OMEGA Technology, Athens, Greece
c Bethanien Hospital, Germany
d Fondazione Santa Lucia, Rome, Italy
e Karlsruhe Institute of Technology (KIT), Germany
f Scuola Superiore Sant'Anna (SSSA), Pisa, Italy
g Department of Computing, Imperial College London, UK
h Department of Computer Science, University College London, UK
i Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
j Robotnik Automation, SLL, Valencia, Spain

* Corresponding author: Athanasia Zlatintsi, School of ECE, National Technical Univ. of Athens (NTUA), 15773 Athens, Greece, e-mail: [email protected]
1 During this work I. Kokkinos and R. A. Güler were working at the French Institute for Research in Computer Science and Automation (INRIA), France.

Abstract

In this paper we present a prototype integrated robotic system, the I-Support bathing robot, that aims at supporting new aspects of assisted daily-living activities in a real-life scenario. The paper focuses on describing and evaluating key novel technological features of the system, with emphasis on the cognitive human-robot interaction modules and their evaluation through a series of clinical validation studies. The I-Support project as a whole has envisioned the development of an innovative, modular, ICT-supported service robotic system that assists frail seniors to safely and independently complete an entire sequence of physically and cognitively demanding bathing tasks, such as properly washing their back and their lower limbs. A variety of innovative technologies have been researched and a set of advanced modules of sensing, cognition, actuation and control have been developed and seamlessly integrated to enable the system to adapt to the abilities of the target population. These technologies include: human activity monitoring and recognition, adaptation of a motorized chair for safe transfer of the elderly in and out of the bathing cabin, a context awareness system that provides full environmental awareness, as well as a prototype soft robotic arm and a set of user-adaptive robot motion planning and control algorithms. This paper focuses in particular on the multimodal action recognition system, developed to monitor, analyze and predict user actions in real time with a high level of accuracy and detail, which are then interpreted as robotic tasks. In the same framework, the analysis of human actions made available through the project's multimodal audio-gestural dataset has led to the successful modelling of human-robot communication, achieving an effective and natural interaction between users and the assistive robotic platform. In order to evaluate the I-Support system, two multinational validation studies were conducted under realistic operating conditions in two clinical pilot sites. Some of the findings of these studies are presented and analysed in the paper, showing good results in terms of: (i) high acceptability regarding the system usability by this particularly challenging target group, the elderly end-users, and (ii) overall task effectiveness of the system in different operating modes.

Keywords: Human-Robot Communication, Assistive Human-Robot Interaction (HRI), Bathing robot, Multimodal dataset, Audio-gestural command recognition, Online validation with elderly users

1. Introduction

Advanced countries with well-organized and modern health care systems tend to become aging societies, according to the World Health Organization's research on health and ageing [1]. The percentage of the population with special needs for nursing attention (including people with disabilities) is significant and due to grow. Health care experts are called on to support these people during the performance of Activities of Daily Living (ADLs) such as dressing, eating and showering, inducing a great financial burden on both the families [2] and the caregivers [3]. Great research effort has been spent over the last decades [4, 5, 6, 7] on studying and classifying the functional disabilities of older adults and associating the latter with basic factors of morbidity and mortality.

Personal care (showering or bathing), which is crucial for a person's hygiene, is among the first ADLs to incommode an elderly person's life [7], and ADL difficulties in bathing or showering represent the strongest predictor of subsequent institutionalization in older adults [8]. Older adults require assistance in bathing or showering more frequently than for any other ADL [9]. As bathing is a highly intimate ADL, the wish for independence from the personal bathing assistance of caregivers for as long as possible is, however, not unusual in older adults [10]. An assistive bathing robot may thus represent an opportunity for older adults with bathing disability to take care of themselves in bathing, to preserve their privacy, and to reduce the burden of caregivers.

During the last decades, an enormous number of socially interactive robots have been developed, making the field of Human-Robot Interaction (HRI) a truly motivating challenge. The robotics community is attempting to tackle the challenge of unattended nursing by developing flexible and modular assistive devices that aim to cover the needs for support of everyday tasks involved in the care of frail people in both in-house [11] and clinical environments. Specifically, devices intended for this purpose either involve static physical interaction [12, 13, 14] or are mounted on mobile platforms [15, 16]. The focus of these solutions lies on a specific body part (e.g., the head) and on supporting disabled people in the performance of ADLs with rigid manipulators. Furthermore, the literature reveals two commercial solutions, presented in [17, 18], which provide a safe, independent showering experience and are equipped with a soaping and water rinsing system. On the downside, both of these solutions completely lack physical interaction with the user and therefore lack some basic functionalities of the bathing sequence, such as scrubbing and wiping the senior.

Direct physical interaction with frail seniors raises a multitude of issues, including safety, reliability of the human-robot interaction, and adaptability to the user's needs and preferences. Moreover, human body parts are curved and deformable, and their size and shape differ considerably from one person to another. Unexpected body-part motion may also occur during the operation of the robot, increasing the risk of undesirable and possibly harmful contact between the human and the robot. Therefore, there are augmented human perception requirements, not only in terms of the adequacy and accuracy of the sensorial information but also in terms of the perception algorithms.

The first requirement cannot be addressed with simple proximity sensors and monocular cameras, since both of these solutions give poor sensorial feedback. Additionally, cameras capturing RGB data during the shower process could raise ethical issues in terms of privacy and comfort, especially when an elderly person's mental health deteriorates. On the contrary, the use of inexpensive depth or stereo vision cameras can provide rich information with good accuracy [19, 20] and fulfill the requirements of a great variety of applications [21], without necessarily using RGB data. The second requirement has actually led HRI to extend into other research areas [22]. One such area concerns the development of multimodal perception interfaces, required to facilitate natural human-robot communication, including not only visual sensors but also audio or inertial sensors [23]. The concept behind these algorithms is to design interaction techniques that enhance the communication, making it natural and intuitive, and enabling robots to understand, interact with and respond to human intentions intelligently. For a review, we refer the reader to [24, 25, 26, 27, 28, 29].

Recently, researchers in computer vision have proposed approaches based on Deep Learning (DL) techniques, which have produced very detailed results on human perception and specifically on body-part segmentation. The first fully-convolutional neural network (CNN) implementing semantic segmentation is [30], and many more works [31, 32, 33, 34] followed with detailed results, able to segment the human body parts at pixel level or to obtain a sparse pose of the body parts [35, 36] with close to real-time performance.

Direct interaction of a robotic device with the environment is a research subject that the robotics community has been addressing for many years. However, interaction with a human being is a much more delicate action and is considered risky to be executed by a rigid robotic manipulator, even if it is equipped with the most sophisticated force/impedance control schemes. On the other hand, the advantage of soft robots [37] lies in their inherent or structural compliance, which gives them the ability to actively interact with the environment and undergo large deformations [38]. The term soft robotics is not only used to state that the devices are made of soft materials, but also to underline the shift from robots with rigid links (even hyper-redundant ones [39]) to bio-inspired continuum robots. Many continuum manipulators have already been presented [40] with different types of actuation, e.g., tendon-based [41, 42] or combined with pneumatic chambers [43, 44, 45]. The repertoire of actuation is important not only for the motion dexterity and shape [46] of a soft robot but also for stiffening [47, 48] and compliance, two properties that are crucial especially for physical interaction with a human.

Contributions and overview

The I-Support project envisioned, and during its course accomplished, the development and integration of an innovative, modular, ICT-supported robotic system that supports frail older adults' motion abilities. It successfully assists them to safely and independently complete various physically and cognitively demanding bathing tasks, such as properly washing their back and their lower limbs. The main contributions of the work presented in this paper relate to the development and seamless integration of novel cognitive human-robot interaction technologies and to the evaluation of these technologies, as individual modules as well as an integrated assistive robotic platform as a whole, through a series of clinical validation studies in realistic scenarios and under real operating conditions. These technologies include: human activity monitoring and recognition, adaptation of a motorized chair for safe transfer of the elderly in and out of the bathing cabin, a context awareness system that provides full environmental awareness, as well as a prototype soft robotic arm and a set of user-adaptive robot motion planning and control algorithms. Key features of this assistive robotic platform, described in the paper, are supported by our state-of-the-art pipeline for multimodal modeling and learning, which aims to enhance the human-robot communication, making it natural, intuitive and easy to use, and addressing aspects of smart assistive HRI. An important contribution of the work presented in this paper also concerns a new dataset that includes audio commands, gestures, which are an integral part of human communication [49], and co-speech gesturing data, which is still quite limited in HRI [50], as well as a suite of tools used for data acquisition. The end goal of our work is to support and maximize safety, reliability and self-confidence for the frail users, as well as to enable intuitive and transparent HRI which incorporates reliable user intention recognition and real-time adaptation of the reactive and supportive robotic system's behavior. To that end, two multinational validation studies were conducted in two European clinical pilot sites for the evaluation of the various functionalities of the I-Support system in the bathroom environment of the selected care facilities. The validation studies included elderly subjects and were carried out under realistic conditions, showing good results and high acceptability by this particularly challenging target group.

The remainder of this paper is structured as follows: in Sec. 2 we describe the overall architecture of the I-Support system. In Sec. 3 we go into more detail regarding the developed online audio-gestural recognition system for human-robot communication, while in Sec. 4 we describe the adaptive robotic motion planning method. The dataset collected during the course of the project for modelling and offline evaluation is presented in Sec. 5. Finally, in Sec. 6 two multinational validation studies conducted in two European hospitals are described, showcasing promising results regarding both the system performance and the user satisfaction.

2. System Architecture

Figure 1: Installation of the I-Support system in a clinical environment for experimental validation. The devices constituting the overall system are presented. (a) Amphiro b1 water flow and temperature sensor. (b) General aspect of the system showing the Motorized Chair, the Soft Robotic Arm and the installation of the Kinect sensors (for audio-gestural communication). (c) Air temperature, humidity and illumination sensors by CubeSensors. (d) Smartwatch for user identification and activity tracking.

The I-Support system is depicted in Fig. 1(b), presenting an installation in a clinical bathroom environment. The safety and operational requirements emerging from two use cases were taken into account. These use cases include two demanding washing tasks in terms of mobility and force exertion for the elderly, i.e. washing the back and the legs (lower limbs). Next, we describe the main modules of the system, namely: the motorized chair, the human-robot interaction module and the sensors used for communication, the context awareness system, the soft robotic arm, and the overall software and process architecture.

2.1. Motorized chair

A motorized chair has been employed inside the shower to effectively assist the older adults during sit-to-stand and stand-to-sit tasks and to safely transfer them between the exterior and the interior space of the shower cabin. The design approach was to adapt a commercially available chair, due to the reduced cost in comparison to custom designs. The selected chair was adapted accordingly in order to provide 3-DOF motion. More specifically, a lateral translation adjusts the proximity of the user to the robot and a vertical translation regulates the height of the chair from the ground. Additionally, the rotational DOF around its axis allows for easy accessibility in different bathroom environments.

2.2. Human-Robot Interaction

For the purpose of human-robot communication and perception, the I-Support system was equipped with Kinect V2 RGB-D cameras. These sensors are frequently used for visual analysis in assistive robotics [51, 52, 53], as they are inexpensive, reliable and simple to waterproof. They are also easy to mount on different surfaces and to integrate software-wise. Three Kinect sensors were placed in the bathroom as shown in Fig. 1(b). This multi-view setup was designed so as to be able to deal with the two main technological tasks of the I-Support project, namely: a) audio-gestural command and action recognition and b) body pose estimation for robotic manipulation, considering various constraints (i.e. the size of the bath cabin, and the size and placement of the chair and the soft-arm robot base). For body pose estimation, two different data streams, RGB and depth, are required for the analysis of the tasks under investigation (i.e. washing the legs and washing the back). These two streams should be registered, in other words calibrated and aligned, to allow simultaneous processing. For the (audio-)gestural and action recognition task, it is essential to have the RGB stream in High Definition (HD) format, in order to retain the region of interest (i.e. hand gestures or face) with acceptable resolution.

Figure 2: Data streams acquired by Kinect 1 and 3: RGB (top), depth (2nd row) and log-depth (3rd row) frames from a selection of gestures ("Temperature Up", "Scrub Legs"), accompanied by the corresponding German spoken command waveforms (4th row) and spectrograms (bottom row) for "Roberta, Wärmer" and "Roberta, Wasch die Beine".

To tackle the above-mentioned tasks, we have employed three Kinect V2 sensors that can provide all the required visual and audio information. These sensors can capture Full HD RGB video at 30 fps (frames per second), while the depth information is recorded using the infrared camera embedded in the Kinect in standard resolution. The color stream is captured in BGRA format at a total of 32 bpp (bits per pixel), resulting in an uncompressed image of about 8 MB, while for the depth information the same format at a total of 16 bpp is employed (ca. 800 KB). For the audio/spoken command recognition task we experimented with the audio stream that can be captured by the built-in multi-array microphone of the Kinect sensors. Specifically, each Kinect incorporates an array of four individual microphones, and the raw audio information can be captured at 16000 Hz with 32-bit resolution. Figure 2 shows examples of the data streams that are acquired from the Kinect sensors.
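As a quick sanity check on the reported stream sizes, the uncompressed size of a frame follows directly from its resolution and pixel depth. The short Python sketch below reproduces the "about 8 MB" figure for a Full HD BGRA frame; the depth-frame size depends on the resolution at which the depth stream is stored.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8

# Full HD BGRA colour frame at 32 bpp: 1920 x 1080 x 4 bytes,
# which matches the "about 8 MB" figure quoted above.
print(frame_bytes(1920, 1080, 32) / 1e6)  # -> 8.2944 (MB)
```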

In the early stages of the system integration process, the placement of the Kinect sensors inside the bathroom was a challenging task. The constraints on the camera placement were the capturing of all significant tasks involving HRI, while at the same time satisfying the space limitations imposed by a conventional bathroom in nursing homes. After extensive experimentation with all possible positions, the resulting sensor set-up was able to capture the necessary information for both tasks, i.e. information on the user's back or legs for robotic manipulation, as well as the hand gestures performed for communication with the robot. Two of the sensors (Kinect sensors 1 and 2) were placed inside the bath cabin, in order to capture the legs and the back of the user during the different tasks, while a third camera was placed outside the cabin, in order to capture the gestures performed by the user during the task washing the back.

Specifically, during the task washing the legs, Kinect 2 recorded the user's legs (including registered RGB and depth in SD resolution) for body pose estimation and visual tracking of the robot, while Kinect 1 was used by the audio-gestural and action recognition module. In addition to the streams in SD resolution, sensor 1 also recorded the color stream (RGB) in Full HD resolution. In this configuration no video information was captured by Kinect 3. During the task washing the back, Kinect 1 recorded the back of the user (the captured information includes RGB and depth in SD resolution), while Kinect 3 recorded the color stream, used for gesture recognition, in Full HD. In this case, Kinect 2 was not required to capture any data.

2.3. Context Awareness System (CAS)

In order to achieve a proper operational flow during a showering task, it is important for the system not only to be able to perceive and communicate with the user, but also to have full environmental awareness. The latter can be achieved with the aid of extra sensorial data coming from different types of sensors. To begin with, Amphiro sensors, depicted in Fig. 1(a), can provide useful data to the system regarding water flow and temperature. Additionally, air temperature, humidity and illumination are environmental conditions indicative of the user's safety and comfort. These values are obtained by sensors manufactured by CubeSensors, shown in Fig. 1(c).

A smartwatch similar to the one presented in Fig. 1(d) is integrated for user identification and activity tracking purposes [54]. The identification process includes a data acquisition step in which the user's personal data (e.g. gender, age, size) and preferences (e.g. showering duration, scrubbing patterns, body parts to avoid, etc.) are stored in a database. The identification step implemented with the smartwatch includes bar-code scanning, through which the personalization data are passed to the system and are taken into account for the bathing procedure. Furthermore, activity tracking algorithms are employed to recognize falls or periods of inactivity, which are indicative of emergency situations. The above-mentioned data are available not only to the I-Support system but also to the nursing staff via an Android application, for monitoring and for acting when emergency situations emerge.
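As a simple illustration of the kind of rule such an inactivity check can rely on (this is not the project's actual algorithm), the sketch below flags a period of inactivity when the smartwatch accelerometer magnitude barely varies over a time window; the sampling rate, window length and threshold are assumptions.

```python
import numpy as np

def is_inactive(accel, fs=50.0, window_s=60.0, motion_thresh=0.05):
    """Flag inactivity from smartwatch accelerometer data.

    accel: (N, 3) acceleration samples in g, most recent last.
    Returns True if the acceleration magnitude varies less than
    `motion_thresh` (standard deviation, in g) over the last `window_s` seconds.
    """
    n = int(window_s * fs)
    magnitude = np.linalg.norm(accel[-n:], axis=1)
    return magnitude.std() < motion_thresh
```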

2.4. Soft Robotic Arm

Figure 3: Construction stage of the Soft Robotic Manipulator. The modular design provides the flexibility to modify the robot's characteristics according to the motion requirements.

A soft arm has been developed to assist elderly people in bathing tasks. Soft manipulators can be considered intrinsically safe thanks to the actuation technologies they are made of. One of their main features is their compliant body, which can deform passively to adapt to changes in the environment, thus reducing the complexity of active control. The technical requirements, in terms of desired reachable workspace and expected movements, have been guaranteed by a hybrid actuation strategy, as summarized in [43]. Here, we briefly recap the arrangement of the actuators in the modules (Fig. 3). Two different actuation technologies are combined based on their working principle, in order to achieve the desired motion behavior:

• Flexible fluidic actuators, which enable elongation and omni-directional bending, exhibiting low accuracy;

• Tendon-driven mechanisms, which can shorten the mechanism and provide (redundant) omni-directional bending with high movement resolution.

As a result, the system is capable of elongation/contraction and bending movements, exploring a workspace compatible with different activation patterns, while specific antagonistic activation sequences provide stiffness capabilities as well.

2.5. Software Architecture and Operational Flow

In order to accommodate the hardware devices mentioned above, along with the required real-time processing of the data, we have designed and implemented a system that uses the Linux operating system in order to be able to handle multiple, fully synchronized Kinect sensors in modular configurations. With this architecture, the acquisition and the data processing are performed using the Robot Operating System (ROS), with a different Linux machine for each camera. For the ROS setup we have used a master-client network approach, where a master computer controls all the other hosts to which the various sensors are connected.

Integration of the software nodes was accomplished in three stages: (i) unit testing of ROS-free software (i.e. mathematical functions, non-ROS libraries, etc.), (ii) unit testing of ROS nodes and creation of ROS packages, and (iii) integration of all packages and testing that they all work together according to the specifications (topics, services, actions, loop rates). It was demonstrated that the ROS network setup and data connectivity function successfully. The process of remotely launching the ROS nodes was almost entirely automated through nested launch files.

2.5.1. Operational flow and Finite State Machine (FSM)

The operational flow of I-Support is composed of the robotic task sequence of the washing activities. It also includes the initialization, the termination and the error handling actions of the I-Support system. The robotic task sequence for the showering process was derived by conducting a survey about the senior users' preferences on shower activities, based on questionnaires that were answered by the seniors and by the caregivers. The outcome of the survey indicated that the critical body areas that should be washed by the I-Support system are the back of the user, the private parts, and the legs of the user. For the pilot studies the consortium decided to test and validate the washing of the legs and of the back of the user. Furthermore, the information gathered from the end-users indicated that the washing activities should be broken down into washing, rinsing and scrubbing. It also indicated that the seniors would like to use a small set of commands that allows them to intuitively interact with the robotic system and synthesize a complete sequence of shower activities, including actions such as start, stop, pause and repeat the procedure. To this end, the following minimal set of audio-gestural commands was defined: {wash-back, wash-legs, scrub-back, scrub-legs, rinse-back, rinse-legs, start, halt, stop, repeat}.

The entire I-Support process is modeled as a sequence of states and is supervised by a finite state machine (FSM), which has been developed and modeled as a directed graph. Each node of the graph corresponds to a state (for example washing, rinsing, scrubbing), and each directed edge corresponds to an event that triggers a change of state and, optionally, some associated action (for example the user requests, through audio-gestural commands, to repeat or stop the washing). The FSM manages the showering process by monitoring the progress of each state and the generation of the state sequence until the termination of the process. The actual sequence of states is not fixed; instead it is flexible and varies depending on the external triggers.

The triggering events are generated either by the HRI module (see Sec. 3), by the robotic motion planner, or by the Chair-Controller; based on the triggering events, different state sequences can be synthesized by the end-user. For example, an indicative state sequence is: IDLE, CHAIR-IN, DOUSING, SCRUBBING, SCRUBBING-PAUSED, RINSING, WASHING-ENDED, CHAIR-MOVING-OUT. In between these states (which represent washing activities) there are intermediate states that represent robot kinematic transitions to a starting pose required for the initiation of the next activity. The described sequence is depicted in Fig. 4. Any emergency situation is handled by the emergency stop state, to which all states can transition. The state washing-ended is also accessible from all states in order to facilitate the handling of errors, such as the inability of the user to complete the washing procedure or an abrupt wish to terminate the process.

The FSM was implemented in ROS using the Smach package, which is a powerful and scalable Python-based library for hierarchical state machines. The Smach package does not depend on ROS and can be used in any Python project; the executive Smach stack, however, provides seamless integration with ROS, including smooth actionlib integration and a Smach viewer to visualize and introspect state machines. Implementation-wise, all states are stored in a generic state machine container. They are initialized and, when triggered by an event, they execute the action code stored in the state. Each state has a number of possible outcomes associated with it. An outcome is a user-defined string that describes how a state finishes, and the transition to the next state is specified based on the outcome of the previous state.
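To make the container/outcome/transition mechanics concrete, the following minimal Smach sketch wires two simplified states together. The state names and outcomes are illustrative stand-ins for the much richer I-Support FSM (Fig. 4), and the action code inside execute() is omitted.

```python
# Minimal Smach sketch (illustrative, not the full I-Support FSM).
import smach


class Dousing(smach.State):
    def __init__(self):
        # outcomes are the user-defined strings describing how the state finishes
        smach.State.__init__(self, outcomes=['douse_success', 'halt'])

    def execute(self, userdata):
        # launch and monitor the dousing action here (omitted)
        return 'douse_success'


class Scrubbing(smach.State):
    def __init__(self):
        smach.State.__init__(self, outcomes=['scrub_success', 'halt'])

    def execute(self, userdata):
        return 'scrub_success'


# The generic container stores the states; the transitions map each outcome
# of a state either to the next state or to a container-level outcome.
sm = smach.StateMachine(outcomes=['WASHING_ENDED'])
with sm:
    smach.StateMachine.add('DOUSING', Dousing(),
                           transitions={'douse_success': 'SCRUBBING',
                                        'halt': 'WASHING_ENDED'})
    smach.StateMachine.add('SCRUBBING', Scrubbing(),
                           transitions={'scrub_success': 'WASHING_ENDED',
                                        'halt': 'WASHING_ENDED'})

outcome = sm.execute()  # runs DOUSING -> SCRUBBING -> WASHING_ENDED
```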

Figure 4: Example of the I-Support FSM state diagram for the washing-back application.

The FSM was also personalized: depending on the user, it could be automatically modified in order to meet their particular needs and requirements (i.e. size, gender, preferences, medical condition). This was achieved by interfacing the FSM with the Context Awareness System (CAS) and the personalised information stored in the I-Support system database. The FSM also performed the launching of the ROS nodes (depending on the current state) and the monitoring (running, success, fail) of each active node.

2.6. Safety validation

I-Support is a hardware device that comes into physical contact with humans, either on purpose during scrubbing and washing tasks or accidentally as it describes trajectories in the shower workspace. Furthermore, it is an electrical device that operates in a humid and wet environment subjected to jets of water. It might operate in a healthcare environment, such as a hospital, where the levels of acoustic and electromagnetic noise should be kept to a minimum. Hence, it was necessary to take actions for thorough hazard mitigation and for a comprehensive safety validation of the I-Support system during the design process.

The methodology that was adopted to address the safety issues involved the following steps: (i) analysis of the international safety standards and regulations and selection of the most relevant standard, (ii) classification of the I-Support device according to that standard, (iii) hazard analysis of the I-Support system (identification of hazards, identification of causes associated with hazards, determination of the degree of risk), (iv) hardware and software design decisions to mitigate hazards (such as upper limits on acceleration and speed, minimization of weight, use of soft materials, DC voltages below 25 V, EC compliance, emergency stops, controlled system shutdown, etc.), (v) testing and validation of the I-Support components and of the integrated system by an external certified safety engineer, and (vi) safety validation of the complete I-Support installation in the operating environment where the pilot studies took place.

ISO 13482 was selected as the most relevant safety guideline to be followed throughout the project development. The I-Support robotic solution was classified as a personal care robot, specifically a restraint-free assistance robot (to help provide more ease and comfort in daily life for independent living). Hence, the comprehensive description of the hazard-risk analysis, the subsequent design decisions, and the official safety validation results all complied with the ISO 13482 safety requirements. The hazard analysis, risk analysis and design decisions are all presented in detail in [D2.3, Annex 1, http://www.i-support-project.eu/dissemination/]. Finally, an external safety officer was employed to perform systematic tests and inspections and to verify that the hardware and software implementation of all components, as well as the overall installation in the healthcare environment, comply with the ISO 13482 standards and safety requirements.

3. Audio-Gestural HRI

Gesture recognition is a visual task that can aid human-robot interaction (HRI) and relies on visual processing methods similar to those used in action classification and pose estimation. For the I-Support system, we implemented a simple rule for gesture recognition that involves the recognition of a vocabulary of gestures, which serve the role of visual non-verbal commands. For robustness, we supplement gesture recognition with the additional perceptual task of spoken command recognition. The two modalities can work either independently or in fusion for enhanced performance. Within this context of assistive robotics, and by exploring state-of-the-art approaches from automatic speech recognition and visual action recognition, we have developed an intelligent interface that multimodally recognizes actions and commands by fusing the unimodal information streams to obtain the optimum multimodal hypothesis.

3.1. Visual processing

Gesture recognition allows the elderly users to interact with the robotic platform through a predefined set of gestural commands. For this task, we have employed state-of-the-art computer vision approaches for feature extraction, encoding, and classification. Our gesture and action classification pipeline (see Fig. 5) employs Dense Trajectories [55] along with the popular Bag-of-Visual-Words (BoVW) framework. The main concept consists of sampling feature points from each video frame on a regular grid and tracking them through time based on optical flow. Specifically, the employed descriptors are: the Trajectory descriptor, HOG [56], HOF [57] and Motion Boundary Histograms (MBH) [56]. As depicted in Fig. 6, a non-linear transformation of depth using the logarithm (log-depth) enhances edges related to hand movements and leads to richer dense trajectories on the regions of interest, close to the result obtained using the RGB stream.
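A minimal sketch of the log-depth idea is shown below: the raw depth map is log-compressed and rescaled to an 8-bit image on which dense trajectories can then be extracted. The clipping range and scaling are assumptions for illustration, not the exact values used in the project.

```python
import numpy as np

def log_depth(depth_mm, max_depth_mm=4500.0):
    """Log-compress a depth frame (in millimetres) into an 8-bit image.

    The logarithm compresses far-range variation, so edges caused by the
    hands moving in front of the body become more pronounced.
    """
    d = np.clip(depth_mm.astype(np.float32), 0.0, max_depth_mm)
    d = np.log1p(d)                        # log(1 + depth) avoids log(0)
    return np.uint8(255.0 * d / d.max())   # rescale to [0, 255]
```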

Figure 5: Visual gesture classification pipeline.

Figure 6: Comparison of dense trajectory extraction over the RGB (top), depth (middle) and log-depth (bottom) clips of the gesture "Scrub Back".

The features were encoded using BoVW and were assigned to K = 4000 clusters, forming the representation of each video. Each trajectory was assigned to the closest visual word and a histogram of visual word occurrences was computed. For classification, non-linear SVMs were adopted using the χ2 kernel [56], and different descriptors were combined in a multi-channel approach, accomplishing accuracies of 81% and 84% for the tasks washing the legs and washing the back, respectively. Since multiclass classification problems were considered, a one-against-all approach was followed and the class with the highest score was selected, accomplishing classification results of up to 83% and 85% for the two tasks.
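The encoding and classification stage can be sketched as follows with scikit-learn, used here as an illustrative stand-in for the actual implementation: a k-means codebook (K = 4000) quantizes the trajectory descriptors into histograms, and a one-vs-rest SVM with a precomputed χ2 kernel classifies them. Apart from the codebook size, parameters such as the kernel's gamma are assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

K = 4000  # visual vocabulary size, as in the text

def build_codebook(descriptor_list, k=K):
    """descriptor_list: list of (n_i, d) arrays of trajectory descriptors, one per training clip."""
    km = MiniBatchKMeans(n_clusters=k, batch_size=10000, random_state=0)
    km.fit(np.vstack(descriptor_list))
    return km

def bovw_histogram(codebook, descriptors):
    """Quantize a clip's descriptors and return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_classifier(train_hists, labels, gamma=0.5):
    """One-vs-rest SVMs on a precomputed chi-squared Gram matrix."""
    gram = chi2_kernel(train_hists, gamma=gamma)
    clf = OneVsRestClassifier(SVC(kernel='precomputed'))
    clf.fit(gram, labels)
    return clf

def predict(clf, train_hists, test_hists, gamma=0.5):
    gram_test = chi2_kernel(test_hists, train_hists, gamma=gamma)
    return clf.predict(gram_test)
```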

Figure 7: Spoken command recognition module for HRI, integrated in ROS, including an always-listening mode and real-time performance.

3.2. Audio processing

In addition to gesture command recognition, we have also developed a spoken command recognition module [58] (Fig. 7) that detects and recognizes commands provided by the user freely, at any time, among other speech and non-speech events, possibly affected by environmental noise and reverberation. The features employed for acoustic modeling were 39 Mel-Frequency Cepstral Coefficients (MFCCs) with their first- and second-order derivatives, extracted every 30 ms from overlapping windows of 25 ms duration. We target robustness via a) denoising of the far-field signals by delay-and-sum beamforming, b) global Maximum Likelihood Linear Regression (MLLR) adaptation of the acoustic models to the speaker-microphone channels of the targeted environment, and c) combined command detection/recognition. In order to build our models we performed offline classification experiments on pre-segmented commands, based on a task-dependent grammar of 23 German spoken commands, accomplishing accuracies of 76% and 68% for the two tasks. For a more detailed analysis and further results regarding the offline experiments of the audio-gestural HRI module we refer the reader to [59].
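For reference, a 39-dimensional MFCC front end of this kind (static coefficients plus first- and second-order derivatives) can be sketched with librosa as below; the window and hop lengths shown are common defaults used here as stand-ins for the module's exact analysis parameters.

```python
import librosa
import numpy as np

def mfcc_39(wav_path, sr=16000):
    """13 static MFCCs plus first- and second-order derivatives (39 dims per frame)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=13,
        n_fft=int(0.025 * sr),      # 25 ms analysis window
        hop_length=int(0.010 * sr)  # 10 ms hop (illustrative)
    )
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta, delta2])  # shape: (39, n_frames)
```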


4. Real-time Robotic Motion Adaptation

The execution of the robotic tasks, instructed by the user or the carer via HRI commands, relies on a detailed and adaptive robotic motion planning method. This method was developed during the project and is based on enriched visual information. In particular, Point-Cloud data provided by the Kinect sensors, which are mounted in the shower room as shown in Fig. 1(b), are used as input together with accurate body-part recognition, performed either as a pixel-wise [31] area coverage of each body part or by using more structural [60] skeleton information, as depicted in Fig. 8. The latter deep learning methods greatly enhance the environment perception skills of the system and are able to provide robust input to the motion planning, remaining at the same time unaffected by changes in the environmental conditions (e.g. illumination) of different shower rooms.

Figure 8: Example of the visual user perception algorithm in two instances: (a) hands-up and (b) relaxed. The visual information is used as an input for the motion planning method.

The motion planning method initially proposed in [61] provides a solution to the problem of defining the motion behavior of a robotic manipulator's end-effector operating over a curved, deformable surface (e.g. the user's body part). Such surfaces characterize the human body parts, which are systematically or randomly moving and deforming (e.g. due to the user's breathing motion). The main goal of the motion behavior task is the online calculation of the reference pose for the end-effector, in order for the robotic manipulator to be compliant with the body part and simultaneously execute predefined surface tasks (e.g. scrubbing the user's back). Due to the motion control complexity of the soft robotic manipulator described in Sec. 2.4, the motion planning method is structured to be model-free. Therefore, it can be adjusted to any robotic manipulator, provided that all of the robot's workspace and velocity constraints are taken into account.
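As an illustration of what computing such a reference pose can involve (and not of the specific method of [61]), the sketch below estimates the local surface normal of the observed body part from the Point-Cloud neighborhood of a target point and builds an end-effector frame whose approach axis is aligned with it; the neighborhood radius and axis conventions are assumptions.

```python
import numpy as np

def reference_pose(cloud, target, radius=0.05):
    """Illustrative end-effector reference pose over a body-part point cloud.

    cloud:  (N, 3) points of the observed body part (camera frame, metres)
    target: (3,) surface point the end-effector should reach
    Returns a 4x4 homogeneous transform whose z-axis approaches the surface.
    """
    # local neighborhood of the target point
    nbrs = cloud[np.linalg.norm(cloud - target, axis=1) < radius]
    centered = nbrs - nbrs.mean(axis=0)
    # the smallest principal component approximates the surface normal
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:          # assumed convention: approach axis points away from the camera, into the surface
        normal = -normal
    z = normal / np.linalg.norm(normal)
    x = np.cross([0.0, 0.0, 1.0], z)
    if np.linalg.norm(x) < 1e-6:   # degenerate case: normal parallel to world z
        x = np.array([1.0, 0.0, 0.0])
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    T = np.eye(4)
    T[:3, :3] = np.column_stack([x, y, z])
    T[:3, 3] = target
    return T
```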

The two scenarios considered in the I-Support project, i.e. washing or scrubbing the back or the legs, are actually the most challenging for the elderly in terms of mobility limitations. These scenarios were considered during the development of the motion planning method, as shown in Fig. 9. In the back-washing scenario the pixel-wise area visual information is used, whereas in the leg-washing scenario we use the skeleton-based human recognition. These recognition techniques are combined with the Point-Cloud information obtained from the Kinect sensors in order to achieve accurate reference pose estimation.

Figure 9: Two scenarios were considered during the development of the motion planning method: (a) Back washing and (b) Legs washing. In (a) area pixel-wise visual information was used, whereas in (b) the skeleton information was adequate for the completion of washing tasks for the legs. The red spheres depict the result of the skeleton recognition for the right leg and the green sphere depicts the target position that the robot has to reach as a result of the motion planning method.

Another important aspect of the motion planning method is its straightforward combination with other motion planning techniques, which allows for the incorporation of imitation learning. More specifically, in [62] we integrated a leader-follower framework of motion primitives (CC-DMP) [63] with a vision-based motion planning method to adapt the reference path of a robot's end-effector and allow the execution of washing actions. This system incorporates the clinical carers' expertise by producing motions that are learned by demonstration, using data from the publicly available KIT whole-body motion database [64]. This integration makes the robotic washing actions more human-friendly and more acceptable to the elderly users.

5. Audio-gestural data collection

In order to be able to build accurate models for learning in our audio-gestural communication module, we conducted a systematic collection of multisensory data, for which various experiments were designed with the involvement and guidance of the clinical partners. Throughout the design of the data collection experiments, the clinical partners continuously contributed valuable feedback, so as to take into account the specific capabilities and needs of the elderly end-users. The experiments contained multiple scenarios, e.g. for entering and exiting the shower, washing/scrubbing the back or the legs, stopping or repeating a procedure, changing the temperature of the water, etc. Figure 10 shows some samples of the designed gestures for various bathing tasks.

Figure 10: Sample gestures for the bathing HRI task.

The dataset includes data recordings in a stricter, guided context (where the commands are predefined) and recordings of freestyle audio-gestural commands, introducing various dialogue features for the interaction. Specifically, three different sessions for data collection were defined, namely recordings of: a) 33 audio-gestural commands performed simultaneously, b) spoken commands only, and c) gesture commands only. For the spoken commands we provided a short and a long version, preceded by a system activation keyword, the female name "Roberta", which exists in both German and Italian, the languages of the sites where the actual validations with the elderly users were conducted. For the gesture commands we provided more than one gesture per command in order to include variability and thus consider factors such as a) physiological aspects of the elderly, b) intuitiveness and naturalness, c) the cognitive capacity of the elderly, as well as d) the design of a system that could recognize smaller or larger variations of the same command. In a second phase, and for the validations with the elderly end-users, those commands were narrowed down, taking into account the results of the first recognition experiments (recognizability and discriminability by the machine algorithms employed) and after consulting the clinical partners regarding what is most suitable for the elderly end-users.

Data collection experiments were performed using the Kinect sensors integrated in a ROS environment and synchronized using software triggering, while an integrated annotation and acquisition web interface that facilitates on-the-fly temporal ground-truth annotation for fast acquisition [65] was used. For carrying out the recordings, supplementary material was provided with the predefined spoken commands and videos of the predefined gestures. For the freestyle experiments, pictures were assembled in order to guide the user to perform the various tasks using natural language and gestures.

5.1. Audio-Gestural Development Dataset

We have recorded visual data from 23 users (eight females and fifteen males, aged 23-35) performing the predefined gestures, and audio data from 8 users uttering the predefined spoken commands in German (the users were non-native German speakers, having taken only a beginner's course). The total number of commands for each task was: 25 and 27 gesture commands for washing the legs and the back, respectively, and 23 spoken commands for the core bathing tasks, i.e. washing/scrubbing/wiping the back or legs, for changing basic settings of the system, i.e. temperature and water flow, and for spontaneous/emergency commands. A background model was also recorded, including generic motions or gestures that are actually performed by humans during bathing, so as to be able to reject out-of-vocabulary gestures/motions as background actions.

5.2. Extended Audio-Gestural Dataset

The data were collected from 12 native German speakers (three females and nine males, aged 18-30), with three iterations/repetitions each. In more detail, the extended dataset includes:

• Calibration recording (checker-board pattern, no human subject).

• Recordings in which the human subject is given a textual/visual description of the action to be performed (e.g. "instruct the system to dry the legs"); for both washing positions (back/legs).

• Recordings in which the human subject is shown a video of the predefined gestures; for both washing positions.

• A background model including random "negative" gestures.

• Wash/wiping motions in the wash-back/legs positions, including four different wiping styles each (horizontal/vertical, small/big circles).

• Speech commands read in the wash-back/legs positions.

Complementary high-precision VICON MX motion capture data, using 10 cameras, were recorded for a subset of the subjects, owing to the effort involved in calibrating the VICON system given the occlusions caused by the I-Support setup. Specifically, seven human subjects were recorded in two recording sessions each, following the above-described recording protocol, yielding a total of 1.7 TB of data in 1636 Rosbags that are available from the KIT Motion Database for the purposes of the project (with VICON data being available for three of the subjects).

6. Experimental Evaluation

Two multinational validation studies were conducted in two European pilot sites: a) Fondazione Santa Lucia (FSL) Hospital in Rome, Italy, and b) Bethanien Hospital in Heidelberg, Germany. They defined the target population, outcomes and indicators used to evaluate the I-Support system and to address all evaluation activities. The intention of validating the system multinationally is to increase the value of the users' subjective assessments and especially to evaluate the acceptability of the system by people with different habits and socio-ethical backgrounds. Both studies included the installation and the evaluation of the various functionalities of the I-Support system in the bathroom environment of the selected care facilities. For the conduct of the validations, both pilot sites obtained approval from their Ethics Committee.

The studies were realized in two rounds at each pilot site:

1. The first pilot testing consisted of an evaluation, in dry conditions, of well-defined functionalities of the first prototype of the bathing robot system and was intended to obtain early feedback. This evaluation round focused on a stable but limited-intelligence prototype that included human-robot interaction (HRI) using audio-gestural commands. Feedback received from this testing round was fed into the design and development process.

2. The second pilot testing round consisted of mainly summative evaluation activities and focused on the fully functional and intelligent system, thus on the showering activities with water (i.e. rinsing, scrubbing), including the shared control functionality of the I-Support system, the integrated learning and cognition strategies, as well as the full context awareness. Additionally, a water pouring scenario was also evaluated, examining various operation modes as well as the end-users' general satisfaction and preferences regarding various controllers for the robotic motion.

The user group of the I-Support bathing robot is defined as persons with difficulties in bathing activities, as evaluated by the bathing item of the Barthel Index (0 pt. = "patient can use a bath tub, a shower, or take a complete sponge bath only with assistance or supervision from another person") [66], which represents a clinically well-established index for evaluating an individual's functional disabilities in ADLs. In accordance with this user group definition, the participants of the evaluation studies only included persons with difficulty in their ability to perform bathing activities on their own.

The two use cases evaluated included the interaction of the I-Support system with two regions of the body:

• The distal region, which comprises the lower thighs from the knee joints downwards and the feet. Note that for washing these body parts the user has to bend forward, with a high risk of losing postural control.

• The back region, extending from the cervical spine to the tailbone. In this case, it is practically impossible for the users to reach their back.

During the validations, the Kinect sensors were installed in the bathrooms of the two hospital sites as shown in Fig. 1(b), incorporating the required adjustments to their positions and angles depending on the available space of each room, in order to be able to monitor the elderly user and the robotic soft arms. Based on these data, the perception unit reconstructed a 3D model of the elderly user and the robot and provided feedback to the system controller. The system controller then generated motion control commands and tool commands and autonomously guided the soft arms to wash, scrub and rinse the specific body parts.

6.1. Multimodal Fusion and Online A-G Command Recognition System Integration

For the online validations and in order to evaluate the System Perfor-

598

mance of the audio-gestural communication of the user with the robot, our

599

online Audio-Gestural (A-G) multimodal action recognition system was used,

600

developed in [65] using the Robotic Operating System (ROS), see Fig. 11. 32

The online A-G system enables the interaction between the user and the robotic soft arms; it monitors, analyzes and predicts the user's actions, with emphasis on command-level speech and gesture recognition. Always-listening recognition is applied separately for spoken commands and gestures, and their results are combined at a second level. A late fusion scheme is used, in which the individual multi-class results are combined, thereby encoding inter-modality agreement, using a simple rule that was found to be effective and fast compared to other approaches [58].

Figure 11: Online Audio-Gestural Command Recognition System.

The overall system comprises two separate sub-systems:

1. The spoken command recognizer, a single node that performs always-listening speech recognition of predefined command phrases that accompany the gestures.

2. The gesture recognizer, consisting of two nodes: the activity detector, which performs temporal localization of segments with visual activity, and the gesture classifier.

The spoken command recognition module works as described in Sec. 3.2. The activity detector processes the RGB or the depth stream on a frame-by-frame basis and determines the existence of visual activity in the scene using a thresholded activity "score". The outputs of both sub-systems are combined at the fusion node, which produces a single result. Specifically, the implemented fusion node receives the N-best results announced by the individual recognizers, checking for available results every 0.5 s by periodically receiving messages in order to synchronize the recognizers. A waiting period T is also defined, during which the node waits to combine the incoming messages. If this period expires, it is assumed that one modality either was not activated by the user or failed to detect the given input; in such cases, the node announces single-modality results.
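A minimal sketch of this late-fusion logic is given below, as plain Python rather than an actual ROS node, for brevity. The polling period of 0.5 s, the waiting period T and the single-modality fallback follow the description above; the agreement rule itself is only a plausible stand-in for the simple rule of [58], which is not spelled out here.

```python
# Illustrative late-fusion node: combines N-best speech and gesture results.
# Plain-Python stand-in for the ROS node described in the text; the agreement
# rule is a plausible placeholder, not the exact rule used in I-Support.
import time
from typing import List, Optional, Tuple

NBest = List[Tuple[str, float]]      # [(command_label, score), ...], best first

def fuse(speech: Optional[NBest], gesture: Optional[NBest]) -> Optional[str]:
    """Prefer a label that appears in both N-best lists (inter-modality agreement);
    otherwise fall back to the single available modality."""
    if speech and gesture:
        gesture_labels = {label for label, _ in gesture}
        for label, _ in speech:                  # speech list is ordered best-first
            if label in gesture_labels:
                return label
        return speech[0][0]                      # no agreement: keep the best speech hypothesis
    if speech:
        return speech[0][0]                      # single-modality announcement
    if gesture:
        return gesture[0][0]
    return None

def fusion_loop(get_speech, get_gesture, publish, T=2.0, poll=0.5):
    """Poll both recognizers every `poll` seconds; if only one modality has
    produced a result once the waiting period T expires, announce it alone."""
    pending_since = None
    speech = gesture = None
    while True:
        speech = speech or get_speech()          # each getter returns an NBest list or None
        gesture = gesture or get_gesture()
        if speech or gesture:
            pending_since = pending_since or time.monotonic()
            both_ready = speech is not None and gesture is not None
            expired = time.monotonic() - pending_since > T
            if both_ready or expired:
                publish(fuse(speech, gesture))
                speech = gesture = pending_since = None
        time.sleep(poll)
```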

The grammars of the A-G recognition system (for both spoken and gesture command recognition) and its functionalities were adapted to the specific bathing tasks, delivering recognition results as ROS messages to the system's finite state machine (FSM), which: (1) decided the action to be taken after each recognized command, (2) controlled the various modules, and (3) managed the dialogue flow by producing the appropriate audio feedback to the user; for more details see Sec. 2.5.
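The role of such an FSM can be illustrated with the small dictionary-driven sketch below. The states, transitions and feedback phrases are simplified examples loosely inspired by the command sequences of Tables 3-5; they are not the project's actual state machine.

```python
# Toy finite state machine for the dialogue/control flow described above.
# States, transitions and feedback phrases are simplified examples only.
TRANSITIONS = {
    ("idle",         "wash_legs"): ("washing_legs", "Starting to wash your legs."),
    ("idle",         "wash_back"): ("washing_back", "Starting to wash your back."),
    ("washing_legs", "stop"):      ("paused",       "Pausing the washing movement."),
    ("washing_back", "stop"):      ("paused",       "Pausing the washing movement."),
    ("paused",       "repeat"):    ("washing_legs", "Resuming the movement."),
}

def step(state: str, command: str) -> tuple:
    """Return (new_state, audio_feedback) for a recognized command."""
    if command == "halt":                        # 'halt' is accepted from every state
        return "idle", "We are done; moving to the rest position."
    return TRANSITIONS.get((state, command), (state, "Sorry, I cannot do that now."))

if __name__ == "__main__":
    state = "idle"
    for cmd in ["wash_legs", "stop", "repeat", "halt"]:   # cf. the sequences in Table 3
        state, feedback = step(state, cmd)
        print(f"{cmd:10s} -> state={state:14s} feedback='{feedback}'")
```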

6.2. Experimental Setup and Protocol

The proposed experimental setups aimed to:

1. Objectively evaluate the effectiveness of the bathing system's individual modules. To this end, the experimental protocols were designed to provide as much data as possible for statistical analysis, given the frailty of the users and their limitations.

2. Assess the acceptability of the overall system by the elderly subjects. To accomplish this, the experiments were conducted under realistic conditions and included typical procedures that the users might follow during their bathing routine with the I-Support system.

3. Assess the ability of the users to complete a bathing procedure, according to their preferences, using the HRI components of the system.

4. Assess the acceptability of the system by the caregiving personnel and its ability to monitor the effectiveness and safety of the bathing procedure.

Site | Round | # users | mean age±SD | MMSE (max. 30 pt.) | evaluated commands | scenario
Heidelberg (Bethanien) | I | 29 naive German-speaking patients | 81.4±7.7 years | 25.3±3.1 | 7 A-G commands | "Legs" and "Back" position
Heidelberg (Bethanien) | II | 25 naive German-speaking patients | 77.9±7.9 years | 25.6±3.1 | 7 A-G commands | "Back" position; gesture-only and audio-gestural scenario
FSL | I | 25 naive Italian-speaking patients | 67.4±8.9 years | 27.6±2.5 | 7 A-G commands | "Legs" and "Back" position
FSL | II | 25 naive Italian-speaking patients | 69.4±7.5 years | 24.3±3.5 | 7 A-G commands | "Legs" position

Table 1: Overview of the validation setups for the two rounds at the two pilot sites.

Vocabulary of A-G commands
English | Italian | German
Wash legs | Lava le gambe | Wasch meine Beine
Wash back | Lava la schiena | Wasch meinen Rücken
Scrub back | Strofina la schiena | Trockne meinen Rücken
Stop (pause) | Basta | Stop
Repeat (continue) | Ripeti | Noch einmal
Halt | Fermati subito | Wir sind fertig

Table 2: Validation Round I: The audio-gestural commands that were included in the two bathing scenarios. All commands were preceded by the keyword "Roberta".

ID | Distal Region: Command (Modality) | Back Region: Command (Modality)
1 | Wash Legs (A) | Wash Back (A-G)
2 | Stop (A) | Halt (A-G)
3 | Repeat (A) | Scrub Back (A-G)
4 | Halt (A) | Stop (A-G)
5 | Wash Legs (A-G) | Repeat (A-G)
6 | Halt (A-G) | Halt (A-G)
7 | Halt (A-G) | Halt (A-G)

Table 3: Validation Round I: The sequence of Audio (A) and Audio-Gestural (A-G) commands performed by the participants in the validation experiments.

6.2.1. Validation Round I

During the experiments, we simulated the two bathing scenarios under dry conditions at the two pilot sites: (1) FSL Hospital and (2) Bethanien Hospital. Validation Round I followed the same evaluation study design at both pilot sites, in order to obtain comparable evaluation results. For the HRI experiments at each site, potential I-Support users were recruited based on the following main inclusion criteria: (1) dependency in bathing activities, as assessed by the bathing item of the Barthel Index [66], and (2) no severe cognitive impairment, as assessed by a score of >17 on the Mini-Mental State Examination (MMSE) [67]. Recruitment yielded a significant number of participants at both sites: 25 (mean age±SD: 67.4±8.9 years) and 29 (mean age±SD: 81.4±7.7 years), respectively. Table 1 shows an overview of the setups for the two validation rounds at the two pilot sites; naive denotes elderly users having no experience with the system beyond a few minutes of training prior to their interaction.

The experimental protocol comprised a set of 7 audio (A) or audio-gestural (A-G) commands, in Italian and in German (see Table 2), in sequences that would simulate realistic interaction flows for both tasks. Table 3 shows the sequence of the A and A-G commands as performed in both validation experiments. Prior to the actual testing phase, all commands were introduced to the participants by the clinical test administrator, while during the experiment the participants were guided on how to interact with the robot: the audio commands were shown written on posters and the audio-gestural commands were demonstrated by the administrator, and the participants were instructed to simply read or mimic them. The administrator could also intervene whenever the flow changed unexpectedly after a system failure. Additionally, a technical supervisor handled the PCs and annotated the recognition results of the system on the fly.

6.2.2. Validation Round II

A second set of validation experiments was carried out at the same pilot sites, aiming at an enhanced user experience. At Bethanien Hospital, the experiments were also designed to accompany a clinical sub-study. For the purpose of these validation studies, some of the gestural commands were redesigned, based on feedback from the previous tests, in order to become easier and more intuitive for the elderly users. To address the lack of data for the newly introduced gestures, a small-scale data collection with healthy subjects was carried out prior to the beginning of the studies.

Validation Round II at FSL: 25 naive Italian-speaking patients tested the system (mean age±SD: 69.4±7.5 years). The experimental protocol in this case, too, included a set of 7 audio-gestural commands, for which Italian audio models were developed. The participants were seated in the "legs" position and the robot responded to each command with (a) audio feedback and (b) by simulating the appropriate action with the soft arm for the commands "Wash Legs", "Scrub Legs", "Stop", "Repeat" and "Halt". The exact scenario is shown in Table 4. For each participant, the administrator filled in a report sheet according to the user's performance and the system's command recognition.

Validation Round II at Bethanien: A set of 25 elderly patients was selected (mean age±SD: 77.9±7.9 years). The experimental protocol included 7 commands, shown in Table 5, to which each patient was briefly introduced. The patients were first asked to perform only the gestural part of the 7 commands (gesture-only (G) experiment) and then both the audio and the gestural part of the commands at the same time (audio-gestural (A-G) experiment). During all experiments, the participant was seated in the "back" position. After a command was performed, a short break took place to give the I-Support system the opportunity to respond to the command. In case of successful recognition, the system responded with an appropriate audio response and soft-arm action. The maximum system response time was about 3 and 5 seconds for the G and the A-G experiments, respectively. If the system did not recognize the command correctly within this time interval, or the command was performed incorrectly by the participant, the test administrator asked the participant to repeat the command once more (i.e., a maximum of 14 attempts per experiment [2 attempts × 7 commands]).

A-G command | Soft-arm motion & feedback
1. – | the soft-arm goes to a pre-defined initial position
2. Wash my legs = "Roberta, Lava le gambe" | audio feedback is provided and the soft-arm performs a washing movement
3. Increase temperature = "Roberta, Più calda" | audio feedback is provided, the soft-arm continues the washing movement
4. Lower temperature = "Roberta, Più fredda" | audio feedback is provided, the soft-arm continues the washing movement
5. Stop = "Roberta, Fermati" | audio feedback is provided, the soft-arm's movement is paused
6. Scrub my legs = "Roberta, Strofina le gambe" | audio feedback is provided, the soft-arm performs a scrubbing movement
7. Stop = "Roberta, Fermati" | audio feedback is provided, the soft-arm's movement is paused
8. Repeat = "Roberta, Ancora" | audio feedback is provided, the soft-arm resumes the scrubbing movement
9. Halt = "Roberta, subito" | audio feedback is provided and the soft-arm goes to its rest position

Table 4: Validation Round II (FSL): Commands tested in the scenario on the audio-gestural human-robot interaction. The left column indicates the audio commands and the right column the audio feedback given by the system and the corresponding simulated action of the soft-arm.

Audio Command
1. Wash back = "Roberta, wasch meinen Rücken"
2. Higher temperature = "Roberta, wärmer"
3. Lower temperature = "Roberta, kälter"
4. Scrub back = "Roberta, schrubbe meinen Rücken"
5. Repeat = "Roberta, noch einmal"
6. Stop = "Roberta, stop"
7. Halt = "Roberta, wir sind fertig"

Table 5: Validation Round II (Bethanien): Commands tested in the audio-gestural human-robot interaction scenario. Listed are the audio command phrases; in the original table, the right column shows the corresponding gesture for each audio-gestural command (gesture illustrations not reproduced here).

6.3. Audio-Gestural Validation Experiments and Results

6.3.1. Validation Results Round I

Multimodal recognition was evaluated in terms of: (1) the multimodal Command Recognition Rate (CRR), defined as the number of commands correctly recognized by the system divided by the number of commands correctly performed by the user; (2) accuracy; and (3) user performance/learning rate. These metrics were considered in order to correlate, as much as possible, the system's performance with the user's experience. User performance was evaluated as the percentage of well-performed commands relative to the total number of commands tested.
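Restated compactly (a LaTeX rendering of the definitions above; the notation is ours):

```latex
\[
\mathrm{CRR} = \frac{\#\,\text{commands correctly recognized by the system}}
                    {\#\,\text{commands correctly performed by the user}},
\qquad
\text{User Performance} = \frac{\#\,\text{well-performed commands}}
                               {\#\,\text{commands tested}} .
\]
```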

A gestural command was rated as not well-performed when the movement of the gesture was not executed as intended, including movement errors that seriously affected the trajectory of the gesture. An audio-gestural command was rated as not well-performed if the gesture was not performed as intended (as described above) and/or if the phrasing of the audio command was incorrect. In this validation round, demanding acoustic and visual conditions were encountered, along with large variability in the performance of the A-G commands by the participants, making the recognition task rather challenging. Table 6 shows the obtained CRR (%) and accuracy (%) results, which are up to 84% and 80% for both washing tasks, averaged across 25 (FSL) and 29 (Bethanien) users, respectively. The deviation between the two sites is due to the lower age and higher cognitive level of the FSL patients, as well as slight differences in lighting and camera placement.

Gesture recognition was expected to be more challenging while bathing the legs, due to occlusions of the hands by the robot and/or the chair. The users experienced only a limited number of false alarms (3 in total, as measured at FSL), which were considered annoying, since the system triggered a response without an "actual" input.

Site | System Performance: CRR % (L / B / Av.) | System Performance: Accuracy % (L / B / Av.) | User Performance: Speech % (L / B) | User Performance: Gestures % (L / B)
FSL | 80 / 87 / 83.5 | 86 / 73 / 79.5 | 98 / 99 | 81 / 78
Bethanien | 85 / 74 / 79.5 | 67 / 77 / 72 | 91 / 90 | 84 / 71

Table 6: Validation Round I Results: Average audio-gestural command recognition results; System Performance (CRR %) and User Performance (%) averaged across 25 and 29 users at the FSL and Bethanien Hospitals, respectively. (L stands for legs, B for back, and Av. for average.)

Regarding user performance, the participants performed the spoken commands successfully (over 98% and 90% accuracy for the two sites), while the average performance of the gestures was satisfactory (between 70% and over 80% for the two tasks), considering the quick training provided by the administrator. Finally, we note that the results of both modalities were somewhat degraded when the user performed the A-G commands simultaneously, due to the increased cognitive load.

Figure 12: Validation Round I Results (FSL): Recognition statistics per command. The vertical red lines separate the command sequences (see the corresponding IDs in Table 3) for the tasks "washing the legs" and "washing the back".

Figure 12 shows indicative curves of how the users performed each gesture command on their first attempt after the short training, as measured for the two tasks at the FSL validation. We note that initially (gesture ID 5) the users either were not familiar with this type of communication or their concentration level was low, since they had been performing only spoken commands up to that point. There was, however, a tendency of an increasing learning rate, meaning that during the experiments the users became more familiar with the multimodal commands and executed them more accurately, indicating the intuitiveness of this HRI modality. Especially for commands such as "Halt", which was repeated several times (ID 4, 6, 7) during the washing sequence, the command performance of the user reached levels higher than 90%. This observation is highly important, since we can conclude that simple combinations of spoken and gestural commands are both memorable and suitable for an elderly user's communication with an assistive robotic system.
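Per-command statistics such as those plotted in Fig. 12 can be derived from the administrators' annotations in a few lines. The sketch below is a generic illustration: the trial-log format (user, command ID, well-performed flag) is a hypothetical stand-in for the report sheets, not the project's actual data format.

```python
# Generic sketch: per-command first-attempt performance, as plotted in Fig. 12.
# The trial-log format is a hypothetical stand-in for the report sheets.
from collections import defaultdict

def first_attempt_rates(trials):
    """trials: iterable of (user_id, command_id, well_performed: bool)."""
    ok = defaultdict(int)
    total = defaultdict(int)
    for _user, cmd_id, well_performed in trials:
        total[cmd_id] += 1
        ok[cmd_id] += int(well_performed)
    return {cmd_id: 100.0 * ok[cmd_id] / total[cmd_id] for cmd_id in total}

if __name__ == "__main__":
    log = [(1, 5, False), (2, 5, True), (1, 6, True), (2, 6, True), (1, 7, True), (2, 7, True)]
    for cmd_id, rate in sorted(first_attempt_rates(log).items()):
        print(f"command {cmd_id}: {rate:.0f}% well-performed on the first attempt")
```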

6.3.2. Validation Results Round II

During Validation Round II at FSL we put the emphasis on the overall system integration, such as the control flow of the bathing procedure; the system performance in CRR (%) was up to 83.5% for the legs position. The recognition performance of the individual audio and gesture modalities was 82.9% and 48.7%, respectively, indicating, as in previous experiments, that the synergy between complementary modalities can actually enhance the overall results.

Figure 13: Validation Round II Results (FSL): User Performance (%) and System Performance (%) rates per command, showing the percentage of well-performed commands, i.e., commands performed as intended (User Performance), and the percentage of system recognitions (System Performance) for each of the following showering-task commands: Wash Legs (WL), Temperature Up (TU), Temperature Down (TD), 1st Stop (S 1), Scrub Legs (SL), 2nd Stop (S 2), Repeat (R), Halt (H). Note: Audio-gestural commands were considered well-performed when both the audio and the gestural command were performed as intended.

Figure 13 reports results on User Performance and System Performance (% per command), i.e., the rates of well-performed commands and of commands recognized by the system. Since a second attempt was made when a command was not well performed and/or not recognized, for both User and System Performance the reported percentages refer to the rates of well-performed or recognized commands at the first attempt and when summing the first and the possible second attempts. Regarding user performance, since in this validation the commands were audio-gestural, a command is considered well-performed only if both the audio and the gestural part were performed as intended. Regarding system performance, the percentages show the system's recognition regardless of the user's performance. Indeed, we noticed that even when a command was not well-performed, the system was often able to recognize it.

The results for user and system performance are quite different and thus deserve a careful discussion. As for user performance, the results show a wide range of rates of well-performed commands, with a minimum value of 40% ("Wash Legs" command, 1st attempt) and a maximum value of 88.5% (second "Stop" command, 1st + 2nd attempt). Most of the time, the users had more difficulty performing the gestural command than the audio one, so the second attempt was most often required due to insufficient gesture performance. Interestingly, the "Stop" command, which is performed twice throughout the whole showering session, presents different rates between the first (on average 51%) and the second (on average 88.25%) attempt. This might suggest the initial unease of performing gestural commands and the need for repetition in order to learn them better. Regarding the system performance, the recognition rates show a narrower range, with a minimum value of 72% (for the commands "Wash Legs", "Temperature Up", and "Repeat", 1st attempt) and a maximum value of 100% (for the commands "Scrub Legs" and first/second "Stop", 1st + 2nd attempt).

Overall System Usability: During Validation Round II, the participants completed the System Usability Scale (SUS) to evaluate their subjective perception of the overall usability of the I-Support bathing robot. The SUS is a well-established, reliable and valid 10-item scale, which can be quickly and easily administered to determine the user-perceived usability (effectiveness, efficiency, and satisfaction) of technical systems [68]. Its items are scored on a 5-point Likert-type scale ranging from "strongly agree" to "strongly disagree". The combined scores of the individual SUS items are converted into a total SUS score ranging from 0 to 100, with a higher score indicating better usability. SUS scores can be classified as "worst imaginable" (0–25 points), "poor" (25–39 points), "acceptable" (39–52 points), "good" (52–73 points), "excellent" (73–85 points), and "best imaginable" (85–100 points) perceived usability [69].
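For reference, the standard SUS scoring procedure [68] and the adjective categories of [69] can be written as in the short sketch below; the item responses in the example are made up, purely for illustration.

```python
# Standard SUS scoring (Brooke [68]) plus the adjective categories of Bangor et al. [69].
# The example responses below are fabricated, for illustration only.
def sus_score(responses):
    """responses: 10 Likert answers, 1 ('strongly disagree') .. 5 ('strongly agree')."""
    assert len(responses) == 10
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)   # odd-numbered items: r-1, even: 5-r
                     for i, r in enumerate(responses)]    # (i is 0-based here)
    return 2.5 * sum(contributions)                       # total score in 0..100

def sus_category(score):
    """Map a total SUS score to the rating categories used in the text."""
    bounds = [(25, "worst imaginable"), (39, "poor"), (52, "acceptable"),
              (73, "good"), (85, "excellent"), (100, "best imaginable")]
    return next(label for upper, label in bounds if score <= upper)

if __name__ == "__main__":
    example = [4, 2, 4, 2, 5, 1, 4, 2, 4, 2]   # fabricated responses
    s = sus_score(example)
    print(f"SUS = {s:.1f} -> '{sus_category(s)}'")
```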

The SUS score across participants (n = 25) averaged 63.8±12.1 points, indicating an overall "good" usability of the I-Support system tested during the validation experiments. The SUS scores ranged from a minimum of 42.5 points to a maximum of 87.5 points, suggesting fairly consistent ratings across participants. When grouping the scores into the different rating categories, as can be seen in Fig. 14, most of the participants gave rather positive feedback on the I-Support usability. Specifically, 12% of the participants rated the I-Support system as "acceptable", 48% as "good", 28% as "excellent", and 12% as "best imaginable".

Figure 14: Percentage (%) distribution of participants' ratings in the different SUS score categories.

Validation Results Round II at Bethanien: The user performance and the system performance were rated and captured through standardized observation by the clinical test administrator, using a report sheet for each modality. The user performance of the G and the A-G commands was rated using the same assessment strategy for both modalities. Table 7 shows a detailed analysis of the system performance, where we observe significantly higher CRR (%) results for the A-G experiment compared to the G-only experiment. The two inner columns show monomodal results for the audio and the gesture modality, obtained separately during the multimodal (A-G) scenario. The last column, which shows the fusion (A-G) results, indicates that the use of both modalities can actually enhance the results. From the observations we made during the validation, we assume that the users probably paid much more attention to the gestural than to the audio commands.

Finally, Fig. 15 shows results for the system and the user performance per command; we note that the per-gesture performance for the gesture-only experiment reached a CRR of up to 68.7%. Regarding the individual per-command results, the system achieved a maximum value of 83% for the "Wash" command, while the performance of the users yielded a maximum value of 66% for "Halt". In this case too, we observe that the system successfully recognized various commands that were not well-performed by the users (e.g., "Wash" 83% vs. 34% and "Repeat" 56% vs. 26% for System and User Performance, respectively), which indicates the importance of building good models for learning, as well as of designing models that are able to recognize smaller or larger variations of the same command.

Gesture-only (without training) | Audio-Gestural: audio | Audio-Gestural: gesture | Audio-Gestural: fusion
59.6% | 79.5% | 48.0% | 86.2%

Table 7: Validation Round II Results (Bethanien): Average System Performance results (CRR %) for the G-only and the A-G experiments, together with the monomodal results obtained during the multimodal A-G experiment.

Figure 15: Validation Round II Results (Bethanien): System Performance (CRR %) and User Performance (%) results per command.

Water Pouring Scenario at Bethanien

The main goal of the water pouring scenario, conducted at Bethanien, was to study the ability of the elderly users to control the showering process using different operation modes of the robotic soft-arm, which provide different amounts of assistance while bathing the upper back region. Specifically, the three operation modes to be evaluated were:

• Autonomous operation: The soft-arm automatically executed the motions needed to provide water pouring for the full coverage of the upper back region. In this mode, the participant had no control over the robot motion.

• Shared control: The participant could use an input device to issue simple motion commands (i.e., left vs. right and up vs. down), which were translated into a number of high-level discrete commands for the I-Support control system (i.e., the soft-arm moved left/right or up/down), while the system provided assistance in terms of audio signals indicating that (1) the participant's command was recognized and (2) the command was successfully executed, meaning that the participant could issue the next motion command. Further assistance was provided by ensuring that the upper back region was not exceeded, i.e., robot motions were restrained to remain within the standardized target body area shown in Fig. 16 (a minimal sketch of such a bounded command mapping is given after this list). In this mode, the participant had predominant, but not full, control over the robot motion.

• Tele-manipulation: The participant could issue motion commands (i.e., up vs. down and left vs. right) using the input device, similar to the shared control mode. In this mode, however, the system provided neither the audio signals for operating assistance nor the constraint of the robot motion to the upper back region. Consequently, the participant had full control over the robot motion.
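The bounded command mapping of the shared control mode (referred to in the list above) can be sketched as follows. The 2 x 3 grid, the key names and the feedback strings are illustrative assumptions based on the six-point target area of Fig. 16, not the actual controller implementation; setting constrain=False loosely corresponds to the tele-manipulation mode, where no workspace restriction is applied.

```python
# Illustrative shared-control mapping: arrow-key input -> discrete soft-arm target,
# clamped to a standardized 3 x 2 grid of target points on the upper back (cf. Fig. 16).
# Grid size, key names and feedback strings are assumptions for illustration only.
ROWS, COLS = 3, 2                                # six target points on the upper back

def shared_control_step(pos, key, constrain=True):
    """pos = (row, col); key in {'up', 'down', 'left', 'right'}."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = moves[key]
    row, col = pos[0] + dr, pos[1] + dc
    if constrain:                                # shared control: stay inside the target area
        row = min(max(row, 0), ROWS - 1)
        col = min(max(col, 0), COLS - 1)
    recognized = "beep: command recognized"      # first audio signal
    executed = "beep: command executed"          # second audio signal; next command allowed
    return (row, col), (recognized, executed)

if __name__ == "__main__":
    pos = (0, 1)                                 # start at the top right, as in Fig. 16
    for key in ["down", "left", "left", "up"]:   # the second 'left' is clamped at the border
        pos, feedback = shared_control_step(pos, key)
        print(key, "->", pos, feedback[1])
```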

Training and Comparison of Input Devices: The main question during the first stage of this study concerned the user satisfaction and the acceptability of the input device for the elderly users of the I-Support system. A motion-tracking input method, involving technologies that are available in any smartwatch, as presented in Sec. 2.3, was examined versus a more typical button input device.

Figure 16: Standardized target body area (upper back region) with the six target points for which the soft-arm provided water pouring. The red-colored cross represents the starting and final position; the black arrows indicate the path the water stream took on the participant's back region.

Figure 17: (a) Motion-tracking hand-wearable device. (b) Thimble with the embedded pressure sensor for activation.

For the first input method, a motion-tracking hand-wearable device was constructed and strapped on the external side of the palm (Fig. 17), containing an Inertial Measurement Unit (IMU) (3-axis accelerometer, 3-axis gyroscope and a magnetometer), a micro-controller, and a Bluetooth transmitter for wireless operation. The device tracked the motion of the user's hand, which was transmitted to a central controller; those motions were then translated by the central controller into the desired motion command for the control of the soft-arm. A thimble with an embedded pressure sensor acted as an activation switch for the tracker device (Fig. 17(b)): only when the user pressed the thimble was the tracker activated and the motion of the hand recorded and translated into the desired motion command. The second input device was a commercial waterproof computer keyboard, with which the user could issue a motion command for the soft-arm by pressing the appropriate arrow button (e.g., upward movement = up-arrow button).
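How such a wearable tracker might map hand motion onto the same four discrete commands can be sketched as below. The tilt thresholds, the dead zone and the field names are assumptions made for illustration; they do not describe the actual firmware of the I-Support device.

```python
# Illustrative mapping from the hand-worn IMU to discrete motion commands,
# gated by the pressure-sensing thimble. Thresholds and field names are
# assumptions for illustration only.
def imu_to_command(thimble_pressed: bool, pitch_deg: float, roll_deg: float,
                   threshold_deg: float = 20.0):
    """Return 'up'/'down'/'left'/'right' or None.

    pitch_deg: forward/backward tilt of the hand, roll_deg: sideways tilt,
    e.g. as estimated from the IMU's accelerometer and gyroscope readings.
    """
    if not thimble_pressed:              # the thimble acts as an activation switch
        return None
    if abs(pitch_deg) >= abs(roll_deg):
        if pitch_deg > threshold_deg:
            return "up"
        if pitch_deg < -threshold_deg:
            return "down"
    else:
        if roll_deg > threshold_deg:
            return "right"
        if roll_deg < -threshold_deg:
            return "left"
    return None                          # inside the dead zone: no command issued

if __name__ == "__main__":
    print(imu_to_command(True, 35.0, 5.0))    # -> 'up'
    print(imu_to_command(False, 35.0, 5.0))   # -> None (thimble not pressed)
```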

Both input methods were introduced to the participants as options for controlling the soft-arm motion. Participants were initially trained in both options with a short computer game, which required catching a red cube with a user-controlled green cube. If the red cube was caught, it would randomly jump to another field on the "game board", and the participant was instructed to catch it again as many times and as fast as he/she could within a time period of 1 min. The participants were then asked which of the two controllers they found easier and would thus like to use to control the motion of the soft-arm during the water pouring scenario.

After the training on the two controllers, and independently of the participants' cognitive status, all of them (100%) stated that (1) providing motion commands was much easier with the computer keyboard than with the motion-tracking hand-wearable device and (2) they preferred to use the computer keyboard for controlling the soft-arm in the water pouring scenario. Therefore, the subsequent water pouring scenario was performed by all participants with the use of the waterproof computer keyboard.

Water Pouring Experiments: The second stage of this study was conducted inside the showering cabin under real water pouring conditions. Initially, the participant (wearing a swimsuit or swimming trunks) was seated on the motorized chair and the water temperature was set according to his/her preferences. After that, the operation modes were tested in the following order: (1) autonomous operation, (2) shared control, and (3) tele-manipulation.

The test administrator explained that in the first mode, autonomous operation (duration 1 min), the soft-arm would provide water pouring fully automatically for the upper back region, following a predefined path, as shown in Fig. 16, with the starting point at the top right.

After the autonomous operation was completed, the participant was introduced to the shared control mode (duration 2 min). In this mode, the participant controlled the motion of the soft-arm's water stream on his/her own, using the controller chosen after the training game. The system also provided an audio signal, as previously described. In addition, it was explained that further assistance would be provided by restricting the robot motion to the standardized target body area. Finally, the participant was instructed to cover the entire upper back region (i.e., all six target points) as fast as possible using the shared control mode.

In the last round of testing, the participant used the tele-manipulation mode (duration 2 min), in which he/she also controlled the motion of the soft-arm's water stream on his/her own, using the same controller as before, and tried to cover the entire back region as fast as possible. The participant now had full motion control over the soft-arm and the system did not provide any further assistance.

Operation Mode | Task Effectiveness [%] (mean ± SD)
Autonomous operation | 100.0 ± 0.0
Shared control | 79.4 ± 18.2
Tele-manipulation | 64.4 ± 19.4

Table 8: Task effectiveness in the water pouring scenario with the different operation modes.

In the water pouring scenario, the task effectiveness with the different operation modes was assessed by a measure of area coverage, defined as the percentage of the predefined target body area covered with water during the standardized time period (e.g., 3 out of 6 target points covered with water corresponds to 50% coverage).
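In other words (our notation),

```latex
\[
\text{coverage}\,[\%] \;=\; \frac{\#\,\text{target points reached by the water stream}}{6} \times 100 .
\]
```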

Maximum task effectiveness was achieved for all participants when the autonomous operation mode was used, indicating a very good and reliable system performance. Task effectiveness substantially decreased with the shared control mode and even more with the tele-manipulation mode, as compared to the fully autonomous mode (see Table 8). Fourteen participants (66.7%) in the shared control mode and 19 participants (90.5%) in the tele-manipulation mode did not achieve the maximum possible coverage.

Our results indicate that the autonomous operation mode of the robotic soft-arm of the bathing robot is highly effective and reliable in providing water pouring for a predefined body area. When participants were given more control over the soft-arm, task effectiveness gradually decreased with the lower level of assistance provided by the bathing robot. These findings suggest that full system autonomy seems to be the preferred mode of operation for this group of the elderly population. A more detailed analysis of the differences in task effectiveness (and also in user satisfaction) among the different operation modes falls within the scope of another publication focusing further on the findings of the clinical validation study [70].

Overall System Usability: Finally, the participants completed the System Usability Scale (SUS) to evaluate their subjective perception of the overall usability of the I-Support bathing robot system. The SUS score across the participants that completed the water pouring scenario (n = 22) averaged 60.7 ± 23.0 points, indicating an overall "good" usability of the I-Support system as tested during the validation experiments. The SUS scores ranged from a minimum of 0 points to a maximum of 100 points, indicating large heterogeneity in the SUS results. Nevertheless, when grouping the scores into the different rating categories (Fig. 18), most of the participants gave rather positive feedback on the I-Support usability: more than 81% of the participants rated the I-Support system between "acceptable" and "best imaginable", while fewer than 19% rated it between "poor" and "worst imaginable".

Figure 18: Percentage (%) distribution of participants' ratings in the different SUS score categories.

7. Conclusion

This paper presents the I-Support robotic platform, a human-centered robotic bathing system for smart assisted living, which provides assistance to the frail elderly population so that they can safely and independently complete an entire sequence of bathing tasks, such as washing their back and their lower limbs. To achieve this target, advanced modules for cognition, sensing, context awareness and actuation were developed during the course of the project and seamlessly integrated into the robotic assistance platform, including a functional soft-arm prototype and the adaptation of a cost-effective robotic chair. Additionally, we contribute a new multimodal audio-gestural dataset and a suite of tools used for data acquisition, which have been used for the development and modeling of our high-accuracy, real-time, state-of-the-art multimodal action recognition module for the analysis, monitoring and prediction of user actions. We experimentally validated the I-Support system in two clinical validation studies conducted at two European pilot sites, showcasing very good system performance and achieving an effective and natural interaction and communication, through audio-gestural commands, between the users and the assistive robotic platform. Regarding the task effectiveness in the water pouring scenario, the results indicate that the robotic soft-arm is highly effective and reliable and that full system autonomy is preferred by the elderly population. Finally, the two validation studies also demonstrated high acceptability of the overall system usability by the elderly end-users.

Acknowledgment

This research work was supported by the EU under the project I-SUPPORT with grant H2020-643666.

The authors would like to thank Dr. Vincenzo Genovese and Mariangela Manti (SSSA), Holger Roßberg and Dr. Inga Schlömer (FRA-AUS), Dr. Vassilis Pitsikalis (NTUA) and all the students/researchers that were involved in the data collection process.

References

[1] W. H. Organization, World report on ageing and health, World Health Organization, 2015.

[2] S. H. Zarit, K. E. Reever, J. Bach-Peterson, Relatives of the impaired elderly: correlates of feelings of burden, The Gerontologist 20 (1980) 649–655.

[3] S. McFall, B. H. Miller, Caregiver burden and nursing home admission of frail elderly persons, Journal of Gerontology 47 (1992) 73–79.

[4] D. D. Dunlop, S. L. Hughes, L. M. Manheim, Disability in activities of daily living: patterns of change and a hierarchy of disability, American Journal of Public Health 87 (1997) 378–383.

[5] S. Katz, A. Ford, R. Moskowitz, B. Jackson, M. Jaffe, Studies of illness in the aged: The index of ADL: a standardized measure of biological and psychosocial function, JAMA 185 (1963) 914–919.

[6] M. Porta, A Dictionary of Epidemiology, Oxford University Press, 2014.

[7] J. C. Millán-Calenti, J. Tubío, S. Pita-Fernández, I. González-Abraldes, T. Lorenzo, T. Fernández-Arruty, A. Maseda, Prevalence of functional disability in activities of daily living (ADL), instrumental activities of daily living (ADL) and associated factors, as predictors of morbidity and mortality, Archives of Gerontology and Geriatrics 50 (2010) 306–310.

[8] S. Ahluwalia, T. Gill, D. Baker, T. Fried, Disaggregating activities of daily living limitations for predicting nursing home admission, Health Services Research 50 (2015) 560–578.

[9] J. Wiener, R. Hanley, R. Clark, J. Van Nostrand, Measuring the activities of daily living: Comparisons across national surveys, Journal of Gerontology 45 (1990) 229–237.

[10] S. Ahluwalia, T. Gill, D. Baker, T. Fried, Perspectives of older persons on bathing and bathing disability: A qualitative study, American Geriatrics Society 58 (2010) 450–456.

[11] C. Balaguer, A. Gimenez, A. Huete, A. Sabatini, M. Topping, G. Bolmsjo, The MATS robot: service climbing robot for personal assistance, IEEE Robotics & Automation Magazine 13 (2006) 51–58.

[12] T. Hirose, S. Fujioka, O. Mizuno, T. Nakamura, Development of hair-washing robot equipped with scrubbing fingers, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2012), pp. 1970–1975.

[13] Y. Tsumaki, T. Kon, A. Suginuma, K. Imada, A. Sekiguchi, D. Nenchev, H. Nakano, K. Hanada, Development of a skincare robot, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2008), pp. 2963–2968.

[14] M. Topping, An overview of the development of Handy 1, a rehabilitation robot to assist the severely disabled, Artificial Life and Robotics 4(4) (2000) 188–192.

[15] M. Hillman, K. Hagan, S. Hagan, J. Jepson, R. Orpwood, The Weston wheelchair mounted assistive robot - the design story, Robotica 20 (2002) 125–132.

[16] B. J. F. Driessen, H. G. Evers, J. A. v Woerden, Manus – a wheelchair-mounted rehabilitation robot, Proc. of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine 215 (2001) 285–290.

[17] ADL-Solutions, Oasis seated shower system, 2016. http://www.adl-solutions.com/Oasis-Seated-Shower-System-for-Optimal-Accessible-Bathing-Tempe-AZ.html.

[18] RoboticsCare, Robotics care poseidon, 2016. http://www.roboticscare.com/robotics-care-poseidon/.

[19] L. Yang, L. Zhang, H. Dong, A. Alelaiwi, A. El Saddik, Evaluating and improving the depth accuracy of Kinect for Windows v2, IEEE Sensors Journal 15 (2015) 4275–4285.

[20] K. Khoshelham, S. O. Elberink, Accuracy and resolution of Kinect depth data for indoor mapping applications, Sensors 12 (2012) 1437–1454.

[21] S. Pillai, S. Ramalingam, J. J. Leonard, High-performance and tunable stereo reconstruction, in: Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA-2016), pp. 3188–3195.

[22] M. A. Goodrich, A. C. Schultz, Human-Robot Interaction: a survey, Found. Trends Human-Computer Interaction 1 (2007) 203–275.

[23] K. Davis, E. Owusuz, V. Bastaniy, L. Marcenaro, J. Hu, C. Regazzoni, L. Feijs, Activity recognition based on inertial sensors for ambient assisted living, in: Proc. Int'l Conf. on Information Fusion (FUSIO-2016).

[24] R. Kachouie, S. Sedighadeli, R. Khosla, M.-T. Chu, Socially assistive robots in elderly care: A mixed-method systematic literature review, Int'l Jour. Human-Computer Interaction 30 (2014) 369–393.

[25] D. Johnson, R. Cuijpers, J. Juola, E. Torta, M. Simonov, A. Frisiello, M. Bazzani, W. Yan, C. Weber, S. Wermter, N. Meins, J. Oberzaucher, P. Panek, G. Edelmayer, P. Mayer, C. Beck, Socially assistive robots, a comprehensive approach to extending independent living, Int'l Journal of Social Robotics 6 (2014) 195–211.

[26] E. Efthimiou, S.-E. Fotinea, T. Goulas, A.-L. Dimou, M. Koutsombogera, V. Pitsikalis, P. Maragos, C. Tzafestas, The MOBOT platform – showcasing multimodality in human-assistive robot interaction, in: Proc. Int'l Conf. on Universal Access in Human-Computer Interaction, Vol. 9738 Lecture Notes in Computer Science, Springer Int'l Publishing, 2016, pp. 382–391.

[27] F. Rudzicz, R. Wang, M. Begum, A. Mihailidis, Speech interaction with personal assistive robots supporting aging at home for individuals with Alzheimer's disease, ACM Trans. Access. Comput. 7 (2015) 1–22.

[28] A. Kötteritzsch, B. Weyers, Assistive technologies for older adults in urban areas: A literature review, Cognitive Computation 8 (2016) 299–317.

[29] A. Zlatintsi, I. Rodomagoulakis, V. Pitsikalis, P. Koutras, N. Kardaris, X. Papageorgiou, C. Tzafestas, P. Maragos, Social Human-Robot Interaction for the elderly: Two real-life use cases, in: Proc. Int'l Conf. on Human-Robot Interaction (HRI-17), March 2017.

[30] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2015), pp. 3431–3440.

[31] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, arXiv preprint arXiv:1606.00915 (2016).

[32] L.-C. Chen, Y. Yang, J. Wang, W. Xu, A. L. Yuille, Attention to scale: Scale-aware semantic image segmentation, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2016).

[33] F. Xia, P. Wang, L. Chen, A. L. Yuille, Zoom better to see clearer: Human part segmentation with auto zoom net, CoRR abs/1511.06881 (2015).

[34] X. Liang, X. Shen, J. Feng, L. Lin, S. Yan, Semantic object parsing with graph LSTM, in: Proc. European Conf. on Computer Vision, Springer, 2016, pp. 125–143.

[35] S.-E. Wei, V. Ramakrishna, T. Kanade, Y. Sheikh, Convolutional pose machines, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2016), pp. 4724–4732.

[36] Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields, arXiv preprint arXiv:1611.08050 (2016).

[37] C. Laschi, B. Mazzolai, M. Cianchetti, Soft robotics: Technologies and systems pushing the boundaries of robot abilities, Science Robotics 1 (2016).

[38] L. Wang, F. Iida, Deformation in soft-matter robotics: A categorization and quantitative characterization, IEEE Robotics & Automation Magazine 22 (2015) 125–139.

[39] G. S. Chirikjian, J. W. Burdick, A hyper-redundant manipulator, IEEE Robotics & Automation Magazine 1 (1994) 22–29.

[40] I. D. Walker, Continuous backbone "continuum" robot manipulators, ISRN Robotics 2013 (2013).

[41] F. Renda, C. Laschi, A general mechanical model for tendon-driven continuum manipulators, in: Proc. Int'l Conf. on Robotics and Automation (ICRA-2012), pp. 3813–3818.

[42] M. D. Grissom, V. Chitrakaran, D. Dienno, M. Csencits, M. Pritts, B. Jones, W. McMahan, D. Dawson, C. Rahn, I. Walker, Design and experimental testing of the octarm soft robot manipulator, in: Defense and Security Symposium, Int'l Society for Optics and Photonics (2006), pp. 62301–62301.

[43] M. Manti, A. Pratesi, E. Falotico, M. Cianchetti, C. Laschi, Soft assistive robot for personal care of elderly people, in: Proc. IEEE Int'l Conf. on Biomedical Robotics and Biomechatronics (BioRob-2016), pp. 833–838.

[44] Y. Ansari, M. Manti, E. Falotico, Y. Mollard, M. Cianchetti, C. Laschi, Towards the development of a soft manipulator as an assistive robot for personal care of elderly people, Int'l Journal of Advanced Robotic Systems 14(2) (2017).

[45] C. Festo, Bionic motion robot, 2017. https://www.festo.com/group/en/cms/12747.htm.

[46] B. Bardou, P. Zanne, F. Nageotte, M. De Mathelin, Control of a multiple sections flexible endoscopic system, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2010), pp. 2345–2350.

[47] M. Manti, V. Cacucciolo, M. Cianchetti, Stiffening in soft robotics: A review of the state of the art, IEEE Robotics & Automation Magazine 23 (2016) 93–106.

[48] M. Mahvash, P. E. Dupont, Stiffness control of a continuum manipulator in contact with a soft environment, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2010), pp. 863–870.

[49] D. McNeill, Hand and Mind: What Gestures Reveal about Thought, Univ. of Chicago Press, 1996.

[50] J. Cassell, Computer vision in human-machine interaction, A framework for gesture generation and interpretation, Cambridge Univ. Press, 1998.

[51] T. L. Chen, M. Ciocarlie, S. Cousins, P. M. Grice, K. Hawkins, K. Hsiao, C. C. Kemp, C. King, D. A. Lazewatsky, A. E. Leeper, H. Nguyen, A. Paepcke, C. Pantofaru, W. D. Smart, L. Takayama, Robots for humanity: using assistive robotics to empower people with disabilities, IEEE Robotics & Automation Magazine 20 (2013) 30–39.

[52] D. Vasquez, P. Stein, J. Rios-Martinez, A. Escobedo, A. Spalanzani, C. Laugier, Human Aware Navigation for Assistive Robotics, Springer International Publishing, Heidelberg, pp. 449–462.

[53] Y. Gu, H. Do, Y. Ou, W. Sheng, Human gesture recognition through a Kinect sensor, in: IEEE Int'l Conf. on Robotics and Biomimetics (ROBIO-2012), pp. 1379–1384.

[54] V. Genovese, A. Mannini, A. M. Sabatini, Smartwatch step counter for slow and intermittent ambulation, IEEE Access 5 (2017) 13028–13037.

[55] H. Wang, C. Schmid, Action recognition with improved trajectories, in: Proc. IEEE Int'l Conf. on Computer Vision (ICCV-13).

[56] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, C. Schmid, Evaluation of local spatio-temporal features for action recognition, in: Proc. British Machine Vision Conf. (BMVC-09).

[57] I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR-08).

[58] I. Rodomagoulakis, N. Kardaris, V. Pitsikalis, E. Mavroudi, A. Katsamanis, A. Tsiami, P. Maragos, Multimodal human action recognition in assistive human-robot interaction, in: Proc. Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP-16).

[59] A. Zlatintsi, I. Rodomagoulakis, P. Koutras, A. C. Dometios, V. Pitsikalis, C. S. Tzafestas, P. Maragos, Multimodal signal processing and learning aspects of Human-Robot Interaction for an assistive bathing robot, in: Proc. Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP-2018), Calgary, Canada.

[60] Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields, in: Proc. IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR-2017).

[61] A. C. Dometios, X. S. Papageorgiou, A. Arvanitakis, C. S. Tzafestas, P. Maragos, Real-time end-effector motion behavior planning approach using on-line point-cloud data towards a user adaptive assistive bath robot, in: Proc. Int'l Conf. on Intelligent Robots and Systems (IROS-2017).

[62] A. C. Dometios, Y. Zhou, X. S. Papageorgiou, C. S. Tzafestas, T. Asfour, Vision-based online adaptation of motion primitives to dynamic surfaces: Application to an interactive robotic wiping task, IEEE Robotics and Automation Letters 3 (2018) 1410–1417.

[63] Y. Zhou, M. Do, T. Asfour, Coordinate change dynamic movement primitives – a leader-follower approach, in: Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS-2016), pp. 5481–5488.

[64] C. Mandery, O. Terlemez, M. Do, N. Vahrenkamp, T. Asfour, Unifying representations and large-scale whole-body motion databases for studying human motion, IEEE Transactions on Robotics 32 (2016) 796–809.

[65] N. Kardaris, I. Rodomagoulakis, V. Pitsikalis, A. Arvanitakis, P. Maragos, A platform for building new human-computer interface systems that support online automatic recognition of audio-gestural commands, in: Proc. ACM on Multimedia Conf. (ACM-16).

[66] F. Mahoney, D. Barthel, Functional evaluation: The Barthel Index, Maryland State Medical Journal 14 (1965) 61–65.

[67] M. F. Folstein, S. E. Folstein, P. R. McHugh, "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician, Journal Psychiatr. Research 12 (1975) 189–198.

[68] J. Brooke, SUS: a "quick and dirty" usability scale. Usability Evaluation in Industry, London, Taylor & Francis, 1986.

[69] A. Bangor, P. Kortum, J. Miller, Determining what individual SUS scores mean: Adding an adjective rating scale, Journal of Usability Studies 4 (2009) 114–123.

[70] C. Werner, A. Dometios, P. Maragos, C. Tzafestas, J. Bauer, K. Hauer, Evaluating the task effectiveness and user satisfaction with different operation modes for an assistive bathing robot in older adults with bathing disability, Submitted to Assistive Technology (2019).


Highlights

- I-Support bathing robot: a prototype integrated robotic system aiming at supporting new aspects of assisted daily-living activities in a real-life scenario.

- A variety of innovative technologies and a set of advanced modules for sensing, cognition, actuation and control have been developed.

- A multimodal action recognition system monitors, analyzes and predicts user actions in real time, with a high level of accuracy and detail, and interprets them as robotic tasks.

- The analysis of human actions through the project's multimodal audio-gestural dataset led to the successful modelling of human-robot communication, achieving an effective and natural interaction between users and the assistive robotic platform.

- Two validation studies at two clinical pilot sites, conducted under realistic operating conditions, showed high acceptability of the system usability by the elderly end-users.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Dr. Athanasia Zlatintsi (F) received the Ph.D. degree in Electrical and Computer Engineering from NTUA in 2013 and her M.Sc. in Media Technology from the Royal Institute of Technology (KTH, Stockholm, Sweden) in 2006. Since 2007 she has been a researcher at the Computer Vision, Speech Communication & Signal Processing Group (CVSP) at NTUA, working in different projects, funded by the European Commission and the Greek Ministry of Education, in the areas of music and audio signal processing and multimodal Human-Computer/Robot Interaction. Her research contributions include the development of efficient algorithms (using nonlinear methods) for the analysis of the structure and the characteristics of musical signals, as well as for the detection of saliency in audio signals in general and the creation of audio and music summaries.