ARTICLE IN PRESS
JID: MICPRO
[m5G;October 23, 2015;17:48]
Microprocessors and Microsystems xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro
Speech-controlled cloud-based wheelchair platform for disabled persons

Andrej Škraba a,∗, Radovan Stojanović b, Anton Zupan c, Andrej Koložvari a, Davorin Kofjač a
a Cybernetics & Decision Support Systems Laboratory, Faculty of Organizational Sciences, University of Maribor, Kidričeva cesta 55a, SI-4000 Kranj, Slovenia
b Faculty of Electrical Engineering, University of Montenegro, Džordža Vašingtona bb, 81000 Podgorica, Montenegro
c Rehabilitation Institute – Soča, University of Ljubljana, Linhartova 51, SI-1000 Ljubljana, Slovenia
Article history:
Received 29 January 2015
Revised 20 September 2015
Accepted 2 October 2015
Available online xxx

Keywords: Cyber-physical system; Internet of things; Computer cloud; HTML5; JavaScript/ECMA Script; speech recognition; Devices for rehabilitation; Wheelchair; Node.js
This paper describes the development of a prototype speech-controlled cloud-based wheelchair platform. The control of the platform is implemented using a low-cost WebKit Speech API in the cloud. A description of the cloud-based wheelchair control system is provided. In addition to the voice control, a GUI is implemented, which works in a web browser as well as on mobile devices, providing live video streaming. Development was done in two phases: first, a small initial prototype was developed and, second, a full size prototype was built. The accuracy of the speech recognition system was estimated as ranging from approximately 60% to 97%, depending on the speaker. The latency of the speech-controlled system was measured, as well as the latency when control is provided via touch on a so-called smart device; measured latencies ranged from 0.4 s to 1.3 s. The platform was also clinically tested, providing promising results of cloud-based speech recognition for further implementation. The developed platform is based on a Quad Core ARM Mini PC GK802 running Ubuntu Linux and an Arduino UNO microcontroller. Software development was done in JavaScript/ECMA Script, applying node.js. © 2015 Published by Elsevier B.V.
1. Introduction
The availability of smart wheelchair solutions is often limited due to their high cost [1–5], making them inaccessible to most people. Furthermore, electric wheelchairs are becoming increasingly common, but their control systems are reduced mainly to joysticks, which are generally suitable only for patients with motor disabilities in their lower limbs [6]. There are significant efforts in the field of speech-guided smart wheelchair development [7–9] based on the SUMMIT speech recognizer or Sphinx [10], indicating the problems of Word Error Rate (WER). Since speech recognition technology is critical for the adequate operation of smart wheelchairs, we researched the possibility of applying cloud technology in this field, with regard to lowering the cost of system development as well as providing higher speech recognition accuracy. By means of the emerging interdisciplinary discipline of cyber-physical systems [11], we propose a low-cost solution for a speech-controlled wheelchair platform that is also suitable for patients with severe disabilities. Cyber-physical systems integrate dynamic physical processes with processes of software and communications, providing abstraction and modeling, design and analysis for an integrated whole [12]. Such technology is based on several
∗ Corresponding author. Tel.: +386 51322833; fax: +386 42374299. E-mail addresses: [email protected], [email protected] (A. Škraba).
interconnected disciplines, i.e. embedded systems, computers, communications, software, and mechanical engineering. Technologies such as HTML5 and JavaScript/ECMA Script in the context of the Internet of Things provide new possibilities in the field of rehabilitation robotics platform development [13,14]. JavaScript/ECMA Script and HTML5 are becoming more valuable not only in the field of internet programming but also in the hardware domain [15], blending into cyber-physical systems [13]. The wheelchair robotic platform presented in this paper has been designed to meet the following requirements:
– Control the movement of the wheelchair with speech recognition using the available Web Speech Application Programming Interface (API) in the cloud.
– In addition to speech control, control of the movements of the wheelchair should be possible wirelessly via a web-based GUI, from anywhere, as long as a local or internet connection is available.
– Control of the movement should be implemented in a web browser, on mobile devices and on other available devices such as a SmartTV. Control should be possible in parallel, i.e. several controllers could be used at the same time in order to enable remote monitoring and control.
– Real-time video streaming from the wheelchair platform should be provided in order to monitor and control the movements and ensure safety.
http://dx.doi.org/10.1016/j.micpro.2015.10.004 0141-9331/© 2015 Published by Elsevier B.V.
Please cite this article as: A. Škraba et al., Speech-controlled cloud-based wheelchair platform for disabled persons, Microprocessors and Microsystems (2015), http://dx.doi.org/10.1016/j.micpro.2015.10.004
The developed platform should therefore also provide remote control for a tele-operated hospital patient transporter solution. Since HTML5/JavaScript/ECMA Script technology is applied, the wheelchair platform is also suitable for educational purposes.

2. System architecture

The proposed system was designed to benefit patients who cannot control their upper and lower extremities. Fig. 1 gives an overview of the system architecture. The main component of the system is a GK802 Mini ARM PC with an i.MX 6 SoC (quad-core 1.2 GHz Cortex-A9) running a fully fledged Ubuntu Linux 12.04 [16]. A significant advantage of this Mini ARM PC is its integrated WiFi. The Mini PC is connected to the Arduino UNO microcontroller [17] and a Logitech C210 video camera with 640 × 480 resolution. The input into the system comes from a smart device, e.g. a smartphone, which has become ubiquitous and incorporates a camera, a microphone and a speaker. The camera can be used to monitor movement, the speaker provides audio feedback from the wheelchair control system (WCS), and the microphone is available for speech control. Another possible means of control is the touch screen, with buttons for motion control. The smart device is connected to the WCS over the internet via a Wi-Fi router. In our case, the wireless functionality is provided by the LinkSys 54 WRT WiFi
router with dd-WRT firmware installed, enabling smart device control. The Wi-Fi router is connected via an Unshielded Twisted Pair (UTP) cable to the internet. This connection should be fast in order to provide quick access to the cloud-based speech recognition system. A command issued by speech or touch on the smart device is transmitted via the Wi-Fi router to the Mini ARM PC and passed to the Arduino UNO microcontroller. The priority of the voice and touch screen control is FIFO (First In, First Out); in our case, both controllers are active in parallel, which is also convenient for remotely assisting the movement of the wheelchair. In the development phase, a monitor (HDMI link) as well as a USB keyboard and a USB mouse could be connected to the Mini PC. The onboard-mounted Logitech C210 camera with 640 × 480 resolution is connected to the Mini PC via USB. To control the camera motion on the z-axis (up and down movement), a Hextronic HXT900 9 g micro servo is applied on the pan/tilt frame, driven by Arduino UNO PWM (Pulse Width Modulation). The Mini PC is connected to the Arduino UNO microcontroller [17] via USB. The DC motor controller is driven by a digital output (DIG) from the Arduino unit, controlling both DC motors. The wheelchair has a 24 V battery on board for powering its motors and electronics. The power for the Mini PC and the Arduino microcontroller comes from the wheelchair batteries through a voltage regulator providing +5 V; the regulated 5 V is provided by a Pololu BEC (Battery Eliminator Circuit) step-down voltage regulator D15V35F5S3. In Fig. 1, the border that determines the parts of the system that are on wheels,
Fig. 1. Overview of the cloud speech-controlled wheelchair system architecture.
Table 1
Power consumption specification.

Input voltage [V]   Current [A]   Batt. capacity [Ah]   Peak current [A]
5.8                 0.51          13.5                  1.15
i.e. on the board of the wheelchair platform, is shown. The Firmata firmware library implements the Firmata protocol for communicating with software on the Mini PC host computer via USB. This provides the means of controlling and monitoring the wheelchair platform by several people simultaneously, using several smart devices locally as well as remotely over an internet connection. The connection to the cloud/internet could also be established via a smartphone's Long Term Evolution (LTE) internet connection. Fig. 2 shows the software stack needed to provide the HTML5/JavaScript/ECMAScript functionality shown at the top of Fig. 1. On the Mini PC, Ubuntu 12.04 Linux is installed with the Apache HTTP web server; the latter is needed only to provide live video streaming from the platform via MJPG streamer [18]. Node.js [19] provides an efficient, lightweight, event-driven, non-blocking I/O model, which is suitable for data-intensive real-time applications that run across distributed devices. In our case, a secure HTTPS connection is provided. Socket.io [20] enables easy realization of real-time functionality in every browser and mobile device [21], concealing the differences between the various transport mechanisms; it is implemented in JavaScript and intended for real-time operation. Speech recognition for the control of the platform is realized with WebKit Speech Recognition [22]. Communication between the Mini PC and the Arduino UNO microcontroller is enabled via Firmata [23]: the Firmata firmware [24] is installed on the Arduino UNO, while the Firmata version for node.js is installed on the Mini PC. The average boot time of the system is approximately 2.5 min. This relatively long boot time is because an almost fully fledged Ubuntu Linux 12.04 with a complete GUI is booted on the ARM processor. The GUI was enabled for easier prototyping during development; in production, the GUI of the operating system would be disabled, providing a faster boot time.

Fig. 2. Overview of cloud wheelchair software architecture (software stack: HTML5/JavaScript ∼ ECMAScript; WebKit Speech Recognition; socket.io; node.js HTTPS web server with Firmata for node.js; mjpg-streamer and Apache HTTP server on the Ubuntu 12.04 Linux OS; Mini ARM PC GK802 hardware; USB link to Firmata firmware on Arduino UNO hardware; actuators and sensors).
3. Hardware realization
3.1. Realization of small prototype system
In the first phase, the prototype system was developed, as shown in Fig. 3 [25]. No. 1 in Fig. 3 marks the camera; it is possible to mount several cameras for real-time MJPG video streaming via a USB connection to the Mini PC. No. 2 in Fig. 3 presents one of the two continuous rotation servo motors which enable the movement of the platform. No. 3 (right part of Fig. 3) indicates the Mini ARM PC GK802, No. 4 the USB hub, and No. 5 the Arduino UNO microcontroller. Table 1 shows the measured power consumption specification. The input voltage was measured during operation; therefore, it is slightly lower than expected from five 2700 mAh AA accumulators. The platform draws an average of I = 0.51 A, so the battery pack provides enough power for more than 1 h of operation. The measured peak current was significantly higher, at I = 1.15 A. In general, the power consumption depends on the current operations on the Ubuntu Linux system and may vary significantly. In the full size prototype, the power supply is provided by the 24 V accumulator battery.
Fig. 3. Prototype wheelchair platform: (1) video camera, (2) continuous rotation servo motor for camera control, (3) Mini ARM PC Zealz GK802, (4) Manhattan USB hub, and (5) Arduino UNO SMD microcontroller board.
3.2. Realization of full size prototype

Fig. 4 shows the full size prototype, which is based on the Sunrise Mobility Quickie [26] wheelchair with a maximum payload of 113 kg. No. 1 indicates the control board with the GK802 ARM Mini PC and Arduino UNO, No. 2 the mobile phone touchpad/speech control input, which replaces the ordinary joystick module, No. 3 the DC motors and the battery pack, and No. 4 the camera holder. Fig. 5 shows the control part of the full size prototype. The specific parts are as follows: 1) the GK802 ARM Mini PC with Ubuntu Linux 12.04 installed and integrated WiFi, 1a) USB connection to the USB hub, 1b) +5 V power, 2) the Arduino UNO, 2a) USB connection to the USB hub, 3) the power control of the wheelchair DC motors, 3a) output to the DC motor controller, 3b) the control unit inverter, 4) the voltage divider for DC motor control, 4a) output to the DC motor controller, 5) the BEC, 6) the Akasa USB hub, 6a) the camera USB input, and 6b) the USB connection to the Arduino UNO. The USB hub has two additional inputs available to connect the keyboard and mouse in the development phase. The GK802 Mini PC also has an HDMI connector available to connect a monitor. Table 2 shows the measured power consumption specification. The input voltage in this case comes from the wheelchair's 24 V battery pack. The platform draws on average I = 0.150 A.

Table 2
Power consumption specification of the full size prototype control system.

Input voltage [V]   Current [A]   Batt. capacity [Ah]   Peak current [A]
24                  0.150         70                    0.250

Fig. 4. Full size wheelchair prototype: (1) control board with GK802 ARM Mini PC and Arduino UNO, (2) mobile phone touchpad/speech control input, (3) DC motors and battery pack, and (4) camera holder.

Fig. 5. Control part of the full size prototype.
4. Software
Fig. 6 shows part of the JavaScript code for the WebKit Speech API input field. When triggered, the JavaScript function is called, which interacts with the Arduino platform. The code is inserted in the .html file which is the response of the node.js server to the client(s). Fig. 7 shows part of the JavaScript code that responds to the input from the WebKit Speech API. If the user, for example, says "go", the command, encoded with the number 1, is sent via socket.emit to the server residing on the GK802 and performs the motion command "forward". As a consequence, the DC motors on the wheelchair platform execute the forward motion. The part of the code in Fig. 7 is also inserted in the .html file, which is the response of the node.js server to the client(s). Fig. 8 shows the JavaScript code interacting with the Arduino microcontroller, which is written in the main .js file on the Mini PC GK802. This part of the code receives the command number "1" via socket, using node.js and the socket.io module. Fig. 9 shows the part of the HTML5 code that provides the live streaming from the platform. The resolution is set to 640 × 480. The source of the stream is specified via the IP address and HTML page. The video streaming on the server is provided by the MJPG streaming library. Fig. 10 shows the corresponding code of the HTML page, which is placed in the Mini PC GK802 web folder (/var/www) to provide the link for live web streaming in the previous part of the code. The wheelchair control depends on the existence of the connection with the client; for example, if the network connection is interrupted, the movement of the wheelchair is stopped. This is done with the part of the code on the server applying the heartbeat check function (Fig. 11): the heartbeat of the connection is constantly monitored, in our case every half second ('heartbeat interval'). If the response has reached a timeout of 1 second ('heartbeat timeout'), the wheelchair stops (board.LOW).
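The dispatch path described above (recognized keyword → numeric command code → socket message to the server on the GK802) can be reduced to a small lookup. The following is an illustrative sketch, not the authors' listing (their code is in Figs. 6–8); the command codes other than "go" = 1, the event name and all function names are assumptions.

```javascript
// Keyword-to-command mapping sketch. Only the code for "go" (1) is stated
// in the text; the remaining codes are illustrative assumptions.
const COMMANDS = {
  go: 1,    // forward
  back: 2,  // backward
  left: 3,  // rotate left
  right: 4, // rotate right
  stop: 0   // stop both DC motors
};

// Map a recognized transcript to a command code, or null if unknown.
function commandCode(transcript) {
  const word = transcript.trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(COMMANDS, word)
    ? COMMANDS[word]
    : null;
}

// In the browser client this would be wired to the WebKit Speech API result
// and emitted over socket.io, roughly as:
//   recognition.onresult = (e) => {
//     const code = commandCode(e.results[0][0].transcript);
//     if (code !== null) socket.emit('command', code);
//   };
```

Keeping the mapping in one table makes it straightforward to extend with additional keywords (or the "synonyms" discussed in Section 6) without touching the dispatch logic.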
The functionality of monitoring the heartbeat of the client-server connection is part of the standard socket.io and node.js functionality. On the client side, the reconnect libraries are used in a similar manner to reestablish the connection after a network breakdown. The entire development of the control software was done in JavaScript/ECMA Script, which provides several advantages over C/C++ and similar languages, one of them being its relatively simple programming approach, which is appropriate for educational purposes.

Fig. 6. WebKit Speech Recognition API.

Fig. 7. JavaScript code for executing commands on the Arduino microcontroller.

Fig. 8. JavaScript code which interacts with the Arduino microcontroller.

Fig. 9. HTML5 code for live video streaming.

Fig. 10. Web page code on the GK802 server for live video streaming.

Fig. 11. Part of the code with heartbeat check for ensuring the client-server network connection.
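The heartbeat fail-safe can be reduced to a pure timing decision: given the timestamp of the last heartbeat and the current time, decide whether the motors must be stopped. The interval and timeout values below are taken from the text; the function and variable names are assumptions, not the code of Fig. 11.

```javascript
// Heartbeat fail-safe sketch: the server checks the connection every 500 ms
// ('heartbeat interval') and stops the wheelchair if no client response has
// arrived within 1000 ms ('heartbeat timeout'), as described in the text.
const HEARTBEAT_INTERVAL_MS = 500;
const HEARTBEAT_TIMEOUT_MS = 1000;

// Returns true when the connection is considered lost and the motor
// outputs should be driven LOW (wheelchair stopped).
function shouldStop(lastHeartbeatMs, nowMs) {
  return nowMs - lastHeartbeatMs > HEARTBEAT_TIMEOUT_MS;
}

// On the server this check would run on every interval tick, e.g.:
//   setInterval(() => {
//     if (shouldStop(lastHeartbeat, Date.now())) stopAllMotors(); // board.LOW
//   }, HEARTBEAT_INTERVAL_MS);
```

Separating the decision from the timer makes the safety condition trivially testable without hardware or a live socket connection.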
5. Graphical user interface development (GUI)
Fig. 12 shows the developed GUI of the prototype speech-controlled wheelchair platform. The GUI is realized, as mentioned, with HTML5 and JavaScript/ECMA Script. One can observe that the IP is entered in the address bar of the web browser; in our case, the internal IP 192.168.1.103 is used over port 8080. The platform could also be accessed from external IPs. No. 1 in Fig. 12 indicates the WebKit speech input, also marked with the microphone icon. The user should use the keywords in order to move the platform. No. 2 shows the live camera streaming within the browser. At the same time, one could observe the same video stream on the mobile device, which is marked by No. 3. No. 4 shows the buttons, which also enable the control of the platform movement. The same interface is shown on the mobile device marked with No. 3.
6. Speech recognition system performance
In order to control the wheelchair securely and accurately, an accurate speech recognition system is needed. It could prove dangerous, for instance, if one could not stop the wheelchair in front of an obstacle due to an unreliable speech recognition system and a malfunction of autonomous wheelchair control. Therefore, we have tested the WebKit Speech API [27], a JavaScript library that allows speech recognition and speech-to-text conversion, which is easy to add to websites and is therefore appropriate for implementation in the cloud. It is supported by the Google Chrome browser
Fig. 12. (1) Voice recognition WebKit input field, (2) video streaming in browser GUI, (3) simultaneous video streaming on mobile device, and (4) keyboard for motion control, which is also shown in (3).

Fig. 13. Speech recognition results for English words.

Fig. 14. Speech recognition results for Slovenian words.

Table 3
English and Slovenian words for basic wheelchair control.

Operation                       English word   Slovenian word
Move forward                    Go             Naprej
Move backwards                  Back           Nazaj
Rotate left                     Left           Levo
Rotate right                    Right          Desno
Completely stop the movement    Stop           Stoj

(version 25 onward). The WebKit Speech API supports speech recognition in various languages. We have tested it in two languages: US English (en-US) and Slovenian. The latter is currently not supported by WebKit; therefore, we have utilized a similar language: Serbian (sr-RS). Because our intention is to use one-word commands and not complete sentences, the difference in grammar and in most words between the Slovenian and Serbian languages is not significant. We have tested five words in both languages to perform the basic controls of the wheelchair, which are presented in Table 3. The test was performed by ten different persons, five male and five female, whose native language is Slovenian. The persons were from 20 to 50 years of age. Each word was pronounced 15 times by each person. We measured how many times WebKit correctly or incorrectly recognized the word. The test was performed on a Samsung Galaxy S4 smartphone with the Android 4.4.2 operating system installed and Google Chrome browser version 38. The smartphone was held approximately 25 cm in front of the person. The test was performed with WebKit integrated into our web application, where a correctly recognized word started an operation on the wheelchair. The test was performed in a room without background noise. The results of speech recognition accuracy are presented in Figs. 13 and 14. In the English language, the words Go, Stop, Left, Right and Back were correctly recognized in 84.00%, 84.00%, 93.00%, 89.00% and 97.00% of cases, respectively. The Slovenian words were recognized with lower rates of correctness: the words Naprej, Stoj, Levo, Desno and Nazaj were correctly recognized in 64.00%, 92.00%, 57.00%, 57.00% and 74.00% of cases, respectively. Surprisingly, the word Naprej, which does not exist in the Serbian vocabulary, was more correctly recognized
than the words Levo and Desno, which do exist in the Serbian vocabulary. The words were incorrectly recognized as: Stoj – stoji, stoje; Levo – leo, nemo, levom, in vivo, limo; Desno – besno, besmo, vesna, gde smo, bismo; Nazaj – y, ovaj, maglaj, naduvaj, na ovaj, na y, nadaj; Naprej – pre, sprej, na kraj, napred. To obtain higher speech recognition accuracy in the Slovenian language, we added these "synonyms" for the words Levo, Desno, etc., i.e. the words that were most frequently mistaken for the aforementioned words in our previous analysis, to our speech recognition system. We have to point out that the original words and "synonyms" do not overlap for different operations; it is therefore safe to include the "synonyms", since the same word or "synonym" would not perform a different wheelchair operation. The test was performed under the same conditions as before. The persons pronounced only the words Naprej, Stop, Levo, Desno and Nazaj. We measured the number of correct operations on the wheelchair and the absence of operations due to not recognizing or falsely recognizing the words. A correctly recognized "synonym" would also start an operation on the wheelchair. The results have shown that with the "synonyms" embedded into the speech recognition system, the system yields 100% accurate wheelchair operations.
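The "synonym" whitelist amounts to a normalization table mapping frequently confused transcriptions back to the intended command word before dispatch. The confusion pairs below are taken from the analysis above; the table structure and function name are illustrative assumptions, not the authors' implementation.

```javascript
// Synonym normalization sketch for the Slovenian commands: transcriptions
// that the recognizer frequently returned instead of the intended word
// (as listed in the text) are mapped back to that word.
const SYNONYMS = {
  'stoji': 'stoj', 'stoje': 'stoj',
  'leo': 'levo', 'nemo': 'levo', 'levom': 'levo', 'in vivo': 'levo', 'limo': 'levo',
  'besno': 'desno', 'besmo': 'desno', 'vesna': 'desno', 'gde smo': 'desno', 'bismo': 'desno',
  'y': 'nazaj', 'ovaj': 'nazaj', 'maglaj': 'nazaj', 'naduvaj': 'nazaj',
  'na ovaj': 'nazaj', 'na y': 'nazaj', 'nadaj': 'nazaj',
  'pre': 'naprej', 'sprej': 'naprej', 'na kraj': 'naprej', 'napred': 'naprej'
};

// Resolve a raw transcript to a canonical command word; unrecognized
// words pass through unchanged and simply trigger no operation.
function canonical(transcript) {
  const word = transcript.trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(SYNONYMS, word)
    ? SYNONYMS[word]
    : word;
}
```

Because no confused transcription is shared between two different commands, this mapping cannot redirect one operation to another, which is the safety property argued for in the text.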
7. Latency measurement
The response time of the wheelchair system to user commands is a crucial factor for the safety of the disabled person. To assess the response time of the wheelchair to the commands issued by the user via the system, we measured the latency in two ways. First, we measured the latency between the time when the user tapped a button on the application screen, e.g. to move the wheelchair forward, and the time the relay on the electronic board clicked. Second, we measured the latency between the voice commands and the relay click.
Fig. 15. A scheme of the latency measurement setup with the following components: (1) WCS with relays, (2) laptop with Audacity software, (3) Android mobile device, (4) Y cable, and (5) microphones.

Table 4
Latency between the button tap on the screen and relay click, start of voice command and relay click, and end of voice command and relay click.

                      Tap – click   Start voice – click   End voice – click
Average latency [s]   0.388         1.886                 1.310
Std. deviation        0.005         0.055                 0.047

In order to perform the long time span measurement, we utilized two microphones connected to the Mic input of the laptop via a Y cable. The first microphone was used to record the user's tap or voice command, while the second was set close to the relays to record their clicks. The experimental hardware setup is presented in Fig. 15. The audio signal was recorded with the open source software Audacity, version 2.0.6, on a Windows 7 64-bit platform. The sound was sampled at 44.1 kHz with 32-bit precision. An example of the audio signal is shown in Fig. 16. The latency measurement results are shown in Table 4. We measured three time spans: a) between the tap on the screen of the smartphone/tablet (on the button that produces wheelchair movement) and the relay click; b) between the start of the voice command and the relay click; and c) between the end of the voice command and the relay click. Each measurement was repeated 10 times and the average time was calculated along with the standard deviation. The average latency between screen tap and relay click, start of voice command and relay click, and end of voice command
and relay click is 0.388 s, 1.886 s and 1.310 s, respectively. The standard deviation is smaller for the tap-click measurement than for both voice-click measurements. This results from the variability of the time needed to pronounce the voice command, because each person (and even the same person) does not always pronounce the command in the same way. An approximate voice command recognition time can be calculated from Table 4 with the equation:

tVR = tEC − tTC,   (1)

where tVR is the voice command recognition time, tEC is the latency between the end of the voice command and the relay click, and tTC is the latency between the screen tap and the relay click. Hence, with regard to Eq. (1), the approximate voice command recognition time is 0.922 s, which represents the time needed by the WebKit Speech API to process the audio signal, recognize the speech and report the recognition results back to the WCS. Voice control should be used only in a fully autonomous configuration since, for example, the time needed just to speak the word "stop" is approximately 250 ms (without any recognition time). Within this interval, the wheelchair should have full autonomy enabled; in our case, sensors should prevent a collision, since in 250 ms, travelling at a velocity of 3 km/h, the wheelchair covers a distance of 21 cm.
Fig. 16. An example of the recorded audio signal in which a voice command and a relay click are present: (1) the start of the voice command, (2) the end of the voice command, and (3) the relay click.
Fig. 17. Controlling the wheelchair by mobile phone touch pad or speech.
8. Clinical testing
The usability of the applied speech recognition was tested at the University Rehabilitation Institute Soča with two patients. The main task was to determine the accuracy of the speech recognition and its applicability for quasi-real-time wheelchair motion control. The patients were given the task of issuing motion commands for forward, backward, left, right and stop. For the stop command, the word "house" was used, since its recognition accuracy was higher than that of the word "stop" for both patients, who were native Slovene speakers. Fig. 17 shows a patient controlling the wheelchair by the touch pad of a mobile phone, which replaces the joystick module: 1) control board with GK802 and Arduino UNO, 2) touch screen mobile phone as the control, 3) DC motorized wheels with battery pack, and 4) camera stand with USB connection to the control board.
Table 5 shows the results of the training phase, in which the patients issued 10 voice commands for each direction and for stop. "0" indicates unsuccessful voice recognition, while "1" indicates successful voice recognition. The accuracy of the issued voice commands in the training phase was 79%. After the training phase, the patients used the chair to test the usefulness of the voice control of the wheelchair. Table 6 shows the accuracy of the issued voice commands: eighty-nine out of one hundred issued commands were recognized correctly (89%), which was significantly better than in the training phase (t-test = −1.94, p = 0.1, N1 = 100, N2 = 100), due to the feedback from the wheelchair platform. It should be taken into account that such accuracy was achieved without any previous training, with native Slovene speakers using the English language.
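The 79% training-phase accuracy is simply the share of successful recognitions over all 100 attempts in Table 5; as a cross-check, the count can be reproduced directly from the table. The trial values below are transcribed from Table 5; the code itself is an illustrative sketch, not part of the platform software.

```javascript
// Recognition accuracy over the training phase (Table 5): each array is one
// command column of ten 1/0 trials for a patient (1 = recognized).
const table5 = [
  [0,1,0,0,0,1,0,0,0,1], // patient 1: Go
  [1,1,0,1,1,1,1,1,1,1], // patient 1: Left
  [1,1,1,1,1,1,1,1,1,1], // patient 1: Right
  [0,0,0,1,1,0,1,1,1,1], // patient 1: Back
  [0,0,0,0,0,0,1,1,1,1], // patient 1: Stop ("house")
  [1,1,1,1,1,1,1,1,1,1], // patient 2: Go
  [1,1,1,1,1,1,1,1,1,1], // patient 2: Left
  [0,0,1,1,1,1,1,1,1,1], // patient 2: Right
  [1,1,1,1,1,1,1,0,1,1], // patient 2: Back
  [1,1,1,1,1,1,1,1,1,1]  // patient 2: Stop ("house")
];

// Overall accuracy = successful trials / total trials.
function accuracy(columns) {
  const flat = columns.flat();
  return flat.reduce((a, b) => a + b, 0) / flat.length;
}
// accuracy(table5) gives 0.79, i.e. the 79% reported in the text.
```

Applying the same function to individual columns also exposes the per-command breakdown, e.g. that the first patient's "Go" and "Stop" commands account for most of the training-phase failures.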
9. Conclusions and further work
With the use of freeware and open-source technologies, we provided the means to reduce the cost of wheelchair platform solutions. The prototype solution is highly economical for end users because the software is open source, while the speech recognition is sourced from the cloud. Another important economic factor is the ease of development, since the hardware is programmed entirely in JavaScript/ECMA Script. The user interface has been implemented for the web browser and for mobile platforms, which makes it user friendly and simplifies development, since a single GUI codebase works on every platform with an installed web browser. Because the software is developed in JavaScript/ECMA Script, technical customizations can be performed easily, and the solution lends itself well to “trial and error” learning. The platform can work autonomously for a considerable amount of time, depending only on the capacity of the batteries. The two-phase development, in which a small prototype with all functionalities was developed first and a full-size prototype was prepared second, was shown to be appropriate. This approach is also suitable for educational purposes.
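Since the hardware is driven entirely from JavaScript, the server-side step from a motion command to wheel drive signals can be sketched as a pure function. The signal convention and the commented firmata wiring below are our assumptions for illustration, not the platform's actual pin map:

```javascript
// Hypothetical mapping from a motion command to normalized drive signals
// for the two DC motorized wheels (+1 forward, -1 backward, 0 stop).
function wheelSignals(cmd) {
  switch (cmd) {
    case 'FORWARD':  return { left: 1,  right: 1 };
    case 'BACKWARD': return { left: -1, right: -1 };
    case 'LEFT':     return { left: -1, right: 1 };  // turn by counter-rotation
    case 'RIGHT':    return { left: 1,  right: -1 };
    default:         return { left: 0,  right: 0 };  // STOP or unknown input
  }
}

// On the platform, the resulting signals would be applied to the Arduino UNO
// through firmata, e.g. (pin numbers and board API shape are illustrative):
//   const sig = wheelSignals(cmd);
//   board.analogWrite(LEFT_PWM_PIN, Math.abs(sig.left) * 255);
//   board.digitalWrite(LEFT_DIR_PIN, sig.left >= 0 ? board.HIGH : board.LOW);
```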
Table 5
Accuracy of the speech recognition in the training phase.

                           Trial no.:  1  2  3  4  5  6  7  8  9  10
1st patient   Go                       0  1  0  0  0  1  0  0  0  1
              Left                     1  1  0  1  1  1  1  1  1  1
              Right                    1  1  1  1  1  1  1  1  1  1
              Back                     0  0  0  1  1  0  1  1  1  1
              Stop (house)             0  0  0  0  0  0  1  1  1  1
2nd patient   Go                       1  1  1  1  1  1  1  1  1  1
              Left                     1  1  1  1  1  1  1  1  1  1
              Right                    0  0  1  1  1  1  1  1  1  1
              Back                     1  1  1  1  1  1  1  0  1  1
              Stop (house)             1  1  1  1  1  1  1  1  1  1
Table 6
Accuracy of the speech recognition in the “drive” phase.

                           Trial no.:  1  2  3  4  5  6  7  8  9  10
1st patient   Go                       1  1  1  1  1  1  1  0  0  1
              Left                     1  1  1  0  1  1  1  1  1  1
              Right                    1  1  1  1  1  1  1  1  1  1
              Back                     0  0  1  1  0  0  0  1  1  0
              Stop                     1  1  1  1  1  1  1  0  1  1
2nd patient   Go                       1  1  1  1  1  1  1  1  1  1
              Left                     1  1  1  1  1  1  1  1  1  1
              Right                    1  1  1  1  1  0  1  1  1  1
              Back                     1  1  1  1  1  1  1  1  1  1
              Stop                     1  1  1  1  1  1  1  1  1  1
Please cite this article as: A. Škraba et al., Speech-controlled cloud-based wheelchair platform for disabled persons, Microprocessors and Microsystems (2015), http://dx.doi.org/10.1016/j.micpro.2015.10.004
A disadvantage of the proposed approach lies in developers’ possible reluctance toward technical innovation, in particular toward the concept of developing control software in JavaScript/ECMA Script, even though the cloud provides new possibilities here. Another caveat might be the lack of technical support for open-source solutions. Clinical testing showed that training patients in the use of the speech recognition system is important for improving accuracy; the motion feedback provided by the wheelchair platform to the patient increased the accuracy of the speech recognition. Successful clinical testing indicates that the proposed design is suitable for speech-based wheelchair control. The latency measurements indicate what should be expected from the proposed system design; according to these measurements, the response is adequate for the control of an autonomous wheelchair platform. Future challenges remain open because the HTML5/JavaScript/ECMA Script landscape is rapidly evolving [28,29]. Potential future fields of use include education, transport and industry. One important educational aspect is that students are often eager to put newly learned knowledge into practice, if not immediately, then as quickly as possible; in their own words, they would like to see their creations “dance and sing” right away. Any longer delay in delivering the hands-on experience only builds up frustration and disappointment [15].
Acknowledgment
This research was supported by the Slovenian Research Agency (ARRS) (research program “Decision support systems in electronic commerce”, program no. UNI-MB-0586-P5-0018). We would like to express our gratitude to Mr. Matjaž Bartol and Mr. Klemen Kramžar for their helpfulness and willingness to participate in the clinical testing of the new technology.
References
[1] A. Zupan, Sophisticated wheelchairs, Rehabilitacija (Rehabil. J.) 6 (Suppl. 1) (2007) S15–S18, Rehabilitation Institute – Soča, University of Ljubljana, Linhartova 51, 1000 Ljubljana.
[2] C.A. Rockey, E.M. Perko, W.S. Newman, An evaluation of low-cost sensors for smart wheelchairs, in: Proceedings of the 2013 IEEE International Conference on Automation Science and Engineering (CASE), 2013, pp. 249–254.
[3] A. Escobedo, A. Spalanzani, C. Laugier, Multimodal control of a robotic wheelchair: using contextual information for usability improvement, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, November 3–7, 2013, pp. 4262–4267.
[4] A.R. Trivedi, A.K. Singh, S.T. Digumarti, D. Fulwani, S. Kumar, Design and implementation of a smart wheelchair, in: Proceedings of the Conference on Advances in Robotics, AIR ’13, 2013, pp. 1–6.
[5] R.C. Simpson, Smart wheelchairs: a literature review, J. Rehabil. Res. Dev. 42 (4) (2005) 423–436.
[6] A. Ruíz-Serrano, R. Posada-Gómez, A. Martínez Sibaja, G. Aguila Rodríguez, B.E. Gonzalez-Sanchez, O.O. Sandoval-Gonzalez, Development of a dual control system applied to a smart wheelchair, using magnetic and speech control, Proc. Technol. 7 (2013) 158–165.
[7] F. Doshi, N. Roy, Spoken language interaction with model uncertainty: an adaptive human–robot interaction system, Connect. Sci. 20 (4) (2008) 299–318, Special Issue: Language and Robots, doi:10.1080/09540090802413145.
[8] S. Hemachandra, T. Kollar, N. Roy, S. Teller, Following and interpreting narrated guided tours, in: Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 2574–2579, doi:10.1109/ICRA.2011.5980209.
[9] H. Christensen, I. Casanuevo, S. Cunningham, P. Green, T. Hain, HomeService: voice-enabled assistive technology in the home using cloud-based automatic speech recognition, in: Proceedings of the 4th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2013), Grenoble, France, 21–22 August 2013, pp. 29–34.
[10] CMU Sphinx, http://cmusphinx.sourceforge.net/ (accessed 12.12.14).
[11] Q. Liang, Z. Haidong, Design of HTML5-based distributed simulation application platform, in: Proceedings of the 3rd International Conference on Consumer Electronics, Communications and Networks (CECNet), 2013, pp. 262–264.
[12] J. Shi, J. Wan, H. Yan, H. Suo, A survey of cyber-physical systems, in: Proceedings of the International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 2011, pp. 1–6.
[13] G. Zhu, F. Zhang, W. Zhu, Y. Zheng, HTML5-based media player for real-time video surveillance, in: Proceedings of the 5th International Congress on Image and Signal Processing (CISP 2012), IEEE, 2012, pp. 248–254.
[14] E. Stojmenova, M. Debevc, L. Zebec, B. Imperl, Assisted living solutions for the elderly through interactive TV, Multimed. Tools Appl. 66 (1) (2013) 115–129.
[15] Why Intel loves HTML5, http://software.intel.com/en-us/videos/why-intel-loves-html5 (accessed 09.03.14).
[16] Ubuntu, http://www.ubuntu.com/ (accessed 12.12.14).
[17] Arduino, http://www.arduino.cc/ (accessed 09.03.14).
[18] mjpg-streamer, http://sourceforge.net/projects/mjpg-streamer/ (accessed 12.12.14).
[19] node.js, http://nodejs.org/ (accessed 09.03.14).
[20] socket.io, http://socket.io/ (accessed 09.03.14).
[21] S. Koceski, N. Koceska, I. Kocev, Design and evaluation of cell phone pointing interface for robot control, Int. J. Adv. Robot. Syst. 9 (2012) 135.
[22] WebKit, http://www.webkit.org/ (accessed 09.03.14).
[23] Firmata, https://github.com/jgautier/firmata (accessed 09.03.14).
[24] Firmata Arduino, http://arduino.cc/en/reference/firmata#.Ux0VOD95Pmc (accessed 09.03.14).
[25] A. Škraba, A. Koložvari, D. Kofjač, R. Stojanović, Prototype of speech-controlled cloud-based wheelchair platform for disabled persons, in: Proceedings of the 2014 3rd Mediterranean Conference on Embedded Computing (MECO), 2014, pp. 162–165.
[26] Quickie, www.sunrisemedical.com (accessed 12.12.14).
[27] Web Speech API Specification, https://dvcs.w3.org/hg/speech-api/raw-file/9a0075d25326/speechapi.html.
[28] J.M.P. Cardoso, T. Carvalho, J.G.F. Coutinho, R. Nobre, R. Nane, P.C. Diniz, Z. Petrov, W. Luk, K. Bertels, Controlling a complete hardware synthesis toolchain with LARA aspects, Microprocess. Microsyst. 37 (8, Part C) (2013) 1073–1089, Special Issue on European Projects in Embedded System Design: EPESD2012.
[29] Y.K. Lai, C.C. Lee, B.H. Huang, T. Wellem, N.C. Wang, T.Y. Chou, H.T. Nugroho, Real-time detection of changes in network with OpenFlow based on NetFPGA implementation, Microprocess. Microsyst. 38 (5) (2014) 431–442, doi:10.1016/j.micpro.2014.04.005.
Andrej Škraba obtained his Ph.D. in the field of Organizational Sciences – Informatics from the University of Maribor. He works as an associate professor and researcher in the Cybernetics & Decision Support Systems Laboratory at the Faculty of Organizational Sciences, University of Maribor. His research interests cover systems theory, modeling and simulation, cyber-physical systems and decision processes. His work has been published in the following peer-reviewed journals: Simulation, System Dynamics Review, Journal of Mechanical Engineering, Computers and Electronics in Agriculture, Kybernetes and Group Decision and Negotiation. In the course of his doctoral and post-doctoral studies he successfully completed three distance-learning courses on System Dynamics from the Massachusetts Institute of Technology Center for Advanced Educational Services and Worcester Polytechnic Institute. In 2014 he acquired additional knowledge in the field of System Dynamics at the University of Bergen, where he successfully completed course GEO-SD302. He is a member of the System Dynamics Society, INFORMS and SLOSIM.
Radovan Stojanović obtained his Dipl.-Ing. degree from the University of Montenegro and his Ph.D. from the University of Patras, Greece, in the field of Electrical and Computer Engineering. He is currently a full professor at the University of Montenegro, where he leads the Applied Electronics Centre. He has led numerous EU, NATO, bilateral and national programs and is an author of more than 150 publications in international monographs, journals, and conference and workshop proceedings. He is a member of the Board of the Montenegrin Academy of Science for Natural and Technical Sciences, the Think Tank Team of the Ministry of Science of Montenegro and the Senate of the University of Montenegro, the representative of Montenegro in the H2020-ICT Committee, the President of the Montenegrin Association of New Technologies (MANT), and the founder and Chairman of the Mediterranean Conference on Embedded Computing (MECO). He is also the Montenegrin Director of Euromicro. In Montenegro he established the Centre for Applied Electronics (2003), the Centre for Biomedical Engineering (BioEMIS, 2014) and the Centre for Simulation of Disasters (GEPSUS, 2014).
Anton Zupan graduated in Medicine from the University of Ljubljana (Slovenia) in 1980. He is a physician, a specialist in physical and rehabilitation medicine and a specialist in paediatrics. He works at the University Rehabilitation Institute in Ljubljana as head of the Rehabilitation Engineering Department and head of the Unit for the Rehabilitation of Patients with Neuromuscular Diseases. Since 1994 he has been Associate Professor of physical and rehabilitation medicine at the Faculty of Medicine of the University of Ljubljana. Dr. Zupan is an experienced clinician and scientist. He has led and participated in several national and international projects and has conducted several clinical trials investigating rehabilitation programs.
Andrej Koložvari obtained his M.Sc. in the field of Organizational Sciences – Informatics from the University of Maribor. He has extensive industrial experience in the field of measurement, monitoring and control from Iskra Kibernetika Inc. He is currently a Ph.D. student at the University of Maribor, Faculty of Organizational Sciences, working on his thesis in the field of cyber-physical systems in education.
Davorin Kofjač received his Ph.D. from the University of Maribor in the field of Management of Information Systems. He has published several papers in international conferences and journals and has been involved in many national research projects. Currently, he works as a researcher at the same university in the Cybernetics & Decision Support Systems Laboratory. His research interests include modeling and simulation, artificial intelligence and operational research.