Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 108C (2017) 1602–1611
International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland
Human Identification and Localization by Robots in Collaborative Environments
Craig C. Douglas1 and Robert A. Lodder2
1 University of Wyoming, Laramie, WY, U.S.A.
[email protected]
2 University of Kentucky, Lexington, KY, U.S.A.
[email protected]
Abstract
Environments in which mobile robots and humans must coexist tend to be quite dangerous to the humans. Many employers have resorted to separating the two groups since the robots move quickly and do not maneuver around humans easily, resulting in human injuries. In this paper we provide a roadmap towards being able to integrate the two worker groups (human and robots) to increase both efficiency and safety.
Improved human to robot communication and collaboration has implications in multiple applications. For example: (1) Robots that manage all aspects of dispensing items (e.g., drugs in pharmacies or supplies and tools in a remote workplace), reducing human errors. (2) Dangerous location capable robots that triage injured subjects using remote sensing of vital signs. (3) ‘Smart’ crash carts that move themselves to a required location in a hospital or in the field, help dispense drugs and tools, save time and money, and prevent accidents.
Keywords: robotics, human interactions, intelligent data assimilation
© 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the International Conference on Computational Science
1 Introduction
There is an obvious need to improve the verbal and nonverbal interaction between humans and robots, between humans and multiple robots, and to improve productivity and safety metrics in human-robotic fields. In the past, human communication with robots generally only involved specialized engineers providing direct command-only dialogue. As robots move into various professional fields, fluid communication with nonexperts is needed to allow for effective collaborative human-robot interaction. Humans work with humans, robots work with robots, but human-robot collaboration is rare. Human identification and localization (HIL) by robots helps facilitate collaborative interaction. Fluid communication requires robots to separate signals in multisensory environments. Bayesian inference methods and Laplacian mixture models with nested sampling to infer the sources can be studied to determine the most effective Human-Robot Interaction (HRI) communication system.
1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the International Conference on Computational Science. 10.1016/j.procs.2017.05.229
Craig C. Douglas et al. / Procedia Computer Science 108C (2017) 1602–1611 Human Identification and Localization by Robots Douglas and Lodder
Figure 1: Spiral development model

Revolutionary, versus evolutionary, development in human robot collaboration can be accomplished using commercial off the shelf (COTS) low technology hardware and software. Speed and nimbleness are a secondary benefit. This philosophy contradicts the standard government supported project philosophy of high cost, specialized components. The hardware basis of the system can be an Amazon Alexa based device (Echo or Dot) [1] for HIL by audio, a smartphone and a SeekThermal infrared camera [16] for vision, Google TensorFlow [15] for artificial intelligence, a common robotic platform based on the open source Robot Operating System (ROS) [14], and a mechanical platform like the Double Robotics Telepresence Robot [13].

The software and hardware for the system are engineered using a spiral development model [4] (see Fig. 1). The spiral model is a risk driven process model generator for engineering projects. Spiral development is a group of technology development processes typified by iterating a set of elemental development processes and managing risk so that risk is actively reduced. Based on the unique risk patterns of a given project, the spiral model guides a team to adopt elements of one or more process models, such as incremental, waterfall, or evolutionary prototyping. A project begins as a paper concept (e.g., in Gazebo [12]) and then develops through a series of standard design reviews and prototypes into a fully functioning robot capable of interacting with humans to accomplish tasks. Spiral development advantages include:
• Flexibility for change as more data are gathered on the developing system to produce a product more likely to meet users’ needs.
• Identification of critical issues relating to the system’s effectiveness discovered rapidly by prototyping.

The applications of software such as TensorFlow [15] are arguably unlimited. TensorFlow is a machine learning system used especially for language and visual understanding by artificial intelligence, key elements in HIL. TensorFlow provides an interface that gives a controller access to multiple CPU cores or GPU cards, so that multiple devices can interact and begin to recognize or understand visual and sound data inputs without programming specific to each individual data set. Fig. 2 depicts tensors flowing through a graph. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges become available.

Figure 2: TensorFlow diagram

The remainder of this paper is organized as follows. Section 2 describes prior research in robotics in human identification and localization. Section 3 describes some examples where a COTS approach can make an immediate impact in the workplace or in the field. Section 4 describes some COTS and open source components needed to cost effectively move robot-human environments forward in a short period of time. Section 5 describes possible future research directions and some conclusions based on current work.
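The execution rule described for Fig. 2 (a node runs once every tensor on its incoming edges is available) can be illustrated with a toy scheduler. This is a minimal sketch in plain Python, not TensorFlow code: the names run_graph, matvec, add, and relu and the small example graph are hypothetical, and the ready nodes are executed one after another rather than dispatched asynchronously to devices.

```python
def run_graph(nodes, feeds):
    """Run a toy data flow graph (illustrative only, not the TensorFlow API).

    nodes: dict mapping node name -> (function, list of input names)
    feeds: dict mapping input name -> value that is already available
    Returns all computed values and the order in which nodes fired.
    """
    values = dict(feeds)
    remaining = dict(nodes)
    order = []
    while remaining:
        # A node is ready once every tensor on its incoming edges exists.
        ready = [name for name, (_, deps) in remaining.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("cycle or missing input in graph")
        # Ready nodes do not depend on each other, so a real runtime could
        # place them on different devices; here they simply run in turn.
        for name in ready:
            fn, deps = remaining.pop(name)
            values[name] = fn(*[values[dep] for dep in deps])
            order.append(name)
    return values, order


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def relu(a):
    return [max(0.0, ai) for ai in a]


# Nodes are listed out of order on purpose: firing order is data driven.
graph = {
    "relu": (relu, ["add"]),
    "add": (add, ["matvec", "b"]),
    "matvec": (matvec, ["W", "x"]),
}
feeds = {"W": [[1.0, 2.0], [-1.0, 0.5]], "x": [3.0, 1.0], "b": [0.5, 0.5]}
values, order = run_graph(graph, feeds)
print(order)           # ['matvec', 'add', 'relu']
print(values["relu"])  # [5.5, 0.0]
```

The firing order follows only from data availability, which is the property that lets a scheduler spread independent nodes over several CPU cores or GPU cards.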
2 Robotic Research in HIL
HIL is an important problem for HRI [10, 5, 3]. Acoustic source localization has many applications in communication, including HRI, video conferencing, and noise reduction. Direct localization approaches are based on calculating a cost function over a set of prospective locations and taking the most probable source positions, while the indirect approaches are built upon the estimation of relative signal time delays to project the direction-of-arrival (DOA) and sometimes the position of the source. Most conventional approaches are based on the generalized cross correlation (GCC) method that calculates the correlation function using the inverse Fourier transform of the cross-power spectral density function multiplied by an appropriate weighting function [6]. Various microphone array configurations may be employed, and typically the performance in DOA estimation improves with the number of microphones. Most microphone arrays have at least 3 microphones.

Figure 3: Average errors in position as a function of signal-to-noise ratio.

GCC-PHAT (phase transform) is the most popular variant of the GCC method and is said to improve the GCC algorithm in a milieu with high reverberation time when the noise is low [17]. Typically GCC-PHAT has been applied to localization of a single sound source and the multiple sound source case has been mostly overlooked in the literature. The number of sound sources is almost always a set parameter. Realistically, it would be better to have a multiple source construct to put a histogram-based GCC-PHAT localization algorithm into practice for an unknown number of sound sources.

An extension of the Bayesian inference method for speech localization based on nested sampling to deduce the number of sound sources and their directions of arrival was published in [6]. In that paper a Laplacian mixture was used to model the delay histogram created by the GCC-PHAT algorithm. The number of Laplacian functions sets the number of sources and their directions of arrival. Their work shows that source localization with an unknown number of sound sources requires 2 levels of inference that complement the histogram-based GCC-PHAT localization algorithm and enable it to be extended for an unknown number of sound sources.

HRI should be rapid as well as accurate in commercial environments. A significant part of Bayesian statistics is the approximation of posterior probability.
A posterior probability distribution exists for any unknown quantity in a study and is derived from other data obtained from the study (e.g., each new sample chosen for analysis is dependent upon the previous sample). Using this approach, an unknown variable can be estimated by using Bayes' theorem. Bayes' theorem with nested sampling allows for the measurement of multimodal posteriors. Two microphones have previously been used to allow for localization of sound in a noisy milieu with signal to noise ratio (SNR) values of 20 dB [6] (see Fig. 3). Incorporating this approach into our HIL library, a robot can be designed using COTS microphones capable of determining the human source of communication.

Figure 4: Lab robots must work quickly in close proximity to busy people.

The examples in Section 3 outline a few of the ambiguity problems that humans deal with daily. Ambiguity is common in communication between humans. Determining the source of communication can help robots to determine what their response should be. Finding the edge of the sound is a principle in determining its exact location. In order for a robot to locate and identify a speaker, the robot must eliminate the other noise in the environment, including white noise. GCC-PHAT algorithms are used in areas of high reverberation time with low noise presence and are capable of being used over an unknown number of sources. The time delay of the sound, estimated through GCC-PHAT, allows for the estimation of the associated angle of the sound for every time frame and is used to construct a histogram of the location from which the sound originated. Using this histogram (H), the Bayesian inference can be performed. The posterior probability distribution is calculated using the following:

\[
p(\Theta \mid H, D, I) = \frac{p(D \mid \Theta, H, I)\, p(\Theta \mid H, I)}{p(D \mid H, I)}.
\]
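The histogram construction described above (a GCC-PHAT delay per time frame, converted to an angle and binned) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the method of [6]: it assumes a two-microphone array, a far-field source so that the delay tau and the angle theta satisfy tau = d sin(theta) / c, and a speed of sound of 343 m/s. The function names gcc_phat_delay and doa_histogram and all parameter defaults are hypothetical, and the Laplacian mixture and nested sampling steps are not included.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT."""
    n = len(x1) + len(x2)                    # zero-pad to avoid wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)            # generalized cross correlation
    max_shift = max(1, min(int(fs * max_tau), n // 2))
    # Reorder so that index 0 is lag -max_shift and the last is +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def doa_histogram(x1, x2, fs, mic_distance, frame_len=1024, n_bins=36, c=343.0):
    """Histogram of per-frame direction-of-arrival angles in degrees."""
    max_tau = mic_distance / c               # largest physically possible delay
    angles = []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        f1 = x1[start:start + frame_len]
        f2 = x2[start:start + frame_len]
        tau = gcc_phat_delay(f1, f2, fs, max_tau)
        sin_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
        angles.append(np.degrees(np.arcsin(sin_theta)))
    return np.histogram(angles, bins=n_bins, range=(-90.0, 90.0))


# Synthetic check: one noise-like source reaching microphone 2 four samples late.
rng = np.random.default_rng(0)
fs, mic_distance, delay = 16000, 0.2, 4
src = rng.standard_normal(fs)
x1 = src + 0.05 * rng.standard_normal(fs)
x2 = np.concatenate((np.zeros(delay), src[:-delay])) + 0.05 * rng.standard_normal(fs)
hist, edges = doa_histogram(x1, x2, fs, mic_distance)
peak = int(hist.argmax())
print(edges[peak], edges[peak + 1])          # bin that collects most frames
```

With several simultaneous speakers the histogram would show several peaks, which is the situation the Laplacian mixture model in [6] is meant to describe.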
3 HIL Examples
In some situations robots are more capable of functioning rapidly for many hours in hostile environments with fewer errors than humans and do this more cost effectively. Effective communication is needed for productive collaboration between the robot and human to ensure the most productive robotic environment. We consider three environments where HIL research is important.
3.1 Humans and Robots in a Lab
In this scenario, humans and robots are working side by side in a lab (see Fig. 4). In real life, robots are already working in labs where humans also work, but there is little interaction between the groups (human and robot). The HIL problem is an important reason collaboration fails. To understand the nature of the HIL problem, visualize what happens when a human needs to walk down an aisle past a robot in a tight laboratory to retrieve a bottle on a shelf. The human says, “Excuse me.” If the robot in the way were a human, that one human might look over his or her shoulder and then take a step to the right. Current lab robots would either ignore the human or all of the robots in the lab might respond by taking one step to the right. That may not be a problem for the robot obstructing the aisle, but may be for the other robots working in that lab. One might be filling a beaker with concentrated acid when it suddenly moves one step to the right. Another might be weighing something important. In either case, the net result is undesirable, and in one case the result is actually dangerous. Robots need to know, without a lot of detailed coding from humans for each task, if a human is addressing them, and which humans are addressing them, in order to respond to those particular humans. The human localization problem is just beginning to be solved by systems like that incorporated in the Amazon Echo, which is able to listen to free speech in a room and respond appropriately within a number of constraints.

Figure 5: Triage operations can be enhanced by measuring the heart rate, breathing rate, and blood oxygenation at a distance.
3.2 Triage
Triage operations are difficult to conduct in a firefight (battleground), disaster (natural or manmade), or calamity (workplace casualties). However, triage must be performed in order to prioritize the recovery of the injured who can be saved with treatment, but who will not be saved without immediate attention. People who are already dead, who cannot be saved under any circumstances, or will survive their injuries for hours or even days are lower priority. Fig. 5 demonstrates that it is possible to measure heart rate and breathing rate at a distance. Measuring blood oxygenation at a distance gives an even better indication of the shock status of the injured [11]. Robots are by nature better suited than humans for conducting these remote sensing operations during chaotic conditions. HIL allows the robot to follow commands, identify the needy, and assess their status for triage.
3.3 Crash Cart
A crash cart is a mobile inventory of emergency medications and equipment that could be needed at the site of a medical emergency or surgery for life support protocols (see Fig. 6). When a person crashes (or codes in medical jargon), these carts are wheeled to the bedside to be used by physicians, nurses, or any other trained personnel to return the patient to stable vital signs. Due to the nature of many of the medications dispensed from these carts, a number of them are potentially very dangerous if not used in the proper scenario and in the proper amounts. The code response system is subject to random human errors that can potentially be partially alleviated with the introduction of computer and robotic controls in the process.

Figure 6: LifeBot virtual ambulance [9].

If the crash cart is a smart cart (e.g., connected to the Cloud), it can provide the correct medication based on the patient’s status, or to the healthcare professionals following a voice command, allowing the healthcare professionals to be the second check of the accuracy of the dispensed medication. A robotic crash cart decreases the probability that a healthcare professional would pull an item from the wrong drawer or bin. Integrated sensors can also check whether drawers and bins are mislabeled for the medication that they contain. A code event is an extremely hectic situation, often with over 10 respondents who must act quickly, and such environments can multiply the probability of otherwise low probability errors. A smart cart potentially alleviates some or most of this inherent error. Combining this example with the one in Section 3.1, it becomes possible to send the cart to the site of the code without human assistance, where it can meet the healthcare professionals and offer calm, intelligent medical advice.
4 Human Identification and Localization Components
In this section we identify hardware and software that is either open source, inexpensive, or commonly already in use.
4.1 Hardware
The Double Telepresence Robot is mobile, remotely controlled by people or computer, has a long battery life, and has a design simple enough to allow dynamic modifications based on the environment (see Fig. 7(a)). It already has various apps, e.g., a QR Code Reader to read badges. By reading badges, the robot identifies a person and determines the likelihood that the person will provide valid instructions. The camera app can be used to identify items in the robot’s environment that might be hazardous to itself and should be avoided. Remote access allows the robot to be driven in a very precise line, allowing for quick access to stored items or the injured in the case of a calamity. A robot can provide some continuity through three daily shifts of workers once the HRI is sufficiently well developed. The robot can take a quick look at injuries and use remote biosensing of breathing rate, heart rate, and blood oxygenation for triage. Approximately 86% of all battlefield deaths occur within the first thirty minutes of injury [2], emphasizing the need for reducing the time between injury and triage plus treatment.

Figure 7: Some HIL components. (a) Double Telepresence robot. (b) Amazon Echo 7 microphone array (6 on edge, 1 in center). (c) SeekThermal image.

An Amazon Echo (or a similar device) has an array of microphones that can be used to locate voices after automatic activation (see Fig. 7(b)). It has a significant development library [1] that is freely available.

An iPhone hosts a SeekThermal infrared camera, which helps the system to identify humans and to determine which person in the room is speaking (see Fig. 7(c)), whether that speaker is oriented and speaking in the direction of the robot, and how many speakers are present in the room and must be accounted for. For calamity examples, bodies of the unconscious injured hidden by debris or other casualties can be quickly identified and provided assistance.

We also need a semiconductor charge-coupled device [7] analog sensor capable of converting light waves to images to remotely capture the blood oxygen levels of the injured.
4.2 Software
The Robot Operating System (ROS) [14] is a flexible framework for writing robot software. It is a collection of tools, libraries, and conventions that aim to simplify the task of creating complex and robust robot behavior across a wide variety of robotic platforms. The ROS designers encourage shared, collaborative robotics software development.
Gazebo is an open source 3D multi-robot simulator for rapidly testing algorithms, designing robots, and performing regression testing using realistic scenarios. It can accurately and efficiently simulate groups of robots, objects, and sensors in complicated indoor and outdoor environments. It generates realistic sensor feedback and physically plausible interactions between objects through its accurate simulation of rigid body physics. It provides high quality, easy to use graphics and programmatic and graphical interfaces.

TensorFlow can be used to analyze data from all audio and video sensors, make sense of the data, and take action based upon the data. Its introductory examples apply softmax regression to the Modified National Institute of Standards and Technology (MNIST) handwritten digit data. TensorFlow can be used for visual identification and audio recognition. It comes with easy to use Python and C++ interfaces to build and execute computational graphs. Users can write standalone TensorFlow Python or C++ programs, or test things in an interactive IPython notebook where notes, code, and visualizations can be kept logically grouped. TensorFlow runs on CPUs or GPUs, and on desktop, server, or mobile computing platforms. A model built on a laptop can be scaled up and trained faster on GPUs with no code changes, and the trained model can be run as a service in the cloud.

TensorFlow is not a rigid neural networks library. If a computation can be expressed as a data flow graph, it can be done in TensorFlow. The user constructs the graph and writes the inner loop that drives computation. TensorFlow provides helpful tools to assemble subgraphs common in neural networks, but users can write their own higher-level libraries on top of it.
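Softmax regression, mentioned above in connection with the MNIST data, together with the user-written inner loop that drives the computation, can be illustrated with a short sketch. It is written in plain NumPy rather than with the TensorFlow API, and it uses three synthetic 2D clusters in place of MNIST so that it stays self-contained. The function names, the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=200):
    """Fit weights W and bias b by batch gradient descent on cross-entropy."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot labels
    for _ in range(epochs):                  # the inner loop driving computation
        P = softmax(X @ W + b)               # predicted class probabilities
        grad = (P - Y) / n                   # gradient w.r.t. the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)


# Synthetic stand-in for real sensor features: three well separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.standard_normal((150, 2))

W, b = train_softmax_regression(X, y, n_classes=3)
accuracy = float((predict(X, W, b) == y).mean())
print(accuracy)
```

In a graph-based system the same model would be expressed as matrix multiply, add, and softmax nodes, with the training loop feeding batches of data into the graph.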
4.3 The Design Flow
A formal systems engineering process is the best way to control costs and assure that the mission objectives are achieved. Having two independent teams (one to propose extensions, the other to give feedback) speeds up development. Formally, the HIL system development process is a flow like
5 Conclusions and Future Work
In this paper, we outlined how commercial off the shelf low technology hardware and software can be used to add speed and nimbleness to designing and implementing better robot-human co-existent environments, which contradicts the standard government supported multi-year project philosophy of high cost, specialized components. We provided three concrete example environments where improvements in this field are essential. Finally, we described existing hardware and software that meet our criteria.

In the future, we intend to implement the ideas on similar examples. A key area is with NASA exploration projects [8]. It is currently impractical for humans to explore comets or planets such as Mars. However, people interact with rovers on Mars with a significant time delay. NASA requires components that have a long term proven record of reliability, not state of the art, just invented devices. COTS devices and long time existing open source software offer effective components for such projects.
Acknowledgments
This research was supported in part by the National Science Foundation grants ACI-144061 and ACI-1541392 and the Research Institute of Mathematical Sciences, Kyoto University.
References
[1] Amazon. Alexa based devices. https://alexa.amazon.com, last viewed 1/31/2017.
[2] U.S. Army. Combat casualty care research program (CCCRP). http://mrmc.amedd.army.mil/index.cfm?pageid=medical_r_and_d.ccc.overview, last viewed 1/30/2017.
[3] C. Blandin, A. Ozerov, and E. Vincent. Multi-source TDOA estimation in reverberant audio using angular spectra and clustering. Signal Processing, 92:1950–1960, 2012.
[4] B. W. Boehm. A spiral model of software development and enhancement. Computer, 21:61–72, 1988.
[5] J. Chen, J. Benesty, and Y. Huang. Time delay estimation in room acoustic environments: an overview. EURASIP J. Applied Signal Processing, pages 170–170, 2006.
[6] J. Escolano et al. A Bayesian direction-of-arrival model for an undetermined number of sources using a two-microphone array. J. Acoustical Society America, 135:742–753, 2014.
[7] J. R. Janesick. Scientific Charge-Coupled Devices. SPIE Press, 2001.
[8] G. V. Levin, J. D. Miller, P. A. Straat, R. Lodder, and R. B. Hoover. Detecting life and biology-related parameters on Mars, 2007. https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070013724.pdf, last viewed 1/30/2017.
[9] LifeBot. LifeBot EMS telemedicine mobile integrated healthcare. http://www.lifebot.us, last viewed 1/31/2017.
[10] N. Madhu and R. Martin. Acoustic source localization with microphone arrays. In R. Martin, U. Heute, and C. Antweiler, editors, Advances in Digital Speech Transmission, pages 135–170. Wiley, 2008.
[11] J. Medendorp, C. Loftin, and R. A. Lodder. Real-time determination of hemoglobin oxygenation state in neonatal mice. PittCon 2004 (Chicago, IL), 2004. https://sites.google.com/a/contactincontext.org/asrg-public/presentations/PittCon%202004%20neonatal%20mice%20abstract.pdf?attredirects=0&d=1, last viewed 1/30/2017.
[12] OSRFoundation.org. Gazebo. , last viewed 1/30/2017.
[13] Double Robotics. Telepresence robot for telecommuters. http://www.doublerobotics.com/, last viewed 1/30/2017.
[14] ROS.org. Powering the world’s robots. http://www.ros.org/, last viewed 1/30/2017.
[15] TensorFlow. Tensorflow. https://tensorflow.org/, last viewed 1/30/2017.
[16] Seek Thermal. Seek thermal imaging cameras. http://www.thermal.com/products/, last viewed 1/30/2017.
[17] C. Zhang, D. Florencio, and Z. Zhang. Why does PHAT work well in low noise, reverberative environments? In Acoustics, Speech and Signal Processing, 2008, pages 2565–2568. IEEE Press, 2008.