Robotics and Autonomous Systems 18 (1996) 259-269

Deictic human/robot interaction*

Polly K. Pook a,*,1, Dana H. Ballard b

a MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139, USA
b Computer Science Department, University of Rochester, Rochester, NY 14627, USA
Abstract

This paper discusses a deictic strategy for human/robot interaction based on models of human motor control. The human operator uses gestures to momentarily bind autonomous robot routines to a goal behavior and spatial context. As a result, the operator communicates symbolically, rather than literally controlling the robot, and the robot benefits from the operator's on-line reasoning. The strategy, called teleassistance, has been implemented for dexterous manipulation. A review of the results demonstrates that teleassistance requires executive operator control for only a fraction of the total task time and can accommodate natural variations in tasks better than the autonomous routines alone can.

Keywords: Teleassistance; Semi-autonomy; Deictic
* This work was supported by NSF research grant No. IRI-8903582 and by a research grant from the Human Science Frontiers Program.
* Corresponding author. E-mail: [email protected].
1 This paper reports work by the author while a graduate student at the University of Rochester Computer Science Dept.

1. Introduction

This paper considers a bottom-up approach to understanding and extending robotic motor control by integrating human guidance. The focus is on dexterous manipulation using a Utah/MIT robot hand, but the ideas apply to other robotic platforms as well. Teleassistance is a novel method of human/robot interaction in which the human operator uses a gestural sign language to guide an otherwise autonomous robot through a given task. The operator wears a glove that measures finger joint angles to relay the sign language. Each sign serves to orient the robot within the task action sequence by indicating the next perceptual subgoal and a relative spatial basis. Teleassistance merges
robotic servo loops with human cognition to alleviate the limitations of either full robot autonomy or full human control alone. The operator's gestures are deictic, from the Greek deiktikos meaning pointing or showing, because they circumscribe the possible interpretations of perceptual feedback to the current context and thereby allow the autonomous routines to perform with computational economy and without dependence on a detailed task model. Conversely, the use of symbolic gestures permits the operator to guide the robot strategically without many of the problems inherent to literal master/slave teleoperation, including non-anthropomorphic mappings, poor feedback, and reliance on a tight communication loop. The development of teleassistance stems from an analysis of autonomous control, in light of recent advances in manipulator technology. A qualitative, context-sensitive control strategy that exploits the many degrees of freedom and compliance of sophisticated manipulators governs the underlying
Fig. 1. The range of semi-autonomous control methods, bounded by low-level autonomous routines at one end and master/slave teleoperation at the other. The autonomous routines are reactive: they take advantage of sensor feedback but depend on pre-compiled programs for planning and reasoning. Conversely, teleoperation permits dynamic human reasoning but sensing is poor. The goal of semi-autonomous methods such as teleassistance is to combine the advantages of the two extremes while cancelling the disadvantages.
autonomous routines in teleassistance (see [26] for details).
2. The need for research in semi-autonomy In semi-autonomy, a person provides guidance to an otherwise autonomous robot. This guidance can take many forms; Fig. 1 sketches the extremes. At one end are autonomous routines, wherein the robot runs precompiled programs written by humans. At the other extreme is literal teleoperation, wherein the robot has no self-autonomy; the person directs every movement. Neither extreme has been particularly successful. One may look at historical concepts of biological motor control to help explain why. 2.1. Teleoperation does not work In the 19th century, the homunculus, or "little person", was the popular conception of human motor control. Just as a person might sit at a piano and play a chosen concerto, the homunculus "sat" in the human brain selecting and playing motor-sensory routines [37]. With every keystroke the homunculus controlled all motor and perceptual acts. Fig. 2(a) is a sketch of the homunculus as a control strategy. There are at least two engineering disadvantages with this method: (1) the computational load on the central controller, the homunculus, creates a bottleneck in the control flow; and (2) the remoteness of the central controller from muscles and sensory cells can cause unacceptable communication delays. The key
advantage of the homunculus is centralized reasoning; given all input and total control, the homunculus can make informed decisions about which programs to play when. Of course, there could be other explanations for the homunculus strategy, but the key here is that the explanation reveals nothing about motor control: the way a person controls movement is that there is another "little person" controlling movement.

Fig. 2. (a)-(f). Schematic models of control: (a) homunculus, (b) Sherrington reflex, (c) Bernstein perspective, (d) teleoperation, (e) servo-control, (f) teleassistance.

Traditional master/slave teleoperation mirrors the 19th century viewpoint (Fig. 2(d)). The human
teleoperator, like the homunculus, has direct control of the robot end effectors. Typically the operator wears movement monitoring devices to achieve this control. If the robot is anthropomorphic, a direct mapping causes the robot to simply mimic the human movements. If not, a more complex mapping applies. Varying degrees of feedback to the operator are possible, although current techniques are either very crude (e.g., [11]) or very constraining (e.g., [30]). Two engineering flaws of the homunculus recur in teleoperation: (1) the teleoperator bears the computational load, controlling every movement and noting all feedback. This is exhaustingly tedious for the operator and highly susceptible to error such as inaccurate or inadequate feedback, poor motion mappings from human to robot, or operator inattention [23]. Furthermore, the decisions of how movement is controlled are hidden in the mind of the operator and unavailable for analysis by the robot controller; (2) remote control causes even more problems for the teleoperator than for the homunculus. The homunculus at least cohabits the body of its robot; the teleoperator is off to the side in a body different from the robot. Not only does this cause communication lags, but it can also skew visual perspective, limit the feedback available to the operator, and make mappings awkward and incomplete. Consequent to these flaws, reactions are slow and require intense concentration. Collision avoidance falls on the shoulders of the distant and encumbered operator, endangering the robot. Teleoperation is also potentially dangerous for the operator when the forces acting on the robot reflect back to the operator. In fact, the teleoperator is well equipped to reason about feedback and control, however limited they may be. To sum up, teleoperation lacks distributed local sensing and action but employs central reasoning. 2.2. Autonomy does not work
At the other end of the spectrum are low-level autonomous routines that respond only to feedback from the robot sensors and hard-coded program instructions. These routines exploit distributed local sensing but lack central reasoning, the opposite of teleoperation. A biological analog is found in reflex actions. Sir Charles Sherrington, an English medical doctor at the turn of this century, debunked the homunculus
theory by drawing attention to reflex actions and their apparent autonomy from cerebral control [33,34]. Two of his observations are particularly relevant to this discussion: first, reflexes can be stimulated in amputated limbs, disconnected from the neural cord; and second, the same reflex cannot be triggered by stimulating the neural cord directly. That is, the motor action is directly coupled to the stimulation of the skin, and reflexive coordination of low-level perception and action is autonomous from cognition. Fig. 2(b) depicts the self-contained reflex action. Often the aim of robotic autonomous control programs is to produce goal-directed behaviors that evolve from local reactivity, such as navigating by following walls [6,7,10]. These programs are akin to reflex actions (Fig. 2(e)). Reactive motor programs take advantage of the technology of fast servo feedback that enables the robot to react quickly and appropriately within the context of the immediate goal, such as following but not colliding with the wall. However, it is not yet feasible for this architecture to support behaviors that rely on global information and reasoning. Each behavior performs well when situated [7] in its intentional and spatial context, but the problem remains: how to situate it? This is a significant problem in uncertain and dynamic environments that cannot be readily modeled.
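To make the notion of "situating" a reactive behavior concrete, consider the following minimal sketch (hypothetical Python; the sensor and drive interfaces, parameter names, and values are invented for illustration, not taken from any system in this paper). The servo loop itself is purely local and model-free; which wall to follow, at what stand-off distance, and in which direction must all be supplied from outside the routine.

```python
# Hypothetical sketch: a reactive wall-following behavior.
# The loop reacts to range feedback, but the parameters that "situate" it
# (which side, stand-off distance, speed) must come from a higher level --
# exactly the gap that semi-autonomous guidance is meant to fill.

def follow_wall(read_side_range, drive, side="left", standoff=0.5,
                speed=0.2, gain=1.5, steps=1000):
    """Reactive loop: keep the lateral range reading near `standoff` metres."""
    sign = 1.0 if side == "left" else -1.0
    for _ in range(steps):
        error = read_side_range() - standoff   # deviation from desired stand-off
        turn = -sign * gain * error            # steer back toward the wall
        drive(forward=speed, turn=turn)        # issue one velocity command
```

Here read_side_range and drive stand for whatever sensor and actuator interfaces a platform provides; the routine has no way to decide, on its own, that wall-following is the right behavior or which wall matters.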
2.3. Humans need to interact with robots
The limitations of teleoperation and full autonomy demand a viable alternative for the immediate future. But there is a third, long-term motivation for semi-autonomy. Increasingly, human interaction is a necessity for many applications of robotics. One must be able to direct an "intelligent" wheelchair, a prosthetic device, or a robot in one's home or office. In distributed systems such as the ATR (Japan) teleconferencing project [20], MIT's AI Lab "HCI (Human/Computer Interaction) Room" [35], or the Media Lab "Smart Room", the goal is for embedded visual and auditory robot systems to maintain a coherent temporal understanding of events in the room, interpret each person's actions as commands, and act accordingly. In these projects the goal is not full autonomy but rather a human-machine symbiosis.
3. Semi-autonomy

Nicolai Bernstein, a Soviet physiologist (1896-1966), turned his attention to more complex actions, involving many degrees of freedom. He noted that the homunculus model implies that the more complex the motor action, the longer it should take the homunculus to perform it. He claimed that a sprinter reacts to the sound of a starting gun with the same latency as a person lifts a finger off a button [5,19]. Although more sophisticated timing devices provide evidence to contradict this claim, his rebuttal led to an interesting theory of motor control. The concept, developed further by Turvey et al. [37], is that there are low-level "coordinative structures", or motor programs, that constrain the degrees of freedom to locally control perceptuo-motor actions. Thus the sprinter has a set of programs that start and maintain a running gait under constraints of balance, forward motion, joint coupling, etc. These programs are initiated and tuned cerebrally but then run within the constrained context with only periodic adjustment. Fig. 2(c) is a sketch of Bernstein's idea. Bernstein's model is promising for semi-autonomy; a human operator could be the "brain" that selects low-level robot routines and passes the necessary parameters (Fig. 2(f)). Human supervisory systems have long been in use (e.g., [12]; see [32] for an excellent overview). But the question arises as to where to draw the line between high-level human control and low-level servoing. In other words, what is the conceptual interface between human and robot? Some current research in semi-autonomous manipulation addresses this question in roughly one of two ways, as follows.

3.1. A pragmatic interface
One approach is to design a pragmatic interface given the task. For example, Sayers et al. [31] allow the teleoperator to select among a menu of geometric targets. The teleoperation apparatus (the master) is then constrained to motions along the selected geometry. Thus, the slave robot moves smoothly along a line or plane, etc., even though the operator may jerk or deviate from a correct path. This method requires an a priori CAD model of the task geometry. See also [2,15] for systems that intelligently integrate camera
views and CAD models to constrain and verify remote teleoperation. As another example, Yokokohji et al. [40] permit the operator to set a manual clutch that engages teleoperation, autonomous control, or one of a set of semi-autonomous modes that integrate the forces exerted by both the operator and the robot controller. The MEISTER system of Hirai et al. [16] also intelligently combines controllers. Such methods apply well to certain tasks but they do not address the integration of cognition with sensing and action in a general way.

3.2. Learning by watching
Another popular approach to semi-autonomous manipulation is to try to find a mapping between observed human action and robot programs [22,17,27,18]. This approach is sometimes called "learning by watching". Although preliminary results are promising, one faces the high-dimensional feature space of human behavior. How to select the salient features implicit in human action? Context dependency, dynamic calibration, and communication latencies pose additional problems for on-line imitation (see [25] for a more detailed analysis). This paper argues instead for an explicit approach to robot supervision, based on observations of deictic biological systems.
4. A deictic strategy Fig. 3 illustrates a human/robot interface in terms of the Bernstein model. The operator (the "brain") does not directly control the robot end effectors but instead communicates symbolically via a gestural sign language. Each sign performs two functions: • to select a goal behavior; and • to provide necessary parameters to situate the behavior. For example, pointing to an object indicates the desire to reach toward it as well as the axis along which to reach. The goal behavior mirrors Bernstein's concept of a coordinative structure [5,36]. A set of robotic servo routines coordinate to perform the desired action. The goal, in the context of the task, and key spatial information inherent to the sign "tune" the coordinated
routines. Specifically, the tuning parameters constrain the interpretation of position and force feedback. In the example of reaching for an object, the directional axis constrains position control along a line and the goal of reaching the object constrains the interpretation of the forces acting on the hand. This tuning is sufficient to situate the routines so that they may perform autonomously.

Fig. 3. A deictic sign language to control robot action. Each sign indicates a particular goal behavior, within the context of the task. The structure of the goal behavior parallels Bernstein's concept of a coordinative structure: robotic servo routines arranged in a fixed composition and tuned by the goal and the sign's spatial parameters.

Teleassistance is based in part on a hypothetical model of biological systems [3]. In this model, orienting movements of the body such as visual fixation and hand position serve to bind cognition, perception, and action to objects in the world. The moment-by-moment dispositions of the eye and hand serialize cognition to make it possible to stay within the limits of working memory. In other words, the natural sequentiality of body movements matches the observed sequentiality and computational parsimony of human decision systems. In teleassistance, the operator sequences a task explicitly into a series of context-sensitive triplets: the cognitive goal, a spatial instantiation, and robotic sensory-motor routines. Each hand sign indicates the first two components, the goal and spatial parameters, to ground the robotic routines in the current context.
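As a rough illustration of such a triplet, the sketch below (hypothetical Python; the sign names, routines, and table entries are invented for illustration and are not the implementation described later) binds a recognized hand sign to a goal behavior and a relative spatial frame, then runs a fixed composition of servo routines under that binding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Placeholder servo routines; real ones would close feedback loops on the hand.
def move_along_axis(goal: str, frame: Dict) -> None:
    print(f"[{goal}] servoing position along axis {frame['axis']}")

def detect_contact(goal: str, frame: Dict) -> None:
    print(f"[{goal}] watching joint tensions for a contact event")

def close_fingers(goal: str, frame: Dict) -> None:
    print(f"[{goal}] closing fingers about palm axis {frame['palm_axis']}")

@dataclass
class Binding:
    """Context-sensitive triplet: cognitive goal, spatial instantiation,
    and the sensory-motor routines that run under that binding."""
    goal: str
    frame: Dict
    routines: Sequence[Callable[[str, Dict], None]]

# Hypothetical sign -> (goal, routines) table: each sign supplies both the
# goal behavior and the spatial parameters that tune the routines.
def bind_sign(sign: str, sign_pose: Dict) -> Binding:
    if sign == "point":
        return Binding("reach", {"axis": sign_pose["pointing_axis"]},
                       [move_along_axis, detect_contact])
    if sign == "preshape":
        return Binding("grasp", {"palm_axis": sign_pose["palm_axis"]},
                       [close_fingers])
    raise ValueError(f"unrecognized sign: {sign}")

def execute(binding: Binding) -> None:
    for routine in binding.routines:   # fixed composition, tuned by the binding
        routine(binding.goal, binding.frame)

execute(bind_sign("point", {"pointing_axis": (0.0, 0.0, 1.0)}))
```

The point is the shape of the interface, not the particular routines: the operator supplies only the goal and the spatial basis, and everything below that line runs autonomously.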
4.1. Deictic language

To bind context-sensitive components together dynamically is an example of a deictic strategy. Linguists classify words like this or that as deictic because they constrain the listener's attention to a specific target from a set of candidate targets. Similarly, a teleassistant sign constrains the robot controller's interpretation of feedback to a specific goal from a set of possible goal behaviors. 2

2 Agre and Chapman [1] introduced deictic strategies to constrain computation in their classic Pengi game.

A key advantage of a deictic strategy is its dynamism. "This book" constrains reasoning about the utterances immediately preceding and succeeding the phrase. The listener is bound to a context, defined by "this book", and is able to interpret surrounding statements referentially. Hearing "that book" or "this bicycle" immediately shifts the binding to a new context. The listener does not need to keep the old context around (unless this book is about that bicycle, but such recursion is limited). Momentary deictic binding allows the listener to constrain interpretation to a particular context and optimize the use of short-term memory.

4.2. Deictic vision

Psychophysical studies suggest that animals use deictic strategies to bind cognitive goals to spatial targets and low-level perceptual and motor programs [21,4]. Visual fixation is another example of a deictic strategy. The act of fixating on an object centers the target
in the retinotopic array, potentially simplifying cognitive processes to deal with "the object I'm looking at" rather than with the attributes that distinguish that particular object from all others [38,13]. Moreover, spatial relationships can be defined relative to the object and so be invariant to the viewer's location in world coordinates. The goal behavior also simplifies visual processing by constraining what information to extract from the image. Research in real-time computer vision has built on the deictic theory. Whitehead and Ballard [39] model visual attention as a binding strategy that instantiates a small set of variables in a "blocks-world" task. The agent's "attention" binds the agent's actions to a visual target. One can extend the concept: the behavioral goal constrains what actions, e.g., what visual routines, the agent should use. Success in computer vision stems predominantly from such special purpose routines: optic flow for motion detection; color histogramming for object recognition; zero-disparity filters for tracking, to name a few. The dynamic selection of a particular perceptual routine is deictic with respect to the goal.

4.3. Deictic action
One may hypothesize that human motor control is also deictic. Just as the goal and the gaze direction constrain visual processing, so can the goal and the body pose constrain action, where action is defined as a goal-directed coupling of motion and sensing. For example, bi-manual animals often use one hand as a vise and the other for dexterous manipulation. We hypothesize that the vise hand marks a reference frame and that the dexterous motions are made relative to that frame [14]. Similarly, body position creates a reference for motor actions. Studies of human hand-eye coordination have found head movements often to be closely associated with hand movements. Pelz et al. [24] found that most subjects square their heads with respect to the site of a complex manipulation action, even when the eyes are looking elsewhere. This fits our hypothesis if we consider head position as a reference frame for relative hand motion. On the other hand, musicians do not sit still while performing dexterous movements, but they do move rhythmically and so maintain a reference beat for finger manipulation. Evidence from Cassell et al. [9] supports this theory. They attempted to train actors to make movements that contradicted their speech. The actors were able to make movements that contradicted their speech content, for example, nodding "yes" while saying "no", but could not alter their beat movements, i.e., movements that correspond to the rhythm and emphasis of speech. As position and rhythm constrain motion, so does the goal constrain sensing. One pays attention to haptic feedback differently when asked to identify an object blindly than when about to throw it. The goal dictates the extraction and interpretation of tactile and kinesthetic feedback. Psychophysical evidence is supportive of deictic motor control but incomplete. Regardless, a deictic strategy is expedient for robotics.
5. A deictic sign language for robot control

Teleassistance bases the human/robot interface on deictic reference. To make this more concrete as a control strategy, each hand sign has two deictic functions. The first is semantic: each sign signifies a goal behavior that calls selected robot routines appropriate to the task. Effectively, the reference is a temporal indicator into the succession of actions that compose the task. The second function is spatial: the hand sign binds the routines to a spatial context. Fig. 3 depicts these relations.

5.1. Providing a semantic context
People frequently communicate semantic information with sign language. We point when giving directions or to denote some object or person. A flat hand held up means "stop"; motioned, the hand means "come along". In teleassistance, each hand sign binds the robot servo controller to a semantic context, the current goal in a task sequence. Without a detailed model of the world, many interpretations of feedback are possible; context allows the controller to select the correct one. Earlier studies demonstrate the importance of contextual disambiguation in reducing the complexity of robot motor tasks. In [26], a dexterous manipulator autonomously grasps a spatula, positions it in a pan and flips a plastic egg without relying on exact
positioning or force information. Each successive action is controlled by interpreting qualitative changes over the robot sensors within the current behavioral context. For example, identifying force contact can be done by simply noting a significant change in tension on any hand joint, regardless of the particular hand or task configuration. To determine which event causes the tension fluctuation, whether it be from grasping the spatula, placing it in the pan or sliding it under the egg, the controller can refer to the current goal, as sketched below. In deterministic autonomous tasks, the sequence of goal behaviors is implicit in the control program, but under teleoperator control that implicit knowledge is lost. With teleassistance, however, the operator signals each successive goal explicitly. The servo controller can then interpret feedback in a constrained domain, ignoring perceptions that only serve to distinguish one action from another.
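The following minimal sketch of this kind of contextual disambiguation is hypothetical (Python; the threshold value, event names, and goal labels are invented, not the controller's actual parameters): the same qualitative event, a jump in finger-joint tension, is reported as a different task event depending on the goal currently in force.

```python
# Hypothetical sketch: a single qualitative sensor event (a significant
# change in tension on any hand joint) is read differently under different
# goal contexts, so no detailed world model is needed.

TENSION_JUMP = 0.15   # invented threshold on normalized joint tension

def contact_event(prev_tensions, tensions):
    """True if any joint shows a significant tension change."""
    return any(abs(t - p) > TENSION_JUMP for p, t in zip(prev_tensions, tensions))

# Goal context -> interpretation of the same contact event.
INTERPRETATION = {
    "grasp_spatula":   "tool acquired in hand",
    "place_in_pan":    "tool touched pan surface",
    "slide_under_egg": "tool engaged under egg",
}

def interpret(goal, prev_tensions, tensions):
    if contact_event(prev_tensions, tensions):
        return INTERPRETATION.get(goal, "unexpected contact")
    return None

# Identical readings, two different goals, two different task events.
before, after = [0.1, 0.1, 0.1], [0.1, 0.3, 0.1]
print(interpret("grasp_spatula", before, after))   # tool acquired in hand
print(interpret("place_in_pan", before, after))    # tool touched pan surface
```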
5.2. Providing a spatial context

Certain hand signs also define relative spatial frames for successive perceptual and motor behaviors, thereby avoiding world-centered geometry that varies with robot movement. Pointing and preshaping the hand, for example, create hand-centered spatial frames: pointing indicates a direction for subsequent motion, and preshaping defines a grasp form and tool axis. Physiology preferentially orders (and limits) the possible spatial frames. For example, one does not routinely grasp an object with the back of the hand, so one can assume fairly safely that the manipulation is relative to the palm axis; the trick of back-palming a coin is an example of the rare exception. By planning a grasping action relative to the palm axis, rather than to a world axis, the same motor routine can be used in various orientations: up, down, etc. In teleassistance, the geometric parameters of each sign are passed to the robot servo routines so that they may be executed in a relative space that is independent of world coordinates.
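The sketch below (hypothetical Python using plain 4x4 homogeneous transforms, not any particular robot library) illustrates the sense in which a routine planned in a palm-centered frame is reusable: the same relative approach motion maps to different world-space motions simply by changing the frame supplied by the sign.

```python
import numpy as np

def frame(rotation: np.ndarray, origin: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a rotation and an origin."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = origin
    return T

def approach_waypoints_palm_frame(depth=0.10, steps=4) -> np.ndarray:
    """Grasp approach expressed once, relative to the palm axis (local +z)."""
    return np.array([[0.0, 0.0, d, 1.0] for d in np.linspace(0.0, depth, steps)])

def to_world(palm_frame: np.ndarray, points_h: np.ndarray) -> np.ndarray:
    """Map palm-relative homogeneous points into world coordinates."""
    return (palm_frame @ points_h.T).T[:, :3]

# Two different hand signs yield two different palm frames; the routine
# (the palm-relative waypoint list) is identical in both cases.
down_grasp = frame(np.diag([1.0, -1.0, -1.0]), np.array([0.5, 0.0, 0.8]))
side_grasp = frame(np.array([[0.0, 0.0, 1.0],
                             [0.0, 1.0, 0.0],
                             [-1.0, 0.0, 0.0]]), np.array([0.2, 0.4, 0.3]))

local = approach_waypoints_palm_frame()
print(to_world(down_grasp, local))   # approach straight down onto the object
print(to_world(side_grasp, local))   # same routine, approaching from the side
```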
6. Experimentation In [28,29] we illustrate and evaluate teleassistance for the toy problem of opening a door. An overview
of these two experiments and the results are presented here.

Hardware. The manipulator is a compliant, four-fingered Utah/MIT hand mounted on a Puma 760 arm. The operator wears an EXOS Dexterous Hand Master (TM) that measures the joint angles on each of four fingers and an Ascension Technology Bird (TM) that records the position and orientation of the operator's arm and wrist in lab coordinates.

6.1. On-site teleassistance
In the first experiment, four on-site operators teleassist the robot [28]. A task-specific finite state machine binds each operator sign to a robot behavior, within the task context. The operator has five signs with which to communicate to the robot: point, preshape for a doorknob, preshape for a door lever handle, halt, and emergency stop. The robot is capable of performing four autonomous behaviors, provided the door location and handle type are known: reach, grasp, open and release. The gestural signs are recognized by real-time statistical pattern matching of the operator's hand pose (the EXOS finger joint angles) to previously acquired patterns; a sketch of this recognition and dispatch step appears at the end of this subsection. On recognition of a hand sign, the controller calls the autonomous robot behavior to perform the action intended by the sign. Relevant spatial parameters inherent to the sign are noted and passed to the behavior. In this way, a person motions the robot through a complex and possibly non-deterministic task. In all, forty teleassisted trials were contrasted with an equal number of door-opening trials under literal teleoperation and fully autonomous control. For each controller, the operators trained until they were comfortable: 10-15 min for teleoperation and 2-10 min for teleassistance. Twice during each operator's 10-trial set, the door was moved to a new position selected arbitrarily within the workspace. Half of the subjects performed teleoperation first, then teleassistance; the other half reversed the order.

Results. The results are shown in Fig. 4 along with the results of a completely autonomous controller. Maximizing the operator's strategic control of the robot while minimizing literal control is desirable, as it avoids the tedium and risks of teleoperation. Considering this performance metric, the operator spent only 7 s, on average, controlling the robot via
symbolic teleassistant hand signs. The teleoperator, in contrast, spent five times that (33 s) in literal master/slave control of the robot.

Fig. 4. Plots of the time that the human operator (dark bars) and the autonomous robot routines (grey bars) actively control the robot during each phase of the task, under the three control strategies. The teleoperator (top) must supervise 100% of the task; under autonomous control (middle), the robot is fully in charge but with limited strategic abilities; in teleassistance (bottom) the operator supervises the robot only 25% of the time for this task. Once the hand is teleassisted to a position near the door handle, the robot completes the task autonomously.

Teleassistance and teleoperation accommodate arbitrary variations in the workplace, such as door locale and handle type, because the human operator is able to reason on-line. In contrast, the autonomous controller can only rely on the accuracy of the task model and its perceptual and planning abilities to note and respond to changes in the environment. In a structured, static environment, the autonomous controller would work very well. In an unstructured environment, or one which mandates human interaction, teleassistance is preferable.
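As a sketch of the recognition-and-dispatch step described above (hypothetical Python; the pose templates, distance threshold, and state table are invented for illustration and are not the actual patterns or thresholds used in the experiments), joint-angle vectors from the glove are matched against previously acquired sign templates, and a small task state machine maps the recognized sign, in context, to an autonomous behavior.

```python
import numpy as np

# Invented sign templates: mean finger-joint-angle vectors acquired offline.
TEMPLATES = {
    "point":          np.array([0.1, 1.2, 1.3, 1.3]),
    "preshape_knob":  np.array([0.8, 0.8, 0.8, 0.8]),
    "preshape_lever": np.array([0.4, 0.9, 0.9, 0.4]),
    "halt":           np.array([0.0, 0.0, 0.0, 0.0]),
}
MATCH_THRESHOLD = 0.5   # invented distance threshold

def recognize(joint_angles: np.ndarray):
    """Nearest-template match of the glove pose; None if nothing is close."""
    name, dist = min(((k, np.linalg.norm(joint_angles - v))
                      for k, v in TEMPLATES.items()), key=lambda kv: kv[1])
    return name if dist < MATCH_THRESHOLD else None

# Invented task state machine: (state, sign) -> (behavior to run, next state).
TRANSITIONS = {
    ("idle",    "point"):          ("reach",   "reached"),
    ("reached", "preshape_knob"):  ("grasp",   "grasped"),
    ("reached", "preshape_lever"): ("grasp",   "grasped"),
    ("grasped", "point"):          ("open",    "opened"),
    ("opened",  "halt"):           ("release", "idle"),
}

def step(state: str, joint_angles: np.ndarray) -> str:
    sign = recognize(joint_angles)
    if sign and (state, sign) in TRANSITIONS:
        behavior, state = TRANSITIONS[(state, sign)]
        print(f"sign '{sign}' -> running autonomous behavior '{behavior}'")
    return state

state = step("idle", np.array([0.12, 1.18, 1.31, 1.28]))   # matches "point"
```

The state table is what makes the interpretation contextual: the same sign can trigger different behaviors, or none at all, depending on where the task currently stands.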
6.2. Remote teleassistance

A key problem with teleoperation is latency. A lag in communication between master and slave places the slave robot in physical jeopardy, making control all the more stressful and tedious for the teleoperator. In teleassistance, the jeopardy of latent collision avoidance on the part of the operator is significantly minimized, since the robot can sense and avoid local collisions autonomously. Moreover, the use of a symbolic language raises the interface between human and robot to an abstract plane. We posit that abstract control is less sensitive to temporal feedback delay than literal master/slave control. The second door-opening experiment illustrates the advantage of a symbolic human/robot interface in the face of communication delays, such as occur with remote operation. In these trials, the operators guide the robot remotely while under simulated communication latencies of up to 8 s [29]. Four operators perform four trials apiece for each latency increment under both teleassistance and teleoperation, with the exception of the 8 s latency, which was not tested for teleoperation due to the potential for robot damage and operator discomfort.

Hardware. In addition to the previous hardware, the operator wears a Virtual Research (VR) helmet that receives video images from a binocular camera system mounted on a second Puma 760 near the manipulator (Fig. 5). Inspired by earlier work at ETL, Japan, the helmet is fitted with a second Ascension Technology Bird that directly couples the motion of the camera Puma 760 to the operator's rotary and translatory head movements. Thus the human operator can operate the equipment remotely, beyond direct visual range, with a sense of telepresence gained by seeing through the robot's visual system. A Max Video MV2000 delays the video feed by 0.5, 1, 2, or 4 s to simulate communication latencies (with a corresponding decrease in resolution). Corresponding delays are simulated for sign recognition and the robot's motor response, resulting in an overall perceived delay of 1 to 8 s.

Results. The performance times for teleoperation and teleassistance under different latencies are shown
in Fig. 6. A baseline for autonomous control is also shown (when the door position and handle shape are known). If the task geometry changed significantly, the autonomous control scheme would fail. Error bars show the variance among all operators over all trials. Under teleoperation, increasing latencies appear to result in a linear increase in the time to perform the task. The time to perform relates directly to operator difficulty. In contrast, the teleassistance operators performed the task under increasing latency with only a moderate increase in execution time. This suggests that the abstraction afforded by the sign language in teleassistance successfully removes the operator from the need to sensitively monitor control. The subjects' post-experimental remarks support this view; each operator noted that teleassistance was a far easier control mechanism than teleoperation under significant communication lags. Under teleassistance, the subjects viewed actions as higher-level primitives that could simply be called and then relied on to reach completion.

Fig. 5. Virtual Research helmet and remote video input. In the foreground is the binocular camera system mounted on a Puma 760. Behind, a human operator wears the Virtual Research helmet. At back is the Utah/MIT hand mounted on a second Puma 760.

Fig. 6. Performance of teleoperation and teleassistance under increasing latency. Performance for an autonomous controller is shown for comparison. If the task geometry changed significantly, the autonomous control scheme would fail, unlike the other two strategies.
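For concreteness, a fixed round-trip delay of the kind used in these trials can be thought of as a pair of time-stamped queues. The sketch below is hypothetical Python (it is not the Max Video MV2000 or controller code, and the values are illustrative only): every sensed frame or recognized command is held until the configured latency has elapsed, so the operator perceives the task through a uniformly delayed loop.

```python
from collections import deque

class DelayLine:
    """Release items only after `delay` seconds of simulated time."""
    def __init__(self, delay: float):
        self.delay = delay
        self.queue = deque()                  # (release_time, item) pairs

    def push(self, t: float, item) -> None:
        self.queue.append((t + self.delay, item))

    def pop_ready(self, t: float):
        out = []
        while self.queue and self.queue[0][0] <= t:
            out.append(self.queue.popleft()[1])
        return out

# A 0.5 s one-way video delay; identical lines on recognized signs and motor
# responses would combine into the 1-8 s perceived round-trip latencies.
video_delay = DelayLine(0.5)

for tick in range(7):                         # simulated time, 0.25 s steps
    t = tick * 0.25
    video_delay.push(t, f"frame captured at t={t:.2f}s")
    for frame in video_delay.pop_ready(t):
        print(f"t={t:.2f}s  operator sees: {frame}")
```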
7. Conclusion

The theoretical platform of a constrained, referential, deictic mode of sensory-motor control is supported by biological observations. This platform may also be useful for examining robotic motor control and human/robot interaction. Although the experiments reported here explore only a narrow range of motor tasks, the results are promising for several reasons:

Advantageous combination of teleoperation and autonomous motor programs. The advantages and disadvantages of reactive autonomy and teleoperation are complementary. Reactive servo routines work well when goal-directed and situated in a spatial context. Without high-level reasoning and a detailed task model, however, servo control suffers. Teleoperation addresses this weakness by putting a person in the loop, one who can provide more global guidance. But teleoperation eliminates local servo control, resulting in slow, jerky movement that is quite tedious for the teleoperator. Local reactivity is lost. Teleassistance is a two-layer control strategy: a high-level operator
deictically binds low-level routines to an intentional and spatial context.

Computational economy. Deictic bindings simplify computation. In particular, the intended goal constrains the interpretation of sensory feedback. For example, pressure on the fingers may indicate either contact with a tool or contact between the tool and a surface, depending on whether the robot is picking up the tool or applying it. A spatial reference simplifies control by providing a relative frame about which the robot moves. This way, the robot can perform the same routine in different task configurations without adjustment. Actions become independent of world geometry.

Symbolic human/robot interaction. The gestural signs in teleassistance are symbolic, conveying intent without requiring the person to control the robot literally. This alleviates many of the problems inherent to traditional master/slave teleoperation, in which the robot has no self-autonomy and can only mimic human movement. Teleoperation is plagued by: poor mappings between operator and robot physiology; latent robot response to operator action; reliance on a broad communication bandwidth; tedium for the operator; and the potential for robot damage when solely under remote control.

Easy bootstrapping. The deictic model makes it easy to define new tasks from existing ones for two reasons. First, the robot behaviors are modular and so easily recomposed. Second, the operator instantiates key spatial variables on-line. Binding motor programs to a spatial context and goal is a dynamic and efficient means of robot control. The deictic strategy of teleassistance maximizes the servoing ability of the robot and, in turn, maximizes the use of human reasoning.
References

[1] P.E. Agre and D. Chapman, Pengi: An implementation of a theory of activity, Proc. of the Sixth National Conference on Artificial Intelligence (Morgan Kaufmann, Los Altos, CA, 1987) 268-272.
[2] P.G. Backes, M.K. Long and R.D. Steele, The modular telerobot task execution system for space telerobotics, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 3, Atlanta, Georgia (1993) 524-530.
[3] D.H. Ballard, M.M. Hayhoe and P.K. Pook, Deictic codes for the embodiment of cognition, The Behavioral and Brain Sciences (1996) (To appear - earlier version available as National Resource Laboratory for the Study of Brain and Behavior TR95.1, January 1995, University of Rochester).
[4] D.H. Ballard, M.M. Hayhoe, F. Li and S.D. Whitehead, Hand-eye coordination during sequential tasks, Philosophical Transactions of the Royal Society of London (1992).
[5] N. Bernstein, The Coordination and Regulation of Movement (Pergamon Press, London, 1967).
[6] R. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation (1986) 14-23.
[7] R.A. Brooks, Intelligence without representation, Workshop on the Foundations of Artificial Intelligence (Endicott House, Dedham, MA, 1987).
[8] R.A. Brooks, A robot that walks; emergent behaviors from a carefully evolved network, Technical Report AI Memo 1091, MIT Artificial Intelligence Laboratory, 1988.
[9] J. Cassell, D. McNeill and K.E. McCullough, Speech gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information, Cognition, to appear.
[10] J.H. Connell, A colony architecture for an artificial creature, Technical Report 1151, MIT Artificial Intelligence Lab, 1989.
[11] EXOS Corp., EXOS Dexterous Hand Master Users Manual, Boston, MA (1992).
[12] W.R. Ferrell and T.B. Sheridan, Supervisory control of remote manipulation, IEEE Spectrum 4(10) (1967) 81-88.
[13] J.J. Gibson, The Perception of the Visual World (Houghton Mifflin, Boston, 1950).
[14] Y. Guiard, Asymmetrical division of labor in human skilled bimanual action: The kinematic chain as a model, Journal of Motor Behavior 19(4) (1987) 486.
[15] G. Hirzinger, B. Brunner, J. Dietrich and J. Heindl, Sensor-based space robotics - ROTEX and its telerobotic features, IEEE Transactions on Robotics and Automation 9(5) (1993).
[16] S. Hirai, T. Sato and T. Matsui, Intelligent and cooperative control of telerobot tasks, IEEE Control Systems (1992) 51-56.
[17] K. Ikeuchi and T. Suehiro, Towards an assembly plan from observation, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 3, Nice, France (1992) 2171.
[18] S.B. Kang and K. Ikeuchi, Grasp recognition and manipulative motion characterization from human hand motion sequences, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 2, San Diego, California (1994) 1759.
[19] J.A. Scott Kelso, Human Motor Behavior: An Introduction (Lawrence Erlbaum Associates, London, 1982).
[20] J.O. Kishino, H. Takemura and N. Terashima, Virtual space teleconferencing system: Real-time detection and reproduction of 3D human images, Proc. of the HCI International Conference (1993).
[21] E. Kowler and S. Anton, Reading twisted text: Implications for the role of saccades, Vision Research 27 (1987) 45-60.
[22] Y. Kuniyoshi, M. Inaba and H. Inoue, Seeing, understanding and doing human tasks, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 1, Nice, France (1992) 2.
[23] V. Lumelsky, On human performance in telerobotics, IEEE Transactions on Systems, Man and Cybernetics 21(5) (1991) 971-982.
[24] J. Pelz, M.M. Hayhoe, D.H. Ballard and A. Forsberg, Separate motor commands for eye and head, Investigative Ophthalmology and Visual Science, Supplement (1993).
[25] P.K. Pook, Teleassistance: using deictic gestures to control robot action, Ph.D. Thesis, University of Rochester, Rochester, NY, 1995.
[26] P.K. Pook and D.H. Ballard, Sensing qualitative events to control manipulation, Proc. of the SPIE Photonics East Conference, Sensor Fusion V, Boston, MA (1992).
[27] P.K. Pook and D.H. Ballard, Recognizing teleoperated manipulations, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 2, Atlanta, Georgia (1993) 578.
[28] P.K. Pook and D.H. Ballard, Deictic teleassistance, Proc. of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS) (1994).
[29] P.K. Pook and D.H. Ballard, Teleassistance, Proc. of the IEEE International Conference on Robotics and Automation (1995).
[30] K. Salisbury, A demonstration of the Shadow force reflection system, 1994 American Association for Artificial Intelligence (AAAI) Spring Symposium: Toward Physical Interaction and Manipulation, March 1994.
[31] C. Sayers, R.P. Paul and M. Mintz, Operator interaction and teleprogramming for sub-sea manipulation, Proc. of the Fourth IARP Workshop on Underwater Robotics (1992).
[32] T.B. Sheridan, Telerobotics, Automation and Human Supervisory Control (MIT Press, Cambridge, MA, 1992).
[33] C.S. Sherrington, The Integrative Action of the Nervous System (Yale University Press, New Haven, CT, 1947).
[34] C.S. Sherrington, Coordination of the simple reflex, in: C.R. Gallistel, ed., The Organization of Action: A New Synthesis (Lawrence Erlbaum Associates, London, 1980).
[35] M.C. Torrance, Advances in human-computer interaction: the intelligent room, Research Symposium, Human Factors in Computing: CHI'95 Conference (1995).
[36] B. Tuller, M.T. Turvey and H.L. Fitch, The Bernstein perspective - II. The concept of muscle linkage or coordinative structure, in: J.A. Scott Kelso, ed., Human Motor Behavior: An Introduction (Lawrence Erlbaum Associates, London, 1982).
[37] M.T. Turvey, H.L. Fitch and B. Tuller, The Bernstein perspective - I. The problems of degrees of freedom and context-conditioned variability, in: J.A. Scott Kelso, ed., Human Motor Behavior: An Introduction (Lawrence Erlbaum Associates, London, 1982).
[38] S. Ullman, Against direct perception, The Behavioral and Brain Sciences 3 (1980) 373-415.
[39] S.D. Whitehead and D.H. Ballard, A preliminary study of cooperative mechanisms for faster reinforcement learning, Technical Report, University of Rochester, Department of Computer Science, 1990.
[40] Y. Yokokohji, A. Ogawa, H. Hasunuma and T. Yoshikawa, Operation modes for cooperating with autonomous functions in intelligent teleoperation systems, Proc. of the IEEE International Conference on Robotics and Automation, Vol. 3, Atlanta, Georgia (1993) 510.