Chapter 5 Adaptive optimal control of human tracking


Motor Control and Sensory Motor Integration: Issues and Directions D.J. Glencross and J.P. Piek (Editors) © 1995 Elsevier Science B.V. All rights reserved.


Chapter 5

ADAPTIVE OPTIMAL CONTROL OF HUMAN TRACKING

Peter D Neilson, Megan D Neilson, & Nicholas J O'Dwyer

Cerebral Palsy Research Unit, Institute of Neurological Sciences, The Prince Henry Hospital, and School of Electrical Engineering, University of New South Wales

The motor behaviour of subjects performing visual tracking tasks is quantified by identifying the mathematical relationship between the visual information presented to the eye and the resulting motor response generated at the hand. It has long been known that this relationship is equivocal and that no unique mathematical model exists to describe the behaviour of the human operator. In what follows we develop the hypothesis that tracking behaviour is variable because the central nervous system (CNS) functions as an adaptive optimal controller of muscles, biomechanics and external systems. It automatically tunes its input-output relationship to compensate for the dynamics of the system being controlled and to compensate for inherent time delays by predicting future values of the input signals. We explore the proposal that the CNS plans motor responses to achieve goals using a minimum of input muscular energy and that it can trade tracking accuracy against demand for input energy by altering the speed of the response. Hypotheses about information processing performed by the CNS during visual tracking are presented in the form of a computer simulation. Distributed parallel processing circuitry is employed in the simulator to construct adaptive digital filters which operate independently and in parallel. These digital filters mimic the behaviour of hypothesized neural adaptive filters within the CNS. Indeed, in general, descriptions of the simulator can be taken as hypotheses about the structure and function of neural circuitry and about the information processing performed by the CNS during control of movement. As with any scientific theory, the hypotheses are tested experimentally by comparing the behaviour of the simulator with that of human subjects performing the same task. A summary of key findings from a number of studies of human tracking behaviour carried out at our laboratory is presented and many of the findings are compared with the behaviour of the simulator.

1. INTRODUCTION

Among the many experimental paradigms used in the study of human movement, tracking holds a distinguished and historical place. The understanding of tracking performance became crucially important in a variety of applications in World War II and it was there that the first attempts were made to specify a subject's behaviour in the language of systems control theory. The potential for tracking to shed light on sensory-motor mechanisms was recognized in those early times by none other than Kenneth Craik writing circa 1943:

[Tracking] is a form of coordinated sensory-motor reaction capable of all degrees of complexity, blending at the one end, in the simplest reactions, into something very near a conditioned reflex, and at the other into the application of the most complicated skills and habits, in which anticipation, prediction, grasping a problem, and calculation of the future may be involved. It can be tackled in a form in which physiological and also engineering (servo-motor) terminology is appropriate and suggestive. Thus the datum to which the operator responds is usually described as a 'misalignment' between target and graticule, to which he makes a control movement; as Hick has pointed out, the misalignment acts as a stimulus, evoking a response from the operator... The study of the response to serially presented stimuli may throw some light on the nature and duration of the 'central delay' in reaction times and has interesting analogies with refractory periods in rhythmic reflexes; while the ability of the operator to compensate for the limited 'response rate' of this central mechanism by the appreciation of patterns of groups of stimuli as wholes, and the formulation of complex motor responses or unitary response patterns to deal with them suggests an interesting field for investigation of sensory and motor integration... (Craik, 1943/1966, pp.47-48)

Despite this early systems orientation of Craik and others to tracking, many later skills-oriented psychological studies abandoned the determination of transfer characteristics of the human operator in favour of simpler measures of performance such as overall error measures. Poulton (1974) and Hammerton (1981) both reflect disillusionment with the approach on the basis that there is no unique description of the human operator, that models can be overfitted and that assumptions of the methods are often not justified. Moray (1981) takes a different view along with other engineering-oriented contributors to the psychological literature such as Pew (1974), Jagacinski (1977) and Wickens and Gopher (1977).


Our own systems oriented approach to tracking and motor behaviour in general has its origins in the stochastic modelling of human performance in aeronautics (e.g., McRuer & Krendel, 1974) and in the analysis of neurophysiological systems (e.g., Stark, 1968). In this paper we compare and contrast experimental studies of tracking with computer simulations based on our theoretical account of human movement and its underlying mechanisms (Neilson, Neilson & O'Dwyer, 1988, 1992; Neilson, O'Dwyer & Neilson, 1988). Based initially on adaptive control principles (Astrom, 1970), the theory is extended here to incorporate recent developments in adaptive optimal control (Bitmead, Gevers & Wertz, 1990; Clarke & Gawthrop, 1979; Clarke, Mohtadi & Tufts, 1987). For discussion of earlier applications of modern versus classical control theory in tracking see the special journal issue edited by Rouse (1977) and Moray (1981). Many of the concepts on which our theory and experiments are predicated, such as internal models, prediction and intermittency, cannot be reviewed here but the background can be found in works such as Arbib (1972), Kelley (1968), Licklider (1960), McRuer (1980), Moray (1981), Veldhuyzen and Stassen (1976), Young (1969) and Young and Stark (1965).

2. ELEMENTS OF A VISUAL TRACKING TASK

Visual tracking provides a useful experimental tool for the quantitative assessment of motor behaviour. As illustrated schematically for a single axis tracking task in Figure 1, the measured variables are clearly defined. In experiments at our laboratory, a prerecorded target signal T controls the position of a 1cm square target on a 30cm computer display screen. The experimenter manufactures the target signal. For the data presented below we employ either step changes or stochastic variations in target position. Stochastic target signals are manufactured by filtering zero mean Gaussian random numbers at a rate of either 20/s or 66.6/s through a variety of different digital filters and then amplitude scaling to produce a 20cm peak to peak displacement on the screen. By designing the frequency response characteristics of the digital filters, the experimenter controls the statistical properties (autocorrelation and power spectrum) of the target signal. The task for the subject is to operate a joystick to control the position of the response cursor on the screen and attempt to keep it aligned as accurately as possible with the target. The response cursor is a 1cm cross that fits precisely within the 1cm square target. Different colours are sometimes employed for the target and response cursor.
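The manufacture of a stochastic target signal described above can be sketched as follows. This is a minimal illustration only: the first-order low-pass filter, its 1 Hz cutoff, the random seed and the function name `make_target` are assumptions made for the example and are not the filters used in the experiments.

```python
import numpy as np

def make_target(n_samples, rate_hz=20.0, cutoff_hz=1.0, peak_to_peak_cm=20.0, seed=0):
    """Low-pass filter Gaussian noise and scale it to a fixed peak-to-peak range."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)      # zero mean Gaussian random numbers
    # First-order low-pass filter y[n] = a*y[n-1] + (1-a)*x[n].
    # The filter type and cutoff are illustrative assumptions.
    a = np.exp(-2.0 * np.pi * cutoff_hz / rate_hz)
    y = np.empty(n_samples)
    prev = 0.0
    for n, x in enumerate(white):
        prev = a * prev + (1.0 - a) * x
        y[n] = prev
    # Amplitude scaling: centre the record, then stretch it to the requested range.
    y = y - (y.max() + y.min()) / 2.0
    return y * (peak_to_peak_cm / (y.max() - y.min()))

target = make_target(n_samples=1200)            # one minute at 20 samples/s
```

Changing `cutoff_hz` alters the autocorrelation and power spectrum of the record, which is the sense in which the filter design sets the statistical properties of the target.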

The position of the response cursor on the screen is sampled synchronously with the target signal at 20/s or 66.6/s to generate the response signal R. The misalignment between the position of the target and the position of the response cursor on the screen is computed by subtracting R from T to obtain the error signal E. The angular position of the joystick controlled by the subject's hand is also sampled synchronously with the T and R signals and we refer to this signal as the motor response signal MR. By setting a digital filter within the tracking program, the experimenter determines the dynamic relationship between movement of the joystick and the resulting deflection of the response cursor on the screen. This sets the dynamic response characteristics of the tracking system H. A stochastic disturbance signal D, manufactured in the same manner as described above for T, can be added to the deflection of the response cursor R.

The subject is represented schematically on the right hand side of Figure 1. The subject sits comfortably with eyes about 50cm in front of the display screen. Visual information V representing the position of the target T, position of the response cursor R and the misalignment between them E is available to the CNS. The CNS generates a vector of motor commands, m, to activate many muscles. The muscles generate tensions, t, which pull on bones and exert torques about joints. The resulting torques produce movements, θ, of the hand and arm to operate the joystick and produce the motor response, MR. The muscles and their biomechanical loads are embedded in a complex reflex control system involving feedback of muscle tensions, muscle lengths and joint angles. As well as being involved in reflex control, the signals m, t and θ are fed back to the CNS (kinaesthetic feedback and efference copy) and are available for high level processing. Thus the CNS receives kinaesthetic feedback of the movement of the joystick, MR.

A typical tracking experiment has a duration of about one minute. A target signal T and disturbance signal D are applied as inputs to the tracking system and the error E, motor response MR and response R generated by the subject are recorded as output signals. A variety of modern digital signal analysis and system identification techniques are now available (e.g., The MathWorks Inc., 1993) to statistically describe these signals and identify the dynamic relationships between them. In particular, the open-loop relationship between E and MR describes the input-output characteristics between visual information presented to the subject's eye and the resulting motor response generated at the hand.
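The relationships among the recorded signals (R obtained by passing MR through H and adding D, and E = T - R) can be sketched in a few lines. The function names and the two example forms of H (a unit gain and a discrete-time integrator) are assumptions chosen for illustration; the tracking program may realize H with any digital filter.

```python
import numpy as np

def tracking_system(mr, kind="gain", dt=0.05):
    """Pass the motor response MR through a hypothetical tracking system H."""
    mr = np.asarray(mr, dtype=float)
    if kind == "gain":           # zero order system: R follows MR directly
        return mr.copy()
    if kind == "integrator":     # first order system: R is the running integral of MR
        return np.cumsum(mr) * dt
    raise ValueError("unknown tracking system: " + kind)

def record_trial(t, mr, d=None, kind="gain", dt=0.05):
    """Return response R and error E for target T, motor response MR and disturbance D."""
    t = np.asarray(t, dtype=float)
    r = tracking_system(mr, kind, dt)
    if d is not None:
        r = r + np.asarray(d, dtype=float)   # disturbance adds to the cursor deflection
    e = t - r                                # error signal E = T - R
    return r, e

r, e = record_trial([1, 1, 1, 1], [2, 2, 2, 2], kind="integrator", dt=0.5)
```

With the integrator, a constant joystick deflection produces a ramp in R, so the error changes sign as the cursor passes the target.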

Figure 1. Schematic diagram illustrating visual tracking.

3. COMPUTER SIMULATION OF TRACKING BEHAVIOUR

Adaptive Model Theory (AMT) is a computational theory about information processing performed by the CNS during control of movement (Neilson, 1993; Neilson, Neilson & O'Dwyer, 1985, 1988, 1992, 1993; Neilson et al., 1988). Working hypotheses based on AMT have been implemented as a computer simulation of a human subject performing a visual tracking task. The computational theory is expressed in terms of distributed parallel processing circuitry that can be implemented by a digital signal processing microchip. Consequently, the theory can be realized as a real-time controller for a robot or industrial process. The simulations run on a 486 PC using SIMULINK, a MATLAB based software package designed for simulating dynamic systems (The MathWorks, Inc., 1993). Expressing the working hypotheses of a theory in the form of a computer simulation has advantages. Behaviour of the simulator can be monitored and analysed using the same techniques employed to monitor and analyse the behaviour of human subjects performing the same task. It is therefore straightforward to subject the theory to rigorous experimental evaluation. In this section our aim is first to provide a brief summary of the adaptive signal processing circuitry used in the design of the simulator. Secondly, we will compare simulations with results from a number of experimental studies from our laboratory.


3.1 Three-Stage Model of Movement Control

As illustrated schematically in Figure 2a, the simulator has three independent parallel processing systems referred to as sensory analysis (SA), response planning (RP) and response execution (RE) systems. This is an important feature of AMT. In simulation of a reaction time experiment the simulator can plan a response to a stimulus at the same time as it executes a response to a previous stimulus and detects and registers in memory a subsequent stimulus. Although the three processing systems operate independently and in parallel, they communicate with each other by memory buffering (indicated by the rectangular boxes in Figure 2a). The memory buffers (working memory in the CNS) have independent read and write capabilities. Thus there is a sequential transfer of information from SA to RP to RE. We refer to this sequence of SA-RP-RE processing as a Basic Unit of Motor Production or BUMP.

Figure 2. (a) Schematic drawing of SA, RP and RE systems. (b) Schematic illustration of concatenated sequence of BUMPs.

As proposed in AMT, the RP system requires a finite interval of time to read the information provided by SA, to preplan an appropriate motor response and to write this into memory ready for execution by RE. This introduces intermittency into the behaviour of the simulator. In the simulations to be described, the finite time for RP processing is set variously at 100 or 150ms. This is in contrast to the SA and RE systems, seen in AMT as operating continuously and in real-time.


Because of the intermittency introduced by the RP system, tracking responses generated by the simulator consist of a concatenated sequence of submovements produced by an overlapping sequence of BUMPs, as illustrated schematically in Figure 2b. Any one submovement involves a sequential transfer of information from SA to RP to RE, constituting a BUMP, but at any one time, all three processors can operate independently and in parallel on different submovements corresponding to different BUMPs. It is the intermittent RP system that determines when information is transferred in from SA and out to RE. The apparent discreteness of SA and RE processing in Figure 2b is the product of the finite time required by RP processing.

What justification is there for introducing intermittency into the simulation of human tracking? The relevance of intermittency in the modelling of tracking behaviour has long been discussed (see Craik 1947, 1948; Navas & Stark, 1968; Poulton, 1981; Sheridan & Ferrell, 1974). Nevertheless, it is in the study of discrete stimulus-response behaviour that investigation of possible central intermittency is most frequently found (e.g., Welford, 1980).

Recent work in this tradition (see Pashler, 1992) speaks strongly against an alternative hypothesis of continuous-time processing with time delay. Pashler and colleagues have examined a wide range of factors influencing the time required for various stages of processing in double stimulus reaction time experiments. Their results support the existence of a 'response selection bottleneck' occurring after perceptual processing. It is as if a response selection mechanism becomes engaged in selecting an appropriate response to a first stimulus and is unavailable to work on selecting a response to a second stimulus until it has completed the first. A second stimulus presented during the reaction time interval to the first is processed by the perceptual system and queued in working memory until the response selection mechanism completes its processing of the first stimulus and becomes available to work on the second. As a consequence, the reaction time to the second stimulus is increased, relative to when the second task is performed alone, by an amount equal to the time the second stimulus is held in memory. Known as the 'psychological refractory period', this phenomenon is not explained by continuous-time processes with time-delay in which the interresponse interval remains equal to the interstimulus interval. We have made intermittency a key feature of AMT and have argued previously (Neilson et al., 1992) that it provides a theoretical bridge between double stimulation reaction time experiments and continuous-time tracking experiments.
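The queueing argument behind the psychological refractory period can be made concrete with a small sketch. The three stage durations and the function name `bottleneck_rt2` are hypothetical values chosen for illustration; they are not estimates from Pashler's experiments or parameters of the simulator.

```python
def bottleneck_rt2(soa_ms, perceive_ms=100.0, select_ms=150.0, execute_ms=50.0):
    """Reaction time to the second of two stimuli under a response selection bottleneck.

    soa_ms is the interval between the onsets of the two stimuli.
    All stage durations are illustrative assumptions.
    """
    # Stimulus 1 arrives at time 0; its response selection finishes at:
    select1_done = perceive_ms + select_ms
    # Stimulus 2 arrives at soa_ms; its perceptual processing finishes at:
    perceive2_done = soa_ms + perceive_ms
    # Stimulus 2 is held in memory until the selection stage becomes free.
    wait = max(0.0, select1_done - perceive2_done)
    return perceive_ms + wait + select_ms + execute_ms

rt2_alone = bottleneck_rt2(soa_ms=1000.0)   # long interval: no queueing
rt2_short = bottleneck_rt2(soa_ms=50.0)     # short interval: stimulus 2 must wait
```

In this sketch the increase in the second reaction time equals the time the second stimulus spends waiting in memory, which is the pattern a pure transmission delay would not produce.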

Despite the strong support for intermittency in stimulus-response behaviour, it is common for engineering models to employ continuous-time processing with transmission time delays of 200ms or more. For example, in a recent analysis of trajectory formation and temporal interaction of reaching and grasping (Arbib & Hoff, 1994; Hoff & Arbib, 1993), the hypothesis was developed that the collective delay for the sensorimotor loop yields an appearance of intermittency in what is actually a continuous-time feedback control process. To support the proposal that reaching movements can be continuously modified based on incoming, albeit delayed, sensory information, Hoff and Arbib (1993) cite the findings of an experiment in which the position of the target in a pointing task was perturbed at the onset of the movement (Pelisson, Prablanc, Goodale & Jeannerod, 1986). Subjects generated a smooth transition, without secondary accelerations, in midflight, to a new trajectory, compensating for the altered position of the target. However, the durations of the pointing movements in this experiment were of the order of 500-600ms and the change in target position was only small (e.g., 10% of the distance to the target). Given the notion of intermittency and BUMPs as outlined above, it is easy to see that there is ample time for an intermittent system to introduce multiple corrections during the pointing movement and, since the correction amplitudes are small, a smooth transition from one trajectory to another without secondary peaks in the velocity or acceleration can be achieved. Later in the chapter we will verify this using a computer simulation of the pointing experiment.

3.2 Description of the Simulator

The simulator is adaptive and alters its behaviour in response to changes in the dynamics of the tracking system or variations of the statistical properties of the target and disturbance signals. Consequently, not only does the computer generate predictions based on AMT about the behaviour of human subjects performing tracking tasks, but it also predicts how behaviour changes as the conditions of the task alter.

Let us examine the structure of the simulator in a little more detail. An information flow diagram or block diagram of the simulated system is shown in Figure 3. A digital filter, labelled H in Figure 3, simulates the dynamics of the tracking system. It represents the dynamic relationship between movement of the joystick MR and the resulting deflection R of the response cursor on the screen. Target and disturbance signals, T and D, are applied as inputs to the simulator. The simulator generates two output signals corresponding to MR and R. These are compared with signals generated by human subjects responding to the same T and D using the same tracking system dynamics.

Figure 3. Information flow diagram of the computer simulator.

Sampling Interval: In the simulator, all signals are emulated by a sample and hold process operating at 20 Hz, allowing a bandwidth of 10 Hz for input and output. This is sufficient to simulate even the fastest voluntary movements. As discussed previously (Neilson et al., 1992), this sampling frequency is consistent with a 20 Hz bursting of cortical columns as recorded in cat and monkey cortex (Shaw & Silverman, 1988; von Seelen, Mallot, Krone & Dinse, 1986) and with the 50ms bursts of electromyographic activity (EMG) recorded from agonist and antagonist muscles during fast ballistic movements (Freund, 1983; Ghez, 1991). Following Gottlieb, Corcos and Agarwal (1989), who modelled cortical drive to alpha motor neurons as a rectangular pulse of neural activity, we approximate bursts of neural activity by rectangular pulses of the type generated by a zero order hold device (ZOH) in the D-A converter of a digital computer.
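A zero order hold simply keeps each sample value constant for one sampling interval, which turns a 20 Hz sequence into a staircase of 50ms rectangular pulses. The sketch below illustrates this; the 1000 Hz fine grid and the function name `zero_order_hold` are assumptions introduced only to make the pulses visible.

```python
import numpy as np

def zero_order_hold(samples, sample_rate_hz=20.0, fine_rate_hz=1000.0):
    """Expand samples into a staircase: each value is held for one sample interval."""
    hold = int(round(fine_rate_hz / sample_rate_hz))   # fine steps per held sample
    return np.repeat(np.asarray(samples, dtype=float), hold)

pulses = zero_order_hold([0.0, 1.0, -1.0, 0.0])   # each level lasts 50 ms
nyquist_hz = 20.0 / 2.0                           # bandwidth available at 20 samples/s
```

The 10 Hz bandwidth quoted above is the Nyquist limit, half the 20 Hz sampling rate.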


Sensory Analysis System: In the simulator the SA system processes three sensory signals, the target signal T, the motor response feedback signal MR, and the response feedback signal R. SA incorporates three adaptive self-tuning digital filters. Detailed descriptions of adaptive digital filters and discussion of their biological realizability (adaptive neural filters) were given previously (Neilson et al., 1992) and mathematical analysis of their behaviour is available in texts such as Widrow and Stearns (1985) and Haykin (1986).

(1) The adaptive filter at the top of Figure 3 processes the target signal T. Driven by the autocorrelations within T, the filter automatically tunes itself to generate the best possible predictions T̂ of future values of T.

(2) A second adaptive filter (labelled Hm in Figure 3) functions as an internal model of the dynamic relationship between MR and R or, in other words, as an internal model of the tracking system, H. The filter automatically tunes itself to maintain an accurate internal model Hm = H. By transforming MR through Hm, a signal R' equal to the expected feedback of the response R (expected reafference) is obtained. Comparison of R with R' provides an estimate D' of the disturbance D (exafference).

(3) A third adaptive filter (labelled D adaptive predictor in Figure 3) processes D'. Driven by the autocorrelations within D', the filter automatically tunes itself to generate optimal predictions D̂ of future values of D.

(4) The SA system includes a fourth component labelled R predictor in Figure 3. This is not an adaptive filter. It generates R̂, the predicted trajectory of the response cursor, by combining the fed back value of R with the previously planned R* currently in the memory buffer awaiting execution.

Thus the SA system maintains accurate internal models of the dynamics of the tracking system and of the stochastic properties of T and D. It generates predictions T̂, D̂ and R̂ which are updated by new sensory input every 50ms (i.e., 20 Hz) and stored in memory buffers awaiting input to the RP system.
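To illustrate how an adaptive filter can tune itself from the autocorrelations within a signal, the sketch below uses a normalized least-mean-squares (LMS) one-step predictor of the kind analysed in the adaptive filtering texts cited above. It is offered as an assumption-laden stand-in and not as the algorithm used in the simulator: the number of taps, the step size `mu` and the sinusoidal test signal are all hypothetical choices.

```python
import numpy as np

def lms_predictor(x, n_taps=4, mu=0.5, eps=1e-8):
    """One-step-ahead prediction of x using a normalized LMS adaptive FIR filter."""
    x = np.asarray(x, dtype=float)
    w = np.zeros(n_taps)                  # adaptive weights, initially untuned
    pred = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        past = x[n - n_taps:n][::-1]      # most recent sample first
        pred[n] = w @ past                # prediction made before x[n] is seen
        err = x[n] - pred[n]              # prediction error once x[n] arrives
        w += mu * err * past / (eps + past @ past)   # normalized LMS weight update
    return pred, w

t = np.arange(2000) / 20.0                # 100 s at 20 samples/s
signal = np.sin(2 * np.pi * 0.5 * t)      # stand-in for a predictable target signal
pred, weights = lms_predictor(signal)
early = np.mean((signal[4:104] - pred[4:104]) ** 2)   # error while still tuning
late = np.mean((signal[-100:] - pred[-100:]) ** 2)    # error after tuning
```

The mean squared prediction error over the last 100 samples is smaller than over the first 100, showing the filter tuning itself without being told the structure of the signal. Predicting further ahead than one sample would need a different arrangement and is not attempted here.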

Response Planning System: Intermittency in tracking behaviour is simulated by allocating the RP system a finite operating time. Within this time, referred to as the planning interval, RP reads T̂, D̂ and R̂ from memory, uses these in planning a desired response trajectory R* and writes R* to memory ready for execution. In the simulations presented, the planning interval is set to either 100 or 150ms, during which RP plans an R* of equivalent length, namely a fast ballistic movement.

Figure 4. Illustration of S-shaped desired response trajectory.

As illustrated in Figure 4, R* is planned in the form of an S-shaped trajectory in which the response cursor moves from its predicted path R̂ into alignment with the predicted path T̂ of the target. In the simulator R* also includes a compensatory component set equal to the negative of the predicted disturbance D̂, but this is not shown in the figure. The S-shaped trajectory corresponds to the motion of an inertial system driven by rectangular pulses of force, a close approximation to the operation of agonist and antagonist muscles during fast ballistic movement (Freund, 1983; Ghez, 1991). Thus we simulate tracking responses as a sequence of ballistic submovements, each planned as an S-shaped trajectory of the response cursor by successive operations of the RP system. The S-function generator is described in detail later. A tracking strategy which uses the fastest possible movement (shortest duration of R*) to align the response cursor with the target will reduce the tracking error E = T - R to zero in the shortest possible time. Most of the simulations presented here use this accuracy optimizing strategy. In later discussion of speed-accuracy trade-off we show that lengthening the duration of R* is relevant to other optimizations.
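The S-shape follows directly from integrating a rectangular accelerating pulse of force and an equal and opposite decelerating pulse twice. The sketch below shows this for a unit inertia; it is not the S-function generator of the simulator, and the function name, the 1ms integration step and the symmetric pulse pair are assumptions made for illustration.

```python
import numpy as np

def s_trajectory(start, goal, duration_s=0.1, dt=0.001):
    """Position and velocity of a unit inertia driven by an accelerating pulse of
    force followed by an equal and opposite decelerating pulse."""
    half = int(round(duration_s / (2.0 * dt)))    # steps in each pulse
    # Pulse height chosen so that the distance covered equals goal - start:
    # for this force profile, distance = a * (duration / 2) ** 2.
    a = (goal - start) / (half * dt) ** 2
    accel = np.concatenate([np.full(half, a), np.full(half, -a)])
    vel = np.cumsum(accel) * dt                   # first integration: velocity
    pos = start + np.cumsum(vel) * dt             # second integration: position
    return pos, vel

pos, vel = s_trajectory(start=0.0, goal=5.0)      # a 100 ms submovement of 5 units
```

The velocity rises linearly, peaks at the midpoint and returns to zero, so the position trace is S-shaped and ends at rest on the goal.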


Response Execution System: The RE system includes a single adaptive filter (labelled Hm⁻¹ in Figure 3) plus a simulation of the tracking system (labelled H in Figure 3). The adaptive filter Hm⁻¹ is slaved to the weights (parameters) computed by the adaptive modelling circuitry in the SA system for the filter Hm. By employing the weights in a feedback configuration (Neilson et al., 1992), the filter Hm⁻¹ simulates the inverse dynamics of the tracking system H. It has exactly the input-output characteristics required to transform the trajectory R* into an appropriate MR to drive H and generate R = R*. Any change in the dynamic response characteristics of the tracking system H leads to an automatic adaptive retuning of the forward model Hm and of the inverse model Hm⁻¹ employed during response execution.
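One way to see how the same weights can serve both a forward and an inverse model is to take a finite impulse response forward model and place its weights in a feedback loop. The sketch below is an assumed, simplified instance of that idea and not the circuitry described in Neilson et al. (1992); the weight values and planned trajectory are hypothetical, and the inversion is only well behaved when the first weight is nonzero and the forward model is minimum phase.

```python
import numpy as np

def forward_model(mr, b):
    """FIR forward model: R[n] = sum over k of b[k] * MR[n-k]."""
    return np.convolve(mr, b)[:len(mr)]

def inverse_model(r_star, b):
    """Invert the forward model by using the same weights in a feedback loop:
    MR[n] = (R*[n] - sum over k >= 1 of b[k] * MR[n-k]) / b[0]."""
    b = np.asarray(b, dtype=float)
    mr = np.zeros(len(r_star))
    for n in range(len(r_star)):
        fed_back = sum(b[k] * mr[n - k] for k in range(1, len(b)) if n - k >= 0)
        mr[n] = (r_star[n] - fed_back) / b[0]
    return mr

b = np.array([0.5, 0.3, 0.2])                  # hypothetical weights shared by both models
r_star = np.array([0.0, 0.2, 0.7, 1.0, 1.0])   # hypothetical planned trajectory R*
mr = inverse_model(r_star, b)                  # motor response needed to produce R*
```

Passing `mr` back through `forward_model` reproduces `r_star`, which is the sense in which the inverse filter generates R = R*. If the weights are retuned, both models change together because they share `b`.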

4. OBSERVATIONS OF TRACKING BEHAVIOUR

In this section we present key findings from a number of experimental studies of human tracking behaviour carried out at our laboratory. Some of the findings support and extend previous reports and highlight the parallel features of the simulator as an implementation of AMT. Others concern direct comparison between the responses of subjects and that of the simulator. Full reports of all experiments are available in theses held by the School of Electrical Engineering, University of New South Wales.

4.1 Adaptation to Changes in Tracking System Dynamics

It is well known that subjects can deal satisfactorily with a variety of tracking systems incorporating different control characteristics (McRuer & Krendel, 1974; Poulton, 1974). Figure 5a shows tracings of the error signal E and the rate of change of the motor response, ṀR, recorded while a subject performed a pursuit tracking task with a zero order (gain equals one) tracking system. Inspection of the traces shows that, to a first order of approximation, the rate of change of the motor response resembles a delayed version of the error signal. This can be stated another way. The motor response resembles a delayed version of the integral of the error signal. This confirms the theoretical relationship between E and MR derived from AMT (Neilson et al., 1988) and inherent in the simulator. It is in agreement with the experimental findings of McRuer and Krendel (1974) expressed in the so-called crossover model of tracking. The observation that subjects respond to the integral of the error is important because it can be shown mathematically that inclusion of an integrator in the loop improves the steady state tracking accuracy of a feedback system (Nise, 1992). In a step tracking task, for example, the integral relationship between E and MR reflects the observation that subjects move the joystick with a series of intermittent corrections until the error is reduced to zero and the response cursor is brought into alignment with the target.


Figure 5. (a) Error (solid line) and rate of change of motor response (dotted line) plotted in arbitrary units against time in seconds for a subject performing a pursuit tracking task with a zero order (gain equals one) tracking system. (b) Error (solid line) and motor response (dotted line) for same subject tracking same target signal with a first order (integrator) tracking system.

Figure 5b shows tracings of E and MR recorded while the same subject performed the same tracking task with a first order tracking system equal to an integrator, H(s) = 1/s. The tracings show that for this tracking system it is MR, and not the rate of change ṀR, that is proportional to a delayed version of E. In other words, the subject no longer responds to the integral of the error, but to the error itself. While this is a dramatic change in strategy it is not a necessary change. Many technological feedback systems employ integral error feedback when controlling dynamic systems. Neither is it an arbitrary change. As emphasized by the crossover model of McRuer and Krendel (1974), the system reorganizes itself so the relationship between E and R remains the same (equal to a gain, an integrator and a time delay) despite the change in the dynamics of the tracking system. How can we reconcile these results other than by recognising that the subject has compensated for the integrator characteristics of the tracking system by incorporating a differentiator (i.e., the inverse of the integrator) into his tracking strategy? McRuer and Krendel (1974) give other examples. For instance, increasing the gain of the tracking system causes the subject to reduce gain, introducing phase lag causes the subject to introduce phase lead and conversely, introducing phase lead causes the subject to compensate with phase lag.

Based on these observations we assert that the CNS strives to compensate for the dynamics of the tracking system by incorporating an internal model of the inverse dynamics. This is a key feature of AMT and is implemented in the simulator by the SA and RE systems. As we show subsequently, under some circumstances the CNS introduces only partial compensation for the dynamics of the tracking system and we use the simulator to explore this phenomenon.
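The compensation argument can be written compactly in transfer function form. The symbols below are introduced here for illustration and follow common usage for the crossover model rather than notation from this chapter: Y_p(s) denotes the subject's transfer function from E to MR, \omega_c a gain (crossover frequency) and \tau a time delay.

```latex
% Open-loop relationship between E and R held invariant (gain, integrator, delay):
\frac{R(s)}{E(s)} = Y_p(s)\,H(s) \approx \frac{\omega_c\, e^{-\tau s}}{s}

% Zero order tracking system, H(s) = 1: the subject supplies the integrator,
% so MR is a delayed integral of E.
Y_p(s) \approx \frac{\omega_c\, e^{-\tau s}}{s}

% First order tracking system, H(s) = 1/s: the integrator is already in H,
% so MR is a delayed, scaled copy of E.
Y_p(s) \approx \omega_c\, e^{-\tau s}
```

Dividing the invariant open-loop form by H(s) gives the subject's required characteristic in each case, which is why a change from H(s) = 1 to H(s) = 1/s amounts to the subject adding a differentiator to the strategy used with the zero order system.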

4.2 Speed of Adaptation An important aspect of tracking behaviour is the speed at which subjects compensate for sudden unexpected changes in the tracking system. For early work in this area using changes in gain or polarity see Young, Green, Elkind & Kelly (1964) and Elkind (1964), also discussions by Poulton (1974) and Sheridan & Fen'ell (1974). Figure 6 shows typical target and response signals recorded from 20 subjects during a gain-change step tracking task (Vu, 1993). The target jumped in a step fashion at random times between two fLxed positions on the screen. After a few minutes practice subjects became skilled at the task and could move the response cursor quickly into alignment with the target in a single movement. Without warning, the gain of the tracking system was unexpectedly increased or decreased by a factor of three. The tracings in Figure 6 show responses generated by two subjects immediately following an unexpected three fold increase in gain. Initially the subjects moved the joystick in exactly the same manner as for the previous responses. This caused the response cursor to move rapidly up the screen through a greater range than expected, overshooting the position of the target. The unexpected rapid movement of the response cursor was the first indication to the subject that the gain of the tracking system had increased. After a further reaction time interval subjects initiated a second response to correct the overshoot and return the response cursor into alignment with the target.


Sometimes a third and fourth correction response was required to align the cursor with the target. The recordings demonstrate intermittency in response planning and illustrate the integrator action in error correction discussed earlier. On all occasions, the second response incorporated an adaptive compensation for the increased gain of the tracking system. If this were not so the cursor would have overshot the target by an even larger error in the opposite direction, leading to an unstable sequence of overshooting corrections.

[Figure 6 plots: two panels, SUBJECT ONE and SUBJECT TWO; horizontal axis, time in sec.]

Figure 6. Step tracking responses from two subjects immediately following an unexpected threefold increase in gain of the tracking system. Target (solid line) and deflection of response cursor (dotted line).

The results show that the CNS partly adapts to a threefold change in gain of the tracking system within a reaction time after the change is detected. Likewise, the SA system in the simulator is driven by discrepancy between the expected response R' and the actual response R to partially adapt the internal model H= for use in the next operation of the RE system.

4.3 Sequence of Aimed Ballistic Responses

Our simulation of tracking behaviour constructs the response from a sequence of 100-150ms duration aimed ballistic submovements planned in terms of response cursor position. A recent experiment (Ho, 1994) tested this working hypothesis by segmenting the responses generated by subjects operating with a variety of tracking system dynamics (gain, integrator, double integrator, second order lag and first order lead). Experimental data in Figure 7 were generated by a subject performing a two-dimensional dual-axis tracking task with integrators on both the X and Y axes. The solid line in Figure 7a shows the movement, MR, of the joystick in two dimensions while the solid line in Figure 7b shows the corresponding two-dimensional movement, R, of the response cursor on the display screen. The X and Y components of the experimentally measured MR and R were sampled at 1000/s.

[Figure 7 plots: two panels, MOTOR RESPONSE and RESPONSE; horizontal axis, X-axis.]

Figure 7. Movements of (a) the joystick and (b) the response cursor in two dimensions (Y versus X) during a dual axis tracking task with an integrator tracking system on each axis. In both graphs experimental data are shown with a solid line and theoretical data with a dotted line. In graph (b) the solid line falls exactly on top of the dotted line. Open circles correspond to 100ms intervals measured along pathways. Movement of joystick (motor response MR) is measured in arbitrary units while movement of response cursor (R) is measured in units corresponding to 1/40 cm.

In order to compare the experimental data with a simulated sequence of ballistic movements, the MR and R traces were segmented into 100ms intervals as indicated by the open circles in Figure 7. The position and the velocity of MR and R in both the X and Y directions at the beginning and end of each 100ms segment were measured and used as inputs to the S-function simulator. The S-function simulator produced an S-shaped 100ms duration ballistic movement connecting the initial position and velocity to the final position and velocity. This was done for each 100ms segment in the X and in the Y direction and for both the MR and R traces. The Y component was plotted against the X component for each segment to obtain the simulated ballistic movement in two dimensions. These were concatenated (dotted lines in Figure 7) and superimposed on the experimentally measured data (solid lines in Figure 7) for both MR (Figure 7a) and R (Figure 7b). As seen in Figure 7b, the S-curves closely match R even around sharp corners (the dotted line superimposes exactly on the solid line so only the solid line is visible in Figure 7b). However, in Figure 7a, the two-dimensional S-curves do not fit the MR trace as well as they fit the R trace in Figure 7b. These data are consistent with the hypothesis that tracking responses are planned as a sequence of aimed ballistic movements and support planning based on movement of the response cursor R rather than on the associated movement of the joystick MR.

4.4 Influence of Target Prediction on Tracking Behaviour

Using the mathematical theory of stochastic signals (Box & Jenkins, 1976), we created a set of 20 target signals with different levels of predictability by filtering random numbers through 8th-order low-pass Butterworth digital filters with bandwidths ranging between 0.1 Hz and 3.9 Hz (Neilson et al., 1993). Altering the predictability of a stochastic signal corresponds to altering the bandwidth of its power spectrum. The 20 signals were used to investigate the influence of target signal predictability on the tracking behaviour of 6 subjects. In Figure 8 the averaged behaviour of the subjects is compared with that of the simulator both with and without the target predictor functioning. The weights of all adaptive filters in the simulator, other than the T predictor, were clamped to their correct values. The planning time and the duration of each ballistic movement were each set to 150ms. A cross correlation and spectrographic analysis was employed (Neilson et al., 1993) to obtain the gain, phase and remnant frequency response characteristics describing the relationship between T and R for all tracking runs (for both subjects and simulator). The gain and phase at all frequencies varied with the bandwidth of the target. For simplicity, only the gain and phase at a frequency of 1.0 Hz are presented in Figure 8. With the T predictor disabled, the simulator tracked the target signal with a gain close to unity and a phase lag close to 120 degrees regardless of the bandwidth of the target (dotted lines in Figure 8). A phase lag of 120 degrees at 1.0 Hz corresponds to a time delay of about 330ms, which is attributable largely to the 300ms sum of the planning time and the movement time set in the simulator.

With the T predictor functioning, the relationship between T and R changed dramatically (dashed lines in Figure 8). This change can be attributed entirely to the influence of target signal prediction because all other aspects of the simulation remained unchanged. With the predictor working, both the gain and phase lag varied systematically with the bandwidth of the target signal. For bandwidths less than 0.7 Hz, the gain was close to unity across the band and the phase lag was close to zero degrees. As the bandwidth of the target signal increased from 0.7 to 2.0 Hz, the gain decreased from near 1.0 to approximately 0.3 (dashed line in Figure 8a) and remained in that vicinity for higher bandwidths. Similarly, the phase lag increased from near zero degrees for low bandwidths to 120-130 degrees for bandwidths of 2.0 Hz or greater (dashed line in Figure 8b). The gain and phase describing the average behaviour of the six subjects (solid lines in Figures 8a and 8b) varied with the bandwidth of the target in a manner similar to that just described for the simulator with the T predictor functioning.

[Figure 8 plots: (a) GAIN VS BANDWIDTH, (b) PHASE VS BANDWIDTH; horizontal axis, BANDWIDTH IN HZ.]

Figure 8. (a) Gain of the T to R relationship at 1.0 Hz plotted as a function of the bandwidth of T. (b) Phase of the T to R relationship at 1.0 Hz plotted as a function of the bandwidth of T. Simulator with predictor disabled (dotted line). Simulator with predictor functioning (dashed line). Average across six subjects (solid line).

In other words, the adaptive optimal predictor was effective in compensating for the 300ms time delay and reducing the average time lag of R behind T when the bandwidth of the target signal was less than 0.7 Hz. It became progressively less effective as the bandwidth of the target was increased. For bandwidths of 2.0 Hz and greater, the adaptive predictor was ineffective in reducing the average time delay below the inherent 300ms. Nevertheless, for large bandwidths, the predictor still influenced tracking behaviour because the gain was greatly reduced. The important point to emphasize is that the gain and phase of the six subjects varied with the bandwidth of the target in a manner comparable with the computer simulation when the T predictor was operative. The results provide compelling evidence in support of the view that the CNS strives to compensate for time delays in the system by predicting future positions of the target.
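The link between the simulator's fixed delay and the phase lag observed with the T predictor disabled can be checked with the standard relation between a pure time delay and phase. The symbols \phi, f and \tau below are introduced here for illustration only and are not the authors' notation; this is a rough consistency check rather than part of the original analysis.

\[
\phi = 360^{\circ}\, f\, \tau, \qquad
f = 1.0\ \text{Hz},\ \tau = 0.3\ \text{s} \;\Rightarrow\; \phi = 108^{\circ}, \qquad
\phi = 120^{\circ} \;\Rightarrow\; \tau = \frac{120}{360 \times 1.0} \approx 0.33\ \text{s}
\]

On this reckoning the 300ms of planning plus movement time accounts for most of a phase lag close to 120 degrees at 1.0 Hz.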

4.5 Influence of Disturbance Prediction on Tracking Behaviour

Alafaci (1992) investigated the behaviour of six subjects performing compensatory tracking tasks with a zero order (gain equals one) tracking system. The disturbance signal was manufactured by filtering random numbers through an 8th-order Butterworth filter with a bandwidth of 1.0 Hz. Results were compared with those from the computer simulator performing the same tasks both with the D predictor functioning and with the D predictor disabled. The average gain and phase frequency response curves describing the relationship between D and MR across the six subjects are compared with those for the computer simulations (both with and without disturbance prediction) in Figure 9. Changes in the behaviour of the computer simulations under the two conditions can be attributed entirely to the influence of disturbance prediction. As seen in Figure 9, the D predictor introduces a characteristic modification of the gain and phase curves similar to that observed for the T predictor during pursuit tracking (Neilson et al., 1993). The gain decreases across the band and shows a "bowl like" shape with a minimum at a frequency of 0.4-0.5 Hz followed by an increase in gain at higher frequencies (dashed line in Figure 9a). The phase lag and the slope of the phase versus frequency graph (dashed line in Figure 9b) are decreased at higher frequencies relative to the phase curve for the computer simulation with the D predictor disabled (dotted line in Figure 9b). The average gain and phase curves for the six subjects (solid lines in Figure 9) show the same characteristic features as those attributable to the D predictor in the simulator (dashed lines in Figure 9). Alafaci (1992) concluded that human compensatory tracking behaviour is dominated by the influence of disturbance prediction. Subjects attempt to compensate for their inherent reaction time delay by predicting future values of the disturbance signal.

[Figure 9 plots: (a) GAIN VS FREQUENCY, (b) PHASE VS FREQUENCY; horizontal axis, FREQUENCY IN HZ.]

Figure 9. (a) Gain frequency response curve and (b) phase frequency response curve describing the disturbance D to motor response MR relationship during a compensatory tracking task (i) averaged across six subjects (solid lines), (ii) computer simulation with the D predictor disabled (dotted lines) and (iii) computer simulation with the D predictor functioning (dashed lines).

5. MV, GMV and GPC CONTROLLERS

From the experiments described above we find that subjects performing visual tracking tasks quickly adapt their behaviour to compensate for the dynamics of the tracking system and for time delays within the loop. But it has been shown mathematically that a control system which (a) compensates for the dynamics of the controlled process by incorporating an internal model of the inverse dynamics and (b) compensates for time delays within the loop by predicting the inputs, is the most accurate possible controller in the sense that it reduces the variance of the error signal to a minimum. Such a controller is known as a minimum variance (MV) controller. It was first described by Astrom (1970). The idea of an adaptive MV controller was introduced by Astrom and Wittenmark (1973) and was given the name self-tuning regulator. Our previous assertion (Neilson et al., 1992) that in performing a visual tracking task the CNS functions like an adaptive optimal MV controller is consistent with the foregoing experimental observations. Indeed, the computer simulator can be described as an adaptive MV controller. It has been recognised in control theory literature, however, that there are practical limitations to the implementation of MV control.

When controlling a long lag system whose gain decreases rapidly with increasing frequency (a property of all inertial systems), the MV controller compensates by introducing an increasing gain with increasing frequency. Consequently, the MV controller demands excessive input energy at high frequencies and can cause the input signal to saturate or exceed its maximum value. The MV controller cannot be employed at all for a class of controlled systems, known as non-minimum phase systems, whose inverse dynamics are unstable. For such a non-minimum phase system the MV controller would be unstable. To overcome these problems with the MV controller, Clarke and Gawthrop (1979) introduced the idea of a generalized minimum variance (GMV) controller. In the design of a GMV controller a compromise is introduced between the variance of the error signal and the variance of the input signal or generalized input energy. Tracking accuracy is deliberately sacrificed in order to reduce demand on input energy. This design philosophy is known as linear quadratic optimal control because it requires the minimization of a quadratic cost function involving a linear combination of error variance and input variance. The particular linear combination chosen specifies the compromise between error variance and input variance. The result of this compromise is effectively to detune the internal model of the inverse dynamics of the controlled system by reducing its bandwidth and reducing its gain and phase lead, particularly at high frequencies (Isermann, Lachmann & Matko, 1992). In this way it is possible to reduce the excessive input energy demands of the MV controller and to stabilize control of non-minimum phase systems. An extension of the GMV design involves predicting future values of the target signal and then, using a procedure known as receding horizon control, computing an optimal response to move the system from an initial dynamic state to a future dynamic state. Known as the generalized predictive controller (GPC), this was introduced by Clarke et al. (1987) and improves performance of a wide class of control systems, particularly those with unknown time delays.
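The compromise between error variance and input variance described above can be written, in its simplest scalar form, as a single weighted cost. The notation here is an assumption made for illustration: e(k) denotes the tracking error, u(k) the input, E the expectation and \rho \ge 0 a weighting factor; none of these symbols is taken from the chapter, and published GMV formulations are generally more elaborate (for example, they may weight filtered versions of these signals).

\[
J_{\text{GMV}} = E\{e^{2}(k)\} + \rho\, E\{u^{2}(k)\}
\]

With \rho = 0 the cost reduces to the error variance alone, which is the MV case; increasing \rho penalizes input energy more heavily at the expense of tracking accuracy.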


We have already argued that the human CNS carrying out a tracking task shows all the features of an MV adaptive optimal controller. The question now arises as to what extent it can also handle the practical limitations of this type of controller. Does it compensate for long lag and non-minimum phase tracking systems and if so, does it do it in the manner of a receding horizon or GPC controller? To test this we have experimented with the behaviour of human subjects performing tracking tasks with both long lag and non-minimum phase tracking systems.

5.1 Influence of a Long Lag System on Tracking Behaviour

Sriharan, at our laboratory, is currently investigating the behaviour of subjects performing tracking with a variety of different bandwidth tracking systems. Although subjects increase gain and introduce phase advance to compensate for the decreased gain and phase lag introduced by the various filters, under some circumstances, only partial compensation is achieved. This phenomenon is illustrated in Figure 10. Responses in Figure 10a were generated by a subject and the simulator performing a pursuit tracking task using a zero order (gain equals one) tracking system. The target signal was manufactured by filtering random numbers through an 8th-order low-pass Butterworth digital filter with a bandwidth of 1.5 Hz. The response waveform generated by the subject is similar in timing and amplitude to that generated by the simulator. Responses in Figure 10b were generated by the subject and the simulator tracking the same target using a second order long lag tracking system. The long lag tracking system had a resonant peak at 0.75 Hz and a phase lag which increased to 100 degrees by 1.5 Hz. The traces in Figure 10 show that the simulator produced identical tracking responses regardless of whether it was tracking with a zero order tracking system or a long lag second order tracking system. This is as expected because the simulator functions like an MV controller and compensates exactly for the dynamics of the tracking system by introducing an accurate internal model of its inverse dynamics. However, examination of Figure 10b shows that the subject only partially compensates for the long lag system. The response waveform is smoothed and shows increased phase lag relative to the waveform produced by the simulator. The subject has only partially compensated for the attenuation and phase lag introduced by the long lag filter.


[Figure 10 plots: (a) ZERO ORDER TRACKING SYSTEM (gain equals one), (b) 2nd ORDER LONG LAG TRACKING SYSTEM; horizontal axis, TIME IN SECONDS.]

Figure 10. (a) Comparison of subject (dashed line) and simulator (dotted line) tracking a target (solid line) using a zero order (gain equals one) tracking system. (b) Comparison of subject (dashed line) and simulator (dotted line) tracking the same target (solid line) using a second order long lag tracking system. A zoom onto a 2.5s segment of a one minute duration test is shown.

5.2 Influence of a Non-Minimum Phase System on Tracking Behaviour

The gain and phase frequency response characteristics for the open loop E to MR relationship were measured for 10 subjects performing pursuit tracking (i) with a zero order (gain equals one) tracking system and (ii) with a second order non-minimum phase tracking system with a pulse transfer function

\[
H(z^{-1}) = \frac{-0.5 + 0.4z^{-1} + 0.1438z^{-2}}{1 - 1.4z^{-1} + 0.5z^{-2}}
\]

The average gain and phase frequency response curves were compared with the corresponding gain and phase curves for an MV controller (Tang, 1994). Subjects find it difficult to track with the non-minimum phase system and their performance is inferior to that with the zero order system (rms error ratio = 2.9). Nevertheless, they are able to perform the task and their behaviour is not unstable as it would be if they behaved like an MV controller. The mean gain and phase frequency characteristics of the E to MR relationship averaged over the 10 subjects for both the zero order tracking system and the non-minimum phase tracking system are compared with the corresponding curves for an MV controller in Figure 11.
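Why this tracking system defeats an exact inverse model can be seen from its zeros and poles. The following is a minimal sketch, assuming the coefficients of the pulse transfer function read -0.5, 0.4, 0.1438 in the numerator and 1, -1.4, 0.5 in the denominator, all in ascending powers of z^-1; the function and variable names are illustrative and do not come from the chapter.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*z**2 + b*z + c = 0 (complex roots allowed)."""
    disc = cmath.sqrt(b * b - 4.0 * a * c)
    return ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))


# Assumed reading of H(z^-1) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)
b0, b1, b2 = -0.5, 0.4, 0.1438
a1, a2 = -1.4, 0.5

# Multiplying numerator and denominator by z^2 gives ordinary polynomials in z.
zeros = quadratic_roots(b0, b1, b2)
poles = quadratic_roots(1.0, a1, a2)

# Steady-state (z = 1) gain of the tracking system.
dc_gain = (b0 + b1 + b2) / (1.0 + a1 + a2)

# A zero outside the unit circle becomes an unstable pole of the inverse model.
inverse_is_unstable = max(abs(z) for z in zeros) > 1.0
system_is_stable = all(abs(p) < 1.0 for p in poles)
```

Under this reading one zero has a modulus of about 1.07, just outside the unit circle, while both poles have a modulus of about 0.71. The tracking system itself is therefore stable, but an exact inverse of it would not be, which is the difficulty faced by an MV controller.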

In Figure 11b it can be seen that during non-minimum phase tracking subjects introduce phase advance into the E to MR relationship (dashed line) relative to the phase when tracking with a zero order tracking system (solid line). However, with respect to the amount of phase advance introduced by the MV controller (dotted line), subjects only partially compensate for the large phase lag introduced by the non-minimum phase system. Similarly, as shown in Figure 11a, subjects introduce less gain (dashed line) than required to compensate for the non-minimum phase system (dotted line). Clearly, subjects only partly compensate for the dynamics of the non-minimum phase system and sacrifice tracking performance, as measured by rms error, to stabilize control of the non-minimum phase system.

[Figure 11 plots: (a) GAIN (E to MR), (b) PHASE (E to MR); horizontal axis, FREQUENCY IN HZ.]

Figure 11. (a) Gain versus frequency and (b) phase versus frequency for the open-loop error E to motor response MR relationship during a pursuit tracking task. The solid lines represent averaged data across 10 subjects using a zero order (gain equals one) tracking system. The dashed lines represent averaged data across the same 10 subjects using the non-minimum phase tracking system. The dotted lines represent the gain and phase required for perfect compensation of the non-minimum phase tracking system.

5.3 The CNS as an Adaptive Optimal Controller

In the above experiments we observed that when tracking with a long lag or non-minimum phase system, the behaviour of human subjects deviates in a systematic way from that of an adaptive MV controller (as programmed into the simulator). Subjects only partly compensate for the reduced gain and phase lag introduced by long lag and non-minimum phase tracking systems. We suggest that the adaptive behaviour of subjects performing tracking tasks can be simulated by the recently developed GPC or receding horizon adaptive optimal controller (for discussion of tracking behaviour in terms of earlier theories of adaptive optimal control see Young & Stark, 1965; Kleinman, Baron & Levison, 1970; Baron, Kleinman & Levison, 1970; Sheridan & Ferrell, 1974). In other words, we propose that in performing a tracking task subjects can vary their behaviour by altering the compromise between tracking accuracy and demand for input energy. In the next section we will explore the idea that this compromise is no more than the well known speed-accuracy trade-off. This trade-off underlies Fitts' law and is also incorporated into optimization models of reaching based on minimization of mean square acceleration or mean square jerk or some other closely related property (Agarwal, Logsdon, Corcos & Gottlieb, 1993; Flash & Hogan, 1985; Hasan, 1986; Hogan, 1984, 1988; Meyer, Abrams, Kornblum, Wright & Smith, 1988; Nelson, 1983).

6. ACCURACY-ENERGY TRADE-OFF

In this section we explore a method by which GPC or receding horizon adaptive optimal control can be implemented within the framework of AMT. We will show that by altering the duration of R* preplanned during each BUMP, the RP system can trade variance of the error signal against variance of the joystick movement during tracking. In other words, it can deliberately sacrifice tracking accuracy in order to conserve input energy. The notion that the duration of R* can be altered does not seem unreasonable since it is known that subjects can alter the duration of aimed reaching movements. Indeed, we argue that the accuracy-energy trade-off observed during visual tracking and the speed-accuracy trade-off observed during aimed reaching are both manifestations of the same underlying mechanism, namely, alteration of the duration of R*.

6.1 Theory of Minimum Mean Square Acceleration Trajectory

The proposal developed here is that the RP system includes an optimal trajectory generator which can be taken as an extension of the S-function generator described previously. Given the initial state, duration and final state, the generator produces an optimum trajectory R* with minimum mean square acceleration. It should be pointed out that for an inertial system, minimization of mean square acceleration is equivalent to minimization of input energy. Furthermore, although the theory is developed in terms of minimizing mean square acceleration, from an algebraic point of view, apart from the inclusion of an additional integral, the derivation is exactly equivalent to the problem of minimizing the mean square jerk (see Flash & Hogan, 1985; Nelson, 1983). We conceptualize the problem of computing a trajectory with minimum mean square acceleration as equivalent to the problem of controlling a double integrator system driven by a zero order hold (ZOH) sampled input, as illustrated in Figure 12.

[Figure 12 block diagram: sampled signal u(k) passes through the ZOH to give the acceleration u(t), which is integrated to velocity x2(t) and then to position x1(t); these are sampled to give x2(k) and x1(k).]

Figure 12. Discrete-time equivalent of double integrator system. The discrete-time (sampled) input signal u(k) is transformed into a ZOH continuous-time signal u(t) by the zero order hold (ZOH). The continuous-time velocity x2(t) and position x1(t) signals are transformed into discrete-time signals x2(k) and x1(k) by the analog to digital converters (A-D). The vector x(k) = [x1(k) x2(k)]^T is the state of the system at sample k. The sampling frequency is 20 Hz and the width of each rectangular pulse is 50ms.

Using the discrete-time signals x1(k), x2(k) and u(k) defined in Figure 12 and corresponding respectively to position, velocity and acceleration, the optimal trajectory problem is presented graphically in Figure 13. Given the initial state x(0) at sample k=0, the duration N and the final state x(N) at sample k=N, the problem is to compute the optimal trajectory R* = x(1), x(2), ..., x(N-1) such that the cost function

\[
J = \frac{1}{2}\sum_{k=0}^{N-1} u^{2}(k) \tag{1}
\]

is minimized. Simultaneously, the equations of motion of the double integrator system must be satisfied. This is known as a constrained minimization problem and mathematical theory related to its solution can be found in texts concerned with the theory of optimal control (e.g., Lewis, 1992). In what follows we will solve the problem using state space theory of discrete-time dynamic systems.

[Figure 13 plot: OPTIMAL DESIRED RESPONSE TRAJECTORY R* running from the initial state x(0) = [x1(0) x2(0)] to the final state x(N) = [x1(N) x2(N)]; horizontal axis, sample number k from 0 to N.]

Figure 13. Illustration of optimal desired response trajectory R* interconnecting an initial state x(0) and a final state x(N). The problem is to compute the trajectory x(1), x(2), ..., x(N-1) so as to minimize the mean square acceleration.

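Firstly we set up the state equations for the continuous-time double integrator system shown in Figure 12.

\[
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} +
\begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t) \tag{2}
\]

By integrating over a 50ms sample interval we obtain the ZOH discrete-time equivalent equations

\[
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} =
\begin{bmatrix} 1 & 0.05 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} +
\begin{bmatrix} 0.00125 \\ 0.05 \end{bmatrix} u(k) \tag{3}
\]

or

\[
x(k+1) = Gx(k) + Hu(k) \tag{4}
\]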
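The entries of G and H in (3) follow directly from the 50ms sample interval. The sketch below is an illustration only: the function names are hypothetical, and the second function is a brute-force numerical check under the assumption that the input is held constant over the interval, as a zero order hold does.

```python
def zoh_double_integrator(T):
    """Exact ZOH discretisation of a double integrator for sample interval T."""
    # The continuous-time matrix A = [[0, 1], [0, 0]] satisfies A*A = 0,
    # so exp(A*T) = I + A*T exactly.
    G = [[1.0, T], [0.0, 1.0]]
    # H = (integral from 0 to T of exp(A*s) ds) * B, with B = [0, 1]^T.
    H = [T * T / 2.0, T]
    return G, H


def simulate_hold(x1, x2, u, T, steps=10000):
    """Integrate the continuous system in small Euler steps while u is held constant."""
    dt = T / steps
    for _ in range(steps):
        x1 += x2 * dt   # position integrates velocity
        x2 += u * dt    # velocity integrates acceleration
    return x1, x2


G, H = zoh_double_integrator(0.05)
# Starting from rest with unit acceleration, one sample interval should
# reproduce the input column H = [0.00125, 0.05] to within integration error.
x1_end, x2_end = simulate_hold(0.0, 0.0, 1.0, 0.05)
```

Starting instead from unit velocity with zero input gives a position change of 0.05 and an unchanged velocity, which is the second column of G.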


The aim is to compute the optimal trajectory x(k) illustrated in Figure 13 such that the cost function J given in (1) is minimized subject to the constraint equation (4). Using the method of Lagrange multipliers, set up the unconstrained cost function

\[
J_1 = \sum_{k=0}^{N-1} \left\{ \frac{1}{2}u^{2}(k) + \lambda^{T}(k+1)\bigl(-x(k+1) + Gx(k) + Hu(k)\bigr) \right\} \tag{5}
\]

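where $\lambda^{T}(k+1)$ is the transpose of the vector of Lagrange multipliers. To minimize $J_1$, differentiate with respect to $u(k)$, $\lambda(k+1)$ and $x(k)$ and equate each of the differentials to zero.

\[
\frac{\partial J_1}{\partial u(k)} = u(k) + H^{T}\lambda(k+1) = 0
\quad\therefore\quad u(k) = -H^{T}\lambda(k+1)
\qquad \text{(input equation)} \tag{6}
\]

\[
\frac{\partial J_1}{\partial \lambda(k+1)} = -x(k+1) + Gx(k) + Hu(k) = 0
\quad\therefore\quad x(k+1) = Gx(k) + Hu(k)
\qquad \text{(system equation)} \tag{7}
\]

\[
\frac{\partial J_1}{\partial x(k)} = -\lambda(k) + G^{T}\lambda(k+1) = 0
\quad\therefore\quad \lambda(k) = G^{T}\lambda(k+1)
\qquad \text{(costate equation)} \tag{8}
\]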

Equations (6), (7) and (8) are solved simultaneously to obtain the optimal solution. Equation (8) describes the free motion of the costate system running backwards in time. By solving (8) we can obtain an expression for $\lambda(k+1)$ which can then be substituted in (6) and (7).

\[
\lambda(k+1) = (G^{T})^{N-k-1}\lambda(N) \tag{9}
\]

where $\lambda(N)$ is the required start-up value for the backward recursion, yet to be evaluated. Substituting (9) into (6) gives

\[
u(k) = -H^{T}(G^{T})^{N-k-1}\lambda(N) \tag{10}
\]

Substituting (10) into (7) gives

\[
x(k+1) = Gx(k) - HH^{T}(G^{T})^{N-k-1}\lambda(N) \tag{11}
\]

Equation (11) can be solved recursively to obtain

\[
x(k) = G^{k}x(0) - \sum_{j=0}^{k-1} G^{j}HH^{T}(G^{T})^{N-k+j}\lambda(N)
     = G^{k}x(0) - F(0,k)\lambda(N) \tag{12}
\]

where

\[
F(0,k) = \sum_{j=0}^{k-1} G^{j}HH^{T}(G^{T})^{N-k+j} \tag{13}
\]

is known as the discrete-time Grammian. Letting k = N in (12), we can obtain an expression for $\lambda(N)$.

\[
x(N) = G^{N}x(0) - F(0,N)\lambda(N)
\quad\therefore\quad
\lambda(N) = -F^{-1}(0,N)\bigl\{x(N) - G^{N}x(0)\bigr\} \tag{14}
\]

Finally, substituting (14) into (12), we obtain an expression for the optimal trajectory expressed in terms of known values x(0), N and x(N).

\[
\boxed{\,x(k) = G^{k}x(0) + F(0,k)F^{-1}(0,N)\bigl\{x(N) - G^{N}x(0)\bigr\}\,} \tag{15}
\]
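Equation (15) can be evaluated directly with a few small matrix routines. What follows is a minimal sketch and not the authors' simulator code: all function names are hypothetical, the matrices G and H take the 50ms values given in (3), and matrices are stored as plain lists of rows so that no external library is needed.

```python
T = 0.05                          # sample interval in seconds (20 Hz)
G = [[1.0, T], [0.0, 1.0]]        # state transition matrix from equation (3)
H = [[T * T / 2.0], [T]]          # input matrix from equation (3), as a 2x1 column


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matpow(a, n):
    """n-th power of a 2x2 matrix for integer n >= 0."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, a)
    return result


def inverse2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def grammian(k, n):
    """F(0,k) of equation (13) for a trajectory lasting n samples."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    hht = matmul(H, transpose(H))
    gt = transpose(G)
    for j in range(k):
        term = matmul(matmul(matpow(G, j), hht), matpow(gt, n - k + j))
        total = [[total[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return total


def optimal_trajectory(x0, xn, n):
    """States x(0)..x(n) from equation (15); x0 and xn are [position, velocity]."""
    x0c = [[x0[0]], [x0[1]]]
    free_end = matmul(matpow(G, n), x0c)
    # Gap between the desired final state and the free motion: x(N) - G^N x(0)
    gap = [[xn[0] - free_end[0][0]], [xn[1] - free_end[1][0]]]
    w = matmul(inverse2(grammian(n, n)), gap)    # F^-1(0,N){x(N) - G^N x(0)}
    states = []
    for k in range(n + 1):
        free = matmul(matpow(G, k), x0c)         # free motion G^k x(0)
        forced = matmul(grammian(k, n), w)       # response to the optimal inputs
        states.append([free[0][0] + forced[0][0], free[1][0] + forced[1][0]])
    return states
```

For example, `optimal_trajectory([0.0, 0.0], [1.0, 0.0], 20)` returns the 21 states of a one second rest-to-rest movement through one unit of distance. The inverse requires n of at least 2, since a single input sample cannot set both position and velocity.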


The specified values x(0), N and x(N) are applied as inputs to (15) and the optimal trajectory x(1), x(2), ..., x(N-1) is obtained as the solution. The first term $G^{k}x(0)$ on the right hand side of (15) describes the free motion of the double integrator system responding to the initial state x(0). For example, if the initial velocity x2(0) is non-zero, the position signal x1(t) will continue to change with time even when no acceleration force is applied at the input. This is simply a mathematical representation of Newton's first law that a body continues in a state of rest or uniform motion in a straight line unless acted on by an external force. The matrix G describes the free motion of the state of the system over a single sample interval of 50ms. The second term $F(0,k)F^{-1}(0,N)\{x(N) - G^{N}x(0)\}$ on the right hand side of (15) describes the response to the input accelerations u(k). Basically, this is a statement of Newton's second law of motion, force equals mass times acceleration. However, by employing the discrete-time Grammian F(0,k) we compute the response to the specific input accelerations u(k) which minimize the mean square acceleration over the duration of the trajectory. Equation (15) implies that the optimum trajectory R* is the sum of the free motion of the double integrator plus its response to the optimal input accelerations. In the next section we will show that (15) represents a design equation for an optimal trajectory generator.

6.2 Design of Optimal Trajectory Generator

It is valuable to discover from equation (15) that each element in the optimal trajectory R* is obtained by a simple matrix transformation of the initial and final states x(0) and x(N). Such a transformation can be implemented by adaptive filter circuits of the type described previously for the SA and RE systems (Neilson et al., 1992). Thus we find that SA, RE and RP systems can all be constructed with similar parallel processing circuitry. Moreover, this circuitry consists of a simple computational module repeated many times in parallel. This is consistent with the uniformity of neural circuitry within the CNS. A block diagram of a parallel processing circuit able to implement equation (15) for duration N=4 is illustrated in Figure 14a. Each block represents a simple 2 by 2 matrix transformation as illustrated in Figure 14b. The weights of the Grammian matrices F(0,k) are adapted when the duration N of the optimal trajectory alters. Otherwise, the circuit accepts inputs x(0) and x(N) and transforms them into the optimal trajectory x(1), x(2), ..., x(N-1). The transformation takes no more time than required for the signals to flow through the circuit.

Figure 14. (a) Block diagram for an N = 4 optimal trajectory generator. Each block represents a 2x2 matrix transformation. (b) Three equivalent representations of a 2x2 matrix transformation performed by a parallel processing circuit.
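The idea that every point of R* is a fixed 2x2 transformation of x(0) plus a fixed 2x2 transformation of x(N) can be sketched as follows. The code precomputes, for a given N, one pair of weight matrices (A_k, B_k) per sample so that x(k) = A_k x(0) + B_k x(N); each pair can then be applied independently, in the spirit of the parallel blocks of Figure 14a. The zero-order-hold double integrator, the particular definition of F(0,k) and all function names are assumptions made for this illustration, not details taken from the chapter.

```python
T = 0.05  # sample interval in seconds (50 ms)


def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def mat_sub(a, b):
    """Difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def mat_inv(a):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def g_power(m, T=T):
    """Free motion of the double integrator over m samples (G^m)."""
    return [[1.0, m * T], [0.0, 1.0]]


def gh(m, T=T):
    """G^m H: effect of one input sample after m further samples of free motion."""
    return (T * T * (m + 0.5), T)


def grammian(k, N, T=T):
    """Assumed F(0,k) = sum over j < k of (G^(k-1-j) H)(G^(N-1-j) H)^T,
    chosen so that F(0,N) is the usual reachability Grammian."""
    f = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(k):
        a, c = gh(k - 1 - j, T), gh(N - 1 - j, T)
        for r in range(2):
            for s in range(2):
                f[r][s] += a[r] * c[s]
    return f


def block_weights(N, T=T):
    """Weights (A_k, B_k), k = 0..N, with x(k) = A_k x(0) + B_k x(N)."""
    w_inv = mat_inv(grammian(N, N, T))
    weights = []
    for k in range(N + 1):
        b_k = mat_mul(grammian(k, N, T), w_inv)
        a_k = mat_sub(g_power(k, T), mat_mul(b_k, g_power(N, T)))
        weights.append((a_k, b_k))
    return weights


def apply_block(weight, x0, xN):
    """One block: a pair of 2x2 transformations applied to x(0) and x(N)."""
    a, b = weight
    return tuple(a[i][0] * x0[0] + a[i][1] * x0[1]
                 + b[i][0] * xN[0] + b[i][1] * xN[1] for i in range(2))


# N = 4 as in Figure 14a: the weights depend only on N and are reused for
# any x(0) and x(N).
weights = block_weights(4)
trajectory = [apply_block(w, (0.0, 0.0), (1.0, 0.0)) for w in weights]
```

Because the weights depend only on N, they need to be recomputed only when the duration of the trajectory changes, which matches the description of the Grammian weights being adapted when N alters.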

We have modified the RP system in the computer simulation to include an optimal trajectory generator in place of the S-function generator described earlier. When the duration of R* is set to two sample intervals (100ms), the trajectory generated by the new circuit exactly matches that produced by the S-function generator. However, in the modified RP system, the duration of R* can be altered. We will show in the next section that this allows the simulator to function as a receding horizon adaptive optimal controller.

6.3 Receding Horizon Adaptive Optimal Control

Because of the parallel processing architecture described above, the RP system in the simulator can generate R* in two sample intervals. The first sample interval is required to read "F, I) and I~ from memory (see previously), compute x(0) and x(N) and apply them as inputs to the optimum trajectory generator. The second sample interval is required for the optimum trajectory generator to generate R* and write it into memory. In other words, the RP system requires only two sample intervals (100ms) to generate a desired response trajectory R* that may require a second or more to execute. We refer to this as planning in accelerated time and contend that it is this feature which gives the system the capability of functioning as a receding horizon controller.

During each BUMP in the simulated tracking task the RP system plans an R* to move the response cursor into alignment with the predicted position and velocity of the target a chosen distance ahead in time, known as the prediction horizon. The prediction horizon can be varied by the input N to the optimum trajectory generator. Although it may extend for a second or more ahead, only the initial two samples of R* are actually executed by the RE system. By then the RP system has had sufficient time to read in updated predictions and to compute an entirely new R* to replace the first. Thus the tracking response generated by the simulator consists of a concatenated sequence of 100ms submovements. However, each submovement is only the first two samples of a longer duration minimum energy trajectory planned to a prediction horizon.

A similar strategy has been analyzed in the control theory literature (see Bitmead et al., 1990). It is known as receding horizon LQ control. It forms the basis of a control strategy discussed earlier known as GPC (Clarke et al., 1987) which now enjoys marked success in applications of computer adaptive control. Stability and performance properties have been analyzed. Results show that receding horizon controllers can be designed with guaranteed asymptotic closed-loop stability (Bitmead et al., 1990).

Considering the difficulty encountered by classical feedback control techniques in stabilizing multi-input systems, this is a remarkable property.

The longer the duration of R* (i.e., the further ahead in time the prediction horizon), the more slowly the response cursor is brought into alignment with the target. Consequently, the error is reduced slowly and the variance of the tracking error signal is large. On the other hand, the shorter the duration of R*, the greater the accelerations and the larger the forces required to move the predominantly inertial system. By increasing the speed of R*, the RP system can improve tracking accuracy, but only at the cost of increasing the input energy. Conversely, the RP system can conserve energy and improve closed-loop stability by increasing the duration of R*. But this is achieved at the expense of reduced tracking accuracy. Thus by changing the duration of R*, the RP system can alter the compromise between tracking accuracy and input energy. It is interesting to notice that in many everyday activities, such as walking and talking, the muscles typically operate at contraction levels less than 10% of maximum, suggesting an energy conserving mode with reserves held for situations of high demand. The mathematical expression of this compromise as stated in optimal control theory is the linear quadratic (LQ) cost function J given by

$$J = \int_0^{\infty} \{ e^2(t) + p\,u^2(t) \}\, dt \qquad (16)$$

where e(t) is the error signal, u(t) is the input signal and p is the scalar that sets the compromise between error variance and input energy. This is the basis of the optimization strategy incorporated in the GPC controller discussed earlier. With respect to tracking behaviour it can be written as

$$J = \int_0^{\infty} \{ E^2 + p\,\mathrm{MR}^2 \}\, dt \qquad (17)$$

where E is the tracking error, MR is the motor response and p is as above. We saw previously that when subjects perform visual tracking with long lag tracking systems or with non-minimum phase tracking systems, they only partly compensate for the increased gain and phase lag introduced by the tracking system. For the case of non-minimum phase systems this would seem to be attributable, at least in part, to a detuning of the internal model of inverse dynamics, otherwise the system would be unstable. However, we have shown previously (Neilson et al., 1988) that increasing the duration of R* has the effect of introducing a low-pass filter into the loop. Thus increasing the duration of R* also results in only partial compensation for the gain and phase lag of the tracking system. We suggest that both mechanisms are operative and are part of the optimal control strategy employed by the CNS.
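The receding horizon scheme of this section, and the compromise between tracking accuracy and input energy that it implies, can be sketched in a few lines of Python. The sketch plans a minimum sum-of-squares acceleration trajectory to a stationary step target N samples ahead, executes only the first two samples (100ms), then replans. The zero-order-hold double integrator, the step target, the run length and the function names are all assumptions made for illustration; the simulator described in the chapter also includes prediction and adaptive modelling that are not represented here.

```python
T = 0.05      # sample interval in seconds (50 ms)
EXECUTED = 2  # samples of each plan actually executed (100 ms)


def plan_inputs(x, target, N, T=T):
    """Minimum sum-of-squares accelerations taking the state
    x = (position, velocity) to (target, 0) in N >= 2 samples."""
    c = [(T * T * (N - j - 0.5), T) for j in range(N)]
    # Required change beyond the free motion over N samples.
    d = (target - (x[0] + N * T * x[1]), 0.0 - x[1])
    w11 = sum(a * a for a, _ in c)
    w12 = sum(a * b for a, b in c)
    w22 = sum(b * b for _, b in c)
    det = w11 * w22 - w12 * w12
    l1 = (w22 * d[0] - w12 * d[1]) / det
    l2 = (-w12 * d[0] + w11 * d[1]) / det
    return [a * l1 + b * l2 for a, b in c]


def track_step(N, target=1.0, duration=4.0, T=T):
    """Receding horizon response to a step target.

    Returns (sum of squared error, sum of squared acceleration, final state).
    """
    x = (0.0, 0.0)
    sq_error = 0.0
    sq_input = 0.0
    steps = int(round(duration / T))
    k = 0
    while k < steps:
        # Plan N samples ahead but keep only the first 100 ms of the plan.
        for uk in plan_inputs(x, target, N, T)[:EXECUTED]:
            sq_error += (target - x[0]) ** 2
            sq_input += uk ** 2
            x = (x[0] + T * x[1] + 0.5 * T * T * uk, x[1] + T * uk)
            k += 1
    return sq_error, sq_input, x


short_horizon = track_step(4)    # R* planned over 200 ms
long_horizon = track_step(20)    # R* planned over 1 s
```

In this sketch the longer horizon yields the larger accumulated squared error and the smaller accumulated squared acceleration, which is the qualitative compromise expressed by the weighting p in (16) and (17).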

7. GENERALIZATION OF TRACKING TO OTHER MOTOR BEHAVIOUR

In previous sections we discussed AMT with respect to the performance of a visual tracking task. However, we contend that the same central mechanisms are involved in the performance of all purposive goal-directed movements. In self-paced tasks we see the output from higher level processing as consisting of a multidimensional target trajectory of high level sensory feature signals. In speech control, for example, the target waveforms would consist of a multidimensional trajectory of acoustic features generated by higher levels such as semantic, syntactic and phonological processing. As proposed in AMT, the internally generated target waveforms are then tracked by the CNS using the same mechanisms described above. Execution errors are detected and intermittently corrected. R* is planned in terms of the same high level sensory feature signals and transformed into motor commands by an adaptive internal model of inverse dynamics. It follows that central planning can take place independently of the effectors selected.

In light of these comments it is important to emphasize that the RP system is regarded as the lowest level in a hierarchical structure of response planning. Most responses are planned on the basis of a hierarchical structure of long-term to short-term goals. Such higher level cognitive processing is not observable in tracking data. Although we can identify the performance characteristics of the perceptual-motor loop, we cannot observe processes involved in, for example, motivating the subject to perform the task. Nevertheless, as described in the next section, aspects of AMT can be verified using tasks other than visual tracking.

7.1 Intermittent Submovements in Handwriting

In Figure 15 we show the letter 'b' written by a subject on a computer bit-pad using an electronic pen (Lui, 1993). The X and Y coordinates of the pen-point were sampled at 145 samples/s. The X versus time and Y versus time plots are shown as a solid line and a dashed line, respectively, in Figure 15a. These data were processed in the same way as described earlier for tracking data. The plots were sectioned into 100ms intervals as shown by the open circles. The position and velocity of the pen in both the X and Y directions at the beginning and end of each 100ms interval were measured. The optimal trajectory pattern generator described in equation (15) was used to generate a 100ms duration continuous-time optimal trajectory for each interval. The optimal trajectories generated by the computer were superimposed as dotted lines on the experimental data in Figure 15a. The Y versus X displacement of the pen-point across the bit-pad during writing of the letter 'b' is compared with the concatenated sequence of 100ms duration ballistic movements in the X and Y directions produced by the optimal trajectory generator in Figure 15b. The simulated data closely approximate the actual trajectory of the pen-point across the bit-pad even during rapid changes in direction such as the loop at the top of the 'b'. The data are consistent with the hypothesis that handwriting movements are composed of a concatenated sequence of aimed ballistic submovements each with a duration of about 100ms.

[Figure 15 panels: (a) X & Y vs TIME, time in seconds; (b) Y vs X, X displacement in cm.]

Figure 15. Movements of the pen-point in the X and Y directions during writing of the letter 'b' compared with a sequence of 100ms S-shaped ballistic movements fitted to the experimental data by the optimum trajectory generator in the simulator. (a) X component (solid line) and Y component (dashed line) compared with theoretical S-shaped trajectories (dotted lines) plotted against time in seconds. The circles indicate 100ms intervals. (b) Y versus X displacements of the pen-point during handwriting of the letter 'b' (solid line) compared with the sequence of 100ms duration ballistic movements produced by the optimum trajectory generator in the simulator (dotted line). The circles correspond to 100ms intervals along the path.
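The segment-by-segment fitting procedure described for Figure 15 can be sketched as follows. For a purely inertial (double integrator) system in continuous time, the trajectory that minimizes the integral of squared acceleration between a given initial and final position and velocity is a cubic polynomial in time, which is the standard calculus-of-variations result. The sketch uses that cubic for each 100ms interval and concatenates the segments. Whether the trajectories of Figure 15 were computed in exactly this way is not stated in the text, and the function names and sample values below are hypothetical.

```python
def min_accel_segment(p0, v0, p1, v1, duration):
    """Cubic trajectory from (p0, v0) to (p1, v1) over `duration` seconds.

    Written in cubic Hermite form; s runs from 0 to 1 across the segment.
    """
    def position(t):
        s = t / duration
        return ((2 * s**3 - 3 * s**2 + 1) * p0
                + (s**3 - 2 * s**2 + s) * duration * v0
                + (-2 * s**3 + 3 * s**2) * p1
                + (s**3 - s**2) * duration * v1)
    return position


def concatenate(boundaries, interval=0.1):
    """Piecewise trajectory through (position, velocity) pairs measured at
    the start and end of consecutive intervals (100 ms by default)."""
    segments = [min_accel_segment(p0, v0, p1, v1, interval)
                for (p0, v0), (p1, v1) in zip(boundaries, boundaries[1:])]

    def position(t):
        i = min(int(t / interval), len(segments) - 1)
        return segments[i](t - i * interval)
    return position


# Hypothetical boundary measurements for one coordinate of the pen-point:
# (position in cm, velocity in cm/s) at 0, 100 and 200 ms.
x_of_t = concatenate([(0.0, 0.0), (1.0, 2.0), (1.5, -1.0)])
samples = [x_of_t(k * 0.01) for k in range(21)]
```

Applying the same procedure separately to the X and Y coordinates gives a two-dimensional path of concatenated submovements that is continuous in both position and velocity at the interval boundaries.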

7.2 Optimal Trajectory During Reaching

The bit-pad and electronic pen described above were employed to study the kinematics of subjects drawing diagonal lines from a start point to targets of various sizes and distances (Gow, 1994). Displacement, velocity and acceleration traces in both the X and Y directions were computed and plotted as a function of time. The optimal trajectory generator described in equation (15) was employed to generate theoretical trajectories with the same initial position and velocity, duration and final position and velocity as the experimental curves. As shown in Figure 16, the theoretical trajectory provides a close fit to the experimental data for movements with durations less than 200ms. For longer duration movements the experimental curves often showed multiple peaks in the velocity and acceleration traces and deviated from the theoretical curves. This is consistent with the idea of intermittency since the first correction movement would be initiated one reaction time interval after the onset of the movement. The excellent fit between experimental and theoretical data (illustrated in Figure 16) holds true for short duration (less than 200ms) movements across a range of velocities and amplitudes in both the X and Y directions (Gow, 1994).

[Figure 16 plot: EXPERIMENTAL DATA & OPTIMUM TRAJECTORY; horizontal axis: time in msec.]

Figure 16. X-displacement in pixels (1 pixel = 0.1mm) versus time in milliseconds as subject draws a 10cm diagonal line to a 16mm diameter target in 200ms (solid line). The dotted line is the corresponding theoretical R* trajectory produced by the optimum trajectory generator in the simulator.

Each trajectory is designed to move between specified initial and final dynamic states in a given time. The final state corresponds to the position of the target. The mean square acceleration of the planned trajectory is minimized. For an inertial system, this corresponds to minimum muscular energy. By varying the duration of the reaching movement subjects vary the speed-accuracy trade-off inherent in Fitts' law. They also vary the amount of input energy required. Execution errors causing the movement to deviate from the intended trajectory are corrected intermittently at planning time rates but the first correction movement is initiated one reaction time interval after the onset of the movement.

[Figure 17 plot: Simulation of Target Perturbation Experiment; horizontal axis: time in secs.]

Figure 17. Simulation of target step perturbation experiment using a 100ms intermittency interval in response planning and a duration of 300ms in the optimum trajectory generator. Dashed lines show target and response for the unperturbed target simulation. Solid lines show target and response for the perturbed target simulation.

In Figure 17 we present a simulation of the target perturbation experiment (Pelisson et al., 1986) mentioned earlier. The data were used by Hoff and Arbib (1993) to argue in support of continuous, albeit delayed, feedback correction during reaching and against the idea of intermittency. In the simulation shown, the AMT model employed an intermittency interval of 100ms and the duration of the optimal trajectory generator (OTG) was set to 300ms. The target makes a stepwise jump (dashed line in Figure 17) to a distance of 40cm from the start position. The simulator generates a response (dashed line in Figure 17) consisting of a sequence of 100ms duration submovements which converge smoothly to the target. In a second simulation, the target is perturbed by a further step of 4cm to a distance 44cm from the start position at the onset of the response (solid line in Figure 17). Despite intermittency in response planning, the simulated response (solid line in Figure 17) to the perturbed target undergoes a smooth transition from one trajectory to the other without introducing discontinuities or secondary peaks in the velocity. This simulation shows that demonstrations of smooth transitions, per se, are insufficient to dismiss the existence of intermittency in response planning. Since the RP system hypothesized in AMT preplans desired response trajectories to move between an initial dynamic state and a final dynamic state, it can generate bump free transitions between submovements. The peaks and troughs most commonly observed in velocity and acceleration profiles of human movements can be attributed to other influences, such as 'slowing around sharp corners' as observed in handwriting movements (Figure 15). Discontinuities caused by intermittency in response planning become obvious only when there are errors in response execution due to disturbances or model inaccuracies, as illustrated by the gain-change step tracking responses in Figure 6.
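The structure of the perturbation simulation of Figure 17 can be sketched in Python under simplifying assumptions. Each plan is a minimum sum-of-squares acceleration trajectory to the currently seen target over 300ms (6 samples), only the first 100ms (2 samples) of each plan is executed, and the perturbed target is first seen at the second planning instant. The zero-order-hold double integrator, the absence of reaction time delay and prediction, and all names below are assumptions of this illustration; it is not the AMT simulator and is not claimed to reproduce the curves of Figure 17.

```python
T = 0.05      # sample interval in seconds (50 ms)
HORIZON = 6   # planned duration of each R*: 6 samples = 300 ms
EXECUTED = 2  # samples executed before replanning: 100 ms intermittency


def plan_inputs(x, target, N=HORIZON, T=T):
    """Minimum sum-of-squares accelerations taking the state
    x = (position, velocity) to (target, 0) in N samples."""
    c = [(T * T * (N - j - 0.5), T) for j in range(N)]
    d = (target - (x[0] + N * T * x[1]), 0.0 - x[1])
    w11 = sum(a * a for a, _ in c)
    w12 = sum(a * b for a, b in c)
    w22 = sum(b * b for _, b in c)
    det = w11 * w22 - w12 * w12
    l1 = (w22 * d[0] - w12 * d[1]) / det
    l2 = (-w12 * d[0] + w11 * d[1]) / det
    return [a * l1 + b * l2 for a, b in c]


def reach(target_seen_at, duration=2.0):
    """Simulate a reach; target_seen_at(k) is the target position (cm)
    available to the planner at sample k."""
    x = (0.0, 0.0)
    positions, velocities = [x[0]], [x[1]]
    for k in range(0, int(round(duration / T)), EXECUTED):
        # Replan from the current state, then execute the first 100 ms only.
        for uk in plan_inputs(x, target_seen_at(k))[:EXECUTED]:
            x = (x[0] + T * x[1] + 0.5 * T * T * uk, x[1] + T * uk)
            positions.append(x[0])
            velocities.append(x[1])
    return positions, velocities


unperturbed = reach(lambda k: 40.0)
perturbed = reach(lambda k: 40.0 if k < EXECUTED else 44.0)
```

Because every new plan starts from the current position and velocity, the position and velocity remain continuous at each replanning instant in both runs, including the one at which the target changes.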

7.3 The Value of Computational Models

It might be argued that insufficient neurobiological data are available at present to justify construction of a simulation of the information processing performed by the CNS during control of movement. On the other hand, if we construct a computational model which simulates observed motor behaviour and is consistent with known neuroanatomy and neurophysiology, then we argue that this is an advance, for then we know at least one solution as to how it can be done. Even if the brain turns out to operate differently, a computational model provides a means of suggesting and checking theories about the neurobiological processes involved (Gregory, 1978) and can therefore be regarded as an hypothesis generating experimentally testable predictions.

One encounters the limitations of the so-called 'black box' phenomenon when interpreting experimental data. No more can be learned about the structure and function of the system under study beyond the information contained in the experimental measurements. Even if the input-output relationship of a dynamic system is correctly identified, this tells us little about the internal structure of the system. We still do not know what is contained within the 'black box'. Indeed, simply by transforming the coordinates describing the internal state variables of the system it is possible to generate an infinite number of internal structures all consistent with the measured input-output relationship. Of course, if we could open the box and make measurements of internal variables we would improve our knowledge of the system; however, the problem of identifying the internal structure of the subsystems so created remains a 'black box' problem. In the case of the tracking data presented above we take an extreme position with respect to the internal neurophysiological mechanisms involved because we measure a gross overall relationship between input to the eye and output at the hand. Nevertheless, we can assert that whatever physiological mechanisms are involved they must be consistent with the observed motor behaviour. The data can be used therefore to eliminate many hypothetical proposals.

To implement a computer simulation of a human subject performing a tracking task it is necessary to realize a specific internal structure from the large number of possible internal structures that could emulate the input-output behaviour. All of the structural features we have built into the simulator represent hypotheses in AMT about the actual structure and function of information processing within the CNS. The behaviour of human subjects revealed by tracking experiments so far is consistent with these hypotheses.

We regard the skepticism of some about the value of computational models in furthering understanding of information processing within the brain as unfounded. The data presented in Figure 9 provide an excellent illustration of how computational models can facilitate the quest for understanding. Unlike in human subjects, the disturbance predictor in the simulator can be switched on and off. Consequently, it is possible to show that the characteristic bowl shape in the gain curve and the knee in the phase curve of the disturbance to motor response relationship measured in human subjects can be attributed entirely to the influence of disturbance prediction. The computer model is nothing more than an unambiguous statement of the hypotheses of AMT and, driven by experimental investigation, we expect its structure to evolve to incorporate adaptive modelling of multivariable, nonlinear processes involved in muscle control systems and biomechanics.
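The remark above that a change of state coordinates yields many internal structures with the same input-output relationship can be checked numerically. The sketch below is an illustration only: it uses a sampled double integrator as a stand-in system and an arbitrary invertible matrix P, both of which are assumptions, and the function names are hypothetical. It compares the outputs of the original and the transformed state-space models for the same input sequence.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(a):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def change_coordinates(A, B, C, P):
    """State-space model expressed in the new coordinates z = P x."""
    P_inv = mat_inv(P)
    A2 = mat_mul(mat_mul(P, A), P_inv)
    B2 = [P[i][0] * B[0] + P[i][1] * B[1] for i in range(2)]
    C2 = [C[0] * P_inv[0][j] + C[1] * P_inv[1][j] for j in range(2)]
    return A2, B2, C2


def simulate(A, B, C, inputs):
    """Output y(k) = C x(k) of x(k+1) = A x(k) + B u(k), starting from rest."""
    x = [0.0, 0.0]
    outputs = []
    for u in inputs:
        outputs.append(C[0] * x[0] + C[1] * x[1])
        x = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u for i in range(2)]
    return outputs


T = 0.05                       # sample interval in seconds
A = [[1.0, T], [0.0, 1.0]]     # double integrator, state = (position, velocity)
B = [0.5 * T * T, T]
C = [1.0, 0.0]                 # only position is observed
P = [[2.0, 1.0], [1.0, 1.0]]   # arbitrary invertible change of coordinates
u = [1.0, -0.5, 0.25, 0.0, 2.0, -1.0]

y_original = simulate(A, B, C, u)
y_transformed = simulate(*change_coordinates(A, B, C, P), u)
```

The two models have different internal matrices and different internal state trajectories, yet their outputs agree to within rounding error, so input-output measurements alone cannot distinguish between them.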

REFERENCES

Agarwal, G.C., Logsdon, J.B., Corcos, D.M., & Gottlieb, G.L. (1993). Speed-accuracy trade-off in human movements: An optimal control viewpoint. In K.M. Newell & D.M. Corcos (Eds.), Variability and motor control (pp. 117-155). Champaign: Human Kinetics Publishers.

Alafaci, M. (1992). Identification of the human operator functioning as an adaptive self-tuning regulator. Master of Engineering Science Thesis, School of Electrical Engineering, University of New South Wales, Australia.

Arbib, M.A. (1972). The metaphorical brain. An introduction to cybernetics as artificial intelligence and brain theory. New York: Wiley.

Arbib, M.A., & Hoff, B. (1994). Trends in neural modeling for reach to grasp. In K.M.B. Bennett & U. Castiello (Eds.), Insights into the reach to grasp movement (pp. 311-344). New York: Elsevier.

Astrom, K.J. (1970). Introduction to stochastic control theory. New York: Academic Press.

Astrom, K.J., & Wittenmark, B. (1973). On self-tuning regulators. Automatica, 9, 185-199.

Baron, S., Kleinman, D.L., & Levison, W.H. (1970). An optimal control model of human response. Part II: Prediction of human performance in a complex task. Automatica, 6, 371-383.

Bitmead, R.R., Gevers, M., & Wertz, V. (1990). Adaptive optimal control: The thinking man's GPC. Englewood Cliffs, NJ: Prentice Hall.

Box, G.E.P., & Jenkins, G.M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.

Clarke, D.W., & Gawthrop, P.J. (1979). Self-tuning control. Proceedings of the IEEE, 123, 633-640.

Clarke, D.W., Mohtadi, C., & Tuffs, P.S. (1987). Generalized predictive control: Parts I and II. Automatica, 23, 137-160.

Craik, K.J.W. (1947). Theory of the human operator in control systems. I. The operator as an engineering system. British Journal of Psychology, 38, 56-61.

Craik, K.J.W. (1948). Theory of the human operator in control systems. II. Man as an element in a control system. British Journal of Psychology, 38, 142-148.

Craik, K.J.W. (1966). The mechanisms of human action. In S.L. Sherwood (Ed.), The nature of psychology (A selection of papers, essays and other writings by the late Kenneth J.W. Craik). Cambridge: Cambridge University Press.

Elkind, J.I. (1964). A survey of the development of models for the human controller. In R.C. Langford & C.J. Munro (Eds.), Progress in astronautics and aeronautics (pp. 623-643). New York: Academic Press.

Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. Journal of Neuroscience, 5, 1688-1703.

Freund, H.J. (1983). Motor unit and muscle activity in voluntary motor control. Physiological Reviews, 63, 387-436.

Ghez, C. (1991). Muscles: Effectors of the motor system. In E.R. Kandel, J.H. Schwartz & T.M. Jessell (Eds.), Principles of neural science. Third Edition. New York: Elsevier.

Gottlieb, G.L., Corcos, D.M., & Agarwal, G.C. (1989). Strategies for the control of voluntary movements with one mechanical degree of freedom. Behavioral and Brain Sciences, 12, 189-250.

Gow, S.M. (1994). Computer analysis of hand-writing movements. Final Year Thesis, School of Electrical Engineering, University of New South Wales, Australia.

Gregory, R.L. (1978). Eye and brain. The psychology of seeing. Third edition. New York: McGraw-Hill.

Hammerton, M. (1981). Tracking. In D.H. Holding (Ed.), Human skills (pp. 177-201). Chichester: Wiley.

Hasan, Z. (1986). Optimized movement trajectories and joint stiffness in unperturbed, inertially loaded movements. Biological Cybernetics, 53, 373-382.

Haykin, S. (1986). Adaptive filter theory. Englewood Cliffs, NJ: Prentice Hall.

Ho, K.T. (1994). Studies of human eye-hand coordination optimal control of hand movement. Final Year Thesis, School of Electrical Engineering, University of New South Wales, Australia.

Hoff, B., & Arbib, M.A. (1993). Models of trajectory formation and temporal interaction of reach and grasp. Journal of Motor Behavior, 25, 175-192.

Hogan, N. (1984). An organizing principle for a class of voluntary movements. Journal of Neuroscience, 11, 2745-2754.

Hogan, N. (1988). Planning and execution of multijoint movements. Canadian Journal of Physiology and Pharmacology, 66, 508-517.

Isermann, R., Lachmann, K.H., & Matko, D. (1992). Adaptive control systems. New York: Prentice Hall.

Jagacinski, R.J. (1977). A qualitative look at feedback control theory as a style of describing behavior. Human Factors, 19, 331-347.

Kelley, C.R. (1968). Manual and automatic control. New York: John Wiley & Sons.

Kleinman, D.L., Baron, S., & Levison, W.H. (1970). An optimal control model of human response. Part I: Theory and validation. Automatica, 6, 357-369.

Lewis, F.L. (1992). Applied optimal control and estimation. Englewood Cliffs, NJ: Prentice Hall.

Licklider, J.C.R. (1960). Quasi-linear operator models in the study of manual tracking. In R.D. Luce (Ed.), Developments in mathematical psychology (pp. 169-279). Illinois: The Free Press of Glencoe.

Lui, B.K. (1993). Computer analysis of hand-writing movements. Final Year Thesis, School of Electrical Engineering, University of New South Wales, Australia.

McRuer, D. (1980). Human dynamics in man-machine systems. Automatica, 16, 237-253.

McRuer, D.T., & Krendel, E.S. (1974). Mathematical models of human pilot behavior (AGARDograph No. 188). Neuilly sur Seine, France: North Atlantic Treaty Organization, Advisory Group for Aerospace Research and Development.

Meyer, D.E., Abrams, R.A., Kornblum, S., Wright, C.E., & Smith, J.E.K. (1988). Optimality in human motor performance: Ideal control of rapid arm movements. Psychological Review, 95, 340-370.

Moray, N. (1981). Feedback and the control of skilled behaviour. In D.H. Holding (Ed.), Human skills (pp. 15-39). Chichester: Wiley.

Navas, F., & Stark, L. (1968). Sampling or intermittency in hand control system dynamics. Biophysical Journal, 8, 252-302.

Neilson, P.D. (1993). The problem of redundancy in movement control: The adaptive model theory approach. Psychological Research, 55, 99-106.

Neilson, P.D., Neilson, M.D., & O'Dwyer, N.J. (1985). Acquisition of motor skills in tracking tasks: Learning internal models. In D.G. Russell & B. Abernethy (Eds.), Motor memory and control (pp. 25-36). Dunedin: Human Performance Associates.

Neilson, P.D., Neilson, M.D., & O'Dwyer, N.J. (1988). Internal models and intermittency: A theoretical account of human tracking behavior. Biological Cybernetics, 58, 101-112.

Neilson, P.D., Neilson, M.D., & O'Dwyer, N.J. (1992). Adaptive model theory: Application to disorders of motor control. In J.J. Summers (Ed.), Approaches to the study of motor control and learning (pp. 495-548). Amsterdam: Elsevier.

Neilson, P.D., Neilson, M.D., & O'Dwyer, N.J. (1993). What limits high speed tracking performance? Human Movement Science, 12, 85-109.

Neilson, P.D., O'Dwyer, N.J., & Neilson, M.D. (1988). Stochastic prediction in pursuit tracking: An experimental test of adaptive model theory. Biological Cybernetics, 58, 113-122.

Nelson, W.L. (1983). Physical principles for economies of skilled movements. Biological Cybernetics, 46, 135-147.

Nise, N.S. (1992). Control systems engineering. New York: Benjamin/Cummings.

Pashler, H. (1992). Dual task interference and elementary mental mechanisms. In D.E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence and cognitive neuroscience (pp. 245-264). Cambridge, MA: MIT Press.

Pelisson, D., Prablanc, C., Goodale, M.A., & Jeannerod, M. (1986). Visual control of reaching movements without vision of the limb: II. Evidence for fast, unconscious processes correcting the trajectory of the hand to the final position of a double-step stimulus. Experimental Brain Research, 62, 303-311.

Pew, R.W. (1974). Human perceptual-motor performance. In B.H. Kantowitz (Ed.), Human information processing: Tutorials in performance and cognition (pp. 1-39). Hillsdale, NJ: Erlbaum.

Poulton, E.C. (1974). Tracking skill and manual control. New York: Academic Press.

Poulton, E.C. (1981). Human manual control. In V.B. Brooks (Ed.), Handbook of physiology, Sect. 1: The nervous system, vol II, part 2 (pp. 1337-1390). Bethesda: American Physiological Society.

Rouse, W.B. (1977). Special issue preface. Human Factors, 19, 313-314.

Seelen, W. von, Mallot, H.A., Krone, G., & Dinse, H. (1986). On information processing in the cat's visual cortex. In G. Palm & A. Aertsen (Eds.), Brain theory (pp. 49-79). Berlin: Springer-Verlag.

Shaw, G.L., & Silverman, D.J. (1988). Simulations of the trion model and the search for the code of higher cortical processing. In R.M.J. Cotterill (Ed.), Computer simulation in brain science (pp. 189-209). Cambridge: Cambridge University Press.

Shaw, G.L., Silverman, D.J., & Pearson, J.C. (1986). Trion model of cortical organization: Toward a theory of information processing and memory. In G. Palm & A. Aertsen (Eds.), Brain theory (pp. 177-192). Berlin: Springer-Verlag.

Sheridan, T.B., & Ferrell, W.R. (1974). Man-machine systems: Information, control, and decision models of human performance. Cambridge: MIT Press.

Stark, L. (1968). Neurological control systems. New York: Plenum Press.

The MathWorks, Inc. (1993). SIMULINK dynamic system simulation software. User's guide. Natick, Massachusetts: Author.

Tran, D.T. (1994). Control process of tracking a non-minimum phase system. Final Year Thesis, School of Electrical Engineering, University of New South Wales, Australia.

Veldhuyzen, W., & Stassen, H.G. (1976). The internal model. In T.B. Sheridan & G. Johannsen (Eds.), Monitoring behavior and supervisory control (pp. 157-171). New York: Plenum Press.

Vu, D.H. (1993). Measuring performance characteristics of the adaptive behaviour of the human operator in a tracking task. Final Year Thesis, School of Electrical Engineering, University of New South Wales, Australia.

Welford, A.T. (1980). Reaction times. London: Academic Press.

Wickens, C.D., & Gopher, D. (1977). Control theory measures of tracking as indices of attention allocation strategies. Human Factors, 19, 344-365.

Widrow, B., & Stearns, S.D. (1985). Adaptive signal processing. Englewood Cliffs, NJ: Prentice Hall.

Young, L.R. (1969). On adaptive manual control. Ergonomics, 12, 635-675.

Young, L.R., Green, D.M., Elkind, J.I., & Kelly, J.A. (1964). Adaptive dynamic response characteristics of the human operator in simple manual control. IEEE Transactions on Human Factors in Electronics, 5, 6-13.

Young, L.R., & Stark, L. (1965). Biological control systems - a critical review and evaluation. Developments in manual control. NASA CR-190, Washington, DC: National Aeronautics and Space Administration.