SKILL ORGANIZATION AND HUMAN LEARNING BEHAVIOR

Jacob L. Meiry
Assistant Professor
Department of Aeronautics and Astronautics
Massachusetts Institute of Technology
Cambridge, Massachusetts
ABSTRACT
A control policy is identified and a manual control task is learned when the human operator resolves his uncertainty; that is, when all but one response becomes improbable. Bayes' theorem is the proposed analog of man's algorithm for changing his preferences for alternatives. The theory postulates a single-channel information processing system for the operator where the selection of response alternatives and the revision of preferences are functions of the decision center.
A theory and model of motor skills learning postulate a statistical decision process for the human operator controlling a dynamic system. The selection of response alternatives and the revision of preferences for them are functions of a decision center in the human mind. This decision center is one component of a hypothetical single-channel information system. Also included in this information processing system are a sensor, which perceives the information upon which the decision center acts, and an effector, which executes the response decisions made by the center.
Our model is a digital computer program obtained from a translation of the theory into machine language. A set of parameters governs the learning behavior of the program and corresponds to human psycho-physiological characteristics.
The model of human learning behavior is a digital computer program which is obtained from a translation of the theory into machine language. Behavior of this model is compared with subject behavior in a controlled series of motor skill experiments. The extent of the model's characterization of the time-varying random nature of human learning is brought out by this comparison.
In this work we are concerned with the behavior of human operators learning to regulate the state of a dynamic process by actuating a two-position relay controller. The process dynamics are second order and the output of the process, x, is displayed to the operator. The objective is to keep x nulled, and operator performance is scored on the basis of the integrated absolute value of x over the duration of the trial.
INTRODUCTION

This paper presents a theory and model of motor skills learning in manual control tasks and gives human learning behavior a stochastic interpretation. In this context, motor skills learning is a statistical revision process by which the human operator identifies a policy for the manual control of a dynamic process. His selection among alternative responses, the theory postulates, is based on his preferences for the alternatives at the moment of choice, these preferences being expressed as probabilities.
THEORY AND MODEL OF HUMAN LEARNING BEHAVIOR

Our theory begins with the conceptualization of human response generation as a single channel information processing system. A sensor, a decision center, and an effector are the serially connected components of the system. Information related to the state of the dynamic process and displayed to the human operator is perceived by the sensor, quantized, and transmitted to the decision center. When the center is free to process new data, it accepts the most recent sample of state information and decides upon a response to this stimulus. Stored in the memory of the decision center are the operator's preferences for the alternatives expressed as probabilities. Selection of a response is based upon the operator's preferences and passed on to the effector for execution. The time interval between the acceptance of a sample and the completion of the selection as well as the interval between the decision and the execution are treated as statistically independent random variables. Revisions of preferences are based on the outcomes of previous responses and are described by an application of Bayesian statistics. The learning behavior of the system is thus characterized by the weighting rule which updates the operator's preferences for alternative responses. The following discussion is the detailed development of our theory of human learning behavior.

* This paper is based on research supported by the National Aeronautics and Space Administration, Grant NsG-577. It is excerpted from the paper "Stochastic Modelling of Human Learning Behavior" by A. E. Preyss and J. L. Meiry presented at the USC-NASA Conference on Manual Control, March 1967. The author acknowledges the important contributions by his former students, Major A. E. Preyss and T. T. Chien.
The Sensor

Man's perception of the displayed information, related to the state of the dynamic process being controlled, is subject to resolution limitations. Fig. 1 shows a finite grid overlaying the state space of the dynamic process. It is assumed that sensory stimuli are categorized by the coordinates (m, i) of the mesh in which the process' state actually lies. In effect we emphasize that the decision center, due to measurement errors, is certain of the current state of the dynamic process only to within the dimensions of a mesh.

The Decision Center

The decision center samples the state information transmitted by the sensor and uses this data to decide upon a response. For the task in question this involves selecting between two alternatives: to switch the control polarity or not to switch. During the learning phase the human operator does not know which of these two alternatives is correct at any moment, and before he can make a response he must weigh each alternative and select one on the basis of some expression of preference. We propose the use of a subjective probability as an expression of preference in control policy.

In order to define a control policy for the regulation of a second order dynamic process, the concept of the switch curve is employed. By definition the switch curve is the locus of all points in the state plane where a control polarity reversal should occur to force the process' trajectory through the null state. It is further assumed that stored in the memory of the decision center is a set of probabilities p(H_i(x_m)) for each of the M x N hypotheses

H_i(x_m): The switch curve passes through the mesh (x_m, v_i)

and the probabilities p(H_i(x_m)) are distributed so the conditions

$$\sum_{i=1}^{N} p(H_i(x_m)) = 1, \qquad m = 1, \ldots, M \tag{1}$$

are satisfied. This relation divides the state plane into M columns along which the subjective probabilities sum to unity.

What is the strategy of the control policy? During the learning phase, a state of uncertainty in the selection of alternatives is experienced by the human operator. For the development of the theory and model we postulate a unique representation of the decision process, whether or not to switch control polarity when the sampled state is (x_m, v_j). The selection rule depends upon a switching probability defined as

$$Q_j(x_m) = \sum_{i=j}^{N} p(H_i(x_m)) \tag{2}$$

that is, the probability that the switch curve at x_m passes through a mesh whose velocity coordinate lies in the closed interval [v_j, v_N]. The switching probability Q_j(x_m) expresses the belief that control polarity should be altered when the velocity is v_j or lower. Indeed, if Q_j(x_m) is high, one is quite confident that a switch should take place. Selection of alternatives, then, can be based on the value of the switching probability. However, to account for variations in human control behavior, it is assumed that the selection of an alternative is a Bernoulli trial with probability Q_j(x_m) of success, i.e., of switching.
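The selection rule described above can be illustrated with a short sketch. It assumes, as we read equation (2), that the switching probability for row j of a column is the sum of the stored probabilities from row j through row N, and that the choice to switch is a single Bernoulli draw with that probability. The function names, the list layout of a column, and the example priors are hypothetical and are not taken from the original program.

```python
import random


def switching_probability(column_probs, j):
    """Q_j(x_m): sum of p(H_i(x_m)) for i = j..N, with j counted from 1.

    column_probs is a hypothetical list holding the N probabilities of
    one column x_m; by condition (1) they are expected to sum to one.
    """
    if not 1 <= j <= len(column_probs):
        raise ValueError("j must index a row of the column")
    return sum(column_probs[j - 1:])


def select_switch(column_probs, j, rng=random):
    """Bernoulli trial: return True (switch) with probability Q_j(x_m)."""
    return rng.random() < switching_probability(column_probs, j)


# Hypothetical column with N = 4 velocity rows and uniform priors.
priors = [0.25, 0.25, 0.25, 0.25]
q_3 = switching_probability(priors, 3)            # rows 3 and 4
decision = select_switch(priors, 3, random.Random(0))
```

Because the draw is random, two operators (or two runs) with identical preferences may still respond differently, which is the variability the Bernoulli assumption is meant to capture.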
Consider the first trial of a motor skill experiment. On what does the subject base his first response? If no clues have been provided by briefing, any preference must reflect a personal bias stemming from past experience. Whatever his background, a subject's initial beliefs are expressed by the probabilities stored in the decision center's memory at the beginning of the experiment. A decision to respond for the first time is based on these prior probabilities. A change in these priors reflects the collective effect of added information about the dynamic process. To learn a psychomotor task, the human operator must revise his opinion about the location of the switch curve in the state space. Learning is then considered to be effected by a revision of the prior uncertainty based upon the weighting of available information. This revision of preference is expressed by a change in the prior probabilities p(H_i(x_m)) of the following form:

$$p(H_i/E) = w_i(E)\, p(H_i) \tag{3}$$

Here evidence E is the information about perceived events in the state history of the dynamic process being controlled. The subject's use of it in resolving uncertainty as to the location of the switching curve in the phase plane is expressed by a weighting of the prior probability p(H_i) with w_i(E). The revised posterior probability is expressed in equation 3 as a conditional probability p(H_i/E) to indicate the probability that the hypothesis H_i(x_m) is true given the evidence. Using Bayes' theorem as the revision rule, the weighting term w_i(E) can be identified as

$$w_i(E) = \frac{p(E/H_i)}{p(E)} \tag{4}$$

since

$$p(H_i/E) = \frac{p(E/H_i)\, p(H_i)}{p(E)} \tag{5}$$
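As a minimal sketch of the revision rule, assuming Bayes' theorem is applied exactly and independently within one column of the grid, the code below multiplies each prior by its likelihood and normalizes by the total probability of the evidence. The function name and the example likelihood values are hypothetical; the likelihoods stand in for whatever conditional probabilities p(E/H_i) the operator assigns.

```python
def revise(priors, likelihoods):
    """Return posteriors p(H_i/E) = p(E/H_i) * p(H_i) / p(E).

    p(E) is the sum over all hypotheses of p(E/H_j) * p(H_j), so the
    posteriors of one column again sum to one.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("one likelihood is needed per hypothesis")
    p_evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if p_evidence <= 0.0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return [l * p / p_evidence for l, p in zip(likelihoods, priors)]


# Hypothetical column: uniform priors and evidence that favors row 2.
priors = [0.25, 0.25, 0.25, 0.25]
likelihoods = [0.1, 0.6, 0.2, 0.1]

first = revise(priors, likelihoods)

# Each posterior becomes the prior for the next revision.
beliefs = first
for _ in range(5):
    beliefs = revise(beliefs, likelihoods)
```

Repeating the revision with consistent evidence concentrates the probability on one hypothesis, which is the sense in which uncertainty about the switch curve is resolved.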
In recent investigations (1) to determine whether or not man applies a revision rule approximating Bayes' theorem in his estimation of posterior probabilities, the dominant finding is that man is conservative: he is inefficient in resolving his uncertainty, and he is unable to make maximum use of the available evidence. The analogy which has been suggested and which we accept is but one of many possibilities. Note in Fig. 1 that for each column x_m there will be N hypotheses that the switch curve passes through the mesh (x_m, v_j). Therefore

$$p(E) = \sum_{j=1}^{N} p(E/H_j)\, p(H_j) \tag{6}$$

The weighting w_i(E) is then

$$w_i(E) = \frac{p(E/H_i)}{\sum_{j=1}^{N} p(E/H_j)\, p(H_j)}, \qquad i = 1, \ldots, N \tag{7}$$

Thus the formal evaluation of the weights w_i(E) can be accomplished once the priors p(H_i) are known and the N conditional probabilities p(E/H_i) have been determined. When a prior is revised, the resulting posterior probability becomes the prior for the next revision, and so on. If a revision is made, the evidence used and the way in which it is used depend on the task itself. Here we are dealing with a state regulator task in which the human operator actuates a relay to null the output of a second order dynamic process. At any instant during a trial, the signs of the state variables (x, v), the polarity of the control, and a decision to reverse control polarity present evidence, we believe, to the operator which he can use to resolve his uncertainty.

A decision to switch may result in an outcome illustrated by Fig. 2. Call the position at which the trajectory crosses the x-axis x_k. The theory postulates that the evidence, defined here as

E_jk(x_m): Switching in the mesh (x_m, v_j) results in the phase trajectory crossing the x-axis between x_k and x_k + Δx_k

is used by the operator to test the validity of the hypotheses H_i(x_m), i = 1, ..., N. It is quite evident that in order to revise his estimate of p(H_i), the human operator must assign a value to each of the N conditional probabilities p(E_jk/H_i). These probabilities summarize the operator's belief about obtaining various outcomes of a trial, assuming its parameters are known. Indeed, this model of the environment is subjective and most likely different for each individual. For expedience and uniqueness, however, we advance a single formal model to be used in conjunction with the model of learning behavior.

To derive an expression for the conditional probability p(E_jk/H_i), assume for the moment that the dynamic process is defined by the differential equation

$$\ddot{x} = u \tag{8}$$

If the control polarity is switched when in the mesh (x_m, v_j) and if the hypothesis H_i(x_m) that the switch curve passes through the mesh (x_m, v_i) is true, the possible phase trajectory will cross the x-axis at a point x_k somewhere in the interval bounded by

$$d_{\min} = x_m^{-} - x_m^{+} \left( \frac{v_j^{+}}{v_i^{-}} \right)^{2} \tag{9}$$

$$d_{\max} = x_m^{+} - x_m^{-} \left( \frac{v_j^{-}}{v_i^{+}} \right)^{2} \tag{10}$$

where the plus and minus signs respectively denote the largest and smallest absolute value of the subscripted state variable in the (x_m, v_i) mesh (see Fig. 2). According to our hypothesis, if a switch is made in the mesh (x_m, v_j), the operator, while controlling other dynamic processes of the second order, first degree type, believes that a crossover is more likely to occur within (d_min, d_max) than outside this interval. The theory expresses this preference by assuming a normal distribution f_{x_k} for the probability that x_k occurs between x and x + dx, given the hypothesis H_i(x_m), and defining

$$\bar{x} = (d_{\max} + d_{\min})/2 \tag{11}$$

for the mean of the distribution and

$$\sigma_x = (d_{\max} - d_{\min})/2 \tag{12}$$

for its standard deviation. Subsequently, conditional probabilities of interest are evaluated as:

$$p(E_{jk}/H_i) = \int_{x_k}^{x_k + \Delta x_k} f_{x_k}(x)\, dx \tag{13}$$

The set of conditional probabilities just defined represents an abstraction of momentum. In other words, these conditionals reflect the belief in stopping an object in shorter (or longer) distances than |x_k - x_m| if control polarity is switched at x_m while traveling with velocities lower (or higher) than v_j. With these conditions, the process of revision is completely defined, since we have provided an algorithm for weighting evidence obtained during manual control of a dynamic process.
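The likelihood model of this section can be sketched numerically. The code assumes, following the text, a normal density whose mean is the midpoint of (d_min, d_max) and whose standard deviation is half the interval width, integrated over the observed crossing interval from x_k to x_k + Δx_k. The function names and the numerical values are hypothetical, and the interval bounds are taken as given inputs rather than computed from the mesh.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def crossing_likelihood(x_k, dx_k, d_min, d_max):
    """Probability that the x-axis crossing lies in [x_k, x_k + dx_k].

    The density is normal with mean (d_max + d_min) / 2 and standard
    deviation (d_max - d_min) / 2, as the text assumes for a hypothesis
    H_i whose predicted crossing interval is (d_min, d_max).
    """
    if d_max <= d_min:
        raise ValueError("d_max must exceed d_min")
    mean = (d_max + d_min) / 2.0
    std = (d_max - d_min) / 2.0
    return normal_cdf(x_k + dx_k, mean, std) - normal_cdf(x_k, mean, std)


# Hypothetical interval (-1, 1): mean 0 and standard deviation 1.
inside = crossing_likelihood(-1.0, 2.0, -1.0, 1.0)   # crossing within the interval
near = crossing_likelihood(0.0, 0.5, -1.0, 1.0)      # crossing close to the mean
far = crossing_likelihood(3.0, 0.5, -1.0, 1.0)       # crossing well outside
```

With these choices roughly two thirds of the probability falls inside (d_min, d_max), so a crossing inside the interval supports the hypothesis more strongly than one outside it, without ruling the latter out.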
The Effector

We choose to represent the effector as a transmission delay, accounting for the time interval elapsed between the completion of a decision in the decision center and the execution of a response. This response time, RT, is considered to be a uniformly distributed random variable.
Fig. 1 Quantization of perceived state space (x, v).

Fig. 2 Weighting of evidence. Annotation: switching somewhere in this region, given that the switch curve of a double integral plant passes somewhere through this region, results in an overshoot somewhere in this interval.

Figure 3
A selection process and a revision process take place in the decision center. We have inferred that the times for revision and selection are random variables statistically independent of each other and of the response time. The sum of the selection time and the revision time is called here the decision time, DT. The probability density for each component time of DT is assumed uniform.

The Model

Our model is a digital computer program which is a machine language translation of the theory presented. Contained in this program are the selection process, the revision process, etc., which have been set forth by the theory as elements in the identification of an unknown control policy.

One detail important for the operation of the model is determining the order in which the human operator establishes priorities among several matters requiring his immediate attention. It is quite possible that the decision center can interrupt the revision process, store the unfinished computations, and attend to a response, for example. In the model we assume that revisions come first and selections second. No interruptions are permitted.

Finally we must provide some "numbers" for the parameters in the theoretical development. A specification of these parameters corresponds to a specification of the psycho-physiological characteristics of some human operator. As the behavior of the model is governed by the set of numbers chosen, it should be possible to match individual programs with individual human operators. We conducted a parametric study of the model on a digital computer and from the results we determined how these parameters influence the behavior of the model. Then we inferred sets of parameters in an attempt to provide a representative sample of human operator behavioral simulations. Next we performed a motor skill experiment and measured the response behavior of human operators. The two samples were then compared statistically to determine whether they came from the same parent population, i.e., whether they matched.

THE EXPERIMENT

Four groups of subjects at M.I.T.'s Man-Vehicle Laboratory were given the opportunity to learn a manual control task. They were briefed on the task and familiarized with the apparatus but were not allowed to practice prior to the first trial. The subject is required to null the initial misalignment between two line segments displayed on an oscilloscope by actuating a two-position switch (see Fig. 3). The left segment remains stationary and the displacement of the other relative to it, x, satisfies a second order differential equation. Each subject is given fifty five-second trials starting with the same initial conditions at ten-second intervals. Subject performance on each trial is measured by computing the integral of the absolute value of x over the five seconds, and this score is reported to the subject immediately after each trial. The dynamic processes controlled were of the form

$$\ddot{x} = u, \qquad \ddot{x} + \dot{x} = u, \qquad \ddot{x} + kx = u, \qquad \ddot{x} - x = u \tag{14}$$

to represent the complete range of distinct response characteristics of second order dynamics.

RESULTS AND DISCUSSION

Theoretical Results

In our parametric study of the model, we found generally that the performance of the program deteriorates whenever

a. the sensor perceives the state of the dynamic process with greater uncertainty (the mesh size is increased).

b. the decision center is initially more uncertain of the control policy (the priors are distributed uniformly).

c. the decision center requires more time to process information (DT is increased).

d. the effector requires more time to execute a response (RT is increased).

These findings are consistent with the behavior predicted partly by conventional control systems theory and expected of any information processing system. Furthermore, the program learns to control a second order dynamic process and the learning process is convergent in all cases. When learned, closed loop performance of the system is nearly time optimal.

Fig. 4 collectively provides a striking portrayal of learning a control policy: note the program's progress in resolving its uncertainty about the location of the switch curve for the $\ddot{x} = u$ simulation. The height of each surface above the reference plane represents the posterior probability p(H_i(x_m)) at the end of the trial.
Experimental Results

A complete picture of human operator learning behavior in the experiment can be developed from the measurements of the intervals between successive switches in control polarity, the interresponse time. From the interresponse time data, the state of the dynamic process (x, v) at each switch has been calculated. To provide a portrayal of learning for the human operator comparable to model performance illustrated in Fig. 4, we have taken the statistics on the state variables of fifty subjects and computed the ellipsoids of concentration for the first six responses of trials 1, 2, 10, 20, and 50 (Fig. 5). An ellipsoid of concentration bounds a two-dimensional region over which probability is distributed uniformly such that the first and second order moments of the uniform distribution are the same as those of the actual distribution (2). Loosely interpreted, these ellipsoids bound regions in the state space where "most" subjects make their responses. The shrinking and reorientation of the ellipses illustrate the operators' ensemble progress in identifying a control policy.
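For readers who wish to reproduce such a plot, the following sketch computes an ellipse of concentration for a set of (x, v) switching states. It relies on the standard result that, in two dimensions, a uniform distribution over the ellipse (z - m)' S^-1 (z - m) = 4 has mean m and covariance S, so each semi-axis is twice the square root of an eigenvalue of S. The function name, the use of the population covariance, and the sample points are assumptions made for illustration; they are not the subject data.

```python
import math


def ellipse_of_concentration(points):
    """Return (center, semi_axes, angle) for a list of (x, v) pairs.

    semi_axes are (major, minor); angle is the orientation of the major
    axis in radians, measured from the x-axis. The covariance is the
    population covariance (division by the number of points).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    mx = sum(x for x, _ in points) / n
    mv = sum(v for _, v in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    svv = sum((v - mv) ** 2 for _, v in points) / n
    sxv = sum((x - mx) * (v - mv) for x, v in points) / n

    # Eigenvalues of the symmetric 2 x 2 covariance matrix.
    half_trace = (sxx + svv) / 2.0
    root = math.hypot((sxx - svv) / 2.0, sxv)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)

    angle = 0.5 * math.atan2(2.0 * sxv, sxx - svv)
    semi_axes = (2.0 * math.sqrt(lam_major), 2.0 * math.sqrt(lam_minor))
    return (mx, mv), semi_axes, angle


# Hypothetical switching states, symmetric about the origin.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
center, semi_axes, angle = ellipse_of_concentration(points)
```

A shrinking pair of semi-axes across trials corresponds to the subjects' responses clustering more tightly, and a changing angle corresponds to the reorientation described in the text.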
Theory and Experiment Compared

The figures presented in the previous sections lead to a qualitative appreciation of the individual capabilities of both subjects and programs in learning control policies. Specifically, we have shown that the model and the human operator can learn to regulate second order dynamic processes in a near optimum manner. Secondly, we submit that a uniform control strategy, expressed as a Bayesian revision process, is applicable to all types of dynamic processes considered here. Finally, to decide whether the theory presented here is a credible explanation of human learning of manual control tasks, we compared the subject ensemble and a test sample of programs. Figs. 6, 7, and 8 summarize the learning performance of subjects and programs (for groups of eight) in terms of the ensemble average of integrated absolute error. It is easily observed that all learning curves converge to an asymptotic level after approximately the twentieth trial. To establish further the "similarity" between the learning behavior of human subjects and the model program, the Mann-Whitney "U" test was applied. The sample of subject interresponse time and the sample of model interresponse time were ranked in order, with the statistic U counting the number of times a member of the subject sample exceeded a member of the model sample. At the 1% level, our tests accept the hypothesis that the subject sample and the model sample are drawn from identical distributions for 95% of the cases we have examined in this manner. Note that testing the interresponse time samples is subjecting the best available measurement of response behavior to a powerful non-parametric statistical test.
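The counting statistic described above can be sketched as follows. The code counts the pairs in which a member of the first sample exceeds a member of the second, and then forms the usual large-sample normal approximation for U. Counting ties as one half, omitting a tie correction in the variance, and the example interresponse times are all assumptions of this illustration; the paper does not say how ties or small samples were handled.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Count pairs (a, b) with a > b; a tie contributes one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def u_z_score(u, n_a, n_b):
    """Standardize U with its mean and variance under identical distributions.

    Large-sample approximation without tie correction: the mean is
    n_a * n_b / 2 and the variance is n_a * n_b * (n_a + n_b + 1) / 12.
    """
    mean = n_a * n_b / 2.0
    std = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (u - mean) / std


# Hypothetical interresponse times in seconds.
subject_times = [0.8, 1.1, 1.4, 0.9]
model_times = [1.0, 1.2, 0.7, 1.3]

u = mann_whitney_u(subject_times, model_times)
z = u_z_score(u, len(subject_times), len(model_times))

# Two-sided test at the 1% level: reject only if |z| exceeds about 2.576.
same_distribution_accepted = abs(z) < 2.576
```

For samples as small as this example, exact tables of U would be more appropriate than the normal approximation; the approximation is shown only because it keeps the sketch short.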
Discussion of Results

We have statistically compared a sample of behavioral simulations obtained from a computer program with behavioral data from a series of psychomotor experiments in order to determine whether or not they come from the same parent population (are statistical images of each other). The results show that with few exceptions there is no statistical reason for rejecting this hypothesis. Although this favorable outcome offers us a quantitative basis for confidence in our proposed theory, we hesitate to conclude that this result is sufficient evidence for the theory. However, this comparison offers us a basis for accepting the theory's explanation of human learning behavior in the type of manual control tasks considered.
SUMMARY

We have developed a theory for the explanation of human learning behavior in manual control tasks, endeavoring to account for the inter-subject and intra-subject variability in psychomotor experiments. This variability has been attributed to the stochastic nature of human information processing, assumed to be a sequential operation involving three subsystems: the sensor, the decision center, and the effector; each of these has been treated as a probabilistic system with a stochastic description of its function. Our interpretation of Bayesian statistics for the characterization of decision making is perhaps our most important contribution (6). From the theory we have derived a model of human learning behavior, with a set of read-in parameters corresponding to human psycho-physiological characteristics. Subsequently we have been able to execute a number of programs which are shown to be statistical images of an ensemble of human operators.
REFERENCES

(1) Beach, L. R., "Accuracy and Consistency in the Revision of Subjective Probabilities," Human Factors in Electronics, Vol. HFE-7, No. 1, March 1966

(2) Cramer, H., Mathematical Methods of Statistics, Princeton University Press, 1946

(3) Mann, H. B., Whitney, D. R., "On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other," Annals of Mathematical Statistics, Vol. 18, 1947, pp. 50-60

(4) Preyss, A. E., "A Theory and Model of Human Learning Behavior in a Manual Control Task," Sc.D. Thesis, M.I.T., February 1967

(5) Chien, T. T., "Human Learning Behavior in Manual Control Tasks," M.S. Thesis, M.I.T., September 1967

(6) Preyss, A. E., and Meiry, J. L., "Stochastic Modelling of Human Learning Behavior," USC-NASA Conference on Manual Control, March 1967
Fig. 6 Unstable plant learning curves. Panels: subjects and programs. Axes: IAE versus number of trials, with the optimal score marked.

Fig. 7 Harmonic oscillator learning curves (w = 1 rad/sec). Panels: subjects and programs. Axes: IAE versus number of trials, with the optimal score marked.

Fig. 8 Harmonic oscillator learning curves (w ~ 1 rad/sec). Panels: subjects and programs. Axes: IAE versus number of trials, with the optimal score marked.