Proceedings of the 9th IFAC Symposium Advances in Control Education The International Federation of Automatic Control Nizhny Novgorod, Russia, June 19-21, 2012
COLLECTIVE LEARNING-BY-DOING

Dmitry A. Novikov

Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya str., 117997, Moscow, Russia
Models of iterative learning, implemented during the professional activity, are considered. The problem of optimal collective learning (task allocation between the team members) is formulated and solved. Copyright © 2011 IFAC

Keywords: Iterative Learning, Learning-by-doing, Team, Optimal Learning
978-3-902823-01-4/12/$20.00 © 2012 IFAC
10.3182/20120619-3-RU-2024.00002

1. INTRODUCTION

While acting jointly, the members of a collective (agents) consciously or unconsciously gain experience of both individual and collective activity; that is, they learn. Here and below we understand learning as "a process and result of gaining individual experience" (Hull, 1943; Novikov, 1998). This interpretation is a particular case of the more general notion of learning as a process of gaining knowledge, skills, and habits (Bush and Mosteller, 1962).

Let us consider consecutively a number of models describing the learning effects of the members of a collective during their work. Starting from a general problem statement and a quantitative description of the learning process, we consider a model of the individual learning process and then a model of the learning process for a collective of agents.

2. GENERAL PROBLEM STATEMENT AND MODEL OF LEARNING PROCESS

In a qualitative sense, a general problem of optimal learning can be stated as follows. Each of the agents constituting a collective is characterized by some initial level of skill (e.g., labour productivity). In the course of work, the labour productivity of an agent grows as the agent gains experience, improves practical habits, etc. (learning in the process of work, i.e. learning-by-doing, takes place). The rate of this growth (the so-called learning rate, formally defined below) is individual for each agent. We are interested in the optimal sharing of work over time between the agents. Indeed, an agent with a low initial professional skill that is heavily loaded with work from the very beginning can improve its skill rapidly and will be able to work more efficiently later. On the other hand, maybe it is rational to load the agents with higher initial professional skills? The answers to these questions are not evident. Moreover, we have to determine what is understood as the "optimal" sharing of work between the agents. Namely, the total expenses of the agents, the time required for the collective to perform a given volume of works, the result achieved in a fixed time, etc., can serve as the efficiency criterion.

Let us proceed to the formalization of the considered situation. Starting from the simplest model, we will gradually make it more complex. Let us limit ourselves to the case of iterative learning (Novikov, 1998), which corresponds to sufficiently routine kinds of activity. Iterative learning is the multiple repetition of actions, trials, attempts, etc., by a system to achieve a fixed goal in invariable environmental conditions. Iterative learning (IL) underlies the formation of human habits, the conditioned reflexes of animals, and the learning of many technical (materialized) and cybernetic (abstract logical) systems. It is the subject of research in pedagogical and engineering psychology, psychophysiology, pedagogics, control theory, and other sciences (see the survey in (Novikov, 1998)).

The invariability of both the environmental conditions and the goal allows describing IL quantitatively in the form of learning curves, which represent a criterion of the learning level as a function of time or of the number of iterations.

Numerous experimental data indicate that the most important general regularity of iterative learning is the slowed-down asymptotic behaviour of the learning curves. They are monotonic; the rate of change of the learning level criterion decreases in time; the curve itself asymptotically tends to some limit. In most cases, iterative learning curves can be approximated by exponential curves.

The following two aspects of learning are distinguished. The first aspect concerns the results: while learning, a system has to achieve a required result, implying the required quality of actions with admissible expenses of time, energy, etc. The second aspect concerns the process and includes the adaptation of the learning system to some kind of activity in the course of work (e.g., exercises), etc. Respectively, the characteristics of the efficiency of iterative learning and the adaptation characteristics are distinguished (Novikov, 1998). As a rule, the adaptation characteristics relate to physiological components of activity (fatigability, etc.). In this paper we consider only the characteristics of learning efficiency (the adaptation characteristics often have quite different dynamics).

As noted above, iterative learning is usually characterized by a slowed-down asymptotic learning curve, which can be approximated by the exponential curve

$$r(t) = r^\infty + (r^0 - r^\infty)\, e^{-\gamma t}, \quad t \ge 0, \tag{1}$$

or the discrete sequence¹

$$r^k = r^\infty + (r^0 - r^\infty)\, e^{-\gamma k}, \quad k = 1, 2, \ldots, \tag{2}$$

where $t$ is the learning time; $k$ is the number of iterations (trials, attempts) from the moment when the learning begins; $r(t)$ ($r^k$) is the type of an agent (the level of practical habit, professional skill) at time $t$ (at the $k$-th iteration); $r^0 > 0$ is the initial professional skill (the value of the type at the moment when the learning begins, i.e. at the first moment of time); $r^\infty$ is the "final" value, $r^\infty \ge r^0$; $\gamma$ is a nonnegative constant that defines the rate of change of the type and is called the learning rate.

¹ Here and below the superscript denotes the time interval number, whereas the subscript denotes an agent number. When a single agent is considered, the subscript is omitted.

3. SINGLE AGENT LEARNING

Let us first consider the model of learning for a single agent. Denote by $y^k \ge 0$ the volume of work produced by the agent in the $k$-th period of time. If the agent's type (level of skill) $r^k \in [0; 1]$ is interpreted as the share of successful actions of the agent, then the agent achieves the result $z^k = r^k y^k$ by performing the volume of work $y^k$ in period $k$.

Then the agent's result, i.e. the total volume of works successfully performed by the agent over $k$ periods of time, equals

$$Z^k = \sum_{l=1}^{k} r^l y^l. \tag{3}$$

On the other hand, the agent performs a larger volume of (successful and unsuccessful) works:

$$Y^k = \sum_{l=1}^{k} y^l. \tag{4}$$

This volume of works can be conditionally considered to be the "experience" gained by the agent (Novikov, 2008), i.e. the "effective internal time" of the agent (the time that has passed since the start of learning and has been spent on the learning process). Substituting (4) into exponent (1) in place of the time and taking the final value $r^\infty = 1$ (the type is a share of successful actions), we obtain $r^k = 1 - (1 - r^0) \exp(-\gamma Y^{k-1})$, $k = 2, 3, \ldots$. Denote $y^{1,\tau} = (y^1, y^2, \ldots, y^\tau)$, $\tau = 1, 2, \ldots$, and assume $y^0 = 0$. Combining (1)-(4), we obtain the following expressions for the volumes of works performed successfully and the types of the agent, respectively:

$$Z^k = \sum_{l=1}^{k} y^l \left\{ 1 - (1 - r^0) \exp\left( -\gamma \sum_{m=1}^{l-1} y^m \right) \right\}, \tag{5}$$

$$r^k = 1 - (1 - r^0) \exp\left( -\gamma \sum_{l=1}^{k-1} y^l \right), \quad k = 2, 3, \ldots. \tag{6}$$
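As a hedged illustration of expressions (4)-(6), the following Python sketch computes the agent's types and cumulative results for a given learning trajectory. The function name `simulate_agent`, the variable names, and all numerical values are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def simulate_agent(y, r0, gamma):
    """Return (types, results) for a learning trajectory y = [y^1, ..., y^T].

    types[k-1]   is r^k from (6): 1 - (1 - r0) * exp(-gamma * Y^{k-1});
    results[k-1] is Z^k from (5): the sum of r^l * y^l over l <= k.
    """
    types, results = [], []
    experience = 0.0  # Y^{k-1}: volume of work done before period k, see (4)
    total = 0.0       # Z^k: cumulative successfully performed work, see (3)
    for volume in y:
        r_k = 1.0 - (1.0 - r0) * math.exp(-gamma * experience)
        total += r_k * volume
        types.append(r_k)
        results.append(total)
        experience += volume
    return types, results


# Hypothetical values, for illustration only.
r0, gamma = 0.2, 0.5
uniform = [1.0, 1.0, 1.0, 1.0]
late = [0.5, 0.5, 1.0, 2.0]  # same total volume, distributed differently

types_u, results_u = simulate_agent(uniform, r0, gamma)
types_l, results_l = simulate_agent(late, r0, gamma)
```

In this sketch both trajectories have the same total volume $Y^T = 4$, so by (6) the type reached once all the work is done, $1 - (1 - r^0) \exp(-\gamma Y^T)$, is the same for both, whereas the cumulative results $Z^T$ from (5) generally differ.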
Note that with a fixed total volume of works the agent's type is defined uniquely by expression (6) and does not depend on how the volumes of works are distributed between the time periods. Therefore, the problem of maximizing the agent's type with a fixed total volume of works makes no sense within the framework of the considered model.

Three "macro-parameters" are present in the model, namely, the total volume of works $Y$, the number of periods $T$, and the result $Z$. The desired variable is the "learning trajectory" $y^{1,T}$.

The optimal learning problems can consist in extremization of one of the variables with the other variables fixed². So, the following problem statements are reasonable to consider.

1. Fix the total volume of works $Y$ that can be performed by the agent and the result $Z$ that is required to be achieved. We are interested in finding a trajectory minimizing the time of achieving the result:

$$T \to \min, \quad Y^T \le Y, \quad Z^T \ge Z. \tag{7}$$

Problem (7) can be conditionally called the minimum time problem.

2. Fix the total volume of works $Y$ that is required to be performed by the agent and the learning time $T$. We are interested in finding a trajectory maximizing the result $Z$:

$$Z^T \to \max, \quad Y^T \le Y. \tag{8}$$

Problem (8) can be conditionally called the problem of optimal agent learning. It is this problem that is closest to pedagogical problems, where, with a fixed time and a fixed volume of educational material, the material has to be distributed over time so as to maximize the "volume of learnt material", i.e. the "quality of learning". The didactic aspects, i.e. the content of the material, are not significant here because of the routine character of the learning subject.

Since expression (5) is monotonic in the total volume of the agent's works and in the duration of the learning period, the constraint on the total volume in (8) holds as an equality, and problem (8) can be written as

$$\sum_{l=1}^{T} y^l \exp\left( -\gamma \sum_{m=1}^{l-1} y^m \right) \to \min_{\{ y^{1,T} \,\mid\, \sum_{\tau=1}^{T} y^\tau = Y \}}. \tag{9}$$

Expression (9) no longer includes the initial qualification of the agent $r^0$, i.e. the following assertion is true.

Assertion 1. A solution to the optimal learning problem does not depend on the initial qualification of the agent.

This conclusion is of interest for learning methodology: from the point of view of the results of individual independent agents, only the individual differences between their learning rates are essential.

3. Fix the learning time $T$ and the result $Z$ that is required to be achieved. We are interested in finding a learning trajectory minimizing the total volume of works:

$$Y^T \to \min, \quad Z^T \ge Z. \tag{10}$$

Each of problems (7)-(10) can be reduced to a dynamic programming problem (or to a set of such problems).

4. LEARNING OF MULTIPLE AGENTS

So far we have considered a single agent. Let us generalize the derived results to the case of several agents working simultaneously. First, we consider the situation when the agents are completely independent (the results and type of each agent do not depend on the results and types of the other agents). Then, we analyze the problem of learning of dependent agents.

Let us consider a collective, i.e. a set $N = \{1, 2, \ldots, n\}$ of $n$ agents. By analogy with expressions (5) and (6), we obtain the following formulas for the volumes of works performed successfully and the types of the agents, respectively:

$$Z_i^k = \sum_{l=1}^{k} y_i^l \left\{ 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{m=1}^{l-1} y_i^m \right) \right\}, \tag{11}$$

$$r_i^k = 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{l=1}^{k-1} y_i^l \right), \quad k = 2, 3, \ldots, \quad i \in N. \tag{12}$$
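Expressions (11) and (12) for independent agents can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function name `independent_agents`, the data layout (one list of per-period volumes per agent), and the numerical values are assumptions made for the example.

```python
import math


def independent_agents(y, r0, gamma):
    """Types r_i^k per (12) and results Z_i^k per (11) for independent agents.

    y[i][l] is the volume of work of agent i in period l + 1;
    r0[i] and gamma[i] are its initial skill and learning rate.
    Each agent learns from its own work only.
    """
    types, results = [], []
    for y_i, r0_i, gamma_i in zip(y, r0, gamma):
        experience = 0.0  # own work done before the current period
        total = 0.0       # Z_i^k accumulated so far
        types_i, results_i = [], []
        for volume in y_i:
            r = 1.0 - (1.0 - r0_i) * math.exp(-gamma_i * experience)
            total += r * volume
            types_i.append(r)
            results_i.append(total)
            experience += volume
        types.append(types_i)
        results.append(results_i)
    return types, results


# Hypothetical data: two agents, three periods.
y = [[1.0, 1.0, 1.0],
     [0.0, 2.0, 1.0]]
r0 = [0.1, 0.3]
gamma = [0.75, 0.5]

types, results = independent_agents(y, r0, gamma)
```

In this example the second agent performs no work in the first period, so by (12) its type in the second period still equals its initial skill.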
If the result of the collective is the sum of the results of the agents constituting this collective, i.e.

$$Z^k = \sum_{i=1}^{n} Z_i^k, \quad k = 1, 2, \ldots, \tag{13}$$

then the problem of optimal learning of the collective (compare with (8)) is given by

$$Z^T \to \max_{\{ y_i^{1,T} \,\mid\, \sum_{\tau=1}^{T} \sum_{i \in N} y_i^\tau \le Y \}}, \tag{14}$$

that is,

$$\sum_{i=1}^{n} \sum_{l=1}^{T} y_i^l \left\{ 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{m=1}^{l-1} y_i^m \right) \right\} \to \max_{\{ y_i^{1,T} \,\mid\, \sum_{\tau=1}^{T} \sum_{i \in N} y_i^\tau \le Y \}}. \tag{15}$$

² In a more general case, one would desire to extremize some functional (e.g., learning expenses, learning quality, etc.) taking into account some additional constraints, varying several variables simultaneously, etc. All these problems form a prospective subject of future research.

Problem (15) can be solved by the method of dynamic programming. It can easily be seen that an optimal solution of problem (15) generally depends also on the individual learning rates of the agents $\{\gamma_i\}$ and their initial skills $\{r_i^0\}$.

Assertion 2. If the learning rates of the agents are equal, then the optimal distribution of works is for the whole volume of works to be performed by an agent with the maximal initial skill. If the initial skills of the agents are identical, then the optimal distribution of works is for the whole volume of works to be performed by an agent with the maximal learning rate.

So, in the case when all agents have equal learning rates, the solution of the optimal learning problem appears to be "degenerate": only one agent works and learns, while the rest neither work nor learn. On the one hand, such a collective can hardly be considered to be of full value. On the other hand, one can admit that such situations are not rare in real life.

Let us consider what happens when the agents differ both in initial skill and in learning rate. Formally, the structure of the solution of problem (15), in which the whole volume of works is performed by the "best" agent (from the point of view of the combination of initial skill and learning rate), is caused by the great number of variables under a single constraint. In substance, the problem may have other constraints besides the constraint on the total volume of works performed by the collective members. The constraint on the maximal volume of works that can be performed by each agent in one iteration (period of time) seems to be the most natural.

5. COLLECTIVE LEARNING

So far, in the discussion of the agents' learning-by-doing, we assumed that each agent learns only from "its own experience". Nevertheless, experience exchange takes place in collectives, and the agents can also gain experience by observing the activities of other agents (their successes and difficulties). To take account of this effect, let us describe the "experience" gained by an agent not only as the sum of its own actions, but as this sum supplemented with the weighted sum of the actions of the other agents. As a result, we have the following expressions for the respective volumes of works performed successfully and the agents' types:

$$Z_i^k = \sum_{l=1}^{k} y_i^l \left\{ 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{m=1}^{l-1} \sum_{j=1}^{n} \alpha_{ij}\, y_j^m \right) \right\}, \tag{16}$$

$$r_i^k = 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{l=1}^{k-1} \sum_{j=1}^{n} \alpha_{ij}\, y_j^l \right), \quad k = 2, 3, \ldots, \quad i \in N, \tag{17}$$

where the constants $\{\alpha_{ij} \ge 0\}$ can be interpreted as the efficiencies of experience transfer from the $j$-th agent to the $i$-th one, $i, j \in N$.

Then the optimal learning problem becomes

$$\sum_{i=1}^{n} \sum_{l=1}^{T} y_i^l \left\{ 1 - (1 - r_i^0) \exp\left( -\gamma_i \sum_{m=1}^{l-1} \sum_{j=1}^{n} \alpha_{ij}\, y_j^m \right) \right\} \to \max_{\{ y_i^{1,T} \,\mid\, \sum_{\tau=1}^{T} \sum_{i \in N} y_i^\tau \le Y \}}. \tag{18}$$

6. EXAMPLE

Example 1. Consider problem (18) for two agents with $T = 11$, $r_1^0 = 0.1$, $r_2^0 = 0.3$, $\gamma_1 = \gamma_2 = 0.75$, $Y = 10$ (both agents have equal learning rates; the second agent has a higher initial skill) and

$$\|\alpha_{ij}\| = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.$$

Qualitatively speaking, the first agent learns from its own experience and from the experience of the second agent (from the latter even more efficiently than from its own). The second agent learns only from its own experience. The dynamics of the agents' types is shown in Fig. 1. The dynamics of the optimal volumes of works is presented in Fig. 2.

Fig. 1. Dynamics of agents' types in Example 1 (horizontal axis: periods 1 to 11; vertical axis: values from 0.0 to 1.0)

Fig. 2. Dynamics of optimal volumes of works in Example 1 (horizontal axis: periods 1 to 11; vertical axis: values from 0.0 to 4.0)

In the first six periods the first agent does not perform the work itself but "observes" the actions of the second agent. Meanwhile, the professional skill of the first agent grows more rapidly than that of the second agent. Starting from the seventh period, it appears optimal for the whole volume of works to be performed by the first agent instead of the second one.

This example clearly illustrates how a lack of initial skill can be successfully compensated by efficient learning from others' experience. Another (close) interpretation is also possible. The second agent can be considered as a teacher (tutor, instructor) that has a higher initial skill and teaches the first agent. At a certain moment the learner "outruns" the teacher and can work independently.

7. CONCLUSION

In this paper we have considered models of learning-by-doing. Under the assumption that the volume of works already performed by an agent conditionally reflects the "experience" gained by that agent, we have stated and solved the optimal learning problem of choosing the volumes of works performed by the agents in certain time intervals. The analysis allows making the following conclusions:

- with a fixed total volume of works of one agent, the characteristics of the learning efficiency do not depend on how the volumes of works are distributed over the time periods;
- the solution of the optimal iterative learning problem for one agent does not depend on its initial skill;
- the higher the agent's learning rate, the larger the volume of works that must be done by the agent in the last periods (and, respectively, the smaller the volume of works that needs to be allotted to the first periods to improve its initial skill);
- the optimal learning strategy is to increase the volume of the agent's works with time; the higher the learning rate, the more "convex" the optimal learning trajectory;
- if there are no constraints on individual volumes of works, then the whole volume of works of the collective is done by the "best" agent (from the point of view of the combination of initial skill and learning rate);
- a lack of initial skill of an agent can be successfully compensated by efficient learning from both its own experience and that of the others.

In conclusion, let us note that there exist learning curves more complex than exponential or logistic ones: the so-called sequential logistic curves, which correspond to the development of various adjacent or more complex kinds of activity, generalized logistic curves, and others. A detailed discussion of these curves is beyond the scope of this paper. However, if the learning laws of the collective members are known (even if these laws are rather complex), then the problem of optimal distribution of the volumes of works can be stated similarly to the above. The search for a general (preferably analytic) solution of this problem is the subject of future research.

REFERENCES

Bush, R., and Mosteller, F. (1962) Stochastic models of learning. Moscow: Fizmatlit.
Hull, C.L. (1943) Principles of behavior and introduction to behavior theory. New York: Appleton Century Company.
Novikov, D.A. (1998) Regularities of iterative learning. Moscow: IPU RAN. (in Russian)
Novikov, D.A. (2008) Team building under Pareto uncertainty. Proceedings of the 17th World Congress The International Federation of Automatic Control. Seoul, Korea. P. 1633 – 1638.