AN AUTOMATON MODEL OF A HIERARCHICAL LEARNING SYSTEM

Copyright © IFAC Control Science and Technology (8th Triennial World Congress), Kyoto, Japan, 1981

M. A. L. Thathachar and K. R. Ramakrishnan
Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India

Abstract. A stochastic automaton operating in a random environment has been extensively studied as a model of a simple learning system. Real-life observations seem to support the hypothesis that learning proceeds in a hierarchical manner. This is manifest, for example, in the way one learns a foreign language, motor car driving, typing, etc. An attempt has been made in this paper to model a simple hierarchical learning system. The model is built of stochastic automata arranged in a hierarchy, operating in a random environment. An algorithm for updating the action probabilities of the automata, taking into account environmental reactions at all the levels, is proposed. It is shown that the hierarchical system of automata operating according to this algorithm is ε-optimal.

Keywords. Learning systems; hierarchical systems; modelling; stochastic automata; convergence.

INTRODUCTION

In their structural and functional organisation, living systems follow a hierarchical plan. Many complex decision-making situations in everyday life are approached in a hierarchical fashion (Mesarovic and Macko, 1969). Learning, one of the most important behavioural aspects of living beings, offers an instance of a situation where a hierarchical approach is adopted. Learning a foreign language, for example, can be regarded as proceeding at several levels. The first level corresponds to the letters of the alphabet, the second level to words, which are meaningful combinations of letters, and the next level to sentences, which are sequences of words formed according to specified grammatical rules. In learning to recognise meaningful sentences among others, one has to work, consciously or not, at all the different levels mentioned above. It also appears reasonable to assume that when operating at each level there is a reaction of being right or wrong. This reaction could be inbuilt, based on a partial knowledge of the language being learnt, or could be due to an external source such as a teacher.

Based on such observations in real life, an attempt is made in this paper to model a simple hierarchical learning system. Early studies of learning models were made in mathematical psychology (Bush and Mosteller, 1958); however, hierarchical structures have not been considered in them. The present model is built of learning automata operating in a random environment (Fu, 1970). Such automata have now been extensively studied, and methods of design of reinforcement schemes which lead to specific types of behaviour of the automata are available (Lakshmivarahan and Thathachar, 1973). Certain convergence properties of the automata are also known (Lakshmivarahan and Thathachar, 1976). A hierarchical system of learning automata was introduced (Thathachar and Ramakrishnan, 1980) with the object of speeding up the slow convergence of a single automaton having a large number of actions. In that earlier study, only the automata at the lowest level were capable of eliciting reactions from the environment. In the present set-up, automata at each level in the hierarchy elicit reactions from the environment. The main contribution of this paper is a learning algorithm with a built-in convergence property, so that the hierarchical system is assured to possess ε-optimality.

LEARNING AUTOMATON

A stochastic automaton is defined as a quadruple <α, p, T, R> and the environment in which it operates is defined by a triple <α, C, R>, where the quantities inside the brackets are described below. A learning automaton is a stochastic automaton connected in a feedback loop with a random environment (Fig. 1).

r = total number of actions of the automaton.

α = {α_1, α_2, ..., α_r}, the set of actions of the automaton. The action chosen by the automaton forms the output of the automaton, and also the input of the environment.

p(t) = [p_1(t), p_2(t), ..., p_r(t)], the action probability vector of the automaton at instant t. If α(t) represents the action chosen by the automaton at time t (t = 0, 1, 2, ...), then

    p_i(t) = Probability[α(t) = α_i]        (1a)

    Σ_{i=1}^{r} p_i(t) = 1,  for all t        (1b)

T = updating operator, so that

    p(t+1) = T[p(t), α(t), β(t)]        (2)

where β(t) ∈ R, the set of reactions from the environment (the same as the input of the automaton). The operator T is always given as an algorithm called the updating algorithm (also known as the learning algorithm). The sequence p(t) generated by the updating algorithm is random, and so the average penalty M(t) defined below is also a random variable.

C = (c_1, c_2, ..., c_r), the penalty probability vector,        (3)

where

    c_i = Probability[β(t) = 1 | α(t) = α_i]        (4)

The environment considered in this paper reacts in a binary manner, i.e., R = {0, 1}, '0' corresponding to 'Reward' and '1' corresponding to 'Penalty'. In order to monitor the performance of the learning automaton, one of the measures of performance which has been used almost exclusively in the literature is the 'average penalty' that the automaton receives from the environment. The average penalty M(t) is given by

    M(t) = E[β(t) | p(t)]        (5)
         = Σ_{i=1}^{r} p_i(t) c_i        (6)
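For illustration (not from the paper), the average penalty in (5)-(6) is a one-line computation once p(t) and the c_i are known; the numeric values below are arbitrary:

```python
# Average penalty M(t) = sum_i p_i(t) * c_i,  Eqs. (5)-(6).
# The numeric values are arbitrary illustrations, not from the paper.
p = [0.5, 0.3, 0.2]   # action probability vector p(t)
c = [0.2, 0.6, 0.9]   # penalty probabilities c_i

M = sum(pi * ci for pi, ci in zip(p, c))
print(M)   # approximately 0.46; min_i c_i = 0.2 is the target of an eps-optimal scheme
```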

A learning automaton is said to be

1) ε-optimal (Narendra and Thathachar, 1974) if

    lim_{t→∞} E[M(t)] < min_i c_i + ε        (7)

for an arbitrary ε > 0, by a proper choice of the learning algorithm, and

2) Absolutely expedient (Narendra and Thathachar, 1974) if

    E[M(t+1) | p(t)] < M(t)        (8)

for all t, all p_k(t) ∈ (0, 1) (k = 1, 2, ..., r) and all c_i, excepting trivial environments where all the c_i are equal.

Lakshmivarahan and Thathachar (1973) established necessary and sufficient conditions for absolute expediency. When the stochastic automaton has chosen action α_i at the instant t, the general form of an absolutely expedient algorithm is given below.

If β(t) = 0 ('reward'):

    p_i(t+1) = p_i(t) + λ(p)(1 - p_i(t))
    p_j(t+1) = p_j(t) - λ(p) p_j(t),   j ≠ i
                                                        (9)
If β(t) = 1 ('penalty'):

    p_i(t+1) = p_i(t) - μ(p)(1 - p_i(t))
    p_j(t+1) = p_j(t) + μ(p) p_j(t),   j ≠ i

where λ(p) = λ[p(t)] and μ(p) = μ[p(t)] are continuous functions of the vector p(t) satisfying

    0 < λ(p) < 1        (10a)

    0 ≤ μ(p) ≤ min_j (p_j / (1 - p_j))        (10b)

Denoting the increment in the conditional expectation of p_i(t) by Δp_i(t), i.e.,

    Δp_i(t) = E[p_i(t+1) | p(t)] - p_i(t)        (11a)

when the p_i(t)'s are updated as defined in (9), it can be shown that

    Δp_i(t) = p_i(t)[λ(p) + μ(p)] Σ_{j≠i} p_j(t)(c_j - c_i)        (11b)

In terms of the reward probabilities,

    Δp_i(t) = p_i(t)[λ(p) + μ(p)] Σ_{j=1}^{r} p_j(t)(d_i - d_j)        (11c)

where d_i = 1 - c_i, d_j = 1 - c_j. Note that if d_i = max_j d_j, then

    Δp_i(t) > 0        (12)

For μ(p) ≡ 0, the updating (9) is called the Reward-Inaction type, and

    Δp_i(t) = p_i(t) λ(p) Σ_{j=1}^{r} p_j(t)(d_i - d_j)        (14)

With the p_i(t) defined as in (9), it can be shown that the increment ΔM(t) in the conditional expectation of M(t) is

    ΔM(t) = E[M(t+1) | p(t)] - M(t) = Σ_{i=1}^{r} c_i Δp_i(t) = -[λ(p) + μ(p)] p'(t) D p(t)        (15)

where the ij-th element of the matrix D is given by (d_i - d_j)^2 / 2. D is a real, symmetric matrix with all the diagonal elements zero and the non-diagonal elements positive. As the vector p(t) is always in the positive orthant, so long as p_i(t) ∈ (0, 1) (i = 1, 2, ..., r), p'(t) D p(t) > 0 and hence

    ΔM(t) < 0        (16)

and this implies absolute expediency, from (8). The class of absolutely expedient automaton updating schemes has recently been expanded by Aso and Kimura (1979). The implication of ε-optimality by absolute expediency in all stationary environments was shown by Lakshmivarahan and Thathachar (1976). According to Baba and Sawaragi (1975), the absolutely expedient updating scheme ensures ε-optimality even under non-stationary random environments, provided the penalty probabilities (which are now time-varying) satisfy the following condition: for some action α_i and some δ > 0,

    sup_t {c_i(t)} ≤ inf_t {c_j(t)} - δ,   j ≠ i        (17)

so that α_i is the optimal action at every t.
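As a concrete illustration (not part of the paper), the reward-inaction special case of scheme (9), with μ(p) ≡ 0 and λ(p) taken as a small constant, can be sketched in Python; the function name and the numeric values are illustrative assumptions:

```python
import random

def lri_step(p, lam, chosen, beta):
    """One linear reward-inaction update: scheme (9) with mu(p) = 0
    and lambda(p) taken as the constant lam (an illustrative choice).

    p      : action probability vector, entries summing to 1
    chosen : index i of the action taken at time t
    beta   : environment reaction, 0 = reward, 1 = penalty
    """
    if beta == 1:                                  # penalty: inaction
        return list(p)
    q = [pj * (1.0 - lam) for pj in p]             # p_j - lam*p_j, j != i
    q[chosen] = p[chosen] + lam * (1.0 - p[chosen])
    return q

# Convergence illustration: action 0 has the lowest penalty probability.
random.seed(0)
c = [0.2, 0.7, 0.6]                # penalty probabilities (arbitrary)
p = [1.0 / 3] * 3
for t in range(5000):
    i = random.choices(range(3), weights=p)[0]
    beta = 1 if random.random() < c[i] else 0
    p = lri_step(p, 0.01, i, beta)
# p[0] is now close to 1 with high probability, in line with eps-optimality
```

The update preserves the probability simplex exactly, since the chosen action gains precisely what the others lose.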

Fig. 1. Learning automaton: a stochastic automaton with action probability vector p = [p_1 p_2 ... p_r] in a feedback loop with the environment; the reaction β(t) ∈ R = {0, 1} forms the input of the automaton.

HIERARCHICAL AUTOMATON LEARNING SYSTEMS

For easier visualisation of the hierarchical set-up, a 3-level hierarchical automaton model with 2 actions per automaton is depicted in Fig. 2. There is a single automaton at the top, followed by 2 automata in the second level and 4 in the third level. All the automata in the hierarchy interact with the environment. The following notation is used to describe a general system of N levels.

Notation. The following notation refers to a hierarchical set-up where each automaton has r actions.

p(t) = set of all action probabilities in the hierarchy
N = number of levels
A = first-level automaton
β_s(t) = reaction of the environment at the s-th level, 0 or 1
{α_1, α_2, ..., α_r} = set of all actions of A
{d_1, d_2, ..., d_r} = set of reward probabilities for the actions α_1, α_2, ..., α_r
{p_1(t), p_2(t), ..., p_r(t)} = set of action probabilities of A at instant t, t = 0, 1, 2, ...
A_{i1 i2 ... in-1} = automaton at the n-th level connected to action α_{i1 i2 ... in-1} of the (n-1)-th level; i1, i2, ..., in-1 can take the values 1, 2, ..., r
α_{i1 i2 ... in} = action of automaton A_{i1 i2 ... in-1}
d_{i1 i2 ... in} = reward probability for action α_{i1 i2 ... in}
p_{i1 i2 ... in}(t) = action probability of α_{i1 i2 ... in} at instant t

It is to be noted that in the hierarchical set-up each action at the last level has a unique path to be traversed to reach it from the first-level automaton. The product of the probabilities of the actions lying on this path is called the path probability and is denoted by

    π_{i1 i2 ... iN}(t) = p_{i1}(t) p_{i1 i2}(t) ... p_{i1 ... iN}(t)        (18a)

with

    Σ_{i1, i2, ..., iN} π_{i1 i2 ... iN}(t) = 1        (18b)

In Fig. 2, for example,

    π_{212}(t) = p_2(t) p_{21}(t) p_{212}(t)
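The path probability (18a) can be computed directly from the per-automaton probabilities; a minimal sketch in which the dict layout and all numeric values are my own illustrations of the Fig. 2 hierarchy:

```python
# Path probability (18a): the product of the action probabilities lying on
# one root-to-leaf path.  The dict below encodes a 3-level, 2-action
# hierarchy as in Fig. 2, with arbitrary illustrative values; keys are
# index tuples (i1,), (i1, i2), (i1, i2, i3).
p = {
    (1,): 0.4, (2,): 0.6,                    # first-level automaton A
    (1, 1): 0.5, (1, 2): 0.5,                # automaton A1
    (2, 1): 0.7, (2, 2): 0.3,                # automaton A2
    (1, 1, 1): 0.5, (1, 1, 2): 0.5,          # automaton A11
    (1, 2, 1): 0.5, (1, 2, 2): 0.5,          # automaton A12
    (2, 1, 1): 0.2, (2, 1, 2): 0.8,          # automaton A21
    (2, 2, 1): 0.5, (2, 2, 2): 0.5,          # automaton A22
}

def path_prob(p, path):
    """pi_{i1...iN}(t) = p_{i1} * p_{i1 i2} * ... * p_{i1...iN}  (Eq. 18a)."""
    out = 1.0
    for n in range(1, len(path) + 1):
        out *= p[path[:n]]
    return out

print(path_prob(p, (2, 1, 2)))   # pi_212 = 0.6 * 0.7 * 0.8
```

Summing `path_prob` over all eight paths returns 1, which is property (18b).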

The notions of absolute expediency and ε-optimality can be extended to the hierarchical set-up by considering an equivalent single automaton, whose actions are the r^N paths of the hierarchy.
Fig. 2. Hierarchical System: a first-level automaton A (Level 1), automata A1 and A2 (Level 2), and automata A11, A12, A21, A22 (Level 3); every automaton in the hierarchy receives its own reaction from the environment.

The hierarchical scheme generalises the algorithm given by (9) for a single automaton. The basic idea is to employ an absolutely expedient algorithm for the automata in the hierarchy, the functions in the algorithm being suitably selected so as to make the entire system absolutely expedient. The action probabilities of automata not on the selected path are left unchanged. The general form of the algorithm can be stated as follows.

Updating Algorithm. Let α_{i1}, α_{i1 i2}, ..., α_{i1 ... iN} be the actions chosen at the first level, second level, ..., N-th level respectively, at instant t (i1, i2, ..., iN take integer values in the range 1 to r). The corresponding automata are A, A_{i1}, A_{i1 i2}, ..., A_{i1 ... iN-1}, and the corresponding environmental reactions are β_1(t), β_2(t), ..., β_N(t).

First level:

    p_{i1}(t+1) = p_{i1}(t) + L_1(t)(1 - p_{i1}(t))
    p_{j1}(t+1) = p_{j1}(t)(1 - L_1(t)),   j1 ≠ i1        (19a)

n-th level:

    p_{i1...in}(t+1) = p_{i1...in}(t) + L_n(t)(1 - p_{i1...in}(t))
    p_{i1...in-1 jn}(t+1) = p_{i1...in-1 jn}(t)(1 - L_n(t)),   1 ≤ jn ≤ r, jn ≠ in        (19b)

For all other actions, belonging to other automata at the n-th level,

    p_{j1...jn}(t+1) = p_{j1...jn}(t)        (20)

Note that L_i(t) is, in general, an abbreviation for L_i[p(t)] (i = 1, 2, ..., N), where p(t) is the set of all action probabilities in the hierarchy. Associated with each level there is an 'L' function, and the specification of these functions which leads to the convergence of the hierarchy constitutes the following theorem.

Theorem 1. The hierarchy of automata with reactions at each level, operating according to the algorithm described above, is ε-optimal if

    L_n(t) = [Σ_{s=n}^{N} η_s(t) λ_s] / [p_{i1}(t+1) p_{i1 i2}(t+1) ... p_{i1...in-1}(t+1)],   n = 1, 2, ..., N        (20a)

(the denominator being taken as 1 for n = 1), where α_{i1}, α_{i1 i2}, ..., α_{i1...iN} are the actions selected at the various levels at instant t, η_s(t) = 1 - β_s(t), β_s(t) being the reaction of the environment at the s-th level at instant t, and the λ_s (s = 1, 2, ..., N) are positive constants satisfying

    0 < Σ_{s=1}^{N} λ_s ≤ 1

Remarks. 1) The s-th level L function L_s(t) depends on the sum of the reward parameters λ_n (n = s, s+1, ..., N). When the reaction obtained at the (s+k)-th level is a 'penalty' (i.e., β_{s+k}(t) = 1), the (s+k)-th level does not make any contribution to the L_n's (n = 1, 2, ..., s+k). The path probabilities remain unchanged only when the reactions from all the N levels are 'penalty'.

2) The updating proceeds sequentially from top to bottom. First L_1(t) is computed and the p_{j1}(t) (j1 = 1, 2, ..., r) updated. The calculation of L_2(t) involves a division by the updated value of p_{i1}(t), i.e., p_{i1}(t+1). Similarly, the computation of L_3(t) involves division by the updated values p_{i1}(t+1) and p_{i1 i2}(t+1), and so on for L_4(t), L_5(t), ..., L_N(t).
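The level-by-level updating (19a), (19b), (20) with the L functions of Theorem 1 can be sketched as follows; this is a hypothetical Python rendering in which the dict-of-index-tuples layout, the function name, and the arguments are my own, not the paper's:

```python
def hierarchy_update(p, path, betas, lam, r):
    """One step of the hierarchical scheme (19a), (19b), (20) using the
    L functions of Theorem 1, Eq. (20a).

    p     : dict mapping index tuples (i1, ..., in) to action probabilities
    path  : (i1, ..., iN), the actions selected at the N levels
    betas : (beta_1, ..., beta_N), reactions, 0 = reward, 1 = penalty
    lam   : (lambda_1, ..., lambda_N), positive constants with sum <= 1
    r     : number of actions per automaton
    """
    N = len(path)
    eta = [1 - b for b in betas]              # eta_s = 1 - beta_s(t)
    q = dict(p)                               # automata off the path keep p (Eq. 20)
    denom = 1.0                               # p_{i1}(t+1) ... p_{i1..i_{n-1}}(t+1)
    for n in range(1, N + 1):
        # L_n(t) = sum_{s=n}^{N} eta_s * lambda_s / denom      (Eq. 20a)
        Ln = sum(eta[s] * lam[s] for s in range(n - 1, N)) / denom
        prefix, sel = path[:n - 1], path[:n]
        for jn in range(1, r + 1):            # only the automaton on the path
            key = prefix + (jn,)
            if key == sel:
                q[key] = p[key] + Ln * (1.0 - p[key])
            else:
                q[key] = p[key] * (1.0 - Ln)
        denom *= q[sel]                       # Remark 2: divide by updated values
    return q
```

For instance, with a uniform 2-level, 2-action hierarchy, λ = (0.1, 0.1) and both reactions 'reward', the selected path's probability moves from 0.25 to 0.35, i.e. by the total reward weight 0.2 times the gap 1 - 0.25.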


PROOF OF THEOREM 1

The idea behind the proof is to show that the hierarchical set-up of automata is equivalent to an automaton with r^N actions (corresponding to the number of paths in the hierarchy) operating in a random environment with binary reactions. The equivalent learning automaton is then proved to be absolutely expedient and hence ε-optimal.

For mathematical simplicity, the hierarchical learning algorithm (19) is expressed more compactly as follows. Let, at instant t,

    (α_{i1}, α_{i1 i2}, ..., α_{i1...iN}) = actions selected at the various levels
    (β_1, β_2, ..., β_N) = reactions obtained at the various levels, β_s = 0 or 1
    η_s = 1 - β_s

and let δ_{j1...jk} = 1 if i1 = j1, i2 = j2, ..., ik = jk, and δ_{j1...jk} = 0 otherwise. Then for the first level (j1 = 1, 2, ..., r),

    p_{j1}(t+1) = p_{j1}(t) + [Σ_{s=1}^{N} η_s λ_s][δ_{j1} - p_{j1}(t)]        (21a)

and for the n-th level (n = 2, ..., N; jn = 1, 2, ..., r),

    p_{j1...jn}(t+1) = p_{j1...jn}(t) + δ_{j1...jn-1} [Σ_{s=n}^{N} η_s λ_s][δ_{j1...jn} - p_{j1...jn}(t)] / [p_{j1}(t+1) p_{j1 j2}(t+1) ... p_{j1...jn-1}(t+1)]        (21b)

Proof. From (18a) and (21) it can be easily checked that

    π_{j1...jN}(t+1) = π_{j1...jN}(t) + π_{j1...jN}(t) Σ_{k=1}^{N} η_k(t) λ_k [δ_{j1...jk} / π_{j1...jk}(t) - 1]        (22)

where π_{j1...jk}(t) = p_{j1}(t) p_{j1 j2}(t) ... p_{j1...jk}(t) denotes the probability of the partial path up to level k. Denoting the increment in the conditional expectation of π_{j1...jN}(t) by Δπ_{j1...jN}(t), we have from (22)

    Δπ_{j1...jN}(t) = E[π_{j1...jN}(t+1) - π_{j1...jN}(t) | p(t)]
                    = π_{j1...jN}(t) Σ_{s=1}^{N} λ_s E[η_s(t)(δ_{j1...js} / π_{j1...js}(t) - 1) | p(t)]        (23)

The expectations are linear in the λ_s's, and hence Δπ_{j1...jN}(t) is also linear in the λ_s's. Expressing Δπ_{j1...jN}(t) as a linear combination of the λ_s's, it can be shown¹ that

    Δπ_{j1...jN}(t) = π_{j1...jN}(t) Σ_{q1...qN} π_{q1...qN}(t)[D_{j1...jN} - D_{q1...qN}]        (24)

where

    D_{q1...qN} = Σ_{s=1}^{N} λ_s d_{q1...qs}        (25a)

and the q_i's (i = 1, 2, ..., N) can take the values 1, 2, ..., r; i.e., D_{q1...qN} is the weighted average of the reward probabilities d_{q1}, d_{q1 q2}, ..., d_{q1...qN} lying on the path defined by the actions α_{q1}, α_{q1 q2}, ..., α_{q1...qN}. Since 0 < Σ_s λ_s ≤ 1 and the d's are probabilities, 0 ≤ D_{q1...qN} ≤ 1.

Let the optimal path be defined by the actions α*_{j1}, α*_{j1 j2}, ..., α*_{j1...jN}, where

    d*_{j1} = max_{j1} (d_{j1}),   j1 = 1, 2, ..., r        (25b)

    d*_{j1...jk} = max_{j1...jk} (d_{j1...jk}),   k = 2, 3, ..., N        (25c)

Then

    D*_{j1...jN} = Σ_{s=1}^{N} λ_s d*_{j1...js}        (26)

is the largest of the D's, and from (26) and (24),

    Δπ*_{j1...jN}(t) > 0        (27)

Comparison of Eqs. (14) and (24) suggests that the behaviour of the hierarchical system is equivalent to that of a stochastic automaton whose action probabilities are π_{j1...jN}(t) (j1, j2, ..., jN = 1, 2, ..., r), operating in an environment with reward probabilities D_{j1...jN}, and employing an absolutely expedient algorithm of the linear reward-inaction type. It follows from the previous arguments that the hierarchical system is absolutely expedient and hence ε-optimal.

¹Details available with the authors.

COMMENTS ON THE ALGORITHM

1) In Theorem 1, the division by the product of probabilities may appear questionable, as the probabilities may go to zero. It is shown presently that such an event does not arise w.p.1 and the algorithm presents no computational problems. When α_{i1}, α_{i1 i2}, ..., α_{i1...iN} are the actions selected at the first level, second level, etc., we have from (19a), (19b)

    p_{i1}(t+1) = L_1(t) + p_{i1}(t)(1 - L_1(t))

i.e.,

    p_{i1}(t+1) ≥ L_1(t) = Σ_{s=1}^{N} η_s λ_s        (28a)

and hence

    L_2(t) = [Σ_{s=2}^{N} η_s λ_s] / p_{i1}(t+1) ≤ 1        (28b)
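As a numerical cross-check of the equivalence argument (not in the paper), the increment formula (24) can be verified exactly for a small 2-level, 2-action hierarchy by enumerating every selected path and every pair of binary reactions; all probability values below are arbitrary illustrations:

```python
import itertools

# Exact check of Eq. (24) for a 2-level, 2-action hierarchy: the expected
# increment of each path probability equals pi_j * sum_q pi_q (D_j - D_q)
# with D_q = lambda_1 * d_{q1} + lambda_2 * d_{q1 q2}  (Eq. 25a).
lam = (0.05, 0.05)
d = {(1,): 0.9, (2,): 0.4, (1, 1): 0.8, (1, 2): 0.3, (2, 1): 0.5, (2, 2): 0.6}
p = {(1,): 0.3, (2,): 0.7, (1, 1): 0.6, (1, 2): 0.4, (2, 1): 0.2, (2, 2): 0.8}

def update(p, path, betas):
    """Hierarchical step (19a), (19b), (20) with the L functions (20a)."""
    eta = [1 - b for b in betas]
    q, denom = dict(p), 1.0
    for n in (1, 2):
        Ln = sum(eta[s] * lam[s] for s in range(n - 1, 2)) / denom
        for jn in (1, 2):
            key = path[:n - 1] + (jn,)
            q[key] = (p[key] + Ln * (1 - p[key]) if key == path[:n]
                      else p[key] * (1 - Ln))
        denom *= q[path[:n]]
    return q

paths = list(itertools.product((1, 2), (1, 2)))
pi = {j: p[j[:1]] * p[j] for j in paths}                    # Eq. (18a)
D = {j: lam[0] * d[j[:1]] + lam[1] * d[j] for j in paths}   # Eq. (25a)

for j in paths:
    # exact expectation over the chosen path and the two binary reactions
    inc = 0.0
    for i in paths:
        for b1, b2 in itertools.product((0, 1), (0, 1)):
            pr = (pi[i]
                  * (d[i[:1]] if b1 == 0 else 1 - d[i[:1]])
                  * (d[i] if b2 == 0 else 1 - d[i]))
            q = update(p, i, (b1, b2))
            inc += pr * (q[j[:1]] * q[j] - pi[j])
    rhs = pi[j] * sum(pi[q_] * (D[j] - D[q_]) for q_ in paths)
    assert abs(inc - rhs) < 1e-12, (j, inc, rhs)
print("Eq. (24) verified for all four paths")
```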


Likewise we can show that L_n(t) ≤ 1 (n = 3, 4, ..., N). Since the L_n(t)'s are uniformly bounded, no computational problem arises because of the division operations in the algorithm.

2) From (27), if the optimal path is defined by the actions α*_{j1}, α*_{j1 j2}, ..., α*_{j1...jN}, then the expectation of the optimal path probability increases monotonically with t.

3) We have seen that when the hierarchy operates according to the algorithm proposed, the equivalent single learning automaton behaves as a linear reward-inaction automaton. It is interesting to note that, to achieve this behaviour, the individual automata have to be updated in a nonlinear manner.

4) If the contributions of the first N-1 levels of the N-level hierarchy to the 'L' functions are ignored, i.e., λ_s = 0 (s = 1, 2, ..., N-1), the hierarchical set-up reduces to the set-up described by the authors (1980).

SIMULATION RESULTS

A 3-level hierarchical system with 5 actions per automaton was simulated on a DEC-10 computer. The second level consisted of 5 automata (i.e., a total of 25 actions) and the third level of 25 automata (i.e., a total of 125 actions). All the penalty probabilities were chosen above 0.2 using a random number generator. The penalty probabilities for the actions lying on the optimal path were all set equal to 0.2, and the λ_s (s = 1, 2, 3) were all chosen equal to 0.003. Table 1 gives the average optimal path probabilities at intervals of 1000 iterations. The averages were taken over 20 experiments, all of which were successful. The convergence of the hierarchy occurs in around 10,000 iterations.

TABLE 1

Iterations   Average optimal path probability
 1000        -
 2000        0.311
 3000        0.506
 4000        0.659
 5000        0.771
 6000        0.844
 7000        0.890
 8000        0.921
 9000        0.946
10000        0.962
11000        0.999
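A scaled-down rendition of this experiment can be sketched as follows; the hierarchy size, λ values, iteration count, and random seed are my own choices, not the paper's:

```python
import itertools, random

# Scaled-down version of the experiment: 2 levels, 3 actions per automaton
# (the paper used 3 levels, 5 actions, lambda_s = 0.003 on a DEC-10).
random.seed(1)
N, r, lam = 2, 3, (0.02, 0.02)

# Penalty probabilities: 0.2 on the optimal path (1, 1), above 0.2 elsewhere,
# mirroring the paper's construction.
c = {}
for n in range(1, N + 1):
    for key in itertools.product(range(1, r + 1), repeat=n):
        c[key] = 0.2 if all(k == 1 for k in key) else 0.2 + 0.6 * random.random()

p = {key: 1.0 / r for key in c}          # uniform initial action probabilities

for t in range(20000):
    path = ()                             # select one action per level
    for n in range(N):
        acts = [path + (j,) for j in range(1, r + 1)]
        path = random.choices(acts, weights=[p[a] for a in acts])[0]
    betas = [1 if random.random() < c[path[:n + 1]] else 0 for n in range(N)]
    # hierarchical update (19a), (19b), (20) with the L functions (20a)
    eta, q, denom = [1 - b for b in betas], dict(p), 1.0
    for n in range(1, N + 1):
        Ln = sum(eta[s] * lam[s] for s in range(n - 1, N)) / denom
        for j in range(1, r + 1):
            key = path[:n - 1] + (j,)
            q[key] = (p[key] + Ln * (1 - p[key]) if key == path[:n]
                      else p[key] * (1 - Ln))
        denom *= q[path[:n]]
    p = q

opt = p[(1,)] * p[(1, 1)]                # optimal path probability, Eq. (18a)
print("optimal path probability after 20000 steps:", round(opt, 3))
```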

CONCLUSIONS

An automaton model of a hierarchical learning system has been introduced. The ε-optimal reinforcement scheme for automata at the various levels takes into account the action probabilities and environmental reactions at other levels. A noteworthy feature of the learning algorithm is that updating at one level involves division by action probabilities at the previous levels. The implication of the information transfer necessitated by the algorithm needs to be explored. The algorithm appears to be relevant to the routing problem in networks. Further, it could probably be extended to work in the non-stationary environments described by Narendra and Thathachar (1980).

REFERENCES

Aso, H., and M. Kimura (1979). Absolute expediency of learning automata. Inf. Sciences, 17, 91-112.

Baba, N., and Y. Sawaragi (1975). On learning behavior of stochastic automata under nonstationary random environments. IEEE Trans. Syst., Man & Cybern., 273-276.

Bush, R. R., and F. Mosteller (1958). Stochastic Models of Learning. Wiley, New York.

Fu, K. S. (1970). Learning control systems - a review and outlook. IEEE Trans. Autom. Control, 15, 210-221.

Lakshmivarahan, S., and M. A. L. Thathachar (1973). Absolutely expedient learning algorithms for stochastic automata. IEEE Trans. Syst., Man & Cybern., 3, 281-286.

Lakshmivarahan, S., and M. A. L. Thathachar (1976). Bounds on the probability of convergence of learning automata. IEEE Trans. Syst., Man & Cybern., 6, 222-226.

Mesarovic, M. D., and D. Macko (1969). Foundations for a scientific theory of hierarchical systems. In Lancelot Law Whyte (Ed.), Hierarchical Structures. American Elsevier Publishing Company, New York.

Narendra, K. S., and M. A. L. Thathachar (1974). Learning automata - a survey. IEEE Trans. Syst., Man & Cybern., 4, 323-334.

Narendra, K. S., and M. A. L. Thathachar (1980). On the behaviour of a learning automaton in a changing environment with application. IEEE Trans. Syst., Man & Cybern., 10.