
Copyright © IFAC Control in Transportation Systems, Braunschweig, Germany, 2000

URBAN INTELLIGENT TRAFFIC SYSTEM BASED ON MULTI-AGENT

Haitao Ou, Wenyuan Zhang, Yupu Yang, Xiaoming Xu

Department of Automation, Shanghai Jiaotong University, 200030, P.R. China

Abstract: This research addresses multi-agent coordination in urban traffic control, where the signals of adjacent intersections are coordinated so as to minimize the queues of waiting cars in the urban traffic network. A multi-agent coordination approach is adopted that uses the Recursive Modeling Method (RMM) to select rational actions and to coordinate with the other agents by modeling their decision making in a distributed multi-agent environment. We describe how decision making using RMM and Bayesian learning is applied to the urban traffic control domain to build a multi-agent traffic control system, and we show experimental results. Copyright © 2000 IFAC

Keywords: agents, traffic control, recursive modeling, learning, coordination

1. INTRODUCTION

Optimization of the signal control for a group of a large number of signals is an important problem in the field of urban traffic control, and the control of signals has been studied extensively. However, although existing control methods work well in real traffic environments, their control still deteriorates under rapidly and massively changing traffic conditions, such as those caused by traffic accidents (Papageorgiou, et al., 1989).

To overcome these difficulties, we propose a multi-agent approach. The domain of traffic and transportation management is well suited to a multi-agent based approach because of its geographically distributed nature. For example, Burmeister (Burmeister, et al., 1997) described a multi-agent system for implementing a future car-pooling application. Goldman (Goldman and Rosenschein, 1996) addressed an incremental and mutual learning method in traffic multi-agent systems, which coordinates two controllers for the two directions at an intersection. Findler (Findler, 1991) also proposed a hierarchical architecture for traffic networks. Other applications in this area are described by Weiß (Weiß, 1998).

We introduce a multi-agent approach based on RMM and Bayesian learning in the domain of urban traffic signal control. The main motivation for this approach is that RMM enables an agent to select a rational action by examining the expected utilities of its alternative behaviors, and to coordinate with the other agents by modeling their decision making in a multi-agent environment; hence it achieves coordination with low or no communication requirements. Bayesian learning is used, in conjunction with RMM, to update the beliefs about the other agents based on their observed actions. We feel that the Bayesian belief update is particularly useful in dynamic environments characterized by unanticipated change, such as the urban traffic control domain. We describe how decision making using RMM and Bayesian learning is applied to the traffic control domain.

2. RECURSIVE MODELING METHOD

RMM is addressed by Gmytrasiewicz (Gmytrasiewicz and Durfee, 1995); it endows an agent with a compact, specialized representation of other agents' beliefs, abilities, and intentions. As such, it allows the agent to predict a message's decision-theoretic pragmatics. Based on the message's decision-theoretic pragmatics, RMM quantifies the gain obtained due to the message as the increase in expected utility obtained as a result of the interaction. The details are omitted here due to space limits.

3. BAYESIAN LEARNING

Because of the dynamic environments characterized by unanticipated change, we use Bayesian learning, in conjunction with RMM, to update the beliefs about other agents based on their observed actions. Specifically, Bayesian learning can be used to update an agent's belief about the other agent by changing the probabilities associated with the other agent's models based on its observed behavior. The belief updating procedure consists of four steps (Anthony and Biggs, 1992); the detailed procedure is omitted here due to limited space. To explain the fourth step in detail, suppose that agent $i$ has a number of alternative models, $M_j = \{M_j^1, \ldots, M_j^n\}$, that he uses to predict agent $j$'s ($i \neq j$) behavior. Agent $i$'s belief about agent $j$'s model $M_j^k$, which will be represented as a probability assigned to this model, is denoted $BEL(M_j^k)$. Let $A_j = \{a_j^1, a_j^2, \ldots, a_j^m\}$ be the set of actions available to agent $j$. The prior probability that model $M_j^k$ is correct is $P(M_j^k)$. Now, suppose that agent $i$ observed that agent $j$ executed an action $a_j^o$. Agent $i$ can now update the probability assigned to the model $M_j^k$ as:

$BEL(M_j^k) := P(M_j^k \mid a_j^o) = P(a_j^o \mid M_j^k) \, P(M_j^k) \, / \, P(a_j^o)$   (1)

where $P(M_j^k \mid a_j^o)$ is the probability of model $M_j^k$ given the observed action $a_j^o$; $P(a_j^o \mid M_j^k)$ is the probability that agent $j$ would choose the action $a_j^o$ if $M_j^k$ is an accurate model of agent $j$; $P(M_j^k)$ is the prior probability assigned to the given model; and $P(a_j^o)$ is the probability of this particular observed action. In this formula, $P(a_j^o)$ is a normalization constant, resulting from the condition that the probabilities of all models sum to 1. $P(a_j^o \mid M_j^k)$ is already known from the solution of the existing recursive model structure. The above belief update can be performed incrementally during a sequence of interactions, based on the other agents' observed behaviors.
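
As an illustration of equation (1), the following minimal Python sketch normalizes the product of prior and likelihood over a set of alternative models. This is our illustrative code, not the paper's implementation; the model names and numbers are placeholders (they anticipate the two-model example worked out in Section 4.4).

```python
def update_model_beliefs(priors, likelihoods):
    """Bayesian belief update over alternative models, as in equation (1).

    priors      -- {model: P(M_j^k)}, prior probability of each model
    likelihoods -- {model: P(a_j^o | M_j^k)}, probability that each model
                   assigns to the action actually observed
    Returns {model: P(M_j^k | a_j^o)}, the posterior beliefs.
    """
    # P(a_j^o) is the normalization constant over all alternative models.
    p_obs = sum(priors[m] * likelihoods[m] for m in priors)
    return {m: priors[m] * likelihoods[m] / p_obs for m in priors}

# Placeholder two-model example (cf. the worked example in Section 4.4).
print(update_model_beliefs(priors={"rational": 0.8, "no_info": 0.2},
                           likelihoods={"rational": 0.78, "no_info": 1 / 3}))
# -> {'rational': 0.903..., 'no_info': 0.096...}
```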

4. URBAN TRAFFIC MULTI-AGENT SYSTEM

We designed a multi-agent system for urban traffic based on RMM and the Bayesian learning update. For the components of an agent's mental state, depicted in Figure 1, we use a version of a frame-based system. Thus, frames are used to represent the agent's beliefs, capabilities, and preferences, and to organize his knowledge, using the object-oriented paradigm, into a hierarchy of concepts and classes.

Fig. 1. The components of an agent's mental state

4.1 Attribute Analysis and Generation of a Payoff Matrix

We consider a simple traffic network consisting of two intersections, as shown in Fig. 2. Agents A1 and A2 represent these two intersections. The set of actions is $\{a_0, a_1, a_2\}$, where action $a_0$ means that horizontal driving is allowed, action $a_1$ means that vertical driving is allowed, and action $a_2$ means that neither horizontal nor vertical driving is allowed.

Each end of the roads is assumed to be connected to external traffic, and cars are assumed to arrive on those roads at random. Each car runs at the same speed $d = 30$ km/h; $t_0 = 4$ sec represents the time consumed by the acceleration to the regular speed $d$. When a car passes a crossroad, it changes its direction according to the probability associated with that crossroad: let $d_i$, $i = 1, 2$ be the next directions for a car, that is, $\{d_i\} = \{Forward, Right\}$; at each of the crossroads, the probabilities $\{P_{d_i}\}$ are $\{0.5, 0.5\}$. The distance between the two agents is 100 meters.

Our approach is to take the agent-oriented perspective. In a scenario such as the one in Figure 1, we view the decision making through the information from traffic flow sensors, which are installed at the entrances to the crossroads. In our work, the decision problem is represented by the payoff matrix representation used in game theory. The elements of the payoff matrix are the numbers of waiting cars in the network. Obviously, the objective of optimal control is to minimize the waiting cars' queues in the network as far as possible. At an intersection, there is a minimal cycle for a signal control action (too short a time would be dangerous for passing); we define this cycle as 24 seconds. At the beginning of each cycle, RMM is used to calculate the action that should be selected.

Fig. 2. A simple traffic network consisting of two intersections (the numbers represent the numbers of waiting cars)
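
To make the setup concrete, the snippet below encodes the action set and the payoff matrices that appear in Fig. 3. This is our illustrative encoding (the paper's VC++ simulator is not listed); the payoff entries are the waiting-car counts from Fig. 3, and smaller entries are better, since the controller minimizes queues.

```python
# The three signal actions available at each intersection (Section 4.1):
# a0 -- horizontal driving allowed, a1 -- vertical driving allowed,
# a2 -- neither direction allowed.
ACTIONS = ("a0", "a1", "a2")

# Payoff matrix of agent A1 (Fig. 3, level 1): rows index A1's action,
# columns index A2's action; entries are numbers of waiting cars.
PAYOFF_A1 = [
    [45, 53, 69],   # A1 plays a0
    [50, 52, 80],   # A1 plays a1
    [63, 75, 85],   # A1 plays a2
]

# A2 faces the same joint outcomes from its own side (Fig. 3, level 2),
# i.e. the transpose of A1's matrix.
PAYOFF_A2 = [list(row) for row in zip(*PAYOFF_A1)]
```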

4.2 Modeling Other Agents - Recursive Model Structure

In order to solve his decision problem, described by the payoff matrix above, agent A1 needs to hypothesize the likely actions of agent A2. In the Recursive Modeling Method, the actions of the other rational agents are anticipated using a model of their decision-making situation. If it were known that agent A2 is a rational decision-maker, then A2's decision making could be modeled as a payoff matrix as well. In our study, we considered a more realistic situation, in which it is not known for sure that A2 is rationally minimizing the payoff as well.

For example, it could be that an accident has happened at A2, in which case there is no information as to what action it would undertake. Thus, there are two alternative models that A1 can use to model A2: one has the form of the payoff matrix, and the other contains no information about A2's actions. We call the latter model the No-Info model. In RMM, each of the alternative models is assigned a probability indicating the likelihood of its correctness. Further, it is likely that A2 is modeling A1 as well, in his own attempt to coordinate with A1.

The resulting hierarchy of models, which we call the recursive model structure, terminates with a No-Info model when the agent (in this case A2) runs out of modeling information. Fig. 3 shows A1's model structure of depth three for the example scenario in Fig. 2.

Fig. 3. The recursive model structure for A1.

  Level 1, A1's payoff matrix (rows: A1's action; columns: A2's action):
           a0   a1   a2
      a0   45   53   69
      a1   50   52   80
      a2   63   75   85

  Level 2, A1's models of A2: A2's payoff matrix, with belief 0.8 (rows: A2's action; columns: A1's action):
           a0   a1   a2
      a0   45   50   63
      a1   53   52   75
      a2   69   80   85
  together with a No-Info model, with belief 0.2.

  Level 3, the models A1 anticipates A2 may use of A1: A1's payoff matrix and a No-Info model [1/3, 1/3, 1/3].
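
A recursive model structure of this kind is easy to hold in memory as a small tree whose interior nodes are payoff-matrix models and whose leaves are No-Info models. The encoding below is our sketch of Fig. 3; the 0.8/0.2 weights at level 2 are the beliefs stated in the text, while the level-3 weights are not given explicitly in the paper and are marked as assumptions.

```python
PAYOFF_A1 = [[45, 53, 69], [50, 52, 80], [63, 75, 85]]
PAYOFF_A2 = [list(row) for row in zip(*PAYOFF_A1)]

NO_INFO = {"kind": "no_info"}   # uniform over the other agent's actions

def matrix_model(agent, payoff, children=()):
    """A payoff-matrix model with weighted deeper models of the other agent."""
    return {"kind": "matrix", "agent": agent, "payoff": payoff,
            "children": list(children)}

# Fig. 3: level 1 is A1's own matrix; on level 2, A1 models A2 either by
# A2's payoff matrix (belief 0.8) or by No-Info (belief 0.2); level 3 holds
# the models A1 anticipates A2 may use of A1.
fig3 = matrix_model("A1", PAYOFF_A1, [
    (0.8, matrix_model("A2", PAYOFF_A2, [
        # Level 3 branch weights are not stated in the paper; 0.8/0.2 here
        # is only a placeholder assumption for illustration.
        (0.8, matrix_model("A1", PAYOFF_A1)),
        (0.2, NO_INFO),
    ])),
    (0.2, NO_INFO),
])
```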

To summarize, level 1 in the recursive model structure represents the way that A1 observes the situation in order to make his own decision, shown as A1's payoff matrix. Level 2 depicts the model A1 has of A2's situation, and level 3 contains the models that A1 anticipates A2 may be using to model A1. The recursive modeling could continue into deeper levels, but in this case we assume that A1 has no further information. In other words, we are examining the reasoning of A1 in the particular case when he is equipped with a finite amount of information about A2 and nested to the third level of modeling. In general, the models have to differentiate further among the knowledge limitations of the different agents involved. For example, the situation in which A1 does not have any information about how he is modeled by A2 is different from the situation in which A1 knows that A2 has no information about A1. No-Info on level 2 in Fig. 3 represents the fact that A1 has no information about A2's actions if A2 has undergone an unexpected change (e.g., an accident). No-Info on level 3 represents A1's belief that A2 has no information about how A1 models A2's actions, provided that A1 itself has had no unexpected change.

Since modeling structures such as the one in Fig. 3 express the optimal choice in the agents' decision-making situations recursively, depending on the choices of the other agents, one can use dynamic programming to solve them in a bottom-up fashion. Here, we have designed and implemented a numerical method of solving the models using logical sampling, according to the algorithm below.

Let $A_i = \{a_i^1, a_i^2, \ldots, a_i^n\}$ be the set of actions available to agent $i$, and let $P_i = [p_i^1, p_i^2, \ldots, p_i^n]$ be the probability distribution over the available actions, i.e., the conjecture as to agent $i$'s action. The conjecture $P_i$ is calculated from the frequencies $F_i = (f_i^1, f_i^2, \ldots, f_i^n)$ with which each action turns out to be optimal, as follows:

Input: the payoff matrix $M_i$ of agent $i$.
Output: the probability distribution $P_i$ over the actions of agent $i$.
Begin
    $F_i \leftarrow (0, 0, \ldots, 0)$; $N \leftarrow 0$
    for each probability distribution $P_N$ in the set of sampled probability distributions comprising the No-Info model:
        multiply the probability distribution $P_N$ by $M_i$;
        select an action $a_i^k$ which has the maximum expected utility;
        $f_i^k \leftarrow f_i^k + 1$; $N \leftarrow N + 1$
    end for
    $P_i \leftarrow [f_i^1/N, f_i^2/N, \ldots, f_i^n/N]$
    Return $P_i$
End
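
A direct Python transcription of the logical-sampling procedure above is given below (our sketch, not the authors' implementation). Because the payoff entries are waiting-car counts, "maximum expected utility" corresponds to the minimum expected queue. Note that the level distributions reported in the text come from running this procedure inside the full bottom-up propagation of Fig. 3, not from a single matrix in isolation.

```python
from itertools import product

def sampled_distributions(n_actions, density=0.1):
    """All probability vectors over n_actions on a grid of the given density
    (the set of sampled distributions comprising a No-Info model)."""
    steps = round(1 / density)
    for head in product(range(steps + 1), repeat=n_actions - 1):
        if sum(head) <= steps:
            yield [h / steps for h in head] + [(steps - sum(head)) / steps]

def solve_by_logical_sampling(payoff, density=0.1):
    """The sampling algorithm: payoff[k][j] is the expected queue when the
    owner plays action k and the other agent plays action j.  Returns P_i,
    the frequency with which each own action is optimal over the sampled
    conjectures about the other agent's action."""
    freq = [0] * len(payoff)
    n_samples = 0
    for p_other in sampled_distributions(len(payoff[0]), density):
        # Multiply the conjecture by the payoff matrix: the expected queue
        # of each own action; utility is the negative queue, so take the min.
        expected = [sum(p * v for p, v in zip(p_other, row)) for row in payoff]
        freq[expected.index(min(expected))] += 1
        n_samples += 1
    return [f / n_samples for f in freq]

print(solve_by_logical_sampling([[45, 53, 69], [50, 52, 80], [63, 75, 85]]))
```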

The dynamic programming bottom-up solution starts at the deepest level. Using the sampling algorithm with a sampling density of 0.1 on the third level of the recursive model structure reveals that the probability distribution over A1's actions becomes [0.68, 0.32, 0.00] for $a_0$, $a_1$ and $a_2$, respectively. This probability distribution summarizes A1's knowledge as to A2's expectations about the actions of A1, if A2 is not incapacitated and if A2 thinks that A1 is not incapacitated.

The solution then proceeds to level 2. The above distribution over the actions of A1 is combined with each individual probability distribution of the No-Info model on level 2, after each distribution is multiplied by the weights 0.8 and 0.2, respectively. The sampling procedure is invoked again, resulting in [0.22, 0.78, 0.00] over A2's actions $a_0$, $a_1$ and $a_2$, respectively (see Figure 2). Propagating these results into level 1, the probability distribution describing A2's actions is obtained as the combination $0.8 \times [0.22, 0.78, 0.00] + 0.2 \times [1/3, 1/3, 1/3]$, which results in [0.24, 0.69, 0.07]. Now we can compute the expected utilities of A1's alternative behaviors as follows:

$a_0$: $0.24 \times 45 + 0.69 \times 53 + 0.07 \times 69 = 52.20$   (2)

$a_1$: $0.24 \times 50 + 0.69 \times 52 + 0.07 \times 80 = 53.48$   (3)

$a_2$: $0.24 \times 63 + 0.69 \times 75 + 0.07 \times 85 = 72.82$   (4)

Thus, if A1 is rational, he will attempt to optimize his own expected utility (here, a minimal waiting traffic queue) and prefer to execute action $a_0$, given that he expects that A2 will most likely select action $a_1$ simultaneously. At the beginning of each cycle, this calculation procedure is called.
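
The level-1 combination and the expected utilities (2)-(4) can be checked mechanically. The short script below (ours) recomputes them from the distributions reported above; any small differences against hand-rounded values are only rounding effects.

```python
# Level 1: combine the matrix model's prediction of A2 (weight 0.8) with
# the uniform No-Info model (weight 0.2), as reported in the text.
p_matrix, p_noinfo = [0.22, 0.78, 0.00], [1 / 3, 1 / 3, 1 / 3]
p_a2 = [0.8 * m + 0.2 * u for m, u in zip(p_matrix, p_noinfo)]
print([round(p, 2) for p in p_a2])          # -> [0.24, 0.69, 0.07]

# Expected waiting queues of A1's actions (equations (2)-(4)), using the
# rounded distribution [0.24, 0.69, 0.07] exactly as in the text.
payoff_a1 = [[45, 53, 69], [50, 52, 80], [63, 75, 85]]
for name, row in zip(("a0", "a1", "a2"), payoff_a1):
    print(name, round(sum(p * v for p, v in zip([0.24, 0.69, 0.07], row)), 2))
# a0 52.2, a1 53.48, a2 72.82 -> a0 has the smallest expected queue
```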

4.4 Bayesian Belief Update

In an uncertain and ever-changing situation, such as the traffic scenario described above, it would be useful for A1 to be able to recognize that A2 has had a serious accident, and to adjust accordingly. Specifically, Bayesian learning can be used to update an agent's belief about the other agent by changing the probabilities associated with the other agent's models based on its observed behavior. According to the formula described in Section 3, in our scenario A1 has two alternative models of A2, as described above: $M_{A2}^1$, the model of A2 under normal traffic conditions, and $M_{A2}^2$, the model of A2 after a traffic accident has occurred. The beliefs about the models $M_{A2}^1$ and $M_{A2}^2$ are initially assumed to be 0.8 and 0.2, respectively. Thus, in spite of the uncertainties on the third level of modeling, A1's knowledge is enough to expect that, if A2 is rational, it will most likely attempt to execute action $a_1$. Consider the case of A1 observing that A2 indeed executed the action $a_1$. The belief update about A2's model proceeds as follows:

$BEL(M_{A2}^1) = P(M_{A2}^1 \mid a_1) = P(a_1 \mid M_{A2}^1) \, P(M_{A2}^1) / P(a_1) = (0.78 \times 0.8) / 0.69 = 0.903$   (5)

$BEL(M_{A2}^2) = P(M_{A2}^2 \mid a_1) = P(a_1 \mid M_{A2}^2) \, P(M_{A2}^2) / P(a_1) = (0.33 \times 0.2) / 0.69 = 0.097$   (6)

The Bayesian belief update can enhance the quality of coordination during subsequent interactions, by correctly recognizing the models of the other agents from their behavior. In our case, the Bayesian belief update is invoked in every cycle to monitor changes in the traffic network and to adaptively adjust the coordinated actions.
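
Iterating the update of equation (1) once per signal cycle shows how the belief mass migrates toward the model that keeps explaining the observations. The loop below is our sketch: the normal-traffic model's action distribution [0.22, 0.78, 0.00] and the uniform accident model come from the text, while the tiny smoothing constant that keeps the update defined for zero-probability actions is our own addition.

```python
def bayes_step(beliefs, predictions, observed):
    """One per-cycle belief update (equation (1)) after observing an action."""
    p_obs = sum(beliefs[m] * predictions[m][observed] for m in beliefs)
    return {m: beliefs[m] * predictions[m][observed] / p_obs for m in beliefs}

predictions = {
    "normal":   [0.22, 0.78, 1e-6],        # smoothed [0.22, 0.78, 0.00]
    "accident": [1 / 3, 1 / 3, 1 / 3],     # the No-Info model
}
beliefs = {"normal": 0.8, "accident": 0.2}
for cycle in range(3):                      # A2 keeps choosing a1 (index 1)
    beliefs = bayes_step(beliefs, predictions, observed=1)
    print(cycle, {m: round(p, 3) for m, p in beliefs.items()})
# the first cycle reproduces equations (5)-(6): normal 0.903, accident 0.097
```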

5. EXPERIMENTAL RESULTS

The computer simulation is written in VC++. Our experiment was aimed at determining the quality of modeling and coordination achieved by the RMM agents in a team with the Bayesian belief update, compared to fixed-sequence traffic signal control for a lone junction and to traffic network signal control through a hill-climbing process. The saturated traffic flow in the network is 4100 vehicles per hour, and the simulation time is $3 \times 10^5$ signal cycles. The object of control is to minimize the waiting cars' queues in front of the intersections. The results are listed in Table 1, where M1, M2 and M3 represent the multi-agent method described in this paper, the network signal control adopting the hill-climbing method (Robertson, 1988), and the fixed-sequence traffic signal control for a lone junction, respectively.
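
The paper does not list its VC++ simulator, so the toy model below is only our stand-in to make the experimental setup concrete: one intersection, the 24-second cycle of Section 4.1, random (here Poisson) arrivals split over the two approaches, and departures capped by the stated 4100 vehicles/hour saturation flow. A greedy serve-the-longer-queue rule replaces the three control methods, so the outputs illustrate the trend of Table 1, not its values.

```python
import math, random

CYCLE_S = 24              # minimal signal cycle (Section 4.1)
SATURATION_VPH = 4100     # saturated network flow (Section 5)

def poisson(rng, lam):
    # Knuth's method; adequate for the small per-cycle means used here.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def average_queue(flow_vph, cycles=20_000, seed=1):
    """Mean number of cars still waiting at the end of a cycle."""
    rng = random.Random(seed)
    lam = flow_vph * CYCLE_S / 3600 / 2          # arrivals per approach/cycle
    capacity = SATURATION_VPH * CYCLE_S // 3600  # departures per green cycle
    queues = [0, 0]                              # horizontal, vertical
    waiting = 0
    for _ in range(cycles):
        for d in (0, 1):
            queues[d] += poisson(rng, lam)
        longer = max((0, 1), key=queues.__getitem__)
        queues[longer] = max(0, queues[longer] - capacity)
        waiting += sum(queues)
    return waiting / cycles

for flow in (1008, 2088, 3608):
    print(flow, round(average_queue(flow), 2))   # queues grow with demand
```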

Table 1. The comparison of the average waiting cars' queue during each cycle for the three methods

Traffic flow entered our          Average waiting cars' queue during each cycle
experimental network
(vehicles/hour)                      M1          M2          M3
<1008                               <0.05       <0.05       <0.05
1008                                 0.10        0.12        0.19
1152                                 0.15        0.17        0.24
1296                                 0.26        0.29        0.37
1368                                 0.34        0.48        0.75
1512                                 0.47        0.80        1.12
1656                                 0.77        1.08        1.95
1728                                 0.91        1.67        2.54
1944                                 1.23        2.10        3.38
2088                                 1.94        2.45        5.80
2186                                 2.58        3.87        6.78
2320                                 3.56        4.72        8.92
2548                                 4.03        6.43       11.20
2670                                 4.85        8.77       16.56
2858                                 5.60       10.27        *
3016                                 6.48       13.89        *
3120                                 7.86       15.73        *
3368                                 9.22       17.23        *
3420                                10.26        *           *
3608                                11.22        *           *

* The waiting cars' queue grows very long, which means traffic congestion.

The above results show that the performance of the multi-agent signal control based on RMM and the Bayesian belief update is better than that of the other methods. When the traffic flow is sparse, the multi-agent signal control has no obvious advantage. But when the traffic flow becomes more congested, the advantage of the multi-agent method is obvious. In particular, even when traffic congestion occurs under M2 and M3, the method addressed in this paper can still control the traffic flow effectively.

6. CONCLUSION

This paper presented a study in modeling and coordination in the distributed multi-agent environment of urban traffic network control. This investigation implies that the Recursive Modeling Method can be applied to high-level tactical decision making for minimizing the waiting cars' queues in front of intersections. Through Bayesian learning, the agents can increase the quality of the coordinated decision making achieved within the RMM framework. Thus, a rational agent can recognize the models of the other agents correctly and coordinate with them effectively in unexpectedly changing circumstances. Because the existing methods of urban traffic control cannot meet the continuously increasing demands of traffic, and because the nature of the urban traffic network architecture is very well suited to the multi-agent method, we believe that building a multi-agent traffic control system is a new and promising way to solve the traffic problem.

REFERENCES

Papageorgiou, M., J.M. Blosseville, and H. Hadj-Salem (1989). Macroscopic modelling of traffic flow on the Boulevard Périphérique in Paris. Transportation Research, Vol. 23B, pp. 29-47.

Burmeister, B., A. Haddadi, and G. Matylis (1997). Applications of multi-agent systems in traffic and transportation. IEE Transactions on Software Engineering, 144(1):51-60.

Goldman, C.V. and J.S. Rosenschein (1996). Mutual supervised learning in multi-agent systems. Distributed AI, pp. 85-96.

Findler, N.V. (1991). Distributed control of collaborating and learning expert systems for street traffic signals. In Lewis and Stephanou, editors, IFAC Distributed Intelligence Systems, pages 125-130. Pergamon Press.

Weiß, G. (1998). Introduction to Distributed Artificial Intelligence. MIT Press, Cambridge, MA.

Gmytrasiewicz, P.J. and E.H. Durfee (1995). A rigorous, operational formalization of Recursive Modeling. ICMAS-95, pp. 125-132.

Anthony, M. and N. Biggs (1992). Computational Learning Theory. Cambridge University Press.

Robertson, D.I. (1988). TRANSYT: A Traffic Network Study Tool. Ministry of Transport, RRL Report LR253.