Copyright © IFAC Control in Transportation Systems, Braunschweig, Germany, 2000
URBAN INTELLIGENT TRAFFIC SYSTEM BASED ON MULTI-AGENT
Haitao Ou
Wenyuan Zhang
Yupu Yang
Xiaoming Xu
Department of Automation, Shanghai Jiaotong University, 200030, P.R.C.
Abstract: This research addresses multi-agent coordination in urban traffic control, which coordinates the signals of adjacent intersections to minimize the waiting cars' queues in the urban traffic network. A multi-agent coordination scheme is adopted which uses the Recursive Modeling Method (RMM) to select rational actions and to coordinate with other agents by modeling their decision making in a distributed multi-agent environment. We describe how decision making using RMM and Bayesian learning is applied to the urban traffic control domain to build a multi-agent traffic control system, and we show experimental results. Copyright © 2000 IFAC

Keywords: agents, traffic control, recursive modeling, learning, coordination
1. INTRODUCTION

Optimization of signal control for a group of a large number of signals is an important problem in the field of urban traffic control, and the control of signals has been studied extensively. However, although existing control methods work well in real traffic environments, their control still deteriorates under rapidly and massively changing traffic conditions, such as those caused by traffic accidents (Papageorgiou, et al., 1989).

To overcome these difficulties, we propose a multi-agent approach. The domain of traffic and transportation management is well suited to a multi-agent based approach because of its geographically distributed nature. For example, Burmeister (Burmeister, et al., 1997) described a multi-agent system for implementing a future car pooling application. Goldman (Goldman and Rosenschein, 1996) addressed an incremental and mutual learning method in traffic multi-agent systems, which coordinates two controllers for the two directions of an intersection. Findler (Findler, 1991) also proposed a hierarchical architecture for traffic networks. Other applications in this area are described by Weib (Weib, 1998).

We introduce a multi-agent approach based on RMM and Bayesian learning in the domain of urban traffic signal control. The main motivation for this approach is that RMM enables an agent to select his rational action by examining the expected utilities of his alternative behaviors, and to coordinate with other agents by modeling their decision making in a multi-agent environment, hence achieving coordination with low or no communication requirements. Bayesian learning is used, in conjunction with RMM, to update the beliefs about the other agents based on their observed actions. We feel that the Bayesian belief update is particularly useful in dynamic environments characterized by unanticipated change, such as the urban traffic control domain. We describe how decision making using RMM and Bayesian learning is applied to the traffic control domain.
2. RECURSIVE MODELING METHOD

RMM is addressed by Gmytrasiewicz (Gmytrasiewicz and Durfee, 1995). It endows an agent with a compact, specialized representation of other agents' beliefs, abilities, and intentions. As such, it allows the agent to predict a message's decision-theoretic pragmatics. Based on the message's decision-theoretic pragmatics, RMM quantifies the gain obtained due to the message as the increase in expected utility obtained as a result of the interaction. The details are omitted due to space limits.

3. BAYESIAN LEARNING

Because of the dynamic environments characterized by unanticipated change, we use Bayesian learning, in conjunction with RMM, to update the beliefs about other agents based on their observed actions. Specifically, Bayesian learning can be used to update an agent's belief about the other agent by changing the probabilities associated with the other agent's models based on the observed behavior. The belief updating procedure consists of four steps (Anthony and Biggs, 1992); the detailed procedure is omitted here due to limited space. To explain the fourth step in detail, suppose that agent i has a number of alternative models, M_j = {M_j^1, ..., M_j^n}, that he uses to predict agent j's (i != j) behavior. Agent i's belief about agent j's model is represented by the probability assigned to this model, which will be denoted BEL(M_j^k). Let A_j = {a_j^1, a_j^2, ..., a_j^m} be the set of actions available to agent j. The prior probability that model M_j^k is correct is P(M_j^k). Now, suppose that agent i observes that agent j executed an action a_j. Agent i can then update the probability assigned to the model M_j^k as:

BEL(M_j^k) := P(M_j^k | a_j) = P(a_j | M_j^k) P(M_j^k) / P(a_j)        (1)

where P(M_j^k | a_j) is the probability of model M_j^k given the observed action a_j; P(a_j | M_j^k) is the probability that agent j would choose the action a_j if M_j^k is an accurate model of agent j; P(M_j^k) is the prior probability assigned to the given model; and P(a_j) is the probability of this particular observation. In this formula, P(a_j) is a normalization constant, resulting from the condition that the probabilities of all models sum up to 1. P(a_j | M_j^k) is already known from the solution of the existing recursive model structure. The above belief update can be performed incrementally during a sequence of interactions, based on the other agents' observed behaviors.
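To make the update concrete, the following C++ sketch (ours, not from the paper) applies equation (1) to a set of candidate models. The Model structure and all names are hypothetical, and the likelihoods P(a_j | M_j^k) are assumed to have been obtained from the solution of the recursive model structure, as described above.

#include <vector>

// One candidate model of the other agent: the current belief BEL(M)
// and the action probabilities P(a | M) obtained from solving the
// recursive model structure.
struct Model {
    double belief;                    // BEL(M), initially the prior P(M)
    std::vector<double> actionProb;   // P(a | M) for each action index
};

// Bayesian belief update of equation (1): on observing an action a,
// BEL(M_k) := P(a | M_k) P(M_k) / P(a), where P(a) is the
// normalization constant that makes the beliefs sum to 1 again.
void updateBeliefs(std::vector<Model>& models, int observedAction) {
    double evidence = 0.0;            // P(a)
    for (const Model& m : models)
        evidence += m.actionProb[observedAction] * m.belief;
    if (evidence <= 0.0) return;      // observation impossible under all models
    for (Model& m : models)
        m.belief = m.actionProb[observedAction] * m.belief / evidence;
}

Calling updateBeliefs once per observed action realizes exactly the incremental update over a sequence of interactions mentioned above.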
4. URBAN TRAFFIC MULTI-AGENT SYSTEM

We designed a multi-agent system for urban traffic based on RMM and the Bayesian learning update. The components of an agent's mental state are depicted in Fig. 1.

Fig. 1. The components of an agent's mental state

For the mental state in Figure 1, we use a version of the frame-based system. Thus, frames are used to represent the agent's beliefs, capabilities, and preferences, and to organize his knowledge, using the object-oriented paradigm, into a hierarchy of concepts and classes.
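As an illustration only (the paper gives no code for Fig. 1), the frame-based mental state can be read as a record type along the following lines; the member names and types are our assumptions.

#include <string>
#include <vector>

// A hypothetical rendering of the mental-state frames of Fig. 1:
// beliefs about the environment and the other agents, the agent's
// own capabilities (the feasible signal actions), and preferences
// (expressed here as a payoff matrix over joint actions).
struct Belief {
    std::string about;                  // e.g. "A2" or "traffic flow"
    std::vector<double> probabilities;  // e.g. beliefs over A2's models
};

struct MentalState {
    std::vector<Belief> beliefs;                   // beliefs frame
    std::vector<int> capabilities;                 // indices of available actions
    std::vector<std::vector<double>> preferences;  // payoff matrix (waiting cars)
};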
4.1 Attribute Analysis and Generation of a Payoff Matrix

We consider a simple traffic network consisting of two intersections, as shown in Fig. 2. Agents A1 and A2 represent these two intersections, and the distance between the two agents is 100 meters. The set of actions is {a_0, a_1, a_2}, where action a_0 represents that horizontal driving is available, action a_1 represents that vertical driving is available, and action a_2 represents that both horizontal and vertical driving are not available.

Fig. 2. A simple traffic network consisting of two intersections

Each end of the roads is assumed to be connected to external traffic, and cars are assumed to arrive on those roads at random. Each of the cars runs at the same speed, d = 30 km/h; t_0 = 4 sec represents the time consumed by the acceleration to the regular speed d. When a car passes a crossroad, it changes its direction according to the probability associated with that crossroad. Specifically, let d_i, i = 1, 2, be the next directions for a car, that is, {d_i} = {Forward, Right}; at each of the crossroads, the probability {P_di} is {0.5, 0.5}.

Our approach is to take the agent-oriented perspective. In a scenario such as the one in Figure 1, we view the decision making through the information from traffic flow sensors which are installed at the entrances to the crossroads. In our work, the decision problem is represented by the payoff matrix representation used in game theory; the elements of the payoff matrix are the numbers of waiting cars in the network. Obviously, the objective of optimal control is to minimize the waiting cars' queues in the network as far as possible. At an intersection there is a minimal cycle for the signal control action (too short a time would be dangerous for passing traffic); we set this cycle to 24 seconds. At the beginning of each cycle, RMM is used to calculate which action should be selected.
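In code, the action set and the payoff matrix of this subsection can be carried as plain arrays. The following C++ fragment is our illustration, not the authors' implementation; the numerical payoffs are taken from the level-1 matrix of Fig. 3 in the next subsection.

#include <array>

// The three signal actions: a_0 opens the horizontal direction,
// a_1 opens the vertical direction, a_2 closes both.
enum Action { A0_HORIZONTAL = 0, A1_VERTICAL = 1, A2_ALL_RED = 2 };

constexpr int kCycleSeconds = 24;  // minimal signal cycle of the paper

// payoff[i][j]: the number of waiting cars in the network when this
// agent plays a_i and the neighbouring agent plays a_j.
using PayoffMatrix = std::array<std::array<double, 3>, 3>;

constexpr PayoffMatrix kPayoffA1 = {{
    {45, 53, 69},   // a_0
    {50, 52, 80},   // a_1
    {63, 75, 85},   // a_2
}};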
4.2 Modeling Other Agents - Recursive Model Structure

In order to solve his decision problem, described by the payoff matrix above, agent A1 needs to hypothesize the likely actions of agent A2. In the Recursive Modeling Method, the actions of the other rational agents are anticipated using a model of their decision-making situation. If it were known that agent A2 is a rational decision-maker, then A2's decision making could be modeled as a payoff matrix as well. In our study, we considered a more realistic situation in which it is not known for sure that A2 is rationally minimizing the payoff as well.
For example, it could be that an accident has happened at A2, in which case there is no information as to what action it would undertake. Thus, there are two alternative models that A1 can use to model A2: one has the form of the payoff matrix, and the other contains no information about A2's action. We call the latter model the No-Info model. In RMM, each of the alternative models is assigned a probability indicating the likelihood of its correctness. Further, it is likely that A2 is modeling A1 as well, in his own attempt to coordinate with A1.

The resulting hierarchy of models, which we call the recursive model structure, terminates with a No-Info model when the agent (in this case A2) runs out of modeling information. Fig. 3 shows A1's model structure of depth three for the example scenario in Fig. 2.

Fig. 3. The recursive model structure for A1 (the numbers represent the numbers of waiting cars):

Level 1 (A1's own payoff matrix; rows are A1's actions, columns are A2's):
         a_0   a_1   a_2
  a_0     45    53    69
  a_1     50    52    80
  a_2     63    75    85

Level 2 (A1's two models of A2, believed with probabilities 0.8 and 0.2):
A2's payoff matrix (rows are A2's actions, columns are A1's)
         a_0   a_1   a_2
  a_0     45    50    63
  a_1     53    52    75
  a_2     69    80    85
and the No-Info model [1/3, 1/3, 1/3].

Level 3 (models A2 may be using of A1): [1, 0, 0], [0, 0, 1], and No-Info [1/3, 1/3, 1/3].

To summarize, level 1 in the recursive model structure represents the way that A1 sees his own decision-making situation, shown as A1's payoff matrix. Level 2 depicts the models A1 has of A2's situation, and level 3 contains the models that A1 anticipates A2 may be using to model A1. The recursive modeling could continue into deeper levels, but in this case we assumed that A1 has no further information. In other words, we are examining the reasoning of A1 in the particular case when he is equipped with a finite amount of information about A2 and nested to the third level of modeling. In general, the models have to differentiate further among the knowledge limitations of the different agents involved. For example, the situation in which A1 does not have any information about how he is modeled by A2 is different from the situation in which A1 knows that A2 has no information about A1. No-Info on level 2 in Fig. 3 represents the fact that A1 has no information on how A2 models A1's actions if A1 has undergone an unexpected change (e.g., an accident). No-Info on level 3 represents A1's belief that A2 has no information about how A1 models A2's actions, if A1 has had no unexpected change.

Since a modeling structure such as the one in Fig. 3 expresses the optimal choices in the agents' decision-making situations recursively, depending on the choices of the other agents, one can use dynamic programming to solve it in a bottom-up fashion. Here, we have designed and implemented a numerical method of solving the models using logical sampling, according to the algorithm below.

Let A_i = {a_i^1, a_i^2, ..., a_i^n} be the set of actions available to agent i, and let p_i = [p_i^1, p_i^2, ..., p_i^n] be the probability distribution over the available actions, i.e., the conjecture as to agent i's action. The conjecture p_i is calculated from the frequencies F_i = (f_i^1, f_i^2, ..., f_i^n) with which each action turns out to be optimal, as follows:

Input: the payoff matrix M_i of agent i.
Output: the probability distribution p_i over the actions of agent i.
Begin
    F_i <- (0, 0, ..., 0), N <- 0
    for each probability distribution p_N in the set of sampled probability distributions comprising the No-Info model:
        multiply the probability distribution p_N by M_i;
        select an action a_i^k which has the maximum expected utility;
        f_i^k <- f_i^k + 1, N <- N + 1
    end for
    p_i <- [f_i^1 / N, f_i^2 / N, ..., f_i^n / N]
    Return p_i
End
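A minimal C++ rendering of this logical-sampling solution is sketched below; it is ours, not the authors'. We assume the No-Info model is represented by a grid of distributions over the three actions (step 0.1, matching the sampling density used in the next paragraph), and that "maximum expected utility" corresponds to the minimum expected queue, since the payoffs count waiting cars. The exact frequencies depend on such details, so the sketch is not guaranteed to reproduce the paper's figures.

#include <array>
#include <vector>

using Matrix = std::array<std::array<double, 3>, 3>;

// Enumerate the probability distributions over three actions on a
// grid with the given step (the sampled No-Info model).
std::vector<std::array<double, 3>> sampleSimplex(double step) {
    std::vector<std::array<double, 3>> dists;
    const int n = static_cast<int>(1.0 / step + 0.5);
    for (int i = 0; i <= n; ++i)
        for (int j = 0; j <= n - i; ++j)
            dists.push_back({i * step, j * step, (n - i - j) * step});
    return dists;
}

// For every sampled conjecture p about the other agent, find this
// agent's best response under payoff matrix m (payoffs count waiting
// cars, so the best response minimizes the expected queue), and
// return the frequency with which each action turns out optimal.
std::array<double, 3> solveBySampling(const Matrix& m, double step) {
    std::array<int, 3> freq = {0, 0, 0};
    int total = 0;
    for (const auto& p : sampleSimplex(step)) {
        int best = 0;
        double bestQueue = 0.0;
        for (int a = 0; a < 3; ++a) {
            const double q = m[a][0]*p[0] + m[a][1]*p[1] + m[a][2]*p[2];
            if (a == 0 || q < bestQueue) { bestQueue = q; best = a; }
        }
        ++freq[best];
        ++total;
    }
    return {freq[0] / double(total), freq[1] / double(total),
            freq[2] / double(total)};
}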
The dynamic programming bottom-up solution starts at level 3. Using the sampling algorithm with a sampling density of 0.1 on the third level of the recursive model structure reveals that the probability distribution over A1's actions becomes [0.68, 0.32, 0.00] for a_0, a_1 and a_2, respectively. This probability distribution summarizes A1's knowledge as to A2's expectations about the actions of A1, if A2 is not incapacitated and if A2 thinks that A1 is not incapacitated. The dynamic programming bottom-up
solution then proceeds to level 2. The above distribution over the actions of A1 is combined with each individual probability distribution of the No-Info model on level 2, after the distributions are multiplied by the weights 0.8 and 0.2, respectively. The sampling procedure is invoked again, resulting in [0.22, 0.78, 0.00] over A2's actions a_0, a_1 and a_2, respectively. Thus, in spite of the uncertainties on the third level of modeling, A1's knowledge is enough to expect that, if A2 is rational, it will most likely attempt to execute action a_1. Propagating these results into level 1, the probability distribution describing A2's actions is obtained as the combination 0.8*[0.22, 0.78, 0.00] + 0.2*[1/3, 1/3, 1/3], which results in [0.24, 0.69, 0.07]. Now we can compute the expected utilities of A1's alternative behaviors as follows:

a_0: 0.24*45 + 0.69*53 + 0.07*69 = 52.20        (2)

a_1: 0.24*50 + 0.69*52 + 0.07*80 = 53.48        (3)

a_2: 0.24*63 + 0.69*75 + 0.07*85 = 72.82        (4)
Thus, if A1 is rational, he will attempt to maximize his own expected utility, which here means minimizing the waiting traffic flow, and will prefer to execute action a_0, given that it is most likely that A2 will simultaneously select action a_1. This calculation procedure is called at the beginning of each cycle.
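Up to the rounding of the distribution [0.24, 0.69, 0.07], the arithmetic in (2)-(4) can be checked mechanically; the snippet below is our illustration, reusing the level-1 payoff matrix of Fig. 3, not code from the paper.

#include <cstdio>

int main() {
    // Predicted distribution over A2's actions after the level-1
    // combination 0.8*[0.22, 0.78, 0.00] + 0.2*[1/3, 1/3, 1/3].
    const double p[3] = {0.24, 0.69, 0.07};
    // A1's level-1 payoff matrix (numbers of waiting cars).
    const double m[3][3] = {{45, 53, 69}, {50, 52, 80}, {63, 75, 85}};
    for (int a = 0; a < 3; ++a) {
        const double eu = m[a][0]*p[0] + m[a][1]*p[1] + m[a][2]*p[2];
        std::printf("a_%d: %.2f\n", a, eu);  // prints 52.20, 53.48, 72.82
    }
    return 0;  // a_0 attains the smallest expected queue
}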
4.4 Bayesian Belief Update

In an uncertain and ever-changing situation, such as the traffic scenario described above, it would be useful for an agent to be able to recognize that the other intersection has had a serious accident, and to adjust accordingly. Specifically, Bayesian learning can be used to update the agent's belief about the other agent by changing the probabilities associated with the other agent's models based on the observed behavior. Applying the formula described in Section 3 to our scenario, A1 has two alternative models of A2, as described above: M_12^1, the model of A2 under normal traffic conditions, and M_12^2, the model of A2 after a traffic accident has occurred. The beliefs about the models M_12^1 and M_12^2 are initially assumed to be 0.8 and 0.2, respectively (see Fig. 3). Consider the case of A1 observing that A2 executed the action a_1. The belief update about A2's model then proceeds as follows:

BEL(M_12^1) := P(M_12^1 | a_1) = P(a_1 | M_12^1) P(M_12^1) / P(a_1) = (0.78*0.8)/0.69 = 0.903        (5)

BEL(M_12^2) := P(M_12^2 | a_1) = P(a_1 | M_12^2) P(M_12^2) / P(a_1) = (0.33*0.2)/0.69 = 0.097        (6)

The Bayesian belief update can enhance the quality of coordination during subsequent interactions by correctly recognizing the models of the other agents from their behavior. In our case, the Bayesian belief update is invoked in every cycle to monitor changes in the traffic network and to adaptively adjust the coordinated actions.
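Using the hypothetical updateBeliefs sketch from Section 3, the numbers in (5) and (6) follow directly; the fragment below is our illustration and assumes the Model type defined there.

// The normal-traffic model M_12^1 predicts a_1 with probability 0.78
// (from the recursive model structure); the accident model M_12^2 is
// No-Info and assigns 1/3 to every action.
std::vector<Model> models = {
    {0.8, {0.22, 0.78, 0.00}},       // M_12^1: normal traffic, prior 0.8
    {0.2, {1/3.0, 1/3.0, 1/3.0}}     // M_12^2: accident (No-Info), prior 0.2
};
updateBeliefs(models, 1);            // A2 is observed to execute a_1
// models[0].belief is now about 0.903 and models[1].belief about 0.097.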
front of intersections. The result is listed in table I.
belief about the other agent by changing the
M I, M2 and M3 represent the multi-agent method
probability associated with the other agent's model
described in this paper, the network signal control
based on their observed behavior. According to the
adopting the hill-climbing method (Robertson, 1993),
571
and the fixed-sequence traffic signal control for a lone junction, respectively.

Table 1. The comparison of the average waiting cars' queue during each cycle for the three methods

Traffic flow entering the
experimental network (vehicles/hour)    M1       M2       M3
<1008                                   <0.05    <0.05    <0.05
1008                                     0.10     0.12     0.19
1152                                     0.15     0.17     0.24
1296                                     0.26     0.29     0.37
1368                                     0.34     0.48     0.75
1512                                     0.47     0.80     1.12
1656                                     0.77     1.08     1.95
1728                                     0.91     1.67     2.54
1944                                     1.23     2.10     3.38
2088                                     1.94     2.45     5.80
2186                                     2.58     3.87     6.78
2320                                     3.56     4.72     8.92
2548                                     4.03     6.43    11.20
2670                                     4.85     8.77    16.56
2858                                     5.60    10.27     *
3016                                     6.48    13.89     *
3120                                     7.86    15.73     *
3368                                     9.22    17.23     *
3420                                    10.26     *        *
3608                                    11.22     *        *

* The waiting cars' queue is very long, i.e., traffic congestion occurs.

The above results show that the performance of the multi-agent signal control based on RMM and the Bayesian belief update is better than that of the other methods. When the traffic flow is sparse, the multi-agent signal control has no obvious advantage, but when the traffic flow becomes more congested, its advantage is obvious. In particular, even when traffic congestion has occurred under M2 and M3, the method addressed in this paper can still control the traffic flow effectively.

6. CONCLUSION

This paper presented a study in modeling and coordination in the multi-agent distributed environment of urban traffic network control. This investigation implies that the Recursive Modeling Method can be applied to high-level tactical decision making for minimizing the waiting cars' queues in front of intersections. Through Bayesian learning, the agents can increase the quality of the coordinated decision making achieved within the RMM framework. Thus, a rational agent can recognize the models of the other agents correctly and coordinate with them effectively under unexpectedly changing circumstances. Because the existing methods of urban traffic control cannot meet the continuously increasing demand of traffic, and because the nature of the urban traffic network architecture is very well suited to the multi-agent method, we believe that building a multi-agent traffic control system is a new and promising way to solve the traffic problem.

REFERENCES

Anthony, M. and N. Biggs (1992). Computational Learning Theory. Cambridge University Press.

Burmeister, B., A. Haddadi and G. Matylis (1997). Applications of multi-agent systems in traffic and transportation. IEE Transactions on Software Engineering, 144(1):51-60.

Findler, N.V. (1991). Distributed control of collaborating and learning expert systems for street traffic signals. In: Lewis and Stephanou (eds.), IFAC Distributed Intelligence Systems, pp. 125-130. Pergamon Press.

Gmytrasiewicz, P.J. and E.H. Durfee (1995). A rigorous, operational formalization of Recursive Modeling. ICMAS-95, pp. 125-132.

Goldman, C.V. and J.S. Rosenschein (1996). Mutual supervised learning in multi-agent systems. Distributed AI, pp. 85-96.

Papageorgiou, M., J.M. Blosseville and H. Hadj-Salem (1989). Macroscopic modelling of traffic flow on the Boulevard Périphérique in Paris. Transportation Research, Vol. B23, pp. 29-47.

Robertson, D.I. (1988). TRANSYT: A Traffic Network Study Tool. Ministry of Transport, RRL Report LR253.

Weib, G. (1998). Introduction to Distributed Artificial Intelligence. MIT Press, Cambridge, MA.