Cooperative distributed problem solving for communication network management


Robert Weihmayer and Richard Brandau provide motivation for cooperative distributed problem solving and its application to communication network management

Much previous Distributed Artificial Intelligence research has sought either to bring identical agents into closely coordinated groups or to loosely coordinate the actions of dissimilar agents. The research described here explores close cooperation among heterogeneous agents, and is motivated by the requirements of a specific application in telecommunications network management: customer network control and joint private/public network management. In this domain, agents that manage the private and public networks must cooperate closely to provide satisfactory solutions to common network problems, yet they possess inherently distinct problem solving knowledge: private (or customer) networks are defined as logical networks constructed with the physical facilities provided by the public network. Thus, some of the network entities that define one agent's world knowledge are known by other agents at a different level of abstraction, creating a complex interdependence among agent problem solving activities. This paper provides some basic motivation for cooperative distributed problem solving and its application to communication network management in general, and reports on efforts to understand the nature of cooperation and the functionality of agents in the customer network control domain. In the process, the paper describes a three-agent facility failure problem and an associated interagent cooperation scenario, and presents a research testbed, TEAM-CPS, that explores cooperative problem solving and multiagent interaction.

GTE Laboratories Incorporated, 40 Sylvan Road, Waltham, MA 02254, USA

Keywords: communication network management, distributed artificial intelligence, TEAM-CPS, cooperative problem solving, multiagent interaction

TOWARD COOPERATIVE EXPERT SYSTEMS

A central objective for Distributed Artificial Intelligence (DAI) research lies in the hope of someday creating a 'society' of expert systems in which assorted computer programs can cooperate and share their local expertise in much the same way as human experts do in, for example, ad hoc consultative relationships. Among the technological barriers that prevent expert systems from joining such a society is their intolerance of diversity. The typical expert system can communicate with another expert system, if at all, only through the same narrow channels it uses to perceive and act on everything else in its environment; it cannot directly use the capabilities of other expert systems to augment its own reasoning or to broaden the scope of its problem solving.

As a specialized area of AI, DAI focuses on multiagent systems of intelligent problem solvers, whereas the AI mainstream deals with individual reasoning systems. Thus, DAI is mainly concerned with distributed concurrent approaches to problem solving by intelligent autonomous agents. Comprehensive surveys can be found in References 1-3. The Contract Net paradigm 4 and the Distributed Vehicle Monitoring Testbed 5,6 are important early research ideas and systems that strongly influenced DAI research.

0140-3664/90/090547-11 © 1990 Butterworth-Heinemann Ltd vol 13 no 9 november 1990

Much previous DAI research has sought either to bring identical agents into closely coordinated groups or to loosely coordinate the actions of dissimilar agents. The authors have also previously considered cooperation and conflict resolution for a homogeneous multiagent system in the inter-regional traffic management domain. A significant limitation of much of this work has been that all the interacting agents must possess nearly identical knowledge to coordinate their actions effectively. Each agent, presented with a given set of inputs, would perform the same as every other agent in the group. The Contract Net paradigm is an example of loosely coupled distributed problem solving where a heterogeneous collection of agents negotiate task-sharing arrangements. Note that in this terminology the concept of cooperation subsumes that of negotiation, and that the problem is bounded by assuming benevolent cooperation among agents.

The authors' research is motivated by the requirements of specific real-world telecommunications network management problems wherein the distributed problem solving and decision making processes appear to be better modelled as closely coupled cooperation between dissimilar agents. In fact the requirements of this class of problems are very close to those of ad hoc cooperation among diverse experts. This is why this paper attempts to narrow the gap between 'cooperative but homogeneous' and 'heterogeneous but nearly-independent' agent systems, and is driven by telecommunications network architecture as well as by DAI technology perspectives.

A useful metaphor for understanding this form of cooperative problem solving was elaborated in Brandau and Weihmayer 7: a cook and an architect want to jointly design a kitchen. Each approaches the kitchen layout problem from their own perspective and knowledge: the cook applies his functional knowledge of meal preparation, and the architect considers the electrical and mechanical infrastructure, construction codes, etc. Their behaviour is closely cooperative inasmuch as they need each other to arrive at a globally feasible solution: they exchange partial or complete layout proposals, critique each other, and attempt to compromise and converge on a plan that will satisfy their goals. This 'cook and architect' metaphor will be used here as an illustrative device.

Experience in building cooperative distributed problem solving agents in the telecommunication domain has led to the design and construction of successive simple experimental systems of tightly coupled agents, ranging from a collection of identical problem solvers 8 to the present system of dissimilar agents. The work to date is embodied in the TEAM-CPS (Testbed Environment for Autonomous Multiagent Cooperative Problem Solving) system. This work has emphasized 'domain level' heterogeneity, where agents are different because their local problem solving expertise is distinct and because each performs a different function in the domain. Although differences among agents in a DAI system may be justified by theoretical and practical considerations, there are significant costs associated with such differences. To maintain a comparable level of cooperation despite such diversity, both interagent communications and the reasoning that agents must do about other agents become more complex. Benefits have been observed from restricting this diversity 9 by requiring that agents possess common semantics of two sorts: knowledge of action effects, and knowledge of goal intentions.

This paper will attempt to clarify the many different aspects of this application of DAI to customer network control. The remainder of this section covers two important preliminaries: the relevance of cooperative problem solving in telecommunications network management and a domain description of customer network control. A description is then given of a simplified distributed architecture for customer network control along with an outline of the cooperative models introduced in that section: a three-agent facility failure and fault management scenario. This example is key to an understanding of why and how customer network control, as we define it, requires cooperation and what cooperation is in this context. Attempts are made to characterize agent functionality and knowledge on the basis of our analysis of cooperative scenarios. Finally, TEAM-CPS is discussed as a research testbed to explore the nature of agents that can sustain the types of dialogue of previous sections.

Application of DAI to telecommunications

Telecommunications network management covers a wide array of operations, administration and maintenance functions. Public Switched Telephone Network (PSTN) administrations are complex human organizations that support these functions. Expert systems and other decision support automation systems are increasingly being introduced to handle traffic management, facility maintenance, short and long range network planning, etc. 10. The functions performed by such systems are clearly connected at some level and could be performed cooperatively. It is important, however, to realize that contact among these functions is usually merely episodic and follows a loose task allocation pattern that is quite different from the cooperative behaviour exemplified by the 'cook and architect' metaphor discussed earlier. For example:

• A traffic management system diagnoses a congestion pattern in the network, takes corrective action and notifies a facility maintenance system.
• The facility maintenance system detects that some equipment failure was responsible, initiates repair, and notifies the traffic management system that repairs are underway. It may also notice high utilization patterns, which are reported to the long range planning system for future capacity expansion planning.

In the long run, these relatively infrequent encounters between experts represent a kind of cooperation that is valuable to an organization. To justify the expense, however, it is necessary to consider a domain where agents require more regular interaction. A specific class of network management problems is addressed in this paper, namely network traffic control and dynamic reconfiguration problems. Some instances of those are:

• Completely distributed network control schemes where multiple network manager agents negotiate to dynamically allocate bandwidth among their subnets, each of which carries different service types and uses different protocol suites.

computer communications



• Centralized network management agents, each with different functional expertise, e.g. design, setup and teardown of customer subnets on the one hand, and congestion control and dynamic capacity reconfiguration of customer subnets on the other.
• Distributed customer network agents that cooperate with a centralized public network manager to control traffic and allocate capacity to the customer subnets.

Using the above 'cook and architect' metaphor as a point of comparison, the first two problem types appear to be like teams of cooperating 'architects', and thus would solve problems at the same level of abstraction and from similar perspectives, meaning that there is inherently less heterogeneity in agent knowledge. The third architecture for automated customer network control involves the closely coupled cooperative behaviour of the 'cook and architect' metaphor. The two agent types in the third architecture correspond to a natural legal and organizational separation between a PSTN administration and its customers. They also reflect a natural categorization of network knowledge, yet cooperation between them is essential for satisfactory network performance.

Customer network control

Customer network control is an increasingly pervasive theme in telecommunications 11-13. With the emergence of generalized new services and automated service provisioning capabilities in next-generation telecommunication networks, customer network control refers to more than a specific service offering: it is likely to become an attribute of all future network services. This has immediate consequences for operations and management of a network in which multiple centres of control exist, in contrast to today's centralized network management practices. This is true both for lower layers of network management protocols and for the higher level decision making layers, where cooperative interactions among people, i.e. network managers, still play a dominant role. For next-generation networks to provide customer-driven automated service provisioning, they must also automate much of the higher level network management related decision making.

The customer network control marketplace is currently delivering a wide array of products and services 14. Many of these products are among the most complex offerings on the telecommunication market today, such as Virtual Private Network services. Customer network control technology is, however, in a transitional state. Technology and operations standards for integrated operations with the public network are still needed. Customer networks can include a diverse range of elements: voice/data switches, multiplexers and digital cross-connect systems, backbone transport networks, etc. Some facilities may be customer owned and some public network owned. Currently, the key attributes of customer network control are multiclass traffic control, dynamic load balancing and network reconfiguration, and access control of external traffic inbound to the customer networks.

A composite example of a private network with extensive customer network control would involve a corporate network that can reconfigure in real time to meet demand for multipoint video sessions, temporarily offload voice traffic to the public network, reassign incoming 800 traffic to various call answering centres, and balance its use of virtual private network service and dedicated leased public facilities. In the simplified model that is used as a basis for the example in the next section, customer network control refers to:

• the customer's ability to initiate reconfiguration of its topology,
• its ability to selectively use public facilities for traffic overflow,
• its ability to reroute wide area off-net to on-net traffic, hereafter referred to as '800 traffic', to new customer network access points.

All these capabilities can potentially interfere with the operation of the PSTN. Even the most advanced integrated network management packages on today's market do not support automatic decision making for these capabilities, and provide even less support for coordinating such decisions with PSTN operational management. Engineering research and development in this area is also in a transitional stage and is focusing on the infrastructure of customer network control. The work described here takes a different approach and instead emphasizes high-level network management of public and private customer network entities. The assumption is that customer network control as a service cannot be provided without the high level mechanisms that will maintain the global coherence of the network. Central to this approach is the realization that customer network control will introduce multiple centres of decision making, that this will lead to conflict, and that cooperation is the essential force that will lead to improved local and global decisions.

COOPERATION IN THE CUSTOMER NETWORK CONTROL DOMAIN

Two agent types, the private or customer network manager and the public network manager, are defined. An agent in this context is a computer program with autonomous reasoning, problem solving (e.g. an expert system), and communication expertise, that can participate in organized joint problem solving with other agents. Here, the Public Agent and the Customer Agent are denoted PA and CA, respectively; the Public Network and the Customer Network are denoted PN and CN.

Nature of the agents and organization of the domain

In customer network control, the CA and PA are two distinct types of cooperating agents, each managing some aspect of its respective network. CNs are logical networks embedded in a physical PN. The most general case of a multiagent system would include multiple PN and CN instances, which would require a network management architecture that supports multilateral cooperation among these agents, as shown in Figure 1. Here, a three-agent interaction is considered: two CAs cooperating with a single PA. Moreover, for simplicity, it is assumed that communication and cooperation flow between PA and CA but not between CAs. The two CAs would cooperate in the case of disjoint subnets belonging to the same corporation, for example.
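The statement that CNs are logical networks embedded in a physical PN can be made concrete with a minimal sketch. It assumes each logical link is carried by a path of physical PN trunks that only the PA can see, so a single trunk cut may fail several logical links at once while each CA observes only its own failed links. The names `LogicalLink`, `failed_links` and `ca_view`, and the link mapping itself, are hypothetical and do not reproduce the topology of the paper's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalLink:
    owner: str   # customer network that uses the link, e.g. "CN1"
    ends: tuple  # (PBX, PBX) endpoints, as the CA sees the link
    path: tuple  # physical PN trunks carrying it; known to the PA only


def failed_links(links, cut_trunk):
    """PA view: every logical link whose physical path uses the cut trunk."""
    return [link for link in links if cut_trunk in link.path]


def ca_view(links, cut_trunk, owner):
    """CA view: its own failed links as endpoint pairs, with no path detail."""
    return sorted(link.ends for link in failed_links(links, cut_trunk)
                  if link.owner == owner)


# Hypothetical logical-to-physical mapping (illustrative values only).
# Trunks are written as (low switch, high switch) pairs.
LINKS = [
    LogicalLink("CN1", (3, 4), ((3, 5), (4, 5))),
    LogicalLink("CN1", (4, 7), ((4, 5), (5, 7))),
    LogicalLink("CN2", (3, 4), ((3, 5), (4, 5))),
    LogicalLink("CN2", (1, 3), ((1, 2), (2, 3))),
]

cut = (4, 5)
print(len(failed_links(LINKS, cut)))  # the PA sees 3 failed logical links
print(ca_view(LINKS, cut, "CN1"))     # [(3, 4), (4, 7)]
print(ca_view(LINKS, cut, "CN2"))     # [(3, 4)]
```

In this sketch the two agent types hold the same links at different levels of abstraction: the CA works with endpoint pairs, the PA with trunk paths.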

Figure 1. Multiagent customer network control: domain model

Basic three-agent example

The CA is responsible for maintaining service in its network of privately owned and leased facilities, with voice traffic switched through Private Branch Exchanges (PBXs). It has special knowledge of the type of traffic that is using that capacity - specifically, it knows that some traffic (because of special bandwidth requirements) can only be carried on its network (e.g. 1.544 Mbit/s point-to-point T1 traffic), whereas other traffic (e.g. voice traffic) can be routed over the PN to overcome congestion or to compensate for facility failures in its network. Thus, at any time, it knows its network's demand for capacity at a greater level of detail (e.g. the priority of an individual videoconference) than does the PA. The CA is usually able to manage its network without interaction with the PA, but when necessary, it can express its requirements as desired capacity between network locations.

The authors' model of customer network control includes customer routable '800 traffic', or off-net to on-net traffic destined to dispatch or call-answering centres inside the CNs. This is a service whereby a customer is able to assign and reassign the routing of a logical 800 number to different access points in the CN to suit its own load balancing or internal organization. This ability can have potentially disastrous consequences on the PN: rerouting large amounts of call-in traffic under the control of private network managers, but not necessarily of their public network counterparts, creates unanticipated traffic patterns. Cooperative network management of such services is essential.

The PA performs traffic flow and congestion control for the public network traffic, which includes overflow traffic from the CN. It is also responsible for the physical implementation of the logical CN: what the CA sees as logical connectivity between network nodes, the PA knows at the level of individual physical facilities.

The local expertise of the CA and PA is the knowledge needed to incrementally solve the following basic CN design problems in response to congestion and/or facility failures:

• CN problem: Load balancing. Given offered multiclass traffic demand, find a least cost on-net versus off-net (overflow to the public network) load balancing policy.
• CN problem: Topological design. Given a feasible traffic demand to be carried, find a least cost logical routing and capacity assignment that meets performance constraints.
• PN problem: Physical trunk network routing. Given the CA's point-to-point logical circuit demands, find an acceptable-cost implementation of these circuits that meets physical circuit availability and circuit reconfiguration constraints.
• PN problem: Traffic routing. Given joint public network traffic and customer network overflow, find the best traffic and congestion control policy.

The solutions to these problems are expressed as sequences of operators or actions which are executable on the two networks. These sequences are also used in communicating plans to other agents. Our initial set of operators for the simple three-agent example includes Restore-trunk, Reconfigure-trunk, Lease-trunk, Route-initial-traffic, Reroute-800-traffic, and Overflow-traffic. A comprehensive treatment of this domain requires, of course, many more.

The expected behaviour of such a cooperative agent system is illustrated in a simple fault management scenario, i.e. a facility failure, such as a severed fibre optic cable, that affects all agents in the system. After such an event, each CA observes failure of some of its leased connections. It does not know or care where the fibre was cut - indeed, it never realized that the connections that are now failing were all carried by the same fibre. It does know that some of the failed connections (e.g. video) were special and cannot be rerouted to the public voice network. To make room for these special calls, the CA will try to reroute other (voice) connections to the public network. It may also reconfigure the logical connections in its network to better handle its highest-priority traffic, even though it does not know how the logical connections are physically implemented. The CA presents its capacity requirements to the PA, without detailing all the traffic priorities involved in its solution.

The PA must minimize the effect of the failure on its own network, while also trying to accommodate the CN requirements. The CA proposals, though reflecting its specialized traffic knowledge, will be naive with respect to the PA's special knowledge; initially, only the PA will know about congestion in its own network, so the CA's planned reroutes may need revision. Also, because the physical network implementation is known only to the PA, the CA's 'logical' reconfigurations can often be implemented more efficiently, freeing capacity on embedded links that were previously unknown to the CA.

A three-agent network problem provides a basis for the cooperative problem solving dialogue discussed in the next section. A PN topology with 10 switches supporting two CNs (CN1: 5 PBXs; CN2: 4 PBXs, managed by CA1 and CA2 respectively) is shown in Figure 2. Mapping of logical (CN) to physical (PN) facilities, as well as CN switches and their access points to the PN, are shown in Figure 3. The CN logical link capacities are also shown, in T1 equivalent units (standard unit of multiplexed link capacity; one T1 circuit can support 24 standard voice channels in North America). For simplicity, traffic matrices for the PN and CNs are not shown; nor are the PN capacity assignment and current trunk utilization. An example of the '800 traffic' pattern is shown in Figure 4. It is assumed that CA2 has an active call-answering centre at PBX 3 (connected to public switch 3) and that calls from the designated origins in the network are currently routed to that switch. CN1 is also assumed to overflow some of its traffic to the PN voice trunks between switches 3 and 4. The last important information is the topology and capacity assignment of a leasable pool of physical trunks from which the PA builds logical links, shown in Figure 5. This leasable capacity is distinct from the voice traffic carrying capacity of the PN and is dedicated to logical network design. An important element of this problem is that congestion in the public network will push the PA to request that some of the '800 traffic' be rerouted to another access point of the PN, switch 2, for example, to be subsequently routed back within the CN to the call-answering centre PBX connected to switch 3. This is a way in which the CA can help the PA with its congestion problem.

Figure 2. Three-agent example: basic topology

Figure 3. Physical-to-logical circuit mappings (panels: Logical Topology, Logical-to-physical Mapping; legend: Private Switch (PBX), Central Office Switch (Public), Connection to PN Switches)

Figure 4. Example of '800 traffic'

Figure 5. Pool of leasable capacity

Cooperation to solve a facility failure problem

The following three-agent interaction was patterned after a number of two-agent interactions that the authors simulated in English, playing the roles of 'agents', during the knowledge acquisition phase of this research. The purpose of this three-agent scenario is to illustrate how the automated agents are expected to cooperate and jointly discover global solutions to their mutual problems. When a physical link fails as shown in Figure 6, the following problems occur:

• The PA's automatic alternate routing capability overflows voice traffic to final routes which quickly get congested, as shown in Figure 6 (trunk groups 4-8*, 5-8, and 3-5). At this point congestion can only be relieved with manual traffic controls disseminated throughout the network (except traffic control plans).
• Each CN experiences logical trunk failures and blocking of voice and broadband traffic; CA2 experiences blocking of its '800 traffic' carried by the PN to the access point, switch 3; CA1 loses voice traffic between PBXs 3 and 4 that overflows to the PN between switches 3 and 4.

Figure 6. Impact of physical facility failure (panel label: PN congestion after failure)

*Simple notation to designate trunk groups in the PN and in the CNs. By extension CA1 may refer to 3 of his T1 circuits between PBXs 3 and 4 as 3x3-4 T1s.

This failure scenario creates a highly interdependent crisis management situation. The PN is heavily overloaded as a result of the failure. CN2 does not have enough internal capacity to work around it. CN1 has some options in developing a new topology and capacity assignment to temporarily restore sufficient capacity. Even a simple three-agent cooperative scenario can quickly become unmanageable; some level of synchronization is thus assumed between the agents. Node/switch labels would usually be different for CN and PN; here they are the same. However, it is usually clear from the context whether a CA refers to a PN switch i, or to the PBX j connected to PN switch i. The rationale behind the dialogue steps, i.e. the internal reasoning and local problem solving, is labelled 'Rationale' to distinguish it from the messages exchanged by the agents. This is how the agents solve the problem:

1. CA1-PA: I lost 3x3-4 T1s and 2x4-7 T1s, and the voice calls I overflow to the PN between switches 3 and 4 are blocked. I request that the T1s be restored and the voice circuits be made available for retrials (48 voice calls).

2. CA2-PA: I lost 3x3-4 T1s and 1x1-3 T1, and I have detected that my 800 traffic is blocking. I request that the T1s be restored and the voice circuits be made available for 800 traffic to access my PBX at 3.

Rationale: the agents initially request complete correction of the problem.

3. PA-CA1: Because of limited resources I can only partially restore your T1s; thus I tentatively propose to restore (plan ID: CA1P1): 2x3-4 T1s. Because of traffic congestion in the PN, I propose that you internally carry 50% of the overflow traffic between switches 3 and 4 (24 calls).

4. PA-CA2: Because of limited resources I can only partially restore your T1s; thus I tentatively propose to restore (plan ID: CA2P1a): 1x3-4 T1 and 1x1-3 T1. Because of traffic congestion in the PN, I propose (plan ID: CA2P1b) that you reset the 800 traffic access point to your PBX at 2 instead of 3; it will get there with only nominal blocking.

Rationale: the PA's problem solving focuses on integrating the new restoration and traffic routing goals. Only 4 T1s can be restored (4 physical trunk hop limitation on building logical trunks, ref. Figure 5) from the pool of leasable links and 9 in all are requested. Conflict resolution between the requests is performed tentatively, allocating the available capacity in proportion to the requests. The PA develops a manual traffic control plan P0 that relieves overall PN congestion by redistributing traffic away from the failure and by leaving room for '800 traffic' to switch 3. The tentative restoration plans have, in a sense, an information gathering purpose. A better allocation of the scarce resource may later be possible if the PA learns how constrained the other problem solvers are.

5. CA1-PA: I accept plan CA1P1. To make up for the shortfall, I propose to reconfigure some of my circuits. CA1P2: I get 1x3-4 T1 by reconfiguring 1x4-7 T1 and 1x3-7 T1;
I get 2x4-7 T1 by reconfiguring 2x4-8 T1, 2x8-9 T1s, and 2x7-9 T1 and by overflowing 24 voice calls between 8 and 9. I cannot carry the 24 voice overflow calls internally (those initially overflowing to the PN).

Rationale: CA1 has some flexibility because it operates at a lower level of link utilization. It reconfigures its remaining 4-7 T1 (we assume in-use trunks can be reconfigured) to get a 3-4 T1 and then reconfigures a 3-hop logical route to get back 2x4-7 T1s. At this point its initial goals are mostly met. The only event that could make plan CA1P2 infeasible is if the PA cannot carry the overflow between 8 and 9.

6. CA2-PA: I need at least 2x3-4 T1s more than you propose in CA2P1a. The only way I can reconfigure is by using my 2-3 capacity, but this use conflicts with your plan CA2P1b. According to this plan, I need the 2-3 capacity to route '800 traffic' from switch 2 to switch 3. I propose the following alternative (CA2P2b) to CA2P1b: since '800 traffic' is a problem, I will reassign it to switch 1 instead of switch 2; moreover, to satisfy my goal of reconfiguring 2x3-4 T1s, I propose (CA2P2a) to overflow 48 voice calls between 2 and 3, and I need 2x1-2 T1s and 2x1-4 T1s.

Rationale: CA2 is very constrained because it is operating at high utilization rates when the failure occurs. Reconfiguration is not yet possible because there is not enough voice traffic to overflow to the PN to free the requisite capacity. Moreover it cannot route '800 traffic' to its internal destination, the PBX attached to switch 3. It then proposes two approaches: (CA2P2a) a reconfiguration based on potential capacity freed on 2-3 and new T1 circuits supplied by the PA; and (CA2P2b) reactivation of a backup call-answering centre at node 1. This latter approach follows from the intent implicit in CA2P1b and satisfies CA2's own goals.

7. PA-CA2: Your proposal CA2P2b would create additional congestion in the PN. Since your reconfiguration proposal CA2P2a is not feasible, I propose (CA2P3a) to restore an additional 2x3-4 T1 in exchange for your support of my initial proposal, CA2P1b. I will assist by accepting the overflow voice traffic stated in CA2P2a and by providing an additional 2x2-3 T1s to internally carry '800 traffic' to 3.

Rationale: it is assumed that the PN voice traffic pattern is such that no traffic control plan will produce acceptable performance in the PN if '800 traffic' is routed to node 1. CA2P2b is thus a major 'threat' to the PN. The PA now understands that CA2 is constrained, that CA2P2a is not feasible from the PA viewpoint (ref. Figure 5) and that additional resources will have to be allocated to CA2. On the other hand, it also understands how it can work with CA1 to offset the reallocation of 2x3-4 T1s tentatively allocated in plan CA1P1. Notice here how the PA reuses an operator (overflow 48 voice calls between 2 and 3) formulated in the context of plan CA2P2a, to elaborate plan CA2P1b.

8. CA2-PA: OK, I accept your counterproposal; my problem is solved. Confirm reassignment of '800 traffic' to switch 2.

9. PA-CA1: I accept your proposal CA1P2. However, I must retract my earlier tentative allocation of 2x3-4 T1s capacity in CA1P1. I will instead offer (CA1P3a) to carry the full 48 voice calls that you initially overflow between 3 and 4, lease new logical circuits (CA1P3b): 2x3-7 T1, 2x4-8 T1, and 1x8-9 T1, and, to amplify on CA1P2, accept overflow traffic (CA1P3c): 48 voice calls between 7 and 9 and 24 voice calls between 8 and 9.

Rationale: the PA now works on a joint solution with CA1. A new traffic control plan P1 is generated to maximize available capacity for overflow of CN1 traffic at specific points. Moreover, the PA follows the reconfiguration pattern implicit in CA1P2 and proposes a collection of resources designed to meet CA1's initial goal in spite of the reallocation in favour of CA2.

10. CA1-PA: OK, I can use CA1P3b and CA1P3c to reconfigure missing circuits: 2x3-4 T1s. My problem is solved.

Even such an apparently simple example can easily become unwieldy. Plan infeasibilities or unresolved conflicts in steps 8 or 10 could have resulted in lengthy proposal/counterproposal cycles, and perhaps given rise to dialogue termination problems.

In this example it is seen that cooperation between agents has many of the features that we have associated with the 'cook and architect' behaviour: communication, goal sharing, compromise, discovery, conflict resolution and tradeoff in a context of dissimilar problem solving knowledge and viewpoint. Also, the CA, like the cook, can be seen as a specification-giver, where logical connectivities are analogous to, say, distance between appliances. The PA, like the architect, can be considered a constraint-satisfier, where physical network routing imposes constraints analogous to those of plumbing, for example. Also, like the cook and architect, these network agents share a language - in this case, of 'networks' and 'traffic' - which they can use to mediate between their different expertise.

AGENT KNOWLEDGE AND CAPABILITIES

This section considers some of the agent knowledge that could support the quality of interaction in scenarios such as the one above.
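As a worked illustration of the scenario above, the PA's tentative allocation in step 4 can be checked with a short calculation. Only 4 T1s can be restored against 9 requested (5 by CA1, 4 by CA2), and the capacity is allocated in proportion to the requests. The text does not state how fractional shares are rounded; the sketch below assumes a largest-remainder rule, and the function name `proportional_allocation` is hypothetical.

```python
from fractions import Fraction


def proportional_allocation(requests, available):
    """Split `available` whole units in proportion to each agent's request.

    Rounding uses the largest-remainder rule, which is an assumption of
    this sketch and not a rule stated in the scenario.
    """
    total = sum(requests.values())
    if total <= available:
        return dict(requests)  # every request can be met in full
    # Exact proportional share of each agent, kept as a fraction.
    quotas = {agent: Fraction(n * available, total)
              for agent, n in requests.items()}
    # Start from the whole part of each share.
    alloc = {agent: q.numerator // q.denominator
             for agent, q in quotas.items()}
    leftover = available - sum(alloc.values())
    # Hand remaining units to the agents with the largest fractional parts.
    by_remainder = sorted(requests,
                          key=lambda agent: quotas[agent] - alloc[agent],
                          reverse=True)
    for agent in by_remainder[:leftover]:
        alloc[agent] += 1
    return alloc


# Requests from steps 1 and 2: CA1 lost 3x3-4 and 2x4-7 T1s,
# CA2 lost 3x3-4 and 1x1-3 T1s.
requests = {"CA1": 3 + 2, "CA2": 3 + 1}
print(proportional_allocation(requests, available=4))  # {'CA1': 2, 'CA2': 2}
```

Under this assumed rule each agent receives 2 T1s, which agrees with the tentative plans CA1P1 (2x3-4 T1s) and CA2P1a (1x3-4 T1 and 1x1-3 T1).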
The broad functional categories of knowledge used are: local expertise, support for interagent language, focus-of-attention and dialogue control knowledge, and knowledge related to maintaining views of the other agent's problem solving.

Local expertise

An agent's 'identity' is established by its local expertise. Each agent's local knowledge and expertise corresponds to a 'mini' expert system, one that should cooperate with the other agent's expert system for their mutual benefit. Moreover, the reasoning mechanisms used by the local expertise determine the feasibility of cooperation for a particular domain. The local expertise differs from that of a stand-alone expert system in that it must take advantage of what is learned from the other agent. It does not merely direct attention to new data from the other agent - all expert systems respond to new input data. It must be able to make use of the other agent's goals and of what is transmitted about the state of the other agent's world, and integrate this with its own goal structure and view of the world. This information is then used to guide the expansion of the local search space, refocus local

vol 13 no 9 november 1990

problem solving strategies to look at new sets of subproblems, etc. It was found that specialized areas of local subproblem expertise, together with the capability to shift problem solving strategy during cooperation, led to an adaptive free flow of expertise from one agent to the other as a function of who is less constrained or has more to offer to global problem solving. The knowledge needed to shift strategies through the proposal/counterproposal cycle and to selectively exercise local expertise in the framework of a cooperative dialogue is contained in the knowledge categories described below.

What agents know about each other

A prerequisite for cooperation is knowledge about an agent's cooperation partners. This agent knowledge is of two varieties: first, a dynamic model of the changing state of the other agent, and second, knowledge about the enduring properties of the other agent and how they determine its role in the interaction.

State knowledge

It was found necessary to keep track of each other's approximate state of problem solving in our simulations of interagent dialogues. Much of this information came from the annotations, discussed in more detail in the next section, regarding how a proposal would reduce some outstanding goals of the proposing agent (for example, which failed link was being restored). This information was used to constrain cooperative problem solving to goals that are currently active and high priority for the other agent, to avoid proposing actions that would defeat higher priority goals of the partner, and to determine whether the interaction was converging to a solution.

Role knowledge

In addition to knowledge of the cooperation partner's problem solving state, an agent needs to know certain stable facts about the partner, such as its capabilities, strategies, and areas of expertise. In the network domain, the CA must know that the PA can supply replacement T1s, reconfigure physical connections, and accept overflow traffic, as part of its domain capabilities. The PA's expertise can be counted on to maximize the efficiency of physical routing, which will sometimes cause embedded connections to become available as a result of reconfiguration. The CA should also know that the PA has a goal of minimizing blocking of traffic in its own network, and has a strategy of accomplishing this by changing the location of overflow CN traffic. By knowing these things, the CA can make appropriate use of the PA's problem solving capabilities, and arrive at a better solution.
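The two kinds of partner knowledge described above can be sketched as a small data structure. This is an illustrative sketch only, not the TEAM-CPS implementation: the names `PartnerModel`, `CandidateAction` and `acceptable`, the numeric priorities, and the rule for comparing priorities are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerModel:
    """What one agent believes about its cooperation partner (hypothetical)."""
    # Role knowledge: stable facts, here just the partner's capabilities.
    capabilities: frozenset
    # State knowledge: active goal name -> priority (larger = more important).
    active_goals: dict = field(default_factory=dict)

    def note_annotation(self, goal, priority):
        """Record a goal learned from an annotation on a received proposal."""
        self.active_goals[goal] = priority

    def note_goal_satisfied(self, goal):
        self.active_goals.pop(goal, None)

    def converging(self, previous_count):
        """Crude convergence test: fewer goals outstanding than before."""
        return len(self.active_goals) < previous_count


@dataclass(frozen=True)
class CandidateAction:
    name: str
    requires: str        # capability the partner must have
    helps: frozenset     # partner goals the action would advance
    defeats: frozenset   # partner goals the action would undo


def acceptable(action, partner):
    """Apply the uses of role and state knowledge named in the text."""
    if action.requires not in partner.capabilities:
        return False  # role knowledge: partner cannot do this
    helped = [partner.active_goals[g] for g in action.helps
              if g in partner.active_goals]
    if not helped:
        return False  # addresses no currently active goal of the partner
    defeated = [partner.active_goals[g] for g in action.defeats
                if g in partner.active_goals]
    # Assumed rule: reject if a defeated goal outranks every goal helped.
    return not defeated or max(defeated) <= max(helped)
```

A CA holding such a model of the PA would filter its candidate proposals through `acceptable` before sending them, and could call `converging` after each exchange to judge whether the dialogue is making progress.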
Eventually, to support the goal of ad hoc consulting relationships among expert systems, agents will need to acquire their knowledge of other agents' roles as they interact with them. In other words, agents will need to 'get acquainted' with each other, so that they can make ideal use of each other's specialized expertise, even though they had only begun to interact.

Interagent language to support cooperation

Early efforts to determine requirements for intelligent agents in the customer network control domain led to a


Figure 7. Language definition (BNF subset)
[Figure: BNF productions for <Communication>, <Comm-Ident>, <Statement>, <Hypo-Act> (PROPOSE, DENY, CONFIRM, RETRACT), <Proposal>, <Prop-Clause>, <Agent>, <Action> (including REPLACE and LEASE of T1s between <Node>s), <Annote-Type> (BECAUSE, ENABLED-BY), <Priority-Expression>, <Annote-Ident> and <Goal-Ident>, together with the terminal NO-MORE-OPTIONS.]

knowledge acquisition process based on 'simulated' cooperation scenarios. Several natural language dialogues were generated by the authors, 'role-playing' a two-agent interaction between public and customer agents over an interactive computer mail facility. As a result, reference dialogues and an interagent machine language were developed that reflected those interactions. The human-readable representation of this language, a subset of which is defined in BNF (Backus-Naur Form) notation in Figure 7, was derived from transcripts of experiments that simulated various network scenarios. The three-agent cooperation scenario elaborated in a previous section was not derived through human cooperation experiments but rather constructed to reflect the style of interactions encountered in the two-agent experiments.

These investigations revealed that, beyond statements about network entity 'nouns' and network control 'verbs', there was a need to construct statements about these network-relevant action primitives. Specifically, a class of propositional attitudes ('hypothetical action verbs'), including PROPOSE, DENY, CONFIRM and RETRACT, was used for discussing our developing plans. In addition, it was found necessary to justify our proposals and to make clear which proposals depended on which others. Hence, 'annotations' in the form of simple goals and dependencies naturally became part of the language.

The purpose of a specification such as that shown in Figure 7 is to act as an evolving, human-understandable analogue of the language that will be used by our machine agents. It is also a useful device to use in simulating such dialogues because it constrains the agents' communication capabilities, and facilitates knowledge acquisition undertaken to operationalize the capabilities needed to sustain such dialogues. In fact, experimentation with this language,


i.e., having people use this machine language in solving the same kind of problems where natural language was previously used, has shown it to be limiting in some interesting ways:

Structured proposals. In a number of instances, it was desired to express dependencies among proposal clauses. In one illustrative episode, the PA could offer to restore either of two failed CN links, but because of shared resources could not restore both simultaneously. This simple dependency could be expressed as a logical XOR. In fact, however, the shared resource could be divided arbitrarily between two partial restorations, indicating that the appropriate expression of dependencies is a more complex constraint on the sum of the restored circuits in the two failed links. More generally, by communicating systems of these constraints, the transmitting agent would be able to share significant chunks of its search space. To share systems of constraints, however, not only would the language need to be expanded, but the agents' knowledge would need to be more sophisticated in several ways. For one thing, an agent would have to recognize that its partner could benefit from knowing some part of the local search space; this would require an extensive appreciation of the partner's current problem solving state. Secondly, the agent would need to recognize that some part of its search space can be coherently expressed by the constraints in the language; because the space is initially constrained by unshared local knowledge (e.g. the shared resource in the above example), it is not certain when there exists a sharable representation (e.g. the algebraic constraint on total restored circuits) nor, when multiple representations are possible, which is the best to communicate.

Traceability among proposals. It was found that BECAUSE and ENABLED-BY were merely indicative of the semantics required to link proposals. This is really an issue of pragmatics: it would be preferable if these inter-proposal link terms were closer to those used by humans, rather than their current primitive 'assembly-language' of justification.

Partial commitment. The <Hypo-Act> class of Figure 7 is very small. Notably, an agent can only DENY or CONFIRM another agent's proposal, or not respond to it at all. It is unlikely that these choices are sufficient to express the gradations of an agent's commitment to a proposal. Instead of CONFIRM-ing a proposal, an agent may wish to express an intention of following the proposal, before committing to it. More levels of agreement and commitment, and of disagreement or rejection, may be required to focus attention toward or away from a proposal.

Interrogation. In our simulations it was never necessary for an agent to ask his partner for information that wasn't contained in a response to a proposal. We never asked, for example, what the partner was trying to accomplish by a proposal, nor how good the currently-best solution was, nor how close we were to termination, nor any details of the partner's network state. One explanation for this absence of interrogation is the agents' heterogeneity: neither agent is expected to know how to interpret the network state of its partner, so there is no point in asking about it. Another explanation is our simplifying assumption that communication bandwidth between agents is unlimited; this places the responsibility on the transmitting agent to annotate each message with every possibly relevant observation that he believes the other agent can understand, so that the recipient can have no reason to ask any questions. Problems with this latter explanation, however, indicate that agents will eventually need to interrogate each other. First, of course, bandwidth is never infinite. More importantly, the transmitter's task of generating each appropriate annotation is impractical, as is the recipient's task of interpreting them all, inasmuch as these unfocused annotations amount to a nearly complete copy of the transmitting agent's current solution state. It might seem that we could overcome this impracticality - and still avoid interrogation - if the transmitter were to reduce the annotations to only those that are currently relevant to the recipient; these focused annotations, however, won't necessarily be complete unless the transmitter has perfect knowledge of the recipient's current solution state - again, impractical. Thus, to avoid sharing complete state information, agents must eventually be able to interrogate each other.
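To make the structured-proposals episode concrete, the sketch below contrasts the coarse XOR reading with the constraint on the sum of restored circuits, and shows one possible record for an annotated message. It is a sketch under stated assumptions: only the verbs PROPOSE, DENY, CONFIRM, RETRACT and the annotation types BECAUSE and ENABLED-BY come from the text; the class names, field names, link names and capacity figure are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class HypoAct(Enum):
    """The hypothetical action verbs named in the text."""
    PROPOSE = "PROPOSE"
    DENY = "DENY"
    CONFIRM = "CONFIRM"
    RETRACT = "RETRACT"


@dataclass(frozen=True)
class Annotation:
    kind: str    # "BECAUSE" or "ENABLED-BY"
    target: str  # identifier of a goal or of an earlier communication


@dataclass
class Message:
    """One communication; the field layout is an assumption, not Figure 7."""
    comm_id: str
    sender: str
    act: HypoAct
    clauses: list = field(default_factory=list)
    annotations: list = field(default_factory=list)


@dataclass(frozen=True)
class SumConstraint:
    """Circuits restored across `links` may not exceed `capacity` in total."""
    links: tuple
    capacity: int

    def allows(self, restored):
        total = sum(restored.get(link, 0) for link in self.links)
        return total <= self.capacity


def xor_allows(restored, links):
    """Coarser reading: exactly one of the links receives any restoration."""
    return sum(1 for link in links if restored.get(link, 0) > 0) == 1


# Hypothetical shared resource: two spare T1s usable on either failed link.
shared = SumConstraint(links=("link_a", "link_b"), capacity=2)
split = {"link_a": 1, "link_b": 1}  # one circuit restored on each link
```

With these definitions, `shared.allows(split)` holds while `xor_allows(split, shared.links)` does not, which is the point made above: the XOR form hides partial restorations that the sum constraint admits.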

Error correction. Because an agent may have imperfect knowledge of its partner, its messages may sometimes contain errors in, for example, the goals it ascribes to the partner. The partner should be able to correct these errors. The language needs to expand to express these corrections. In addition, the agents need to detect these errors, in the form of contradictions between an agent's local knowledge and messages from its partner. Not all such contradictions can be found, of course, but there should be a mechanism for disposing of those that surface.

These limitations show a clear path for further exploration.

Agent control functions

This notion of agent control includes both the related aspects of focus-of-attention control and dialogue control - or, equivalently, 'Think control' and 'Talk control'. Dialogue control is also very much related to the previous discussion on the interagent language, i.e. referring to the knowledge and mechanisms that sustain cooperative dialogues using that language. Moreover, the agent's decision to undertake a local problem solving process, to meet local or nonlocal goals, or to communicate partial results, proposals, requests, termination of dialogue, etc. clearly involves knowledge of the other agent's capabilities or state of its problem solving process, as discussed in a previous section.

The annotation mechanism appears to play a key role in the coordination of local problem solving focus and cooperative dialogue. Dialogue between our network manager agents should include annotations such as goals and inter-proposal dependencies. Many of the communications in our dialogues required reference to some previous communication. This ability to describe dependencies is important for efficient dialogue because it allows an agent to communicate part of a search space, rather than just a single option.*

*Earlier, we envisioned agents that could infer the other agent's intent, in order to extrapolate an implicit search space around his explicit proposals. This speech-act recognition would be difficult, and seemed more relevant to human discourse analysis than to the DAI thrust of building computer programs that explicitly cooperate; thus, the language was augmented to include explicit annotations.

Annotations can be generated by tracing the immediate goal structure of the proposal (or denial, or whatever is to be communicated), similar to a logic traceback for explanation. One subtlety is in knowing when some annotation is needed. This requires a model of how the current communication fits into the ongoing dialogue. If, for example, a new proposal is an option just in case a previous proposal is accepted, then this dependency must be communicated - unless the previous proposal has already been confirmed by the other agent. Or, if two proposals are mutually exclusive (rely on the same network capacity, for example) this must be communicated in an annotation - unless one of the earlier proposals has already been denied or retracted. Interpretation of the other agent's annotations is less complex because we insist on explicit annotations (see above note). This minimal interpretation amounts to extending the agent's internal problem structure with the dependencies, etc., expressed in the annotation.

TEAM-CPS: A RESEARCH TESTBED

TEAM-CPS is a recent research testbed to explore some of the ideas elaborated in this paper. A brief description of the agent architecture and a model for cooperation follows.

Planning issues in local agent problem solving

The requirements described in the previous section have served as a basis for the design of the TEAM-CPS agents. They use a blackboard framework [15] as their basic memory and reasoning paradigm. By implementing both the agent-control and domain planning mechanisms on the blackboard, we increase the degree to which these processes can be integrated.

Our agent problem solving mechanism is based on a STRIPS-like [16] planning mechanism. This includes state predicates to describe, for instance, initial and desired world states; primitive actions modelled by preconditions and postconditions; selection mechanisms for appropriate actions; and cycle-detection and backtracking methods. The planning mechanism is implemented in Knowledge Sources ('KSs', the blackboard equivalent of rules). These general-purpose planning KSs are supplemented in our agents with domain-specific KSs that know how to generate network management plans. These KSs work together to generate and search an AND/OR tree. Notably, though, the operators (goals and actions) in the tree are blackboard objects - visible to the planning KSs. Both tree expansion and search are accomplished by pattern-matching of relevant KSs against the current 'focus' node of the tree.

There are a number of issues that have influenced the structure of local problem solving for TEAM-CPS:

• Meta-level agent control. Control of an individual agent should include management both of its own planning and of its effects on other agents. In its most basic form this includes 'think/talk control' and search focusing based on nonlocal state information and models of other agents' nature and activities.
• Integration of nonlocal goals. This refers to how received goal nodes, with their nonlocal satisfaction level, are 'understood' and how they change the satisfaction of local nodes. This is part of the challenge of achieving shared action and intent knowledge among dissimilar agents.
• Partial goal satisfaction. Beyond the simple goal satisfaction logic that underlies classical planning, our domain requires that goals be metrically parametrized (e.g. fractional solutions on traffic and trunks) and partially satisfiable. At the level of planning mechanisms, the challenge is to handle partial satisfaction without compromising the current logic of goal tree management: maintaining the consistency of worlds (i.e. hypothetical world reasoning) and the ability to backtrack.
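The STRIPS-like ingredients listed in this section (state predicates, actions with preconditions and postconditions, cycle detection and backtracking) can be illustrated with a minimal planner. This is a generic textbook-style sketch, not the TEAM-CPS planner: it uses plain depth-first forward search rather than blackboard KSs or an AND/OR tree, it ignores partial goal satisfaction, and the predicate and operator names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A primitive action with preconditions and add/delete postconditions."""
    name: str
    preconditions: frozenset
    add: frozenset
    delete: frozenset = frozenset()

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.delete) | self.add


def plan(state, goal, operators, path=(), max_depth=10):
    """Depth-first forward search; returns a list of operator names or None."""
    if goal <= state:
        return []
    if state in path or len(path) >= max_depth:
        return None  # cycle detected, or depth bound reached
    for op in operators:
        if op.applicable(state):
            rest = plan(op.apply(state), goal, operators,
                        path + (state,), max_depth)
            if rest is not None:
                return [op.name] + rest
            # This branch failed: fall through and try the next operator.
    return None


# Hypothetical domain: a spare facility must be released before a failed
# link can be restored over it.
OPS = [
    Operator("release_spare", frozenset({"spare_reserved"}),
             frozenset({"spare_free"}), frozenset({"spare_reserved"})),
    Operator("reserve_spare", frozenset({"spare_free"}),
             frozenset({"spare_reserved"}), frozenset({"spare_free"})),
    Operator("restore_link", frozenset({"spare_free", "link_failed"}),
             frozenset({"link_up"}), frozenset({"link_failed", "spare_free"})),
]
START = frozenset({"spare_reserved", "link_failed"})
```

Calling `plan(START, frozenset({"link_up"}), OPS)` first releases the spare, then rejects `reserve_spare` because it would return to a state already on the current path, and finally backtracks to `restore_link`.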

Mechanism for cooperation

The structure of the agents and the mechanisms that support their interworking evolved from experimentation. A simple depiction of TEAM-CPS agent structure is shown in Figure 8i. Local plans are communicated as goal subtrees (Figure 8ii). This shared, goal-based knowledge representation allows agent control expertise (plan interpretation logic) to integrate a nonlocal goal subtree into the local expertise goal tree (Figure 8iii). The CA and PA have different local expertise and different internal network representations but share the same local problem solving mechanism. They are different but alike, homogeneous in some respects but heterogeneous in others.

As discussed above, goals and actions trigger local KSs that manage the expansion of the goal tree and execution of plan actions. Goal and expertise sharing involves goal and KS homomorphism. This means that common goals trigger KSs, in both agents, that correspond to locally specific expertise. This in fact implements a shared semantics of action and intent between our dissimilar agents. It is to be noted that not all goals should be understandable by all agents.
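The integration of a received goal subtree and the idea of goal and KS homomorphism can be sketched as follows. This is a simplified illustration under assumptions: `GoalNode`, `integrate`, `trigger` and all goal and KS names are hypothetical, KSs are reduced to plain functions, and only the immediate children of the received subtree are checked for interpretability.

```python
from dataclasses import dataclass, field


@dataclass
class GoalNode:
    name: str
    origin: str  # agent that created the node
    children: list = field(default_factory=list)

    def find(self, name):
        """Return the first node with this goal name, searching depth-first."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def integrate(local_root, received, known_goals):
    """Graft a received subtree under the local node of the same name.

    Returns the names of received goals the local agent cannot interpret.
    """
    anchor = local_root.find(received.name)
    if anchor is None:
        return [received.name]
    unknown = []
    for child in received.children:
        if child.name in known_goals:
            anchor.children.append(child)
        else:
            unknown.append(child.name)
    return unknown


def trigger(ks_table, goal_name):
    """Run the agent-specific KS registered for a shared goal name, if any."""
    ks = ks_table.get(goal_name)
    return ks() if ks is not None else None


# The same goal name triggers different local expertise in each agent.
CA_KS = {"restore_connectivity": lambda: "CA: remap logical circuits"}
PA_KS = {"restore_connectivity": lambda: "PA: reroute physical facilities"}
```

Here the shared goal name is the only thing the two agents need to agree on; what each does in response remains local. Goals returned by `integrate` as uninterpretable correspond to the remark above that not all goals should be understandable by all agents.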

CONCLUSION AND ONGOING WORK

The TEAM-CPS system and the underlying research is motivated by the requirements of a distributed AI architecture for customer network control. If customers are to have real control over their telecommunication services, then there is no centralized network management architecture that will fully meet these requirements.

A short-term goal for TEAM-CPS is to handle the general domain of multiple CAs and PAs interacting in complex problem solving situations. This would involve simultaneous congestion and fault management problems as well as reconfiguration service requests necessitating multilateral cooperation among all agents.

Our work to date has emphasized 'domain-level' heterogeneity, in which agents are different because their local problem solving expertise is distinct and because each performs a different function in the domain. This type of heterogeneity is in contrast to another form that is more general, complex, and difficult to achieve, which we call 'representational' heterogeneity. In this form, agents do not share the same internal representations and inference mechanism, in contrast to our current model of agent structure. If we assume that because of domain requirements the PA, for example, needs to be a case-based reasoner while the CA retains the structure described above, we then have an instance of multiagent representational heterogeneity. This presents a much harder problem that the authors have not yet addressed. It is to be noted that there are benefits, in terms of lower 'cost' of cooperation, in restricting agent diversity.

Our model of cooperative problem solving, as described previously, is evolving as a result of greater emphasis on abstract interagent languages, meta-level agent control, and opportunistic focusing; but more fundamentally as a result of more sophisticated knowledge representation of the cooperative process.

Figure 8. TEAM-CPS: basic local problem solving and interagent communication mechanism
[Figure: three panels. (i) Generic structure of agents with blackboard reasoning system, showing a Customer Agent (CA) and a Public Agent (PA), each with Agent Control, Plan Analysis, Local Expertise and KSs; Agent A finds a plan. (ii) Interagent communication (plan encapsulation and communication; plan interpretation logic); plans are goal subtrees representing proposals for actions and their justifications. (iii) Integration of nonlocal plan into local goal tree; also shown here is further local goal expansion from imposed goals.]

REFERENCES

1 Decker, K S 'Distributed problem solving techniques: a survey' IEEE Trans. Syst. Man, and Cybernetics SMC-17 No 5 (September/October 1987)
2 Gasser, L 'Distributed artificial intelligence' AI Expert (July 1989)
3 Durfee, E H, Lesser, V R and Corkill, D D 'Cooperative distributed problem solving' in Barr, A, Cohen, P R and Feigenbaum, E A (Eds) The Handbook of Artificial Intelligence Vol 4, Addison-Wesley, USA (1989)
4 Davis, R and Smith, R G 'Negotiation as a metaphor for distributed problem solving' Artif. Intell. Vol 20 No 1 (1983)
5 Lesser, V R and Corkill, D D 'The distributed vehicle monitoring testbed' AI Mag. Vol 4 (Fall 1983)
6 Durfee, E H Coordination of Distributed Problem Solvers Kluwer Academic Publishers, Boston (1988)
7 Brandau, R and Weihmayer, R 'Heterogeneous multiagent problem solving in a telecommunication network management domain' 9th Workshop on Distributed Artificial Intelligence/AAAI, Orcas Island, Washington (September 1989)
8 Adler, M, Davis, A, Weihmayer, R and Worrest, R 'Conflict-resolution strategies for non-hierarchical distributed agents' in Gasser, L and Huhns, M (Eds) Distrib. Artific. Intell. Vol II Morgan Kaufmann, USA (1989)
9 Weihmayer, R, Brandau, R and Shinn, H S 'Modes of cooperation: issues in cooperation among dissimilar agents' 10th DAI Workshop, Bandera, Texas, USA (October 1990)
10 Wright, J R and Vesonder, G T 'Expert systems in telecommunications' Expert Systems With Applications Vol 1 (1990) pp 127-136
11 Aidarous, S E, Ball, T, Tam, R and Biggs, D 'A distributed architecture for customer network control' Proceedings ICC 88 Philadelphia (11-15 June 1988)
12 Berman, R K 'Customer control and management in a multi-controller environment' Proceedings GLOBECOM 88, Hollywood, Florida (November 1988)
13 Rank, P H et al. 'End-user administration of an intelligent network' Proc. ICC Boston, Mass (11-14 June 1989)
14 Bauer, B T and Green, R G 'Opening the public network to customer access and control' Proc. ICC Boston, Mass (11-14 June 1989)
15 Engelmore, R and Morgan, T Blackboard Systems Addison Wesley, USA (1988)
16 Fikes, R E and Nilsson, N J 'STRIPS: A new approach to the application of theorem proving' Artif. Intell. Vols 2(3/4) (Winter 1971) pp 189-208

Robert Weihmayer received a BS in Electrical Engineering and a BA in Asian History from McGill University in Montreal and an MS in Computer Science from Boston University. From 1978 to 1982, he worked at Bell-Northern Research in Montreal as a Member of Scientific Staff. He subsequently joined GTE Government Systems and has been with GTE Laboratories since 1985. He has been active in applying combinatorial optimization techniques to strategic military networks and public switched communication network design and evolution planning. He is currently Principal Investigator and leads a project to apply distributed artificial intelligence problem solving models and techniques to network management and customer network control.

Richard Brandau received a BS and an MS degree in Experimental Psychology from Iowa State University. Prior to joining GTE Laboratories, Mr. Brandau was a Principal Engineer and Systems Analyst for Lockheed Electronics Company in Plainfield, New Jersey (1981-85, 1986-87). He provided technical leadership in design and implementation of expert systems, intelligent user interface and software projects. In addition, Mr. Brandau worked at Allied Signal as a System Software Engineer (1985-86). He joined GTE Laboratories in 1987 and has been involved in distributed AI research. He is currently working as the Principal Investigator of the case-based reasoning research project.
