AgentTest: A specification language for agent-based system testing


Marina Bagić Babac, Dragan Jevtić
University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, HR-10000 Zagreb, Croatia


Article history: Received 31 October 2013; Received in revised form 6 April 2014; Accepted 29 April 2014; Available online 21 June 2014

Abstract

AgentTest is a sublanguage of TTCN-3 aimed at specifying and testing agent-based systems. Its major strength is its unification and automation of abstract test suite generation and a structured testing methodology for agent-based systems. AgentTest enables formal and strongly typed modelling of agent-based systems and their unit and agent testing. This paper introduces the syntax and semantics of the AgentTest language and its modelling and testing methodology through the example of an agent-based weather system. The AgentTest methodology is also complemented with an evolutionary testing methodology for test case generation based on a multi-objective genetic algorithm.

Keywords: Agent-oriented software engineering; Model-based testing; Evolutionary testing; TTCN-3

1. Introduction

TTCN-3 is a testing language aimed at writing test suites for general purpose applications. However, the existing language constructs provide only partial support for testing certain domain-specific applications, for instance, agent-based system testing. In this paper, we take a small step towards TTCN-4, which could be a collection of domain-specific testing languages instead of being one general purpose test language. We introduce AgentTest as a sublanguage of TTCN-3, which means that AgentTest fully inherits TTCN-3 syntax and semantics. However, it restricts some of its constructs and adds new distinctive agent concepts to support agent-based system specification and testing.

Testing agent-based systems is a challenging task because these systems are distributed, autonomous, and deliberative. Agents operate in an open world, which requires context awareness. Our motivation for extending TTCN-3 to agent-based system testing is driven by the following:

- to the best of our knowledge, at present there is no testing language for agent-based systems;
- so far, there is no standardized structured testing process within existent AOSE methodologies;
- AgentTest allows agent (test) specifications at different levels of abstraction, so it is considered a new AOSE methodology for the specification and validation of agent-based systems;
- while most testing approaches cover only certain aspects of testing, AgentTest provides different aspects of testing such as functional and conformance testing, as well as unit and agent testing, with the potential to be used for integration and system testing;
- AgentTest provides us with test scripts at an abstract level, and therefore does not depend on any agent platform or implementation language;
- as AgentTest is built upon TTCN-3 as its sublanguage, it naturally benefits fully from TTCN-3;
- automated test oracle generation is considered a weakness of automated testing frameworks for agent-based systems; however, the TTCN-3 template matching mechanism overcomes this problem;
- AgentTest supports an automated testing process; however, it requires an extensive knowledge of TTCN-3;
- AgentTest/TTCN-3 provides us with an abstract test suite which clearly separates the testing code from the SUT, i.e. no augmentation code is inserted into the SUT;
- a measurement of test adequacy is built into the test system per se, i.e. by explicitly setting the test verdicts;
- besides the purpose of writing abstract test suites, AgentTest is also a new modelling language for agent-based systems which is formal and strongly typed;
- by means of rich TTCN-3 data type and template support, AgentTest supports semantic agent activities based on ACL (Agent Communication Language) message communication.

In order to develop such an agent testing environment, we are faced with different challenges. The first challenge is to provide unambiguous syntax and semantics for the new AgentTest language. The second challenge is to provide an agent operational infrastructure and implementation mechanism which is fully compatible with FIPA specifications for architectural requirements. As a consequence of the second challenge, the third challenge is the support of agent communication related issues via ACL messages. Fourthly, while testing agents, the outputs arising from the same inputs can be different for different executions, so it is important to evaluate not only the final results but also the adaptation of the agents under test over time. As a consequence, we need to collect sufficient and adequate data to properly evaluate an agent according to the defined criteria. Insufficient or inadequate test data can lead to incorrect conclusions. Therefore, we complement the AgentTest methodology with an evolutionary testing approach in that we generate test cases based on a multi-objective genetic algorithm. We define fitness functions to measure the quality of the chromosomes (test cases) based on the agent's efficiency and reliability soft-goals.

The outline of the paper is as follows. In the second section, we provide a brief survey of the state of the art in the field of testing autonomous agents and multi-agent systems, and a comparison of AgentTest to the eCAT and PDT test frameworks. In the third section, we introduce the AgentTest test system and configuration. In the fourth section, we explain the AgentTest methodology for test agent modelling with AgentTest syntax and semantics. In the fifth section, we provide the specification of an agent-based weather system by the current version of the AgentTest supporting tool. The testing environment is also complemented with test cases obtained from an evolutionary testing methodology. We have used NSGA-II, a multi-objective genetic algorithm, to obtain a test case population which enables testing the agent's autonomy. In addition, we have provided a simulation-based performance analysis to evaluate the AgentTest methodology and the behaviour of the weatherAgent. A discussion of the test results with a conclusion and some directions for future work are given at the end of the paper.

2. Related work

In this section we provide a brief overview of the state of the art in testing autonomous agents and multi-agent systems. Recently, there have been testing approaches that complement analysis and design methodologies with the idea that multi-agent system behaviour can be dynamically evaluated by providing as input a set of test cases that are derived from analysis and design artefacts. Research efforts have addressed various aspects of testing agent-based systems, e.g. the correctness of inter- and intra-agent communication, external and internal agent action workflow, generation of test cases, control of test executions, and test adequacy criteria. However, none of the approaches has provided an explicit language for testing.

Therefore, we first refer to the extensive work on the development and applications of TTCN-3, a general purpose testing language which we have extended to support the testing of specific features of agent-based systems. Willcock et al. [1] introduce TTCN-3 using the example of testing a DNS service and the SIP protocol throughout their whole work; [2] and [3] represent the standard TTCN-3 documents providing numerous small examples of each language concept. Stepien et al. [4], however, deal with the various applications of TTCN-3, particularly web testing [5], extending the language to an object-oriented structure in [6]. Wu-Hen-Chang et al. [7] try to popularize the use of TTCN-3 by means of annotations as a modelling language to support test generation tools. Kumar et al. [8] report on the practical applications of a model-based testing approach which leverages behaviour models to automatically generate a large set of test cases. Schieferdecker and Din [9,10] define a metamodel for TTCN-3 for integrating test as well as system development techniques.


Besides the continuous growth and development of TTCN-3 in recent decades, various approaches to testing agent-based systems are of great interest. Agents are difficult to test because it is notoriously complicated to observe their proactive, autonomous and nondeterministic behaviours and hard to judge their correctness in dynamic environments. Testing an agent-based system depends on the internal or external architecture of the system, agent properties, communication, protocols and/or agent units.

More recently, Padgham et al. [11] have developed a fault model based on the features of agent core units to capture the types of fault that may be encountered, and define how to automatically generate a partial, passive oracle from the agent design models. Their work is related to the Prometheus methodology, which is based on the AgentSpeak(L) specification language theory in Rao [12]. A weakness of this approach is the code augmentation for test harness generation and that only agent units are tested. Previous work in Zhang et al. [13–15] and Poutakidis et al. [16] details the process of generating test cases and outlines the overall testing process. All the mechanisms are fully automated, but allow for user input at various stages.

Recently, Wang and Zhu [17] have presented a tool called CATest, which the authors claim is only a part of CATE-Test, an automated testing framework for testing all agent and multi-agent levels, while CATest aims at agent unit testing. CATest is based on the SLABS formal specification language. It is used to check the correctness of an agent's behaviours recorded during test executions against formal specifications. Test adequacy is measured by the coverage of the specification and determined according to a set of adequacy criteria specifically designed for testing MAS. However, a disadvantage of the approach is that the invocations of the testing library methods are inserted into the source code of the AUT. We overcome this issue with AgentTest, as we separate the testing code from the AUT's source code. In addition, test case generation is left as an open issue, but most existing test automation frameworks have the same weakness. We overcome this issue using evolutionary testing for test case generation to complement AgentTest.

Similar to our work, Ekinci et al. [18] consider agent goals as the smallest testable units in MAS, and propose testing these units by means of test goals. Each test goal is conceptually decomposed into three sub-goals: setup, goal under test, and assert. The first and last goals prepare the pre-conditions and check the post-conditions respectively, while testing the goal under test. However, besides goals, we also consider roles, plans, actions and beliefs. The approach of Caire et al. [19] introduces new design artefacts that contain additional details which are used in testing. Their approach derives test cases from (additional) detailed design artefacts called multi-agent zoomable behaviour descriptions, which are based on UML activity diagrams. However, user intervention is required to derive test cases from the diagrams.

There has also been some work on test coverage criteria. Low et al. [20] proposed a set of coverage criteria defined on the structure of plans for testing BDI agents. Miller et al. [21] proposed test coverage criteria for agent interaction testing. However, most of the approaches inherit the fundamental weaknesses of object-oriented testing automation frameworks, i.e. manual coding of test classes, lack of support for the measurement of test adequacy and weakness in the support for correctness checking [17].

A survey on testing agent-based systems by Nguyen et al. [22] classifies papers on the basis of their testing objectives and test subjects into the following categories (here we highlight only a few relevant papers for each category):

- unit testing: Zhang et al. [13–15], Ekinci et al. [18], Nguyen et al. [23], Tyraky et al. [24];
- agent testing: Núñez et al. [25,26], Coelho [27,28], Gómez-Sanz et al. [29], Nguyen et al. [30,31], Lam and Barber [32], Gardelli et al. [33], Fortino et al. [34];
- integration (or group) testing: Serrano and Botia [35], Sudeikat and Renz [36];
- system (or society) testing: De Wolf [37], Mani et al. [38], Seo et al. [39];
- acceptance testing: Nguyen et al. [40].

Some of these papers go beyond testing only one of these categories and here we have indicated only their primary focus. The widest range of testing from this set is covered by the eCAT tool in Nguyen et al. [40].

In the context of the state-of-the-art methodologies, we have compared the most prominent existing model-based methodologies for testing agent-based systems to the AgentTest methodology. These are the Tropos and Prometheus methodologies, i.e. eCAT [40], which is built on top of the Tropos methodology, and PDT [15], which is built on top of the Prometheus methodology. We have focused on these two tools since they both provide a test framework for testing agent-based systems, as AgentTest does.

The eCAT framework uses hard-goals and soft-goals for deriving test cases from goal analysis. Soft-goals have no clear-cut definition and/or criteria as to whether they are satisfied. Soft-goals are useful to specify non-functional requirements. Goals are analyzed from the owner actor perspective through AND/OR decomposition; there is a means-end analysis of plans and resources that provide means for achieving the goal; and a contribution analysis that points out hard-goals and soft-goals that contribute positively or negatively to reaching the goal being analyzed. Goal–goal or goal–plan relationships are classified into elementary relationships and intermediate relationships. In order to test these kinds of relationships, the execution of the plan corresponding to a goal is triggered and checked through assertions on the expected behaviour [23].

In order to perform fault-directed testing, PDT identifies the units to be tested and possible points of failure for each unit that are independent of the implementation. An abstract testing framework for a plan unit has two distinct components, the test-driver and the subsystem under test. The test-driver component contains the test-agent, testing-specific message-events that are sent to and from the test-agent, and a plan (test-driver plan) that initiates the testing process. This plan is embedded into the subsystem under test as part of the code augmenting process. For testing a plan, the key units are the plan itself and its triggering event. For testing an event, the key units are the event and all plans for which that event is a trigger. For testing a plan cycle, the key units are all plans in the cycle and their triggering events [13].

A feature comparison of the PDT, eCAT and AgentTest methodologies is given in Table 1. While similar in many aspects, PDT and eCAT mainly differ in the initial phase of test case generation. While PDT takes the smallest agent decomposition units to provide test cases, eCAT focuses on the stakeholders' requirements and goals. This implies that PDT shows better performance for unit testing, while eCAT is better in agent and system testing. Compared to these two methodologies, AgentTest is a new concept in that it clearly separates the agent testing code from the agent implementation code. Therefore, it also has a different mechanism for test oracle generation. Obtaining the final verdict of an AgentTest test suite is completely automated, similar to that of TTCN-3, upon which AgentTest is built.

3. AgentTest test system and configuration

Following the generally accepted scope of the definition of an agent [41,42] and prominent agent-oriented software engineering methodologies like Prometheus, Tropos, Gaia and MaSE [43–46], we have chosen an agent modelling approach which focuses on the most commonly used concepts defining an agent, i.e. its autonomous, intelligent and social character, its BDI architecture and its proactive and reactive influence on the environment. Since AgentTest inherits the TTCN-3 testing environment, we have focused on modelling agent constructs in the spirit of TTCN-3. It is important to determine the agent concepts defining the agent itself and its interface with the testing environment. Therefore, an agent should obtain both testing characteristics and abilities that are common to ordinary agents.

AgentTest allows for the dynamic specification of concurrent test configurations with agents situated in a testing environment. A configuration consists of a set of inter-connected test (agent) components with well-defined communication ports and an explicit test system interface which defines the borders of the test system. The architectural viewpoint of such a testing environment provided by the execution of a test case example is shown in Fig. 1 (based on [47,52,53]). Test cases invoked from the TTCN-3 module's control part define the behaviour of the test system. In Fig. 1 there is an arbitrary collection of one Main Test Component (MTC), two Parallel Test Components (PTCs) and three Agent Test Components (ATCs), each having port-based connections with other components. As for an ordinary TTCN-3 system, there is at least one test component in the system, namely the MTC, and an arbitrary number of PTC and/or ATC components. The MTC is created by the system automatically at the start of each test case execution. The behaviour defined in the body of the test case is executed on this component [2]. Each agent is embedded in one ATC. An agent can act both as a test agent communicating with the Agent(s) Under Test (AUT, or SUT for System Under Test, as is common for TTCN-3 testing) or as a kind of mock agent performing the actions needed for testing against the AUT.

Table 1. Feature comparison of the PDT, eCAT and AgentTest methodologies.

Methodology's feature      | PDT            | eCAT           | AgentTest
AOSE methodology           | Prometheus     | Tropos         | TTCN-3
Levels of testing          | Unit           | System, agent  | Unit, agent, acceptance
Generation of test cases   | Design-based   | Evolutionary   | Evolutionary
Test case specification    | Tool dependent | Tool dependent | Tool independent
Test data generation       | Unit dependent | Goal dependent | TTCN-3 templates
Test execution/management  | Semi-automated | Semi-automated | Automated
Communication with SUT     | Direct         | Direct         | Encoders/decoders
SUT code augmentation      | Needed         | Needed         | Not needed
Test oracle generation     | Semi-automated | Semi-automated | Automated verdicts


Fig. 1. AgentTest test configuration.

Table 2. Agent test component specification.

type agentcomponent HelloAgentComponent {
  agentport agentTestPort;
  port ttcnPort;
}

The agent description in AgentTest uses the same syntax in both cases. The dotted directed arrows in Fig. 1 represent the process of component creation and its startup, e.g. the MTC creates one of the PTCs and one of the ATCs. The full lines between the pairs of circles represent the communication channels among the components. Communication between the test components, and between the components and the test system interface, is achieved via communication ports. The circles represent ports, which we do not distinguish by type in this figure. However, ATC components are allowed to use ports of the agent port type, as these are specifically designed to carry only ACL (Agent Communication Language) messages to be exchanged among communicating agents.

As agent communication is specific in that agents communicate only via message exchange, the reasoning regarding agent goals, plans and beliefs is mapped onto the ACL message processing. Therefore, an ATC is a structural frame which describes the agent interface, i.e. the agent ports towards its environment with other test agents and with the SUT. Agent ports differ from standard ports in that they support only message-based communication via ACLs. Consequently, the msg keyword can be omitted from the agent port type definition, as it is implicitly assumed. The rest of the agent port specification is the same as for standard ports. Therefore, an agent port can also be used on standard component types.

Being important and specific concepts of agent-based system architecture, we define an agent component and an agent port as special TTCN-3 types, and put the keywords agentcomponent and agentport into the AgentTest vocabulary. We allow the agent components to hold ordinary port types besides agent ports in order to comply with TTCN-3 component types, and to allow agents to communicate with third party entities in a generic manner. The agent component type is allowed to have its own constants, variables and timers. An example of how to use an agent port and an agent component type is given in Table 2, where an agent component HelloAgentComponent is specified (here, and for the rest of the AgentTest code in the paper, we have bolded the language keywords). This agent component is defined by the interface of two communication ports, one of which is a TTCN-3 port named ttcnPort, while the other is an agentport type named agentTestPort. An agent component is generally allowed to have multiple agent ports which we distinguish by their names, i.e. there are no restrictions on the number of connections an agent component may maintain. Allowed and disallowed connections comply with TTCN-3 configurations [2].

The actual configuration of standard and agent components and the connections between them is achieved by performing create and connect operations within the test case behaviour. The component ports are connected to the ports of the test system interface by means of a map operation. We extend the TTCN-3 set of component states {alive, running, done, killed} with additional agent component states, i.e. {initiated, active, transit, suspended, waiting}, to comply with FIPA agent states [48].
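As an illustration of how such a configuration might be assembled, the following sketch is ours rather than the paper's: it assumes that the standard TTCN-3 create, connect, map and start operations apply unchanged to agentcomponent types, reuses HelloAgentComponent from Table 2 for brevity, and the behaviour function a_peerBehaviour is a hypothetical placeholder.

// Hypothetical test case assembling a minimal agent test configuration.
testcase tc_helloConfiguration() runs on HelloAgentComponent system HelloAgentComponent {
  // create an additional agent test component (ATC)
  var HelloAgentComponent v_peer := HelloAgentComponent.create;

  // connect the two ATCs via their agent ports (ACL-only communication)
  connect(self:agentTestPort, v_peer:agentTestPort);

  // map the ordinary TTCN-3 port to the test system interface (towards the SUT)
  map(self:ttcnPort, system:ttcnPort);

  // start behaviour on the peer component and wait for its completion
  v_peer.start(a_peerBehaviour());
  v_peer.done;
}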

4. Specifying agents with AgentTest methodology

Each agent definition must refer to an agent component type on which the described behaviour is to be executed. As for test cases, this is done with a runs on clause in the agent definition. The AgentTest methodology describes agents in a top-down manner, indicating that the modelling process starts from a high level of abstraction, thus following the usual approach of AOSE modelling methodologies [54]. The methodology's functional flow graph is given in Fig. 2. The notation we use in Fig. 2 refers to common agent semantics; the Roles and Goals are the descriptions of agent behaviour, while the Plans and Actions are means to achieve them. The roles group a set of root goals, while the root goals themselves have a tree structure with a set of leaf goals, each related to one or more plans. Context refers to the current belief set of an agent, and the triggering Events refer to the atomic actions which initiate the change in agent behaviour.

Fig. 2. AgentTest methodology for an agent specification.

While an agent test component is an agent holder (or agent wrapper) describing its interface with the environment, agent dynamic behaviour is described in terms of its roles (or capabilities in the Prometheus methodology). Each agent lives in an organization (a group of ATCs) performing its dedicated roles. An example of an agent behaviour specification is given in Table 3. An agent named agentHelloWorld performs a role role_Hello of role type RoleHello and is situated in an organization named org_Hello of organization type OrgHello. The agent is executed in the HelloAgentComponent agent component. Here, we have also introduced new naming conventions for the role and organization instances, i.e. a role_ prefix for the role instance, and an org_ prefix for the organization instance.

Table 3. An example of agent behaviour specification.

agent agentHelloWorld runs on HelloAgentComponent {
  RoleHello role_Hello;
  OrgHello org_Hello;
}

Each role type is composed of a set of agent goals which are the descriptions of the actual tasks an agent can perform. The main difference between the role and goal type is that the role type is a means of grouping related goals together to achieve certain behaviour, while the goal type is a structured composition of subgoals and plans specifying a way to achieve a set of specific functionalities. While a role instance is merely a frame which perceives its goals as equals, a goal instance has its internal state, its priority tag and strongly defined relations with other goals and/or subgoals within the same role.

4.1. Agent goal hierarchy

Speaking in terms of the BDI agent architecture [41], goals are specifications of an agent's desires. A goal is a state of the system which the agent wants to bring about [12]. In AgentTest, a goal hierarchy is defined using a type goal section. This section holds a set of the goal's subgoals, which are also referred to as goals, and/or a set of agent plans to achieve this type of goal. A leaf goal is defined as a goal with no subgoals. Instead, a leaf goal is defined in terms of agent plans as a means of achieving the leaf goal functionality. Each (sub)goal is implicitly assigned a set of attributes which can be obtained using the following functions:

- getGoalState – returns the current state of the goal, i.e. one of the values from the set {idle, initiated, running, suspended, resumed, achieved, failed}; idle is the default value for the goal state variable;
- isLeafGoal – returns true/false, indicating whether the goal is a leaf goal in the goal hierarchy; true is the default value;
- isRootGoal – returns true/false, indicating whether the goal is a root goal (a goal with no parent goal) in the goal hierarchy; true is the default value;
- getParentGoal – returns the parent goal (of the goal type) of the goal; null is the default value;
- getChildrenGoals – returns the children goals (of the goal type), if any; null is the default value;
- priority – returns the priority status of the goal from the set of goal priorities {low, normal, high}; normal is the default value;
- getGoalVerdict – returns the goal verdict depending on the current state of the goal and its dedicated subgoals, plans, actions and beliefs; as for the TTCN-3 verdict assigned to each test, the goal verdict is also of the type verdicttype and obtains its value from the same set {pass, fail, inconc, none, error} [2]; none is the default value.

When specifying a goal hierarchy, these functions are implicitly called and the set of goal attributes is set to their default values unless specified otherwise. Examples of role type and goal type specifications are given in Table 4. Role type RoleAB achieves two goals, namely goal_goalA of goal type GoalA and goal_goalB of goal type GoalB. Goal type GoalA uses plan_A as a means of achieving the goal goal_goalA. Goal type GoalB is more complex in that it comprises two subgoals, namely goal goal_subgoalB1 of goal type SubgoalB1 and goal goal_subgoalB2 of goal type SubgoalB2. In addition, these subgoals are leaf goals and therefore contain plan references.

Table 4. Agent role and goal type specification.

type role RoleAB {
  GoalA goal_goalA,
  GoalB goal_goalB
}
type goal GoalA { plan_A }
type goal GoalB {
  SubgoalB1 goal_subgoalB1,
  SubgoalB2 goal_subgoalB2
}
type goal SubgoalB1 { plan_B1 }
type goal SubgoalB2 { plan_B2 }

Plan plan_B1 is a means of achieving goal goal_subgoalB1, and plan plan_B2 is a means of achieving goal goal_subgoalB2. The reason for specifying goals and roles as AgentTest types is that, in general, the same role can be obtained using different goal instances of the same goal type, and two different goal instances of the same goal type can hold different properties, e.g. a state or a priority, and so are distinguished by different name references. In addition, defining types instead of concrete values enables the reusability of role and goal types. With such a defined goal hierarchy it is possible to redefine role types depending on the purpose of individual agents, i.e. a redefined role type can hold both new and already defined ('old') goals. In this way, we enable the reuse of the goal types and their nested code, e.g. plans and actions, for other agent specifications. For instance, a new role type RoleABC can hold goals of goal types GoalA and GoalB with the addition of a new goal of goal type GoalC.
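A minimal sketch of such a redefinition (our illustration, not from the paper; GoalC and plan_C are hypothetical placeholders, while GoalA and GoalB are the types from Table 4):

type goal GoalC { plan_C }   // hypothetical new goal type with its own plan
type role RoleABC {
  GoalA goal_goalA,          // reused from Table 4
  GoalB goal_goalB,          // reused from Table 4
  GoalC goal_goalC           // newly added goal
}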

4.2. Agent plans, actions and beliefs

While agent roles are used for goal grouping, and a goal hierarchy represents an agent's architecture embedded within an agent test component, an agent's plan refers to the agent's dynamic behaviour. A plan is a behavioural unit which wraps a set of dedicated actions. We distinguish between different plan executions by their triggering events and the current agent mental state. In AgentTest, a plan consists of implicit triggering event(s) and explicit action(s) the agent should achieve or execute. The plan is considered a means of achieving the goal to which the plan is attached. In addition to actions, a plan can also have internal variable, constant and timer definitions which are visible to all actions within the same plan. An action reference is given in the plan definition scope, while an action body is written in a separate (action definition) section.

The simplest plan holds only one action which is invoked on plan execution. For instance, plan_Iota in Table 5 holds only one action reference, action_Iota(). However, a plan can embed an arbitrary number of actions which can be composed via logical and/or operators, depending on the goal fulfilment criteria. For instance, in order to satisfy plan plan_ABorC given in Table 6, both actions action_A() and action_B() must be executed because they are connected via an and operator. Alternatively, action_C() must be executed to achieve the same effect. However, during the plan execution it is possible for the plan to be interrupted or aborted, so it is considered fulfilled only upon the execution of all referred actions.

Table 5. An example of an agent plan specification.

plan plan_Iota {
  action_Iota()
}

Table 6. An example of a complex agent plan specification.

plan plan_ABorC {
  (action_A() and action_B()) or action_C();
}
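To illustrate the plan-local declarations mentioned above, the following sketch (ours; all names are hypothetical, and we assume ordinary TTCN-3 variable and timer syntax is permitted inside a plan body) combines a variable, a timer and two actions in one plan:

plan plan_collectReports {
  var integer v_reportCount := 0;   // visible to every action of this plan
  timer t_reportTimeout := 5.0;     // plan-local timer
  action_requestReport() and action_storeReport();
}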


Note that the plan specification contains only action references, while the actual action behaviour is defined in the separate (action) section. AgentTest plan modelling is reminiscent of class modelling from object-oriented programming, because a plan holds a set of properties (variables, constants, timers) and a set of 'methods' (i.e. actions) in the same way that a class does. However, unlike a class's methods, a plan's actions are related to each other and they are not explicitly invoked. Therefore, we use the term plan to distinguish it from traditional OO programming concepts.

Furthermore, each plan is assigned a plan state. A getPlanState function returns the value of a plan state from the following set of values: {idle, initiated, active, suspended, resumed, achieved, failed}. These values are implicitly assigned to a plan depending on the current execution state of the plan, but can also be explicitly assigned. Depending on the goal's and plan's states, an agent is assigned an agent state from the set of values {initiated, active, transit, suspended, waiting}, which are compliant with the FIPA specification [48].

Each plan is performed in a certain test system execution context. The context is defined in terms of agent beliefs, i.e. each agent holds an internal belief base. Beliefs define the agent's mental state, which is influenced by the environment, and these changes are perceived through agent ports via ACL messages [51]. It is also possible to influence an agent belief base through an agent internal action, in which case the port reference is not used, e.g. when an agent discards a belief on the basis of a timeout. AgentTest is compliant with the FIPA specification for agent semantics [49], so agent beliefs are based on first-order logic and propositions which hold a truth value, i.e. these propositions are either true or false. An example of an inform ACL message holding a belief is shown in Table 7. A performative field refers to an enumeration value from the fixed set of performative values (Appendix A), while the content refers to another message template (template reference) due to the complexity of the ACL semantic message (Appendix C).

Actions are situated at the core of a plan, and an action is the smallest behavioural unit whose execution cannot be interrupted. As for beliefs, actions are influenced by the environment and act upon the receipt of ACL message(s). For action specification we use an action keyword, by means of which we override the action definition in [2], where an action is loosely defined (it refers to an external event). In AgentTest, we use an action as a milestone for an agent plan definition. An action's name follows the action keyword, and rounded parentheses hold a set of the action's triggering events, each of which causes the action performance. That is, the receiving statements, as action arguments, define the events causing the action body to execute. These arguments are implicitly related by a logical OR relation, i.e. any of the listed arguments can cause the action performance. If the rounded parentheses hold no arguments, then the action is considered a sending action or an internal agent action. Although similar in that they both express a computational and behavioural unit, an AgentTest action and a TTCN-3 function are two different concepts and the main differences between the two are:

- a function is declared by formal parameters of any data type and optionally a return value, while an action is defined in terms of triggering events in agent ports;
- a function can return values, while an action does not return any value;
- a function is explicitly invoked for execution, while an action is implicitly invoked on triggering events, i.e. when a message arrives at a triggering port, an action removes the top message from the associated incoming port queue and processes it;
- a function can be invoked from within an action as a computational unit, while a function cannot invoke an action;
- a function can invoke another function or itself, while an action cannot invoke another action or itself;
- an action runs on an agent type component, while a function runs on any component type.

Table 7. An example of an ACL template specification carrying a belief.

template ACL m_informWeather := {
  performative := e_inform,
  content := m_weatherTodayRaining
}

The rest of the action body is a statement block, but can also contain function, altstep or testcase invocations. An example of a single-parameter action is given in Table 8. Here, an action named action_A has a single parameter specifying ACL message receipt at port agentportA, which is then stored in a variable named myMsg. Upon receipt of the specified message, the statement block following the action signature is executed. In this specific case, the internal action variable named internalState is declared and initialized to 0. The computation code of the if-statement is then performed and the action returns, allowing the next action to execute.

An action can also hold multiple parameters. An example of such an action is the two-parameter action shown in Table 9. Here, an action named action_AB is triggered upon receipt of either the message myMsg at the port named agentportA, and/or the message myHello at the port named agentportB. If these conditions are fulfilled, then the if-else statement block is executed, setting the appropriate test verdict. Besides ACL message triggering, an action can be invoked upon goal invocation, i.e. when an agent is assigned a task to fulfil a certain goal.

Note that we have used naming conventions for the goal, plan and action definitions, i.e. a goal_ prefix for the goal instance, a plan_ prefix for the plan instance, and an action_ prefix for the action instance. Their purpose is to improve the readability of the AgentTest code.

Once we have specified agent roles with a goal hierarchy and a full nested structure of subgoals, plans and actions, we can run our AgentTest test environment.

Table 8. A single-parameter action specification example.

action action_A (agentportA.receive(ACL:?) -> value myMsg) {
  var integer internalState := 0;
  if (myMsg.performative == e_inform) {
    internalState := 1;
  }
}
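A variant in the same style shows a TTCN-3 function invoked from within an action body, as permitted above (our sketch; f_isInform and action_countInform are hypothetical names):

function f_isInform(in ACL p_msg) return boolean {
  // plain TTCN-3 function used as a computational unit
  return p_msg.performative == e_inform;
}

action action_countInform (agentportA.receive(ACL:?) -> value myMsg) {
  // the function is called from the action; the reverse direction is not allowed
  if (f_isInform(myMsg)) {
    setverdict(pass);
  }
}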

During testing, test verdicts are assigned by test cases and test components. Additionally, each goal, plan and action embedded in an agent test component also influences the overall agent verdict. This dependency is shown in Table 10, where the goal/plan/action verdict columns refer to multiple goal/plan/action units within a single agent, e.g. taking any goal from an agent specification, any failed plan within that goal causes the goal to fail. The * symbol in the table indicates any of the possible verdict values. In addition to these dependencies, the test verdict assignment also depends on the state of its dedicated unit within an agent. The agent/goal/plan/action states through which an agent flows during its life cycle are shown in Fig. 3.

4.3. Agent execution

Independent of an agent's specific purpose in the test system, an agent executes its portion of internal actions while reacting to environmental changes. The internal agent architecture is shown in Fig. 4, where ACL message processing units are connected to support an internal agent execution triggered either by an ACL message incoming from another agent, or by goal invocation. Upon the arrival of a new message in the agent's MessageQueue, AgentTest takes a system snapshot in order to preserve the current state of execution. The message's performative indicates whether the message affects the current belief base (if the performative is from the InformACL set, e.g. the inform performative), and the belief set is updated accordingly. Next, the Goal Processor checks if the message affects the current goal execution. If it does not, then the execution proceeds without change. Otherwise, the Goal Processor rearranges the GoalQueue list according to the new goal priorities, and proceeds with execution. In addition, the Plan Scheduler determines whether the message has introduced any change of plans, implying the action (re)arrangement of the Action Selector. Based on the processing of the Goal Processor, Plan Scheduler and Action Selector, an agent might compose a new message to send as a reply to the received message. The newly composed message is then sent to the outgoing queue of the indicated agent port.
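The dispatching described above happens inside the AgentTest runtime, but its externally visible effect can be sketched with the action syntax of the previous subsection (our illustration; the belief variable and the reply template are hypothetical, while m_informWeather is the template from Table 7):

action action_processWeatherInform (agentTestPort.receive(m_informWeather) -> value v_msg) {
  // an inform performative updates the belief base before goal processing resumes
  v_beliefTodayRaining := true;           // hypothetical belief variable of the agent component
  // a reply composed by the agent is placed in the outgoing queue of the agent port
  agentTestPort.send(m_confirmWeather);   // hypothetical reply template
}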

Table 9. A multiple-parameter action specification example.

action action_AB (agentportA.receive(ACL:?) -> value myMsg;
                  agentportB.receive(ACL:?) -> value myHello;) {
  if (myMsg == myHello) {
    setverdict(pass);
  } else {
    setverdict(fail);
  }
}

Table 10. Agent verdict dependency on goal, plan and action verdicts (* denotes any verdict value).

Goal verdict | Plan verdict       | Action verdict     | Agent verdict
Pass         | Pass               | Pass               | Pass
Fail         | *                  | *                  | Fail
Inconc       | Pass, inconc, none | Pass, inconc, none | Inconc
None         | Pass, inconc, none | Pass, inconc, none | None
Error        | *                  | *                  | Error


Fig. 3. Unit state flow during an agent execution.

Fig. 4. An agent's internal architecture.

5. Evaluation of the AgentTest methodology

In this section we provide two aspects of the test results. The first aspect is the evaluation of the AgentTest methodology itself, and the second is the test analysis of an example of an agent-based system developed and verified by the AgentTest methodology. We have taken a weather system as a case study of an agent-based system specified and tested in AgentTest. The weather agent provides a weather service. It communicates with a number of weather servers in order to obtain a number of weather forecasts. Based on the collected information and its own table of reliabilities for each server, it calculates its own weather report. In order to test the weather agent, we have provided a test agent (named weatherAgent) and a testing environment in AgentTest. We have specified the weatherAgent, provided test oracles and evolutionarily generated test cases for the testing, and evaluated the results.

5.1. Specification of an agent-based weather system

In order to improve the readability of the AgentTest code, we have shown the goal hierarchy for the weatherInformer role in Fig. 5. The circled nodes represent the (sub)goals and the hexagons represent the means-end of the leaf goals, which are the plans for goal achievement. Fig. 5 corresponds to Table 11, which groups the three root goals (goal_get of goal type GetWeatherInfo, goal_maintain of goal type MaintainWeatherInfo and goal_disseminate of goal type DisseminateWeatherInfo) into the WeatherInformer role. The rest of the relevant AgentTest code for the weatherAgent is provided in Appendix B.

5.2. Test oracle generation

As AgentTest inherits TTCN-3, it provides us with TTCN-3 message templates as test oracles. The real power of TTCN-3 templates lies in their ability to specify multiple values or variations of a message within a single definition, which can then be used in a receive statement. During test case execution, whenever a test system receives a message, the template matching mechanism (which is implemented by the AgentTest/TTCN-3 tool) checks the incoming value against the expected message definition, specified by the template. This feature allows the user to handle complex messages by focusing only on the relevant parts and ignoring information which is not of interest in a given test situation [1]. Based on the test oracle matching, test verdicts at various levels are obtained.

Since agents communicate via ACL message exchange, the units for testing are extracted from and influenced by these messages. Simple agent actions do not require the complex content section of an ACL message. However, to support various kinds of action, semantic content is introduced [55]. For FIPA SL compliance [49], we have developed TTCN-3/AgentTest data types to support the semantic ACL content section. The full collection of data types is given in Appendix C (FIPA SL1 – Propositional Form). These types are used for semantic test oracle generation, i.e. ACL templates using wildcards, value lists, and value ranges when necessary.
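For instance, an oracle of that kind might be written as follows (our sketch; it assumes the ACL record has the two fields shown in Table 7, e_confirm is a hypothetical performative value, and the '?' wildcard and the value list are standard TTCN-3 matching mechanisms):

// matches any message whose performative is inform or confirm, with arbitrary content
template ACL mw_anyWeatherReport := {
  performative := (e_inform, e_confirm),  // value list: either performative matches
  content := ?                            // wildcard: any content is accepted
}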


Fig. 5. Architecture of weatherAgent: a goal hierarchy for the weatherInformer role.

Table 11. Role specification for weatherAgent.

type role WeatherInformer {
  GetWeatherInfo goal_get,
  MaintainWeatherInfo goal_maintain,
  DisseminateWeatherInfo goal_disseminate
}

Template referencing and template parameterization are also used when appropriate. An example of a simple test oracle for testing the weatherAgent is given in Table 12, where data types from Appendix C are used. For the specification of test agents and test oracles in AgentTest we have used our initial version of the AgentTest Tool (ATT), which has been developed in the Java programming language [50]. The tool supports the AgentTest methodology for testing agent-based systems. The tool's benefits are summarized as follows:

- the support of automated modelling, simulation and testing of agent-based systems;
- a user-friendly graphical interface which facilitates the agent specification and testing process for non-TTCN-3 experts;
- a hierarchical design of agent architecture allowing different levels of abstraction in the modelling process;
- a structured approach to agent-based specifications and testing;
- code modularity of agent, role and goal types;
- highly reusable code, e.g. a goal type is defined at the system level, so it can be reused multiple times by different kinds of agent, i.e. not only by different instances of the same agent type, but also by agents of different agent types; the same approach applies to defined actions, plans and role types;
- a clear separation of data types from control execution;
- besides test case and test component verdicts, verdicts are also obtained at the agent, goal, plan and action levels, and together with agent/goal/plan states indicate test results.

The agent action specification facility in ATT is shown in Fig. 6, while the agent specification process is illustrated in Fig. 7, where an action act_waitResponse is defined, and the plan name, action type, ACL and verdict are given in advance for the user to choose (because they have already been specified in the previous steps). The table at the bottom of the figure lists the already defined actions for the user to keep track of them. The current version of ATT supports the specification of agents and the trace of test execution which indicates verdict listings. However, in order to evaluate the AgentTest methodology with more than its own supporting tool, we also use other methodologies and tools to evaluate our approach. Besides ATT, the TTCN-3 code segments are also verified by the Loong testing tool [55]. Then, the overall methodology is verified by timed coloured Petri nets and CPN Tools [60]. Moreover, to enhance our approach we have used an evolutionary testing methodology for test case generation based on a multi-objective genetic algorithm.


Table 12. An example of a test oracle for weatherAgent.

template Term m_termToday := { termConst := { stringConst := "today" } }
template Term m_termRaining := { termConst := { stringConst := "raining" } }
template TermSequence m_todayIsRaining := { m_termToday, m_termRaining }
template Predicate m_predicateRaining := {
  predicate := "weather",
  terms := m_todayIsRaining
}
template AtomicFormula m_wffWeather := {
  predicate := m_predicateRaining
}
template ContentExpression m_contentRaining := {
  proposition := { atomicFormula := m_wffWeather }
}
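One way such an oracle might be exercised (our sketch; the embedding ACL template, port and action names are hypothetical, and we assume the content field of the ACL type is of the ContentExpression type from Appendix C) is to reference m_contentRaining from an ACL template and match it in a receiving action:

template ACL m_informRaining := {
  performative := e_inform,
  content := m_contentRaining   // the oracle template from Table 12
}

action action_checkForecast (agentTestPort.receive(m_informRaining) -> value v_reply) {
  // the template matching mechanism has already compared the incoming message
  // against the oracle, so reaching this point justifies a pass verdict
  setverdict(pass);
}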

Fig. 6. GUI of AgentTest tool: choosing to specify test agent's actions.

Fig. 7. GUI of AgentTest tool: specifying test agent actions.

5.3. Test case generation based on evolutionary testing

The syntax and semantics of AgentTest provide the testing code for the agents under test and the automated test execution within a predictable environment. However, the test case scenarios are mostly influenced by the tester, and AgentTest gives no indication of how to generate test scenarios that are more likely to detect faults in a real-time agent-based system. Moreover, an agent's autonomy feature is not completely covered by AgentTest. Due to its autonomous nature and nondeterministic decision-making, an agent may make different choices given the same subsequent situation. An AgentTest fixed oracle which evaluates the output of an agent may have difficulty in dealing with such varied responses. We need to include the agent's autonomy and adaptability while testing its responses. Therefore, we have complemented the AgentTest methodology with an evolutionary testing methodology in order to produce a dynamic testing environment and more specific test scenarios for the agents under test regarding their autonomy.

We have followed the approach from [56] where soft-goals play a key role in representing non-functional or quality stakeholder requirements, and test case fitness measures are derived from these requirements and used as evaluation criteria for automated test case generation. An example of stakeholders' soft-goals for the weatherAgent is given in Fig. 8. These are the efficiency and reliability soft-goals which give an agent a certain level of autonomy. These soft-goals have not been introduced by AgentTest as they specify less formal and not strictly defined qualities of an agent.


Fig. 8. Example of stakeholders' soft-goals for the weatherAgent.

We summarize the methodology for testing autonomous agents from [56] in the following steps:

1. Definition of quality functions as evaluation criteria
2. Encoding of test inputs
3. Evolutionary test case generation
   3.1. Generation of initial population
   3.2. Execution and monitoring
   3.3. Collection of observed data and fitness values calculation
   3.4. Reproduction
4. Analysis of output data

We have used a fast elitist multi-objective genetic algorithm, NSGA-II [57], for producing test case populations. Our experiment is based on the evolutionary algorithm for finding the optimal solution for multiple objectives, i.e. the Pareto front for the objectives [58]. The quality functions derived from the soft-goals have been used as objective functions to guide the search towards generating more challenging test cases. The population size and objective functions are identified as initial conditions. Then, meeting the stopping criteria or the total number of generations stops the algorithm [59]. During the reproduction phase, test cases that have good fitness values are selected. The crossover operation is used to produce new offspring and a mutation is applied with a certain probability to some selected offspring. As with natural evolution, selection is biased in favour of fitter individuals.

In order to test the efficiency and reliability of the weatherAgent, we have created an artificial dynamic timed environment to monitor the weatherAgent's adaptability to the changes. Agent knowledge about the world evolves, therefore it makes different decisions upon the same input triggering events. In addition, test cases form a population which is evolving over time, and according to evolution theory, the fitter individuals, i.e. those indicating greater probabilities of discovering distorted agent behaviour, are chosen for the next generation. Fitness functions evaluate each individual based on the calculation of its characteristics.

The efficiency soft-goal is a measure of an agent's weather-information update ratio, i.e. it indicates how often an agent refreshes the weather information it maintains. A fitness function $f_{delta}$ for the test case measurement of the weatherAgent's efficiency is defined as follows:

$$f_{delta} = \frac{1}{\sum_{i=1}^{n} (T_{now} - T^{i}_{last}) \, D_i}$$

where $T_{now}$ is the current model time while running the system, $T^{i}_{last}$ is the model time of the last update of the weather data item $D_i$, and $D_i$ is the weather data item. The longer the agent waits for the information update, the lower $f_{delta}$ will be. This is better from the testing perspective, as the agent's accuracy decreases and the test case forces the agent to exhibit faulty behaviour. There is also a threshold defined, i.e. a maximum time interval between two subsequent updates, denoting that the information an agent contains is considered obsolete.

The weatherAgent communicates with a number of servers, each providing it with weather information with a certain probability. Therefore, the second fitness function is defined as follows:

$$f_{weather} = \frac{n^2}{\sum_{i=1}^{n} q_i \, \sum_{i=1}^{n} D_i}$$

where $n$ is the number of servers an agent communicates with, $p_i$ is the probability value for the weather data item specified by the $i$-th server, $q_i = 1 - p_i$, and $D_i$ is a particular weather data item.

In meta-heuristic genetic search algorithms, each individual is encoded as a chromosome containing a set of genes. For testing an agent, we have encoded each environmental artefact by means of one gene, where each property of the artefact is encoded as one part of the gene. We have encoded each gene as a vector of real numbers in the specified range of values. Then, a test case, consisting of a set of all investigated artefacts, is encoded as a chromosome:

{configuration[n], probabilities[n], time[n], data[n]}

The goal of the evolutionary testing is to search for test cases such that $f_{delta}$ and $f_{weather}$ are optimal. Regarding the parameters used with NSGA-II, we have chosen the following values for our experiments:

Population size: 100
Maximum number of generations: 50
Mutation probability: 10%
Crossover probability: 90%

Fig. 9 shows the improvement of fitness values over 50 generations. The x axis represents values of the $f_{delta}$ function, while the y axis represents values of the $f_{weather}$ function over 50 test case generations. A generic multi-objective optimization solver has searched for non-dominated solutions that correspond to trade-offs between these two objective functions.

Fig. 9. Pareto front for the 50th test case generation.
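As a small numerical illustration of the two objectives (the values below are hypothetical and not taken from the paper's experiments), suppose the agent queries n = 2 servers with reliabilities p_1 = 0.8 and p_2 = 0.6 (so q_1 = 0.2, q_2 = 0.4), each data item is given unit weight D_1 = D_2 = 1, and the items were last updated 5 and 10 model-time units ago. Then

$$f_{delta} = \frac{1}{5 \cdot 1 + 10 \cdot 1} = \frac{1}{15} \approx 0.067, \qquad f_{weather} = \frac{2^2}{(0.2 + 0.4)(1 + 1)} = \frac{4}{1.2} \approx 3.33.$$

Longer update gaps push $f_{delta}$ further down, which, as explained above, corresponds to a more challenging test case.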

The obtained Pareto set has revealed certain test scenarios for weatherAgent fault detection, i.e. certain values of data items and time intervals for which the weatherAgent would exhibit the most vulnerable behaviour. The chance of detecting the same fault when only one of these objectives holds, regardless of the other, would be lower by default.

5.4. Evaluation results and discussion

We have evaluated the AgentTest methodology using timed coloured Petri nets [60]. The efficiency of the approach is evaluated by the simulation-based performance analysis of CPN Tools [62]. Using CPN monitors we have obtained typical performance measures, which include average queue lengths, average delays, and throughput. Simulation-based performance analysis also provides statistical investigation of output data, the exploration of large data sets, appropriate visualization of performance-related data, and estimating the accuracy of simulation experiments [60]. The simulation output presented in this section is based on the following settings of the input parameters for an agent: ACL message and goal arrival is modelled using a Poisson distribution; plan and action execution is modelled using a uniform distribution. The simulations have used independent and identically distributed data, which means that all simulations were run under the same initial conditions, and they all stopped when the same stop criterion was fulfilled. We have obtained these by means of simulation replications in CPN Tools.

The timed CPN model of the AgentTest methodology is shown in Fig. 10 and it follows our approach illustrated in Figs. 2 and 3, where the agent modelling and its internal architecture are shown, respectively. The Goal Processor, Plan Scheduler and Action Selector are decomposed into a set of CPN places and transitions. The places are given timed colour sets in order to measure the timed token flow throughout the model. Also, goal/plan/action transitions are given time stamps in order to simulate their processing capabilities. Input and output queues for the Goal Processor, Plan Scheduler and Action Selector enable the measurement of queuing and throughput parameters. Accumulation of tokens at various points of a token flow is also measured. The processing starts with an incoming token of an ACL message and, depending on the selected mode, the message is decomposed and processed according to the current execution of goal/plan/action units, or a new goal is invoked and the rearrangement of the current execution of goal/plan/action units is required.

The CPN model based on [61] for simulation-based performance analysis shown in Fig. 10 is used for the evaluation of the AgentTest methodology itself, i.e. to investigate properties and quality functions of our structured agent modelling and testing methodology. In addition, the CPN model has also served as an implementation of the weather agent we have tested with the AgentTest weatherAgent. The first set of results refers to quality functions for which we have obtained statistical data from the model using CPN monitors. As an example of obtaining such a data set, we have taken the Goal Processor as the agent unit with its own test oracle, verdict and state. We have simulated continuous incoming goal requests for the weatherAgent and its behaviour upon these requests.

Fig. 10. CPN model for simulation-based performance analysis of the AgentTest methodology.

242

M. Bagić Babac, D. Jevtić / Neurocomputing 146 (2014) 230–248

Depending on the time intervals between such requests, the time required to process each specific goal, and the time needed for inter-communication between the Goal Processor, Plan Scheduler and Action Selector, the Goal Processor exhibits varying time consumption per goal. The distribution of these periods is shown in Fig. 11, where the x axis shows the position of the goal in the incoming goal queue and the y axis shows the average model time needed for goal consumption. The results are based on ten simulation replications run to obtain independent and identically distributed values. It can be seen from Fig. 11 that, owing to new goal arrivals, occasional high peaks of goal processing time are observed. On those occasions the current goal is put on hold while a goal of higher priority is given precedence; the suspended goal continues its execution when resumed, but its processing time has increased.

Fig. 11. Average time occupation per goal in a Goal Processor unit.

To obtain statistically reliable results, we also need a way to determine the accuracy of the estimates. Confidence intervals are a commonly used technique for determining how reliable the estimates of performance measures are: the 90%, 95% and 99% confidence intervals are determined such that there is a 90%, 95% and 99% likelihood, respectively, that the true value of the performance measure lies within the interval. The average goal processing time for these confidence intervals is shown in Fig. 12.

Fig. 12. 90%, 95% and 99% confidence interval levels for the average time occupation per goal.
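
For reference, such replication-based confidence intervals follow the standard Student-t construction (a textbook formula, not anything specific to CPN Tools or AgentTest): for n independent replications with sample mean \bar{x} and sample standard deviation s, the (1 - \alpha) confidence interval is

\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}},

where \alpha = 0.10, 0.05 and 0.01 yield the 90%, 95% and 99% intervals of Fig. 12, with n = 10 replications as used above.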

Using the above approach, similar measures are obtained for the Plan Scheduler and Action Selector; indeed, any place and/or transition measure of the AgentTest CPN model can be represented in the form of Figs. 11 and 12.

Next, we have investigated the fault detection behaviour of the weatherAgent. We have introduced artificial faults and errors into our timed testing environment. We refer to failures caused by distorted agent behaviour as faults, and to failures caused by system problems as errors; both are caught by the weatherAgent. The faults are further classified into internal and external faults depending on whether the origin of the fault is the agent itself or another agent. Based on these faults, and the test cases obtained from the evolutionary testing, we have tested the weather system and obtained the results for the weatherAgent shown in Figs. 13 and 14.

Fig. 13 shows the failure detection curve during the execution of the weatherAgent. Faults were detected at three levels, i.e. internal and external failures (levels 1 and 2 in the figure, respectively) and errors (level 3 in the figure). The width of the steps in the figure denotes the average time before the next failure is detected. Internal and external failures are caused by an incorrect ACL message structure, e.g. TTCN-3 templates are violated, an unknown performative is received, or invalid/insufficient information is provided.

Fig. 13. Time intervals between action, plan and goal failure detection.
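
As an illustration of how the TTCN-3 matching mechanism catches such malformed messages, the following minimal sketch is ours and purely illustrative (the port type AclPort, the component type FaultProbe and the template m_validRequest are not the authors' omitted code): it classifies an incoming ACL message against an oracle template and sets the verdict accordingly.

// Hypothetical sketch of template-based fault detection on an ACL port.
// AclPort, FaultProbe, p_acl and m_validRequest are illustrative names only.
type port AclPort message { inout ACL; }
type component FaultProbe { port AclPort p_acl; timer t_guard; }

template ACL m_validRequest := {
  performative := e_request, senderID := ?, receiver := *, replyTo := *,
  content := ?, languageL := *, encoding := *, ontology := *, protocol := *,
  conversationID := *, replyWith := *, inReplyTo := *, replyBy := *
};

function f_expectValidRequest() runs on FaultProbe {
  t_guard.start(5.0);
  alt {
    [] p_acl.receive(m_validRequest) {
      setverdict(pass);   // structure and performative match the oracle
    }
    [] p_acl.receive(ACL:?) {
      setverdict(fail);   // decodable ACL value, but it violates the oracle
    }
    [] t_guard.timeout {
      setverdict(inconc); // no message at all within the guard time
    }
  }
}

The second alternative is what turns a violated template, an unknown performative or missing information into an explicit fail verdict instead of a silent mismatch.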


Besides their functional behaviour based on incoming message receipts and goal invocation, the agent units have also been tested for coverage. An action is tested to determine whether it gets handled, which is indicated by the action state and verdict. A plan is tested for whether it gets triggered in at least some situations, whether it completes, and whether it executes the actions that it should; these plan tests are based on the plan state and verdict, and on the action verdicts. Similarly, a goal is tested for whether it gets invoked in at least some situations, whether it completes, and whether it invokes the plans that it should; these goal tests are based on the goal verdict and state, and on the plan verdicts.

The detected faults are related to the goal/plan/action unit in which they occur. The fault detection broken down by unit location is illustrated in Fig. 14, where the points represent the moments at which a particular unit has detected a failure.

Fig. 14. Action, plan and goal failure detection.
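
The sub-verdict roll-up behind these unit tests can be pictured with a small helper that follows the usual TTCN-3 verdict precedence (fail overrides inconc, inconc overrides pass); the function below is our own illustrative sketch, not part of the AgentTest runtime.

// Hypothetical sketch: merging a newly computed unit (sub)verdict into the
// current agent-level verdict, following fail > inconc > pass > none.
function f_mergeVerdict(in verdicttype v_current, in verdicttype v_unit)
return verdicttype {
  if (v_current == fail or v_unit == fail) { return fail; }
  if (v_current == inconc or v_unit == inconc) { return inconc; }
  if (v_current == pass or v_unit == pass) { return pass; }
  return none;
}

Merging in this way each time an action, plan or goal verdict changes is consistent with the verdict behaviour reported for Fig. 15 below, where inconc is the most frequent intermediate value.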


This per-unit measurement is enabled by the verdict calculation of each goal/plan/action unit as it changes its life cycle states. The verdict calculation at the agent level obtained during the simulation is shown in Fig. 15. The points in the graph indicate the moments when a verdict calculation has occurred and the result the calculation has provided. The agent verdict is calculated each time a (sub)verdict of a certain agent unit changes, so we have obtained numerous moments of verdict calculation. The most dominant verdict is the inconc value, since the verdict is calculated many times before the final verdict is obtained and inconc is the most frequent intermediate value by default.

Fig. 15. Final verdict calculation.

6. Conclusion and future work

The contribution of this paper is a new agent specification and test language based on TTCN-3. The AgentTest methodology introduces a clear separation of test and implementation agent code. The test code is based on different kinds of unit verdicts from which the overall test verdict is obtained. The testing is directed via TTCN-3 send/receive operations, and test oracles are generated automatically from the message templates, as the built-in TTCN-3 matching mechanism provides a powerful means of receiving messages of various kinds. Based on the ACL message exchange among agents, we can follow the states and verdicts of the test units while tracing the agent execution. In addition, we have extended the AgentTest methodology with an evolutionary testing methodology in order to automate test case generation. The NSGA-II algorithm for test case generation has also provided a dynamically changing environment, so that the agent autonomy feature is tested intensively. The overall AgentTest methodology is verified by timed coloured Petri nets using the example of a weather agent-based system.

Although presented at the unit level of testing, the AgentTest language is aimed at writing abstract test suites at a wider range of testing levels, i.e. from unit and agent testing to integration and system testing of an agent-based system, which is to be considered for future work. In addition, to bridge the gap between the test agents and their implementation, runtime and control interfaces for AgentTest should be introduced.

Acknowledgement

This work was carried out as part of research project 0360362027-1640 ‘Knowledge-based network and service management', supported by the Ministry of Science, Education and Sports of the Republic of Croatia.

Appendix A. FIPA ACL message structure in TTCN-3/AgentTest

type record ACL {
  Performative performative,
  AgentID senderID optional,
  AgentSequence receiver optional,
  AgentID replyTo optional,
  Content content optional,
  Language languageL optional,
  Encoding encoding optional,
  Ontology ontology optional,
  Protocol protocol optional,
  ConversationID conversationID optional,
  ReplyWith replyWith optional,
  InReplyTo inReplyTo optional,
  ReplyBy replyBy optional
};

type enumerated Performative {
  e_accept_proposal, e_agree, e_cancel, e_call_for_proposal, e_confirm,
  e_disconfirm, e_failure, e_inform, e_inform_if, e_inform_ref,
  e_not_understood, e_propagate, e_propose, e_proxy, e_query_if,
  e_query_ref, e_refuse, e_reject_proposal, e_request, e_request_when,
  e_request_whenever
};

type record AgentID { charstring name };
type record of AgentID AgentSequence;
type charstring Language ("FIPA-SL-00008", "FIPA-CCL-00009", "FIPA-KIF-00010", "FIPA-RDF-00011");
type charstring Encoding;
type charstring Ontology;
type charstring Protocol;
type integer ConversationID;
type charstring ReplyWith;
type charstring InReplyTo;
type charstring ReplyBy;

Appendix B. Goals and plans of the weatherAgent specification in AgentTest

module WeatherAgent {

  type agentport RequestWeatherInfo { in ACL; }
  type agentport ResponseWeatherInfo { out ACL; }
  type port RequestServer { out Request; }
  type port ResponseServer { in Response; in WeatherData; }

  type agentcomponent WeatherAgent {
    port RequestServer requestServer;
    port ResponseServer responseServer;
    agentport RequestWeatherInfo requestWeather;
    agentport ResponseWeatherInfo responseWeather;
  }

  type goal GetWeatherInfo {
    EstablishConnectionToServer goal_establishConnection,
    GetData goal_getData
  }
  type goal EstablishConnectionToServer {
    SendRequest goal_sendRequest,
    ReceiveResponse goal_receiveResponse
  }
  type goal SendRequest { plan_initiate }
  type goal ReceiveResponse { plan_establish }
  type goal GetData { plan_get }

  type goal MaintainWeatherInfo {
    UpdateInfo goal_update,
    RemoveObsoleteInfo goal_remove,
    EditInfo goal_edit
  }
  type goal UpdateInfo {
    LocateChanges goal_locateChanges,
    AddChanges goal_addChanges
  }
  type goal LocateChanges { plan_search }
  type goal AddChanges { plan_change }
  type goal RemoveObsoleteInfo {
    LocateObsolete goal_locateObsolete,
    RemoveObsolete goal_removeobsolete
  }
  type goal LocateObsolete { plan_locate }
  type goal RemoveObsolete { plan_remove }
  type goal EditInfo { LocateEditable goal_locateEditable }
  type goal LocateEditable { plan_edit }

  type goal DisseminateWeatherInfo {
    ReceiveInfoRequest goal_receive,
    InfoRespond goal_respond
  }
  type goal ReceiveInfoRequest { plan_receive }
  type goal InfoRespond {
    AnalyzeRequest goal_analyzeRequest,
    DecideResponse goal_decideResponse,
    SendResponse goal_sendResponse
  }
  type goal AnalyzeRequest { plan_analyze }
  type goal DecideResponse { plan_decide }
  type goal SendResponse { plan_respond }

  plan plan_initiate { action_requestConnection(); }
  plan plan_establish {
    var Response v_response;
    action_connectToServer(responseServer.receive(Response:?) -> value v_response);
  }
  plan plan_get {
    var WeatherData v_weatherData, v_data;
    action_getDataFromServer(responseServer.receive(WeatherData:?) -> value v_data);
  }
  plan plan_search { action_findChanges(); }
  plan plan_change { action_addChanges(); }
  plan plan_locate { action_scanCurrentInfo() and action_locateObsoleteInfo(); }
  plan plan_remove { action_removeObsoleteInfo(); }
  plan plan_edit { action_editUpdatedInfo(); }
  plan plan_receive {
    var WeatherRequest v_requestInfo;
    action_receiveInfoRequest(requestWeather.receive(ACL:?) -> value v_requestInfo);
  }
  plan plan_analyze { action_analyzeRequest(); }
  plan plan_decide { action_decideResponse(); }
  plan plan_respond { action_respondInfoRequest(); }

  action action_requestConnection() { requestServer.send(m_request); }
  action action_connectToServer(responseServer.receive(Response:?) -> value v_response) {
    f_connect(v_response);
  }
  action action_getDataFromServer(responseServer.receive(WeatherData:?) -> value v_data) {
    v_weatherData := v_data;
  }
  action action_receiveInfoRequest(requestWeather.receive(ACL:?) -> value v_requestInfo) {
    if (match(v_requestInfo, m_requestInfo)) {
      f_receiveWeather(v_requestInfo);
    }
  }
  action action_respondInfoRequest() { responseWeather.send(m_weatherInfo); }
  action action_findChanges() { f_find(); }
  action action_addChanges() { f_add(); }
  action action_scanCurrentInfo() { f_scan(); }
  action action_locateObsoleteInfo() { f_locate(); }
  action action_removeObsoleteInfo() { f_remove(); }
  action action_editUpdatedInfo() { f_edit(); }
  action action_analyzeRequest() { f_analyze(); }
  action action_decideResponse() { f_decide(); }
}

Due to the large amount of space that they would take up, the accompanying templates and functions have been omitted from the above weatherAgent specification; the full code is available from the authors upon request. The templates are based on the data types in Appendices A and C.
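
Since the templates themselves are omitted, the following sketch is hypothetical (the template name m_exampleRequestInfo and all field values are ours) and only illustrates the kind of ACL oracle template, built directly on the ACL record of Appendix A, that the matching mechanism works against.

// Hypothetical illustration only: an oracle template for an incoming request.
// "?" requires the field to be present; "*" also allows it to be omitted.
template ACL m_exampleRequestInfo := {
  performative := e_request,
  senderID := { name := pattern "*Agent" }, // any sender whose name ends in "Agent"
  receiver := ?,
  replyTo := *,
  content := ?,
  languageL := "FIPA-SL-00008",
  encoding := *,
  ontology := *,
  protocol := *,
  conversationID := ?,
  replyWith := *,
  inReplyTo := *,
  replyBy := *
};

A receive statement parameterised with such a template then acts as the test oracle: any message that does not match can be caught by a separate any-value alternative and turned into a fail verdict.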

Appendix C. TTCN-3/AgentTest data types for ACL test oracles

The following types are provided to support the FIPA Semantic Language SL1 – Propositional Form. Templates based on these types are test oracles for inter-agent ACL communication.

type charstring FunctionSymbol;
type charstring PropositionSymbol;
type charstring PredicateSymbol;
type union NumericalConstant {
  integer integerConst,
  float floatConst
};
type union Constant {
  NumericalConstant numConst,
  charstring stringConst
};
type union Term {
  Constant termConst,
  TermSet termSet,
  TermSequence termSequence,
  ActionExpression actionExpression,
  FunctionalTerm functionalTerm
};
type record ActionExpression {
  Agent agent,
  Term term
};
type Term Agent;
type Term ParameterValue;
type set of Term TermSet;
type record of Term TermSequence;
type record Parameter {
  charstring parameterName,
  ParameterValue parameterValue optional
};
type record of Parameter ParameterSequence;
type set of Parameter ParameterSet;
type record FunctionalTermT {
  FunctionSymbol functionSymbol,
  TermSequence terms optional
};
type record FunctionalTermP {
  FunctionSymbol functionSymbol,
  ParameterSequence parameters optional
};
type union FunctionalTerm {
  FunctionalTermT termT,
  FunctionalTermP termP
};
type enumerated ActionOp { Done };
type record Result { TermSequence terms };
type record Predicate {
  PredicateSymbol predicate,
  TermSequence terms
};
type union AtomicFormula {
  PropositionSymbol proposition,
  Result result,
  Predicate predicate,
  BooleanAtom atom
};
type record ActionOpExpression {
  ActionOp operation,
  ActionExpression expression
};
type boolean BooleanAtom;
type union Wff {
  ActionOpExpression actionOp,
  AtomicFormula atomicFormula,
  UnaryLogicalWff unaryLogicalWff,
  BinaryLogicalWff binaryLogicalWff
};
type Wff Proposition;
type record of Wff WffSequence;
type union ContentExpression {
  ActionExpression actionExpression,
  Proposition proposition
};
type record of ContentExpression ContentExpressionSequence;
type ContentExpressionSequence Content;
type charstring UnaryLogicalOp ("not");
type charstring BinaryLogicalOp ("and", "or");
type record UnaryLogicalWff {
  UnaryLogicalOp operation,
  Wff wff
};
type record BinaryLogicalWff {
  BinaryLogicalOp operation,
  WffSequence wffSequence optional
};
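
To show how these types compose into an oracle for message content, here is a small, purely illustrative value (the predicate name and its arguments are invented for this example) representing the SL1 proposition (temperature Zagreb 21) as a Content template:

// Hypothetical example of a Content template built from the types above.
template Content m_exampleContent := {
  {
    proposition := {
      atomicFormula := {
        predicate := {
          predicate := "temperature",
          terms := {
            { termConst := { stringConst := "Zagreb" } },
            { termConst := { numConst := { integerConst := 21 } } }
          }
        }
      }
    }
  }
};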

References

[1] C. Willcock, T. Deiss, S. Tobies, S. Keil, F. Engler, S. Schulz, An Introduction to TTCN-3, second ed., John Wiley & Sons Ltd, Chichester, England, 2011.
[2] ETSI, Methods for Testing and Specification (MTS), The Testing and Test Control Notation version 3, Part 1: TTCN-3 Core Language, ETSI ES 201 873-1, V 4.5.1, 2013.
[3] ETSI, Methods for Testing and Specification (MTS), The Testing and Test Control Notation version 3, Part 4: TTCN-3 Operational Semantics, ETSI ES 201 873-4, V 4.4.1, 2012.
[4] B. Stepien, P. Xiong, L. Peyton, Using TTCN-3 as a modeling language for web penetration testing, in: Proceedings of the IEEE International Conference on Industrial Technology (ICIT), IEEE Computer Society Press, 2012, pp. 674–681. http://dx.doi.org/10.1109/ICIT.2012.6210016.
[5] P. Xiong, B. Stepien, L. Peyton, Model-based penetration test framework for web applications using TTCN-3, in: G. Babin, P. Kropf, M. Weiss (Eds.), E-Technologies: Innovation in an Open World, LNBIP 26, Springer-Verlag, Berlin Heidelberg, 2009, pp. 141–154.
[6] R.L. Probert, P. Xiong, B. Stepien, A life-cycle e-commerce testing with OO-TTCN-3, in: M. Núñez, Z. Maamar, F.L. Pelayo, K. Pousttchi, F. Rubio (Eds.), Applying Formal Methods: Testing, Performance, and M/E-Commerce, FORTE'04 Workshops Proceedings, LNCS 3236, Springer-Verlag, Berlin Heidelberg, 2004, pp. 16–29.
[7] A. Wu-Hen-Chang, G. Adamis, L. Erős, G. Kovács, T. Csöndes, A new approach in model-based testing: designing test models in TTCN-3, in: I. Ober, I. Ober (Eds.), Proceedings of the 15th International Conference on Integrating System and Software Modeling, LNCS 7083, Springer-Verlag, Berlin Heidelberg, 2011, pp. 90–105.
[8] B. Kumar, B. Czybik, J. Jasperneite, Model based TTCN-3 testing of industrial automation systems – First results, in: Proceedings of the 16th Conference on Emerging Technologies & Factory Automation (ETFA), IEEE Computer Society Press, 2011, pp. 1–4. http://dx.doi.org/10.1109/ETFA.2011.6059146.
[9] I. Schieferdecker, G. Din, A Meta-model for TTCN-3, in: M. Núñez, Z. Maamar, F.L. Pelayo, K. Pousttchi, F. Rubio (Eds.), Applying Formal Methods: Testing, Performance, and M/E-Commerce, FORTE'04 Workshops Proceedings, LNCS 3236, Springer-Verlag, Berlin Heidelberg, 2004, pp. 366–379.
[10] G. Din, TTCN-3, Model-Based Testing of Reactive Systems: Advanced Lectures, LNCS 3472, Springer-Verlag, Berlin Heidelberg (2005) 465–496.
[11] L. Padgham, Z. Zhang, J. Thangarajah, T. Miller, Model-based test oracle generation for automated unit testing of agent systems, IEEE Trans. Softw. Eng. 39 (9) (2013) 1230–1244, http://dx.doi.org/10.1109/TSE.2013.10.
[12] A.S. Rao, AgentSpeak(L): BDI agents speak out in a logical computable language, in: W. Van de Velde, J.W. Perram (Eds.), Proceedings of the 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW '96: Agents Breaking Away, LNCS 1038, Springer-Verlag, New York, 1996, pp. 42–55.
[13] Z. Zhang, J. Thangarajah, L. Padgham, Automated unit testing for agent systems, in: Proceedings of the 2nd International Working Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2007, Barcelona, Spain, 2007, pp. 10–18.
[14] Z. Zhang, J. Thangarajah, L. Padgham, Model based testing for agent systems, in: K.S. Decker, J.S. Sichman, C. Sierra, C. Castelfranchi (Eds.), Proceedings of the Eighth International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2009, vol. 2, 2009, pp. 1333–1334.
[15] Z. Zhang, J. Thangarajah, L. Padgham, Automated testing for intelligent agent systems, in: Proceedings of the 11th International Conference on Agent-Oriented Software Engineering, AOSE'10, Springer-Verlag, Berlin Heidelberg, 2010, pp. 66–79.
[16] D. Poutakidis, M. Winikoff, L. Padgham, Z. Zhang, Debugging and testing of multi-agent systems using design artefacts, in: R.H. Bordini, M. Dastani, S.A. El Fallah Seghrouchni (Eds.), Multi-Agent Programming: Languages, Tools and Applications, 2009, pp. 215–258.
[17] S. Wang, H. Zhu, CATest: a test automation framework for multi-agent systems, in: X. Bai et al. (Eds.), Proceedings of the 36th Annual Computer Software and Applications Conference (COMPSAC 2012), IEEE Computer Society, 2012, pp. 148–157.
[18] E.E. Ekinci, A.M. Tiryaki, O. Cetin, O. Dikenelli, Goal-Oriented Agent Testing Revisited, in: M. Luck, J.J. Gomez-Sanz (Eds.), Proceedings of the 9th International Workshop on Agent-Oriented Software Engineering (AOSE'08), LNCS 5386, Springer, Heidelberg, 2008, pp. 85–96.
[19] G. Caire, M. Cossentino, A. Negri, A. Poggi, P. Turci, Multiagent systems implementation and testing, in: Proceedings of the 4th From Agent Theory to Agent Implementation Symposium, 2004, pp. 14–16.
[20] C. Low, T.Y. Chen, R. Ronnquist, Automated test case generation for BDI agents, Auton. Agent Multi-Agent Syst. 2 (4) (1999) 311–332.
[21] T. Miller, L. Padgham, J. Thangarajah, Test coverage criteria for agent interaction testing, in: Proceedings of the 11th International Conference on Agent-Oriented Software Engineering, AOSE'10, Springer-Verlag, Berlin, 2011, pp. 91–105.
[22] C.D. Nguyen, A. Perini, C. Bernon, J. Pavón, J. Thangarajah, Testing in multi-agent systems, Agent-Oriented Software Engineering X, in: Proceedings of the 10th International Conference on Agent-Oriented Software Engineering, AOSE'10, Revised Selected Papers, LNCS 6038, Springer, Berlin, Heidelberg, 2011, pp. 180–190.
[23] C.D. Nguyen, A. Perini, P. Tonella, Goal-oriented testing for MASs, Int. J. Agent-Oriented Softw. Eng. 4 (1) (2010) 79–109.
[24] A. Tiryaki, S. Oztuna, O. Dikenelli, R. Erdur, SUNIT: a unit testing framework for test driven development of multiagent systems, in: Proceedings of Agent-Oriented Software Engineering, AOSE'06, LNCS 4405, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 156–173.
[25] M. Núñez, I. Rodríguez, F. Rubio, Testing of Autonomous Agents Described as Utility State Machines, Applying Formal Methods: Testing, Performance, and M/E-Commerce, LNCS 3236 (2004) 322–336.
[26] M. Núñez, I. Rodríguez, F. Rubio, Specification and testing of autonomous agents in e-commerce systems, Softw. Test. Verification Reliab. 15 (4) (2005) 211–233, http://dx.doi.org/10.1002/stvr.323.
[27] R. Coelho, U. Kulesza, A. von Staa, C. Lucena, Unit testing in multi-agent systems using mock agents and aspects, in: Proceedings of the 2006 International Workshop on Software Engineering for Large-scale Multi-agent Systems, SELMAS 2006, ACM Press, New York, 2006, pp. 83–90.
[28] R. Coelho, E. Cirilo, U. Kulesza, A. von Staa, A. Rashid, C. Lucena, JAT: a test automation framework for multi-agent systems, in: Proceedings of the IEEE International Conference on Software Maintenance, ICSM'07, IEEE Computer Society Press, 2007, pp. 425–434. http://dx.doi.org/10.1109/ICSM.2007.4362655.
[29] J.J. Gómez-Sanz, J. Botía, E. Serrano, J. Pavón, Testing and debugging of MAS interactions with INGENIAS, in: M. Luck, J.J. Gomez-Sanz (Eds.), Proceedings of the 9th International Workshop on Agent-Oriented Software Engineering, AOSE 2008, LNCS 5386, Springer, Heidelberg, 2009, pp. 199–212.
[30] C.D. Nguyen, S. Miles, A. Perini, P. Tonella, M. Harman, M. Luck, Evolutionary testing of autonomous software agents, in: Proceedings of the Eighth International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2009, 2009, pp. 521–528.
[31] C.D. Nguyen, A. Perini, P. Tonella, Ontology-based test generation for multi agent systems, in: Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2008, 2008, pp. 1315–1318.
[32] D.N. Lam, K.S. Barber, Debugging agent behavior in an implemented agent system, in: R.H. Bordini, et al. (Eds.), ProMAS 2004, LNCS (LNAI) 3346, Springer, Heidelberg, 2005, pp. 104–125.
[33] L. Gardelli, M. Viroli, A. Omicini, On the role of simulations in the engineering of self-organising MAS: the case of an intrusion detection system in TuCSoN, in: Proceedings of the 3rd International Workshop on Engineering Self-Organising Applications, 2005, pp. 161–175.
[34] G. Fortino, A. Garro, W. Russo, R. Caico, M. Cossentino, F. Termine, Simulation-driven development of multi-agent systems, in: Workshop on Multi-Agent Systems and Simulation, Palermo, Italia, 2006.
[35] E. Serrano, J.A. Botia, Infrastructure for forensic analysis of multi-agent systems, in: K.V. Hindriks, A. Pokahr, S. Sardina (Eds.), ProMAS 2008, LNCS 5442, Springer, Heidelberg, 2009, pp. 168–183.
[36] J. Sudeikat, W. Renz, A systemic approach to the validation of self-organizing dynamics within MAS, in: M. Luck, J.J. Gomez-Sanz (Eds.), Proceedings of the 9th International Workshop on Agent-Oriented Software Engineering, AOSE 2008, LNCS 5386, Springer, Heidelberg, 2009, pp. 31–45.
[37] T. De Wolf, G. Samaey, T. Holvoet, Engineering self-organising emergent systems with simulation-based scientific analysis, in: S. Brueckner, et al. (Eds.), Proceedings of the Third International Workshop on Engineering Self-Organising Applications, Utrecht, Netherlands, 2005, pp. 146–160.
[38] N. Mani, V. Garousi, B.H. Far, Testing multi-agent systems for deadlock detection based on UML models, in: Proceedings of the 14th International Conference on Distributed Multimedia Systems, DMS 2008, September 4–6, Knowledge Systems Institute, 2008, pp. 77–84.
[39] H.-S. Seo, T. Araragi, Y.R. Kwon, Modeling and Testing Agent Systems Based on Statecharts, Applying Formal Methods: Testing, Performance, and M/E-Commerce, LNCS 3236 (2004) 308–321.
[40] C.D. Nguyen, A. Perini, P. Tonella, eCAT: a tool for automating test cases generation and execution in testing multi-agent systems, in: Proceedings of AAMAS'08, 2008, pp. 1669–1670.
[41] M. Wooldridge, An Introduction to MultiAgent Systems, John Wiley & Sons Ltd, Chichester, England, 2002.
[42] V.S. Subrahmanian, P.A. Bonatti, J. Dix, T.R. Eiter, S. Kraus, F. Ozcan, R.B. Ross, Heterogeneous Agent Systems, MIT Press, Cambridge, London, 2000.
[43] Robby, S.A. DeLoach, V.A. Kolesnikov, Using design metrics for predicting system flexibility, in: L. Baresi, R. Heckel (Eds.), Proceedings of the 9th International Conference on Fundamental Approaches to Software Engineering, FASE 2006, Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, LNCS 3922, Springer, 2006, pp. 184–198.
[44] S.A. DeLoach, Multiagent systems engineering of organization-based multiagent systems, in: Proceedings of the 4th International Workshop on Software Engineering for Large-Scale Multi-Agent Systems, SELMAS'05, LNCS 3914, Springer, 2006, pp. 109–125.
[45] K.H. Dam, M. Winikoff, Comparing agent-oriented methodologies, Agent-Oriented Information Systems, LNCS 3030 (2004) 78–93.
[46] M. Morandini, C.D. Nguyen, A. Perini, A. Siena, A. Susi, Tool-supported development with Tropos: the conference management system case study, Agent-Oriented Software Engineering VIII, LNCS 4951, Springer, 2008, pp. 182–196.
[47] M. Ebner, TTCN-3 Test Case Generation from Message Sequence Charts, Workshop on Integrated-reliability with Telecommunications and UML Languages ISSRE04, WITU, 2004, pp. 1–17.
[48] IEEE FIPA, FIPA Agent Management Specification, FIPA TC Communication, No. SC00023J, 2002, 〈http://www.fipa.org/specs/fipa00023/〉, April 2014.
[49] IEEE FIPA, FIPA SL Content Language Specification, FIPA TC Communication, No. SC00008I, 2002, 〈http://www.fipa.org/specs/fipa00008/〉, April 2014.
[50] NetBeans homepage, 〈http://netbeans.org/〉, April 2014.
[51] IEEE FIPA, FIPA ACL Message Structure Specification, FIPA TC Communication, No. SC00061G, 2002, 〈http://www.fipa.org/specs/fipa00061/〉, April 2014.
[52] ETSI, Methods for Testing and Specification (MTS), The Testing and Test Control Notation version 3, Part 5: TTCN-3 Runtime Interface (TRI), ETSI ES 201 873-5, V 4.5.1, 2013.
[53] ETSI, Methods for Testing and Specification (MTS), The Testing and Test Control Notation version 3, Part 6: TTCN-3 Control Interface (TCI), ETSI ES 201 873-6, V 4.5.1, 2013.
[54] F.L. Bellifemine, G. Caire, D. Greenwood, Developing Multi-Agent Systems with JADE, Wiley Series in Agent Technology, John Wiley & Sons Ltd, Chippenham, Wiltshire, 2007.
[55] Loong Testing homepage, 〈http://loong-testing.software.informer.com/〉, April 2014.
[56] C.D. Nguyen, S. Miles, A. Perini, P. Tonella, M. Harman, M. Luck, Evolutionary testing of autonomous software agents, Auton. Agents Multi-Agent Syst. 25 (2) (2012) 260–283.
[57] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evolut. Comput. 6 (2) (2002) 182–197.
[58] A. Seshadri, A fast elitist multiobjective genetic algorithm: NSGA-II, MATLAB Central, 2006.
[59] N. Srinivas, K. Deb, Multiobjective optimization using nondominated sorting in genetic algorithms, Evolut. Comput. 2 (2) (1994) 221–248.
[60] K. Jensen, L.M. Kristensen, Coloured Petri Nets, Springer-Verlag, Berlin Heidelberg, 2009.
[61] M. Purvis, M.S. Cranefield, Agent modelling with Petri nets, Information Science Discussion Papers Series, University of Otago, 1996.
[62] CPN Tools homepage, 〈http://cpntools.org/〉, April 2014.

Marina Bagić Babac works as a teaching and research assistant at the University of Zagreb, Faculty of Electrical Engineering and Computing, where she obtained her Dipl. Ing., M.Sc. and Ph.D. degrees in Computer Science in 2001, 2004 and 2009, respectively. She also holds an M.Sc. in Journalism from the University of Zagreb, Faculty of Political Sciences. Her specific field of research is formal methods for the specification and verification of software systems. She is a member of IEEE Communications Society and KES International.

Dragan Jevtić is a Professor at the University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Telecommunications, where he obtained his Dipl. Ing., M.Sc. and Ph.D. degrees in 1984, 1987 and 1997, respectively. His research interests are knowledge-based and intelligent systems with applications in telecommunication services. He is a member of the IEEE Communications Society and KES International.