Computer Standards & Interfaces 28 (2006) 412 – 427 www.elsevier.com/locate/csi
Iterative automatic test generation method for telecommunication protocols

G. Kovács*, Z. Pap, G. Csopaki, K. Tarnay
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117, Magyar tudósok útja 2, Budapest, Hungary

Received 8 February 2005; received in revised form 26 April 2005; accepted 26 April 2005
Available online 23 June 2005
Abstract

Standardized languages used for protocol specification provide an excellent basis for both automatic and manual test generation. Test generation is composed of two steps: the derivation of test cases from the specification, and the selection of the test cases to be included in the final test suite in order to reduce its execution time. This article proposes a new method that aims to decrease the total number of test cases generated automatically by a test derivation algorithm and, at the same time, to reduce the computation requirements of the test selection procedure. It creates an iteration cycle on the model of evolutionary algorithms, in which test derivation and selection are done simultaneously. In each cycle, first a "small" test suite is derived, then it is optimized, evaluated and finally compared to the best suite found so far. It is kept as the best suite if it is found better according to well-defined evaluation criteria and test suite metrics; this iteration condition is based on the test selection criteria. The article presents an experiment in which iterative algorithms are compared to two simple test derivation methods from different aspects.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Conformance testing; Telecommunication protocols; Test generation
1. Introduction

Telecommunication protocols are sets of rules governing the interaction between two or more communicating
entities. A protocol determines the syntactic, semantic and temporal rules of the cooperation. An important role of telecommunication protocols is to provide compatibility of systems from different vendors. This is usually achieved by means of unambiguous specifications developed by standardization bodies. Standardized specification languages from ITU-T (International Telecommunication Union), like the Specification and Description Language (SDL), ITU-T Z.100 [1], and the Message Sequence Chart
(MSC), ITU-T Z.120 [2], and from ISO (International Organization for Standardization), like the Language of Temporal Ordering Specifications (LOTOS) [3], support the development of formal protocol specifications. Conformance testing, ITU-T X.29x [4], is also a standardized methodology, which provides the means to check whether an implementation works as required by its specification. Conformance testing is not intended to be exhaustive, and a successfully passed test suite does not imply a 100% guarantee. It does, however, ensure with a reasonable degree of confidence that the implementation is consistent with its specification. The easiest way to increase the level of confidence that the implementation under test (IUT) conforms to the specification is to add more and longer test cases to a test suite. However, as the size of the test suite grows, the execution time increases as well. To reduce this time requirement, more compact test suites must be generated manually or automatically. The manual way of test case development is rather time consuming and requires the effort of many human experts. The automatic way is much faster, but increasing the level of confidence of the generated test suite significantly increases its size and therefore the execution time. Therefore, the automation of test generation and the optimization of these contradictory parameters, execution time and level of confidence, together pose an important challenge.

1.1. Related work

In the last decades, major work has been done in the field of automatic test generation. There are now
several methods generating test cases from various types of formal models, like finite state machines (FSM) (see the surveys by Lee and Yiannakakis [5] and Bourhfir et al. [6]), input output transition systems (IOTS) [7], and complex real-time systems (e.g. the paper of Bucci et al. [8]). A common drawback of these methods is that, for complex system specifications, they result in an enormous number of test cases, and thus impose unacceptable time requirements for the actual testing. To reduce the size of these test suites, that is, to drop the redundant or inadequate test cases, different test selection methods have been proposed. Csöndes and Kotnyek [9] and Williams and Probert [10] proposed integer programming methods to select an optimal test suite using test selection criteria. The test selection criteria can be obtained from the estimated execution cost [9], from the distance of test cases as Feijs et al. proposed in [11], or by means of a fault model [12,13]. However, for large test suites these techniques are computation intensive [14]: finding the optimal suite is extremely time consuming; even for a suite of a few thousand test cases it may take many hours or even days. The aim is to reduce this total time and make formal-language-based automatic test generation a viable choice.

1.2. Objectives

This article proposes a novel method for test generation. Test generation is considered to be composed of two separate tasks: test derivation and selection. The derivation produces a test suite from the specification, and the selection makes that suite more compact. Instead of a single test derivation and a
separate selection step, an iteration cycle is constructed. Fig. 1 shows the process. This technique reduces the total cost of test generation by deriving a small number of test cases and selecting the adequate ones in each cycle. In each iteration cycle, first the actual test suite is modified, and then it is reduced. Based on the test selection criteria, it is possible to define a metric for test suite evaluation, to compare two test suites (the new one with that of the previous iteration cycle), and to keep the better suite.

Fig. 1. Introducing an iterative cycle in the test generation.

The rest of the article is organized as follows. Section 2 introduces the basic notations and definitions used concerning conformance testing, test derivation from finite state machines, fault models and test selection. In Section 3, iterative test generation methods are proposed that use fault based test selection criteria. Then, in Section 4, the results of test generation experiments are presented: different test derivation methods are compared on sample and standardized real-life protocols from several aspects. Finally, a short summary is given and possible extensions are proposed.
2. Preliminaries

2.1. Conformance testing

The purpose of conformance testing [4] is to find out whether an implementation complies with its specification, that is, whether the implementation under test (IUT) is indeed an implementation of the specification. Service and functional behavior is tested in order to find logical errors and to establish the prerequisites for interoperability. It is a black box testing method, i.e., the internal operations of the implementation are not known to the tester. Complete information is available about the specification, and only the input-output behavior of the black box IUT can be observed.

Conformance testing is done by means of test suites. A test suite is built up from a hierarchy of test components. The test event is the smallest, indivisible unit of a test suite; typically, it corresponds to sending or receiving a message or to operations manipulating timers. The test step is a grouping of test events, similar to a subroutine or procedure in programming languages. The test case is the fundamental building block in a test suite. A test
case tests a particular feature or function of the IUT. A test case has an identified test purpose and it assigns a verdict that depends on the outcome of the test case. A test group is simply a grouping of test cases. A test suite can range from a large number of test groups and test cases to a single test event contained in a single test case.

2.2. Protocol specifications

Protocol specifications are given either informally or using standardized formal description techniques, which are excellent means for system modelling and provide the basis for validation and automatic test generation. ITU's Specification and Description Language (SDL) is a widely accepted formal language for the description of interactive systems. SDL's behavior description originates from the finite state machine (FSM) model. The FSM model is used in different fields of engineering, e.g. for modelling sequential circuits.

An FSM M can be given by a quintuple M = (S, I, O, δ, λ), where S, I and O are the finite and nonempty sets of states, input events and output events (incoming and outgoing messages from the point of view of the protocol), respectively. δ is the transition function, δ : S × I → S, which determines the next state based on a state-input pair. λ is the output function, λ : S × I → O, which determines the output event based on a state-input pair. Note that in this definition the output function is allowed to generate a sequence of output events of arbitrary finite length. We assume that the machine is deterministic, that is, for each state-input pair there is exactly one next state and always the same output event or events are generated. We also assume that the specification of the machine is complete, that is, for each state-input pair the next state exists and an output is produced. However, the next state may be the same as the actual state, and the "null" output event may be generated. A machine can be represented by state tables or a state graph; Fig. 2 shows an example.

We extend the transition and output functions to input sequences. Let the input string x = i_1, ..., i_k take M through the states s_{j+1} = δ(s_j, i_j), j = 1, ..., k, and produce the output sequence λ(s_1, x) = y_1, ..., y_k, where k ∈ ℕ, k < ∞.
Fig. 2. State graph of a sample FSM.
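As an illustration of this model, such a machine can be encoded as two lookup tables and the extended transition function as a simple loop. The following is a minimal Python sketch: the transitions of s1 follow the worked examples later in this section (the a/e self-loop and the b/2 transition to s2), while the entries for s2 and s3 are assumptions, since the full state graph of Fig. 2 is not reproduced here.

# Minimal sketch of a deterministic, completely specified FSM M = (S, I, O, delta, lambda).
# The s1 transitions match the worked examples below; the s2 and s3 entries are
# illustrative assumptions only.
S = {"s1", "s2", "s3"}                     # states
I = {"a", "b"}                             # input events
O = {"1", "2", ""}                         # output events ("" is the null output e)

delta = {("s1", "a"): "s1", ("s1", "b"): "s2",
         ("s2", "a"): "s3", ("s2", "b"): "s3",    # assumed
         ("s3", "a"): "s1", ("s3", "b"): "s1"}    # assumed
lam   = {("s1", "a"): "",   ("s1", "b"): "2",
         ("s2", "a"): "2",  ("s2", "b"): "1",     # assumed
         ("s3", "a"): "1",  ("s3", "b"): "2"}     # assumed

def run(x, s="s1"):
    """Extended transition/output functions: feed the input string x to the machine."""
    outputs = []
    for i in x:
        outputs.append(lam[(s, i)])        # output string produced by this transition
        s = delta[(s, i)]                  # next state
    return outputs, s

# The input string "ab" from s1 yields the outputs ["", "2"] and ends in state s2.
print(run("ab"))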
A test case t can be constructed from the input and output strings: t = i_1, y_1, ..., i_k, y_k. Let TS denote a test suite in the rest of the article. Let the length of a test case t ∈ TS be the number of input and output events it contains: n = length(t). Let the first l input and output events of t be denoted by substring(t, l), and let t[l] denote the l-th test event of t.

2.3. Automatic test derivation from finite state machines

There are numerous FSM based test derivation methods: the D-, U-, W-methods, transition tours, and random walk based algorithms. A detailed discussion of these methods can be found in [5]. The following simple FSM based automatic test derivation algorithm (Algorithm 2.1) traverses successive transitions of a machine at random. Its input is an FSM, and it outputs a test case. It offers an arbitrary input to the machine and then accepts an output or output string from it. This procedure is applied until a certain stop condition (e.g. a certain number of transitions is reached) is fulfilled, and a test case is constructed from the inputs and outputs.

Algorithm 2.1 (Simple random trace derivation).
INPUT: FSM M, stop condition
OUTPUT: a test case t
VARIABLES:
s and s': states of M
i: an input for M
o: an output of M
y: an output string produced by M
Obtain t from M, let the initial state be s = s_0:
1. Select an input i ∈ I of the machine M.
2. Apply the next state function s' = δ(s, i).
3. Add the input i to the end of the test case t.
4. Pick one of the following two possibilities at random:
(a) Add the output string y = λ(s, i) = o_1, ..., o_k produced by M to the end of the test case t and go to step 5.
(b) Add an arbitrary output string y to the end of t and stop.
5. If the stop condition is false, let the next state be s := s' and go to step 1. Otherwise stop.

Example. Consider the machine of Fig. 2, where S = {s1, s2, s3}, I = {a, b}, and O = {1, 2, e}, where e is the empty output string. Let its initial state be s1. Let the procedure stop after two transitions. In the first cycle, let the selected input be i = b, so t = b. Let rule 4a be chosen: y = λ(s1, b) = 2, t = b2. The new state is s2. In the second cycle, let the selected input be i = b, t = b2b. Let rule 4b be chosen with the arbitrary output y = 1, so t = b2b1. At this point, the stop condition is true. This test case t will only pass if there is an invalid transition from s2 to s3 in the implementation.

The next algorithm constructs a test case by generating a random string of test events (input and output events). While in Algorithm 2.1 only the input events are chosen at random, in this case the whole test case is random. It is composed of two steps: first a random sequence t is obtained from I ∪ O, and then the number of correct events in t is checked against M. Construct the input string x and the output string y from t = ..., e_j, ... such that x contains those events e_j that are input test events and y contains those e_j that are output test events, where 0 < j ≤ length(t). The sequences x and y store the input and output events, respectively. To remove the "unusable" part of the random sequence, two functions are introduced. The observation function obs(t, m) determines whether the machine m passes the test case t: if m can produce the output string y of t prescribed by the input string x of t, m passes t, otherwise it fails. The fails_after(t, m) function gives the number of test events after which the test case t is observed to fail against the machine m.
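One possible reading of these two functions is sketched below in Python. It assumes that a test case is a flat sequence of events, that the input and output alphabets are disjoint (as in the sample FSM above), and that the returned index counts the events observed before the first mismatch; these are illustrative assumptions, not the authors' implementation.

def fails_after(t, delta, lam, s0="s1", inputs=frozenset({"a", "b"})):
    """Number of test events after which t is observed to fail against the machine;
    if the machine can reproduce the whole test case, return len(t)."""
    s = s0
    pending = []                              # outputs the machine still has to emit
    for idx, e in enumerate(t, start=1):
        if e in inputs:
            if pending:                       # an expected output never arrived
                return idx - 1
            pending = list(lam[(s, e)])       # output string prescribed for this input
            s = delta[(s, e)]
        else:                                 # e is an expected output event
            if not pending or pending.pop(0) != e:
                return idx - 1
    return len(t)

def obs(t, delta, lam, **kw):
    """True iff the machine passes the whole test case t."""
    return fails_after(t, delta, lam, **kw) == len(t)

# For the sample FSM, fails_after("ab2", delta, lam) returns 3: the machine passes ab2.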
Algorithm 2.2 (Random event sequence).
INPUT: an FSM M
OUTPUT: a test case t
VARIABLES:
e: a test event (input or output) of M
stop symbol: an extra symbol indicating that the algorithm should return
Repeat the next cycle:
1. Select an event e from the union of the sets of input and output events, including a stop symbol.
2. If the event e is the stop symbol, exit the cycle.
3. Else add the test event e to the end of test case t: t[length(t) + 1] := e.

Example. Consider again the FSM in Fig. 2. The set of test events is E = {a, b, 1, 2}. Let the initial state be s1. Let the sequence of events generated be t = ab2. This random event sequence passes the specification: it traverses the loop around s1 initiated by input a, where no output is produced, then it traverses the transition from s1 to s2 initiated by b, outputting 2. The function fails_after(t, m) will return 3.

The next algorithm (Algorithm 2.3) creates all test event strings up to a certain length L. Its input is an FSM, and it outputs the set of test cases of the given length; a sketch is given after the algorithm steps below. The number of test cases increases exponentially as L increases.

Algorithm 2.3 (All sequences of a certain length).
INPUT: an FSM, a maximum length L
OUTPUT: a test suite TS
VARIABLES:
e: an input or output of M
t and t': test cases
Initialization: Let the test suite TS include an empty test case t of length(t) = 0.
1. Select a test case t from the test suite TS that is shorter than L: length(t) < L.
2. For each input and output event e do the following:
(a) Add the test event e to the end of the test case t, and call the result t'.
(b) Add this new test case t' to the test suite TS.
3. Remove the test case t from the test suite TS.
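The sketch below mirrors this worklist formulation of Algorithm 2.3 in Python; the function name and the event representation are illustrative only.

def all_sequences(inputs, outputs, L):
    """Algorithm 2.3: extend every test case shorter than L by each possible event,
    so that the final suite contains all event strings of length L."""
    events = sorted(set(inputs) | set(outputs))
    suite = [[]]                                   # initialization: one empty test case
    while any(len(t) < L for t in suite):
        t = next(t for t in suite if len(t) < L)   # step 1: pick a case shorter than L
        for e in events:                           # step 2: extend it by every event
            suite.append(t + [e])
        suite.remove(t)                            # step 3: drop the extended case
    return suite

# For the sample alphabet {a, b, 1, 2} and L = 4 this yields 4**4 = 256 strings,
# illustrating the exponential growth mentioned above.
print(len(all_sequences({"a", "b"}, {"1", "2"}, 4)))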
Example. Consider again the FSM in Fig. 2. The possible sequences of four events are: t1 = aeb2, t2 = b2a2, t3 = b2b2, t4 = a2b2, t5 = b2a2, t6 = b2a1, t7 = b2b1, t8 = b2ae, where e denotes the null output.

2.4. Mutation analysis and fault models

Mutation analysis [15–17] is a fault-based testing method. The basic idea is that faults are produced intentionally by applying small syntactic changes, or mutations, at the atomic level to the system specification [18]. If a test suite can distinguish the implementation of a specification from the implementations of its slight variations, then it exercises the specification adequately. A mutation analysis system uses a set of mutation operators (a representation of the considered fault model) to create faulty systems, where each operator stands for a type of syntactic change. A mutation operator is a function Ω : Sp → F, where Sp ∈ M is the system specification, M is the set of all machines, and F ⊆ M \ {Sp} is the set of faulty machines. Various fault models have been proposed for different formalisms. Models for finite state machines were proposed by Chow [19], Bochmann et al. [20] and Sidhu et al. [21]. An augmented fault model [12,17,13] is used in this article to generate a fault set. The FSM fault model is extended by the following operators: Ω_Pred predicate modification operators, Ω_Act action modification operators and Ω_Var variable modification operators.

2.5. Fault-based test selection

The derivation of test suites is considered to be only the first part of the test generation procedure. Test suites created using the algorithms above contain a large number of redundant test cases and are therefore, in practice, not suitable for testing systems due to the extreme time requirement. Test selection is essential to reduce the size of the suite and to drop the unnecessary test cases. In the context of this article, mutation analysis is applied to provide test selection criteria. An overview of this technique and a detailed discussion of the fault model used in this article can be found in Ref. [12]. Mutant systems are used as test selection
criterion. Test cases are run against the original and the faulty (mutant) systems. If an inconsistency is detected in the behavior, that is, the two systems produce different outputs, the mutation used to create the faulty system is marked as a selection criterion. Selecting an optimal suite from an existing large suite using this method is, however, a very computation intensive task, as the experiments in Ref. [14] show. A test selection criterion can be acquired by comparing the return values of the fails_after function of a test case when it is applied to the specification and to an erroneous system. In this paper, these return values are stored in a matrix, where rows represent the test cases and columns the test selection criteria, in this case the faults. Let D be a matrix of |TS| rows (the size of the test suite) and |F| + 1 columns (the number of faulty systems plus one) containing integer values, where D_ij, the element at the i-th row and j-th column, is equal to the return value of the fails_after function applied to the i-th test case in the test suite TS and the j-th faulty system. Let the 0th column contain the fails_after values of each test case for the original, correct specification. The test selection technique in Ref. [10] and in Ref. [12] is based on a Boolean matrix that can automatically be computed from D. If a faulty system has a different entry in D than the specification, that entry is marked true in the Boolean matrix. Let C be a matrix of |TS| rows (the size of the test suite) and |F| columns (the number of faulty systems) containing Boolean values (0 or 1). C_ij is 1 if the i-th test case in the test suite TS can detect the j-th fault, that is, if D_i0 and D_ij contain different values; otherwise C_ij is 0. This means that if the test case t_i fails against the mutant system f_j exactly where it is expected to fail according to the specification Sp, then it does not detect the fault f_j.
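A minimal sketch of how D and C could be assembled from the fails_after function is shown below; build_D and build_C are hypothetical helper names, and the machine representation is whatever the supplied fails_after callable accepts.

def build_D(test_suite, spec, mutants, fails_after):
    """Row i holds the fails_after values of test case i: column 0 against the
    correct specification, column j >= 1 against the j-th faulty system."""
    machines = [spec] + list(mutants)
    return [[fails_after(t, m) for m in machines] for t in test_suite]

def build_C(D):
    """C[i][j] is 1 exactly when test case i detects fault j, i.e. when the
    fails_after value against mutant j differs from the value against the spec."""
    n_faults = len(D[0]) - 1
    return [[1 if row[j + 1] != row[0] else 0 for j in range(n_faults)] for row in D]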
3. Iterative test generation algorithms

3.1. Evolutionary algorithms in test generation

In this paper, we propose iterative automatic test generation algorithms based on the model of evolutionary algorithms to reduce the computation required for the selection process. An evolutionary algorithm [22] follows the computational model of the biological evolution process. A "population" of structures is maintained by the algorithm. These structures evolve in cycles according to rules given by means of selection, mutation and reproduction. In every cycle, the fitness of each element of the population is measured. In the selection process, the ones with the best fitness are considered. The selected population is mutated to provide a new population.

In this article, instead of creating a large test suite and then selecting the adequate cases, iterative test generation algorithms develop adequate suites iteration by iteration. In each iteration cycle some test cases in the test suite are modified, replaced by new ones, or new cases are added. Then it is checked whether the new suite is "better" than the old one from some points of view. This decision can be based on the fault set used in the selection procedure, but other iteration criteria may be used as well.

Fig. 3 shows the configuration of the iterative test generation procedure. Mutant systems are created from the system specification using a given fault model. This step is implemented in a fully automatic tool developed at the Budapest University of Technology and Economics [23]. This tool was written in Java and is based on ITU's SDL [1] and MSC [2] languages; the system and mutant specifications are given in SDL, and test cases are represented as message sequence charts. In addition, input and output information is extracted from the specification and passed to the test derivation procedure, which generates test cases for the test environment.

Fig. 3. Functional architecture for fault based iterative test generation.
The test environment implements the iteration cycle detailed in the sections below.

The total generation time is composed of the test case derivation time (t_d), the selection time (t_s) and the time needed to generate the selection criteria (t_m), i.e. to create the faulty, mutant systems. Iterative methods derive a small number of test cases (n_i) in every cycle, and intend to derive fewer in total than the non-iterative method: n_i < n_n and n_i · N < n_n, where N is the number of cycles. There is no significant difference in test suite derivation time, n_i · N · t_d ≈ n_n · t_d, because the automatic test derivation executes very fast. The time needed to create the mutant systems is the same for both the non-iterative and the iterative methods. The difference is in the time requirement of the selection process. Selection algorithms can find the optimal test cases in a small test suite (t_s(n_i)) quickly, but selection is very time consuming for a large test suite (t_s(n_n)), so that N · t_s(n_i) < t_s(n_n). The gain is that

$N \cdot (n_i \cdot t_d + t_s(n_i)) + t_m < n_n \cdot t_d + t_s(n_n) + t_m.$

The discussion of the iterative algorithm is separated into three parts: the definition of the iteration cycle (Sections 3.2 and 3.3), the iteration condition (Section 3.4), and the modifications for the next cycle (Section 3.5).

3.2. Iterative test derivation

The iteration cycle is constructed based on a set of faulty systems and a break condition. Let F be a set of faulty systems derived from the specification Sp using a specific fault model (see Section 2.4); alternatively, other selection criteria must be given. Let stop be a break condition of the cycle, which can be, for example, reaching a certain fault coverage level, an iteration count (k = N), or exceeding a time limit. In the following, X[k] denotes the value of X in the k-th iteration cycle.

Algorithm 3.1 (Iterative test derivation).
INPUT: F ∪ {Sp}, stop condition
OUTPUT: TS
VARIABLES:
TS[k]: the actual test suite in cycle k
D[k]: a matrix of integer values in cycle k
Initialization: Derive an initial test suite TS := TS[0]. Construct D[0] based on TS[0] and F ∪ {Sp}, that is, determine where the test cases of the initial test suite fail against the faulty systems.
The k-th iteration cycle:
1. Derive a new test suite TS[k] based on the previous test suite TS[k-1].
2. Construct D[k] based on the actual test suite TS[k] and the fault set F ∪ {Sp}.
3. Compare the old D[k-1] and the new D[k] matrices using the fault based metric (see Section 3.4): if D[k] >^(F) D[k-1], that is, the new suite is found better, let it be the new actual test suite, TS := TS[k].
4. While the stop condition is false, repeat this cycle from step 1; otherwise optimize and return the test suite TS.

The operation of Algorithm 3.1 is as follows. The initialization has two steps. First, a small initial test suite (TS[0]) is derived; this is the actual test suite (TS). Then the fault based test selection criteria (D) are determined for this small suite. The cycle has three steps, which are repeated while the stop condition is false. First, in Step 1, a new test suite is generated based on the one of the previous cycle (see Section 3.5). Then the fault based test selection criteria are determined for this new suite (Step 2). Finally, in Step 3, these selection criteria are compared (see Section 3.4). If the new test suite is found "better", it is kept as the actual one. If the old test suite is better, the conditions remain the same for the next cycle.
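The cycle can be summarized in a few lines of Python; derive_suite, build_D, metric and optimize are placeholders for the test derivation, criteria construction, suite metric (Section 3.4) and final reduction steps, and the fixed cycle count stands in for an arbitrary stop condition.

def iterative_generation(derive_suite, build_D, metric, optimize, max_cycles=100):
    TS = derive_suite(None)                 # initialization: small initial suite TS[0]
    D = build_D(TS)                         # selection criteria for TS[0]
    for _ in range(max_cycles):             # stop condition: here an iteration count
        TS_new = derive_suite(TS)           # step 1: new suite based on the previous one
        D_new = build_D(TS_new)             # step 2: criteria for the new suite
        if metric(D_new) > metric(D):       # step 3: keep whichever suite is better
            TS, D = TS_new, D_new
    return optimize(TS, D)                  # finally optimize and return the suite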
3.3. Enhancement of the algorithm

Since fault based test selection is relatively fast if it is applied to small test suites with a small matrix of test selection criteria [14,24], this algorithm can be made more effective, without significantly increasing the test generation time, if at each step the test suites are optimized and reduced. The input parameters are the same as above, and the output of the procedure is likewise a test suite. The initialization and the second step of the cycle of Algorithm 3.1 have to be modified so that they contain the immediate optimization based on the selection criteria. In the cycle, the optimum is computed not only from the new suite, but from the union of the old and the new one. The other steps of the algorithm are the same as above.

There are several approaches in the literature to find an optimal, reduced test suite based on a set of criteria stored in a matrix [14,24]. To drop the redundant test cases, that is, to select an optimal test suite, this article uses the bacterial evolutionary algorithm discussed in Ref. [24]. This is referred to as the optimization and reduction of a test suite according to a criterion matrix in the enhanced algorithm. Let the function select : M → s, where ∀i: s[i] ∈ {0, 1}, mark the rows of the matrix M in the vector s with 1 if the row is selected by the optimization algorithm (it is part of the optimal test suite); otherwise, let s contain 0 in the corresponding row. If the size of M is m × n, then the size of s is n. Let the function reduce : M × s → M remove the unselected rows from M according to the selection vector s. Let M_r := reduce(M, s), where the size of M is m × n, the size of s is n, and the size of M_r is m × k, where $k = \sum_{g=1}^{n} s_g$. Let the function append : M × M → M copy the rows of the second matrix below the rows of the first, where both matrices have the same number of columns.

Algorithm 3.2 (Enhanced iterative test derivation).
INPUT: F ∪ {Sp}, stop
OUTPUT: TS[n]
VARIABLES:
TS[k]: the actual test suite in cycle k
D[k]: a matrix of integer values in cycle k
C[k]: a matrix of Boolean values in cycle k
Initialization: Generate an initial test suite TS[0]. Construct D[0] based on the test suite TS[0] and the fault set F ∪ {Sp}. Compute C[0] from D[0]. Let TS[0] := reduce(TS[0], select(C[0])), and let D[0] := reduce(D[0], select(C[0])). That is, select the optimal test cases and reduce the initial suite TS[0] and the initial matrix D[0].
The 2nd step of the k-th iteration cycle (the other steps are the same as in Algorithm 3.1):
2. (a) Generate a new test suite TS[k] based on TS[k-1].
(b) Construct D[k] based on TS[k] and F ∪ {Sp}.
(c) Compute C[k] from D[k].
(d) Let TS[k] := reduce(append(TS[k-1], TS[k]), select(append(C[k-1], C[k]))). This means, select the optimal test cases and reduce the union of the actual and previous test suites. (TS[k] is a matrix consisting of a single column.)
(e) Let D[k] := reduce(append(D[k-1], D[k]), select(append(C[k-1], C[k]))). That is, compute the D matrix that corresponds to the new test suite.

A further variation of this algorithm stores the previously generated event sequences in a memory. When a new test case is derived, it is checked whether it has already been generated. If it has, it is dropped and another one is derived.

3.4. Comparison of test suites using the fault based criteria

Fault based testing provides the heuristic evaluation criteria applied in this article. To compare two test suites, a metric is defined for test suites. The metric is provided by the four-level test suite evaluation criteria, which are based on a fault set derived from the specification by means of mutation analysis. It is important to note, however, that the evaluation criteria may also have different bases, according to the decision of the tester.

Definition 3.3 (Test suite evaluation criteria). To define the criteria for evaluating a test suite, the fault based test selection matrices D and C are used. Criteria for evaluating the test suite TS:

1. The number of faults detected by TS:

$CR_1 = \sum_{j=1}^{|F|} \mathrm{sgn}\left( \sum_{i=1}^{|TS|} C_{i,j} \right),$
where sgn(x) is the signum function: x < 0 ⇒ sgn(x) = -1, sgn(0) = 0, x > 0 ⇒ sgn(x) = 1.

2. The average length of the test cases in TS:

$CR_2 = \frac{\sum_{i=1}^{|TS|} \sum_{j=0}^{|F|} D_{i,j}}{|TS|}.$

3. The number of test cases in TS:

$CR_3 = |TS|.$

4. The uniformity of the fault detection capability of TS:

$CR_4 = \frac{\sum_{j=1}^{|F|} \left( \sum_{i=1}^{|TS|} C_{i,j} \right)^2}{|TS|}.$

Criterion 1 calculates the number of faults that can be detected by the test suite. It is the number of columns in C which contain at least one true (1) value. The larger this value is, the better the test suite is.
Criterion 2 checks the average length of the test cases in a test suite. It is calculated from the average of all values in the matrix D. Longer test cases may exercise the implementation better. The third level (Criterion 3) uses the size of the test suite: the execution of a test suite containing fewer test cases presumably takes less time. Finally, Criterion 4 checks the distribution of the column sums of C. The closer it is to the uniform distribution, the smaller the calculated value is. The rationale of this criterion is to preserve the faults already detected for future cycles. If a test case is removed (replaced by a new test case in a future cycle), the test suite is still likely to detect the faults covered by it if the distribution of the true values in the C matrix is close to uniform. Definitions 3.4 and 3.5 provide the means to compare two test suites.

Definition 3.4 (Test suite metric). Let c be a real value associated with a test suite TS. The value of c can be calculated using the criteria above (Definition 3.3): $c = w_1 \cdot CR_1 + w_2 \cdot CR_2 - w_3 \cdot CR_3 - w_4 \cdot CR_4$, where $w_i > 0$, $w_i \in \mathbb{R}$ are weights. Since, in practice, the fault detection capability is the most important criterion, $w_1 \gg w_i$, $i \geq 2$. Because CR_3 and CR_4 are to be minimized, they decrease the metric.

Definition 3.5 (Comparing two test suites). $TS_1 >^{(F)} TS_2$ (read: TS_1 is better than TS_2 over the fault set F) if $c_1 > c_2$.

Fig. 4 shows examples of comparing two matrices. The rows of the matrices in the figure represent the test cases of the test suite, and the columns represent the faults. In the first example, where the C matrices of TS_1 and TS_2 are investigated, the second matrix is considered better, because it indicates that the second test suite detects all faults, while the first detects only three. In the second example, according to Criterion 2, the D matrix of the second test suite is considered better: it contains test cases with more events. In the example for Criterion 3, the first C matrix is considered better, as it consists of fewer test cases. The distribution of column sums (Criterion 4) in the second C matrix of the last example is closer to the uniform distribution, therefore it is considered better.

Fig. 4. Examples for the test suite evaluation criteria.
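Assuming C and D are stored as plain lists of lists, as introduced in Section 2.5, the four criteria and the metric c can be computed as sketched below; the weight values are placeholders chosen only to let w1 dominate.

def criteria(C, D):
    n_cases = len(C)                                   # |TS|
    n_faults = len(C[0])                               # |F|
    col_sums = [sum(row[j] for row in C) for j in range(n_faults)]
    cr1 = sum(1 for s in col_sums if s > 0)            # CR1: number of detected faults
    cr2 = sum(sum(row) for row in D) / n_cases         # CR2: average test case length
    cr3 = n_cases                                      # CR3: number of test cases
    cr4 = sum(s * s for s in col_sums) / n_cases       # CR4: uniformity of detection
    return cr1, cr2, cr3, cr4

def metric(C, D, w=(10.0, 1.0, 1.0, 1.0)):
    cr1, cr2, cr3, cr4 = criteria(C, D)
    # CR3 and CR4 are to be minimized, so they decrease the metric (Definition 3.4)
    return w[0] * cr1 + w[1] * cr2 - w[2] * cr3 - w[3] * cr4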
3.5. Generating a new test suite

In this section, the modification of a test suite based on a previous suite is discussed. There are two ways to modify a test suite: one is to add new test cases to it, the other is to replace an existing case with a new one. The modification of a test suite takes three parameters: the maximum number of test cases, the number of test cases modified or added in a cycle, and the test suite to be modified itself. While the size of the test suite is less than the maximum, new cases may be derived and added; otherwise existing cases are modified. Let, for all t ∈ TS, FA(t) = max{ l ∈ ℕ : 1 ≤ l < fails_after(t, Sp), t[l] ∈ O }; that is, the failed-after (FA) value of a test case is the number of events up to the last correct output event.
The test cases to be modified are those that have the smallest FA values, because longer test cases may exercise the system under test better. Let a test case t ∈ TS be split into two parts, t = ab, such that a := substring(t, random(0, FA(t))). Derive a new tail b' and let t := ab'. Fig. 5 shows two examples of the modification of test cases represented as message sequence charts (MSC). In the first case ((a) in the figure) a random number of events between 0 and the FA value is retained in the test sequence, and a new tail is appended to it; 0 means that a completely new test case is derived, as in the figure. In the second case ((b) in the figure) the tail of the test case is replaced from the last successfully received output, before the point where the test case was observed to fail. To generate the new tails, or even completely new cases, Algorithm 2.2 can be used.

Fig. 5. Generating a new test case (a) or a new tail to a test case (b).
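One way to realize this modification step is sketched below in Python; fa stands for the FA computation, random_tail for the random event sequence generator of Algorithm 2.2, and the way ties and suite growth are handled is an illustrative assumption.

import random

def modify_case(t, fa, random_tail):
    cut = random.randint(0, fa(t))          # keep a prefix of 0 .. FA(t) events
    return t[:cut] + random_tail()          # cut == 0 derives a completely new case

def modify_suite(TS, max_size, n_modify, fa, random_tail):
    TS = list(TS)
    if len(TS) < max_size:                  # room left: add brand new test cases
        TS += [random_tail() for _ in range(min(n_modify, max_size - len(TS)))]
    else:                                   # otherwise rework the weakest cases,
        # i.e. those with the smallest failed-after values
        for idx in sorted(range(len(TS)), key=lambda i: fa(TS[i]))[:n_modify]:
            TS[idx] = modify_case(TS[idx], fa, random_tail)
    return TS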
4. Comparison of the algorithms

This section presents an experiment analyzing the algorithms described above. The main purpose is to find out how the proposed methods perform, considering different aspects, compared to other test derivation algorithms including the selection procedure. The main aspects of the examination are the execution time, the fault detection capability and the number of derived test cases. Since the proposed methods include heuristic elements, none of these key properties can be exactly calculated or formulated; they are therefore analyzed empirically. The methods were tested on several specifications of different complexity: the experiment was carried out on both sample and real-life protocol specifications. The systems examined were sample systems, the Conference Protocol [25] and the initiator and responder sides of the INRES protocol [26], and real-life protocol standards, the WAP WTP (Wireless Application Protocol – Wireless Transaction Protocol) [27] protocols. Table 1 summarizes the complexity of the protocols used. It shows the number of control states, inputs, outputs, variables, predicates, actions and timers. (Note that these values are derived from the EFSM (Extended Finite State Machine) representations of the protocols.)

In the experiment five algorithms were compared: the random trace test derivation algorithm (Algorithm 2.1), random event sequences (Algorithm 2.2), the proposed iterative test generation method (Algorithm 3.1), its enhancement with immediate test selection (Algorithm 3.2) and its extension with memory. In this experiment, all iterative algorithms used the random event sequence method to derive and modify test cases. The generation of all possible sequences up to a given length (Algorithm 2.3) was found unsuitable even for the simplest protocol, as it creates an enormous number of test cases for the given length values. Slightly modified modules of the mutation analysis based test selector tool [23] provided the environment for the experiment. For the test suite optimization, the library of a bacterial evolutionary algorithm [24] was used. The results were obtained using a PC with an AMD Athlon XP 2100+ CPU (at 1733 MHz) and 512 MB of RAM, providing the same environment for all methods.
Table 1
Complexity of the systems examined

System                 State  Input  Output  Variable  Predicate  Action  Timer
Conference protocol      2      7      5        3         9         22      0
INRES initiator          4      4      5        2         2          6      2
INRES responder          3      5      4        1         1          2      0
WAP WTP initiator        5      6      6        7         7         31      3
WAP WTP responder        6      6      6        8         8         21      3

Table 2
The number of mutant systems and parameters for test generation

                           Conf. prot.  INRES ini  INRES res  WTP ini  WTP res
Mutants                        117          73         53       198      220
Test case length                10          10         10        14       14
Max. number of test cases      300         300        300       600      600
Table 3
The number of test cases in the optimized suite

Method                  Conf. prot.  INRES ini  INRES res  WTP ini  WTP res
Random trace                  8           6          5        17       15
Random event sequence        10           7          5        22       15
Iterative                    10           8          9        20       27
Enhanced iterative            9           5          6        15       18
Iterative with memory         8           5          4        16       15
As a first step of the experiment, the same fault model was applied to the different protocol specifications to create mutants. The number of mutant specifications generated was highly dependent on the complexity of the systems; the first row of Table 2 contains their number. For all the algorithms, the maximum length of the test cases, i.e. the maximum number of events, was defined as a parameter for test generation; the second row of Table 2 shows the length values used. For each algorithm, the maximum number of test cases to be generated was also defined; the third row of Table 2 shows the number of test cases derived for the different protocols. Note that the natural stop condition of the iterative algorithms is reaching a 100% mutation detection ratio. An additional stop condition was applied in the experiment: the total number of test cases generated during the iteration was limited to the same value used for the random trace and random event sequence algorithms. Besides, the maximum length of test cases was limited to the same value as well. This made a fair comparison of the algorithms possible.

The next three tables (Tables 3–5) present the key results of the experiment. Note that in the case of the random trace algorithm and the random event sequence algorithm the tables contain the values of the optimized test suite. In the case of the first three protocols, the results of the iterative methods are the mean of five independent runs, while for the method using random event sequences the best result of five runs was considered. Table 3 shows the resulting number of generated test cases after the test selection. As the data in the table indicate, the resulting optimized test suites are nearly of the same size for the different test generation techniques. The fault detection ratio of the optimized suite is shown in Table 4. In the case of small systems, a high fault coverage ratio can be achieved with the given number of test cases even using the simplest method. As the complexity of the specifications grows, this ratio decreases. The reason for the low detection ratio in the case of the more complex systems is that the number of test cases derived was not sufficient; these test suites were capable of detecting only a part of the faults. Table 5 shows the execution times of the algorithms, including test derivation and selection, in minutes. It is calculated from the execution times of the test derivation and selection processes. Since in these experiments we used the same mutant set for the fault based test selection, the mutant generation time is not taken into account. A significant deviation can be observed between the simple random trace method and the other techniques when they are applied to larger protocols. The reason for this deviation is that the input and output sets of those protocols are larger; therefore, the random sequence based methods generate inappropriate test cases more frequently.
Table 4
Fault detection ratio [%]

Method                  Conf. prot.  INRES ini  INRES res  WTP ini  WTP res
Random trace                100         100        100        54       42
Random event sequence        80          86         85        39       37
Iterative                   100          86        100        51       47
Enhanced iterative          100         100        100        56       49
Iterative with memory       100         100        100        59       51

Table 5
Execution times in minutes

Method                  Conf. prot.  INRES ini  INRES res  WTP ini  WTP res
Random trace                113          88         93       823      947
Random event sequence       124          87        115       307      296
Iterative                   138         105        124       238      237
Enhanced iterative          159         127        129       298      276
Iterative with memory       174         144        154       325      311
These tests are likely to fail early and thus provide only a few selection criteria (few true values in the C matrix), so the selection requires less time.

The iterative test generation algorithms produced an optimized test suite nearly as fast as the method using random event sequences. On the other hand, their fault detection capability was observed to be better. This result was in line with our expectations, because in the case of the iterative algorithms each
iteration cycle may improve the previous random sequence; that is, there is correlation among the generated random sequences. The fault detection ratio of the random trace method was similar to that of the random sequence based iterative algorithms. Though the iterative algorithms required more time for the smaller protocols, they were found less sensitive to the increase in system complexity.

Five charts (Figs. 6(a), (b), (c) and 7(a), (b)) show the number of detected faults against the total number of generated test cases for the iterative algorithms on the investigated protocols. (Note that due to their large number, the samples are represented with continuous lines.) The horizontal axis of each chart shows the number of test cases generated so far, the vertical axis the number of faults detected. Solid, dashed and dotted lines represent the data sets of the iterative, the enhanced iterative, and the enhanced iterative with memory methods, respectively. The charts show a step function-like fault detection progress. Due to the randomly generated test sequences, jumps in the fault detection ratio and steps several iterations long can be observed. In the case of the enhanced iterative method, a stricter iteration condition provided faster convergence to the detection of all the mutants. The reason is that while the iterative method only aims at the new suite detecting more faults, in the case of the enhanced iterative method "good" test cases cannot be dropped from the best suite, because it computes the best suite from the union of the previous and the actual suites. Its further extension with a memory resulted in a slight improvement at the cost of higher hardware requirements. As the iteration progresses, the speed of the convergence is found to get slower. A slower convergence can also be observed in the case of the more complex systems.

Fig. 6. Iterative fault detection experiments on sample systems: (a) Conference Protocol, (b) INRES Initiator, (c) INRES Responder.

Fig. 7. Iterative fault detection experiments on real-life systems: (a) WAP WTP Initiator, (b) WAP WTP Responder.
5. Conclusions

Selecting an optimized set from a large test suite can be a rather time consuming procedure. The goal in this article was to produce compact test suites automatically, with high fault coverage, within reasonable time. The iterative methods reduce the time required for the test suite optimization and make it possible to improve test suites generated by random test derivation algorithms. In the experiments, fault based testing provided the test selection and the test suite metrics, and test sequences were composed of random events. Especially in the case of more complex protocols, the iterative algorithms were found to decrease the total generation time of an optimized test suite compared to the methods performing test generation and selection in two separate steps. The iterative techniques also provided similar or slightly better fault coverage and a similar number of final test cases.

Our current research focuses on two problems. One is the investigation of more complex test derivation methods and non-fault based selection (e.g. distance based) methods in the iteration cycle, and the usage of different comparison criteria. The other is the compression of random event sequences to reduce the growth of test case lengths during the iteration. This means that the invalid inputs are dropped and the valid input–output sequence is extracted from each random sequence based on the whole set of sequences. We also plan to conduct more experiments to analyze the connection between the size of the test suites used in the iteration cycles and the complexity of different protocols. We definitely intend to include non-ITU, e.g. IETF (Internet Engineering Task Force), protocols in our further analyses.
References

[1] ITU-T, Recommendation Z.100: Specification and Description Language, 2000.
[2] ITU-T, Recommendation Z.120: Message Sequence Chart, 2000.
[3] ISO, ISO 8807: Information Processing Systems – Open Systems Interconnection – LOTOS – A Formal Description Technique Based on the Temporal Ordering of Observational Behaviour, 1987.
[4] ITU-T, OSI Conformance Testing Methodology and Framework for Protocol Recommendations for ITU-T Applications – General Concepts, 1995.
[5] D. Lee, M. Yiannakakis, Principles and methods of testing finite state machines – a survey, Proc. of the IEEE 43 (3) (1996) 1090–1123.
[6] C. Bourhfir, R. Dssouli, E.M. Aboulhamid, Automatic Test Generation for EFSM-based Systems. http://citeseer.nj.nec.com/114451.html.
[7] J. Tretmans, Specification based testing with formal methods: a theory, in: A. Fantechi (Ed.), FORTE/PSTV 2000 Tutorial Notes, Pisa, Italy, October 10, 2000.
[8] G. Bucci, A. Fedeli, E. Vicario, Specification and simulation of real time concurrent systems using standard SDL tools, in: SDL-Forum, Springer, Stuttgart, Germany, 2003.
[9] T. Csöndes, B. Kotnyek, A mathematical programming method in test selection, EUROMICRO 97, 1997, pp. 8–13.
[10] A.W. Williams, R.L. Probert, Formulation of the interaction test coverage problem as an integer program, Testing Communication Systems XIV, Kluwer Academic Publishers, Berlin, Germany, 2002, pp. 283–298.
[11] L.M.G. Feijs, N. Goga, S. Mauw, J. Tretmans, Test selection, trace distance and heuristics, Testing Communication Systems XIV, Kluwer Academic Publishers, Berlin, Germany, 2002, pp. 267–282.
[12] G. Kovács, Z. Pap, G. Csopaki, Automatic test selection based on CEFSM specifications, Acta Cybernetica 15 (2002) 583–599.
[13] T. Sugeta, J.C. Maldonado, W.E. Wong, Mutation testing applied to validate SDL specifications, in: Proceedings of 16th IFIP Testing of Communicating Systems, Springer, Oxford, UK, 2004, pp. 193–208.
[14] T. Csöndes, B. Kotnyek, J.Z. Szabó, Application of heuristic methods for conformance test selection, European Journal of Operational Research (2001).
[15] R.A. De Millo, R.J. Lipton, F.G. Sayward, Hints on test data selection: help for the practicing programmer, IEEE Computer 11 (4) (1978) 34–41.
[16] A. Mathur, W. Wong, A formal evaluation of mutation and data flow based test adequacy criteria, in: ACM Computer Science Conference (CSC'94), 1994.
[17] S.R.S. Souza, J.C. Maldonado, S.C.P.F. Fabbri, W. Lopes De Souza, Mutation testing applied to Estelle specifications, Software Quality Journal 8 (4) (2000).
[18] D.R. Kuhn, A technique for analyzing the effects of changes in formal specifications, The Computer Journal 35 (6) (1992) 574–578.
[19] T. Chow, Testing software design modelled by finite-state machines, IEEE Transactions on Software Engineering 4 (3) (1978).
[20] G. Bochmann, A. Das, R. Dssouli, M. Dubuc, A. Ghedamsi, G. Luo, Fault models in testing, IFIP Transactions, Protocol Test Systems IV, 1991.
[21] D. Sidhu, T.-K. Leung, Fault coverage of protocol test methods, Proceedings of IEEE INFOCOM'88, 1988.
[22] W.M. Spears, K.A. De Jong, T. Bäck, D.B. Fogel, H. de Garis, An overview of evolutionary computation, in: Proceedings of the European Conference on Machine Learning (ECML-93), vol. 667, Springer Verlag, Vienna, Austria, 1993, pp. 442–459.
[23] G. Kovács, D. Le Viet, A. Wu-Hen-Chang, Z. Pap, G. Csopaki, Applying mutation analysis to SDL specifications, in: SDL-Forum, Springer, Stuttgart, Germany, 2003.
[24] G. Vincze, Test selection using evolution algorithms (in Hungarian), Master's thesis, Budapest University of Technology and Economics, 2002.
[25] A. Belinfante, J. Feenstra, R.G. Vries, J. Tretmans, N. Goga, L. Feijs, S. Mauw, L. Heerink, Formal test automation: a simple experiment, in: G. Csopaki, S. Dibuz, K. Tarnay (Eds.), 12th Int. Workshop on Testing of Communicating Systems, Kluwer Academic Publishers, 1999, pp. 179–196.
[26] J. Ellsberger, D. Hogrefe, A. Sarma, SDL: Formal Object-oriented Language for Communicating Systems, Prentice Hall, 1997.
[27] WAP Forum, Wireless Application Protocol Architecture Specification, 1998.

Gábor Kovács received his MSc degree in Electrical Engineering in 2000 at the Budapest University of Technology and Economics. Since then, he has been a PhD student there. His main field of research is the investigation of automatic test generation methods for telecommunication software. He is also interested in the design of IPv6 access networks and the applications of the Java technology.
Gyula Csopaki received his MSc degree in Electrical Engineering in 1969 and his PhD degree in 1989 at the Technical University of Budapest. He is an associate professor at the same university. He was co-editor of the book Testing Communicating Systems (1999). His main research field is the application of formal methods in the design of telecommunication systems.
Katalin Tarnay is a professor of mobile communication and telecommunication software at the Budapest University of Technology and Economics. Her book Protocol Specification and Testing (1989) was published by Kluwer Academic Publishers in New York. She was co-editor of the book Testing Communicating Systems (1999). Her current research field is the application of self-adaptive protocols in service location, first of all in global positioning.

Zoltán Pap received his MSc degree in Electrical Engineering in 2000 at the Budapest University of Technology and Economics. Since then, he has been a PhD student at the same university. In 2002, he received his MSc degree in Business Administration at the Budapest University of Economic Sciences and Public Administration. His major research areas include computer aided test generation algorithms for telecommunication software, aspect-oriented software development methodologies, and GRID systems.