The Journal of Systems and Software 82 (2009) 1403–1418
Issues in using model checkers for test case generation

Gordon Fraser a,*, Franz Wotawa a, Paul Ammann b

a Institute for Software Technology, Graz University of Technology, Inffeldgasse 16b/2, A-8010 Graz, Austria
b Department of Information and Software Engineering, MS 4A4, George Mason University, Fairfax, VA 22030-4444, USA
Article info

Article history: Available online 10 May 2009

Keywords: Automated software testing; Automated test case generation; Testing with model checkers; Performance; Minimization
Abstract

The use of model checkers for automated software testing has received some attention in the literature: It is convenient because it allows fully automated generation of test suites for many different test objectives. On the other hand, model checkers were not originally meant to be used this way but for formal verification, so using model checkers for testing is sometimes perceived as a "hack". Indeed, several drawbacks result from the use of model checkers for test case generation. If model checkers were designed or adapted to take into account the needs that result from the application to software testing, this could lead to significant improvements with regard to test suite quality and performance. In this paper we identify the drawbacks of current model checkers when used for testing. We illustrate techniques to overcome these problems, and show how they could be integrated into the model checking process. In essence, the described techniques can be seen as a general road map to turn model checkers into general purpose testing tools.

© 2009 Elsevier Inc. All rights reserved.
The research herein is partially conducted within the competence network Softnet Austria (http://www.soft-net.at) and funded by the Austrian Federal Ministry of Economics (bm:wa), the province of Styria, the Steirische Wirtschaftsförderungsgesellschaft mbH (SFG), and the city of Vienna in terms of the center for innovation and technology (ZIT).

* Corresponding author. E-mail addresses: [email protected] (G. Fraser), [email protected] (F. Wotawa), [email protected] (P. Ammann).

1. Introduction

Testing is the main method applied to software in order to eliminate as many errors as possible, to determine the reliability, and to justify the quality of a software product. As software testing is a very resource-intensive task, automation is desirable. Model based testing techniques offer support for automation of many testing related tasks once a suitable test model is available. There is, however, no consensus on the best way to derive test cases from a model, and thus there is no single superior technique. A promising approach is to use model checking techniques to generate test sequences; model checkers are tools for formal verification and can generate counterexamples to illustrate property violations – such counterexamples are perfectly suited as test cases. The use of model checkers for automated testing was originally proposed by Callahan et al. (1996) and Engels et al. (1997), and since then several different methods to create test cases with model checkers have been proposed (e.g., Ammann et al., 1998; Calvagna and Gargantini, 2008; Gargantini and Heitmeyer, 1999; Heimdahl et al., 2001; Hong et al., 2003).

Originally, model checkers were intended only for formal verification: Given a behavioral model and a temporal logic property,
a model checker will exhaustively analyze the model's state space in order to prove property violation or satisfaction. Constant improvements of model checking techniques have led to tools that can handle realistic applications, although performance remains critical because of the state space explosion (Clarke et al., 2001). Software model checking is a technique that applies traditional model checking techniques to source code or bytecode without requiring an explicit model. Here, however, the state space is even bigger. Recently, the focus has shifted from exhaustive verification to error trace generation using techniques such as bounded model checking (Biere et al., 1999) or directed model checking (Edelkamp et al., 2001) based on heuristic search.

Even if a property is proved to hold on a model, this only shows correctness of the model but not of the actual implementation, which depends on many platform specific aspects such as the operational environment. Consequently, a certain amount of testing will always be necessary. Model checkers have proved to be useful for testing as well: Given a behavioral test model, the model checker can be used to create test suites with regard to different test objectives fully automatically. For this, the counterexample generation facility of model checkers is used. The test objectives might be a certain coverage criterion or based on mutation analysis. Given such a test objective, properties can be automatically derived such that the model checker returns counterexamples suitable as test cases. These test cases consist of both the input data and the expected outcome; i.e., the oracle problem (Weyuker, 1982) is solved as well, because the test cases contain the expected output. While model checker based approaches to testing are convenient, there are some drawbacks that result from the fact that model checkers were not originally intended for such an application.
Counterexample generation for formal verification often tries to create sequences that are easy to understand – such counterexamples, however, are not always good test cases. Consequently, the quality of test cases generated with model checkers is not as good as it might be. Besides quality concerns, usability and performance often suffer because model checkers have to be used in twisted ways for testing. As a simple example, many model checker based test case generation techniques extend or duplicate the model, which obviously increases the complexity. It is conceivable that the model checker could fulfill these tasks more efficiently, and without a significant increase in complexity. As another example, when using a model checker for verification, any counterexample that shows a property violation is appropriate. When creating test cases, this leads to a significant number of identical or subsumed test cases. This wastes time during creation, and reduces the overall test suite quality. Unfortunately, this is not the only drawback of currently used counterexample generation techniques.

A possible alternative to improve this situation would be to create dedicated testing tools that re-implement the relevant techniques provided by model checkers. While clearly possible, we argue that this is not strictly necessary. Many available model checking tools have evolved into efficient and mature tools, and their modeling languages are widely used. In addition, many modern model checkers are specifically intended and used for error trace generation. We argue that although conceptually different, formal verification and testing based on model checking techniques are very similar. With some modifications, available model checking tools can be turned into powerful testing tools.

In this paper we take a closer look at the demands of test case generation and discuss deficiencies of current model checkers. In particular, we identify and illustrate ten different concrete problems or aspects where model checkers need to be improved for test case generation, and show the drawbacks resulting from these problems on a common demonstration model. In detail, we have identified the potential to improve model checkers with regard to the following aspects:
- Minimization at creation time.
- Prioritization of counterexamples.
- Extensible counterexamples.
- Counterexamples for controllable test cases.
- Tree-like counterexamples.
- Abstraction for testing.
- Model combination.
- Alternative witnesses and counterexamples.
- Explicitly setting the initial state of counterexamples.
- Constraints on counterexamples.
This paper can be seen as a survey of the problems that exist when using model checkers for test case generation, and it extends a paper sketching some of the issues discussed in detail here, presented at the Seventh International Conference on Quality Software (Fraser and Wotawa, 2007a). For some of the identified problems there are already solutions available, but they are not used in conjunction with model checkers; in these cases we provide references and background information. Other issues are technical problems or open research problems that have yet to be solved. Consequently, these problems can be seen as possible directions for further research.
2. Testing with model checkers

In general, model checking describes the process of determining whether a structure is a model of a logical formula.
A typical model checking tool takes a behavioral, formal model and a temporal logic property, e.g., specified in LTL (Pnueli, 1977) (Linear Temporal Logic) or CTL (Clarke and Emerson, 1982) (Computation Tree Logic), and then effectively explores the entire state space of the model. Research has resulted in several different techniques that handle this problem for models of industrial size. The first successful model checking approach was explicit model checking, which performs an explicit search in a model's state space, considering one state at a time. There are different approaches based on LTL (Lichtenstein and Pnueli, 1985; Vardi and Wolper, 1986) and CTL (Clarke et al., 1983; Queille and Sifakis, 1982) properties. The use of ordered binary decision diagrams (BDDs (Bryant, 1986)) to represent states and transition relations allows symbolic model checking (McMillan, 1993) to handle much larger state spaces. Recently, bounded model checking (Biere et al., 1999), which reformulates the model checking problem as a propositional satisfiability problem (Davis and Putnam, 1960) (SAT) and allows the use of SAT solvers to calculate counterexamples up to a certain upper bound, has gained popularity as a complementary technique to symbolic model checking. At the same time, explicit model checking remains attractive due to improvements such as heuristic search (Edelkamp et al., 2001) and its ability to handle models at least partially for counterexample generation when the complete state space is too large to be fully represented in memory.

Independently of the underlying technique, a counterexample is produced if a property violation is detected. In practice, a counterexample is a linear sequence leading from an initial state to some state, such that the sequence or its final state illustrates the property violation. This sequence is intended to allow a human operator to understand the cause of a property violation. If no property violation can be found, then this serves as a proof of correctness with regard to the considered property, as long as model checking is performed exhaustively.

Model based testing is related in many ways: The aim of model based testing is to automatically derive test cases from dedicated test models. In general, such test cases are linear sequences as well. Given these similarities between model based testing and model checking, it is an appealing idea to use the available efficient model checking techniques to solve model based testing tasks. Consequently, the idea of testing with model checkers is to use model checking techniques to create counterexamples, and then to interpret these counterexamples as test cases. In order to describe how model checking and testing are related, consider the formalism commonly used to describe behavioral models and to define the semantics of temporal logics: the Kripke structure.

Definition 1 (Kripke structure). A Kripke structure K is a tuple K = (S, S0, T, L):
- S is a set of states.
- S0 ⊆ S is a set of initial states.
- T ⊆ S × S is a total transition relation, that is, for every s ∈ S there is an s' ∈ S such that (s, s') ∈ T.
- L : S → 2^AP is a labeling function that maps each state to the set of atomic propositions that hold in this state. AP is a countable set of atomic propositions.

An infinite execution sequence π := ⟨s0, s1, ...⟩ of a Kripke structure K such that ∀ i ≥ 0 : (si, si+1) ∈ T is called a path. Let Paths(K) denote the set of all paths of Kripke structure K starting with a state in S0.
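To make Definition 1 concrete, the following minimal Python sketch (ours, not from the paper; all names are illustrative) encodes a Kripke structure, checks the totality of T, and checks that a given sequence of states respects the transition relation:

# Hypothetical sketch, not part of the original paper.
from typing import Dict, List, Set, Tuple

class Kripke:
    def __init__(self, states: Set[str], initial: Set[str],
                 transitions: Set[Tuple[str, str]], labels: Dict[str, Set[str]]):
        self.S, self.S0, self.T, self.L = states, initial, transitions, labels
        # T must be total: every state needs at least one successor.
        assert all(any(src == s for (src, _) in transitions) for s in states)

    def is_sequence(self, t: List[str]) -> bool:
        # Consecutive states must be related by the transition relation T.
        return all((t[i], t[i + 1]) in self.T for i in range(len(t) - 1))

# Tiny two-state example loosely inspired by the wiper model of Section 3.
k = Kripke(states={"Idle", "Wiping"}, initial={"Idle"},
           transitions={("Idle", "Idle"), ("Idle", "Wiping"),
                        ("Wiping", "Wiping"), ("Wiping", "Idle")},
           labels={"Idle": {"OUT_speed=0"}, "Wiping": {"OUT_speed=1"}})
print(k.is_sequence(["Idle", "Wiping", "Idle"]))  # True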
The Kripke structure is often used to define the semantics of temporal logics like LTL (Pnueli, 1977) or CTL (Clarke and Emerson, 1982), which are used to express properties that should be verified on the model. If such a property is shown to be violated, then a model checker returns a finite trace that violates the property.

Definition 2 (Trace). A trace t := ⟨s0, ..., sn⟩ of Kripke structure K is a finite sequence such that ∀ 0 ≤ i < n : (si, si+1) ∈ T for K.

Some properties cannot be violated by finite traces; for example, violation of liveness properties requires infinite traces. Model checkers can handle this by finding so called lasso-shaped sequences, which simply means that there is a dedicated loopback state on the finite trace, such that the finite trace represents an infinite path.

Mapping traces to test cases is usually straightforward. In general, a test case is a sequence where each state consists of input values that are used as test data, and the expected result, which serves as test oracle. The input values are what the system under test receives as user input, reads via sensors, or receives from other systems, while any observable reaction of the system is an output. Converting a model checker trace to a test case is often simply a matter of partitioning the model's variables into input, output, and internal variables (e.g., in the case of reactive systems). The values of input variables serve as test data while the output values in the trace serve as test oracle.

Different techniques have been proposed in order to force a model checker to create traces suitable as test cases. Some model checkers are able to create witness traces, which serve to illustrate that a property is satisfied. In general, traces are generated as counterexamples for property violations. This means that in order to generate test cases with a model checker it is necessary to introduce some inconsistency that leads to counterexamples. This is mostly done by specifying dedicated properties that are supposed to be violated by the test model. In this paper, we mostly use the temporal logic LTL (Pnueli, 1977) to illustrate such properties. An LTL formula consists of atomic propositions, Boolean operators and temporal operators. The operator "○" refers to the next state. For example, "○ a" expresses that a has to be true in the next state. "U" is the until operator, where "a U b" means that a has to hold from the current state up to a state where b is true. "□" is the always operator, stating that a condition has to hold in all states of a path, and "◇" is the eventually operator that requires a certain condition to eventually hold at some time in the future.

There are two main categories of approaches to test case generation with model checkers:
The first category uses special properties that are intended to be violated by a model (Callahan et al., 1998; Gargantini and Heitmeyer, 1999; Hamon et al., 2004; Heimdahl et al., 2003; Rayadurgam and Heimdahl, 2001). These properties are called trap properties, and they express the items that make up a coverage criterion by claiming that these items cannot be reached. For example, a trap property might claim that a certain state or transition is never reached. A resulting counterexample shows how the state or transition described by the trap property is reached; this counterexample can be used as a test case. The next section contains an example of this method. There are related approaches that use trap properties not related to coverage criteria. For example, there are trap properties that result from mutation of properties that "reflect" the transition relation (Black, 2000), or trap properties based on requirement properties (Tan et al., 2004).

The second category of test case generation approaches uses mutation to change a model such that it violates a given specification or other properties (Ammann et al., 1998, 2001; Fraser and Wotawa, 2008; Okun et al., 2003). Here, the model checker is used to illustrate the differences between changed models and the original model.

In general, test case generation with model checkers is feasible whenever there are test models available that can be handled by model checking tools. In comparison to other model based testing techniques (e.g., see Utting et al. (2006) for an overview) the main advantages of using model checkers are the flexibility and ease of use: For example, it is straightforward to combine different coverage criteria and mutation based testing for test case generation within a single framework. The main concern about this approach is usually related to performance, as model checkers are known to run into the state space explosion problem quickly. Furthermore there are some disadvantages resulting from the fact that model checkers were not originally intended for test case generation; these disadvantages are discussed in this paper.
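As noted above, turning a counterexample into a test case mainly amounts to partitioning the model's variables into inputs and outputs. The following minimal sketch (ours, not from the paper; the variable names anticipate the wiper example of the next section) performs this split for a trace given as a list of per-state valuations:

# Hypothetical sketch: split a counterexample trace into test data and oracle values.
INPUTS = {"IN_speed", "IN_pump", "Endswitch"}   # stimuli applied to the system under test
OUTPUTS = {"OUT_speed", "OUT_water"}            # expected reactions used as test oracle

def trace_to_test_case(trace):
    """trace: list of dicts mapping variable names to values, one dict per state."""
    test_case = []
    for state in trace:
        stimulus = {v: state[v] for v in state if v in INPUTS}
        expected = {v: state[v] for v in state if v in OUTPUTS}
        test_case.append((stimulus, expected))  # internal variables are simply dropped
    return test_case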
3. A demonstration model

In this section we describe an example model that will be used for demonstration purposes throughout the paper. The application is a windscreen wiper controller as used in modern cars. The wiper shall be able to wipe at two different speeds and has a water pump to squirt water on the windscreen for cleaning purposes. After water is used, three cycles of wiping have to be done.
Fig. 1. EFSM model of windscreen wiper application. (The figure shows the states Idle, Wiping, Waiting, and Pumping, connected by transitions labeled with guards and updates such as IN_speed > 0 & !IN_pump, IN_pump, !IN_pump / Counter = 0, !IN_pump & Counter < 3 / Counter = Counter + 1, !IN_pump & Counter = 3, and IN_speed = 0 & Endswitch & !IN_pump.)
If the wiper was active before pumping water, then during these three wipes the same speed as before is used, and normal wiping continues after that. If the wiper was idle before the use of water, then the first speed shall be used. Before changing to idle mode again the wiper should always return to its initial position.

Fig. 1 illustrates the behavior of this application as an extended finite state machine (Lee and Yannakakis, 1996) (EFSM). An EFSM is a finite state machine that has a set of variables, and its transitions have guard conditions and update functions on these variables. Transitions are labeled with their guard conditions, and the optional update functions are separated with a slash. Note that Fig. 1 omits some details to keep the presentation as simple as possible; for example, the variable Counter is only increased when the wiper is in its initial position, and inputs and outputs are implicitly given via the guard conditions and update functions. The figure also omits most of the self-loops, which exist for all states. For all details of the model refer to the specification given in Appendix A. The specification is given in the language of the model checker NuSMV (Cimatti et al., 1999). The syntax is straightforward; lines beginning with -- are comments. The code mainly consists of variable declarations (section VAR) and the transition relation (section ASSIGN). In the case statements, the case 1 is the default case. The listing also includes a set of CTL properties that were used to verify the correctness of the test model. Fig. 1 uses NuSMV syntax for guard conditions and update functions: & represents a logical and, and ! a logical not.

The controller is implemented as a reactive system that periodically reads input values from its sensors, and sets output values accordingly. The control loop is called periodically (for example, every 10 ms). This example model is a significantly simplified version of the windscreen wiper model used in other papers (e.g., Fraser and Wotawa, 2007f), and is based on actual user requirements provided by Magna Steyr. The model has three inputs, where IN_speed represents the user requested speed (0, 1, 2), the Boolean IN_pump indicates whether water shall be pumped, and Endswitch indicates whether the windscreen wiper currently rests in its initial position. The outputs comprise OUT_speed, which is used to drive the wiper, and OUT_water, which is used to activate the water pump.

As an example, let us assume we want to create a test suite that covers all transitions depicted in Fig. 1. This means that in every state every guard has to evaluate to true at some point. A suitable test suite can easily be created with a model checker. For each transition that we want to cover (these are the test goals) we define a trap property, which claims that the transition guard evaluates to true but the transition is not executed. A counterexample to such a property is any sequence that executes the transition. Table 1 lists the resulting trap properties using LTL syntax. Appendix B contains the same trap properties as CTL properties in NuSMV syntax. Given the trap properties in Table 1, the problem of creating a transition coverage test suite reduces to calling the model checker with the model and the trap properties. Table 2 summarizes the test cases resulting from a run with the model checker NuSMV.
Table 1. Trap properties for transition coverage.

No.  Trap property
1    □(State = Idle ∧ IN_pump → ○ State ≠ Pumping)
2    □(State = Idle ∧ ¬IN_pump ∧ IN_speed > 0 → ○ State ≠ Wiping)
3    □(State = Pumping ∧ ¬IN_pump → ○ State ≠ Waiting)
4    □(State = Waiting ∧ IN_pump → ○ State ≠ Pumping)
5    □(State = Waiting ∧ Counter < 3 ∧ ¬IN_pump → ○ State ≠ Waiting)
6    □(State = Waiting ∧ Counter ≥ 3 ∧ ¬IN_pump → ○ State ≠ Wiping)
7    □(State = Wiping ∧ IN_speed = 0 ∧ Endswitch ∧ ¬IN_pump → ○ State ≠ Idle)
8    □(State = Wiping ∧ IN_pump → ○ State ≠ Pumping)
Table 2. Test cases created from trap properties and covered transitions.

No.  Length  Goal                Goals covered
1    4       Idle → Pumping      1, 2, 7
2    2       Idle → Wiping       2
3    4       Pumping → Waiting   2, 3, 8
4    5       Waiting → Pumping   2, 3, 4, 8
5    5       Waiting → Waiting   2, 3, 5, 8
6    7       Waiting → Wiping    2, 3, 5, 6, 8
7    3       Wiping → Idle       2, 7
8    3       Wiping → Pumping    2, 8
The table lists for each test case the goal it was created for (the target transition) as well as all other goals that were covered. The length of a test case is given as the number of states it traverses. Note that NuSMV does not always create the shortest possible test cases.
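The workflow of this section can be scripted. The sketch below is hypothetical and not taken from the paper: it merely assembles trap properties of the shape used in Table 1 from a list of transitions; appending each property to the model file and running NuSMV on it, as well as parsing the returned counterexamples, is left abstract.

# Hypothetical sketch: derive one trap property per EFSM transition (cf. Table 1).
transitions = [
    ("Idle",    "IN_pump",                             "Pumping"),
    ("Idle",    "!IN_pump & IN_speed > 0",             "Wiping"),
    ("Pumping", "!IN_pump",                            "Waiting"),
    ("Waiting", "IN_pump",                             "Pumping"),
    ("Waiting", "Counter < 3 & !IN_pump",              "Waiting"),
    ("Waiting", "Counter >= 3 & !IN_pump",             "Wiping"),
    ("Wiping",  "IN_speed = 0 & Endswitch & !IN_pump", "Idle"),
    ("Wiping",  "IN_pump",                             "Pumping"),
]

def trap_property(source, guard, target):
    # Claim that the transition is never taken; any counterexample executes it.
    return f"LTLSPEC G((State = {source} & {guard}) -> X !(State = {target}))"

for number, (source, guard, target) in enumerate(transitions, start=1):
    print(number, trap_property(source, guard, target))
# Each property would be appended to the model and the model checker invoked on it;
# every counterexample returned is kept as the test case for the corresponding goal.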
4. Problems of current model checkers and improvements

Representing a test objective as temporal logic formulas that have to be violated in order to generate test cases abuses model checkers in ways they were not conceived for, which causes some dissent in the research community. There clearly are many advantages to such testing approaches:

- There are very mature and efficient model checking tools available, allowing the application to models of realistic size.
- The effort to switch from one approach to another (e.g., different coverage criteria or mutation testing) or to integrate several different approaches is minimal.
- Especially from a research perspective it is very tempting to follow such approaches, as experimentation with new ideas is easily possible.
- One of the most useful and often used applications of model checkers is error trace generation, which in essence is similar to test case generation.

On the other hand, there are some disadvantages:

- The usability for non-researchers who are not experienced with temporal logics, model checkers, etc., is not very good.
- There are concerns about the quality and the performance of model checker based testing, simply because model checkers were not meant to be used in this way.

The alternative to using available model checkers is to either use completely different techniques, or to re-implement those parts of model checkers that are really necessary for testing. There are several examples of tools that follow the latter approach. For example, TGV (Jard and Jéron, 2005) is based on model checking ideas, and SpecExplorer (Campbell et al., 2005) uses state exploration techniques developed for model checking. However, re-implementing model checking techniques neglects the availability of mature tools, is non-trivial, and does not serve reusability. We argue that available model checkers are not that far from being well-suited testing tools that regular test engineers can operate. This section identifies a set of drawbacks of current model checkers when used for testing. These drawbacks might be negligible within a limited research focus, but they do have a negative influence on the overall applicability of model checker based testing in practice. Therefore, this section identifies ten ways in which model checkers could be improved for test case generation.
4.1. Test suite reduction at creation time

As test case execution requires a certain amount of resources, it is often not feasible to use test suites of arbitrary size. If there are too many test cases, then test suite minimization, also known as test suite reduction (Harrold et al., 1993), is applied. In general, a test suite is minimal with regard to a coverage criterion if removing any test case from the test suite would lead to the coverage criterion no longer being satisfied. An optimal test suite is the smallest possible minimized test suite. A reduced test suite is simply a test suite that contains fewer test cases than the original test suite, while still satisfying a given coverage criterion. The objective of test suite reduction is to find a suitable subset of a given test suite, such that the test objectives are achieved. Several heuristics have been presented (Rothermel et al., 2002; Harrold et al., 1993; Zhong et al., 2006) that can be used for test suite reduction. While there are claims to the contrary (Wong et al., 1995), several experiments (Heimdahl and Devaraj, 2004; Jones and Harrold, 2003; Rothermel et al., 1998) have shown that test suite reduction leads to a degradation of fault sensitivity. If execution is costly, then minimization is often necessary nevertheless.

Performance is sometimes critical with model checkers. Although it depends on the actual model and technique used for verification, counterexample generation can consume a significant part of the time spent model checking. In that case, it is not advantageous to first spend resources on generating a large set of tests, only to discard most of them in favor of a much smaller subset. Instead it would be much better to create only the necessary test cases in the first place. This approach requires a technique to determine, for every new counterexample, which of the remaining test goals it also covers.

There are several candidate techniques to perform this detection. In the simplest case, model checking can directly be used to detect covered goals. Ammann and Black (1999b) present a straightforward approach to represent test cases as models and then simply model check these models against the test goals in temporal logic. However, Markey and Schnoebelen (2003) analyze the problem of model checking paths and show that there are more efficient solutions than checking Kripke structures. Alternative candidate techniques for the task include observer automata (Giannakopoulou and Havelund, 2001), which are commonly based on standard LTL to Büchi automaton conversion. Another alternative is the use of temporal logic rewriting rules (Havelund and Rosu, 2001b). For example, the NASA runtime verification system Java PathExplorer (JPaX) (Havelund and Rosu, 2001a) uses monitoring algorithms for LTL. Its authors claim that their rewriting engine is capable of 3 million rewritings per second, which shows that rewriting is an efficient and fast approach. There are approaches that try to optimize this technique further, e.g., Barringer et al. (2004), Havelund and Rosu (2004), and Rosu and Havelund (2005). Fraser and Wotawa (2007f) have used LTL rewriting in the context of model checker based test case generation in order to detect already covered trap properties. The integration of these techniques into model checkers would not require much effort, but would bring advantages: Post-processing would not be necessary, and the performance of the test case generation would be significantly improved.
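Such an integrated generation loop could be organized as sketched below (ours, not from the paper): trap properties that an earlier counterexample already covers are skipped without invoking the model checker, using a coverage check such as the rewriting approach described next.

# Hypothetical sketch: test suite reduction at creation time.
def generate_reduced_suite(trap_properties, model_check, covers):
    """model_check(prop) returns a counterexample trace or None if the property holds;
    covers(trace, prop) decides whether an existing trace already violates prop."""
    test_suite = []
    for prop in trap_properties:
        if any(covers(trace, prop) for trace in test_suite):
            continue                  # goal already covered, no model checker call needed
        trace = model_check(prop)
        if trace is not None:         # unreachable goals produce no counterexample
            test_suite.append(trace)
    return test_suite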
The rewriting idea itself is straightforward: Given a trace and a temporal logic property, a set of rewriting rules (Definition 3) is successively applied for each state of the trace; at each step the result of the previous rewriting step is used as the property. If at some point the property is rewritten to a contradiction, then this shows that the given trace violates the property. In the case of test case generation this means that the coverage item represented by the property is covered.

Definition 3 (State rewriting).
(□ φ){s} = φ{s} ∧ □ φ                          (1)
(○ φ){s} = φ                                   (2)
(◇ φ){s} = φ{s} ∨ ◇ φ                          (3)
(φ1 U φ2){s} = φ2{s} ∨ (φ1{s} ∧ (φ1 U φ2))     (4)
(φ1 ∧ φ2){s} = φ1{s} ∧ φ2{s}                   (5)
(φ1 ∨ φ2){s} = φ1{s} ∨ φ2{s}                   (6)
(φ1 → φ2){s} = φ1{s} → φ2{s}                   (7)
(φ1 ↔ φ2){s} = φ1{s} ↔ φ2{s}                   (8)
(¬ φ){s} = ¬(φ{s})                             (9)
a{s} = false if a ∉ L(s), else true            (10)
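A compact implementation of these rules could look as follows (a hypothetical sketch, not the authors' tool): formulas are nested tuples, a state is the set of atomic propositions that hold in it, and the equivalence operator of rule (8) is omitted for brevity.

# Hypothetical sketch of Definition 3; atoms are strings, formulas are nested tuples.
def rw(phi, s):
    """Rewrite formula phi with state s (the set of atomic propositions holding in s)."""
    if isinstance(phi, bool):
        return phi
    if isinstance(phi, str):                                   # rule 10
        return phi in s
    op = phi[0]
    if op == 'G':   return conj(rw(phi[1], s), phi)            # rule 1, always
    if op == 'X':   return phi[1]                              # rule 2, next
    if op == 'F':   return disj(rw(phi[1], s), phi)            # rule 3, eventually
    if op == 'U':   return disj(rw(phi[2], s),
                                conj(rw(phi[1], s), phi))      # rule 4, until
    if op == 'and': return conj(rw(phi[1], s), rw(phi[2], s))  # rule 5
    if op == 'or':  return disj(rw(phi[1], s), rw(phi[2], s))  # rule 6
    if op == '->':  return impl(rw(phi[1], s), rw(phi[2], s))  # rule 7
    if op == 'not': return neg(rw(phi[1], s))                  # rule 9
    raise ValueError(f"unknown operator {op}")

def conj(a, b):   # simplifying constructors propagate true/false immediately
    if a is True:  return b
    if b is True:  return a
    return False if (a is False or b is False) else ('and', a, b)

def disj(a, b):
    if a is False: return b
    if b is False: return a
    return True if (a is True or b is True) else ('or', a, b)

def impl(a, b):
    if a is False or b is True: return True
    if a is True:  return b
    return neg(a) if b is False else ('->', a, b)

def neg(a):
    return (not a) if isinstance(a, bool) else ('not', a)

def covers(trace, phi):
    """A trace covers a trap property if rewriting reduces the property to false."""
    for state in trace:
        phi = rw(phi, state)
        if phi is False:
            return True
    return False

The covers function is exactly the coverage check assumed by the generation loop sketched at the beginning of this section.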
As an example, consider applying the rewriting to the trap properties of our example model. Let φ denote the original property subject to rewriting. After creating test case 1 from trap property 1, we consider which of the remaining trap properties are covered by the new test case. For example, let φ be trap property 2:

φ := □(State = Idle ∧ ¬IN_pump ∧ IN_speed > 0 → ○ State ≠ Wiping)

As the first step, we apply the rewriting rules using the first state (s0 := (State = Idle, IN_pump = false, IN_speed = 2, Endswitch = false, OUT_speed = 0, OUT_water = false, Counter = 0)) of the test case to φ:

(□(State = Idle ∧ ¬IN_pump ∧ IN_speed > 0 → ○ State ≠ Wiping)){s0}
  = ((State = Idle ∧ ¬IN_pump ∧ IN_speed > 0 → ○ State ≠ Wiping){s0} ∧ φ)
  = (((State = Idle ∧ ¬IN_pump ∧ IN_speed > 0){s0} → (○ State ≠ Wiping){s0}) ∧ φ)
  = ((true → (State ≠ Wiping)) ∧ φ)
  = (State ≠ Wiping) ∧ φ

The rewriting rules are applied recursively; first, the □-operator is resolved, then the rewriting proceeds through the implication down to the atomic propositions. In state s0 the expression (State = Idle ∧ ¬IN_pump ∧ IN_speed > 0) is true, therefore the implication reduces to its consequent. The result of the rewriting is the conjunction of the implication's consequent and the original property. Next, the rewriting rules are applied with the second state of the test case, s1 := (State = Wiping, IN_pump = false, IN_speed = 0, Endswitch = true, OUT_speed = 2, OUT_water = false, Counter = 0):

((State ≠ Wiping) ∧ φ){s1}
  = ((State ≠ Wiping){s1} ∧ φ{s1})
  = false ∧ φ{s1}
  = false

In state s1 the expression (State ≠ Wiping) is false, therefore the whole property is false. Consequently the property φ is violated by the test case, which in our case means that the test case covers φ.
Table 3. Test cases created when monitoring trap properties.

No.  Length  Goal                Covers
1    4       Idle → Pumping      1, 2, 7
3    4       Pumping → Waiting   2, 3, 8
4    5       Waiting → Pumping   2, 3, 4, 8
5    5       Waiting → Waiting   2, 3, 5, 8
6    7       Waiting → Wiping    2, 3, 5, 6, 8
As the example shows, the counterexample created for the first trap property also covers trap property 2, so there is no need to call the model checker for this trap property. Repeating this process for all trap properties, and calling the model checker only on those trap properties that are not covered, we derive the reduced test suite given in Table 3. The reduced test suite still covers all transitions as required by our original test objective. However, a general drawback of any reduction technique is that even though the coverage level stays the same, the fault detection ability will be reduced by every test case that is omitted. To illustrate this effect, we created a set of 252 non-equivalent mutants from the windscreen wiper NuSMV specification and calculated the mutation scores for the two test suites. A mutant is a version of the model with a single automatically inserted fault. The mutation score is the ratio of mutant models that are detected (killed) by a test suite to the total number of mutants. The original test suite (Table 2) achieves a mutation score of 69.44%, while the mutation score of the reduced test suite (Table 3) is only 65.87%. Note that monitoring guarantees neither an optimal nor a minimal test suite. The size of the resulting test suite depends to some extent on the order in which trap properties are considered for test case generation.

4.2. Prioritization of counterexamples

When using a model checker for verification, the task of the model checker is fulfilled once a counterexample is created, or a property is proved to be true. In software testing, the creation of counterexamples is only the first step. Once the counterexamples have been transformed to test cases, these have to be executed on an implementation under test. Here, further issues need to be tackled. The performance of the test case execution, for example, is often important. The previous section showed how test suite reduction can be integrated into the test case generation process, thus creating smaller test suites that still satisfy all test goals. Reducing the number of test cases obviously also improves the speed of the test suite execution.

A technique that is related to test suite minimization and also meant to improve the time necessary for test execution is prioritization (Rothermel et al., 1999). The order in which test cases are executed has an influence on the overall speed of fault detection. Not all test cases are equally "good". Some are "better" in the sense that they can potentially detect more faults, or do so more quickly. The idea of test suite prioritization is to order test cases such that the best test cases are executed first. Consequently, if there is a fault in the implementation under test, chances are that the fault will be detected earlier, after executing fewer test cases. The main challenge of prioritization is that it is not possible to know beforehand what faults there are in the implementation under test (else we would not need to test!), and thus it is not possible to know which test cases are better. Consequently, different heuristics have been proposed (Elbaum et al., 2000; Rothermel et al., 1999) which estimate the fault detection ability of test cases and order test suites accordingly. Typically, these estimations are based on coverage measurement or mutation analysis, but it is also possible to include cost factors (Elbaum et al., 2001). Prioritization can be done with model checkers, as demonstrated by Fraser and Wotawa (2007e). In fact there are several possibilities to prioritize test cases created with model checkers.
First, model checkers can be used to determine coverage of existing test cases. This is done by interpreting a test case as a model that the model checker should verify. This model is then checked against trap properties of a given coverage criterion, and the resulting coverage information can be used to prioritize test cases. In a total ordering, test cases are sorted by the number of trap properties they cover; alternatively, an adding ordering sorts test cases such that each selected test case covers the most yet uncovered coverage goals. Such a prioritization is not bound to traditional
coverage criteria; as mutants can also be expressed in temporal logic (Ammann and Black, 1999b), the same technique can be used to apply a fault-based prioritization (commonly referred to as fault exposing prioritization). A simple way to efficiently integrate such techniques into model checkers would be the use of rewriting techniques as presented in Section 4.1.

An alternative that does not depend on coverage measurement, but achieves only slightly worse results, is to simply exploit the fact that counterexamples that are produced or subsumed more often are likely to be more important. Each test case is assigned an importance factor of initially 1. For each other test case or trap property that a test case subsumes, its importance factor is increased, and at the end of the test case generation phase test cases can simply be sorted according to their importance. Experiments have shown that this simple prioritization method already improves the fault detection rate significantly (Fraser and Wotawa, 2007e).

As an example of prioritization, we consider a minimized test suite for the windscreen wiper example. The test suite is a subset of the test cases listed in Table 2, and assumes that trap properties are considered in random order and monitoring is applied when generating test cases. To determine the effects of the order in which test cases are executed, a set of mutants is created from the NuSMV model. In total, 252 non-equivalent mutants are used for this simple experiment. Table 4 shows four different orderings of the considered test cases, and Fig. 2 illustrates their performance at detecting mutants. The final mutation score for each ordering is identical (65.87%), but the rate at which the mutants are detected is quite different. In the optimal ordering, all mutants that can be detected by the five test cases are killed after the second test case, while the order in which the test cases were generated achieves this only after the fifth test case. This is identical to the worst possible ordering, although the worst ordering clearly detects fewer mutants at a time. Finally, the prioritized ordering uses coverage information, and almost achieves the optimal result – there is a single mutant that is not detected until test case 5.
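The coverage-based "adding" ordering and the importance-factor heuristic described above can be expressed compactly; the sketch below is hypothetical, with covers again standing for some coverage check such as the rewriting of Section 4.1.

# Hypothetical sketch: greedy "adding" ordering by not-yet-covered goals.
def prioritize_adding(test_cases, goals, covers):
    ordered, uncovered, remaining = [], set(goals), list(test_cases)
    while remaining:
        best = max(remaining, key=lambda tc: sum(covers(tc, g) for g in uncovered))
        ordered.append(best)
        remaining.remove(best)
        uncovered = {g for g in uncovered if not covers(best, g)}
    return ordered

# Hypothetical sketch: importance factors counted while generating the test suite.
def prioritize_by_importance(test_cases, subsumed_counts):
    """subsumed_counts[i]: number of other test cases or trap properties test case i subsumed."""
    importance = [1 + subsumed_counts[i] for i in range(len(test_cases))]
    order = sorted(range(len(test_cases)), key=lambda i: -importance[i])
    return [test_cases[i] for i in order]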
Table 4. Prioritized test cases created from trap properties and covered transitions.

Description  Ordering        APFD (%)
Generated    3, 1, 2, 4, 7   45.79
Prioritized  4, 1, 3, 7, 2   48.97
Optimal      4, 1, 7, 3, 2   49.04
Worst        2, 7, 1, 3, 4   31.83
Fig. 2. Rate of fault detection for different prioritizations. (The figure plots the percentage of detected mutants, from 0% to 100%, against the number of executed test cases, 1 to 5, for the Generation, Prioritized, Worst, and Optimal orderings.)
In this example, optimal and worst refer to the fault detection rate for the given set of mutants; it is impossible to know the optimal or worst ordering for unknown faults. Note that finding an optimal and a worst ordering as in our example is not feasible for test suites of realistic size. The effect of the prioritization is quantified with the Average Percentage Faults Detected (APFD) metric (Rothermel et al., 1999). This metric is the weighted average percentage of faults detected over the life of a test suite. The APFD of a test suite T consisting of n test cases and m mutants is defined as:
APFD = 1 − (TF_1 + TF_2 + ... + TF_m) / (n · m) + 1 / (2n)

Here, TF_i is the position of the first test case in the ordering T' of T which reveals fault i.
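As a direct transcription of this formula (a hypothetical sketch; the kill information would come from executing the test cases against the mutants, and every fault is assumed to be detected by at least one test case in the ordering):

# Hypothetical sketch: APFD of an ordered test suite for a set of faults (mutants).
def apfd(ordering, kills):
    """ordering: test case identifiers in execution order.
    kills: dict mapping each fault to the set of test case identifiers detecting it."""
    n, m = len(ordering), len(kills)
    position = {tc: i + 1 for i, tc in enumerate(ordering)}   # 1-based positions
    tf = [min(position[tc] for tc in detecting if tc in position)
          for detecting in kills.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)

# For instance, the "Generated" ordering of Table 4 would be evaluated as
# apfd([3, 1, 2, 4, 7], kills) with kills derived from the 252 mutants.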
Table 4 gives APFD values to compare the different orderings.

4.3. Extensible counterexamples

In the context of verification each property is considered separately, which means that each violated property results in a distinct counterexample. While this approach is well suited for verification, properties used for test case generation are not independent of each other. Consequently, when using model checkers to generate test cases this approach potentially has disadvantages. For example, many test cases created as counterexamples can have identical prefixes. Repeated execution of the same test prefix will not reveal any faults that cannot be detected by a single run of the prefix, assuming the implementation is deterministic. Several solutions to this problem have been presented in the past. For example, winnowing (Black and Ranville, 2001) describes a technique which removes test cases that are completely subsumed by other test cases. Redundancy minimization (Fraser and Wotawa, 2007c) transforms test cases in an existing test suite in order to avoid common prefixes, thereby reducing the overall size of the test suite. Such techniques are applied as a post-processing step once the test suite is completely generated. This post-processing could be avoided if the counterexample generation took the test objectives into account in the first place. Common prefixes are not the only problem that can be observed when generating a test case for each trap property. Heimdahl et al. (2003) report on a case study in which some coverage criteria result in very large test suites where each test case is very short.
These problems could be alleviated if counterexamples were not generated independently for each property. For example, each counterexample could be seen as an extension of an existing counterexample, up to a given upper bound on the test case length. Such an approach was, for example, implemented by Hamon et al. (2004). As an example, Table 5 shows a single test case that was created by extending the previously generated counterexample for each trap property instead of creating a new counterexample every time. This test case was created with the model checker NuSMV (Cimatti et al., 1999), which does not natively support counterexample extension. As a workaround, the trap properties were rewritten from φ to □(s → φ), where s is the last state of the previous counterexample. This creates counterexamples consisting of a prefix that leads to s and then a regular counterexample for φ. The prefix can be dropped, and the remaining counterexample extends the previous test case ending in s. Note that NuSMV does not guarantee minimal length counterexamples.

The test case in Table 5 was created using monitoring of trap properties, and has a length of 21 states. In comparison, the full test suite with a counterexample per trap property has a total length of 46 states, and when using only monitoring of trap properties the total length is 33 states. As for the regular monitoring approach, the model checker was called 4 times for the 8 trap properties. An interesting observation is that the normal test suite kills 175 out of 252 non-equivalent mutants, which is a mutation score of 69.44%. Monitoring slightly reduces the mutation score to 65.87%. In contrast, the single test case in Table 5 kills 227 out of the 252 mutants, which is a mutation score of 90.08%. Therefore, extending counterexamples does not only reduce the creation costs and total size of a test suite, but potentially also increases the fault sensitivity at the same time!

The drawback of long test cases is that it becomes more difficult to understand the root cause of a detected error. Consequently, it may be preferable not to create overly long test cases. For example, a new counterexample could be created only if it would be shorter than any possible extension of existing counterexamples. If counterexamples can be extended, then the counterexample that allows the shortest extension is chosen. This would ensure that the number of test cases is reduced and their overall length is minimized. At the same time, the quality of the test suite would not be adversely affected.
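The property transformation used as a workaround above could be mechanized roughly as follows (a hypothetical sketch; the generated string uses an NuSMV-style LTL syntax, and the state is given as a valuation of the model's variables):

# Hypothetical sketch: anchor a trap property at the final state of the previous counterexample.
def extend_from_state(last_state, trap_property):
    """last_state: variable valuation of the previous counterexample's final state.
    trap_property: the original LTL trap property, given as the body of an LTLSPEC."""
    state_formula = " & ".join(f"{var} = {val}" for var, val in sorted(last_state.items()))
    # A counterexample to G(s -> phi) first reaches s and then violates phi from there;
    # the prefix up to s is dropped and the rest is appended to the previous test case.
    return f"LTLSPEC G(({state_formula}) -> ({trap_property}))"

s = {"State": "Waiting", "Counter": 0, "IN_pump": 0}
print(extend_from_state(s, "G(State = Waiting & IN_pump -> X !(State = Pumping))"))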
Table 5. Test cases created from trap properties and covered transitions.

Counter  Endswitch  IN_pump  IN_speed  OUT_speed  OUT_water  State
0        0          0        2         0          0          Idle
0        1          0        0         2          0          Wiping
0        0          1        2         0          0          Idle
0        0          0        2         1          1          Pumping
0        0          0        2         1          0          Waiting
0        0          1        2         1          0          Waiting
0        0          0        2         1          1          Pumping
0        0          0        2         1          0          Waiting
0        0          1        2         1          0          Waiting
0        0          0        2         1          1          Pumping
0        0          0        2         1          0          Waiting
0        0          0        2         1          0          Waiting
0        1          0        2         1          0          Waiting
1        1          0        2         1          0          Waiting
2        1          0        2         1          0          Waiting
3        0          0        2         1          0          Waiting
0        0          0        2         2          0          Wiping
0        0          0        2         2          0          Wiping
0        0          1        2         2          0          Wiping
0        0          0        2         2          1          Pumping
Fig. 3. Nondeterministic EFSM model of windscreen wiper application. (As Fig. 1, but without the Counter variable: on !IN_pump, state Waiting nondeterministically either remains in Waiting or moves to Wiping.)
4.4. Counterexamples for controllable test cases

In general, model checkers can handle nondeterministic models without problems. A system is nondeterministic if, given the same inputs at different times, different outputs can result. Nondeterminism frequently occurs because of asynchronous communication, when the environment of a model is not predictable, or as a means of abstraction. In the context of verification nondeterminism is treated by simply exploring all possible behaviors. In contrast, nondeterminism is problematic in the context of software testing. If a system under test is nondeterministic, then it might give several different responses to an input, all of which are valid executions. A test case for such a system needs to anticipate all possible outcomes in order to be able to properly control the test case execution. This is known as the controllability problem.

Current model checkers create linear sequences as counterexamples. If the model used to derive a counterexample is nondeterministic, then such a linear sequence represents choices at nondeterministic decision points. If the counterexample is applied to an implementation as a test case, this implementation might make different choices that are valid nevertheless. However, because the test case does not anticipate all possible behaviors, the implementation is not controllable. In fact, when only using a linear sequence we would have to conclude that the implementation fails the test case as soon as an unexpected output is observed.

There are two options to turn linear test cases into proper tests for nondeterministic systems. The first option is to enhance the test case with additional information about the valid alternative outputs. The test case itself would still be a linear sequence, but if the implementation behaves differently from what the linear sequence describes, the additional information can be used to decide whether the test case actually fails or whether the implementation responded in a valid way and the test case is simply inconclusive. The second option is not to use a linear sequence as a test case but to have a test case represent all possible valid behaviors. Such test cases could be tree-like (e.g., Hierons, 2006), or use automaton-based formalisms (e.g., LTS (Jard and Jéron, 2005)).

A possible remedy to this problem is to make nondeterministic choices explicit along counterexamples. This would make it possible to apply the counterexamples as test cases, and if an implementation behaves differently from the trace at a nondeterministic decision point, the test case execution framework could report an inconclusive verdict. Such an approach is proposed by Fraser and Wotawa (2007d).
As an example, consider the modified version of the windscreen wiper model given in Fig. 3. The variable Counter was removed, which introduces a nondeterministic choice in state Waiting: if ¬IN_pump holds, the next state is either Wiping or Waiting. For example, this could mean that the specification does not explicitly state how long the implementation has to continue wiping after water was pumped onto the windscreen. Table 6 contains the two trap properties that have to be adapted for this model. Any counterexample that contains either the transition Waiting → Waiting or Waiting → Wiping has to consider that the respective other transition would also be valid.

A straightforward solution for this problem is to extend the model to include indicators of nondeterministic choice. As an example, the NuSMV code representing the nondeterministic model of Fig. 3 is as follows:

ASSIGN
  next(State) := case
    State = Idle & IN_pump : Pumping;
    State = Idle & !IN_pump & IN_speed > 0 : Wiping;
    State = Pumping & !IN_pump : Waiting;
    State = Waiting & IN_pump : Pumping;
    State = Waiting & !IN_pump : {Waiting, Wiping};
    State = Wiping & IN_speed = 0 & Endswitch & !IN_pump : Idle;
    State = Wiping & IN_pump : Pumping;
    1 : State;
  esac;

To make nondeterministic choice obvious, we add a special indicator ND_State for the variable State, which has nondeterministic transitions:

VAR
  State : {Idle, Pumping, Waiting, Wiping};
  ND_State : boolean;

The indicator variable ND_State should be set to True whenever the nondeterministic transition is taken, and should then stay True.
Table 6. Changed trap properties for transition coverage of nondeterministic model.

No.  Trap property
5    □(State = Waiting ∧ ¬IN_pump → ○ State ≠ Waiting)
6    □(State = Waiting ∧ ¬IN_pump → ○ State ≠ Wiping)
Fig. 4. Nondeterministic test case for windscreen wiper application. (The test case proceeds Idle --IN_pump--> Pumping --!IN_pump--> Waiting and then, on !IN_pump, branches into both Wiping and Waiting.)
ASSIGN
  init(ND_State) := 0;
  next(ND_State) := case
    !ND_State & State = Waiting & !IN_pump : 1;
    1 : ND_State;
  esac;

If the execution of a test case does not match the expected output described by the counterexample at a state where ND_State changes to True, this additional information can be used to issue an inconclusive verdict instead of a fail verdict.

An even better improvement would be to create non-linear test cases. For example, a counterexample could be seen as a tree-like structure which includes all possible alternative branches caused by nondeterminism. As an example, consider a test case that should cover the transition Waiting → Wiping. The version of the model in Fig. 3 has a nondeterministic choice at state Waiting for the input that activates this transition. Consequently, an implementation that conforms to this model may also take the transition Waiting → Waiting at that point, which should not result in a fail verdict. Fig. 4 depicts a possible test case, which branches for the nondeterministic choice. When an implementation takes the "wrong" branch, then either an inconclusive verdict can be issued, or the test case is extended such that the transition Waiting → Wiping can also be reached in the alternative branch.
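On the test execution side, the ND_State indicator could be interpreted roughly as follows (a hypothetical sketch; verdict names and the shape of the test case are illustrative):

# Hypothetical sketch: deriving a verdict when the test model is nondeterministic.
def run_test_case(test_case, implementation):
    """test_case: list of (inputs, expected_outputs, nd_state) triples from the counterexample;
    implementation(inputs) returns the outputs observed on the system under test."""
    for inputs, expected, nd_state in test_case:
        observed = implementation(inputs)
        if observed != expected:
            # A deviation at a step flagged as nondeterministic may still be valid behavior.
            return "inconclusive" if nd_state else "fail"
    return "pass"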
4.5. Tree-like counterexamples

Nondeterministic testing is not the only conceivable application of tree-like counterexamples. It has been shown that linear counterexamples are only capable of showing property violations for a certain subset of temporal logics (Clarke and Veith, 2004). Some properties can only be witnessed by tree-like structures, such as those created by the algorithm presented by Clarke et al. (2002). One application of tree-like counterexamples is in methods where mutants are model checked against the specification (Ammann et al., 1998; Fraser and Wotawa, 2008). Requirement properties that require tree-like counterexamples also require more than one linear test case. For example, each counterexample tree could result in a set of test cases such that the tree is covered, as described by Wijesekera et al. (2007).

The ability to create tree-like counterexamples also opens up new possibilities for the definition of trap properties. For example, trap properties could be formulated such that pairs of test cases representing alternative branches are generated. As an example, assume we want to create an MCDC (Chilenski and Miller, 1994) coverage test suite with regard to the transition guards of the EFSM model. Consider the transition Wiping → Idle, which has the following guard condition C:
C := IN_speed = 0 ∧ Endswitch ∧ ¬IN_pump

The guard C consists of three conditions (IN_speed = 0, Endswitch, ¬IN_pump). Consider the first condition (IN_speed = 0), which is MCDC covered if:

- C evaluates to true and false.
- IN_speed = 0 evaluates to true and false.
- IN_speed = 0 independently affects C; i.e., the values of the other conditions are fixed.

Coverage of IN_speed = 0 in C requires a pair of test cases. This pair of test cases can be represented with a single trap property. To do so, we need the branching-time logic CTL (Clarke and Emerson, 1982) instead of LTL. In addition to the temporal operators also supported by LTL, in CTL every temporal operator is preceded by a path quantifier (A, E). The operator F ("finally") corresponds to the eventually operator ◇ in LTL, G ("globally") corresponds to □, U ("until") corresponds to U, and X ("next") corresponds to the next operator ○. The path quantifiers A ("all") and E ("some") require formulas to hold on all or on some paths, respectively. For our example, we only need the combination EF, which denotes that there is some path on which some condition will eventually hold. The condition IN_speed = 0 is MCDC covered in C if there is a test case where C is true, which is the case if the other conditions are true as well.
Fig. 5. Possible tree-like witness graph for MCDC with regard to IN_speed = 0 in IN_speed = 0 ∧ Endswitch ∧ ¬IN_pump.
In addition, C must be false when ¬(IN_speed = 0) holds and the other conditions have the same value. Using CTL, this can be expressed as follows:
EF(C ∧ (IN_speed = 0) ∧ Endswitch ∧ ¬IN_pump) ∧ EF(¬C ∧ ¬(IN_speed = 0) ∧ Endswitch ∧ ¬IN_pump)

Any witness to this CTL formula MCDC covers IN_speed = 0 in C. However, it is not guaranteed that a single test case exists that does this. A possible witness graph for this formula (or counterexample for its negation) is depicted in Fig. 5. Instead of creating a witness, the negation of the given formula could be checked to obtain the same counterexample, as the same algorithms are used to derive counterexamples and witnesses for CTL formulas. The resulting graph has to be covered with linear sequences as described by Wijesekera et al. (2007) in order to derive MCDC test cases; in the example in Fig. 5, there are two linear test cases.

4.6. Abstraction for testing

A problem that occurs for most model based test case generation techniques at some point is performance. Because of the state explosion problem, state spaces can become so large that generating test cases might not be feasible, or at least very costly. While the current state of the art in model checking already allows verification of realistic hardware designs, model checking of software is extremely susceptible to the state explosion problem. As many model checking techniques are based on exhaustive search, the performance limitations can be seen as the main show-stopper for industry acceptance of model checker based testing.

Abstraction is generally the solution to overcome the state explosion problem. Several automated abstraction techniques have been presented in recent years, and many of them make it possible to verify properties on very large models. These abstraction methods are tailored towards verification, and therefore are not always directly applicable in the context of testing. For example, a popular technique is CEGAR (Clarke et al., 2000), where an abstract model is refined until no more spurious counterexamples are generated when verifying a property. Such abstraction techniques guarantee that a property that holds on the abstract model also holds on the concrete model. This is contrary to the task of test case generation: here we want properties that are violated on the concrete model to also be violated on the abstract model. Another common abstraction technique is the cone of influence reduction (COI) (Berezin et al., 1998). Here, slicing
techniques are applied to consider only those parts of a model that are necessary to determine satisfaction of a property. When creating test cases, the objective is to cover as much of the model as possible, so here again there is a certain discrepancy. Unfortunately, research on abstraction techniques specifically tailored towards the needs of testing is sparse. In the context of model checker based testing, Ammann and Black (1999a) suggested a method called finite focus that puts a limited focus on the definition range of variables for the purpose of test case generation. Ammann and Black further define a notion of soundness in the context of test case generation, which expresses that any counterexample of an abstracted model has to be a valid trace of the original model. In general, this is an area where further research is necessary.

Abstraction is not only necessary with regard to performance, but also with regard to the size of resulting test suites. The more complex a model is, the more mutants or trap properties can be defined, and the more test cases can potentially be created. A possibility to scale the complexity or abstraction level of a model would be an ideal method to scale the size of the resulting test suite.

The problem of generating test cases from abstract models is also related to the controllability problem presented in Section 4.4. When generating test cases from an abstract model it is not always guaranteed that the results of a test case execution are conclusive; therefore it is important to make sure that inconclusive outcomes are properly detected and not misreported as failures. Section 4.4 already presented an abstract version of the windscreen wiper model. Another conceivable abstraction would be to apply domain abstraction to the speed values. This would result in a model as depicted in Fig. 6; here, IN_speed was changed from an integer to a Boolean variable signifying whether the wiper shall be turned on or off, and OUT_speed signifies whether it is currently turned on or off. Even though this abstract model is deterministic, execution of derived test cases needs further consideration. One possibility is for the test case execution framework to apply suitable refinement of the test cases and abstraction of the observed values.

An alternative would be to use the finite focus abstraction (Ammann and Black, 1999a). In the wiper example, suppose that IN_speed and OUT_speed actually had a large range of values. This would be the case, for example, if the outputs corresponded to fine-grained voltages controlling (near) continuous variation in speeds. Clearly, the integer range is not finite, and the large range
Fig. 6. Domain abstracted EFSM model of windscreen wiper application.
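As an illustration only (not code from the original model, where IN_speed and OUT_speed range over {0, 1, 2}), the domain abstraction of Fig. 6 could be encoded in NuSMV roughly as follows:

VAR
  IN_speed  : boolean;  -- abstracted input: shall the wiper be turned on?
  OUT_speed : boolean;  -- abstracted output: is the wiper currently on?
ASSIGN
  init(OUT_speed) := 0;
  next(OUT_speed) := case
    next(State) = Idle   : 0;         -- wiper off while idle
    next(State) = Wiping : IN_speed;  -- follows the abstracted input
    1                    : 1;         -- pumping or waiting: wiper stays on
  esac;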
For purposes of testing, we could model the situation with finite focus as a prefix of the integers and a special undefined value:

VAR
  IN_speed  : {0, 1, 2, 3, undefined};
  OUT_speed : {0, 1, 2, 3, undefined};
  Sound     : boolean;

As a practical matter, we could encode "undefined" as an integer (e.g., "4"), thereby preserving the option of using relational operators in a tool such as NuSMV. The variable "Sound" captures the validity of the finite focus. If "Sound" holds, the model is in a state where the temporal constraints apply. If not, the model has left the finite focus region, and the temporal constraints are not necessarily meaningful; there is no point in generating test cases through such states. We need to capture the evolution of the "Sound" variable, which goes false if IN_speed exceeds its range. The case where the output OUT_speed exceeds its range is captured by the transition logic in the model: OUT_speed goes undefined only if IN_speed goes undefined.

ASSIGN
  init(Sound) := !(IN_speed = undefined);
  next(Sound) := case
    next(IN_speed) = undefined : 0;
    1                          : Sound;
  esac;

Finally, we need to recursively rewrite the properties to reflect the value of "Sound". The rewriting rules (Ammann and Black, 1999a) given in Definition 4 apply to LTL and CTL properties. The constraint rewriting CR(φ) for an LTL/CTL property φ is defined recursively as follows, where v denotes a Boolean value and s is a special variable that is true if the state is sound and false otherwise. OP denotes any of the LTL operators □, ○, ◇, or, when considering CTL properties, AG, AF, AX, EG, EF, EX. The operator OP_U stands for either A or E in the context of the CTL until operator, or is a blank placeholder in the case of LTL. Atomic propositions are denoted by a.

Definition 4 (Constraint rewriting).
CR(φ) = cr(φ, True)       if φ begins with a temporal operator,
CR(φ) = s → cr(φ, True)   otherwise.

cr(a, v)            = a
cr(¬φ, v)           = ¬cr(φ, ¬v)
cr(φ1 ∧ φ2, v)      = cr(φ1, v) ∧ cr(φ2, v)
cr(φ1 ∨ φ2, v)      = cr(φ1, v) ∨ cr(φ2, v)
cr(φ1 → φ2, v)      = cr(φ1, ¬v) → cr(φ2, v)
cr(φ1 ↔ φ2, v)      = cr(φ1, v) ↔ cr(φ2, v)
cr(OP φ, True)      = OP (s → cr(φ, True))
cr(OP φ, False)     = OP (s ∧ cr(φ, False))
cr(OP_U φ1 U φ2, True)  = OP_U (φ1 U φ2 → cr(φ2, True))
cr(OP_U φ1 U φ2, False) = OP_U (φ1 U φ2 ∧ cr(φ2, False))
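To make the rewriting concrete, consider one of the transition coverage trap properties of Appendix B. Assuming the finite focus model above with its Sound variable, applying Definition 4 yields the rewritten version shown below (our illustration, not tool output):

-- original trap property (Appendix B)
SPEC AG(State = Idle & IN_pump -> AX State != Pumping)
-- after constraint rewriting with the soundness variable
SPEC AG(Sound -> (State = Idle & IN_pump -> AX (Sound -> State != Pumping)))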
Table 7
Statistics reported by NuSMV for the normal windscreen wiper controller model and the duplicated model.

                                Normal    Duplicated
Peak number of nodes            3066      6132
Number of BDD variables         23        37
Number of BDD and ADD nodes     2238      5312
Number of states                1152      110592
When creating test cases from a model abstracted with the finite focus method, this constraint rewriting has to be applied to all properties involved in the test process, that is, trap properties, reflected properties, etc. As an example, consider:

□(IN_pump → ○ OUT_water)

We rewrite this as:

□(Sound → (IN_pump → ○(Sound → OUT_water)))

The result is that when we use the resulting model for test case generation, we do not generate traces where "Sound" is false in any relevant current or future state mentioned in the temporal logic formula. Hence, the tests generated from the finite focus abstraction are useful for testing the full system.

4.7. Model combination

There are problems related to test case derivation techniques that would probably never occur in the context of verification. For example, several techniques to derive test cases inject faults into a given model and then observe the resulting behavior (Ammann et al., 1998, 2001; Fraser and Wotawa, 2008; Okun et al., 2003). In some cases it might be of interest how the faulty model behaves (for example, for the purpose of failing tests), but in many cases the test case should represent the behavior of the correct model. Several solutions to this problem have been presented. For example, Okun et al. (2003) create a combination of the original model and the mutant, such that the two models share the same input variables. Ammann et al. (2001) create a "merged" model that includes both the original and the mutated transitions. Another possibility is to symbolically execute failing test cases on a correct model with a model checker, thus creating a new, passing test case. All of these approaches can be seen as workarounds for a problem that could more easily be solved by an appropriate model checker. For example, the verification process might only consider the mutant model, while during counterexample creation the original model is used. Alternatively, model checkers could support simulation to a comfortable level, such that counterexamples created with a specific version of a model could be executed on a different version of the same model. One of the main obstacles here is that most model checkers do not distinguish between system inputs and outputs, which is a basic prerequisite for testing. The module checking theory (Kupferman and Vardi, 1997) extends regular model checking to a level where inputs and outputs can be distinguished. Table 7 lists statistics derived with version 2.4.3 of NuSMV (Cimatti et al., 1999) on the windscreen wiper example, compared to the statistics for a duplicated (Okun et al., 2003) version of the model, as is sometimes used as a workaround. The increase in the state space is clearly visible, and shows that new solutions to this problem are necessary.
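To make the duplication workaround of Okun et al. (2003) concrete, the following is a minimal sketch of a combined model in which the original and a mutated copy of the controller share the same inputs; the module names wiper and wiper_mutant are hypothetical and assume that the Appendix A model has been wrapped into parameterized modules:

MODULE main
VAR
  -- shared inputs, chosen nondeterministically by the model checker
  IN_speed  : {0, 1, 2};
  IN_pump   : boolean;
  Endswitch : boolean;
  -- two copies of the controller: the original and a mutated version
  orig : wiper(IN_speed, IN_pump, Endswitch);
  mut  : wiper_mutant(IN_speed, IN_pump, Endswitch);
-- a counterexample to this property is an input sequence on which the
-- mutant's outputs deviate from the original's, i.e., a distinguishing test
SPEC AG(orig.OUT_speed = mut.OUT_speed & orig.OUT_water = mut.OUT_water)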
4.8. Alternative witnesses and counterexamples
When proving that a model does not satisfy a property, a single counterexample is sufficient. In the context of verification, a counterexample should offer the analyst insight into why the property is not satisfied; the shorter the counterexample, the easier it is for a human engineer to understand how the property violation occurs. If several properties are violated such that one counterexample illustrates all these violations, then a single counterexample is sufficient in the context of verification.
When creating counterexamples for testing purposes, this is not the case. The objective of testing is to cover as much of the system as possible: the more different behaviors the test cases exercise, the better. Obviously there is a discrepancy between the objectives of testing and verification, yet the same techniques are applied in both scenarios. Testing could greatly benefit from model checkers that take the objectives of testing into account. For example, test goals are commonly formulated in a negated way, such that a counterexample is generated. But why stop after a single counterexample? Testing would benefit from the possibility to generate several different counterexamples for the same property. While some model checkers (for example, SPIN (Holzmann, 1997)) allow creating multiple counterexamples, most model checkers create only one counterexample per property. Multiple counterexamples would make it possible to choose the counterexample that explores the most as yet uncovered parts of the system.

Much of the effort in counterexample generation goes into deriving counterexamples that are as short as possible. For example, the classical algorithm for counterexample generation in symbolic model checking (Clarke et al., 1995) expends significant effort to find a short counterexample; testing, however, does not necessarily profit from short counterexamples. As shown in Section 4.3, long test cases can even increase the fault detection ability. A further step would be to calculate the superset of all possible witnesses or counterexamples. Such an approach is taken by Meolic et al. (2004), who derive counterexample or witness automata for a fragment of action computation tree logic. This is also related to the tool TGV (Jard and Jéron, 2005), which returns a graph structure (complete test graph) for a manually specified test purpose. Such a structure might itself result in a large or even infinite number of linear test cases, so here further processing is necessary to select a reasonable set of test cases.
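Even with an unmodified model checker there are crude workarounds. As a sketch (not one of the published approaches), an additional, different counterexample can be requested by adding an observer variable for behavior that the first counterexample did not exercise and restricting a variant of the trap property to states where that behavior has occurred. If, say, the first counterexample covering the Waiting to Pumping transition of the Appendix A model never passes through Wiping, the following variant forces one that does:

VAR
  seen_wiping : boolean;  -- set one step after the Wiping state has been visited
ASSIGN
  init(seen_wiping) := 0;
  next(seen_wiping) := case
    State = Wiping : 1;
    1              : seen_wiping;
  esac;
-- variant of the trap property for the Waiting -> Pumping transition;
-- any counterexample must now pass through Wiping before covering it
SPEC AG(State = Waiting & IN_pump & seen_wiping -> AX State != Pumping)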
4.9. Explicitly setting the initial state of counterexamples

When creating test cases with a model checker, it might be necessary to explicitly set the initial state of a model without having to rewrite the textual model representation and re-initialize the model. Example applications that require such a method are described by Fraser and Wotawa (2008, 2007c,f). For example, it is necessary to set the initial state explicitly when extending existing counterexamples as described in Section 4.3, or when deriving replacement sub-sequences (Fraser and Wotawa, 2007c). In other scenarios it might be necessary that a set of test cases all share the identical initial state. To circumvent the drawbacks of current model checkers, the task of setting an initial state explicitly is easily fulfilled by rewriting the source code of the model such that the desired initial state is the only allowed initial state. The drawback of such a solution is that the model checker has to re-encode the entire state space of the model. Alternatively, properties can be rewritten such that all statements are represented as implications of the desired initial state, so that a resulting counterexample consists of a prefix leading to this state, followed by the regular counterexample part. Again there are performance concerns, as the complexity of counterexample generation increases with the length of the counterexamples. A model checker that explicitly supports setting a specific single initial state for counterexample generation would be a very practical extension. It is conceivable that such an option would also increase the speed of model checking in some cases, because the state space is reduced if the set of initial states is restricted. Whether setting an initial state explicitly is complex or not depends on the underlying model checking technique: for symbolic model checking the effects on the BDD encoding of the model might be significant, while for bounded model checking or explicit state model checking there should be no problems in setting an initial state explicitly.
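Both workarounds can be sketched on the Appendix A model; the chosen state (Waiting with a full counter) is merely an assumed example:

-- Workaround 1: rewrite the model so that the desired state is the only initial state
ASSIGN
  init(State)   := Waiting;  -- originally Idle
  init(Counter) := 3;        -- originally 0

-- Workaround 2: keep the original initial states and rewrite the property instead;
-- a counterexample then consists of a prefix reaching the desired state, followed
-- by the regular counterexample part
SPEC AG((State = Waiting & Counter = 3) ->
        AG(State = Waiting & IN_pump -> AX State != Pumping))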
4.10. Constraints on counterexamples

Constraints on counterexample creation are another conceivable way to improve test case generation with model checkers. For example, a possible constraint would be to require that each counterexample pass through a certain state. An application that requires this, and a possible solution, have been presented in the context of regression testing (Fraser et al., 2007). The problem considered by Fraser et al. (2007) is what to do when a model that is used for test case generation is changed. It is conceivable that the testing effort should be kept to a minimum by only considering test cases that are affected by the change, or it might be desirable to explicitly concentrate on the change and generate extensive test suites that exercise it on the implementation. A different useful application of constraints would be to require a certain minimum length for counterexamples: often, trap properties are already violated in an initial state of the model, and while a resulting counterexample consisting of only one state might be useful for verification purposes, it is not usable as a test case.
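One conceivable encoding of such a minimum length constraint, sketched on the Appendix A model with an assumed bound of three steps, adds an auxiliary step counter to the model and strengthens the trap property so that it can only be violated after that many steps:

VAR
  steps : 0..3;  -- number of executed steps, saturating at 3
ASSIGN
  init(steps) := 0;
  next(steps) := case
    steps < 3 : steps + 1;
    1         : steps;
  esac;
-- the violation may only occur once at least three steps have been executed,
-- so the resulting counterexample can never consist of the initial state alone
SPEC AG(steps = 3 & State = Idle & IN_pump -> AX State != Pumping)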
Fig. 7. Changed EFSM model of windscreen wiper application.
Test purposes, as used in the test tool TGV (Jard and Jéron, 2005), are a different application of constraints. Test purposes are a method to describe the parts of the model that are interesting for testing. In TGV, test purposes are represented as transition systems with special pass/fail states, and for test case generation the synchronous product of the model and the test purpose is calculated and pruned. The resulting transition system can be used to create test cases more efficiently than using the whole model.

As an example of when constraints on counterexamples are useful, reconsider the windscreen wiper example one last time. We notice that there is no transition from Waiting to Idle, only via Wiping, and create an adapted version (Fig. 7). Assume that this change is made at a time when there already is an implementation that passes the test cases created for the previous version of the model. Testing should now focus on the added transition. This could be expressed as a constraint that only counterexamples that include this transition should be generated. Currently there is no way to do this with available model checkers. However, the usefulness of such an approach has been demonstrated with a method that rewrites the model and the trap properties (Fraser et al., 2007). For example, the changed transition relation of the model in Fig. 7 results in the following changed NuSMV code:

ASSIGN
  next(State) := case
    State = Idle & IN_pump: Pumping;
    State = Idle & !IN_pump & IN_speed > 0: Wiping;
    State = Pumping & !IN_pump: Waiting;
    State = Waiting & IN_pump: Pumping;
    State = Waiting & Counter < 3 & !IN_pump: Waiting;
    State = Waiting & Counter >= 3 & !IN_pump & IN_speed > 0: Wiping;
    State = Waiting & Counter >= 3 & !IN_pump & IN_speed = 0: Idle;
    State = Wiping & IN_speed = 0 & Endswitch & !IN_pump: Idle;
    State = Wiping & IN_pump: Pumping;
    1: State;
  esac;

The change in the model alters two conditions in the transition relation. These two changes can be made explicit with two additional variables:

VAR
  ch0: boolean;
  ch1: boolean;
ASSIGN
  init(ch0) := 0;
  next(ch0) := case
    State = Waiting & Counter >= 3 & !IN_pump & IN_speed > 0: 1;
    1: ch0;
  esac;
  init(ch1) := 0;
  next(ch1) := case
    State = Waiting & Counter >= 3 & !IN_pump & IN_speed = 0: 1;
    1: ch1;
  esac;

A counterexample that executes one of the changed transitions will have either ch0 or ch1 set to true. Consequently, we can force the creation of a test suite in which these two transitions are explicitly exercised. For this, the trap properties used for test case generation
are rewritten using the constraint rewriting rules given in Definition 4, with ch0 or ch1 taking the role of the soundness variable s. For example, consider the trap property for the transition from state Wiping to Pumping:
□(State = Wiping ∧ IN_pump → ○ State ≠ Pumping)

The following two rewritten versions of this trap property result in counterexamples that achieve the original goal of covering the transition from state Wiping to Pumping, but also execute the changed transitions from Waiting to Wiping and to Idle, respectively:

□(ch0 → (State = Wiping ∧ IN_pump → ○(ch0 → State ≠ Pumping)))
□(ch1 → (State = Wiping ∧ IN_pump → ○(ch1 → State ≠ Pumping)))

Rewriting all eight trap properties for the windscreen wiper model results in 16 changed trap properties. Calling NuSMV on the changed and extended model with these trap properties results in 16 counterexamples (eight test cases per changed transition), which represent 14 unique test cases.

5. Conclusions

The use of model checking tools for test case generation has been considered by many researchers in recent years. Despite all the advantages a model checker offers for test case generation, testing with model checkers is not as good as it could be, because model checkers were not originally designed with test case generation in mind. Consequently, the approach remains controversial, and industry is reluctant to embrace it. In this paper, we have discussed different techniques that would improve various aspects of test case generation if implemented in a model checking tool. This list is not exhaustive, as the requirements on a testing tool vary from project to project. Furthermore, the focus of this paper was on techniques that are independent of any particular test case generation technique. Further techniques could be integrated; for example, the explicit creation of trap properties for structural coverage is a task that a model checker could easily fulfill.

A shift in the focus of model checking research from exhaustive verification to error trace generation, using techniques such as bounded model checking (Biere et al., 1999) or directed model checking based on heuristic search (Edelkamp et al., 2001), is a promising step with regard to software testing. While such research mostly aims at improving the model checker as a debugging tool during development, the testing community can also profit from these results. There are approaches (Jéron and Morel, 1999) that reuse ideas from model checking in dedicated test tools, such as TGV (Jard and Jéron, 2005). Such tools are very successful in some areas where model checking suffers from problems; for example, the performance of TGV is significantly better than that of regular model checking. At the same time, such tools lack the flexibility of model checkers. For example, TGV requires suitable test purposes for testing; these currently have to be specified manually, although automatic creation of test purposes is an active area of research. Furthermore, TGV does not return linear test cases but a graph structure from which a possibly infinite number of test cases can be generated. The lessons learned from creating such tools should ideally be integrated into model checkers.

Some of the discussed techniques are not yet available, and therefore further research is necessary. However, many of the discussed techniques have already been implemented and evaluated outside of model checkers. Integrating these techniques directly into model checkers is conceivable, and would be beneficial in the context of testing with model checkers.
Appendix A. NuSMV code of windscreen wiper

MODULE main
VAR
  -- Input
  IN_speed: {0, 1, 2};
  IN_pump: boolean;
  Endswitch: boolean;

  -- Output
  OUT_speed: {0, 1, 2};
  OUT_water: boolean;

  -- State
  State: {Idle, Pumping, Waiting, Wiping};
  Counter: 0..3;

ASSIGN
  init(OUT_speed) := 0;
  next(OUT_speed) := case
    next(State) = Idle: 0;
    next(State) = Pumping & OUT_speed > 0: OUT_speed;
    next(State) = Pumping & OUT_speed = 0: 1;
    next(State) = Waiting & OUT_speed > 0: OUT_speed;
    next(State) = Waiting & OUT_speed = 0: 1;
    next(State) = Wiping: IN_speed;
    1: OUT_speed;
  esac;

  init(OUT_water) := 0;
  next(OUT_water) := case
    next(State) = Pumping: 1;
    1: 0;
  esac;

  init(State) := Idle;
  next(State) := case
    State = Idle & IN_pump: Pumping;
    State = Idle & !IN_pump & IN_speed > 0: Wiping;
    State = Pumping & !IN_pump: Waiting;
    State = Waiting & IN_pump: Pumping;
    State = Waiting & Counter < 3 & !IN_pump: Waiting;
    State = Waiting & Counter >= 3 & !IN_pump: Wiping;
    State = Wiping & IN_speed = 0 & Endswitch & !IN_pump: Idle;
    State = Wiping & IN_pump: Pumping;
    1: State;
  esac;

  init(Counter) := 0;
  next(Counter) := case
    next(State) = Waiting & Counter < 3 & Endswitch: Counter + 1;
    1: 0;
  esac;

-- Some properties to check the test model
SPEC AG(IN_pump -> AX OUT_water)
SPEC AG(IN_pump -> AX OUT_speed > 0)
SPEC AG(IN_speed > 0 -> AX OUT_speed > 0)
SPEC AG(IN_pump -> AX(!IN_pump -> AX(OUT_speed > 0)))
SPEC AG(IN_speed = 2 & OUT_speed = 0 & !IN_pump -> AX OUT_speed = 2)
SPEC AG(IN_speed = 1 & OUT_speed = 0 & !IN_pump -> AX OUT_speed = 1)
SPEC AG(OUT_water -> AX OUT_speed > 0)
SPEC AG(OUT_water -> AX AX OUT_speed > 0)
SPEC AG(OUT_water -> AX AX AX OUT_speed > 0)

Appendix B. Transition coverage trap properties

The following list contains the trap properties used in this paper to create test cases covering the transitions of the example figures using CTL.

SPEC AG(State = Idle & IN_pump -> AX State != Pumping)
SPEC AG(State = Idle & !IN_pump & IN_speed > 0 -> AX State != Wiping)
SPEC AG(State = Pumping & !IN_pump -> AX State != Waiting)
SPEC AG(State = Waiting & IN_pump -> AX State != Pumping)
SPEC AG(State = Waiting & Counter < 3 & !IN_pump -> AX State != Waiting)
SPEC AG(State = Waiting & Counter >= 3 & !IN_pump -> AX State != Wiping)
SPEC AG(State = Wiping & IN_speed = 0 & Endswitch & !IN_pump -> AX State != Idle)
SPEC AG(State = Wiping & IN_pump -> AX State != Pumping)
References Ammann, P., Black, P.E., 1999a. Abstracting formal specifications to generate software tests via model checking. In: Proceedings of the 18th Digital Avionics Systems Conference, vol. 2, pp. 10.A.6-1–10.A.6-10. Ammann, P., Black, P.E., 1999b. A specification-based coverage metric to evaluate test sets. In: HASE’99: The 4th IEEE International Symposium on HighAssurance Systems Engineering. IEEE Computer Society, Washington, DC, USA, pp. 239–248. Ammann, P.E., Black, P.E., Majurski, W., 1998. Using model checking to generate tests from specifications. In: Proceedings of the Second IEEE International Conference on Formal Engineering Methods (ICFEM’98). IEEE Computer Society, pp. 46–54. Ammann, P., Ding, W., Xu, D., 2001. Using a model checker to test safety properties. In: Proceedings of the 7th International Conference on Engineering of Complex Computer Systems (ICECCS 2001). IEEE, pp. 212–221. Barringer, H., Goldberg, A., Havelund, K., Sen, K., 2004. Program monitoring with LTL in EAGLE. In: PADTAD’04, Parallel and Distributed Systems: Testing and Debugging. IEEE Computer Society, Los Alamitos, CA, USA, p. 264b. Berezin, S., Campos, S.V.A., Clarke, E.M., 1998. Compositional reasoning in model checking. In: COMPOS’97: Revised Lectures from the International Symposium on Compositionality: The Significant Difference. Springer-Verlag, London, UK, pp. 81–102. Biere, A., Cimatti, A., Clarke, E.M., Zhu, Y., 1999. Symbolic model checking without BDDs. In: TACAS’99: Proceedings of the 5th International Conference on Tools and Algorithms for Construction and Analysis of Systems. Springer-Verlag, London, UK, pp. 193–207. Black, P.E., 2000. Modeling and marshaling: making tests from model checker counterexamples. Proceedingsof the 19th Digital Avionics Systems Conference, vol. 1. IEEE. pp. 1.B.3-1–1.B.3-6. Black, P.E., Ranville, S., 2001. Winnowing tests: getting quality coverage from a model checker without quantity. In: The 20th Conference on Digital Avionics Systems, 2001. DASC, vol. 2, pp. 9B6/1–9B6/4. Bryant, R.E., 1986. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. 35 (8), 677–691. Callahan, J., Schneider, F., Easterbrook, S., August 1996. Automated software testing using model-checking. In: Proceedings 1996 SPIN Workshop. Also WVU Technical Report NASA-IVV-96-022. Callahan, J.R., Easterbrook, S.M., Montgomery, T.L., 1998. Generating Test Oracles Via Model Checking. Tech. rep., NASA/WVU Software Research Lab. Calvagna, A., Gargantini, A., 2008. A logic-based approach to combinatorial testing with constraints. In: Tests and Proofs. Lecture Notes in Computer Science, vol. 4966. Springer-Verlag, pp. 66–83.
G. Fraser et al. / The Journal of Systems and Software 82 (2009) 1403–1418 Campbell, C., Grieskamp, W., Nachmanson, L., Schulte, W., Tillmann, N., Veanes, M., May 2005. Model-based testing of object-oriented reactive systems with spec explorer. Tech. rep., Microsoft Research, Redmond. Chilenski, J.J., Miller, S.P., 1994. Applicability of modified condition/decision coverage to software testing. Software Engineering Journal 9 (September), 193–200. Cimatti, A., Clarke, E.M., Giunchiglia, F., Roveri, M., 1999. NUSMV: a new symbolic model verifier. In: CAV’99: Proceedings of the 11th International Conference on Computer Aided Verification. Springer-Verlag, London, UK, pp. 495–499. Clarke, E.M., Emerson, E.A., 1982. Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Logic of Programs, Workshop. Springer-Verlag, London, UK, pp. 52–71. Clarke, E., Veith, H., 2004. Counterexamples revisited: principles, algorithms, applications. In: Verification: Theory and Practice. Lecture Notes in Computer Science, vol. 2772. Springer-Verlag, pp. 208–224. Clarke, E.M., Emerson, E.A., Sistla, A.P., 1983. Automatic verification of finite state concurrent system using temporal logic specifications: a practical approach. In: POPL’83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. ACM Press, New York, NY, USA, pp. 117–126. Clarke, E.M., Grumberg, O., McMillan, K.L., Zhao, X., 1995. Efficient generation of counterexamples and witnesses in symbolic model checking. In: Proceedings of the 32st Conference on Design Automation (DAC). ACM Press, pp. 427–432. Clarke, E.M., Grumberg, O., Jha, S., Lu, Y., Veith, H., 2000. Counterexample-guided abstraction refinement. In: CAV’00: Proceedings of the 12th International Conference on Computer Aided Verification. Springer-Verlag, London, UK, pp. 154–169. Clarke, E.M., Grumberg, O., Jha, S., Lu, Y., Veith, H., 2001. Progress on the state explosion problem in model checking. In: Informatics – 10 Years Back. 10 Years Ahead. Springer-Verlag, London, UK, pp. 176–194. Clarke, E.M., Jha, S., Lu, Y., Veith, H., 2002. Tree-like counterexamples in model checking. In: LICS’02: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science. IEEE Computer Society, Washington, DC, USA, pp. 19–29. Davis, M., Putnam, H., 1960. A computing procedure for quantification theory. Journal of the Association for Computing Machinery 7, 201–215. Edelkamp, S., Lafuente, A.L., Leue, S., 2001. Directed explicit model checking with HSF-SPIN. In: SPIN’01: Proceedings of the 8th International SPIN Workshop on Model Checking of Software. Springer-Verlag New York, Inc., New York, NY, USA, pp. 57–79. Elbaum, S., Malishevsky, A.G., Rothermel, G., 2000. Prioritizing test cases for regression testing. In: ISSTA’00: Proceedings of the 2000 ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM Press, New York, NY, USA, pp. 102–112. Elbaum, S., Malishevsky, A., Rothermel, G., 2001. Incorporating varying test costs and fault severities into test case prioritization. In: ICSE’01: Proceedings of the 23rd International Conference on Software Engineering. IEEE Computer Society, Washington, DC, USA, pp. 329–338. Engels, A., Feijs, L., Mauw, S., 1997. Test generation for intelligent networks using model checking. In: Brinksma, E. (Ed.), Proceedings of the Third International Workshop on Tools and Algorithms for the Construction and Analysis of Systems. TACAS’97, Lecture Notes in Computer Science, vol. 1217. 
SpringerVerlag, Enschede, The Netherlands, pp. 384–398. Fraser, G., Wotawa, F., 2007. Mutant minimization for model-checker based testcase generation. In: Testing: Academic and Industrial Conference Practice and Research Techniques – MUTATION, 2007. TAICPART-MUTATION 2007. IEEE Computer Society, pp. 161–168. Fraser, G., Wotawa, F., 2007a. Improving model-checkers for software testing. In: Proceedings of the International Conference on Software Quality (QSIC’07). IEEE Computer Society, pp. 25–31. Fraser, G., Wotawa, F., 2007c. Redundancy based test-suite reduction. In: Proceedings of the 10th International Conference on Fundamental Approaches to Software Engineering FASE 2007. Lecture Notes in Computer Science, vol. 4422. Springer, pp. 291–305. Fraser, G., Wotawa, F., 2007d. Test-case generation and coverage analysis for nondeterministic systems using model-checkers. In: Proceedings of the International Conference on Software Engineering Advances (ICSEA 2007). IEEE Computer Society, Los Alamitos, CA, USA, p. 45. Fraser, G., Wotawa, F., 2007e. Test-case prioritization with model-checkers. In: Proceedings of the 25th IASTED International Conference on Software Engineering (SE’07). ACTA Press, Anaheim, CA, USA, pp. 267–272. Fraser, G., Wotawa, F., 2007f. Using LTL rewriting to improve the performance of model-checker based test-case generation. In: A-MOST’07: Proceedings of the 3rd International Workshop on Advances in Model-based Testing. ACM Press, New York, NY, USA, pp. 64–74. Fraser, G., Wotawa, F., 2008. Using model-checkers to generate and analyze property relevant test-cases. Software Quality Journal 16 (2), 161–183. Fraser, G., Aichernig, B., Wotawa, F., 2007. Handling model changes: regression testing and test-suite update with model-checkers. Electronic Notes in Theoretical Computer Science 190, 33–46. Gargantini, A., Heitmeyer, C., 1999. Using model checking to generate tests from requirements specifications. ESEC/FSE’99: 7th European Software Engineering Conference, Held Jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering, vol. 1687. Springer, pp. 146–162. Giannakopoulou, D., Havelund, K., 2001. Automata-based verification of temporal properties on running programs. In: ASE’01: Proceedings of the 16th IEEE
International Conference on Automated Software Engineering. IEEE Computer Society, Washington, DC, USA, p. 412. Hamon, G., de Moura, L., Rushby, J., 2004. Generating efficient test sets with a model checker. In: Proceedings of the Second International Conference on Software Engineering and Formal Methods (SEFM’04). IEEE Computer Society, Los Alamitos, CA, USA, pp. 261–270. Harrold, M.J., Gupta, R., Soffa, M.L., 1993. A methodology for controlling the size of a test suite. ACM Trans. Softw. Eng. Methodol. 2 (3), 270–285. Havelund, K., Rosu, G., 2001a. Monitoring java programs with java pathexplorer. Electr. Notes Theor. Comput. Sci. 55 (2). Havelund, K., Rosu, G., 2001b. Monitoring programs using rewriting. In: ASE’01: Proceedings of the 16th IEEE International Conference on Automated Software Engineering. IEEE Computer Society, Washington, DC, USA, p. 135. Havelund, K., Rosu, G., 2004. Efficient monitoring of safety properties. International Journal on Software Tools for Technology Transfer 6 (2), 158–173. Heimdahl, M.P.E., Devaraj, G., 2004. Test-suite reduction for model based tests: effects on test quality and implications for testing. In: ASE. IEEE Computer Society, pp. 176–185. Heimdahl, M.P.E., Rayadurgam, S., Visser, W., 2001. Specification centered testing. In: Proceedings of the Second International Workshop on Automated Program Analysis, Testing and Verification (ICSE 2001). Heimdahl, M.P., Rayadurgam, S., Visser, W., Devaraj, G., Gao, J., 2003. Autogenerating test sequences using model checkers: a case study. In: Third International Workshop on Formal Approaches to Software Testing. Lecture Notes in Computer Science, 2931. Springer-Verlag, pp. 42–59. Hierons, R.M., 2006. Applying adaptive test cases to nondeterministic implementations. Inf. Process. Lett. 98 (2), 56–60. Holzmann, G.J., 1997. The model checker SPIN. IEEE Trans. Softw. Eng. 23 (5), 279– 295. Hong, H.S., Cha, S.D., Lee, I., Sokolsky, O., Ural, H., 2003. Data flow testing as model checking. In: ICSE’03: Proceedings of the 25th International Conference on Software Engineering. IEEE Computer Society, Washington, DC, USA, pp. 232– 242. Jard, C., Jéron, T., 2005. TGV: theory, principles and algorithms. International Journal on Software Tools for Technology Transfer (STTT) 7, 297–315. Jéron, T., Morel, P., 1999. Test generation derived from model-checking. In: CAV’99: Proceedings of the 11th International Conference on Computer Aided Verification. Springer-Verlag, London, UK, pp. 108–121. Jones, J.A., Harrold, M.J., 2003. Test-suite reduction and prioritization for modified condition/decision coverage. IEEE Trans. Softw. Eng. 29 (3), 195–209. Kupferman, O., Vardi, M.Y., 1997. Model checking revisited. In: CAV’97: Proceedings of the 9th International Conference on Computer Aided Verification. SpringerVerlag, London, UK, pp. 36–47. Lee, D., Yannakakis, M., August 1996. Principles and methods of testing finite state machines – a survey. In: Proceedings of the IEEE, vol. 84, pp. 1090–1123. Lichtenstein, O., Pnueli, A., 1985. Checking that finite state concurrent programs satisfy their linear specification. In: POPL’85: Proceedings of the 12th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. ACM Press, New York, NY, USA, pp. 97–107. Markey, N., Schnoebelen, P., 2003. Model checking a path (preliminary report). In: Proceedings of the Concurrency Theory (CONCUR’2003), Marseille, France. Lecture Notes in Computer Science, vol. 2761. Springer, pp. 251–265. McMillan, K.L., 1993. Symbolic Model Checking. 
Kluwer Academic Publishers., Norwell, MA, USA. Meolic, R., Fantechi, A., Gnesi, S., 2004. Witness and counterexample automata for ACTL. In: Formal Techniques for Networked and Distributed Systems, Lecture Notes in Computer Science, vol. 3235, pp. 259–275. Okun, V., Black, P.E., Yesha, Y., 2003. Testing with model checker: insuring fault visibility. In: Mastorakis, N.E., Ekel, P. (Eds.). Proceedings of 2002 WSEAS International Conference on System Science, Applied Mathematics & Computer Science, and Power Engineering Systems, pp. 1351–1356. Pnueli, A., 1977. The temporal logic of programs. In: 18th Annual Symposium on Foundations of Computer Science, 31 October–2 November. IEEE, Providence, Rhode Island, USA, pp. 46–57. Queille, J.-P., Sifakis, J., 1982. Specification and verification of concurrent systems in CESAR. In: Proceedings of the 5th Colloquium on International Symposium on Programming. Springer-Verlag, London, UK, pp. 337–351. Rayadurgam, S., Heimdahl, M.P.E., 2001. Coverage based test-case generation using model checkers. In: Proceedings of the 8th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS 2001). IEEE Computer Society, Washington, DC, pp. 83–91. Rosu, G., Havelund, K., 2005. Rewriting-based techniques for runtime verification. Automated Software Engineering 12 (2), 151–197. Rothermel, G., Harrold, M.J., von Ronne, J., Hong, C., 2002. Empirical studies of test-suite reduction. Software Testing, Verification and Reliability 12 (4), 219– 249. Rothermel, G., Harrold, M.J., Ostrin, J., Hong, C., 1998. An empirical study of the effects of minimization on the fault detection capabilities of test suites. In: ICSM’98: Proceedings of the International Conference on Software Maintenance. IEEE Computer Society, p. 34. Rothermel, G., Untch, R.H., Chu, C., Harrold, M.J., 1999. Test case prioritization: an empirical study. In: ICSM’99: Proceedings of the IEEE International Conference on Software Maintenance. IEEE Computer Society, Washington, DC, USA, p. 179. Tan, L., Sokolsky, O., Lee, I., 2004. Specification-based testing with linear temporal logic. In: Proceedings of IEEE International Conference on Information Reuse and Integration (IRI’04), pp. 493–498.
Utting, M., Pretschner, A., Legeard, B., April 2006. A Taxonomy of Model-based Testing. Tech. Rep. No. 04/2006, Department of Computer Science, The University of Waikato (New Zealand). Vardi, M.Y., Wolper, P., 1986. An automata-theoretic approach to automatic program verification (Preliminary Report). In: Proceedings of the 1st IEEE Symposium on Logic in Computer Science (LICS’86). IEEE Computer Society, pp. 332–344. Weyuker, E.J., 1982. On testing non-testable programs. Computer Journal 25 (4), 465–470. Wijesekera, D., Sun, L., Ammann, P., Fraser, G., 2007. Relating counterexamples to test cases in CTL model checking specifications. In: A-MOST’07: Proceedings of the 3rd International Workshop on Advances in Model-based Testing. ACM Press, New York, NY, USA, pp. 75–84. Wong, W.E., Horgan, J.R., London, S., Mathur, A.P., 1995. Effect of test set minimization on fault detection effectiveness. In: ICSE’95: Proceedings of the 17th International Conference on Software Engineering. ACM Press, pp. 41–50. Zhong, H., Zhang, L., Mei, H., 2006. An experimental comparison of four test suite reduction techniques. In: ICSE’06: Proceeding of the 28th International Conference on Software Engineering. ACM Press, pp. 636–640. Gordon Fraser received a MSc in Computer Science (2003) and a PhD in 2007 both from Graz University of Technology. He currently works as a post-doc researcher at Saarland University at the Software Engineering Chair. His research interests include model-based testing, software testing, and verification. Gordon Fraser has been member of several program commitees, and has organized several workshops and special issues for journals in his field. He is a member of the ACM, the IEEE Computer Society, and the Austrian Society for Artificial Intelligence.
Franz Wotawa received a MSc in Computer Science (1994) and a PhD in 1996 both from the Vienna University of Technology. He is currently a professor of software engineering, head of the Institute for Software Technology (IST), and dean of studies (for the computer science curriculum) at the Graz University of Technology. His research interests include model-based and qualitative reasoning, configuration, planning, theorem proving, intelligent agents, mobile robots, verification and validation, and software engineering. Currently, Franz Wotawa works on applying model-based diagnosis to software debugging as well as on test-case generation and repair. He has written more than 100 papers for journals, conferences, and workshops and has been member of the program committees for several workshops and conferences. He organized workshops and special issues on model-based reasoning for the journal AI Communications. He is in the editorial board of the Journal of Applied Logic (JAL), and a member of IEEE Computer Society, ACM, AAAI, the Austrian Computer Society (OCG), and the Austrian Society for Artificial Intelligence. Paul Ammann is an Associate Professor of software engineering at George Mason University. He earned an AB degree in computer science from Dartmouth College and MS and PhD degrees in computer science from the University of Virginia. His research topics include semantic-based transaction processing, software for critical systems, secure information systems, software testing, and formal methods. Paul Ammann has recently co-authored the book ‘‘Introduction to Software Testing” together with Jeff Offutt, and received an outstanding teaching award in 2007 from the Volgenau School of Information Technology and Engineering.