Engineering Applications of Artificial Intelligence 45 (2015) 441–452
The application of iterative interval arithmetic in path-wise test data generation
Ying Xing*, Yun-Zhan Gong, Ya-Wen Wang, Xu-Zhou Zhang
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
Article info
Article history: Received 12 December 2014; received in revised form 31 May 2015; accepted 27 July 2015.
Keywords: Test data generation; Constraint satisfaction problem; Interval arithmetic; Arc consistency; AC-3
Abstract
Path-wise test data generation is a crucial and challenging problem in software testing, and it can be formulated as a constraint satisfaction problem. In our previous research, a look-ahead search method was proposed as the constraint solver for path-wise test data generation. This paper analytically studies the interval arithmetic of that search method, which enforces arc consistency, and introduces an iterative operator to improve it, aiming at detecting infeasible paths as well as shortening generation time. Experiments were conducted to compare the iterative operator with the classical look-ahead operator AC-3, and to compare the test data generation method using the iterative operator with some existing methods. Experimental results validate the effectiveness and practical deployment of the proposed iterative method, and demonstrate that it is applicable in engineering. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Automating the process of software testing is a very popular research topic and is of real interest to industry (Bertolino, 2007; Elsayed, 2012), because manual testing is time-consuming and error-prone, and is even impracticable for large-scale real-world programs in engineering (Weyuker, 1999). As a basic problem in software testing, the automation of path-wise test data generation is particularly important, since many problems in software testing can be transformed into it, and great effort has been put into this field both commercially and academically. Currently, most commercial tools implement a test data generation strategy that uses constant values found in the program under test (PUT) or values that are slightly modified by means of mathematical operations (Galler and Aichernig, 2013). Take C++test for example, a commercial software quality improvement tool for C/C++: it randomly selects a value from a predefined pool, such as minimum and maximum values, -1, +1 and 0 for integer types, and constant values given within the PUT. But when this kind of constraint solver reaches its limitations, it falls back on random-based techniques, indicating that such tools are not fully intelligent and automatic. From an academic point of view, the problem of path-wise test data generation can be formulated as a constraint satisfaction problem (CSP) (Shan et al., 2004).
* Corresponding author. E-mail address: [email protected] (Y. Xing).
http://dx.doi.org/10.1016/j.engappai.2015.07.021
For the purpose of solving the CSP, it is required to abstract the constraints to be met, and to propagate and solve these constraints to obtain the test data. Precision in generating test data and the ability to prove that some paths are infeasible are also strongly demanded. DeMilli and Offutt (1991) proposed a fault-based technique using algebraic constraints and bisection to describe test data designed to find particular types of faults. Adtest (Gallagher and Narasimhan, 1997) considered only one predicate or one input variable at a time and iterated the solving procedures, which was relatively inefficient and not suitable for real-world programs in engineering. Gupta et al. (1999) presented a program execution-based approach to generate test data for a given path; the technique derived a desired input for a test path by iteratively refining an arbitrarily chosen input. Robschink and Snelting (2002) statically converted the program into Static Single Assignment (SSA) form, normally resulting in large constraint systems, which sometimes contained variables irrelevant to the problem to be solved. BINTEST (Beyleda and Gruhn, 2003) adopted bisection to guide the search process, which, however, might cut off parts of the variables' domains that contained solutions. Cadar et al. (2008) proposed a tool named KLEE and employed a variety of constraint solving optimization methods to reach the goal of high coverage. Wang et al. (2013) proposed an interval analysis algorithm using forward dataflow analysis, and adopted Choco (Team, 2010) as the constraint solver. But no matter what techniques are adopted, there are two challenges for researchers in this field. One is infeasible path detection, without which much of the effort, statistically accounting for 30-75% of the computation (Hermadi et al., 2014), is wasted. The other is generation time reduction, which is especially important when the test beds shift from small toy programs to large-scale engineering applications, for which excessively long generation time is intolerable.
Aiming at constructing a highly automatic test data generation tool that can be used for programs in engineering, we put forward the heuristic method best-first-search branch and bound (BFS-BB) (Xing et al., 2014), adopting branch and bound (BB), a classical search algorithm in artificial intelligence. For the purpose of infeasible path detection and generation time reduction as mentioned above, we make improvements on the interval arithmetic in BFS-BB, which is used for arc consistency checking. All our work is based on the abstract memory model (AMM) (Tang et al., 2012) in Code Test System (CTS) (http://ctstesting.cn/), which tests real-world programs written in the C programming language. AMM underlies automatic test data generation by maintaining a table of memory states, and the constraints related to the structure of data types can be represented by that table. As for the test data generation method in CTS, the main task is to construct an efficient constraint solver. We take numeric types as an example to describe our method in this paper. The rest of this paper is organized as follows. Section 2 provides the background and motivation of this paper. In Section 3, we give a theoretical analysis of interval arithmetic and propose the iterative operator to improve it. Two cases are studied in detail to explain the function of the iterative operator in Section 4. In Section 5, we report experimental analyses and empirical evaluations of the proposed method. Section 6 concludes this paper and highlights directions for future research.
2. Background and motivation
As mentioned in Section 1, the problem of path-wise test data generation is in essence a CSP (Kasprzak et al., 2014), where the path refers to a sequence of nodes in a control flow graph (CFG) (McMinn, 2004). To be specific, X is a set of variables {x1, x2, ..., xn}, D = {D1, D2, ..., Dn} is a set of domains, and Di ∈ D (i = 1, 2, ..., n) is a finite set of possible values for xi. For the path to be covered (denoted as p), D is defined based on the variables' acceptable ranges. One solution to the problem is a set of values instantiating each variable inside its domain, denoted as {x1 ↦ V1, x2 ↦ V2, ..., xn ↦ Vn} with Vi ∈ Di, that makes p feasible, meaning that each constraint defined by the PUT along p is met. It should be noted that one solution is enough for path-wise test data generation; it is not necessary to find all solutions. A CSP is generally solved by backtracking search strategies. During the search process, variables are divided into three sets: past variables (PV for short, already instantiated), the current variable (now being instantiated), and future variables (FV for short, not yet instantiated). The idea of these search algorithms is to extend partial solutions. At each step, a variable in FV is selected and assigned a value from its domain to extend the current partial solution. It is then checked whether such an extension may lead to a possible solution of the CSP, and the subtrees containing no solutions based on the current partial solution are pruned. The techniques for improving a search algorithm are categorized as look-ahead and look-back methods. Look-ahead methods (Frost and Dechter, 1995; Schaerf, 1997) are invoked whenever the search is preparing to extend the current partial solution, and they concern the following problems: (1) how to select the next variable to be instantiated; (2) how to select a value to instantiate a variable with; (3) how to reduce the search space by maintaining a certain level of consistency. Look-back methods are invoked whenever the search encounters a dead end and is preparing to backtrack.
The third problem in look-ahead methods is the focus of this paper. It is often addressed by local consistency techniques (including node consistency, arc consistency, and path consistency) that associate a CSP with a network of relations, where nodes represent variables and arcs or edges represent constraints. Arc consistency (Cooper et al., 2010; Lecoutre and Prosser, 2006) is the most widely used notion; it means that every consistent assignment to a single variable can be consistently extended to a second variable. AC-3 is the simplest arc consistency checking algorithm and is known to be practically efficient (Mackworth, 1977; Wallace, 1993). AC-3 involves a series of tests between pairs of constrained variables. In the enforcement of AC-3, it is not required to process all constraints if only a few domains have changed, and the operations are conducted on a queue of constraints to be processed. Arc consistency is a basic technique for solving CSPs, but consistency checking algorithms seldom solve CSPs by themselves. They usually assist search algorithms in two ways: one is preprocessing before the search starts, and the other is combining with the search algorithm to reduce the domains of the CSP in question during the search, as in forward checking. Generally, arc consistency checking methods are embedded within backtracking search algorithms, including BFS-BB as described below. Xing et al. (2014) described BFS-BB in detail; it carries out depth-first search with backtracking, in which the arc consistency checking method, interval arithmetic, is involved in both preprocessing and the search process. Adopting the MC/DC coverage criterion, BFS-BB took nearly 110 min to test the project aa200c available at http://www.moshier.net/, which includes 77 functions. This relatively large time consumption is one drawback of BFS-BB. Another drawback of BFS-BB is its inability to detect infeasible paths in advance. According to our statistics, nearly 34% of the paths in aa200c are infeasible. If these paths are treated as feasible, a large amount of computation is wasted trying to generate test data for them. Fig. 1 shows an example, test 1, that explains why there are infeasible paths that were not detected by BFS-BB. For ease of explanation, we try to generate test data for the path which passes all the if statements and finally reaches the print statement. As described in Xing et al. (2014), the set of the domains of all the variables obtained after the last branch predicate x2 > 10 is {x1:[-∞, 11], x2:[11, +∞]}. It can be intuitively seen that x1 is at most 11 while x2 is at least 11, which contradicts the first predicate x1 > x2. In other words, the path to be covered is infeasible. But BFS-BB is unable to judge the feasibility of the path in advance due to the conservativeness of interval arithmetic, that is, it sometimes provides much larger intervals than necessary. Consequently, the computation spent on the search is meaningless. Aiming at solving these two problems with BFS-BB as well as other test data generation methods, we make improvements on interval arithmetic based on the analysis in Section 3.
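As background for the comparison reported in Section 5.1, the following is a minimal Java sketch of the classical AC-3 scheme described above. The Arc record, the revise and ac3 methods, and the set-based domain representation are illustrative assumptions made here for clarity; they are not part of BFS-BB or CTS.

import java.util.*;
import java.util.function.BiPredicate;

// A minimal AC-3 sketch over binary constraints; names are illustrative only.
public class AC3Sketch {
    // A directed arc: variable i is constrained against variable j, e.g. (vi, vj) -> vi > vj.
    record Arc(int i, int j, BiPredicate<Integer, Integer> allowed) {}

    // Remove values of xi that have no support in xj; return true if Di changed.
    static boolean revise(List<Set<Integer>> domains, Arc arc) {
        boolean revised = false;
        Iterator<Integer> it = domains.get(arc.i()).iterator();
        while (it.hasNext()) {
            int vi = it.next();
            boolean supported = domains.get(arc.j()).stream()
                    .anyMatch(vj -> arc.allowed().test(vi, vj));
            if (!supported) { it.remove(); revised = true; }
        }
        return revised;
    }

    // Process a queue of arcs until no domain changes; false means a wiped-out domain.
    static boolean ac3(List<Set<Integer>> domains, List<Arc> arcs) {
        Deque<Arc> queue = new ArrayDeque<>(arcs);
        while (!queue.isEmpty()) {
            Arc arc = queue.poll();
            if (revise(domains, arc)) {
                if (domains.get(arc.i()).isEmpty()) return false;
                for (Arc a : arcs)                    // re-enqueue arcs that depend on xi
                    if (a.j() == arc.i() && a.i() != arc.j()) queue.add(a);
            }
        }
        return true;
    }
}

Each support test inside revise roughly corresponds to one constraint check, and an emptied domain signals the inconsistency that the surrounding search then reacts to by backtracking.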
3. Methodology
In this section, we give a detailed analysis of intervals and interval arithmetic, which is of great importance to BFS-BB because it is the key component in charge of enforcing arc consistency. Based on the analytical result, an iterative operator is proposed.
Fig. 1. Program test 1.
3.1. Theoretical analysis
Interval is a typical abstract domain in abstract interpretation (Cousot and Cousot, 1976; Cousot, 2001), which is an approximation theory of the semantic model of a computer system and is effectively applied in analyzing the ranges of variables' values. Interval arithmetic was proposed by Moore (1966); initially it was used for the computation of reliable bounds in numerical analysis to increase reliability, and it was soon widely applied in physics, economics, engineering (including software engineering), and many other fields. Over decades of development (Moore, 1979; Moore et al., 2009), interval arithmetic has become an important static testing technique in the field of software engineering. The key idea of interval arithmetic is to represent numbers by intervals of real values, together with a set of arithmetic rules defined on intervals. It analyzes and calculates the ranges of variables starting from the entrance of the program, and provides information for further program analysis efficiently, reliably, and conservatively. The following elaborates on intervals.

Definition 1. Given a, b ∈ ℝ ∪ {-∞, +∞}, [a, b] = {x | x ∈ ℝ ∪ {-∞, +∞}, a ≤ x ≤ b} is a bounded closed interval, or interval for short. a is the lower extreme point of the interval [a, b], and b the upper extreme point. If a > b, [a, b] is an empty interval, denoted ⊥i, and [-∞, +∞] is the largest interval, denoted ⊤i.

A domain is a set of intervals. For example, if an integer variable x ranges from -3 to 6 but cannot be equal to 0, then its domain is represented as [-3, -1] ∪ [1, 6], which is composed of two intervals. The set of all the domains is denoted as Itvs.

Definition 2. For two intervals I1 = [a, b] and I2 = [c, d], the partial order ⊑i is defined as I1 ⊑i I2 if c ≤ a ≤ b ≤ d or [a, b] = ⊥i.

Definition 3. For two intervals I1 = [a, b] and I2 = [c, d], their intersection and union are defined by Eq. (1) and Eq. (2), respectively:

[a, b] ∩i [c, d] = [max(a, c), min(b, d)] if max(a, c) ≤ min(b, d), and ⊥i otherwise.   (1)

[a, b] ∪i [c, d] = [a, b] if [c, d] = ⊥i; [c, d] if [a, b] = ⊥i; [min(a, c), max(b, d)] otherwise.   (2)
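As a concrete reading of Definitions 1-3, the sketch below implements ⊥i, ⊤i, ∩i, and ∪i for integer-valued intervals in Java. The Interval class is a hypothetical helper written for illustration only (it is not code from CTS), and the infinite bounds are approximated by Long.MIN_VALUE and Long.MAX_VALUE.

// A hypothetical helper illustrating Definitions 1-3; infinities are approximated
// by Long.MIN_VALUE / Long.MAX_VALUE, which is adequate for int-valued variables.
final class Interval {
    static final long NEG_INF = Long.MIN_VALUE, POS_INF = Long.MAX_VALUE;
    final long lo, hi;                         // lo > hi encodes the empty interval ⊥i

    Interval(long lo, long hi) { this.lo = lo; this.hi = hi; }

    static Interval bottom() { return new Interval(1, 0); }             // ⊥i
    static Interval top()    { return new Interval(NEG_INF, POS_INF); } // ⊤i
    boolean isEmpty()        { return lo > hi; }

    // Eq. (1): [a,b] ∩i [c,d] = [max(a,c), min(b,d)] if max(a,c) <= min(b,d), else ⊥i.
    Interval meet(Interval o) {
        long a = Math.max(lo, o.lo), b = Math.min(hi, o.hi);
        return a <= b ? new Interval(a, b) : bottom();
    }

    // Eq. (2): keep the other operand if one side is empty,
    // otherwise return the smallest interval covering both (the lattice join).
    Interval join(Interval o) {
        if (isEmpty()) return o;
        if (o.isEmpty()) return this;
        return new Interval(Math.min(lo, o.lo), Math.max(hi, o.hi));
    }

    @Override public String toString() {
        return isEmpty() ? "⊥" : "[" + lo + ", " + hi + "]";
    }
}

Note that joining [-3, -1] and [1, 6] yields the single interval [-3, 6], which over-approximates the two-interval domain [-3, -1] ∪ [1, 6]; keeping a domain as a set of intervals, as the definition of Itvs does, avoids part of that loss.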
The above definitions are given in the field of real numbers, and they apply to integers if ℝ is replaced by ℤ. In computers, the representation of real-valued variables is in fact discrete. If a step λ (λ > 0) is used to represent the minimum precision, then λ equals 1 for integer variables, while λ varies for float variables. The open interval (a, b) can be represented as the closed interval [a + λ1, b - λ2] restricted by the precision of the computer. Both λ1 and λ2 are equal to 1 for integer variables, and the values of λ1 and λ2 depend on a and b for float variables. This representation of intervals is in accordance with the procedure of constraint solving by computers. It can be easily verified that ⟨Itvs, ⊑i, ∪i, ∩i, ⊥i, ⊤i⟩ forms a complete lattice, and Fig. 2 is the Hasse diagram for the integer domain. The computation of intervals follows the rules of the lattice.

Fig. 2. The complete lattice of the integer domain.

The following is the definition of the branching condition, which is used by interval arithmetic.

Definition 4. Let B be the set of Boolean values {true, false} and Da be the set of the domains of all the variables before the a-th branch. If there are k branches along the path, the branching condition Br(nqa, nqa+1): Da → B (a ∈ [1, k]), where nqa is a branching node, is calculated by Eq. (3):

Br(nqa, nqa+1) = true if Da ∩ D̃a ≠ ∅, and false if Da ∩ D̃a = ∅.   (3)
In Eq. (3), Da satisfies all the a-1 branching conditions ahead of it and is used as input for the calculation of the a-th branching condition, while D̃a, a set of temporary domains, is the result of calculating Br(nqa, nqa+1) with Da and satisfies the a-th branching condition. Da ∩ D̃a ≠ ∅ means that Da ∩ D̃a satisfies both the preceding a-1 branching conditions and the a-th branching condition, ensuring that interval arithmetic can continue to calculate the remaining branching conditions. Da ∩ D̃a = ∅ means that an inconsistency is detected at the a-th branch and interval arithmetic exits. For the k branching nodes along the path, all k branching conditions must be true to make p feasible. This is the arc consistency checking process of interval arithmetic, illustrated in Fig. 3. To be specific, at the entrance of the path, interval arithmetic receives the set of the domains of all the variables, denoted as D1, and evaluates the branching condition corresponding to the branch (nq1, nq1+1), where nq1 is the first branching node.
The branching condition Br(nq1, nq1+1) is generally not satisfied by all the values in D1 but only by the values in a certain subset D2 ⊆ D1 that ensures traversal of the branch (nq1, nq1+1), i.e., D1 ⟹Br(nq1, nq1+1) D2. Next, the branching condition Br(nq2, nq2+1) is evaluated given that the set of the domains of all the variables is D2. Again, Br(nq2, nq2+1) is generally only satisfied by a subset D3 ⊆ D2. This procedure continues along p until all the branching conditions are satisfied, and Dk+1 is returned as the set of the domains of all the variables at the exit of the path. The process is the propagation of the branching conditions along p in the form of D1 ⟹Br(nq1, nq1+1) D2 ⟹Br(nq2, nq2+1) D3 ... Dk ⟹Br(nqk, nqk+1) Dk+1, where D1 ⊇ D2 ⊇ D3 ⊇ ... ⊇ Dk ⊇ Dk+1, as shown in Fig. 3(a). But if in this procedure Br(nqh, nqh+1) = false (1 ≤ h ≤ k), which means an inconsistency is detected, then interval arithmetic terminates with an empty domain found, as shown in Fig. 3(b). In summary, interval arithmetic along a path ends with one of two outcomes: the set of the domains of all the variables is reduced (or at least kept unchanged), or an inconsistency is detected (the lower bound ⊥i is reached). In either case, interval arithmetic along a path is monotonically decreasing. If the interval arithmetic from the entrance of the path (where the set of the domains of all the variables is denoted as Din) to the exit (where it is denoted as Dout) is taken as a function, it can be concluded that it converges to a fixed point (Antic et al., 2013) according to the Knaster-Tarski theorem (Lesniak, 2012), which naturally leads to the idea of introducing an iterative operator to interval arithmetic along the path. This brings an improvement to the conservative interval arithmetic that often works out much larger intervals for variables than necessary. There are also other possible improvements, such as affine arithmetic (Comba and Stol, 1993; Pirnia et al., 2014), which will be further studied in our future research.

Fig. 3. The arc consistency checking process of interval arithmetic: (a) domain reduction and (b) an inconsistency detected.

3.2. The iterative operator and its function

Based on the above analysis of interval arithmetic, we introduce the iterative operator to improve it. Fig. 4 shows the working process of the iterative operator over a certain number of rounds. Since the set of the domains of all the variables is reduced in each round, it must be the smallest set obtainable from the same input when the iteration ends. The end of the iteration is also the fixed point of interval arithmetic when it is taken as a function. Fig. 5 shows another case, in which an inconsistency is detected at a certain round with an empty domain found, indicating that the path is infeasible. To better elaborate the function of the iterative interval arithmetic (IIA) operator, the processes in Figs. 4 and 5 can be presented by the following pseudo-code.

Fig. 4. The process in which the iterative operator reaches a fixed point.

Fig. 5. The process in which the iterative operator detects an inconsistency.

Algorithm 1. Iterative interval arithmetic.
Input
  D1: the set of the domains of all the variables at the entrance of the path
Output
  Dk+1: the set of the reduced domains of all the variables at the exit of the path
  F(Vi): the value of the objective function after an instantiation causes an inconsistency
Begin
1:  arc consistent ← false;
2:  while (D1 ≠ Ø)
3:    for a ← 1 : k
4:      Br(nqa, nqa+1) ← false;
5:      D̃a ← calculate Br(nqa, nqa+1) with Da;
6:      if (Da ∩ D̃a ≠ ∅)
7:        Br(nqa, nqa+1) ← true;
8:        Da+1 ← Da ∩ D̃a;
9:      else if (PV ≠ Ø)
10:       calculate the corresponding F(Vi);
11:       return F(Vi);
12:     else return;
13:   if (Dk+1 = D1)
14:     arc consistent ← true;
15:     return Dk+1;
16:   else D1 ← Dk+1;
End

As mentioned in Section 2, arc consistency checking methods often work in combination with search algorithms. Hence, we combine Algorithm 1 and BB, and call the result BFS-BB-iterative interval arithmetic (BFS-BB-IIA) in this paper. For ease of exposition, we assume that the order in which variables are added to the current partial solution is the predefined order x1, x2, ..., xn.

Algorithm 2. BFS-BB-iterative interval arithmetic.
Input
  p: the path to be covered
Output
  result {Variable ↦ Value}: the test data making p feasible
Stage 1: preprocessing
1:  call Algorithm 1 (iterative interval arithmetic);
2:  if (arc consistent = false)
3:    return infeasible path;
4:  else for each xi ∈ FV
5:    if (Di = [-∞, b] || Di = [a, +∞] || Di = [-∞, +∞])
6:      do reduce Di;
7:         call Algorithm 1 (iterative interval arithmetic);
8:      while (arc consistent = false)
9:  D ← Dk+1;
Stage 2: branch and bound search
Begin
10: result ← null;
11: x1 ← order(FV);
12: V1 ← select(D1);
13: D1 ← {[V1, V1], D2, D3, ..., Dn};
14: for each xi (i ← 1 : n)
15:   call Algorithm 1 (iterative interval arithmetic);
16:   if (arc consistent = true)
17:     D ← Dk+1;
18:     result ← result ∪ {xi ↦ Vi};
19:     FV ← FV - {xi};
20:     PV ← PV + {xi};
21:     if (FV = Ø)
22:       return result;
23:     else xi ← order(FV);
24:       Vi ← select(Di);
25:       D1 ← {[V1, V1], [V2, V2], ..., [Vi, Vi], Di+1, ..., Dn};
26:       go to 15;
27:   else if (|Di| > 1)
28:     reduce Di by F(Vi);
29:     Vi ← select(Di);
30:     D1 ← {[V1, V1], [V2, V2], ..., [Vi, Vi], Di+1, ..., Dn};
31:     go to 15;
32:   else backtrack;
End
There are two stages in BFS-BB-IIA. The first stage performs the preprocessing operations. The path p to be covered is input to BFS-BB-IIA; it involves the set of variables X = {x1, x2, ..., xn}, the set of domains D = {D1, D2, ..., Dn}, and the constraints to be met. First, IIA checks arc consistency. If an inconsistency is detected, BFS-BB-IIA exits with an infeasible path reported. Second, if p may be feasible and there is any infinite domain in D, IIA works to reduce the infinite domains and checks arc consistency again; this step may repeat several times until no inconsistency is found. The second stage carries out the BB search. Ordering is performed over all the variables in FV, and the result is returned as the first variable to be instantiated (x1). A value (V1) is selected from the domain of x1 (D1) and assigned to x1. IIA then checks arc consistency with the current domains of all the variables. If the result is consistent, x1 is put into PV, D is updated with the fixed point, and the ordering and instantiation of the following variables are repeated until all the variables have been assigned values {V1, V2, ..., Vn} that make p feasible; {x1 ↦ V1, x2 ↦ V2, ..., xn ↦ Vn} (Vi ∈ Di) is then a solution to the CSP. If the result is inconsistent, another value calculated by F(Vi) (Xing et al., in press) is assigned to the current variable (xi) from its domain (Di) and IIA checks arc consistency again. If there is no value left in Di, a variable ahead of xi is taken out of PV and reassigned a value, which is backtracking.

It can be seen that the iterative operator functions in three steps of BFS-BB-IIA, and it is involved in constraint solving in both of the ways mentioned in Section 2, namely preprocessing and forward checking during the search. We use different notations to distinguish them: infeasible path detection (IPD) at line 1, initial domain reduction (IDR) at line 7, and variable assignment determination (VAD) at line 15. IPD and IDR are used for preprocessing, and VAD is used for forward checking. Besides appearing at different steps, they differ in two other respects. One difference lies in the inputs. The input of IPD is the set of the domains of all the variables, which may make p infeasible. The input of IDR is the set of the domains of all the variables, which may include infinite domains; normally each domain is a range of values, that is, no variable has yet been assigned a fixed value from its domain. The input of VAD is the set of the domains of all the variables with finite bounds, but the domains of the variables in PV and of the current variable are all of the form [V, V], which is in fact a fixed value. The other difference is in the way they process conflicting interval information. Due to the conservativeness and soundness of interval arithmetic, IPD determines that the path is infeasible when an inconsistency occurs, so there is no need to carry out the subsequent search steps; IDR determines that the initial domain reduction strategy should be adjusted in the event of an inconsistency; for VAD, an inconsistency only implies that the current value Vi of the current variable xi is not part of the solution, and the subsequent process has been introduced in our previous work (Xing et al., in press). Iterative interval arithmetic (IIA) and other arc consistency checking algorithms such as AC-3 are all look-ahead methods used to check whether the domains or values of variables can meet the constraints between them, improving search efficiency by pruning the search space. In contrast to AC-3, IIA considers each branching condition, which may involve more than two variables, rather than each pair of variables, and it reduces the domains of all the variables involved rather than only the domain of the variable in question when an inconsistency is detected. The above analysis indicates that in some cases, for example when testing engineering projects in which the domains of variables are relatively large, IIA will be more efficient due to its coarse-grained checking manner. IIA can also be used for real-valued variables. On the other hand, this coarse-grained manner can lead to a loss of precision, which can be traced back to the conservativeness of interval arithmetic; the iterative operator is proposed precisely to mitigate this problem to some extent. In summary, IIA is a trade-off between precision and efficiency.
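To make the iterative behaviour concrete, here is a minimal Java sketch of the fixed-point loop at the heart of Algorithm 1, reusing the hypothetical Interval helper from Section 3.1. The three propagators hand-code the branching conditions of test 1 in Fig. 1 and are illustrative assumptions rather than the CTS implementation; the bound they derive for x1 < 10 is the straightforward [-∞, 9] rather than the [-∞, 11] reported in Table 1, but the two-round outcome is the same.

import java.util.List;
import java.util.function.UnaryOperator;

// A minimal sketch of Algorithm 1's fixed-point iteration for test 1 (Fig. 1),
// reusing the hypothetical Interval helper above; not the CTS implementation.
public class IIASketch {
    public static void main(String[] args) {
        // Each branching condition refines the domains of the variables it constrains.
        List<UnaryOperator<Interval[]>> conditions = List.of(
            d -> new Interval[] {                                   // x1 > x2
                d[0].meet(new Interval(step(d[1].lo, +1), Interval.POS_INF)),
                d[1].meet(new Interval(Interval.NEG_INF, step(d[0].hi, -1))) },
            d -> new Interval[] {                                   // x1 < 10
                d[0].meet(new Interval(Interval.NEG_INF, 9)), d[1] },
            d -> new Interval[] {                                   // x2 > 10
                d[0], d[1].meet(new Interval(11, Interval.POS_INF)) });

        Interval[] dom = { Interval.top(), Interval.top() };        // D1 at the path entrance
        for (int round = 1; ; round++) {
            Interval[] before = dom.clone();
            for (UnaryOperator<Interval[]> cond : conditions) {
                dom = cond.apply(dom);
                if (dom[0].isEmpty() || dom[1].isEmpty()) {
                    System.out.println("round " + round + ": inconsistency, path is infeasible");
                    return;                                         // this is what IPD reports
                }
            }
            boolean unchanged = dom[0].lo == before[0].lo && dom[0].hi == before[0].hi
                             && dom[1].lo == before[1].lo && dom[1].hi == before[1].hi;
            if (unchanged) {                                        // Dk+1 = D1: fixed point reached
                System.out.println("fixed point: x1=" + dom[0] + ", x2=" + dom[1]);
                return;
            }
        }
    }

    // Shift a finite bound by one integer step, preserving the infinity sentinels.
    static long step(long bound, long delta) {
        return (bound == Interval.NEG_INF || bound == Interval.POS_INF) ? bound : bound + delta;
    }
}

Round 1 shrinks the domains and round 2 empties them at x1 > x2, so the path is declared infeasible without any search, which is exactly what IPD exploits.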
4. Case study
This section elaborates the function of the iterative interval arithmetic operator with two examples, which illustrate its coarse-grained arc consistency checking manner.
Table 1. The IPD process for test 1.
Round 1:
  Entrance          Din = D1 = {x1:[-∞, +∞], x2:[-∞, +∞]}
  After (x1 > x2)   D2 = {x1:[-∞, +∞], x2:[-∞, +∞]}
  After (x1 < 10)   D3 = {x1:[-∞, 11], x2:[-∞, +∞]}
  After (x2 > 10)   Dout = D4 = {x1:[-∞, 11], x2:[11, +∞]}
  Result: Din ≠ Dout
Round 2:
  Entrance          Din = D1 = {x1:[-∞, 11], x2:[11, +∞]}
  After (x1 > x2)   D2 = {x1: Ø, x2: Ø}
  Result: Path is infeasible!
4.1. Case one
In this part, we use the example test 1 in Fig. 1. For brevity, we try to generate test data for the path which passes all the if statements and finally reaches the print statement, since the branching conditions are exactly the same as the corresponding predicates. First, IPD is carried out as shown in Table 1. It can be seen that the first round of iteration reduces the domains of both variables, but it is not until the second round that an inconsistency is detected, indicating that the path is infeasible and there is no need to generate test data for it.
4.2. Case two
In this part, we give the PUT test 2 shown in Fig. 6. For brevity, we again try to generate test data for the path which passes all the if statements and finally reaches the print statement, since the branching conditions are exactly the same as the corresponding predicates. First, IPD is carried out as shown in Table 2. It can be seen that the domain of x4 is reduced. To reduce the search space, we cut the domains of all the variables down to the range [-100, 100], which gives {x1:[-100, 100], x2:[-100, 100], x3:[-100, 100], x4:[-100, 49]}. Then IDR works as shown in Table 3. Clearly there is no need to adjust the initial domain reduction strategy, so x4 is selected and assigned the value -62 from its domain [-100, 49]. VAD is carried out to judge whether -62 for x4 leads to an inconsistency, as shown in Table 4. Table 4 demonstrates that a single round of iteration proves arc consistency when x4 is assigned -62; then x2 is selected and assigned the value 71 from its domain [-89, 100] in the same way as x4. VAD is carried out to judge whether 71 for x2 leads to an inconsistency, as shown in Table 5. After three rounds of iteration, 71 for x2 is judged to be feasible, and the domains of x1 and x3 are reduced from [-100, 99] and [-89, 100] to [-9, 70] and [21, 100], respectively. Then x1 is selected and assigned the value -1 from its domain [-9, 70] in the same way as x4 and x2. VAD is carried out to judge whether -1 for x1 leads to an inconsistency, as shown in Table 6.
Fig. 6. Program test 2.
Table 6 indicates that after two rounds of iteration, -1 for x1 is judged to be feasible and the domain of x3 is reduced from [21, 100] to [92, 92], meaning that the test data has been found: {x1 ↦ -1, x2 ↦ 71, x3 ↦ 92, x4 ↦ -62}. A quick check confirms the four conditions: -1 + 71 + 92 - 62 = 100, -1 < 71, 71 + 92 = 163 > 10, and -62 < 50.
5. Experimental analyses and empirical evaluations
To observe the effectiveness of BFS-BB-IIA, we carried out a large number of experiments in CTS. Within the CTS framework, the PUT is automatically analyzed, and its basic information is abstracted to generate its CFG. According to the specified coverage criteria, the paths to be covered are generated and provided as input; some of them may be infeasible and cannot be detected by interval arithmetic without iteration. The algorithms were implemented in Java and run on the Java Runtime Environment (JRE). In Section 5.1, comparison experiments were conducted between IIA and the classical arc consistency checking method AC-3 in terms of the numbers of constraint checks and backtracks. The numbers of variables and expressions (Gallagher and Narasimhan, 1997; Zhao et al., 2010) are important factors affecting the performance of test data generation methods. Therefore, in Section 5.2 experiments were carried out to evaluate the effectiveness of VAD described in Section 3.2 for varying numbers of input variables and expressions, by comparing it with the method without VAD. Experiments were also conducted to test the ability of BFS-BB-IIA through comparisons with BFS-BB, the commercial testing tool C++test, and the open-source constraint solver Choco in Section 5.3, where some of the PUTs were taken from engineering projects.

5.1. Arc consistency checking evaluation
This part presents the comparison between the IIA operator and AC-3. As mentioned in Section 2, arc consistency algorithms often function in combination with search algorithms. Hence, the comparison was undertaken in the framework of branch and bound (BB), namely BFS-BB-IIA versus BB with AC-3 (the combination of BB and AC-3 as a search method to be compared with BFS-BB-IIA). The experiments were performed in the environment of Windows 7 (32-bit), Pentium 4 at 3.00 GHz and 4 GB memory. The test beds were the N-queen problem with N varying from 4 to 11. A related issue is the order in which variable domains are checked for consistency; it has been found that certain ordering heuristics can yield significant improvement in the performance of AC-3 (Wallace and Freuder, 1992). For the sake of fairness, in our experiments the domains of variables were all treated in numerical order. Since one solution is enough for path-wise test data generation, the number of constraint checks and the number of backtracks performed by AC-3 and IIA until one solution was found were recorded for comparison. Table 7 presents the experimental results.
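The N-queen benchmark is identified only by name here, so the short sketch below states the standard model we assume it uses: variable q[i] holds the column of the queen placed in row i, and two queens conflict if they share a column or a diagonal.

// The standard N-queens constraint model assumed for the benchmark in Section 5.1:
// q[i] is the column (1..N) of the queen placed in row i.
final class NQueensModel {
    // Binary constraint between rows i and j (i != j): different columns
    // and not on a common diagonal.
    static boolean consistent(int i, int qi, int j, int qj) {
        return qi != qj && Math.abs(qi - qj) != Math.abs(i - j);
    }
}

Under this model every pair of rows contributes one binary constraint, which is presumably the granularity at which the constraint checks of AC-3 in Table 7 are counted.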
Table 2. The IPD process for test 2.
Round 1:
  Entrance                    Din = D1 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, +∞]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, +∞]}
  After (x1 < x2)             D3 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, +∞]}
  After (x2+x3 > 10)          D4 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, +∞]}
  After (x4 < 50)             Dout = D5 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  Result: Din ≠ Dout
Round 2:
  Entrance                    Din = D1 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  After (x1 < x2)             D3 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  After (x2+x3 > 10)          D4 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  After (x4 < 50)             Dout = D5 = {x1:[-∞, +∞], x2:[-∞, +∞], x3:[-∞, +∞], x4:[-∞, 49]}
  Result: Din = Dout, fixed point reached!
Table 3. The IDR process for test 2.
Round 1:
  Entrance                    Din = D1 = {x1:[-100, 100], x2:[-100, 100], x3:[-100, 100], x4:[-100, 49]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-100, 100], x2:[-100, 100], x3:[-100, 100], x4:[-100, 49]}
  After (x1 < x2)             D3 = {x1:[-100, 99], x2:[-99, 100], x3:[-100, 100], x4:[-100, 49]}
  After (x2+x3 > 10)          D4 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  After (x4 < 50)             Dout = D5 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  Result: Din ≠ Dout
Round 2:
  Entrance                    Din = D1 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  After (x1 < x2)             D3 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  After (x2+x3 > 10)          D4 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  After (x4 < 50)             Dout = D5 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-100, 49]}
  Result: Din = Dout, fixed point reached!
Table 4. The VAD process when x4 is assigned -62.
Round 1:
  Entrance                    Din = D1 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-100, 99], x2:[-89, 100], x3:[-89, 100], x4:[-62, -62]}
  Result: Din = Dout, fixed point reached!
Table 5. The VAD process when x2 is assigned 71.
Round 1:
  Entrance                    Din = D1 = {x1:[-100, 99], x2:[71, 71], x3:[-89, 100], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-100, 99], x2:[71, 71], x3:[-89, 100], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-100, 70], x2:[71, 71], x3:[-89, 100], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-100, 70], x2:[71, 71], x3:[-60, 100], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-100, 70], x2:[71, 71], x3:[-60, 100], x4:[-62, -62]}
  Result: Din ≠ Dout
Round 2:
  Entrance                    Din = D1 = {x1:[-100, 70], x2:[71, 71], x3:[-60, 100], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  Result: Din ≠ Dout
Round 3:
  Entrance                    Din = D1 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-9, 70], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  Result: Din = Dout, fixed point reached
Table 6. The VAD process when x1 is assigned -1.
Round 1:
  Entrance                    Din = D1 = {x1:[-1, -1], x2:[71, 71], x3:[21, 100], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  Result: Din ≠ Dout
Round 2:
  Entrance                    Din = D1 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x1+x2+x3+x4 = 100)   D2 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x1 < x2)             D3 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x2+x3 > 10)          D4 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  After (x4 < 50)             Dout = D5 = {x1:[-1, -1], x2:[71, 71], x3:[92, 92], x4:[-62, -62]}
  Result: Din = Dout, fixed point reached
The results show that in terms of both the number of constraint checks and the number of backtracks, IIA outperformed AC-3 in most of the cases (columns 3 and 5 of Table 7), which means that when one solution was calculated, BFS-BB-IIA visited fewer nodes in the search tree than BB with AC-3. For problems like the N-queen problem, IIA leads to fewer constraint checks and backtracks than AC-3. In our future research, more effort will be put into the study of the efficiency and precision of arc consistency checking algorithms.
5.2. Test data generation performance evaluation
5.2.1. Varying number of variables
The relationship between the performance of test data generation methods and the number of variables was tested by repeatedly running the two methods (with and without VAD) on generated test programs with input variables x1, x2, ..., xn, where n varied from 1 to 100.
Table 7. The comparison result between AC-3 and IIA.
      Number of constraint checks    Number of backtracks
N     AC-3      IIA                  AC-3      IIA
4     122       42                   4         1
5     83        40                   0         0
6     902       281                  21        3
7     212       168                  0         0
8     858       473                  10        0
9     973       540                  5         1
10    1274      540                  8         0
11    32,326    3377                 256       10
Fig. 8. The fitting curve of the iterative method as the number of variables increases.
Fig. 7. The comparison result of the iterative method with the non-iterative method as the number of variables increases.
Adopting statement coverage, in each test the program contained 100 if statements (equivalent to 100 branching conditions or 100 expressions along the path), and there was only one path of fixed length to be traversed, namely the one consisting entirely of true branches, i.e., all the branching conditions were the same as the corresponding predicates. The expression of each if statement was a linear combination of all the n variables of the form

[a1, a2, ..., an] · [x1, x2, ..., xn]' rel_op const[c]   (4)

where a1, a2, ..., an are randomly generated numbers, either positive or negative, rel_op ∈ {>, ≥, <, ≤, =, ≠}, and const[c] (c ∈ [1, 100]) is an array of randomly generated constants. The randomly generated ai (1 ≤ i ≤ n) and const[c] were selected so as to make the path feasible. This arrangement constructs the tightest linear relation between the variables. The programs for the various values of n ranging from 1 to 100 were each tested 100 times, and the average time required to generate the data for each test was recorded. The experiments were performed in the environment of MS Windows 7 (32-bit), Intel Core at 2.4 GHz and 2 GB memory. The comparison result is presented in Fig. 7. When the number of variables was small, the generation times of the two methods were very close, so the axis representing generation time was exponentiated to distinguish them. Fig. 7 shows that the average generation time of the iterative method is far less than that of the non-iterative method. For the iterative method, the relation between average generation time and the number of variables is fitted very well by a quadratic curve, as shown in Fig. 8, and the quadratic correlation is significant at the 95% confidence level with a p-value far less than 0.05. Moreover, average generation time increases at a uniformly accelerating rate as the number of variables grows; differentiating the fitted curve indicates that its rate of increase grows as y = 904.4x - 12,190 with the number of variables. We can roughly conclude that, with the other settings unchanged, generation time using VAD stays very close for n ranging from 1 to 13 and begins to increase when n is larger than 14.
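The generated benchmark programs are described only in prose, so the Java sketch below shows one plausible way to produce them; the emitTestProgram method and the witness-point trick used to guarantee that the all-true path is feasible are our assumptions, not the generator actually used in the experiments.

import java.util.Random;

// A hypothetical generator for the benchmark programs of Section 5.2.1: n input
// variables and a given number of linear if-conditions a1*x1+...+an*xn rel_op c.
// Feasibility of the all-true path is guaranteed by choosing each constant so the
// condition holds at a fixed witness point (here x1 = ... = xn = 1).
public class BenchGenerator {
    static String emitTestProgram(int n, int conditions, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder("void test(");
        for (int i = 1; i <= n; i++) sb.append("int x").append(i).append(i < n ? ", " : ") {\n");
        for (int c = 0; c < conditions; c++) {
            int sumAtWitness = 0;
            StringBuilder lhs = new StringBuilder();
            for (int i = 1; i <= n; i++) {
                int a = rnd.nextInt(19) - 9;            // coefficient in [-9, 9]
                if (a == 0) a = 1;                      // keep every variable involved
                sumAtWitness += a;                      // value of the lhs at the witness point
                lhs.append(a).append("*x").append(i).append(i < n ? " + " : "");
            }
            String[] ops = { ">", ">=", "<", "<=", "==", "!=" };
            String op = ops[rnd.nextInt(ops.length)];
            int cst = switch (op) {                     // constant chosen so the witness satisfies it
                case ">"  -> sumAtWitness - 1;
                case ">=", "<=", "==" -> sumAtWitness;
                case "<"  -> sumAtWitness + 1;
                default   -> sumAtWitness + 1;          // "!="
            };
            sb.append("  if (").append(lhs).append(' ').append(op).append(' ')
              .append(cst).append(") {\n");
        }
        sb.append("    printf(\"covered\\n\");\n");
        for (int c = 0; c <= conditions; c++) sb.append("}\n");
        return sb.toString();
    }
}

Anchoring each constant to a fixed witness point is just one way to meet the stated requirement that the generated coefficients and constants keep the path feasible; the actual experiments may have used a different scheme.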
5.2.2. Varying number of expressions
The relationship between the performance of test data generation methods and the number of expressions was tested by repeatedly running the two methods (with and without VAD) on generated test programs, each of which had 100 input variables. Adopting statement coverage, in each test the program contained u (u ∈ [1, 100]) if statements (equivalent to u branching conditions or u expressions), and there was only one path with entirely true branches to be traversed, i.e., all the branching conditions were the same as the corresponding predicates. The expression of each if statement was of the form

[a1, a2, ..., a100] · [x1, x2, ..., x100]' rel_op const[u]   (5)

where a1, a2, ..., a100 were randomly generated numbers, either positive or negative, rel_op ∈ {>, ≥, <, ≤, =, ≠}, and const[u] was an array of randomly generated constants. The randomly generated av (v = 1, 2, ..., 100) and const[u] were selected so as to make the path feasible. This arrangement constructs the strongest linear relation between the variables. The programs for the various values of u ranging from 1 to 100 were each tested 100 times, and the average time required to generate the data for each test was recorded. The experiments were performed in the environment of Windows 7 (32-bit), Pentium 4 at 3.00 GHz and 4 GB memory. The comparison result is presented in Fig. 9. When the number of expressions was small, the generation times of the two methods were very close, so the axis representing generation time was again exponentiated to distinguish them. Fig. 9 shows that the average generation time of the iterative method is far less than that of the non-iterative method. For the iterative method, the average generation time increases approximately linearly with the number of expressions, as shown in Fig. 10, and the linear correlation is significant at the 95% confidence level with a p-value far less than 0.05. As the number of expressions increases, average generation time grows at an even pace.

5.3. Comparisons with other test data generation methods
5.3.1. Comparison with BFS-BB
Adopting the MC/DC coverage criterion, BFS-BB-IIA takes nearly 66 min to test the project aa200c, which is about two thirds of the time taken by BFS-BB as mentioned in Section 2, and the 34% infeasible paths are detected. Further time reduction will be part of our future work. In this part, we conducted comparison experiments between BFS-BB-IIA and BFS-BB using test beds from two engineering projects available at http://www.moshier.net/.
The experiments were performed in the environment of Ubuntu 12.04 (32-bit), Pentium 4 at 2.8 GHz and 2 GB memory. The paths to be covered were provided by CTS and might contain infeasible paths. The comparison adopted three coverage criteria: statement, branch, and MC/DC. For each test bed, the experiments were carried out 100 times, and the number of infeasible paths detected, average generation time, and average coverage (Blanco et al., 2010; Fraser and Arcuri, 2011; Mao et al., 2012) were used for comparison. The elements on the infeasible paths were not involved in the coverage calculation.
Fig. 9. The comparison result of the iterative method with the non-iterative method as the number of expressions increases.
Fig. 10. The fitting curve of the iterative method as the number of expressions increases.
The details of the comparison are shown in Table 8. Since BFS-BB-IIA is an improvement of BFS-BB, we discuss the comparison result in three aspects corresponding to the purpose of the improvement elaborated in Section 2.
1) Infeasible path detection. There are infeasible paths in half of the cases (see the infeasible-path counts in Table 8). BFS-BB is not able to detect infeasible paths, but BFS-BB-IIA is. This is very useful, because the subsequent steps of generating test data for such paths are spared.
2) Generation time. BFS-BB-IIA consumed less time than BFS-BB in 11 out of the 18 cases. There are 7 cases in which BFS-BB-IIA consumed more time, because IIA encountered the same situation as shown in Tables 2, 3, and 6: IIA took two rounds, but the first round already completed the domain reduction of all the variables, and the second round only served as verification. In that situation, the non-iterative method BFS-BB is naturally faster.
3) Coverage. BFS-BB-IIA reached higher coverage than BFS-BB in 5 out of the 18 cases. The two methods performed the same in the other 13 cases, including 9 occurrences of 100%. Most of the cases in which BFS-BB-IIA did not reach 100% adopted MC/DC as the coverage criterion, which is relatively strict (Rajan et al., 2008), subsumes statement and branch coverage, and is harder to achieve.
5.3.2. Comparison with C++test
This part presents results from an empirical comparison between BFS-BB-IIA and C++test, the tool introduced in Section 1. C++test was run as a plug-in of Visual Studio 2008. The experiments were performed in the environment of Windows 7 (32-bit), Pentium 4 at 3.00 GHz and 4 GB memory. The comparison adopted statement coverage (Hwang et al., 2014). For each test bed, the experiments were carried out 100 times, and average generation time and average coverage were used for comparison. The details of the PUTs are shown in Table 9, where lines of code is abbreviated to LOC. The comparison result is shown in Table 10 and is discussed in three aspects related to the programs under test.
Table 8. The result of the comparison with BFS-BB.
Project  | File     | Function | Coverage criterion | Paths from CTS | Infeasible paths by BFS-BB-IIA | Avg. generation time (s): BFS-BB-IIA / BFS-BB | Avg. coverage: BFS-BB-IIA / BFS-BB
de118i-2 | sinl.c   | sinl     | statement | 6  | 2  | 0.498 / 0.773  | 93.8% / 86.4%
de118i-2 | sinl.c   | sinl     | branch    | 8  | 3  | 0.689 / 0.756  | 95.8% / 95.4%
de118i-2 | sinl.c   | sinl     | MC/DC     | 9  | 5  | 1.060 / 1.177  | 93.4% / 92.6%
de118i-2 | asinl.c  | acosl    | statement | 4  | 0  | 0.186 / 0.226  | 100% / 100%
de118i-2 | asinl.c  | acosl    | branch    | 5  | 0  | 0.226 / 0.201  | 100% / 100%
de118i-2 | asinl.c  | acosl    | MC/DC     | 5  | 0  | 0.224 / 0.319  | 93% / 93%
aa200c   | diurpx.c | diurpx   | statement | 2  | 0  | 0.38 / 0.517   | 100% / 100%
aa200c   | diurpx.c | diurpx   | branch    | 2  | 0  | 0.496 / 0.633  | 100% / 100%
aa200c   | diurpx.c | diurpx   | MC/DC     | 2  | 0  | 0.568 / 0.521  | 96% / 96%
aa200c   | dms.c    | dms      | statement | 2  | 0  | 0.165 / 0.11   | 93.2% / 93.2%
aa200c   | dms.c    | dms      | branch    | 3  | 0  | 0.205 / 0.298  | 100% / 98%
aa200c   | dms.c    | dms      | MC/DC     | 2  | 0  | 0.187 / 0.226  | 91.2% / 84%
aa200c   | nutate.c | nutlo    | statement | 6  | 3  | 3.238 / 0.970  | 100% / 100%
aa200c   | nutate.c | nutlo    | branch    | 8  | 4  | 1.305 / 1.351  | 100% / 100%
aa200c   | nutate.c | nutlo    | MC/DC     | 30 | 24 | 9.961 / 11.357 | 100% / 100%
aa200c   | refrac.c | refrac   | statement | 4  | 1  | 0.177 / 0.147  | 100% / 100%
aa200c   | refrac.c | refrac   | branch    | 4  | 1  | 0.266 / 0.224  | 100% / 100%
aa200c   | refrac.c | refrac   | MC/DC     | 13 | 7  | 0.867 / 0.700  | 95% / 95%
Table 9. Programs used for comparison with C++test.
Program      | LOC | Types of variables | Program structure | Layers of loop | Paths from CTS | Source
testplus1.c  | 52  | int                | 50 if             | 0 | 1  | CTS
testplus2.c  | 102 | int                | 100 if            | 0 | 1  | CTS
testplus3.c  | 152 | int                | 150 if            | 0 | 1  | CTS
testplus4.c  | 202 | int                | 200 if            | 0 | 1  | CTS
loop1.c      | 12  | int                | do while, if      | 1 | 2  | CTS
loop2.c      | 20  | int                | for, switch, if   | 2 | 7  | CTS
loop3.c      | 23  | int                | while, if         | 3 | 7  | CTS
loop4.c      | 39  | int                | while, if         | 4 | 8  | CTS
loop5.c      | 41  | int                | while, for, if    | 5 | 14 | CTS
loop6.c      | 43  | int                | while, for, if    | 6 | 15 | CTS
loop7.c      | 46  | int                | while, for, if    | 7 | 20 | CTS
bonus.c      | 21  | long               | 6 if              | 0 | 6  | Wang et al. (2013)
days.c       | 28  | int                | switch, if        | 0 | 17 | Wang et al. (2013)
statistics.c | 18  | int                | if                | 0 | 4  | Wang et al. (2013)
division.c   | 17  | char               | for, while, if    | 2 | 2  | CTS
equation.c   | 28  | int                | if                | 0 | 3  | CTS
pingpang.c   | 10  | double             | if                | 0 | 1  | CTS
prime.c      | 16  | char               | for, if           | 1 | 4  | CTS
star.c       | 13  | int                | while, for, if    | 2 | 3  | CTS
triangle.c   | 48  | int                | go to, if         | 0 | 34 | CTS
lsqrt.c      | 37  | int                | for, if           | 2 | 6  | CTS
dms.c (void bc_jtoc) | 37 | long double, long, int, array, pointer | go to, if | 0 | 6 | aa200c
1) Expressions. There are 50, 100, 150, and 200 mathematical expressions in the first four programs, but their numbers of variables differ, so they do not have the same structure as the programs in Section 5.2, and the results of BFS-BB-IIA were not expected to follow the fitted curves there. When the number of expressions is below 100, C++test consumed less time but reached lower coverage; if coverage is given priority, BFS-BB-IIA performed better. With the increasing number of expressions and the increasing difficulty of solving the constraint system, C++test failed to solve the constraints once the number of expressions reached 150. BFS-BB-IIA could solve constraint systems with hundreds of expressions and reached high coverage, although its time consumption also increased.
2) Types of variables. It can be seen from the time and coverage results in Table 10 that BFS-BB-IIA performed better for the PUTs with basic types in terms of both time consumption (except for the first two programs) and coverage. For most of the cases, the time consumption of C++test was many times that of BFS-BB-IIA. C++test was not able to generate a test input that is required to be NULL (for example, for the last test bed): the execution of its randomly generated test data went wrong, which means that test data generation failed. BFS-BB-IIA was able to generate test data for it, although the coverage did not reach 100%.
3) Loops. For the 11 programs with loops, BFS-BB-IIA consumed less time with 100% coverage. But these 11 programs are all from CTS, and we should try more programs from engineering to test its ability to handle loops.
5.3.3. Comparison with Choco
In this part, we conducted comparison experiments between BFS-BB-IIA and the open-source constraint solver Choco, which was adopted by Wang et al. (2013) for constraint solving. The test beds were taken from the project de118i-2 available at http://www.moshier.net/ and from the project people in the library of C source codes of Florida State University available at http://people.sc.fsu.edu/~jburkardt/c_src/c_src.html. The experiments were performed in the environment of Ubuntu 12.04 (32-bit), Pentium 4 at 3.00 GHz and 4 GB memory. The paths to be covered were provided by CTS and might contain infeasible paths.
Table 10. The comparison result with C++test.
Program      | Avg. generation time (s): BFS-BB-IIA / C++test | Avg. coverage: BFS-BB-IIA / C++test
testplus1.c  | 23.549 / 13 | 100% / 10%
testplus2.c  | 42.166 / 23 | 100% / 5%
testplus3.c  | 49.464 / –  | 100% / –
testplus4.c  | 72.72 / –   | 100% / –
loop1.c      | 0.085 / 31  | 100% / 100%
loop2.c      | 0.085 / 22  | 100% / 100%
loop3.c      | 0.084 / 16  | 100% / 100%
loop4.c      | 0.078 / 21  | 100% / 100%
loop5.c      | 0.09 / 32   | 100% / 100%
loop6.c      | 0.097 / 31  | 100% / 100%
loop7.c      | 0.106 / 18  | 92% / 92%
bonus.c      | 0.921 / 12  | 100% / 82%
days.c       | 1.785 / 18  | 100% / 55%
statistics.c | 0.289 / 16  | 100% / 90%
division.c   | 0.182 / 21  | 100% / 75%
equation.c   | 0.854 / 15  | 100% / 86%
pingpang.c   | 0.117 / 17  | 100% / 100%
prime.c      | 0.206 / 15  | 100% / 91%
star.c       | 0.302 / 16  | 100% / 100%
triangle.c   | 0.383 / 17  | 100% / 77%
lsqrt.c      | 0.097 / 16  | 100% / 100%
dms.c (void bc_jtoc) | 0.334 / – | 62.5% / –
The comparison adopted three coverage criteria: statement, branch, and MC/DC. For each test bed, the experiments were carried out 100 times, and the number of infeasible paths detected, average generation time, and average coverage were used for comparison. The elements on the infeasible paths were not involved in the coverage calculation. The details of the comparison are shown in Table 11. We discuss the comparison result in the three aspects of concern.
1) Infeasible path detection. There are infeasible paths in more than half of the cases (see the infeasible-path counts in Table 11). Choco is not able to detect infeasible paths, but BFS-BB-IIA is. This is very useful, because the subsequent search steps of generating test data for such paths are spared.
Table 11. The result of the comparison with Choco.
Project  | File       | Function | Coverage criterion | Paths from CTS | Infeasible paths by BFS-BB-IIA | Avg. generation time (s): BFS-BB-IIA / Choco | Avg. coverage: BFS-BB-IIA / Choco
de118i-2 | sinl.c     | cosl     | statement | 6  | 3  | 0.558 / 4.525 | 100% / 87%
de118i-2 | sinl.c     | cosl     | branch    | 9  | 3  | 0.888 / 3.739 | 97% / 91%
de118i-2 | sinl.c     | cosl     | MC/DC     | 9  | 4  | 0.346 / 2.891 | 98% / 82%
de118i-2 | asinl.c    | asinl    | statement | 9  | 5  | 0.329 / 1.786 | 100% / 69%
de118i-2 | asinl.c    | asinl    | branch    | 7  | 5  | 0.226 / 1.153 | 100% / 60%
de118i-2 | asinl.c    | asinl    | MC/DC     | 12 | 9  | 0.123 / 1.178 | 98% / 39%
de118i-2 | atanl.c    | atan2l   | statement | 3  | 0  | 0.473 / 6.301 | 100% / 75%
de118i-2 | atanl.c    | atan2l   | branch    | 21 | 17 | 0.107 / 2.026 | 100% / 72%
de118i-2 | atanl.c    | atan2l   | MC/DC     | 35 | 26 | 0.122 / 0.781 | 100% / 71%
people   | toms655.c  | parchk   | statement | 5  | 0  | 0.472 / 1.817 | 100% / 50%
people   | toms655.c  | parchk   | branch    | 6  | 0  | 0.481 / 1.729 | 100% / 31%
people   | toms655.c  | parchk   | MC/DC     | 12 | 5  | 0.418 / 1.822 | 98% / 29%
people   | treepack.c | i4_power | statement | 7  | 0  | 0.110 / 5.193 | 100% / 87%
people   | treepack.c | i4_power | branch    | 7  | 0  | 0.121 / 5.021 | 100% / 75%
people   | treepack.c | i4_power | MC/DC     | 9  | 1  | 0.113 / 5.759 | 80% / 70%
2) Generation time. In all the cases, the generation time consumed by Choco was several times, and sometimes more than ten times, that of BFS-BB-IIA. BFS-BB-IIA thus shows an obvious advantage over Choco in terms of generation time.
3) Coverage. In all the cases, the coverage achieved by BFS-BB-IIA was higher than that of Choco. Choco reached 100% coverage in none of the cases, while BFS-BB-IIA achieved 100% in all but five cases, four of which adopted MC/DC. As mentioned in Section 5.3.1, MC/DC is relatively strict, which will be studied further.
6. Conclusion and future work
The increasing demand for testing large-scale real-world programs necessitates the automation of the testing process, and as a basic problem in software testing, path-wise test data generation is particularly important. We proposed a look-ahead search method in our previous research, and in this paper we improve its interval arithmetic, which enforces arc consistency. We analyzed the working process of interval arithmetic in detail and, based on the analytical result, introduced the iterative operator and adopted it in the constraint solving process, for the purpose of detecting infeasible paths as well as shortening generation time. Experimental results demonstrate the effectiveness of the iterative operator and its applicability in engineering. Our future research will address how to make interval arithmetic more efficient in arc consistency checking. We will also introduce more arc consistency checking techniques and more ways of representing the values of variables, such as affine arithmetic. The MC/DC coverage criterion will receive more emphasis. We will continue to improve the effectiveness of the generation approach and provide better support for more data types.
Acknowledgments This work was supported by the National Grand Fundamental Research 863 Program of China (No. 2012AA011201), the National Natural Science Foundation of China (No. 61202080), and the Major Program of the National Natural Science Foundation of China (No. 91318301).
References Antic, C., Eiter, T., Fink, M., 2013. HEX semantics via approximation fixpoint theory. In: Proceedings of the 12th International Conference on Logic Programming and Nonmonotonic Reasoning, Corunna, Spain,vol. 8148, pp. 102–115. Bertolino, A., 2007. Software testing research: achievements, challenges, dreams. In: Proceedings of the International Conference Future of Software Engineering, Minneapolis, MN, USA, pp. 85–103. Beyleda, S, Gruhn, V., 2003. BINTEST-binary-search-based test case generation. In: Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC). IEEE Computer Society Press, Dallas, TX, USA, pp. 28–33. Blanco, R., Fanjul, J.G., Tuya, J., 2010. Test case generation for transition-pair coverage using scatter search. Int. J. Softw. Eng. Appl. 4, 37–56. Cadar, C., Dunbar, D., Engler, D.R., 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation, vol. 8, pp. 209–224. Comba, J.L.D., Stol, J., 1993. Affine arithmetic and its applications to computer graphics. In: Proceedings of VI SIBGRAPI (Brazilian Symposium on Computer Graphics and Image Processing), pp. 9–18. Cooper, M.C., de Givry, S., Sánchez, M., Schiex, T., Zytnicki, M., Werner, T., 2010. Soft arc consistency revisited. Artif. Intell. 174 (7), 449–478. Cousot, P., Cousot, R., 1976. Static determination of dynamic properties of programs. In: Proceedings of the Second International Symposium on Programming, Paris, France, pp. 106–130. Cousot, P., 2001. Abstract interpretation based formal methods and future challenges In: Proceedings of Informatics—10 Years Back, 10 Years Ahead. Springer, Berlin, Heidelberg, pp. 138–156. DeMilli, R.A., Offutt, A.J., 1991. Constraint-based automatic test data generation. IEEE Trans. Softw. Eng. 17 (9), 900–910. Elsayed, E.A., 2012. Overview of reliability testing. IEEE Trans. Reliab. 61 (2), 282–291. Fraser, G, Arcuri, A., 2011. Evosuite: automatic test suite generation for objectoriented software. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, Szeged, Hungary, pp. 416–419. Frost, D., Dechter, R., 1995. Look-ahead value ordering for constraint satisfaction problems. In: Proceedings of International Joint Conference on Artificial Intelligence, Montreal, Canada, pp. 572–578. Gallagher, M.J., Narasimhan, V.L., 1997. Adtest: a test data generation suite for Ada software systems. IEEE Trans. Softw. Eng. 23, 473–484. Galler, S.J., Aichernig, B.K., 2013. Survey on test data generation tools. Int. J. Softw. Tools Technol. Transf., 1–25. Gupta, R, Mathur, A.P., Soffia, M.L., 1999. UNA based iterative test data generation and its evaluation. In: Proceedings of the 14th IEEE International Conference on Automated Software Engineering, pp. 224–232. Hermadi, I., Lokan, C., Sarker, R., 2014. Dynamic stopping criteria for search-based test data generation for path testing. Inf. Softw. Technol. 56 (4), 395–407. Hwang, G.H., Lin, H.Y., Lin, S.Y., Lin, C.S., 2014. Statement-coverage testing for concurrent programs in reachability testing. J. Inf. Sci. Eng. 30, 1095–1113. Kasprzak, W., Szynkiewicz, W., Zlatanov, D., Zielinska, T., 2014. A hierarchical CSP search for path planning of cooperating self-reconfigurable mobile fixtures. Eng. Appl. Artif. Intell. 34, 85–98. Lecoutre, C., Prosser, P., 2006. Maintaining singleton arc consistency. 
In: Proceedings of Constraint Propagation and Implementation (CPAI), vol. 6, pp. 47–61.
Lesniak, K., 2012. Invariant sets and Knaster–Tarski principle. Cent. Eur. J. Math. 10 (6), 2077–2087. Mackworth, A.K., 1977. Consistency in networks of relations. Artif. Intell. 8 (1), 99–118. Mao, C.Y., Yu, X.X., Chen, J.F., 2012. Swarm intelligence-based test data generation for structural testing. In: Proceedings of 11th International Conference on Computer and Information Science, Shanghai, China, pp. 623–628. McMinn, P., 2004. Search-based software test data generation: a survey. Softw. Test. Verif. Reliab. 14 (2), 105–156. Moore, R.E., 1966. Interval Analysis. Prentice-Hall, New Jersey, USA. Moore, R.E., 1979. Methods and Applications of Interval Analysis. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. Moore, R.E., Kearfott, R.B., Cloud, M.J., 2009. Introduction to Interval Analysis. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. Pirnia, M., Cañizares, C.A., Bhattacharya, K., 2014. A novel affine arithmetic method to solve optimal power flow problems with uncertainties. IEEE Trans. Power Syst. 29 (6), 2775–2783. Rajan, A., Whalen, M.W., Heimdahl, M.P.E., 2008. The effect of program and model structure on MC/DC test adequacy coverage. In: Proceedings of ACM/IEEE 30th International Conference on Software Engineering, Leipzig, Germany, pp. 161– 170. Robschink, T, Snelting, G, 2002. Efficient path conditions in dependence graphs. In: Proceedings of the 24th International Conference on Software Engineering, ACM, pp. 478–488. Schaerf, A., 1997. Combining local search and look-ahead for scheduling and constraint satisfaction problems. In: Proceedings of International Joint Conference on Artificial Intelligence, Nagoya, Japan, pp. 1254–1259.
Shan, J.H., Wang, J., Qi, Z.C., 2004. Survey on path-wise automatic generation of test data. Acta Electron. Sin. 32 (1), 109–113. Tang, R., Wang, Y.W., Gong, Y.Z., 2012. Research on abstract memory model for test case generation. In: Proceedings of the 7th China Test Conference, Hangzhou, Zhejiang, China. 2012, pp. 144–149. Team, C., 2010. Choco: an open source java constraint programming library. Ecole des Mines de Nantes, Research report. 10-02. Wallace, R.J., 1993. Why AC-3 is almost always better than AC-4 for establishing arc consistency in CSPs. In: Proceedings of International Joint Conference on Artificial Intelligence, Chambery, France, pp. 239–245. Wallace, R.J., Freuder, E.C., 1992. Ordering heuristics for arc consistency algorithms. In: Proceedings of the Biennial Conference—Canadian Society for Computational Studies of Intelligence, Vancouver, Canada, pp.163–169. Wang, Y.W., Gong, Y.Z., Xiao, Q., 2013. A method of test case generation based on necessary interval set. Journal of Computer-Aided Design & Computer Graphics 25, 550–556. Weyuker, E.J., 1999. Evaluation techniques for improving the quality of very large software systems in a cost-effective way. J. Syst. Softw. 47 (2), 97–103. Xing, Y., Gong, Y.Z., Wang, Y.W., Zhang, X.Z., 2014. Path-wise test data generation based on heuristic look-ahead methods. Math. Probl. Eng. 2014. Xing, Y., Gong, Y.Z., Wang, Y.W., Zhang, X.Z., 2015. A hybrid intelligent search algorithm for automatic test data generation. Math. Probl. Eng. (in press). Zhao, R., Harman, M., Li, Z., 2010. Empirical study on the efficiency of search based test generation for EFSM models. In: Proceedings of the Third International Conference on Software Testing, Verification, and Validation Workshops, Paris, France, pp. 222–231.