The Journal of Systems and Software 86 (2013) 1191–1208
Automated test data generation for branch testing using genetic algorithm: An improved approach using branch ordering, memory and elitism

Ankur Pachauri, Gursaran Srivastava ∗

Department of Mathematics, Faculty of Science, Dayalbagh Educational Institute, Agra 282005, India
Article info

Article history: Received 3 November 2011; received in revised form 15 October 2012; accepted 26 November 2012; available online 5 January 2013.

Keywords: Automated program test data generation; Software testing; Genetic algorithm

Abstract

One of the problems faced in generating test data for branch coverage using a metaheuristic technique is that the population may not contain any individual that encodes test data for which the execution reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) introduce three approaches for ordering branches for selection as targets for coverage with a genetic algorithm (GA) and (b) experimentally evaluate branch ordering together with elitism and memory to improve test data generation performance. An extensive preliminary study was carried out to help frame the research questions and fine-tune GA parameters which were then used in the final experimental study.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

With the realization that the process of software test data generation can be cast into a search problem, a new area of research called search-based software test data generation has emerged to which metaheuristic techniques can be readily applied (McMinn, 2004; Harman and Mansouri, 2010; Ali et al., 2010; Harman and McMinn, 2010). However, as research points out, the application of metaheuristic techniques poses new challenges.

During testing, the program under test P is executed on a test set of test data, each test data being a specific point in the input domain, and the results are evaluated. The test set is constructed to satisfy a test data adequacy criterion that specifies test requirements (Zhu et al., 1997). The branch coverage criterion is a test adequacy criterion that is based on the program flow graph. More formally, a test set T is said to satisfy the branch coverage criterion if, on executing P on T, every branch in P's flow graph is traversed at least once.

One of the problems faced in generating test data for branch coverage using a metaheuristic technique such as a genetic algorithm is that the population may not contain any individual that encodes test data for which the execution path reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) introduce three approaches for ordering branches for selection as targets for coverage with a genetic algorithm (GA) and (b) experimentally evaluate branch ordering together with elitism and memory to improve test data generation capability.
∗ Corresponding author. Tel.: +91 5622801545; fax: +91 5622801226. E-mail addresses:
[email protected] (A. Pachauri),
[email protected] (G. Srivastava). 0164-1212/$ – see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jss.2012.11.045
The paper is further organized as follows. In Section 2 we describe the motivation for the present study together with a comprehensive review of related work. In Section 3 we introduce the genetic algorithm (GA) and describe its application to program test data generation. The proposed approach to enhance the performance of GA-based test data generation is described in detail in Section 4. In Section 5 we describe the experiments carried out in this study. This section (i) details the research questions, (ii) describes a preliminary study which was carried out to tune the GA parameters and operators and to frame the research questions and (iii) presents a discussion on the results of the main experiments. Section 6 discusses the threats to validity and limitations and Section 7 concludes the paper with directions for future work.

2. Motivation and related work

Test data generation in software testing is the process of identifying a set of test cases to satisfy a selected test data adequacy criterion. Bertolino (2007) points out that the most promising results toward automated test data generation have come from three approaches: model-based, random and search-based approaches. Search-based test data generation is a part of the much broader research area of search-based software engineering (Harman, 2007; Harman et al., 2009; Harman and Mansouri, 2010). Search-based test data generation consists of exploring the input domain of a program under test for test data to satisfy a selected test data adequacy criterion. By using metaheuristic techniques – high-level frameworks which utilize heuristics in order to find solutions – the search is directed toward the most promising areas of the domain (Michael et al., 2001; McMinn, 2004). As a technique, it can be used with specification-based criteria as well as program-based criteria.
Metaheuristic techniques such as simulated annealing (Tracey et al., 1998), tabu search (Díaz et al., 2003), genetic algorithms (Jones et al., 1996; Michael et al., 1997; Pargas et al., 1999), particle swarm optimization (Windisch et al., 2007), quantum particle swarm optimization (Agarwal and Srivastava, 2010), scatter search (Blanco et al., 2009), ant colony optimization (Li and Lam, 2005), memetic algorithms (Arcuri and Yao, 2007), the clonal selection algorithm (Castro and Von Zuben, 2002) and the immune genetic algorithm (Liaskos and Roper, 2008; Tan et al., 2009) have been applied to the problem of automated test data generation and provide evidence of their successful application. Amongst these, several have addressed the issue of test data generation with program-based criteria (Michael et al., 2001; Girgis, 2005; Andreou et al., 2007; Chen and Zhong, 2008; Ahmed and Hermadi, 2008; Ghani and Clark, 2009) and in particular the branch coverage criterion (Jones et al., 1998; Wegener et al., 2001; Blanco et al., 2007; Wang et al., 2008; Harman, 2008; Chen et al., 2009; Gross et al., 2009).

In program-based test data generation using metaheuristic techniques, the basic approach is as follows. The source code of the program under test is instrumented to collect information about the program as it executes. The resulting information, collected during each execution of the program, is used to heuristically determine how close the test case is to satisfying a test requirement specified by the selected test criterion. This allows the test generator to modify the program's inputs gradually, moving them ever closer to values that actually do satisfy the requirement. In other words, the problem of generating test data reduces to the well-understood problem of function optimization. Furthermore, such test generation methods can handle arrays and pointer references because the values of array indices and pointers are known throughout the generation process.
Wegener et al. (2002) point out that because of the non-linearity of software (conditional statements, loops, flags, switch-case, break), the conversion of test problems into optimization tasks usually results in complex, discontinuous and non-linear search spaces for which search methods such as hill climbing are not suitable, but metaheuristic search methods can be employed.

One of the problems faced in generating test data for branch coverage using a population-based metaheuristic technique such as a genetic algorithm is that when a branch is chosen as the target for coverage, it may happen that none of the individuals (individuals encode input test data) in the population encode inputs for which the execution path reaches the predicate node of the target branch, i.e., a critical branch is taken that causes the predicate node of the target branch to be missed in an execution of the program (Ferguson and Korel, 1996; McMinn and Holcombe, 2006). In order to deal with this problem, Michael et al. (2001) postpone the selection of the branch for coverage and Baresel et al. (2002) have described the design of fitness functions to guide the search. The usual approach is to compute the fitness of an individual so that it incorporates information about how close the input test data came to reaching the target of interest, called the approach level, and to combine this with branch distance data, which reflects how close the sibling branch of the critical branch was to being taken, i.e., the branch that would have taken the traversal closer to the target branch. Accordingly, the fitness of an individual is computed as approach level + normalized branch distance (McMinn, 2004; Harman and McMinn, 2010; Arcuri, 2010; McMinn, 2011).
The possibility of selecting target branches in a specific sequence, and augmenting the metaheuristic process so that from the step (generation) at which a target is selected to the generation in which it is covered the current population has at least one individual that encodes inputs for a path that includes the sibling branch of the target branch, has not been explored. We hypothesize that this should result in better coverage and performance. Harman et al. (2009) note that since 1995 there has been an upsurge in work in search-based test data generation aimed at the achievement of branch coverage and cite a number of references, but do not mention the idea of branch ordering and augmenting the search techniques as described above. McMinn (2011) in his paper also focuses primarily on the design of fitness functions. In this paper we consider three approaches for branch ordering for target selection together with elitism and memory and evaluate them experimentally.

One important issue in search-based test data generation is that of scalability. In our context this concerns the number of branches, the nesting depth and the search space. These have been addressed in the literature in the context of the genetic algorithm. Mehrmand (2009) has conducted a factorial experiment on the scalability of search-based software testing with Java programs. He concludes that GA can outperform random testing as complexity, in terms of the number of branches and statements, is increased. Xiao et al. (2007) have conducted a scalability analysis in the context of goal-oriented automated test data generation techniques. They conclude that GA performs better for both small and large search spaces in the context of condition-decision coverage. In general, Harman et al. (2009) point out that search-based software engineering has attractive scalability potential through parallel execution of fitness computations. In light of these experiments and observations, the improvements proposed in this paper assume importance since they may be used to enhance test data generation performance as programs become large with an increased number of branches and a larger search space.

3. Background

In this section we first describe the genetic algorithm and then describe GA-based test data generation in detail.

3.1.
Genetic algorithm

The genetic algorithm (GA) is a metaheuristic search technique based on the ideas of genetics and evolution, in which new and fitter string individuals are created by combining portions of the fittest string individuals of the parent population (Goldberg, 1989). A genetic algorithm execution begins with a random initial population of candidate solutions. Each candidate solution is generally a vector of parameters, usually encoded in a binary string (or bit string), called a chromosome or an individual. If there are m input parameters with the ith parameter expressed in ni bits, then the length of the chromosome is simply the sum n1 + n2 + ... + nm. In this paper each individual, or chromosome, encodes test data. After creating the initial population, each chromosome is evaluated and assigned a fitness value. From this initial selection, the population of individuals iteratively evolves to one in which candidates satisfy some termination criteria or, as in our case, fail to make any forward progress. Each iteration step is also called a generation.

Each generation may be viewed as a two-stage process (Goldberg, 1989). Beginning with the current population (also called the parent population), selection is applied to create an intermediate population, and then crossover and mutation are applied to this population. Another (selection) step is then applied to the individuals from the intermediate population and the current generation's parent population to create the parent population for the next generation. In a generational GA, the intermediate population replaces the current generation's parent population to create the parent population for the next generation, whereas in a steady-state GA a small percentage of the worst individuals from the parent population is replaced with the best individuals from the intermediate population. In the case of a generational GA, elitism ensures that the fittest chromosomes survive from one population to the next.
1. Choose an appropriate test adequacy criterion. This in our case is the branch coverage criterion.
2. Set up the genetic algorithm.
   a. Select a representation for test data to be input to program P.
   b. Define a fitness function.
   c. Instrument the program P to create program Pt. The instrumented program Pt is used directly for test data generation.
   d. Select suitable genetic algorithm parameters and operators.
3. Generate test data.
   a. Run the genetic algorithm for test data generation using Pt for fitness computation.
   b. Identify and eliminate infeasibility.
   c. Regenerate test data if necessary.
Fig. 1. Complete steps for test data generation using genetic algorithm.
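Step 2c of Fig. 1 can be illustrated with a minimal sketch, assuming a predicate-wrapping style of instrumentation in which every predicate evaluation records its node id, its outcome and a branch distance (defined later in Table 1); the trace layout, node ids and helper names are hypothetical, not the authors' implementation:

```python
# Sketch: the instrumented program Pt records every predicate evaluation.
trace = []  # one (node_id, outcome, branch_distance) entry per evaluation

def pred(node_id, outcome, distance):
    """Record a predicate evaluation, then behave like the predicate."""
    trace.append((node_id, outcome, distance))
    return outcome

# An original fragment `if a <= b: ...` becomes, in Pt (distance a - b
# follows the a <= b entry of Table 1):
def pt_fragment(a, b):
    if pred(1, a <= b, a - b):
        return "true branch"
    return "false branch"
```

Running Pt on a single test datum leaves in `trace` exactly the per-node information needed for the fitness computation of Section 3.2.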
The chromosome length, population size, and the various probability values in a GA application are referred to as the GA parameters in this paper. Selection, crossover and mutation are referred to as the GA operators.

3.2. GA-based test data generation

Let P be the program under test; a general sequence of steps for test data generation using a genetic algorithm is described in Fig. 1. Test data is generated to meet the requirements of a particular test data adequacy criterion. The criterion in our case is the branch coverage criterion. McMinn (2011) points out that there are, in general, two requirements to be fulfilled to apply metaheuristic techniques to testing problems: (a) a representation and (b) a fitness function. He further notes that a suitable representation is automatically available for the case of test data generation, where the input vector or sequences of inputs to the software under test can be optimized. The search space is thus defined over the represented inputs to the program under test and may be very large. However, for the application of the genetic algorithm, this search space does not have to be determined beforehand. Thus the basic choice that all search algorithms, including the genetic algorithm, have to make at each stage of the search is which point (or points) in the search space to sample next (Radcliffe, 1997). At the start, assuming no prior knowledge, the choice is essentially random. After a while, as various points are sampled, the choice becomes more informed. In the genetic algorithm, the population acts as the memory, and the choice of the next point in the search space to sample is determined by applying different operators on the individuals based on their fitness. In our case, the inputs for one execution of P, i.e., a single test data, are represented in a binary string, also called a binary individual. How real values can be represented in a binary string is described in detail in Michalewicz (1996).
The fitness of a binary individual is computed as

Fitness(x) = Approximation Level + Normalized Branch Distance

As opposed to the usual practice of formulating the generation problem as a fitness minimization problem, in this work it is taken to be a maximization problem (Pachauri and Gursaran, 2012). The definitions of approximation level and normalized branch distance are also different from McMinn (2004), although the basic idea is similar. A critical branch (Korel, 1990), as defined earlier, is a branch that leads the execution away from the target branch in a path through the program. The approximation level is a count of the number of predicate nodes in the shortest path from the first predicate node, from the start node, in the flow graph to the predicate node with the critical branch. See Fig. 2 for an example. The shortest path is chosen to avoid loops and take care of multiple paths that may be followed
to reach the critical branch. The Normalized Branch Distance is calculated according to the formula

Normalized branch distance = 1 / (1.001 ^ distance)

Table 1
Branch distance computation.

    Decision type    Branch distance
1   a < b            a - b
2   a <= b           a - b
3   a > b            b - a
4   a >= b           b - a
5   a == b           Abs(a - b)
6   a != b           Abs(a - b)
7   a && b           a + b
8   a || b           min(a, b)
where distance, or branch distance, as defined in Baresel et al. (2002) and McMinn and Holcombe (2006), is computed at the node with the critical branch using the values of the variables and constants involved in the predicates used in the conditions of the corresponding branching statement. However, the definition of normalized branch distance is different from the definition of McMinn (2004) as the problem is a maximization problem. Table 1 shows the computation of branch distance for different conditions. Entries one to five are the same as in Korel (1990). Table 1 also describes the computation of branch distance in the presence of the logical operators AND (&&) and OR (||). In both these cases, the definition takes into account the fact that branch distance is to be minimized whereas the fitness is to be maximized. Fig. 2 illustrates branch distance computation for different cases.

In Fig. 1, Step 2d involves the selection of appropriate parameter values and Step 3a involves an application of the genetic algorithm. The implementation of Step 3a is described in Fig. 3. Step 3c states: regenerate test data if necessary. A tester may choose to rerun the GA for two reasons: infeasibility has been eliminated or a larger test set is required to improve confidence (Beizer, 2002; Beizer, 2009). Fig. 3 outlines the test data generation procedure with a genetic algorithm, i.e., the implementation of Step 3a in Fig. 1. There are a number of parameters and operators whose values and types need to be decided. A preliminary study was carried out to determine the best values of parameters and types of operators. This is described in detail in Section 5.2. Harman et al. (2009) note that in early works the fitness function sought to maximize the number of branches covered. This was found to be inadequate as it tended to avoid the branches that were hard to cover.
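A minimal sketch of this fitness computation, combining the entries of Table 1 with the normalization above, is given below; the function names are illustrative, not the authors' implementation:

```python
# Sketch of the maximization fitness used in the paper:
# fitness = approximation level + 1 / 1.001^distance.

def branch_distance(op, a, b):
    """Relational entries of Table 1, computed when the branch is missed."""
    table = {
        '<':  a - b, '<=': a - b,
        '>':  b - a, '>=': b - a,
        '==': abs(a - b), '!=': abs(a - b),
    }
    return table[op]

def compound_distance(op, da, db):
    """Table 1 for logical operators, over operand distances da and db."""
    return da + db if op == '&&' else min(da, db)

def normalized_branch_distance(distance):
    # A distance of 0 normalizes to 1.0; larger distances approach 0.
    return 1.0 / (1.001 ** distance)

def fitness(approximation_level, distance):
    return approximation_level + normalized_branch_distance(distance)
```

Because the normalization decreases as the branch distance grows, an individual that satisfies the missed condition exactly (distance 0) gains the full unit contribution on top of its approximation level, which is what makes the formulation a maximization problem.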
In the works that appeared later, the coverage of each branch is viewed as a test objective, for which a fitness function is constructed based upon the path taken by individual test data. This, as described above, is the approach taken in this paper. The termination criterion for a GA run for each test objective can be target
[Fig. 2 shows a flow graph fragment with three predicate nodes evaluated in sequence: (a<=0 || b<=0 || c<=0), (a>10000 || b>10000 || c>10000) and (a<=b && b<=c || c<=2). When the target is missed at these nodes, the approximation levels are 0, 1 and 3 and the branch distances are Min((a-0), (b-0), (c-0)), Min((10000-a), (10000-b), (10000-c)) and Min(((a-b) + (b-c)), (c-2)), respectively.]

Fig. 2. Approximation level and branch distance computation.
branch coverage or a predefined maximum number of GA iterations or population convergence. In the experiments in this paper, the GA run is terminated on whichever occurs earlier: coverage of all branches or a predefined maximum number of iterations.

4. Proposed approach

Apart from the genetic algorithm parameters and operators, additional features can be incorporated to improve test data generation performance. These are:

• Target branch selection method. See step 5 in Fig. 3. This is discussed in detail in Section 4.1.
• Memory. See step 7 in Fig. 3. In a GA application, it is possible that a predicate node that was reached in an earlier generation is no longer reached in the current generation, and if a branch at that node is the target branch, then the fitness of all the individuals becomes low. In order to circumvent this problem, we store the individuals that traverse a branch and use them when the sibling branch is chosen as the target. Storing individuals for later use is 'memory'. This is detailed further in Section 4.3.
• Elitism at step 14 in Fig. 3.
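The memory feature in the second bullet above can be sketched as follows; the dictionary layout, the per-branch limit of five stored individuals and the function names are assumptions based on the description in Section 4.3:

```python
# Sketch of 'memory': store individuals that traverse a branch and reuse
# them when the sibling branch becomes the target.

MEMORY_LIMIT = 5
memory = {}  # branch id -> up to MEMORY_LIMIT individuals that traversed it

def remember(branch, individual):
    """Store an individual for a branch, up to the per-branch limit."""
    stored = memory.setdefault(branch, [])
    if len(stored) < MEMORY_LIMIT:
        stored.append(individual)

def seed_from_memory(population, fitness, sibling_branch):
    """Replace the worst individuals with stored traversers of the sibling."""
    stored = memory.get(sibling_branch, [])
    k = min(len(stored), MEMORY_LIMIT, len(population))
    if k == 0:
        return list(population)
    ranked = sorted(population, key=fitness)   # ascending: worst first
    return stored[:k] + ranked[k:]
```

The population size is preserved: k stored individuals displace exactly the k worst members of the current population.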
4.1. Branch ordering

As discussed in Section 2, the issue of branch selection order has not been explored. Branches may be ordered using any one of the following schemes:
1. Start with a randomly generated population of n individuals (test data) for program under test P
2. While (termination criterion is not met) {
3.   Execute Pt on each individual (test data) x in the current population popcur;
4.   If (selected target branch b is traversed or if no target has been selected)
5.     Use a branch selection strategy to identify a new target branch b;
6.   Calculate the fitness f(x) of each individual x in the population with respect to target b;
7.   If available, use memory to replace worst individuals in the population;
8.   Initialize intermediate population popinter to empty; // generate a new population
9.   Repeat {
10.    Select a pair of parent individuals from popcur;
11.    With probability Pc (the crossover probability), crossover the parents to form two offspring (or children);
12.    Mutate the two offspring with probability Pm (the mutation probability), and place the resulting individuals in popinter;
13.  } until (n individuals have been added to popinter);
14.  Replace (completely or partially) individuals in popcur with individuals from popinter (elitism is used at this step)
15. }

Fig. 3. Test data generation with genetic algorithm.
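Steps 8 to 14 of Fig. 3 can be sketched as one generation over bit-string individuals; the one-point crossover, bit-flip mutation and the sample fitness used below are illustrative stand-ins, not the paper's exact operators:

```python
# Sketch of one generation (steps 8-14 of Fig. 3) with elitist replacement.
import random

def one_generation(pop_cur, fitness, pc=1.0, pm=0.01, elite_frac=0.10, rng=None):
    rng = rng or random.Random(0)
    n = len(pop_cur)
    pop_inter = []
    while len(pop_inter) < n:                       # steps 9-13
        p1, p2 = rng.sample(pop_cur, 2)             # step 10: select parents
        if rng.random() < pc:                       # step 11: one-point crossover
            cut = rng.randrange(1, len(p1))
            c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        else:
            c1, c2 = p1, p2
        for child in (c1, c2):                      # step 12: bit-flip mutation
            mutated = ''.join(
                bit if rng.random() >= pm else '10'[int(bit)] for bit in child)
            pop_inter.append(mutated)
    pop_inter = pop_inter[:n]
    # Step 14 with elitism: carry the best parents into the next generation.
    k = max(1, int(elite_frac * n))
    elite = sorted(pop_cur, key=fitness, reverse=True)[:k]
    rest = sorted(pop_inter, key=fitness, reverse=True)[:n - k]
    return elite + rest
```

Because the elite parents are copied forward unchanged, the best fitness in the population can never decrease from one generation to the next.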
[Fig. 5 shows an example control flow graph with nodes 1-14. Consistent with the branch orders of Fig. 4, node 1 branches to nodes 2 and 3, node 2 to 4 and 5, node 3 to 14, node 4 to 6 and 7, node 5 to 13, node 6 to 8 and 9, node 7 to 10 and 11, and nodes 8, 9, 10 and 11 each lead to node 12.]

Fig. 5. Example graph for path prefix strategy.
DFS order: (1,2),(2,4),(4,6),(6,8),(8,12),(6,9),(9,12),(4,7),(7,10),(10,12),(7,11),(11,12),(2,5),(5,13),(1,3),(3,14)

BFS order: (1,2),(1,3),(3,14),(2,4),(2,5),(5,13),(4,6),(4,7),(6,8),(6,9),(7,10),(7,11),(8,12),(9,12),(10,12),(11,12)

Fig. 4. Branch orders with DFS and BFS: an example.
• Breadth first strategy (BFS).
• Depth first strategy (DFS).
• Path prefix strategy (PPS).
• Random strategy (RNS).
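As an illustration, the DFS and BFS branch orders can be computed directly from the control flow graph. The adjacency map below encodes the example graph of Figs. 4 and 5, reconstructed from the edge orders listed there; within a node, successors are enumerated in ascending order, so BFS tie-breaking within a level may differ slightly from the order printed in Fig. 4:

```python
# Sketch: branch (edge) orders from DFS and BFS traversals of a CFG.
from collections import deque

CFG = {1: [2, 3], 2: [4, 5], 3: [14], 4: [6, 7], 5: [13], 6: [8, 9],
       7: [10, 11], 8: [12], 9: [12], 10: [12], 11: [12]}

def dfs_branch_order(graph, start):
    order, seen = [], {start}
    def visit(u):
        for v in graph.get(u, []):
            order.append((u, v))          # record every branch as encountered
            if v not in seen:
                seen.add(v)
                visit(v)
    visit(start)
    return order

def bfs_branch_order(graph, start):
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            order.append((u, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order
```

On this graph the DFS branch order reproduces the DFS sequence listed in Fig. 4 exactly.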
All four methods use the control flow graph for ordering the branches. As the names suggest, the depth first and breadth first strategies order branches according to the sequence in which they are examined in a depth first or breadth first search of the control flow graph beginning with the start node. Fig. 4 illustrates the ordering. The random strategy orders the branches randomly.

The path prefix strategy is described as follows. To achieve path coverage, as opposed to branch coverage, the problem is to find a program input that results in the traversal of the path. This may become difficult in the presence of infeasible paths. In an attempt to circumvent this, Prather and Myers (1987) suggested the use of an adaptive strategy in which one new test path, or sub-path, is added at a time and previous paths serve as a guide for the selection of subsequent paths using some inductive strategy. The path prefix strategy suggested by them is one such strategy for test data generation. We have used this strategy to select a branch for coverage. The sequence in which the branches are selected defines an ordering of the branches.

For a path p that is traversed in an execution, a reversible prefix q is defined as the minimal initial portion of path p to a decision node d whose branches are not fully covered, together with the branch that is covered by p. For example, in Fig. 5, if the path traversed is p: s, . . ., d, b, . . ., f, branch (d, b′) is not covered and s, . . ., d is the minimal initial portion of p that satisfies the condition above, then s, . . ., d, b is a reversible prefix. Accordingly, branch (d, b′) is a candidate for selection for coverage. If branch (d, b′) is selected for coverage, then the path s, . . ., d, b′ is said to be the reversal of the path s, . . ., d, b. In the path prefix strategy, at any stage, if p1, . . ., pk−1 are the traversed paths, the idea is to find an input xk that causes the reversal of the shortest reversible prefix q, with reversal q′, amongst all the pi's. In our case prefix q identifies the next branch to be considered for coverage.

4.2. Population replacement strategy and elitism
There are two issues that need to be addressed: (i) population initialization every time a new target branch is selected and (ii) replacement of the parent population with the child population. In case (i) the question is: should we initialize the population each time a new target is selected? A preliminary study was carried out to choose the best strategy. This study is described in Section 5.2. The steps outlined in Fig. 1 assume that the population is not initialized each time a new target branch is selected. Coming to issue (ii), in a generational GA, the entire parent population is replaced with the child population in each generation. The preliminary study has shown that, at times, the number of individuals (fit individuals here) in the parent population for which the execution path reaches the predicate node with the target branch may be small. In this situation, crossover and mutation may give a child population for which the predicate node is no longer reached, i.e., the execution path for every individual misses the predicate node with the target branch. It may thus be necessary to preserve the fit individuals from one generation to the next, leading to an elitist strategy. In our experiments we carry forward up to 10% of the fit individuals, with a minimum of one fit individual, to the next generation.
4.3. Memory

In one generation of a GA run, since the program under test is executed once for each individual in the population, it is possible that branches other than the target branch which have not been covered earlier are traversed. Further, it is possible that when a branch is selected as the target branch, although its sibling branch has been traversed earlier, no individual in the current population results in an execution path that reaches the corresponding predicate node. In this situation, it would clearly be helpful if the population is supplemented with those individuals whose execution paths cover the target's sibling branch. In order to facilitate this, each time a branch is traversed for the first time, up to five individuals that cause the branch to be traversed are stored. This is what we refer to as memory. Now, each time a branch is selected for coverage, up to five of the worst individuals in the current population are replaced with individuals from memory that traverse its sibling branch. However, it should be noted that an elitist strategy may additionally be required to preserve the individuals that traverse the sibling branch across generations.

5. Experiments

In this section we describe the research questions and the experiments carried out to address them. In Section 5.1 we describe the research questions and the subjects of study. In Section 5.2 we describe the preliminary study in detail and in Section 5.3 we present the main study.

5.1. Study design

In this section we outline the research questions of the present study and describe the benchmark programs that are the subjects of study.

5.1.1.
Research questions

In Section 2 it was noted that the possibility of selecting target branches in a specific sequence, and augmenting the metaheuristic process so that from the step (generation) at which a target is selected to the generation in which it is covered the current population has at least one individual that encodes inputs for a path that includes the sibling branch of the target branch, has not been explored. Ensuring this, we had hypothesized, should result in better performance of the test data generation process. To this end, in Section 4, it was proposed that the following features be incorporated: branch ordering, elitism and memory. In Section 4.2 it was argued that crossover and mutation are disruptive and may result in a child population for which the predicate node of the target branch is no longer reached, i.e., the execution path for every individual misses the predicate node with the target branch. It may thus be necessary to preserve the fit individuals from one generation to the next with the help of elitism. Thus, instead of investigating the effect of each of elitism, memory and branch ordering individually on test data generation performance, the research questions address the effects of combining features. The reason for this is further grounded in the preliminary study, which is discussed in detail in Section 5.2.3. The research questions for the present study are as follows:

Research Question One: Does elitism together with memory improve test data generation performance?
Research Question Two: Does branch ordering together with elitism and memory improve test data generation performance?
Research Question Three: Does the choice of a particular branch ordering scheme affect test data generation performance?
5.1.2. Subjects of study

Standard benchmark programs were chosen as the subjects of study. These have been taken from Díaz et al. (2008) and Blanco et al. (2009). The programs have a number of features, such as real inputs, equality conditions with the AND operator and deeply nested predicates, that make them suitable for testing different approaches to test data generation.

• Line in a Rectangle Problem (Rectangle): This program takes eight real inputs, four of which represent the coordinates of a rectangle and the other four the coordinates of a line. The program determines the position of the line with respect to the rectangle and generates one out of four possible outputs: (A) the line is completely inside the rectangle; (B) the line is completely outside the rectangle; (C) the line is partially covered by the rectangle; and (D) error: the input values do not define a line and/or a rectangle. The maximum nesting level is 12. The program's CFG has 54 nodes with 18 predicate nodes.
• Number of Days between Two Dates Problem (Date): This program calculates the number of days between two given dates of the current century. It takes six integer inputs, three of which represent the first date (day, month and year) and the other three the second date (day, month and year). It has 128 nodes. This program includes a number of branches with equality conditions. Some of them use the remainder operator (%), which adds discontinuity to the decision domains and therefore poses a greater difficulty in finding the tests that cover those branches. The nesting level is very high for most of the branches and, in combination with the AND decisions, the equality conditions and the use of the remainder operator, makes this program very appropriate for evaluating the effectiveness and efficiency of an automatic test generator for the branch coverage criterion. The CFG has 43 predicate nodes.
• Calday: This routine returns the Julian day number.
There are three integer inputs to the program: the first represents the month, the second the day and the third the year. Its CFG has 27 nodes with 11 predicate nodes. It has equality conditions and uses the remainder operator. The maximum nesting level is 8.
• Complex Branch: It accepts six short integer inputs. This routine has some complex predicate conditions with relational operators combined with complex AND and OR conditions; it also contains while loops and a switch-case statement. Its CFG contains 30 nodes.
• Meyer's Triangle Classifier Problem: This program classifies a triangle on the basis of its input sides as a non-triangle or a triangle, i.e., isosceles, equilateral or scalene. It takes three real inputs, all of which represent the sides of the triangle. Its CFG has 14 nodes with 6 predicate nodes. The maximum nesting level is 5. It has equality conditions with the AND operator, which make the branches difficult to cover.
• Sthamer's Triangle Classifier Problem: This program also classifies a triangle on the basis of its input sides as a non-triangle or a triangle that is isosceles, equilateral, right-angled or scalene. It takes three real inputs, all of which represent the sides of the triangle, but with different predicate conditions. Its CFG has 29 nodes with 13 predicate nodes. The maximum nesting level is 12. It has equality conditions with the AND operator and complex relational operators.
• Wegener's Triangle Classifier Problem: This program also classifies a triangle on the basis of its input sides as a non-triangle or a triangle that is isosceles, equilateral, orthogonal or obtuse-angled. It takes three real inputs, all of which represent the sides of the triangle, but with different predicate conditions. Its CFG has 32 nodes with 13 predicate nodes. The maximum nesting level is 9.
• Michael's Triangle Classifier Problem: This program also classifies a triangle, on the basis of its input sides, as a non-triangle or as a triangle that is isosceles, equilateral or scalene. It takes three real inputs representing the sides of the triangle, but with different predicate conditions. Its CFG has 26 nodes with 11 predicate nodes. The maximum nesting level is 6.
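The equality conditions combined with the AND operator that these classifiers contain are what make some branches hard to reach by undirected search: for random real inputs, the probability of two sides being exactly equal is essentially zero. As an illustration only (the benchmarks themselves were implemented in C and differ in detail), a Meyer-style classifier might look like the following sketch:

```python
def classify_triangle(a: float, b: float, c: float) -> str:
    """Toy Meyer-style triangle classifier (illustrative, not the benchmark source)."""
    # Non-triangle branches: each side must be positive and shorter
    # than the sum of the other two.
    if a <= 0 or b <= 0 or c <= 0:
        return "not a triangle"
    if a + b <= c or b + c <= a or a + c <= b:
        return "not a triangle"
    # Equality conditions joined by AND: random real inputs almost
    # never satisfy a == b and b == c, so this branch needs guidance.
    if a == b and b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"
```

A search-based generator must minimize distances such as |a − b| to steer inputs toward these equality branches.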
5.2. Tuning the GA A preliminary study was carried out to • Determine appropriate combinations of parameters and operators for the genetic algorithm. • Study the effect of elitism, memory and branch ordering so as to help frame the research questions. • Mitigate the effects of random variation and to emphasize the need for memory and elitism. 5.2.1. Preliminary study: design Data was collected for combinations of the following settings: 1. Pilot benchmark program – Meyer's triangle classification program. The program takes three real inputs, has equality conditions, compound conditions and a maximum nesting level of 5, all of which make it suitable for the preliminary study. 2. Population size – 6, 10, 16, 20, 26, 30, 36, . . ., 110. 3. Crossover type – Two point crossover and uniform crossover. 4. Crossover probability (Pc) – 1.0. 5. Mutation probability (Pm) – 0.01. Following experiments, as described later, the value for Pm was taken as 0.01. 6. Selection method – Roulette wheel selection (RW), tournament selection (TS) and pool tournament (PT). With roulette wheel selection, sigma truncation was used as the fitness scaling scheme as this helps circumvent the problem of negatively scaled fitness in later stages of the GA run. Experiments were carried out to determine an appropriate tournament size for TS; accordingly, the final tournament size was taken to be two, i.e., binary tournament selection. In pool tournament, first, for each individual, two individuals are randomly selected, with replacement, to form a pool of three individuals from which the best two are selected. This results in a population of double the size of the original population. On this population, a tournament of size two without replacement is carried out, which reduces the population by half. 7. Branch ordering scheme – Depth first search (DFS). However, for two experiments, the branch ordering was determined as follows.
First, the nodes in the CFG were numbered, beginning with the start node, in an arbitrary order that interleaves a depth first movement with a breadth first movement, again arbitrarily. This numbering was then used to sequence the branches using the number given to the head node of each branch. This scheme is simply referred to as the arbitrary branch ordering scheme (ABO). In ABO, if a branch is covered before it is selected as the target branch, it is marked as covered and never reselected as a target branch. 8. Adequacy criterion – Branch coverage. 9. Fitness function – Normalized branch distance (NBD) and the function described in Section 3.2 which takes the approximation level into account (AL&NBD). 10. Population initialization – Initialize with every new target (IENT) and initialize once at the beginning of the GA run (NIENT). 11. Population replacement strategy – Replace the entire parent population. Use elitism in which up to ten percent of the fittest population is carried forward to the next generation. 12. Maximum number of generations – 10^7. 13. Length of chromosome (binary string) – 63. The exact combinations of parameters, operator settings and additional features for which experiments were finally carried out were as follows:
A. IENT, NBD, ABO, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. B. NIENT, NBD, ABO, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. C. NIENT, NBD, DFS, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. D. NIENT, NBD, DFS with memory, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. E. NIENT, AL&NBD, DFS with memory, roulette wheel selection, elitism, two point crossover, Pm: 0.01, Pc: 1.0. F. NIENT, AL&NBD, DFS with memory, binary tournament, elitism, two point crossover, Pm: 0.01, Pc: 1.0. G. NIENT, AL&NBD, DFS with memory, binary tournament, elitism, uniform crossover, Pm: 0.01, Pc: 1.0. H. Within (F), different tournament sizes. I. Within (F), different mutation probabilities. J. (F) with fitness computation as in McMinn (2004). In order to ensure that our strategy for fitness computation is no worse than the minimization strategy of McMinn et al. (2008), we experiment with the settings in (F) but with fitness computed as described in McMinn (2004). K. (F) without the approximation level in the fitness computation process. Since DFS, memory and elitism should ensure that there are individuals in the current population passing through the sibling branch of the selected target, it should be possible to eliminate the use of the approximation level in fitness computation for those individuals that lead to the traversal of the sibling branch of the target. However, it should be noted that the approximation level would be required for those individuals in the current population that take some critical branch away from the target branch's predicate node. As can be seen, the experimental combinations above include all the settings in a comparative way. (A) and (B) compare IENT and NIENT, and NIENT is selected for further experiments. In (C) DFS is included, in (D) memory is included and in (E) elitism is taken into account and the fitness function is also modified to include the approximation level.
Considering DFS with memory and elitism, in (F) the selection scheme is changed to tournament selection and in (G) the crossover scheme is changed to uniform crossover. (H) compares different tournament sizes and (I) different mutation probabilities with the settings of experiment (F). With (H) we build a case for selecting the tournament size as two, and with (I) for selecting Pm as 0.01. In an attempt to mitigate the effects of random variation and to emphasize the need for memory and elitism, the following experiments were further carried out: L. (F) without memory, i.e., NIENT, AL&NBD, DFS, binary tournament, elitism, two point crossover, Pm: 0.01, Pc: 1.0. M. (F) without elitism, i.e., NIENT, AL&NBD, DFS with memory, binary tournament, two point crossover, Pm: 0.01, Pc: 1.0. N. (F) without memory and without elitism. O. (F) without memory, without elitism, and with all branches selected as targets one by one. Further, a steady state GA was also considered. P. Steady state GA: Ninety percent of the fittest individuals in the current population popcur are copied to the next generation. The remaining ten percent of the next generation population is constructed as follows. A pool of 25% of the fittest individuals from
Fig. 6. Plots of population size vs mean number of generations for (A) to (K).

Table 2
Summary of Tukey test for mean number of generations for (A) to (K).
Combination  Summary of Tukey test
D            Significantly different from A and C for population size 10, from A and B for population size 20, and from A, B and C for all remaining population sizes
E            Significantly different from A, B and C for all population sizes
F            Significantly different from A, B and C for all population sizes
popcur is selected, crossed over and mutated, and out of these the fittest individuals are selected to make up the ten percent of the next generation population. In the steady state GA, memory and elitism are not used. The remaining settings are as in (F). 5.2.2. Preliminary study: setup For each combination (A) to (P) and for each population size, one hundred experiments were carried out and the following statistics were collected: • Mean number of generations. The maximum number of generations for the experiments was taken to be 10^7. Accordingly, the termination criterion for each experiment is either full branch coverage or 10^7 generations, whichever occurs earlier. It is possible that full branch coverage is not achieved even after 10^7 generations. The mean number of generations thus does not tell us whether full branch coverage is achieved; hence the second statistic. • Mean percentage coverage achieved. An ANOVA test was carried out using SYSTAT 9.0 to determine if there is a significant difference among the combinations for each population size. Additionally, the Tukey test was carried out to compare pairs of means for (A) to (F) for significance. The results of the experiments are described in the next section. All the programs for the preliminary study and the main experiments were implemented in 'C'. 5.2.3. Preliminary study: results and discussion Fig. 6 shows the mean number of generations for combinations (A) to (G) and (K). Fig. 7 shows the mean percentage coverage for different population sizes and Table 2 summarizes the results of the
Tukey test for mean number of generations. In Fig. 6 we can see that the best results are obtained for (F), and Fig. 7 shows that full (100%) coverage is achieved for all experiments and for all population sizes for (F). The Tukey test (Table 2) and Fig. 6 show that the mean number of generations obtained for (F) is significantly lower than for (A), (B) and (C). The difference between (E) and (F) is in the selection scheme and between (F) and (G) in the crossover scheme: (F) uses binary tournament selection and (E) roulette wheel selection, whereas in (G) uniform crossover is used as opposed to two point crossover in (F). This shows that binary tournament, memory, elitism and two point crossover together improve the performance of a GA. Performance degrades, though not significantly as shown by the Tukey test, as two point crossover is replaced with uniform crossover. This could be because uniform crossover is more disruptive. Furthermore, performance also degrades, though not significantly as shown by the Tukey test, as tournament selection is replaced with roulette wheel selection keeping the other settings the same. This could be attributed to the weaker selection pressure of roulette wheel selection. In the case of (A) to (D) the mean coverage is less than 100% for lower populations. This changes to 100% with (E), which includes elitism. With elitism, individuals that traverse the sibling branch are preserved and not lost to crossover and mutation. It was argued that since DFS, memory and elitism should ensure that there are individuals in the current population passing through the sibling branch of the selected target, it should be possible to eliminate the use of the approximation level in fitness computation for those individuals that lead to the traversal of the sibling branch of the target.
With (K) the approximation level component was eliminated completely, even though it is understood that it would be required for those individuals in the current population that take some critical branch away from the target branch's predicate node. From Figs. 6 and 7 it is observable that the performance of (K) is comparable to that of (F) and that there is no significant difference in means for different population sizes, as can be seen in Table 3. We, however, retain the approximation level component for fitness computation in our main experiments.
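The AL&NBD fitness used from (E) onward combines the approximation level with a normalized branch distance (Section 3.2). The following sketch shows only the general shape of such a function; the normalization constant (1.001) and the distance rules are common choices from the search-based testing literature, not necessarily the paper's exact definitions:

```python
def normalized_branch_distance(d: float) -> float:
    # Map a raw branch distance d >= 0 into [0, 1); 1 - 1.001**(-d)
    # is one commonly used normalization (assumption, see lead-in).
    return 1.0 - 1.001 ** (-d)

def fitness(approach_level: int, branch_distance: float) -> float:
    # Lower is better. The approach level counts the critical branching
    # nodes between the point where execution diverged and the target;
    # the normalized branch distance breaks ties at the divergence node.
    return approach_level + normalized_branch_distance(branch_distance)
```

For example, for a predicate `x == y` the raw branch distance would typically be |x − y|, so inputs that nearly satisfy the equality receive a fitness close to the approach level alone.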
Fig. 7. Plots of population size vs mean percentage coverage for (A) to (K).

Fig. 8. Plots of population size vs mean number of generations for (F) to (P).
Table 3
Results of ANOVA for mean number of generations for (F) and (K). Benchmark program: preliminary study.
Population size:  6      10     16     20     26     30     46     60     76     90     100    110
F value:          0.06   3.533  0.385  0.313  0.565  7.172  0.896  0.821  0.744  0.698  2.618  0.126
p value:          0.807  0.062  0.535  0.576  0.453  0.008  0.345  0.366  0.389  0.404  0.107  0.723
Fig. 8 presents the mean number of generations for combinations (F) to (P), Fig. 9 shows the mean percentage coverage for different population sizes, Table 4 shows the results of ANOVA and Table 5 summarizes the results of the Tukey test for mean number of generations. It can be seen in Fig. 8 that the best results are obtained for (F), which includes branch ordering, memory and elitism. Without memory (L), the performance degrades, though not significantly as indicated by the Tukey test. However, without elitism, the performance degrades significantly, as indicated by both the ANOVA and Tukey tests. The Tukey test indicates that (F) is significantly different from (M), (N) and (O). Full coverage is achieved with (F) for all population sizes in all experiments. For (N), (O) and (P) this is not achieved for any population size. The poor performance of (P) can be attributed to the fact that much of the population does not undergo change and the small percentage of individuals that are replaced may be less fit than the parents. Experiments (F), (L), (M), (N) and (O) have helped frame the research questions. (F) includes branch ordering, elitism and memory. On removing memory, i.e., case (L), performance does not degrade significantly in terms of mean generations and mean percentage coverage. However, on removing elitism, case (M), and on removing both memory and elitism, case (N), performance degrades significantly in comparison with (F). As observed in Section 4.2, crossover and mutation may lead to loss of individuals from the parent population whose execution paths cover the predicate node of the target branch, and it may thus be necessary to preserve the fit individuals from one generation to the next. This may also be a reason why memory alone may not help in improving performance and, further, branch ordering alone may also not improve performance significantly. The latter is also apparent when (N) is compared with (F) and (L). The research questions outlined in Section 5.1.1 thus consider increments in features (branch ordering, elitism and memory), as opposed to each feature individually. Further, Research Question Three in Section 5.1.1 considers the role of different branch ordering schemes. The Appendix details the outcome of the preliminary study on the following: (i) a comparison of (F) and (J), (ii) a comparison of different tournament sizes, i.e., (H), and (iii) selection of an appropriate value for the mutation probability Pm.

Fig. 9. Graph of population size vs mean percentage coverage for (F) to (P).

5.3. Study results
In this section we present the results of our main experiments on the benchmark programs. Section 5.3.1 describes the experimental approach; Sections 5.3.2, 5.3.3 and 5.3.4 present the results and discussion for research questions one, two and three respectively. 5.3.1. Approach In order to answer the research questions listed in Section 5.1.1, the following approach has been taken: i. The scheme in which branches are selected randomly for coverage and there is no memory and no elitism is taken as the baseline for comparison. This scheme is referred to as RAN hereafter. ii. To answer Research Question One, RAN is compared with strategy RNS. RNS includes elitism and memory but branches are selected randomly. iii. To answer Research Question Two, elitism and memory are combined with each branch ordering strategy and compared with RNS and RAN. The resulting schemes are named after the branch ordering technique. These are: a. DFS – considers Depth First Strategy together with memory and elitism
Table 4
Results of ANOVA for mean number of generations for (F) to (P). Benchmark program: preliminary study. ANOVA test (F, L, M, N, O, P).
Population size:  6       10      16      20      26    30     46     60     76     90    100    110
F value:          586.91  312.04  166.89  113.75  85.7  77.67  55.14  44.68  36.86  33.8  36.01  36.34
p value:          0.0     0.0     0.0     0.0     0.0   0.0    0.0    0.0    0.0    0.0   0.0    0.0
Table 5
Summary of Tukey test for mean number of generations for (F) to (P).
Combination  Summary of Tukey test
F            Significantly different from M, N, O and P for all population sizes
L            Significantly different from M, N, O and P for all population sizes
P            Significantly different from F, L, M, N and O for all population sizes
Table 6
GA parameter and operator settings.
1.  Population size                  6, 10, 16, 20, 26, . . ., 110
2.  Crossover type                   Two point crossover
3.  Crossover probability            1.0
4.  Mutation probability             0.01
5.  Selection method                 Binary tournament
6.  Branch ordering scheme           DFS, BFS, PPS, RNS, RAN
7.  Fitness function                 AL&NBD, approximation level with normalized branch distance
8.  Population initialization        Initialize once at the beginning of the GA run
9.  Population replacement strategy  Elitism with up to 10% carry forward
10. Maximum number of generations    10^7
11. Memory                           Yes
b. BFS – considers Breadth First Strategy together with memory and elitism c. PPS – considers Path Prefix Strategy together with memory and elitism iv. To answer Research Question Three, schemes DFS, BFS and PPS are compared.
The parameter and operator settings for the GA were obtained from the preliminary study described in Section 5.2 and are detailed in Table 6. For each scheme and population size, one hundred experiments (called runs hereafter) were carried out and the following statistics were collected, as in the case of the preliminary study: • Mean number of generations. It may be noted that the termination criterion for each experiment is either full branch coverage or 10^7 generations, whichever occurs earlier. The number of generations to termination over the hundred runs is used to compute the mean. The mean does not tell us if all the branches were covered. • Mean percentage coverage achieved. ANOVA was carried out using SYSTAT 9.0 to determine if there is a significant difference in means and, additionally, the Tukey test was carried out to compare pairs of mean numbers of generations for significance. It may be noted that in all the runs, a branch is marked as covered the first time it is traversed, even if it is not the target. Branches covered in some previous generation are not selected as targets subsequently.
5.3.2. Research Question One Research Question One: Does elitism together with memory improve test data generation performance? The study compares RAN and RNS. 5.3.2.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. Each figure presents the following: A. Mean number of generations for each population size. B. Mean percentage coverage achieved. This is omitted if full branch coverage is achieved in all runs. C. Tables showing the results of ANOVA: F and p values for significance in the difference of means for different branch orderings. An alpha (α) value of 0.05, i.e., 95% confidence, is considered for all discussions on significance in the difference of means. It may be noted that Figs. 10–17 present the results for all the research questions and will be referred to in all the subsequent sections. Table 7 presents the results of the Tukey test. The column of interest in Table 7 is RNS.
5.3.2.2. Discussion. We can see from Table 7 that the results of RNS are not always significant. However, as can be observed in the graphs in Figs. 11–17, the results are improved over RAN both in terms of mean number of generations and mean percentage coverage. The observed results may be explained as follows. With both RAN and RNS it is possible that when the target branch is selected, no individual is present in the population for which the execution path covers the predicate node of the target branch. Further, with RAN, crossover and mutation may lead to loss of individuals from the parent population whose execution path does cover the predicate node of the target branch. This is circumvented in RNS.
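The interplay of elitism and memory described above can be sketched as one generational step; the function and data-structure names below are ours and purely illustrative, not the authors' implementation:

```python
def next_generation(pop, fitness, breed, memory, target, elite_frac=0.10):
    """One GA step with elitism and a branch-indexed memory (illustrative).

    pop      : list of individuals (encoded test inputs)
    fitness  : individual -> cost for `target` (lower is better)
    breed    : pool -> offspring via selection/crossover/mutation
    memory   : dict mapping a branch to individuals that reached its
               predicate node in earlier generations
    """
    # Seed from memory: if the target's predicate node was reached
    # before, reinsert those individuals so search does not restart
    # from scratch when the target changes.
    seeds = memory.get(target, [])
    pool = pop + seeds

    # Elitism: carry up to elite_frac of the fittest forward unchanged,
    # so crossover/mutation cannot lose them (cf. up to 10% in Table 6).
    pool.sort(key=fitness)
    n_elite = max(1, int(elite_frac * len(pop)))
    elites = pool[:n_elite]

    # Fill the rest of the fixed-size population with offspring.
    offspring = breed(pool)[: len(pop) - n_elite]
    return elites + offspring
```

With RNS both mechanisms are active while targets are still chosen at random, which is why an individual reaching the target's predicate node, once found, is never lost.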
Fig. 10. Complex branch program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10     20     30     46     60     76     90     100    110
DFS, BFS, PPS, RNS – F:      62.21   28.54  17.24  13.31  12.56  10.67  14.51  9.323  7.022  7.387
DFS, BFS, PPS, RNS – p:      0.0     0.0    0.0    0.0    0.0    0.001  0.003  0.003  0.009  0.007
DFS, BFS, PPS, RNS, RAN – F: 32.526  14.05  8.562  6.2    5.697  7.209  6.322  5.371  7.137  –
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0    0.0    0.002  0.004  0.001  0.002  0.005  0.001  –
Fig. 11. Date program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6        10       20        30       46        60         76       90       100      110
DFS, BFS, PPS, RNS – F:      25.39    20.08    15.01     5.651    21.031    27.16      22.31    24.69    18.477   12.83
DFS, BFS, PPS, RNS – p:      0.0      0.0      0.0       0.004    0.0       0.0        0.0      0.0      0.0      0.0
DFS, BFS, PPS, RNS, RAN – F: 116.826  359.492  1107.523  252.791  2839.327  10948.314  2.9E+09  7.9E+09  5.6E+09  3.4E+09
DFS, BFS, PPS, RNS, RAN – p: 0.0      0.0      0.0       0.0      0.0       0.0        0.0      0.0      0.0      0.0
Fig. 12. Michael triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10      20      30      46       60      76      90      100     110
DFS, BFS, PPS, RNS – F:      143.5   48.51   19.25   6.533   1.84     2.397   2.197   3.18    4.46    2.351
DFS, BFS, PPS, RNS – p:      0.0     0.0     0.0     0.0     0.139    0.068   0.088   0.024   0.004   0.072
DFS, BFS, PPS, RNS, RAN – F: 432.41  446.81  721.16  1387.1  1567.43  3897.9  4265.4  2124.7  8155.8  11157.1
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0     0.0     0.0     0.0      0.0     0.0     0.0     0.0     0.0
Table 7
Summary of Tukey test: RAN vs (DFS, BFS, PPS and RNS). Is the difference in mean number of generations with RAN significant for all populations?
Benchmark program  DFS  BFS  PPS                                     RNS
Complex-Branch     No   No   Yes                                     No
Date               Yes  Yes  Yes                                     No
Michael Triangle   Yes  Yes  Yes                                     Yes
Myers Triangle     Yes  Yes  Yes                                     Yes
Calday             No   No   Yes, for population sizes from 6 to 46  No
Rectangle          Yes  Yes  Yes                                     Yes
Sthamer Triangle   Yes  Yes  Yes                                     Yes
Wegener Triangle   Yes  Yes  Yes                                     Yes
Fig. 13. Myers triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10     20     30      46      60      76      90      100     110
DFS, BFS, PPS, RNS – F:      6.504   6.148  13.12  10.39   10.60   8.166   4.657   3.302   1.335   1.28
DFS, BFS, PPS, RNS – p:      0.0     0.0    0.0    0.0     0.0     0.0     0.003   0.02    0.263   0.281
DFS, BFS, PPS, RNS, RAN – F: 1737.3  165.5  133.8  881.76  396.27  226.35  179.57  103.47  106.93  86.41
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0    0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
5.3.3. Research Question Two Research Question Two: Does branch ordering together with elitism and memory improve test data generation performance? The study compares each of RAN and RNS with each of BFS, DFS and PPS.
5.3.3.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. Table 7 compares RAN with each of BFS, DFS and PPS, and Table 8 compares RNS with each of DFS, BFS and PPS. Table 8 presents the results in two parts. Part (A) addresses the question "is the difference in mean number of generations with RNS significant for all populations?" and part (B) addresses the question "is the mean number of generations for (DFS, BFS, PPS) lower than RNS for all population sizes?" Even though the answer to part (A) for a particular benchmark may not be significant, we still need to see if there has been an improvement. This is taken care of in part (B).
5.3.3.2. Discussion. It can be seen from Tables 7 and 8 that
a. Results with PPS are significantly improved and different from RAN and RNS for almost all the subjects of study. b. There is an improvement in the performance in mean number of generations of DFS and BFS over RNS for most benchmark programs.
Fig. 14. Calday program: plot of population size vs GA mean number of generations for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6      10     20     30     46     60     76     90     100    110
DFS, BFS, PPS, RNS – F:      10.4   7.981  5.057  2.468  2.486  1.142  1.708  0.612  0.685  0.695
DFS, BFS, PPS, RNS – p:      0.0    0.0    0.002  0.062  0.06   0.332  0.165  0.607  0.562  0.555
DFS, BFS, PPS, RNS, RAN – F: 6.609  22.14  7.375  3.19   3.673  1.266  0.976  1.007  1.328  1.159
DFS, BFS, PPS, RNS, RAN – p: 0.0    0.0    0.0    0.013  0.006  0.299  0.42   0.403  0.258  0.328
Fig. 15. Rectangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6        10      20      30      46      60      76       90      100      110
DFS, BFS, PPS, RNS – F:      5.03     4.569   5.448   5.595   5.789   1.921   5.154    3.08    9.484    0.698
DFS, BFS, PPS, RNS – p:      0.002    0.004   0.001   0.001   0.001   0.126   0.002    0.027   0.0      0.554
DFS, BFS, PPS, RNS, RAN – F: 195.424  71.192  53.283  62.578  65.924  161.71  168.618  99.517  105.366  215.6
DFS, BFS, PPS, RNS, RAN – p: 0.0      0.0     0.0     0.0     0.0     0.0     0.0      0.0     0.0      0.0
c. The difference in mean number of generations for DFS, BFS and RNS only is not always significant, as is evident in Figs. 10–17 and Table 8. This is more so as the population size increases. A possible explanation for this is provided in Section 5.3.4.2. d. For all benchmark programs and population sizes, the results of RAN are extremely poor, especially in terms of mean number of generations. For most benchmark programs, with the exception of Complex-branch and Calday, the difference is also significant with each of DFS, BFS and PPS. This can be seen in Table 7. With PPS the difference is significant for all programs, except Calday, and for all population sizes. For Calday, the difference in means between RAN and PPS is significant for small population sizes. This indicates that the strategies of elitism and memory together with branch ordering may have significantly contributed to improving GA performance.
5.3.4. Research Question Three Research Question Three: Does the choice of a particular branch ordering scheme affect test data generation performance? The study compares PPS with each of DFS and BFS.
5.3.4.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. The results of the Tukey test are summarized in Table 9.

Fig. 16. Sthamer triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10       20       30       46       60       76      90      100     110
DFS, BFS, PPS, RNS – F:      20.74   12.88    7.745    15.75    2.479    3.913    3.661   1.276   3.018   1.359
DFS, BFS, PPS, RNS – p:      0.0     0.0      0.0      0.0      0.061    0.009    0.013   0.282   0.03    0.255
DFS, BFS, PPS, RNS, RAN – F: 85.468  124.953  114.495  122.949  137.851  143.799  95.223  79.395  94.922  78.036
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0      0.0      0.0      0.0      0.0      0.0     0.0     0.0     0.0
Fig. 17. Wegener triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10      20      30      46     60     76      90      110
DFS, BFS, PPS, RNS – F:      43.836  9.798   11.636  5.242   3.816  0.773  1.203   0.546   3.118
DFS, BFS, PPS, RNS – p:      0.0     0.002   0.001   0.023   0.052  0.38   0.274   0.461   0.079
DFS, BFS, PPS, RNS, RAN – F: 91.754  27.465  16.619  18.248  9.124  12.54  17.766  27.099  22.684
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0     0.0     0.0     0.0    0.0    0.0     0.0     0.0
5.3.4.2. Discussion. It can be seen in Table 9 that PPS gives the best results, though they are not always significant when compared with DFS and BFS. Further, considering Figs. 10–17, it can be observed that a. The mean number of generations with PPS is the least for almost all population sizes and for all benchmark programs. Possible reasons are discussed in point (b) below. For α = 0.05, a significant difference is observable with smaller populations, up to 30, for most programs. Hence, it may possibly be concluded that larger population sizes may prove to be beneficial in achieving better performance.
b. For small populations, in particular with sizes 6, 10, 16 and 20, full branch coverage is not achieved for most of the benchmark programs with DFS, BFS and RNS. This is also the reason for the very high mean number of generations for these population sizes, as the GA terminates only after 10^7 generations when branch coverage is not achieved. However, full coverage is achieved with PPS for all population sizes for all benchmark programs, with the exception of the rectangle program and of population size 6 for the Sthamer triangle. In PPS, a new branch is usually chosen as a target taking into consideration prefixes of paths traversed in the current generation. As a result, individuals are usually present in the
Table 8
Summary of Tukey test: RNS vs (DFS, BFS, PPS). (A) Is the difference in mean number of generations with RNS significant for all populations? (B) Is the mean number of generations for (DFS, BFS, PPS) lower than RNS for all population sizes?
Complex-Branch
  A: DFS – No; BFS – No; PPS – Yes
  B: DFS – Yes, from population size 50–80, 106–110; BFS – Yes, from population size 10–90, 110; PPS – Yes
Date
  A: DFS – Yes; BFS – Yes; PPS – Yes
  B: DFS – Yes; BFS – Yes; PPS – Yes
Michael Triangle
  A: DFS – Yes, from population size 6–36; BFS – Yes, from population size 6–36; PPS – Yes, from population size 6–36
  B: DFS – Yes, from population sizes 6–40, 50–100, 110; BFS – Yes, from population sizes 6–36, 50–60, 70–96, 106–110; PPS – Yes
Myers Triangle
  A: DFS – No; BFS – No; PPS – Yes, for population size 6
  B: DFS – No; BFS – Yes, for population sizes 6, 10, 106; PPS – Yes, from population sizes 6–96, 106–110
Calday
  A: DFS – No; BFS – No; PPS – No
  B: DFS – Yes, from population sizes 10, 20–30, 40–56, 66–76, 90, 106; BFS – Yes, from population sizes 6–10, 20, 36, 46–60, 70–76, 86, 100–106; PPS – Yes, from population sizes 6–90, 106–110
Rectangle
  A: DFS – No; BFS – No; PPS – Yes, for population sizes from 6–46
  B: DFS – Yes, from population sizes 6, 10–56, 66, 80–110; BFS – Yes, from population sizes 6–80, 90–110; PPS – Yes
Sthamer Triangle
  A: DFS – No; BFS – No; PPS – No
  B: DFS – Yes, for population sizes 80, 96, 100, 110; BFS – Yes, from population sizes 30, 56–110; PPS – Yes
Wegener Triangle
  A: DFS – No; BFS – No; PPS – Yes
  B: DFS – Yes, from population sizes 60–110; BFS – Yes, from population sizes 50, 60–110; PPS – Yes
Table 9. Summary of Tukey test: PPS vs (DFS, BFS). For each benchmark program: (A) Is the difference in mean number of generations with PPS significant for all populations? (B) Is the mean number of generations for PPS lower for all population sizes?

Complex-Branch
  DFS: A: Yes; B: Yes
  BFS: A: Yes; B: Yes
Date
  DFS: A: Yes; B: Yes
  BFS: A: Yes, for population size up to 30 for DFS and population size up to 50 for BFS, and for population sizes from 80–110; B: Yes
Michael Triangle
  DFS: A: No; B: Yes, for population sizes 16, 26–110
  BFS: A: No; B: Yes, from population sizes 10–110
Myers Triangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes, for population sizes 6–100, 110
Calday
  DFS: A: Yes, for population size 6; B: Yes, from population sizes 6–106
  BFS: A: Yes, for population size 6; B: Yes, from population sizes 6–90, 100–110
Rectangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes
Sthamer Triangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes, from population sizes 6–86, 96, 110
Wegener Triangle
  DFS: A: Yes, for population size up to 56; B: Yes, from population sizes 6–80, 90–106
  BFS: A: Yes, for population size up to 56; B: Yes, from population sizes 6–56, 66–80, 90–96
code that leads to the traversal of the sibling branch of the target branch. With DFS and BFS this may not always be the case. Suppose the DFS (or BFS) branch sequence is b1, b2, ..., bn and, after target bi is covered, branches bi+1 to bj−1 are skipped because they have already been covered in previous generations, so that bj is selected as the next target. Then individuals that lead to the traversal of the sibling branch of bj may not be present in the population. These may have to be added from memory and, if they are not available, guidance through the fitness function may have to be obtained to cover bj. This could be a reason for the improved results with PPS. As expected, the mean number of generations is larger for smaller populations than for larger populations. Further observations specific to each benchmark are summarized in Tables 7 and 10. The chosen alpha (α) value is 0.05. From these observations it may be concluded that PPS is possibly the best scheme to adopt.

6. Threats to validity and limitations

Threats to validity have been described in detail by Barros and Neto (2011) in the context of search-based software engineering. We first consider threats to internal validity. As noted by Deb (1997), the main limitation of GA comes from the improper choice
of parameters such as population size, crossover and mutation probabilities, and selection pressure. The GA may not work with improper parameter settings. Arcuri and Fraser (2011) point out that "tuning does have a critical impact on algorithmic performance, and over-fitting of parameter tuning is a dire threat to external validity of empirical analyses in SBSE." In an attempt to resolve these problems, we have considered parameter and operator tuning in a comprehensive preliminary study and have also conducted experiments for different population sizes. In both the preliminary and the main study, a hundred experiments were carried out for each combination of parameter and feature settings and population size. Other issues, such as representation and the randomness of the numbers produced by the random number generator, may also affect performance. Representation has been discussed in Section 3.2, and the 'C' library random number generator functions srand() and rand() have been used in the experiments. Threats to construct validity may arise from the fact that we have measured performance using only the mean number of generations and the mean percentage coverage. Other measures, such as the number of fitness evaluations, may also be used; whether they lend a more precise meaning to performance needs to be evaluated. Threats to external validity may come from the choice of the subjects of study. Standard benchmark programs from Díaz et al. (2008) and Blanco et al. (2009) were chosen as the subjects of study.
Table 10. Comparison of results for each benchmark.

Complex-Branch program (Fig. 10)
  Mean generations: PPS is significantly lower for all population sizes.
  Mean percentage coverage: Full coverage is achieved even with small population sizes in PPS.
Date program (Fig. 11)
  Mean generations: There is a significant difference in means between RNS, BFS, DFS, and PPS. PPS is lowest.
  Mean percentage coverage: Full coverage is achieved even with small population sizes in PPS.
Michael Triangle program (Fig. 12)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for population sizes greater than 46.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
Myers Triangle program (Fig. 13)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for population sizes greater than 86.
  Mean percentage coverage: Except for RAN, full coverage is achieved for all branch orderings and all population sizes.
Calday program (Fig. 14)
  Mean generations: PPS is lower for small population sizes.
  Mean percentage coverage: Full coverage is achieved for all branch orderings.
Rectangle program (Fig. 15)
  Mean generations: PPS is lower for all population sizes less than 106.
  Mean percentage coverage: PPS gives comparatively better coverage for small populations (6 to 16) and full coverage thereafter.
Sthamer Triangle program (Fig. 16)
  Mean generations: PPS is lower for some population sizes. For large population sizes, the results are comparable to BFS.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
Wegener Triangle program (Fig. 17)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for all population sizes.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
7. Conclusion

Metaheuristic techniques, and in particular the genetic algorithm, have now been applied extensively to the problem of automated test data generation. However, as research points out, the application of metaheuristic techniques poses new challenges. One of the problems faced is that the population may not contain any individual that encodes test data for which the execution path reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) have introduced three approaches for ordering branches for selection as targets for coverage, namely, the depth first strategy, the breadth first strategy and the path prefix strategy, and (b) have considered elitism and memory along with branch ordering to improve test data generation performance. Extensive experiments have been carried out on standard benchmark programs with a genetic algorithm (GA) whose implementation has also been described in detail in this paper. A preliminary study was carried out to frame the research questions and fine-tune the GA parameters, which were then used in the final experiments. Results indicate that
• a scheme in which the population is initialized once at the beginning of the GA run, the fitness is computed using approximation level and normalized branch distance and is maximized, the branch ordering strategy is the path prefix strategy, memory and elitism are used, and the genetic algorithm uses binary tournament selection with two-point crossover, Pm = 0.01 and Pc = 1.0, gives the best performance in terms of mean number of generations and coverage.
• results for RAN indicate that the strategies of elitism, memory and branch ordering may have contributed significantly to improving the performance of the GA.
• better performance may be possible for programs with integer inputs, as compared to real inputs, and with large population sizes.
Further, experiments with real-world programs in large projects would provide additional empirical evidence on the potential of the suggested improvements.
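The fitness formulation named in the conclusion, approximation (approach) level plus normalized branch distance, can be sketched as follows. This is an illustrative minimization form of the widely used objective (the study itself maximizes an equivalent formulation); the normalization 1 − 1.001^(−d) follows Arcuri (2010), and all function and parameter names here are hypothetical, not taken from the authors' implementation.

```python
# Sketch of a branch-coverage fitness: approach level (number of control-
# dependent nodes at which execution diverged from the target) plus a branch
# distance normalized into [0, 1).  Minimized form: 0 means the branch was hit.

def branch_distance(lhs, rhs, op):
    """Distance-to-true of a relational predicate lhs <op> rhs (0 when true)."""
    k = 1.0  # failure constant added when the predicate is false
    if op == "==":
        return abs(lhs - rhs)
    if op == "<":
        return 0.0 if lhs < rhs else (lhs - rhs) + k
    if op == "<=":
        return 0.0 if lhs <= rhs else (lhs - rhs) + k
    raise ValueError("unsupported operator: " + op)

def normalize(d):
    """Map a branch distance d >= 0 into [0, 1) (Arcuri, 2010)."""
    return 1.0 - 1.001 ** (-d)

def fitness(approach_level, dist):
    """Combined objective; lower is better, 0 means target branch taken."""
    return approach_level + normalize(dist)
```

Because the normalized distance stays below 1, an individual that reaches a deeper control-dependent node always dominates one stuck at a shallower node, regardless of the raw distances.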
Appendix A.

The appendix details the outcomes of the preliminary study on the following: (i) a comparison of the maximization and minimization approaches to fitness function computation, i.e., cases (F) and (J) of Section 5.2.1, (ii) a comparison of different tournament sizes, i.e., case (H) of Section 5.2.1, and (iii) the selection of an appropriate setting for the mutation probability Pm. Fig. A.1 shows a comparison of mean generations between the maximization (F) and minimization (J) functions. As can be seen in Fig. A.1 and Table A.1, there is no significant difference in the mean number of generations over different population sizes (considering an alpha level of 0.05). We thus chose the scheme described in Section 3.2 for fitness computation in our experiments. Fig. A.2 presents the results of the experiments with different tournament sizes, i.e., (H). Even though the difference in means is not significant, it can be seen that the best results, even with small population sizes, are obtained with a tournament size of two. This size exerts the weakest selection pressure, which may be the reason for its better performance (Back et al., 1997). Although the difference in mean number of generations is not significant for large population sizes, taking the overall performance into account, tournament selection with tournament size two, i.e., binary tournament, is the preferred selection scheme.
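Tournament selection as compared above can be sketched in a few lines; a minimal illustration with hypothetical names, not the authors' implementation. With k = 2 (binary tournament) the fittest individual of the population is least likely to dominate every draw, which is why this setting exerts the weakest selection pressure.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k individuals uniformly at random and return the fittest.

    fitness[i] is the (to-be-maximized) fitness of population[i];
    k = 2 gives binary tournament, the weakest selection pressure."""
    contenders = [rng.randrange(len(population)) for _ in range(k)]
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]
```

A full GA generation would call this once per offspring slot before applying crossover and mutation to the selected parents.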
The programs have a number of features that make them suitable for testing different approaches to test data generation. However, they are not real-world programs, which may be a limitation, though whether such programs would have led to different conclusions is a matter for further investigation. The considerations in eliminating threats to validity in this study may well be its limitations also.

Fig. A.1. Plots of population size vs mean number of generations for (F) and (J).
Acknowledgements

This work was supported by the UGC Major Project Grant F.No. 36-70/2008 (SR), for which the authors are thankful. The authors are also extremely grateful to the anonymous referees who have helped shape this paper to its present form with their insightful comments and guidance.

(Series in Fig. A.2: tournament selection of sizes 2, 3 and 4, and pool-tournament.)
Fig. A.2. Plots of population size vs mean number of generations for different tournament sizes.
Table A.1. Results of ANOVA for the mean number of generations for (F) and (J) (benchmark: preliminary study program).

Population size    F value    p value
6                  0          0.983
10                 2.37       0.125
16                 0          0.986
20                 0.006      0.938
26                 0.58       0.447
30                 0.16       0.689
46                 0.12       0.725
60                 0.66       0.419
76                 0.15       0.701
90                 0.16       0.691
100                0.04       0.835
110                0.46       0.499
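The per-population-size F values in Table A.1 come from a one-way ANOVA over the (F) and (J) samples. The F statistic itself can be computed with a short self-contained sketch (the p value additionally requires the F-distribution CDF, e.g. from a statistics library, and is omitted here); this is illustrative and not the authors' actual tooling.

```python
def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, for two or more groups of observations."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # sum of squares between groups, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # sum of squared deviations of each value from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For two groups with identical means (as for several population sizes in Table A.1) the between-group sum of squares is zero, so F = 0.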
Fig. A.3. Plot of mutation rates vs mean number of generations for population size 30.
Experiments were also carried out to determine an appropriate setting for the mutation probability Pm. Fig. A.3 shows the effect of the mutation probability on the number of generations for population size 30. It can be seen that the minimum number of generations to achieve coverage is obtained with Pm = 0.01. Since higher mutation probabilities increasingly disrupt individuals, the number of generations increases as Pm increases.

References

Agarwal, K., Srivastava, G., 2010. Towards software test data generation using discrete quantum particle swarm optimization. In: Proceedings of the 3rd India Software Engineering Conference (ISEC'10), ACM, New York, NY, USA.
Ahmed, M.A., Hermadi, I., 2008. GA-based multiple paths test data generator. Computers & Operations Research 35 (10), 3107–3124.
Ali, S., Briand, L.C., Hemmati, H., Panesar-Walawege, R.K., 2010. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering 36 (6), 742–762.
Andreou, A.S., Economides, K.A., Sofokleous, A.A., 2007. An automatic software test-data generation scheme based on data flow criteria and genetic algorithms. In: Proceedings of the 7th IEEE International Conference on Computer and Information Technology, Fukushima, Japan (CIT '07), pp. 867–872.
Arcuri, A., Fraser, G., 2011. On parameter tuning in search based software engineering. In: International Symposium on Search Based Software Engineering (SSBSE).
Arcuri, A., Yao, X., 2007. A memetic algorithm for test data generation of object-oriented software. In: Congress on Evolutionary Computation (CEC).
Arcuri, A., 2010. It does matter how you normalize the branch distance in search based software testing. In: Third International Conference on Software Testing, Verification and Validation (ICST'2010), pp. 205–214.
Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), 1997. Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd., Bristol, UK.
Baresel, A., Sthamer, H., Schmidt, M., 2002. Fitness function design to improve evolutionary structural testing. In: Langdon, W.B., Cantú-Paz, E., Mathias, K., Roy, R., Davis, D., Poli, R., Balakrishnan, K., Honavar, V., Rudolph, G., Wegener, J., Bull, L., Potter, M.A., Schultz, A.C., Miller, J.F., Burke, E., Jonoska, N. (Eds.), Proceedings of the 2002 Conference on Genetic and Evolutionary Computation (GECCO'02). Morgan Kaufmann Publishers, New York, USA, pp. 1329–1336.
Barros, M., Neto, A.C.D., 2011. Threats to Validity in Search-based Software Engineering Empirical Studies. Relatórios Técnicos do DIA/UNIRIO, No. 0006/2011. Universidade Federal Do Estado Do Rio De Janeiro.
Beizer, B., 2002. Software Testing Techniques, 2nd ed. Dreamtech Press, New Delhi, India.
Bertolino, A., 2007. Software testing research: achievements, challenges, dreams. In: Future of Software Engineering (FOSE '07), pp. 85–103.
Blanco, R., Tuya, J., Díaz, B.A., 2009. Automated test data generation using a scatter search approach. Information and Software Technology 51 (4), 708–720.
Blanco, R., Tuya, J., Díaz, E., Díaz, B.A., 2007. A scatter search approach for automated branch coverage in software testing. International Journal of Engineering Intelligent Systems (EIS) 15 (3), 135–142.
Castro, L.N., Von Zuben, F.J., 2002. Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation 6 (3), 239–251.
Chen, Y., Zhong, Y., 2008. Automatic path oriented test data generation using a multi population genetic algorithm. In: Proceedings of the 4th International Conference on Natural Computation, Jinan, China (ICNC'08), pp. 565–570.
Chen, Y., Zhong, Y., Shi, T., Liu, J., 2009. Comparison of two fitness functions for GA-based path-oriented test data generation. In: Proceedings of the 2009 Fifth International Conference on Natural Computation, vol.
4, Washington, DC, USA (ICNC ‘09), pp. 177–181. Díaz, E., Tuya, J., Blanco, R., 2003. Automated software testing using a metaheuristic technique based on tabu search. In: Proceedings of the 18th IEEE International
Conference on Automated Software Engineering, Montreal, Canada (ASE'03), pp. 310–313.
Díaz, E., Tuya, J., Blanco, R., Dolado, J.J., 2008. A tabu search algorithm for structural software testing. Computers & Operations Research 35 (10), 3052–3072.
Deb, K., 1997. Limitations of evolutionary computation methods. In: Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd. and Oxford University Press, Bristol, UK, pp. B2.9:1–B2.9:2.
Ferguson, R., Korel, B., 1996. The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology 5 (1), 63–86.
Ghani, K., Clark, J., 2009. Automatic test data generation for multiple condition and MCDC coverage. In: Proceedings of the 2009 Fourth International Conference on Software Engineering Advances (ICSEA '09), Washington, DC, USA. IEEE Computer Society, pp. 152–157.
Girgis, M.R., 2005. Automatic test data generation for data flow testing using a genetic algorithm. Journal of Universal Computer Science 11 (6), 898–915.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Pearson Education, Delhi, India.
Gross, H., Kruse, P., Wegener, J., Vos, T., 2009. Evolutionary white-box software test with the EvoTest framework: a progress report. In: Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops (ICSTW '09). IEEE Computer Society, Washington, DC, USA, pp. 111–120.
Harman, M., 2007. The current state and future of search based software engineering. In: 2007 Future of Software Engineering (FOSE '07). IEEE Computer Society, Washington, DC, USA, pp. 342–357.
Harman, M., 2008. Testability transformation for search-based testing. In: Keynote of the 1st International Workshop on Search-Based Software Testing (SBST) in Conjunction with ICST 2008, Lillehammer, Norway.
Harman, M., Mansouri, A., 2010.
Search based software engineering: introduction to the special issue. IEEE Transactions on Software Engineering 36 (6), 737–741.
Harman, M., McMinn, P., 2010. A theoretical and empirical study of search-based testing: local, global, and hybrid search. IEEE Transactions on Software Engineering 36 (2), 226–247.
Harman, M., Mansouri, A., Zhang, Y., 2009. Search based software engineering: a comprehensive analysis and review of trends, techniques and applications. Technical Report TR-09-03, Department of Computer Science, King's College London.
Jones, B.F., Eyres, D.E., Sthamer, H.-H., 1998. A strategy for using genetic algorithms to automate branch and fault-based testing. Computer Journal 41 (2), 98–107.
Jones, B.F., Sthamer, H.-H., Eyres, D.E., 1996. Automatic structural testing using genetic algorithms. Software Engineering Journal 11 (5), 299–306.
Korel, B., 1990. Automated software test data generation. IEEE Transactions on Software Engineering 16 (8), 870–879.
Li, H., Lam, C.P., 2005. Software test data generation using ant colony optimization. In: Proceedings of World Academy of Science, Engineering and Technology.
Liaskos, K., Roper, M., 2008. Hybridizing evolutionary testing with artificial immune systems and local search. In: Proceedings of the 1st International Workshop on Search-Based Software Testing (SBST) in Conjunction with ICST 2008, Lillehammer, Norway, pp. 211–220.
McMinn, P., Holcombe, M., 2006. Evolutionary testing using an extended chaining approach. Evolutionary Computation 14 (1), 41–64.
McMinn, P., 2004. Search-based software test data generation: a survey. Software Testing, Verification and Reliability 14 (2), 105–156.
McMinn, P., 2011. Search-based software testing: past, present and future. In: Fourth International Conference on Software Testing, Verification and Validation Workshops (ICSTW'2011), pp. 153–163.
McMinn, P., Binkley, D., Harman, M., 2008. Empirical evaluation of a nesting testability transformation for evolutionary testing.
ACM Transactions on Software Engineering and Methodology.
Mehrmand, A., 2009. A Factorial Experiment on Scalability of Search-based Software Testing. Master's Thesis, Thesis Number: MSE-2009:20, Blekinge Institute of Technology, Sweden.
Michael, C., McGraw, G., Schatz, M., 2001. Generating software test data by evolution. IEEE Transactions on Software Engineering 27, 1085–1110.
Michael, C.C., McGraw, G.E., Schatz, M.A., Walton, C.C., 1997. Genetic algorithms for dynamic test data generation. In: Proceedings of the 12th IEEE International Conference on Automated Software Engineering, Incline Village, NV, USA. IEEE Computer Society.
Michalewicz, Z., 1996. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin, Heidelberg.
Pachauri, A., Gursaran, 2012. Comparative evaluation of a maximization and minimization approach for test data generation with genetic algorithm and binary particle swarm optimization. International Journal of Software Engineering & Applications (IJSEA) 3 (1), 443–454.
Pargas, R.P., Harrold, M.J., Peck, R.R., 1999. Test-data generation using genetic algorithms. The Journal of Software Testing, Verification and Reliability 9 (4), 263–282.
Prather, R.E., Myers, J.P., 1987. The path prefix software testing strategy. IEEE Transactions on Software Engineering 13 (7), 761–766.
Radcliffe, N.J., 1997. Introduction: theoretical foundations and properties of evolutionary computations. In: Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd. and Oxford University Press, Bristol, UK, pp. B2.1:1–B2.1:7.
Tan, X.B., Longxin, C., Xiumei, X., 2009. Test data generation using annealing immune genetic algorithm. In: Fifth International Joint Conference on INC, IMS and IDC, 2009, NCM '09, pp. 344–348.
Tracey, N., Clark, J., Mander, K., 1998. Automated program flaw finding using simulated annealing. In: Tracz, W. (Ed.), Proceedings of the 1998 ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM Press, New York, pp. 73–81.
Wang, Y., Bai, Z., Zhang, M., Du, W., Qin, Y., Liu, X., 2008. Fitness calculation approach for the switch case construct in evolutionary testing. In: Keijzer, M. (Ed.), Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (GECCO'08), Atlanta, GA, USA, pp. 1767–1774.
Wegener, J., Baresel, A., Sthamer, H., 2001. Evolutionary test environment for automatic structural testing. Information and Software Technology, Special Issue on Software Engineering using Metaheuristic Innovative Algorithms 43 (14), 841–854.
Wegener, J., Buhr, K., Pohlheim, H., 2002. Automatic test data generation for structural testing of embedded software systems by evolutionary testing. In: Proceedings of the 2002 Genetic and Evolutionary Computation Conference (GECCO'02). Morgan Kaufmann Publishers Inc., New York, USA, pp. 1233–1240.
Windisch, A., Wappler, S., Wegener, J., 2007. Applying particle swarm optimization to software testing. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2007), London, UK, July 2007, pp. 1121–1128.
Xiao, M., El-Attar, M., Reformat, M., Miller, J., 2007. Empirical evaluation of optimization algorithms when used in goal-oriented automated test data generation techniques. Empirical Software Engineering 12, 183–239.
Zhu, H., Hall, P.A.V., May, J.H.R., 1997. Software unit test coverage and adequacy. ACM Computing Surveys 29 (4), 366–427.

Ankur Pachauri received his Ph.D. degree from the Dayalbagh Educational Institute in 2013 and his Master's Degree in Computer Applications from Dr. B.R. Ambedkar University, Agra, in 2006. His research interests are in search-based software engineering and software testing.

Dr. Gursaran Srivastava received his B.Sc. Engineering degree from the Dayalbagh Educational Institute (DEI), M.Tech. degree from IIT Kanpur and Ph.D. degree from DEI in 1987, 1989 and 1997, respectively. His Ph.D. thesis focused on verification of designs in object-oriented software systems, rule-based systems and subjectivity in software measurement. At present he is a Professor in the Department of Mathematics at DEI, where he teaches courses in computer science and mathematics. His research interests are in verification and validation of software systems, search based software engineering, evolutionary algorithms, graph algorithms and context aware systems for e-Consultation and e-Learning. He has completed funded research projects in these areas and has published in leading international and national journals.