The Journal of Systems and Software 86 (2013) 1191–1208
Automated test data generation for branch testing using genetic algorithm: An improved approach using branch ordering, memory and elitism

Ankur Pachauri, Gursaran Srivastava ∗

Department of Mathematics, Faculty of Science, Dayalbagh Educational Institute, Agra 282005, India
Article info

Article history: Received 3 November 2011; received in revised form 15 October 2012; accepted 26 November 2012; available online 5 January 2013.

Keywords: Automated program test data generation; Software testing; Genetic algorithm

Abstract

One of the problems faced in generating test data for branch coverage using a metaheuristic technique is that the population may not contain any individual that encodes test data for which the execution reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) introduce three approaches for ordering branches for selection as targets for coverage with a genetic algorithm (GA) and (b) experimentally evaluate branch ordering together with elitism and memory to improve test data generation performance. An extensive preliminary study was carried out to help frame the research questions and fine-tune GA parameters which were then used in the final experimental study.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

With the realization that the process of software test data generation can be cast into a search problem, a new area of research called search-based software test data generation has emerged to which metaheuristic techniques can be readily applied (McMinn, 2004; Harman and Mansouri, 2010; Ali et al., 2010; Harman and McMinn, 2010). However, as research points out, the application of metaheuristic techniques poses new challenges.

During testing, the program under test P is executed on a test set of test data, each test data being a specific point in the input domain, and the results are evaluated. The test set is constructed to satisfy a test data adequacy criterion that specifies test requirements (Zhu et al., 1997). The branch coverage criterion is a test adequacy criterion that is based on the program flow graph. More formally, a test set T is said to satisfy the branch coverage criterion if, on executing P on T, every branch in P's flow graph is traversed at least once.

One of the problems faced in generating test data for branch coverage using a metaheuristic technique such as a genetic algorithm is that the population may not contain any individual that encodes test data for which the execution path reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) introduce three approaches for ordering branches for selection as targets for coverage with a genetic algorithm (GA) and (b) experimentally evaluate branch ordering together with elitism and memory to improve test data generation capability.
∗ Corresponding author. Tel.: +91 5622801545; fax: +91 5622801226. E-mail addresses:
[email protected] (A. Pachauri),
[email protected] (G. Srivastava). 0164-1212/$ – see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jss.2012.11.045
The paper is further organized as follows. In Section 2 we describe the motivation for the present study together with a comprehensive review of related work. In Section 3 we introduce the genetic algorithm (GA) and describe its application to program test data generation. The proposed approach to enhance the performance of GA-based test data generation is described in detail in Section 4. In Section 5 we describe the experiments carried out in this study. This section (i) details the research questions, (ii) describes a preliminary study which was carried out to tune the GA parameters and operators and to frame the research questions and (iii) presents a discussion on the results of the main experiments. Section 6 discusses the threats to validity and limitations and Section 7 concludes the paper with directions for future work.

2. Motivation and related work

Test data generation in software testing is the process of identifying a set of test cases to satisfy a selected test data adequacy criterion. Bertolino (2007) points out that the most promising results toward automated test data generation have come from three approaches: model-based, random and search-based approaches. Search-based test data generation is a part of the much broader research area of search-based software engineering (Harman, 2007; Harman et al., 2009; Harman and Mansouri, 2010). Search-based test data generation consists of exploring the input domain of a program under test for test data to satisfy a selected test data adequacy criterion. By using metaheuristic techniques – high-level frameworks which utilize heuristics in order to find solutions – the search is directed toward the most promising areas of the domain (Michael et al., 2001; McMinn, 2004). As a technique, it can be used with specification-based criteria as well as program-based criteria.
Metaheuristic techniques such as simulated annealing (Tracey et al., 1998), tabu search (Díaz et al., 2003), genetic algorithms (Jones et al., 1996; Michael et al., 1997; Pargas et al., 1999), particle swarm optimization (Windisch et al., 2007), quantum particle swarm optimization (Agarwal and Srivastava, 2010), scatter search (Blanco et al., 2009), ant colony optimization (Li and Lam, 2005), memetic algorithms (Arcuri and Yao, 2007), the clonal selection algorithm (Castro and Von Zuben, 2002) and the immune genetic algorithm (Liaskos and Roper, 2008; Tan et al., 2009) have been applied to the problem of automated test data generation and provide evidence of their successful application. Amongst these, several have addressed the issue of test data generation with program-based criteria (Michael et al., 2001; Girgis, 2005; Andreou et al., 2007; Chen and Zhong, 2008; Ahmed and Hermadi, 2008; Ghani and Clark, 2009) and in particular the branch coverage criterion (Jones et al., 1998; Wegener et al., 2001; Blanco et al., 2007; Wang et al., 2008; Harman, 2008; Chen et al., 2009; Gross et al., 2009).

In program-based test data generation using metaheuristic techniques, the basic approach is as follows. The source code of the program under test is instrumented to collect information about the program as it executes. The resulting information, collected during each execution of the program, is used to heuristically determine how close the test case is to satisfying a test requirement specified by the selected test criterion. This allows the test generator to modify the program's inputs gradually, moving them ever closer to values that actually do satisfy the requirement. In other words, the problem of generating test data reduces to the well-understood problem of function optimization. Furthermore, such test generation methods can handle arrays and pointer references because the values of array indices and pointers are known throughout the generation process.
Wegener et al. (2002) point out that because of the non-linearity of software (conditional statements, loops, flags, switch-case, break), the conversion of test problems into optimization tasks usually results in complex, discontinuous and non-linear search spaces for which search methods such as hill climbing are not suitable, but metaheuristic search methods can be employed.

One of the problems faced in generating test data for branch coverage using a population-based metaheuristic technique such as a genetic algorithm is that when a branch is chosen as the target for coverage, it may happen that none of the individuals (individuals encode input test data) in the population encode inputs for which the execution path reaches the predicate node of the target branch, i.e., a critical branch is taken that causes the predicate node of the target branch to be missed in an execution of the program (Ferguson and Korel, 1996; McMinn and Holcombe, 2006). In order to deal with this problem, Michael et al. (2001) postpone the selection of the branch for coverage and Baresel et al. (2002) have described the design of fitness functions to guide the search. The usual approach is to compute the fitness of an individual so that it incorporates information about how close the input test data came to reaching the target of interest, called the approach level, and to combine this with branch distance data, which reflects how close the sibling branch of the critical branch was to being taken, i.e., the branch that would have taken the traversal closer to the target branch. Accordingly, the fitness of an individual is computed as approach level + normalized branch distance (McMinn, 2004; Harman and McMinn, 2010; Arcuri, 2010; McMinn, 2011).
The possibility of selecting target branches in a specific sequence, and augmenting the metaheuristic process so that from the step (generation) at which a target is selected to the generation in which it is covered the current population has at least one individual that encodes inputs for a path that includes the sibling branch of the target branch, has not been explored. We hypothesize that this should result in better coverage and performance. Harman et al. (2009) note that since 1995 there has been an upsurge in work in search-based test data generation aimed at the achievement of branch coverage and cite a number of references, but do not mention the idea of branch ordering and augmenting the search techniques as described above. McMinn (2011) in his paper also focuses primarily on the design of fitness functions. In this paper we consider three approaches for branch ordering for target selection together with elitism and memory and evaluate them experimentally.

One important issue in search-based test data generation is that of scalability. In our context this concerns the number of branches, the nesting depth and the search space. These have been addressed in the literature in the context of the genetic algorithm. Mehrmand (2009) has conducted a factorial experiment on the scalability of search-based software testing with Java programs. He concludes that GA can outperform random testing as complexity, in terms of the number of branches and statements, is increased. Xiao et al. (2007) have conducted a scalability analysis in the context of goal-oriented automated test data generation techniques. They conclude that GA performs better for both small and large search spaces in the context of condition-decision coverage. In general, Harman et al. (2009) point out that search-based software engineering has attractive scalability potential through parallel execution of fitness computations. In light of these experiments and observations, the improvements proposed in this paper assume importance since they may be used to enhance test data generation performance as programs become large with an increased number of branches and a larger search space.

3. Background

In this section we first describe the genetic algorithm and then describe GA-based test data generation in detail.

3.1.
Genetic algorithm

The genetic algorithm (GA) is a metaheuristic search technique based on the ideas of genetics and evolution, in which new and fitter string individuals are created by combining portions of the fittest string individuals of the parent population (Goldberg, 1989). A genetic algorithm execution begins with a random initial population of candidate solutions. Each candidate solution is generally a vector of parameters, usually encoded in a binary string (or bit string), called a chromosome or an individual. If there are m input parameters with the ith parameter expressed in ni bits, then the length of the chromosome is simply the sum n1 + n2 + ... + nm. In this paper each individual, or chromosome, encodes test data. After creating the initial population, each chromosome is evaluated and assigned a fitness value. From this initial selection, the population of individuals iteratively evolves to one in which candidates satisfy some termination criteria or, as in our case, fail to make any forward progress. Each iteration step is also called a generation.

Each generation may be viewed as a two-stage process (Goldberg, 1989). Beginning with the current population (also called the parent population), selection is applied to create an intermediate population, and then crossover and mutation are applied to this population. Another (selection) step is then applied to the individuals from the intermediate population and the current generation's parent population to create the parent population for the next generation. In a generational GA, the intermediate population replaces the current generation's parent population to create the parent population for the next generation, whereas in a steady-state GA a small percentage of the worst individuals from the parent population is replaced with the best individuals from the intermediate population. In the case of a generational GA, elitism ensures that the fittest chromosomes survive from one population to the next.
1. Choose an appropriate test adequacy criterion. This in our case is the branch coverage criterion.
2. Set up the genetic algorithm.
   a. Select a representation for test data to be input to program P.
   b. Define a fitness function.
   c. Instrument the program P to create program Pt. The instrumented program Pt is used directly for test data generation.
   d. Select suitable genetic algorithm parameters and operators.
3. Generate test data.
   a. Run the genetic algorithm for test data generation using Pt for fitness computation.
   b. Identify and eliminate infeasibility.
   c. Regenerate test data if necessary.
Fig. 1. Complete steps for test data generation using genetic algorithm.
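Step 2c of Fig. 1 can be illustrated with a minimal sketch, assuming a predicate-wrapping style of instrumentation in which every predicate evaluation records its node id, its outcome and a branch distance (defined later in Table 1); the trace layout, node ids and helper names are hypothetical, not the authors' implementation:

```python
# Sketch: the instrumented program Pt records every predicate evaluation.
trace = []  # one (node_id, outcome, branch_distance) entry per evaluation

def pred(node_id, outcome, distance):
    """Record a predicate evaluation, then behave like the predicate."""
    trace.append((node_id, outcome, distance))
    return outcome

# An original fragment `if a <= b: ...` becomes, in Pt (distance a - b
# follows the a <= b entry of Table 1):
def pt_fragment(a, b):
    if pred(1, a <= b, a - b):
        return "true branch"
    return "false branch"
```

Running Pt on a single test datum leaves in `trace` exactly the per-node information needed for the fitness computation of Section 3.2.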
The chromosome length, population size, and the various probability values in a GA application are referred to as the GA parameters in this paper. Selection, crossover and mutation are referred to as the GA operators.

3.2. GA-based test data generation

Let P be the program under test; a general sequence of steps for test data generation using a genetic algorithm is described in Fig. 1. Test data is generated to meet the requirements of a particular test data adequacy criterion. The criterion in our case is the branch coverage criterion. McMinn (2011) points out that there are, in general, two requirements to be fulfilled to apply metaheuristic techniques to testing problems: (a) a representation and (b) a fitness function. He further notes that a suitable representation is automatically available for the case of test data generation, where the input vector or sequences of inputs to the software under test can be optimized. The search space is thus defined over the represented inputs to the program under test and may be very large. However, for the application of the genetic algorithm, this search space does not have to be determined beforehand. Thus the basic choice that all search algorithms, including the genetic algorithm, have to make at each stage of the search is which point (or points) in the search space to sample next (Radcliffe, 1997). At the start, assuming no prior knowledge, the choice is essentially random. After a while, as various points are sampled, the choice becomes more informed. In the genetic algorithm, the population acts as the memory, and the choice of the next point in the search space to sample is determined by applying different operators on the individuals based on their fitness. In our case, the inputs for one execution of P, i.e., a single test data, are represented in a binary string, also called a binary individual. How real values can be represented in a binary string is described in detail in Michalewicz (1996).
The fitness of a binary individual is computed as

Fitness(x) = Approximation Level + Normalized Branch Distance

As opposed to the usual practice of formulating the generation problem as a fitness minimization problem, in this work it is taken to be a maximization problem (Pachauri and Gursaran, 2012). The definitions of approximation level and normalized branch distance are also different from McMinn (2004), although the basic idea is similar. A critical branch (Korel, 1990), as defined earlier, is a branch that leads the execution away from the target branch in a path through the program. The approximation level is a count of the number of predicate nodes in the shortest path from the first predicate node, from the start node, in the flow graph to the predicate node with the critical branch. See Fig. 2 for an example. The shortest path is chosen to avoid loops and take care of multiple paths that may be followed
to reach the critical branch. The Normalized Branch Distance is calculated according to the formula

Normalized branch distance = 1 / (1.001 ^ distance)

Table 1
Branch distance computation.

    Decision type    Branch distance
1   a < b            a - b
2   a <= b           a - b
3   a > b            b - a
4   a >= b           b - a
5   a == b           Abs(a - b)
6   a != b           Abs(a - b)
7   a && b           a + b
8   a || b           min(a, b)
where distance, or branch distance, as defined in Baresel et al. (2002) and McMinn and Holcombe (2006), is computed at the node with the critical branch using the values of the variables and constants involved in the predicates used in the conditions of the corresponding branching statement. However, the definition of normalized branch distance is different from the definition of McMinn (2004) as the problem is a maximization problem. Table 1 shows the computation of branch distance for different conditions. Entries one to five are the same as in Korel (1990). Table 1 also describes the computation of branch distance in the presence of the logical operators AND (&&) and OR (||). In both these cases, the definition takes into account the fact that branch distance is to be minimized whereas the fitness is to be maximized. Fig. 2 illustrates branch distance computation for different cases.

In Fig. 1, Step 2d involves the selection of appropriate parameter values and Step 3a involves an application of the genetic algorithm. The implementation of Step 3a is described in Fig. 3. Step 3c states: regenerate test data if necessary. A tester may choose to rerun the GA for two reasons: infeasibility has been eliminated or a larger test set is required to improve confidence (Beizer, 2002; Beizer, 2009). Fig. 3 outlines the test data generation procedure with a genetic algorithm, i.e., the implementation of Step 3a in Fig. 1. There are a number of parameters and operators whose values and types need to be decided. A preliminary study was carried out to determine the best values of parameters and types of operators. This is described in detail in Section 5.2. Harman et al. (2009) note that in early works the fitness function sought to maximize the number of branches covered. This was found to be inadequate as it tended to avoid the branches that were hard to cover.
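A minimal sketch of this fitness computation, combining the entries of Table 1 with the normalization above, is given below; the function names are illustrative, not the authors' implementation:

```python
# Sketch of the maximization fitness used in the paper:
# fitness = approximation level + 1 / 1.001^distance.

def branch_distance(op, a, b):
    """Relational entries of Table 1, computed when the branch is missed."""
    table = {
        '<':  a - b, '<=': a - b,
        '>':  b - a, '>=': b - a,
        '==': abs(a - b), '!=': abs(a - b),
    }
    return table[op]

def compound_distance(op, da, db):
    """Table 1 for logical operators, over operand distances da and db."""
    return da + db if op == '&&' else min(da, db)

def normalized_branch_distance(distance):
    # A distance of 0 normalizes to 1.0; larger distances approach 0.
    return 1.0 / (1.001 ** distance)

def fitness(approximation_level, distance):
    return approximation_level + normalized_branch_distance(distance)
```

Because the normalization decreases as the branch distance grows, an individual that satisfies the missed condition exactly (distance 0) gains the full unit contribution on top of its approximation level, which is what makes the formulation a maximization problem.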
In the works that appeared later, the coverage of each branch is viewed as a test objective, for which a fitness function is constructed based upon the path taken by individual test data. This, as described above, is the approach taken in this paper. The termination criterion for a GA run for each test objective can be target
[Fig. 2 shows a flow graph fragment with three predicate nodes evaluated in sequence: (a<=0 || b<=0 || c<=0), (a>10000 || b>10000 || c>10000) and (a<=b && b<=c || c<=2). When the target is missed at these nodes, the approximation levels are 0, 1 and 3 and the branch distances are Min((a-0), (b-0), (c-0)), Min((10000-a), (10000-b), (10000-c)) and Min(((a-b) + (b-c)), (c-2)), respectively.]

Fig. 2. Approximation level and branch distance computation.
branch coverage or a predefined maximum number of GA iterations or population convergence. In the experiments in this paper, the GA run is terminated on whichever occurs earlier: coverage of all branches or a predefined maximum number of iterations.

4. Proposed approach

Apart from the genetic algorithm parameters and operators, additional features can be incorporated to improve test data generation performance. These are:

• Target branch selection method. See step 5 in Fig. 3. This is discussed in detail in Section 4.1.
• Memory. See step 7 in Fig. 3. In a GA application, it is possible that a predicate node that was reached in an earlier generation is no longer reached in the current generation, and if a branch at that node is the target branch, then the fitness of all the individuals becomes low. In order to circumvent this problem, we store the individuals that traverse a branch and use them when the sibling branch is chosen as the target. Storing individuals for later use is 'memory'. This is detailed further in Section 4.3.
• Elitism at step 14 in Fig. 3.
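The memory feature in the second bullet above can be sketched as follows; the dictionary layout, the per-branch limit of five stored individuals and the function names are assumptions based on the description in Section 4.3:

```python
# Sketch of 'memory': store individuals that traverse a branch and reuse
# them when the sibling branch becomes the target.

MEMORY_LIMIT = 5
memory = {}  # branch id -> up to MEMORY_LIMIT individuals that traversed it

def remember(branch, individual):
    """Store an individual for a branch, up to the per-branch limit."""
    stored = memory.setdefault(branch, [])
    if len(stored) < MEMORY_LIMIT:
        stored.append(individual)

def seed_from_memory(population, fitness, sibling_branch):
    """Replace the worst individuals with stored traversers of the sibling."""
    stored = memory.get(sibling_branch, [])
    k = min(len(stored), MEMORY_LIMIT, len(population))
    if k == 0:
        return list(population)
    ranked = sorted(population, key=fitness)   # ascending: worst first
    return stored[:k] + ranked[k:]
```

The population size is preserved: k stored individuals displace exactly the k worst members of the current population.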
4.1. Branch ordering

As discussed in Section 2, the issue of branch selection order has not been explored. Branches may be ordered using any one of the following schemes:
1. Start with a randomly generated population of n individuals (test data) for program under test P
2. While (termination criterion is not met) {
3.   Execute Pt on each individual (test data) x in the current population popcur;
4.   If (selected target branch b is traversed or if no target has been selected)
5.     Use a branch selection strategy to identify a new target branch b;
6.   Calculate the fitness f(x) of each individual x in the population with respect to target b;
7.   If available, use memory to replace worst individuals in the population;
8.   Initialize intermediate population popinter to empty; // generate a new population
9.   Repeat {
10.    Select a pair of parent individuals from popcur;
11.    With probability Pc (the crossover probability), crossover the parents to form two offspring (or children);
12.    Mutate the two offspring with probability Pm (the mutation probability), and place the resulting individuals in popinter;
13.  } until (n individuals have been added to popinter);
14.  Replace (completely or partially) individuals in popcur with individuals from popinter (elitism is used at this step)
15. }

Fig. 3. Test data generation with genetic algorithm.
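Steps 8 to 14 of Fig. 3 can be sketched as one generation over bit-string individuals; the one-point crossover, bit-flip mutation and the sample fitness used below are illustrative stand-ins, not the paper's exact operators:

```python
# Sketch of one generation (steps 8-14 of Fig. 3) with elitist replacement.
import random

def one_generation(pop_cur, fitness, pc=1.0, pm=0.01, elite_frac=0.10, rng=None):
    rng = rng or random.Random(0)
    n = len(pop_cur)
    pop_inter = []
    while len(pop_inter) < n:                       # steps 9-13
        p1, p2 = rng.sample(pop_cur, 2)             # step 10: select parents
        if rng.random() < pc:                       # step 11: one-point crossover
            cut = rng.randrange(1, len(p1))
            c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        else:
            c1, c2 = p1, p2
        for child in (c1, c2):                      # step 12: bit-flip mutation
            mutated = ''.join(
                bit if rng.random() >= pm else '10'[int(bit)] for bit in child)
            pop_inter.append(mutated)
    pop_inter = pop_inter[:n]
    # Step 14 with elitism: carry the best parents into the next generation.
    k = max(1, int(elite_frac * n))
    elite = sorted(pop_cur, key=fitness, reverse=True)[:k]
    rest = sorted(pop_inter, key=fitness, reverse=True)[:n - k]
    return elite + rest
```

Because the elite parents are copied forward unchanged, the best fitness in the population can never decrease from one generation to the next.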
[Fig. 5 shows an example control flow graph with nodes 1-14. Consistent with the branch orders of Fig. 4, node 1 branches to nodes 2 and 3, node 2 to 4 and 5, node 3 to 14, node 4 to 6 and 7, node 5 to 13, node 6 to 8 and 9, node 7 to 10 and 11, and nodes 8, 9, 10 and 11 each lead to node 12.]

Fig. 5. Example graph for path prefix strategy.
DFS order: (1,2),(2,4),(4,6),(6,8),(8,12),(6,9),(9,12),(4,7),(7,10),(10,12),(7,11),(11,12),(2,5),(5,13),(1,3),(3,14)

BFS order: (1,2),(1,3),(3,14),(2,4),(2,5),(5,13),(4,6),(4,7),(6,8),(6,9),(7,10),(7,11),(8,12),(9,12),(10,12),(11,12)

Fig. 4. Branch orders with DFS and BFS: an example.
• Breadth first strategy (BFS).
• Depth first strategy (DFS).
• Path prefix strategy (PPS).
• Random strategy (RNS).
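As an illustration, the DFS and BFS branch orders can be computed directly from the control flow graph. The adjacency map below encodes the example graph of Figs. 4 and 5, reconstructed from the edge orders listed there; within a node, successors are enumerated in ascending order, so BFS tie-breaking within a level may differ slightly from the order printed in Fig. 4:

```python
# Sketch: branch (edge) orders from DFS and BFS traversals of a CFG.
from collections import deque

CFG = {1: [2, 3], 2: [4, 5], 3: [14], 4: [6, 7], 5: [13], 6: [8, 9],
       7: [10, 11], 8: [12], 9: [12], 10: [12], 11: [12]}

def dfs_branch_order(graph, start):
    order, seen = [], {start}
    def visit(u):
        for v in graph.get(u, []):
            order.append((u, v))          # record every branch as encountered
            if v not in seen:
                seen.add(v)
                visit(v)
    visit(start)
    return order

def bfs_branch_order(graph, start):
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            order.append((u, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order
```

On this graph the DFS branch order reproduces the DFS sequence listed in Fig. 4 exactly.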
All four methods use the control flow graph for ordering the branches. As the names suggest, the depth first and breadth first strategies order branches according to the sequence in which they are examined in a depth first or breadth first search of the control flow graph beginning with the start node. Fig. 4 illustrates the ordering. The random strategy orders the branches randomly.

The path prefix strategy is described as follows. To achieve path coverage, as opposed to branch coverage, the problem is to find a program input that results in the traversal of the path. This may become difficult in the presence of infeasible paths. In an attempt to circumvent this, Prather and Myers (1987) suggested the use of an adaptive strategy in which one new test path, or sub-path, is added at a time and previous paths serve as a guide for the selection of subsequent paths using some inductive strategy. The path prefix strategy suggested by them is one such strategy for test data generation. We have used this strategy to select a branch for coverage. The sequence in which the branches are selected defines an ordering of the branches.

For a path p that is traversed in an execution, a reversible prefix q is defined as the minimal initial portion of path p to a decision node d whose branches are not fully covered, together with the branch that is covered by p. For example, in Fig. 5, if the path traversed is p: s, . . ., d, b, . . ., f, branch (d, b′) is not covered and s, . . ., d is the minimal initial portion of p that satisfies the condition above, then s, . . ., d, b is a reversible prefix. Accordingly, branch (d, b′) is a candidate for selection for coverage. If branch (d, b′) is selected for coverage, then the path s, . . ., d, b′ is said to be the reversal of the path s, . . ., d, b. In the path prefix strategy, at any stage, if p1, . . ., pk−1 are the traversed paths, the idea is to find an input xk that causes the reversal of the shortest reversible prefix q, with reversal q′, amongst all the pi's. In our case prefix q identifies the next branch to be considered for coverage.

4.2. Population replacement strategy and elitism
There are two issues that need to be addressed: (i) population initialization every time a new target branch is selected and (ii) replacement of the parent population with the child population. In case (i) the question is: should we initialize the population each time a new target is selected? A preliminary study was carried out to choose the best strategy. This study is described in Section 5.2. The steps outlined in Fig. 1 assume that the population is not initialized each time a new target branch is selected. Coming to issue (ii), in a generational GA, the entire parent population is replaced with the child population in each generation. The preliminary study has shown that, at times, the number of individuals (fit individuals here) in the parent population for which the execution path reaches the predicate node with the target branch may be small. In this situation, crossover and mutation may give a child population for which the predicate node is no longer reached, i.e., the execution path for every individual misses the predicate node with the target branch. It may thus be necessary to preserve the fit individuals from one generation to the next, leading to an elitist strategy. In our experiments we carry forward up to 10% of the fit individuals, with a minimum of one fit individual, to the next generation.
4.3. Memory

In one generation of a GA run, since the program under test is executed once for each individual in the population, it is possible that branches other than the target branch which have not been covered earlier are traversed. Further, it is possible that when a branch is selected as the target branch, although its sibling branch has been traversed earlier, no individual in the current population results in an execution path that reaches the corresponding predicate node. In this situation, it would clearly be helpful if the population is supplemented with those individuals whose execution paths cover the target's sibling branch. In order to facilitate this, each time a branch is traversed for the first time, up to five individuals that cause the branch to be traversed are stored. This is what we refer to as memory. Now, each time a branch is selected for coverage, up to five of the worst individuals in the current population are replaced with individuals from memory that traverse its sibling branch. However, it should be noted that an elitist strategy may additionally be required to preserve the individuals that traverse the sibling branch across generations.

5. Experiments

In this section we describe the research questions and the experiments carried out to address them. In Section 5.1 we describe the research questions and the subjects of study. In Section 5.2 we describe the preliminary study in detail and in Section 5.3 we present the main study.

5.1. Study design

In this section we outline the research questions of the present study and describe the benchmark programs that are the subjects of study.

5.1.1.
Research questions

In Section 2 it was noted that the possibility of selecting target branches in a specific sequence, and augmenting the metaheuristic process so that from the step (generation) at which a target is selected to the generation in which it is covered the current population has at least one individual that encodes inputs for a path that includes the sibling branch of the target branch, has not been explored. Ensuring this, we had hypothesized, should result in better performance of the test data generation process. To this end, in Section 4, it was proposed that the following features be incorporated: branch ordering, elitism and memory. In Section 4.2 it was argued that crossover and mutation are disruptive and may result in a child population for which the predicate node of the target branch is no longer reached, i.e., the execution path for every individual misses the predicate node with the target branch. It may thus be necessary to preserve the fit individuals from one generation to the next with the help of elitism. Thus, instead of investigating the effect of each of elitism, memory and branch ordering individually on test data generation performance, the research questions address the effects of combining features. The reason for this is further grounded in the preliminary study, which is discussed in detail in Section 5.2.3. The research questions for the present study are as follows:

Research Question One: Does elitism together with memory improve test data generation performance?
Research Question Two: Does branch ordering together with elitism and memory improve test data generation performance?
Research Question Three: Does the choice of a particular branch ordering scheme affect test data generation performance?
5.1.2. Subjects of study

Standard benchmark programs were chosen as the subjects of study. These have been taken from Díaz et al. (2008) and Blanco et al. (2009). The programs have a number of features, such as real inputs, equality conditions with the AND operator and deeply nested predicates, that make them suitable for testing different approaches to test data generation.

• Line in a Rectangle Problem (Rectangle): This program takes eight real inputs, four of which represent the coordinates of a rectangle and the other four the coordinates of a line. The program determines the position of the line with respect to the rectangle and generates one out of four possible outputs: (A) the line is completely inside the rectangle; (B) the line is completely outside the rectangle; (C) the line is partially covered by the rectangle; and (D) error: the input values do not define a line and/or a rectangle. The maximum nesting level is 12. The program's CFG has 54 nodes with 18 predicate nodes.
• Number of Days between Two Dates Problem (Date): This program calculates the number of days between two given dates of the current century. It takes six integer inputs, three of which represent the first date (day, month and year) and the other three the second date (day, month and year). It has 128 nodes. This program includes a number of branches with equality conditions. Some of them use the remainder operator (%), which adds discontinuity to the decision domains and therefore poses a greater difficulty in finding the tests that cover those branches. The nesting level is very high for most of the branches and, in combination with the AND decisions, the equality conditions and the use of the remainder operator, makes this program very appropriate for evaluating the effectiveness and efficiency of an automatic test generator for the branch coverage criterion. The CFG has 43 predicate nodes.
• Calday: This routine returns the Julian day number.
There are three integer inputs to the program: the first represents the month, the second the day and the third the year. Its CFG has 27 nodes with 11 predicate nodes. It has equality conditions and uses the remainder operator. The maximum nesting level is 8.
• Complex Branch: It accepts six short integer inputs. This routine has some complex predicate conditions with relational operators combined with complex AND and OR conditions; it also contains while loops and a switch-case statement. Its CFG contains 30 nodes.
• Meyer's Triangle Classifier Problem: This program classifies a triangle on the basis of its input sides as a non-triangle or a triangle, i.e., isosceles, equilateral or scalene. It takes three real inputs, all of which represent the sides of the triangle. Its CFG has 14 nodes with 6 predicate nodes. The maximum nesting level is 5. It has equality conditions with the AND operator, which make the branches difficult to cover.
• Sthamer's Triangle Classifier Problem: This program also classifies a triangle on the basis of its input sides as a non-triangle or a triangle that is isosceles, equilateral, right-angled or scalene. It takes three real inputs, all of which represent the sides of the triangle, but with different predicate conditions. Its CFG has 29 nodes with 13 predicate nodes. The maximum nesting level is 12. It has equality conditions with the AND operator and complex relational operators.
• Wegener's Triangle Classifier Problem: This program also classifies a triangle on the basis of its input sides as a non-triangle or a triangle that is isosceles, equilateral, orthogonal or obtuse-angled. It takes three real inputs, all of which represent the sides of the triangle, but with different predicate conditions. Its CFG has 32 nodes with 13 predicate nodes. The maximum nesting level is 9.
• Michael's Triangle Classifier Problem: This program also classifies a triangle, on the basis of its input sides, as a non-triangle or as a triangle that is isosceles, equilateral or scalene. It takes three real inputs representing the sides of the triangle, but with different predicate conditions. Its CFG has 26 nodes with 11 predicate nodes. The maximum nesting level is 6.
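The equality conditions combined with the AND operator that these classifiers contain are what make some branches hard to reach by undirected search: for random real inputs, the probability of two sides being exactly equal is essentially zero. As an illustration only (the benchmarks themselves were implemented in C and differ in detail), a Meyer-style classifier might look like the following sketch:

```python
def classify_triangle(a: float, b: float, c: float) -> str:
    """Toy Meyer-style triangle classifier (illustrative, not the benchmark source)."""
    # Non-triangle branches: each side must be positive and shorter
    # than the sum of the other two.
    if a <= 0 or b <= 0 or c <= 0:
        return "not a triangle"
    if a + b <= c or b + c <= a or a + c <= b:
        return "not a triangle"
    # Equality conditions joined by AND: random real inputs almost
    # never satisfy a == b and b == c, so this branch needs guidance.
    if a == b and b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"
```

A search-based generator must minimize distances such as |a − b| to steer inputs toward these equality branches.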
5.2. Tuning the GA A preliminary study was carried out to • Determine appropriate combinations of parameters and operators for the genetic algorithm. • Study the effect of elitism, memory and branch ordering so as to help frame the research questions. • Mitigate the effects of random variation and to emphasize the need for memory and elitism. 5.2.1. Preliminary study: design Data was collected for combinations of the following settings: 1. Pilot benchmark program – Meyer's triangle classification program. The program takes three real inputs, has equality conditions, compound conditions and a maximum nesting level of 5, all of which make it suitable for the preliminary study. 2. Population size – 6, 10, 16, 20, 26, 30, 36, . . ., 110. 3. Crossover type – Two point crossover and uniform crossover. 4. Crossover probability (Pc) – 1.0. 5. Mutation probability (Pm) – 0.01. Following experiments, as described later, the value for Pm was taken as 0.01. 6. Selection method – Roulette wheel selection (RW), tournament selection (TS) and pool tournament (PT). With roulette wheel selection, sigma truncation was used as the fitness scaling scheme as this helps circumvent the problem of negatively scaled fitness in later stages of the GA run. Experiments were carried out to determine an appropriate tournament size for TS; accordingly, the final tournament size was taken to be two, i.e., binary tournament selection. In pool tournament, first, for each individual, two individuals are randomly selected, with replacement, to form a pool of three individuals from which the best two are selected. This results in a population of double the size of the original population. On this population, a tournament of size two without replacement is carried out, which reduces the population by half. 7. Branch ordering scheme – Depth first search (DFS). However, for two experiments, the branch ordering was determined as follows.
First, the nodes in the CFG were numbered, beginning with the start node, in an arbitrary order that interleaves a depth first movement with a breadth first movement, again arbitrarily. This numbering was then used to sequence the branches using the number given to the head node of each branch. This scheme is simply referred to as the arbitrary branch ordering scheme (ABO). In ABO, if a branch is covered before it is selected as the target branch, it is marked as covered and never reselected as a target branch. 8. Adequacy criterion – Branch coverage. 9. Fitness function – Normalized branch distance (NBD) and the function described in Section 3.2 which takes the approximation level into account (AL&NBD). 10. Population initialization – Initialize with every new target (IENT) and initialize once at the beginning of the GA run (NIENT). 11. Population replacement strategy – Replace the entire parent population. Use elitism in which up to ten percent of the fittest population is carried forward to the next generation. 12. Maximum number of generations – 10^7. 13. Length of chromosome (binary string) – 63. The exact combinations of parameters, operator settings and additional features for which experiments were finally carried out were as follows:
A. IENT, NBD, ABO, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. B. NIENT, NBD, ABO, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. C. NIENT, NBD, DFS, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. D. NIENT, NBD, DFS with memory, roulette wheel selection, two point crossover, Pm: 0.01, Pc: 1.0. E. NIENT, AL&NBD, DFS with memory, roulette wheel selection, elitism, two point crossover, Pm: 0.01, Pc: 1.0. F. NIENT, AL&NBD, DFS with memory, binary tournament, elitism, two point crossover, Pm: 0.01, Pc: 1.0. G. NIENT, AL&NBD, DFS with memory, binary tournament, elitism, uniform crossover, Pm: 0.01, Pc: 1.0. H. Within (F), different tournament sizes. I. Within (F), different mutation probabilities. J. (F) with fitness computation as in McMinn (2004). In order to ensure that our strategy for fitness computation is no worse than the minimization strategy of McMinn et al. (2008), we experiment with the settings in (F) but with fitness computed as described in McMinn (2004). K. (F) without the approximation level in the fitness computation process. Since DFS, memory and elitism should ensure that there are individuals in the current population passing through the sibling branch of the selected target, it should be possible to eliminate the use of the approximation level in fitness computation for those individuals that lead to the traversal of the sibling branch of the target. However, it should be noted that the approximation level would be required for those individuals in the current population that take some critical branch away from the target branch's predicate node. As can be seen, the experimental combinations above include all the settings in a comparative way. (A) and (B) compare IENT and NIENT, and NIENT is selected for further experiments. In (C) DFS is included, in (D) memory is included and in (E) elitism is taken into account and the fitness function is also modified to include the approximation level.
Considering DFS with memory and elitism, in (F) the selection scheme is changed to tournament selection and in (G) the crossover scheme is changed to uniform crossover. (H) compares different tournament sizes and (I) different mutation probabilities with the settings of experiment (F). With (H) we build a case for selecting the tournament size as two, and with (I) for selecting Pm as 0.01. In an attempt to mitigate the effects of random variation and to emphasize the need for memory and elitism, the following experiments were further carried out: L. (F) without memory, i.e., NIENT, AL&NBD, DFS, binary tournament, elitism, two point crossover, Pm: 0.01, Pc: 1.0. M. (F) without elitism, i.e., NIENT, AL&NBD, DFS with memory, binary tournament, two point crossover, Pm: 0.01, Pc: 1.0. N. (F) without memory and without elitism. O. (F) without memory, without elitism, and with all branches selected as targets one by one. Further, a steady state GA was also considered. P. Steady state GA: Ninety percent of the fittest individuals in the current population popcur are copied to the next generation. The remaining ten percent of the next generation population is constructed as follows. A pool of 25% of the fittest individuals from
Fig. 6. Plots of population size vs mean number of generations for (A) to (K).

Table 2
Summary of Tukey test for mean number of generations for (A) to (K).
Combination  Summary of Tukey test
D            Significantly different from A and C for population size 10, from A and B for population size 20, and from A, B and C for all remaining population sizes
E            Significantly different from A, B and C for all population sizes
F            Significantly different from A, B and C for all population sizes
popcur is selected, crossed over and mutated, and out of these the fittest individuals are selected to make up the ten percent of the next generation population. In the steady state GA, memory and elitism are not used. The remaining settings are as in (F). 5.2.2. Preliminary study: setup For each combination (A) to (P) and for each population size, one hundred experiments were carried out and the following statistics were collected: • Mean number of generations. The maximum number of generations for the experiments was taken to be 10^7. Accordingly, the termination criterion for each experiment is either full branch coverage or 10^7 generations, whichever occurs earlier. It is possible that full branch coverage is not achieved even after 10^7 generations. The mean number of generations thus does not tell us whether full branch coverage is achieved; hence the second statistic. • Mean percentage coverage achieved. An ANOVA test was carried out using SYSTAT 9.0 to determine if there is a significant difference among the combinations for each population size. Additionally, the Tukey test was carried out to compare pairs of means for (A) to (F) for significance. The results of the experiments are described in the next section. All the programs for the preliminary study and the main experiments were implemented in 'C'. 5.2.3. Preliminary study: results and discussion Fig. 6 shows the mean number of generations for combinations (A) to (G) and (K). Fig. 7 shows the mean percentage coverage for different population sizes and Table 2 summarizes the results of the
Tukey test for mean number of generations. In Fig. 6 we can see that the best results are obtained for (F), and Fig. 7 shows that full (100%) coverage is achieved for all experiments and for all population sizes for (F). The Tukey test (Table 2) and Fig. 6 show that the mean number of generations obtained for (F) is significantly lower than for (A), (B) and (C). The difference between (E) and (F) is in the selection scheme and between (F) and (G) in the crossover scheme: (F) uses binary tournament selection and (E) roulette wheel selection, whereas in (G) uniform crossover is used as opposed to two point crossover in (F). This shows that binary tournament, memory, elitism and two point crossover together improve the performance of a GA. Performance degrades, though not significantly as shown by the Tukey test, as two point crossover is replaced with uniform crossover. This could be because uniform crossover is more disruptive. Furthermore, performance also degrades, though not significantly as shown by the Tukey test, as tournament selection is replaced with roulette wheel selection keeping the other settings the same. This could be attributed to the weaker selection pressure of roulette wheel selection. In the case of (A) to (D) the mean coverage is less than 100% for lower populations. This changes to 100% with (E), which includes elitism. With elitism, individuals that traverse the sibling branch are preserved and not lost to crossover and mutation. It was argued that since DFS, memory and elitism should ensure that there are individuals in the current population passing through the sibling branch of the selected target, it should be possible to eliminate the use of the approximation level in fitness computation for those individuals that lead to the traversal of the sibling branch of the target.
With (K) the approximation level component was eliminated completely, even though it is understood that it would be required for those individuals in the current population that take some critical branch away from the target branch's predicate node. From Figs. 6 and 7 it is observable that the performance of (K) is comparable to that of (F) and that there is no significant difference in means for different population sizes, as can be seen in Table 3. We, however, retain the approximation level component for fitness computation in our main experiments.
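The AL&NBD fitness used from (E) onward combines the approximation level with a normalized branch distance (Section 3.2). The following sketch shows only the general shape of such a function; the normalization constant (1.001) and the distance rules are common choices from the search-based testing literature, not necessarily the paper's exact definitions:

```python
def normalized_branch_distance(d: float) -> float:
    # Map a raw branch distance d >= 0 into [0, 1); 1 - 1.001**(-d)
    # is one commonly used normalization (assumption, see lead-in).
    return 1.0 - 1.001 ** (-d)

def fitness(approach_level: int, branch_distance: float) -> float:
    # Lower is better. The approach level counts the critical branching
    # nodes between the point where execution diverged and the target;
    # the normalized branch distance breaks ties at the divergence node.
    return approach_level + normalized_branch_distance(branch_distance)
```

For example, for a predicate `x == y` the raw branch distance would typically be |x − y|, so inputs that nearly satisfy the equality receive a fitness close to the approach level alone.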
Fig. 7. Plots of population size vs mean percentage coverage for (A) to (K).

Fig. 8. Plots of population size vs mean number of generations for (F) to (P).
Table 3
Results of ANOVA for mean number of generations for (F) and (K). Benchmark program: preliminary study.
Population size:  6      10     16     20     26     30     46     60     76     90     100    110
F value:          0.06   3.533  0.385  0.313  0.565  7.172  0.896  0.821  0.744  0.698  2.618  0.126
p value:          0.807  0.062  0.535  0.576  0.453  0.008  0.345  0.366  0.389  0.404  0.107  0.723
Fig. 8 presents the mean number of generations for combinations (F) to (P), Fig. 9 shows the mean percentage coverage for different population sizes, Table 4 shows the results of ANOVA and Table 5 summarizes the results of the Tukey test for mean number of generations. It can be seen in Fig. 8 that the best results are obtained for (F), which includes branch ordering, memory and elitism. Without memory (L), the performance degrades, though not significantly as indicated by the Tukey test. However, without elitism, the performance degrades significantly, as indicated by both the ANOVA and Tukey tests. The Tukey test indicates that (F) is significantly different from (M), (N) and (O). Full coverage is achieved with (F) for all population sizes in all experiments. For (N), (O) and (P) this is not achieved for any population size. The poor performance of (P) can be attributed to the fact that much of the population does not undergo change and the small percentage of individuals that are replaced may be less fit than the parents. Experiments (F), (L), (M), (N) and (O) have helped frame the research questions. (F) includes branch ordering, elitism and memory. On removing memory, i.e., case (L), performance does not degrade significantly in terms of mean generations and mean percentage coverage. However, on removing elitism, case (M), and on removing both memory and elitism, case (N), performance degrades significantly in comparison with (F). As observed in Section 4.2, crossover and mutation may lead to loss of individuals from the parent population whose execution paths cover the predicate node of the target branch, and it may thus be necessary to preserve the fit individuals from one generation to the next. This may also be a reason why memory alone may not help in improving performance and, further, branch ordering alone may also not improve performance significantly. The latter is also apparent when (N) is compared with (F) and (L). The research questions outlined in Section 5.1.1 thus consider increments in features (branch ordering, elitism and memory), as opposed to each feature individually. Further, Research Question Three in Section 5.1.1 considers the role of different branch ordering schemes. The Appendix details the outcome of the preliminary study on the following: (i) a comparison of (F) and (J), (ii) a comparison of different tournament sizes, i.e., (H), and (iii) selection of an appropriate value for the mutation probability Pm.

Fig. 9. Graph of population size vs mean percentage coverage for (F) to (P).

5.3. Study results
In this section we present the results of our main experiments on the benchmark programs. Section 5.3.1 describes the experimental approach; Sections 5.3.2, 5.3.3 and 5.3.4 present the results and discussion for research questions one, two and three respectively. 5.3.1. Approach In order to answer the research questions listed in Section 5.1.1, the following approach has been taken: i. The scheme in which branches are selected randomly for coverage and there is no memory and no elitism is taken as the baseline for comparison. This scheme is referred to as RAN hereafter. ii. To answer Research Question One, RAN is compared with strategy RNS. RNS includes elitism and memory but branches are selected randomly. iii. To answer Research Question Two, elitism and memory are combined with each branch ordering strategy and compared with RNS and RAN. The resulting schemes are named after the branch ordering technique. These are: a. DFS – considers Depth First Strategy together with memory and elitism
Table 4
Results of ANOVA for mean number of generations for (F) to (P). Benchmark program: preliminary study. ANOVA test (F, L, M, N, O, P).
Population size:  6       10      16      20      26    30     46     60     76     90    100    110
F value:          586.91  312.04  166.89  113.75  85.7  77.67  55.14  44.68  36.86  33.8  36.01  36.34
p value:          0.0     0.0     0.0     0.0     0.0   0.0    0.0    0.0    0.0    0.0   0.0    0.0
Table 5
Summary of Tukey test for mean number of generations for (F) to (P).
Combination  Summary of Tukey test
F            Significantly different from M, N, O and P for all population sizes
L            Significantly different from M, N, O and P for all population sizes
P            Significantly different from F, L, M, N and O for all population sizes
Table 6
GA parameter and operator settings.
1.  Population size                  6, 10, 16, 20, 26, . . ., 110
2.  Crossover type                   Two point crossover
3.  Crossover probability            1.0
4.  Mutation probability             0.01
5.  Selection method                 Binary tournament
6.  Branch ordering scheme           DFS, BFS, PPS, RNS, RAN
7.  Fitness function                 AL&NBD, approximation level with normalized branch distance
8.  Population initialization        Initialize once at the beginning of the GA run
9.  Population replacement strategy  Elitism with up to 10% carry forward
10. Maximum number of generations    10^7
11. Memory                           Yes
b. BFS – considers Breadth First Strategy together with memory and elitism c. PPS – considers Path Prefix Strategy together with memory and elitism iv. To answer Research Question Three, schemes DFS, BFS and PPS are compared.
The parameter and operator settings for the GA were obtained from the preliminary study described in Section 5.2 and are detailed in Table 6. For each scheme and population size, one hundred experiments (called runs hereafter) were carried out and the following statistics were collected, as in the case of the preliminary study: • Mean number of generations. It may be noted that the termination criterion for each experiment is either full branch coverage or 10^7 generations, whichever occurs earlier. The number of generations to termination over the hundred runs is used to compute the mean. The mean does not tell us if all the branches were covered. • Mean percentage coverage achieved. ANOVA was carried out using SYSTAT 9.0 to determine if there is a significant difference in means and, additionally, the Tukey test was carried out to compare pairs of mean numbers of generations for significance. It may be noted that in all the runs, a branch is marked as covered the first time it is traversed, even if it is not the target. Branches covered in some previous generation are not selected as targets subsequently.
5.3.2. Research Question One Research Question One: Does elitism together with memory improve test data generation performance? The study compares RAN and RNS. 5.3.2.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. Each figure presents the following: A. Mean number of generations for each population size. B. Mean percentage coverage achieved. This is omitted if full branch coverage is achieved in all runs. C. Tables showing the results of ANOVA: F and p values for significance in the difference of means for different branch orderings. An alpha (α) value of 0.05, i.e., 95% confidence, is considered for all discussions on significance in the difference of means. It may be noted that Figs. 10–17 present the results for all the research questions and will be referred to in all the subsequent sections. Table 7 presents the results of the Tukey test. The column of interest in Table 7 is RNS.
5.3.2.2. Discussion. We can see from Table 7 that the results of RNS are not always significant. However, as can be observed in the graphs in Figs. 11–17, the results are improved over RAN both in terms of mean number of generations and mean percentage coverage. The observed results may be explained as follows. With both RAN and RNS it is possible that when the target branch is selected, no individual is present in the population for which the execution path covers the predicate node of the target branch. Further, with RAN, crossover and mutation may lead to loss of individuals from the parent population whose execution path does cover the predicate node of the target branch. This is circumvented in RNS.
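The interplay of elitism and memory described above can be sketched as one generational step; the function and data-structure names below are ours and purely illustrative, not the authors' implementation:

```python
def next_generation(pop, fitness, breed, memory, target, elite_frac=0.10):
    """One GA step with elitism and a branch-indexed memory (illustrative).

    pop      : list of individuals (encoded test inputs)
    fitness  : individual -> cost for `target` (lower is better)
    breed    : pool -> offspring via selection/crossover/mutation
    memory   : dict mapping a branch to individuals that reached its
               predicate node in earlier generations
    """
    # Seed from memory: if the target's predicate node was reached
    # before, reinsert those individuals so search does not restart
    # from scratch when the target changes.
    seeds = memory.get(target, [])
    pool = pop + seeds

    # Elitism: carry up to elite_frac of the fittest forward unchanged,
    # so crossover/mutation cannot lose them (cf. up to 10% in Table 6).
    pool.sort(key=fitness)
    n_elite = max(1, int(elite_frac * len(pop)))
    elites = pool[:n_elite]

    # Fill the rest of the fixed-size population with offspring.
    offspring = breed(pool)[: len(pop) - n_elite]
    return elites + offspring
```

With RNS both mechanisms are active while targets are still chosen at random, which is why an individual reaching the target's predicate node, once found, is never lost.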
Fig. 10. Complex branch program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10     20     30     46     60     76     90     100    110
DFS, BFS, PPS, RNS – F:      62.21   28.54  17.24  13.31  12.56  10.67  14.51  9.323  7.022  7.387
DFS, BFS, PPS, RNS – p:      0.0     0.0    0.0    0.0    0.0    0.001  0.003  0.003  0.009  0.007
DFS, BFS, PPS, RNS, RAN – F: 32.526  14.05  8.562  6.2    5.697  7.209  6.322  5.371  7.137  –
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0    0.0    0.002  0.004  0.001  0.002  0.005  0.001  –
Fig. 11. Date program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6        10       20        30       46        60         76       90       100      110
DFS, BFS, PPS, RNS – F:      25.39    20.08    15.01     5.651    21.031    27.16      22.31    24.69    18.477   12.83
DFS, BFS, PPS, RNS – p:      0.0      0.0      0.0       0.004    0.0       0.0        0.0      0.0      0.0      0.0
DFS, BFS, PPS, RNS, RAN – F: 116.826  359.492  1107.523  252.791  2839.327  10948.314  2.9E+09  7.9E+09  5.6E+09  3.4E+09
DFS, BFS, PPS, RNS, RAN – p: 0.0      0.0      0.0       0.0      0.0       0.0        0.0      0.0      0.0      0.0
Fig. 12. Michael triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10      20      30      46       60      76      90      100     110
DFS, BFS, PPS, RNS – F:      143.5   48.51   19.25   6.533   1.84     2.397   2.197   3.18    4.46    2.351
DFS, BFS, PPS, RNS – p:      0.0     0.0     0.0     0.0     0.139    0.068   0.088   0.024   0.004   0.072
DFS, BFS, PPS, RNS, RAN – F: 432.41  446.81  721.16  1387.1  1567.43  3897.9  4265.4  2124.7  8155.8  11157.1
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0     0.0     0.0     0.0      0.0     0.0     0.0     0.0     0.0
Table 7
Summary of Tukey test: RAN vs (DFS, BFS, PPS and RNS). Is the difference in mean number of generations with RAN significant for all populations?
Benchmark program  DFS  BFS  PPS                                     RNS
Complex-Branch     No   No   Yes                                     No
Date               Yes  Yes  Yes                                     No
Michael Triangle   Yes  Yes  Yes                                     Yes
Myers Triangle     Yes  Yes  Yes                                     Yes
Calday             No   No   Yes, for population sizes from 6 to 46  No
Rectangle          Yes  Yes  Yes                                     Yes
Sthamer Triangle   Yes  Yes  Yes                                     Yes
Wegener Triangle   Yes  Yes  Yes                                     Yes
Fig. 13. Myers triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10     20     30      46      60      76      90      100     110
DFS, BFS, PPS, RNS – F:      6.504   6.148  13.12  10.39   10.60   8.166   4.657   3.302   1.335   1.28
DFS, BFS, PPS, RNS – p:      0.0     0.0    0.0    0.0     0.0     0.0     0.003   0.02    0.263   0.281
DFS, BFS, PPS, RNS, RAN – F: 1737.3  165.5  133.8  881.76  396.27  226.35  179.57  103.47  106.93  86.41
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0    0.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0
5.3.3. Research Question Two Research Question Two: Does branch ordering together with elitism and memory improve test data generation performance? The study compares each of RAN and RNS with each of BFS, DFS and PPS.
5.3.3.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. Table 7 compares RAN with each of BFS, DFS and PPS, and Table 8 compares RNS with each of DFS, BFS and PPS. Table 8 presents the results in two parts. Part (A) addresses the question "is the difference in mean number of generations with RNS significant for all populations?" and part (B) addresses the question "is the mean number of generations for (DFS, BFS, PPS) lower than RNS for all population sizes?" Even though the answer to part (A) for a particular benchmark may not be significant, we still need to see if there has been an improvement. This is taken care of in part (B).
5.3.3.2. Discussion. It can be seen from Tables 7 and 8 that
a. Results with PPS are significantly improved and different from RAN and RNS for almost all the subjects of study. b. There is an improvement in the performance in mean number of generations of DFS and BFS over RNS for most benchmark programs.
Fig. 14. Calday program: plot of population size vs GA mean number of generations for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6      10     20     30     46     60     76     90     100    110
DFS, BFS, PPS, RNS – F:      10.4   7.981  5.057  2.468  2.486  1.142  1.708  0.612  0.685  0.695
DFS, BFS, PPS, RNS – p:      0.0    0.0    0.002  0.062  0.06   0.332  0.165  0.607  0.562  0.555
DFS, BFS, PPS, RNS, RAN – F: 6.609  22.14  7.375  3.19   3.673  1.266  0.976  1.007  1.328  1.159
DFS, BFS, PPS, RNS, RAN – p: 0.0    0.0    0.0    0.013  0.006  0.299  0.42   0.403  0.258  0.328
Fig. 15. Rectangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6        10      20      30      46      60      76       90      100      110
DFS, BFS, PPS, RNS – F:      5.03     4.569   5.448   5.595   5.789   1.921   5.154    3.08    9.484    0.698
DFS, BFS, PPS, RNS – p:      0.002    0.004   0.001   0.001   0.001   0.126   0.002    0.027   0.0      0.554
DFS, BFS, PPS, RNS, RAN – F: 195.424  71.192  53.283  62.578  65.924  161.71  168.618  99.517  105.366  215.6
DFS, BFS, PPS, RNS, RAN – p: 0.0      0.0     0.0     0.0     0.0     0.0     0.0      0.0     0.0      0.0
c. The difference in mean number of generations for DFS, BFS and RNS only is not always significant, as is evident in Figs. 10–17 and Table 8. This is more so as the population size increases. A possible explanation for this is provided in Section 5.3.4.2. d. For all benchmark programs and population sizes, the results of RAN are extremely poor, especially in terms of mean number of generations. For most benchmark programs, with the exception of Complex-branch and Calday, the difference is also significant with each of DFS, BFS and PPS. This can be seen in Table 7. With PPS the difference is significant for all programs, except Calday, and for all population sizes. For Calday, the difference in means between RAN and PPS is significant for small population sizes. This indicates that the strategies of elitism and memory together with branch ordering may have significantly contributed to improving GA performance.
5.3.4. Research Question Three Research Question Three: Does the choice of a particular branch ordering scheme affect test data generation performance? The study compares PPS with each of DFS and BFS.
5.3.4.1. Results. Results for the different benchmark programs are presented in separate figures, Figs. 10–17. The results of the Tukey test are summarized in Table 9.

Fig. 16. Sthamer triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10       20       30       46       60       76      90      100     110
DFS, BFS, PPS, RNS – F:      20.74   12.88    7.745    15.75    2.479    3.913    3.661   1.276   3.018   1.359
DFS, BFS, PPS, RNS – p:      0.0     0.0      0.0      0.0      0.061    0.009    0.013   0.282   0.03    0.255
DFS, BFS, PPS, RNS, RAN – F: 85.468  124.953  114.495  122.949  137.851  143.799  95.223  79.395  94.922  78.036
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0      0.0      0.0      0.0      0.0      0.0     0.0     0.0     0.0
Fig. 17. Wegener triangle program: plots of population size vs GA mean number of generations and mean percentage coverage for DFS, BFS, PPS, RNS and RAN; associated ANOVA results below.
Population size:             6       10      20      30      46     60     76      90      110
DFS, BFS, PPS, RNS – F:      43.836  9.798   11.636  5.242   3.816  0.773  1.203   0.546   3.118
DFS, BFS, PPS, RNS – p:      0.0     0.002   0.001   0.023   0.052  0.38   0.274   0.461   0.079
DFS, BFS, PPS, RNS, RAN – F: 91.754  27.465  16.619  18.248  9.124  12.54  17.766  27.099  22.684
DFS, BFS, PPS, RNS, RAN – p: 0.0     0.0     0.0     0.0     0.0    0.0    0.0     0.0     0.0
5.3.4.2. Discussion. It can be seen in Table 9 that PPS gives the best results, though they are not always significant when compared with DFS and BFS. Further, considering Figs. 10–17, it can be observed that a. The mean number of generations with PPS is the least for almost all population sizes and for all benchmark programs. Possible reasons are discussed in point (b) below. For α = 0.05, a significant difference is observable with smaller populations, up to 30, for most programs. Hence, it may possibly be concluded that larger population sizes may prove to be beneficial in achieving better performance.
b. For small populations, in particular with sizes 6, 10, 16 and 20, full branch coverage is not achieved for most of the benchmark programs with DFS, BFS and RNS. This is also the reason for the very high mean number of generations for these population sizes, as the GA terminates only after 10^7 generations when branch coverage is not achieved. However, full coverage is achieved with PPS for all population sizes for all benchmark programs, with the exception of the rectangle program and of population size 6 for the Sthamer triangle. In PPS, a new branch is usually chosen as a target taking into consideration prefixes of paths traversed in the current generation. As a result, individuals are usually present in the
Table 8
Summary of Tukey test: RNS vs (DFS, BFS, PPS). (A) Is the difference in mean number of generations with RNS significant for all populations? (B) Is the mean number of generations for (DFS, BFS, PPS) lower than RNS for all population sizes?
Complex-Branch
  A: DFS – No; BFS – No; PPS – Yes
  B: DFS – Yes, from population size 50–80, 106–110; BFS – Yes, from population size 10–90, 110; PPS – Yes
Date
  A: DFS – Yes; BFS – Yes; PPS – Yes
  B: DFS – Yes; BFS – Yes; PPS – Yes
Michael Triangle
  A: DFS – Yes, from population size 6–36; BFS – Yes, from population size 6–36; PPS – Yes, from population size 6–36
  B: DFS – Yes, from population sizes 6–40, 50–100, 110; BFS – Yes, from population sizes 6–36, 50–60, 70–96, 106–110; PPS – Yes
Myers Triangle
  A: DFS – No; BFS – No; PPS – Yes, for population size 6
  B: DFS – No; BFS – Yes, for population sizes 6, 10, 106; PPS – Yes, from population sizes 6–96, 106–110
Calday
  A: DFS – No; BFS – No; PPS – No
  B: DFS – Yes, from population sizes 10, 20–30, 40–56, 66–76, 90, 106; BFS – Yes, from population sizes 6–10, 20, 36, 46–60, 70–76, 86, 100–106; PPS – Yes, from population sizes 6–90, 106–110
Rectangle
  A: DFS – No; BFS – No; PPS – Yes, for population sizes from 6–46
  B: DFS – Yes, from population sizes 6, 10–56, 66, 80–110; BFS – Yes, from population sizes 6–80, 90–110; PPS – Yes
Sthamer Triangle
  A: DFS – No; BFS – No; PPS – No
  B: DFS – Yes, for population sizes 80, 96, 100, 110; BFS – Yes, from population sizes 30, 56–110; PPS – Yes
Wegener Triangle
  A: DFS – No; BFS – No; PPS – Yes
  B: DFS – Yes, from population sizes 60–110; BFS – Yes, from population sizes 50, 60–110; PPS – Yes
Table 9. Summary of Tukey test: PPS vs (DFS, BFS). For each benchmark program: (A) Is the difference in mean number of generations with PPS significant for all populations? (B) Is the mean number of generations for PPS lower for all population sizes?

Complex-Branch
  DFS: A: Yes; B: Yes
  BFS: A: Yes; B: Yes
Date
  DFS: A: Yes; B: Yes
  BFS: A: Yes, for population size up to 30 for DFS and population size up to 50 for BFS, and for population sizes from 80–110; B: Yes
Michael Triangle
  DFS: A: No; B: Yes, for population sizes 16, 26–110
  BFS: A: No; B: Yes, from population sizes 10–110
Myers Triangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes, for population sizes 6–100, 110
Calday
  DFS: A: Yes, for population size 6; B: Yes, from population sizes 6–106
  BFS: A: Yes, for population size 6; B: Yes, from population sizes 6–90, 100–110
Rectangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes
Sthamer Triangle
  DFS: A: No; B: Yes
  BFS: A: No; B: Yes, from population sizes 6–86, 96, 110
Wegener Triangle
  DFS: A: Yes, for population size up to 56; B: Yes, from population sizes 6–80, 90–106
  BFS: A: Yes, for population size up to 56; B: Yes, from population sizes 6–56, 66–80, 90–96
code that leads to the traversal of the sibling branch of the target branch. With DFS and BFS this may not always be the case. Suppose the DFS (or BFS) branch sequence is b1, b2, ..., bn and, after target bi is covered, branches bi+1 to bj−1 are skipped because they have already been covered in previous generations, so that bj is selected as the next target. Then individuals that lead to the traversal of the sibling branch of bj may not be present in the population. These may have to be added from memory and, if they are not available, guidance through the fitness function may have to be obtained to cover bj. This could be a reason for the improved results with PPS. As expected, the mean number of generations is larger for smaller populations than for larger populations. Further observations specific to each benchmark are summarized in Tables 7 and 10. The chosen alpha (α) value is 0.05. From these observations it may be concluded that PPS is possibly the best scheme to adopt.

6. Threats to validity and limitations

Threats to validity have been described in detail by Barros and Neto (2011) in the context of search-based software engineering. We first consider threats to internal validity. As noted by Deb (1997), the main limitation of GA comes from the improper choice
of parameters such as population size, crossover and mutation probabilities, and selection pressure. The GA may not work with improper parameter settings. Arcuri and Fraser (2011) point out that "tuning does have a critical impact on algorithmic performance, and over-fitting of parameter tuning is a dire threat to external validity of empirical analyses in SBSE." In an attempt to resolve these problems, we have considered parameter and operator tuning in a comprehensive preliminary study and have also conducted experiments for different population sizes. In both the preliminary and the main study, a hundred experiments were carried out for each combination of parameter and feature settings and population size. Other issues, such as representation and the randomness of the numbers produced by the random number generator, may also affect performance. Representation has been discussed in Section 3.2, and the 'C' library random number generator functions srand() and rand() have been used in the experiments. Threats to construct validity may arise from the fact that we have measured performance using only the mean number of generations and the mean percentage coverage. Other measures, such as the number of fitness evaluations, may also be used; whether they lend a more precise meaning to performance needs to be evaluated. Threats to external validity may come from the choice of the subjects of study. Standard benchmark programs from Díaz et al. (2008) and Blanco et al. (2009) were chosen as the subjects of study.
Table 10. Comparison of results for each benchmark.

Complex-Branch program (Fig. 10)
  Mean generations: PPS is significantly lower for all population sizes.
  Mean percentage coverage: Full coverage is achieved even with small population sizes in PPS.
Date program (Fig. 11)
  Mean generations: There is a significant difference in means between RNS, BFS, DFS, and PPS. PPS is lowest.
  Mean percentage coverage: Full coverage is achieved even with small population sizes in PPS.
Michael Triangle program (Fig. 12)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for population sizes greater than 46.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
Myers Triangle program (Fig. 13)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for population sizes greater than 86.
  Mean percentage coverage: Except for RAN, full coverage is achieved for all branch orderings and all population sizes.
Calday program (Fig. 14)
  Mean generations: PPS is lower for small population sizes.
  Mean percentage coverage: Full coverage is achieved for all branch orderings.
Rectangle program (Fig. 15)
  Mean generations: PPS is lower for all population sizes less than 106.
  Mean percentage coverage: PPS gives comparatively better coverage for small populations (6 to 16) and full coverage thereafter.
Sthamer Triangle program (Fig. 16)
  Mean generations: PPS is lower for some population sizes. For large population sizes, the results are comparable to BFS.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
Wegener Triangle program (Fig. 17)
  Mean generations: Significant difference between BFS, DFS, PPS and RNS is not observable for all population sizes.
  Mean percentage coverage: PPS gives 100% coverage even with small population sizes.
7. Conclusion

Metaheuristic techniques, and in particular the genetic algorithm, have now been applied extensively to the problem of automated test data generation. However, as research points out, the application of metaheuristic techniques poses new challenges. One of the problems faced is that the population may not contain any individual that encodes test data for which the execution path reaches the predicate node of the target branch. In order to deal with this problem, in this paper, we (a) have introduced three approaches for ordering branches for selection as targets for coverage, namely, the depth first strategy, the breadth first strategy and the path prefix strategy, and (b) have considered elitism and memory along with branch ordering to improve test data generation performance. Extensive experiments have been carried out on standard benchmark programs with a genetic algorithm (GA) whose implementation has also been described in detail in this paper. A preliminary study was carried out to frame the research questions and fine-tune the GA parameters, which were then used in the final experiments. Results indicate that
• a scheme in which the population is initialized once at the beginning of the GA run, the fitness is computed using approximation level and normalized branch distance and is maximized, the branch ordering strategy is the path prefix strategy, memory and elitism are used, and the genetic algorithm uses binary tournament selection with two-point crossover, Pm = 0.01 and Pc = 1.0, gives the best performance in terms of mean number of generations and coverage.
• results for RAN indicate that the strategies of elitism, memory and branch ordering may have contributed significantly to improving the performance of the GA.
• better performance may be possible for programs with integer inputs, as compared to real inputs, and with large population sizes.
Further, experiments with real-world programs in large projects would provide additional empirical evidence on the potential of the suggested improvements.
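The fitness formulation named in the conclusion, approximation (approach) level plus normalized branch distance, can be sketched as follows. This is an illustrative minimization form of the widely used objective (the study itself maximizes an equivalent formulation); the normalization 1 − 1.001^(−d) follows Arcuri (2010), and all function and parameter names here are hypothetical, not taken from the authors' implementation.

```python
# Sketch of a branch-coverage fitness: approach level (number of control-
# dependent nodes at which execution diverged from the target) plus a branch
# distance normalized into [0, 1).  Minimized form: 0 means the branch was hit.

def branch_distance(lhs, rhs, op):
    """Distance-to-true of a relational predicate lhs <op> rhs (0 when true)."""
    k = 1.0  # failure constant added when the predicate is false
    if op == "==":
        return abs(lhs - rhs)
    if op == "<":
        return 0.0 if lhs < rhs else (lhs - rhs) + k
    if op == "<=":
        return 0.0 if lhs <= rhs else (lhs - rhs) + k
    raise ValueError("unsupported operator: " + op)

def normalize(d):
    """Map a branch distance d >= 0 into [0, 1) (Arcuri, 2010)."""
    return 1.0 - 1.001 ** (-d)

def fitness(approach_level, dist):
    """Combined objective; lower is better, 0 means target branch taken."""
    return approach_level + normalize(dist)
```

Because the normalized distance stays below 1, an individual that reaches a deeper control-dependent node always dominates one stuck at a shallower node, regardless of the raw distances.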
Appendix A.

The appendix details the outcomes of the preliminary study on the following: (i) a comparison of the maximization and minimization approaches to fitness function computation, i.e., cases (F) and (J) of Section 5.2.1, (ii) a comparison of different tournament sizes, i.e., case (H) of Section 5.2.1, and (iii) the selection of an appropriate setting for the mutation probability Pm. Fig. A.1 shows a comparison of mean generations between the maximization (F) and minimization (J) functions. As can be seen in Fig. A.1 and Table A.1, there is no significant difference in the mean number of generations over different population sizes (considering an alpha level of 0.05). We thus chose the scheme described in Section 3.2 for fitness computation in our experiments. Fig. A.2 presents the results of the experiments with different tournament sizes, i.e., (H). Even though the difference in means is not significant, it can be seen that the best results, even with small population sizes, are obtained with a tournament size of two. This size exerts the weakest selection pressure, which may be the reason for its better performance (Back et al., 1997). Although the difference in mean number of generations is not significant for large population sizes, taking the overall performance into account, tournament selection with tournament size two, i.e., binary tournament, is the preferred selection scheme.
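Tournament selection as compared above can be sketched in a few lines; a minimal illustration with hypothetical names, not the authors' implementation. With k = 2 (binary tournament) the fittest individual of the population is least likely to dominate every draw, which is why this setting exerts the weakest selection pressure.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k individuals uniformly at random and return the fittest.

    fitness[i] is the (to-be-maximized) fitness of population[i];
    k = 2 gives binary tournament, the weakest selection pressure."""
    contenders = [rng.randrange(len(population)) for _ in range(k)]
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]
```

A full GA generation would call this once per offspring slot before applying crossover and mutation to the selected parents.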
The programs have a number of features that make them suitable for testing different approaches to test data generation. However, they are not real-world programs, which may be a limitation, though whether such programs would have led to different conclusions is a matter for further investigation. The considerations in eliminating threats to validity in this study may well be its limitations also.

Fig. A.1. Plots of population size vs mean number of generations for (F) and (J).
Acknowledgements

This work was supported by the UGC Major Project Grant F.No. 36-70/2008 (SR), for which the authors are thankful. The authors are also extremely grateful to the anonymous referees who have helped shape this paper to its present form with their insightful comments and guidance.

(Series in Fig. A.2: tournament selection of sizes 2, 3 and 4, and pool-tournament.)
Fig. A.2. Plots of population size vs mean number of generations for different tournament sizes.
Table A.1. Results of ANOVA for the mean number of generations for (F) and (J) (benchmark: preliminary study program).

Population size    F value    p value
6                  0          0.983
10                 2.37       0.125
16                 0          0.986
20                 0.006      0.938
26                 0.58       0.447
30                 0.16       0.689
46                 0.12       0.725
60                 0.66       0.419
76                 0.15       0.701
90                 0.16       0.691
100                0.04       0.835
110                0.46       0.499
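The per-population-size F values in Table A.1 come from a one-way ANOVA over the (F) and (J) samples. The F statistic itself can be computed with a short self-contained sketch (the p value additionally requires the F-distribution CDF, e.g. from a statistics library, and is omitted here); this is illustrative and not the authors' actual tooling.

```python
def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, for two or more groups of observations."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # sum of squares between groups, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # sum of squared deviations of each value from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For two groups with identical means (as for several population sizes in Table A.1) the between-group sum of squares is zero, so F = 0.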
Fig. A.3. Plot of mutation rates vs mean number of generations for population size 30.
Experiments were also carried out to determine an appropriate setting for the mutation probability Pm. Fig. A.3 shows the effect of the mutation probability on the number of generations for population size 30. It can be seen that the minimum number of generations to achieve coverage is obtained with Pm = 0.01. Since higher mutation probabilities increasingly disrupt individuals, the number of generations increases as Pm increases.

References

Agarwal, K., Srivastava, G., 2010. Towards software test data generation using discrete quantum particle swarm optimization. In: Proceedings of the 3rd India Software Engineering Conference (ISEC'10), ACM, New York, NY, USA.
Ahmed, M.A., Hermadi, I., 2008. GA-based multiple paths test data generator. Computers & Operations Research 35 (10), 3107–3124.
Ali, S., Briand, L.C., Hemmati, H., Panesar-Walawege, R.K., 2010. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering 36 (6), 742–762.
Andreou, A.S., Economides, K.A., Sofokleous, A.A., 2007. An automatic software test-data generation scheme based on data flow criteria and genetic algorithms. In: Proceedings of the 7th IEEE International Conference on Computer and Information Technology, Fukushima, Japan (CIT '07), pp. 867–872.
Arcuri, A., Fraser, G., 2011. On parameter tuning in search based software engineering. In: International Symposium on Search Based Software Engineering (SSBSE).
Arcuri, A., Yao, X., 2007. A memetic algorithm for test data generation of object-oriented software. In: Congress on Evolutionary Computation (CEC).
Arcuri, A., 2010. It does matter how you normalize the branch distance in search based software testing. In: Third International Conference on Software Testing, Verification and Validation (ICST'2010), pp. 205–214.
Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), 1997. Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd., Bristol, UK.
Baresel, A., Sthamer, H., Schmidt, M., 2002. Fitness function design to improve evolutionary structural testing. In: Langdon, W.B., Cantú-Paz, E., Mathias, K., Roy, R., Davis, D., Poli, R., Balakrishnan, K., Honavar, V., Rudolph, G., Wegener, J., Bull, L., Potter, M.A., Schultz, A.C., Miller, J.F., Burke, E., Jonoska, N. (Eds.), Proceedings of the 2002 Conference on Genetic and Evolutionary Computation (GECCO'02). Morgan Kaufmann Publishers, New York, USA, pp. 1329–1336.
Barros, M., Neto, A.C.D., 2011. Threats to Validity in Search-based Software Engineering Empirical Studies. Relatórios Técnicos do DIA/UNIRIO, No. 0006/2011. Universidade Federal Do Estado Do Rio De Janeiro.
Beizer, B., 2002. Software Testing Techniques, 2nd ed. Dreamtech Press, New Delhi, India.
Bertolino, A., 2007. Software testing research: achievements, challenges, dreams. In: Future of Software Engineering (FOSE '07), pp. 85–103.
Blanco, R., Tuya, J., Díaz, B.A., 2009. Automated test data generation using a scatter search approach. Information and Software Technology 51 (4), 708–720.
Blanco, R., Tuya, J., Díaz, E., Díaz, B.A., 2007. A scatter search approach for automated branch coverage in software testing. International Journal of Engineering Intelligent Systems (EIS) 15 (3), 135–142.
Castro, L.N., Von Zuben, F.J., 2002. Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation 6 (3), 239–251.
Chen, Y., Zhong, Y., 2008. Automatic path oriented test data generation using a multi population genetic algorithm. In: Proceedings of the 4th International Conference on Natural Computation, Jinan, China (ICNC'08), pp. 565–570.
Chen, Y., Zhong, Y., Shi, T., Liu, J., 2009. Comparison of two fitness functions for GA-based path-oriented test data generation. In: Proceedings of the 2009 Fifth International Conference on Natural Computation, vol.
4, Washington, DC, USA (ICNC ‘09), pp. 177–181. Díaz, E., Tuya, J., Blanco, R., 2003. Automated software testing using a metaheuristic technique based on tabu search. In: Proceedings of the 18th IEEE International
Conference on Automated Software Engineering, Montreal, Canada (ASE'03), pp. 310–313.
Díaz, E., Tuya, J., Blanco, R., Dolado, J.J., 2008. A tabu search algorithm for structural software testing. Computers & Operations Research 35 (10), 3052–3072.
Deb, K., 1997. Limitations of evolutionary computation methods. In: Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd. and Oxford University Press, Bristol, UK, pp. B2.9:1–B2.9:2.
Ferguson, R., Korel, B., 1996. The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology 5 (1), 63–86.
Ghani, K., Clark, J., 2009. Automatic test data generation for multiple condition and MCDC coverage. In: Proceedings of the 2009 Fourth International Conference on Software Engineering Advances (ICSEA '09), Washington, DC, USA. IEEE Computer Society, pp. 152–157.
Girgis, M.R., 2005. Automatic test data generation for data flow testing using a genetic algorithm. Journal of Universal Computer Science 11 (6), 898–915.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Pearson Education, Delhi, India.
Gross, H., Kruse, P., Wegener, J., Vos, T., 2009. Evolutionary white-box software test with the EvoTest framework: a progress report. In: Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops (ICSTW '09). IEEE Computer Society, Washington, DC, USA, pp. 111–120.
Harman, M., 2007. The current state and future of search based software engineering. In: 2007 Future of Software Engineering (FOSE '07). IEEE Computer Society, Washington, DC, USA, pp. 342–357.
Harman, M., 2008. Testability transformation for search-based testing. In: Keynote of the 1st International Workshop on Search-Based Software Testing (SBST) in Conjunction with ICST 2008, Lillehammer, Norway.
Harman, M., Mansouri, A., 2010.
Search based software engineering: introduction to the special issue. IEEE Transactions on Software Engineering 36 (6), 737–741.
Harman, M., McMinn, P., 2010. A theoretical and empirical study of search-based testing: local, global, and hybrid search. IEEE Transactions on Software Engineering 36 (2), 226–247.
Harman, M., Mansouri, A., Zhang, Y., 2009. Search based software engineering: a comprehensive analysis and review of trends, techniques and applications. Technical Report TR-09-03, Department of Computer Science, King's College London.
Jones, B.F., Eyres, D.E., Sthamer, H.-H., 1998. A strategy for using genetic algorithms to automate branch and fault-based testing. Computer Journal 41 (2), 98–107.
Jones, B.F., Sthamer, H.-H., Eyres, D.E., 1996. Automatic structural testing using genetic algorithms. Software Engineering Journal 11 (5), 299–306.
Korel, B., 1990. Automated software test data generation. IEEE Transactions on Software Engineering 16 (8), 870–879.
Li, H., Lam, C.P., 2005. Software test data generation using ant colony optimization. In: Proceedings of World Academy of Science, Engineering and Technology.
Liaskos, K., Roper, M., 2008. Hybridizing evolutionary testing with artificial immune systems and local search. In: Proceedings of the 1st International Workshop on Search-Based Software Testing (SBST) in Conjunction with ICST 2008, Lillehammer, Norway, pp. 211–220.
McMinn, P., Holcombe, M., 2006. Evolutionary testing using an extended chaining approach. Evolutionary Computation 14 (1), 41–64.
McMinn, P., 2004. Search-based software test data generation: a survey. Software Testing, Verification and Reliability 14 (2), 105–156.
McMinn, P., 2011. Search-based software testing: past, present and future. In: Fourth International Conference on Software Testing, Verification and Validation Workshops (ICSTW'2011), pp. 153–163.
McMinn, P., Binkley, D., Harman, M., 2008. Empirical evaluation of a nesting testability transformation for evolutionary testing.
ACM Transactions on Software Engineering and Methodology.
Mehrmand, A., 2009. A Factorial Experiment on Scalability of Search-based Software Testing. Master's Thesis, Thesis Number: MSE-2009:20, Blekinge Institute of Technology, Sweden.
Michael, C., McGraw, G., Schatz, M., 2001. Generating software test data by evolution. IEEE Transactions on Software Engineering 27, 1085–1110.
Michael, C.C., McGraw, G.E., Schatz, M.A., Walton, C.C., 1997. Genetic algorithms for dynamic test data generation. In: Proceedings of the 12th IEEE International Conference on Automated Software Engineering, Incline Village, NV, USA. IEEE Computer Society.
Michalewicz, Z., 1996. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin, Heidelberg.
Pachauri, A., Gursaran, 2012. Comparative evaluation of a maximization and minimization approach for test data generation with genetic algorithm and binary particle swarm optimization. International Journal of Software Engineering & Applications (IJSEA) 3 (1), 443–454.
Pargas, R.P., Harrold, M.J., Peck, R.R., 1999. Test-data generation using genetic algorithms. The Journal of Software Testing, Verification and Reliability 9 (4), 263–282.
Prather, R.E., Myers, J.P., 1987. The path prefix software testing strategy. IEEE Transactions on Software Engineering 13 (7), 761–766.
Radcliffe, N.J., 1997. Introduction: theoretical foundations and properties of evolutionary computations. In: Back, T., Fogel, D.B., Michalewicz, Z. (Eds.), Handbook of Evolutionary Computation, 1st ed. IOP Publ. Ltd. and Oxford University Press, Bristol, UK, pp. B2.1:1–B2.1:7.
Tan, X.B., Longxin, C., Xiumei, X., 2009. Test data generation using annealing immune genetic algorithm. In: Fifth International Joint Conference on INC, IMS and IDC, 2009, NCM '09, pp. 344–348.
Tracey, N., Clark, J., Mander, K., 1998. Automated program flaw finding using simulated annealing. In: Tracz, W. (Ed.), Proceedings of the 1998 ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM Press, New York, pp. 73–81.
Wang, Y., Bai, Z., Zhang, M., Du, W., Qin, Y., Liu, X., 2008. Fitness calculation approach for the switch case construct in evolutionary testing. In: Keijzer, M. (Ed.), Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (GECCO'08), Atlanta, GA, USA, pp. 1767–1774.
Wegener, J., Baresel, A., Sthamer, H., 2001. Evolutionary test environment for automatic structural testing. Information and Software Technology, Special Issue on Software Engineering using Metaheuristic Innovative Algorithms 43 (14), 841–854.
Wegener, J., Buhr, K., Pohlheim, H., 2002. Automatic test data generation for structural testing of embedded software systems by evolutionary testing. In: Proceedings of the 2002 Genetic and Evolutionary Computation Conference (GECCO'02). Morgan Kaufmann Publishers Inc., New York, USA, pp. 1233–1240.
Windisch, A., Wappler, S., Wegener, J., 2007. Applying particle swarm optimization to software testing. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2007), London, UK, July 2007, pp. 1121–1128.
Xiao, M., El-Attar, M., Reformat, M., Miller, J., 2007. Empirical evaluation of optimization algorithms when used in goal-oriented automated test data generation techniques. Empirical Software Engineering 12, 183–239.
Zhu, H., Hall, P.A.V., May, J.H.R., 1997. Software unit test coverage and adequacy. ACM Computing Surveys 29 (4), 366–427.

Ankur Pachauri received his Ph.D. degree from the Dayalbagh Educational Institute in 2013 and his Master's Degree in Computer Applications from Dr. B.R. Ambedkar University, Agra, in 2006. His research interests are in search-based software engineering and software testing.

Dr. Gursaran Srivastava received his B.Sc. Engineering degree from the Dayalbagh Educational Institute (DEI), M.Tech. degree from IIT Kanpur and Ph.D. degree from DEI in 1987, 1989 and 1997, respectively. His Ph.D. thesis focused on verification of designs in object-oriented software systems, rule-based systems and subjectivity in software measurement. At present he is a Professor in the Department of Mathematics at DEI, where he teaches courses in computer science and mathematics. His research interests are in verification and validation of software systems, search based software engineering, evolutionary algorithms, graph algorithms and context aware systems for e-Consultation and e-Learning. He has completed funded research projects in these areas and has published in leading international and national journals.