Available online at www.sciencedirect.com
Procedia Engineering
Procedia Engineering 15 (2011) 1186–1190 www.elsevier.com/locate/procedia
Advanced in Control Engineering and Information Science
Genetic Algorithm and its Application in the path-oriented test data automatic generation

Liu Shimin, Wang Zhangang ∗

Tianjin Polytechnic University, 63, Chenglin Road, Hedong District, Tianjin, China
Abstract

Random, path-oriented automatic generation of software test data produces redundant test data and leaves some paths uncovered. To address this problem, this paper proposes an approach to automatic test data generation based on a genetic algorithm. Using the global optimization capability of the genetic algorithm, the approach generates sufficient test data while reducing redundancy, so that testing is both thorough and efficient. We present the principle and procedure of genetic-algorithm-based automatic test data generation, carry out experiments in the MATLAB environment, and analyze the simulation results.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [CEIS 2011]

Keywords: genetic algorithm; path-oriented; test data redundancy; test data automatic generation
∗ Corresponding author. Tel.: +86-13602084943. E-mail address: [email protected].
1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2011.08.219

1. Introduction

As software grows in scale, testing requires massive amounts of test data, and automatic test data generation is receiving more and more attention. Random test data generation is simple and can produce large amounts of data quickly. However, the method is blind and is affected by the distribution of the input data, so it produces a large amount of redundant data. The genetic algorithm is an optimization algorithm that simulates biological evolution; it was proposed by Holland in the 1960s. As a highly efficient search and optimization algorithm, it shows a unique advantage in addressing highly complex problems that involve large search spaces, multimodality, nonlinearity, global change, and so on. We introduce the genetic algorithm into a path-oriented model of automatic test data generation and use the fitness function to adjust the test data dynamically, reducing test redundancy.

2. Basic principle

2.1. Genetic algorithm

The genetic algorithm is a computational model that simulates the biological evolution process of genetics and natural selection. It searches for an optimum by simulating natural evolution. After the chromosomes are encoded, the genetic algorithm searches for the optimal solution from a global view through the genetic operations of crossover and mutation. A genetic algorithm can be denoted formally as G = (C, E, P0, N, Φ, Ψ, Γ, T), where C is the encoding method for individuals; E is the fitness function of an individual; P0 is the initial population; N is the population size; Φ is the selection operator; Ψ is the crossover operator; Γ is the mutation operator; and T is the ending condition, which is usually expressed as the maximum number of iterations of the algorithm.

The parameter set of the actual problem is encoded according to the encoding rules as bit strings that the genetic algorithm can process, and initialization generates the initial population. The population then evolves adaptively: the algorithm evaluates the fitness function, eliminates individuals whose fitness value is low, and applies the selection, crossover and mutation operations to produce a new generation. After repeated iterations, we obtain the set of optimal solutions.

2.2. Path division

Path-oriented test data generation can be described as follows: given a program P and a path u of P, and supposing that D is the input space of P, find an input x (x ∈ D) that causes path u to be covered during execution. We call the control flow path table PATH_T, and path u=
. Path u is executed by P when the input data is x. We call u the trigger path of test data x, denoted U(x, u). The set of test data that cause path u to be covered is denoted T(u). If path u, triggered by data x, executes successfully, we put u into the set PATH_T; otherwise we say that the execution of this path is abnormal. If different test data trigger the same path, we call these data the test redundancy of this path. Suppose that the number of test data in the set T(u) is N, with N ≥ 0; the value of each element of this set for this trigger path is
\[
g = \begin{cases} 1/N, & N > 0 \\ 0, & N = 0 \end{cases} \tag{1}
\]
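As a rough illustration of equation (1), the following Python sketch groups test data by the path they trigger and computes the value g of one datum in each set. The function names and the two-path program `sign_path` are hypothetical and serve only as an example; they do not come from the paper.

```python
from collections import defaultdict


def value_per_datum(n: int) -> float:
    """Equation (1): g = 1/N when N > 0, and 0 when the set is empty."""
    return 1.0 / n if n > 0 else 0.0


def group_by_path(test_data, trigger_path):
    """Build T(u): map each triggered path u to the data that trigger it."""
    sets = defaultdict(list)
    for x in test_data:
        sets[trigger_path(x)].append(x)
    return dict(sets)


def sign_path(x: int) -> str:
    """Hypothetical program with two paths, used only for illustration."""
    return "non_negative" if x >= 0 else "negative"


sets = group_by_path([3, 7, -1, 0], sign_path)
values = {u: value_per_datum(len(xs)) for u, xs in sets.items()}
print(values)
```

Three of the four inputs trigger the same path, so each of them carries a value of 1/3, while the single datum on the other path carries a value of 1.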
All test data of the same kind have equal value for the test. The size of this test data set is called the test redundancy. The larger this value is, the more easily this kind of test data appears; the ability of any single test datum to find a bug then drops, which is not conducive to discovering errors on new paths.

3. Test data generation based on genetic algorithm

The goal of path-oriented test data generation is to generate test data for a specified path. The test data generation system includes three parts: a program analyzer, a path selector and a test data generator. In structured programming, the program is converted to a control flow graph by the program analyzer, then
the path selector selects a path, and finally the test data generator generates test data for the given path that meet the requirements.

• The initial population is generally generated randomly. Its size should be neither too large nor too small, so we take 20 to 100. We assign the randomly generated initial population to the current test data set T. Because of the distribution of the input data, the first generation of randomly generated test data tends to trigger similar or identical easily executed paths. We merge the similar test data and generate the current trigger path set. From the given program, we initialize the trigger path set PATH_T and assign a weight to each path. Supposing that each path of the set has the same execution probability, all paths have the same initial weight 1/n (n is the number of paths).
• To solve the test data generation problem, the genetic algorithm needs to convert the solution space of the problem into chromosome form by encoding. The test data generation process actually alternates between encoding and decoding. In this paper, we use binary cascaded parameter encoding.
• The calculation of the fitness function is the key to the genetic algorithm and directly affects its efficiency. We use the value of a single test datum for its path to compute the fitness. All test data in the same test data set have the same value for the trigger path:
\[
g_i = \frac{1}{n(u_i) \cdot m(x_{u_i})} \tag{2}
\]
where n(u_i) is the number of specified paths and m(x_{u_i}) is the number of test data in the test data set of path u_i. Let the fitness function be f(x_i) = g_i, so the computation is adaptive. A path that is easily triggered corresponds to a larger test data set, and its fitness decreases accordingly. Conversely, a set with fewer test data has higher fitness, and the algorithm selects the data with higher fitness for the next operation.

• Selection copies individuals of the current population into the new population with probabilities proportional to their fitness. The selection operator raises the average fitness of the population, but it does not generate new individuals. Crossover pairs the individuals of the new population at random, and the paired individuals then exchange part of their genes in some way. Crossover modes include single-point, two-point and uniform crossover. The crossover probability controls how often crossover occurs. Because crossover is the main way in which the genetic algorithm generates new individuals, the crossover probability usually takes a fairly large value; but if it is too large, it destroys the good patterns in the population, so it is generally taken between 0.4 and 0.9. Mutation changes one or several genes of an individual with a small probability, generally between 0.0001 and 0.1. Crossover determines the global search ability of the genetic algorithm, and mutation determines its local search ability.
• The algorithm stops when the maximum fitness value over the non-empty test data sets of all trigger paths falls below a certain threshold. The threshold guarantees that the number of test data for the hardest-to-trigger path reaches a certain quantity.

The execution of the algorithm is shown in Figure 1. PATH_T is the path set, CUR_T is the current population, and T is the test data set.

4. The analysis of experimental result

We take a typical triangle classification program as an example and implement genetic-algorithm-based test data generation in MATLAB. MATLAB is powerful mathematical software with outstanding
numerical computation ability. Its powerful matrix-processing functions are a big advantage when writing a genetic algorithm program that generates test data.

• Begin: initialize the population CUR_T; set T = CUR_T and TP(P) = PATH_T.
• Use each x ∈ CUR_T to run program P and trigger a path u; put x in T(u).
• Compute the fitness of the test data set of u.
• Check the maximum fitness value against the ending condition. If it is satisfied (Y), end and return T.
• Otherwise (N), calculate the fitness proportions, perform selection, crossover and mutation, generate a new population, and run program P again with the new population.

Fig. 1. Execution flow chart of path-oriented test data generation based on genetic algorithm
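The flow in Fig. 1 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB program: the 9-bit-per-parameter encoding, the mapping of decoded values into [1, 500], the reading of TRL as an ordinary (scalene) triangle, the use of sets to merge identical data, the bit-flip mutation, and the default probabilities are all assumptions made for illustration. The fitness follows equation (2) with n taken as the number of paths in this sketch.

```python
import random

BITS = 9  # assumption: 9 bits per parameter (2**9 = 512 >= 500)
PATHS = ("INVALID", "TRL", "ISO", "EQU")


def classify(a, b, c):
    """Program under test; returns the label of the executed path."""
    if a + b <= c or a + c <= b or b + c <= a:
        return "INVALID"
    if a == b == c:
        return "EQU"
    if a == b or b == c or a == c:
        return "ISO"
    return "TRL"  # assumed here to mean an ordinary (scalene) triangle


def decode(chrom):
    """Split the cascaded bit string into three integers in [1, 500]."""
    parts = (chrom[i * BITS:(i + 1) * BITS] for i in range(3))
    return tuple(1 + int("".join(map(str, p)), 2) % 500 for p in parts)


def fitness(chrom, T):
    """Equation (2): 1 / (n * m), m = size of the set chrom's path belongs to."""
    m = len(T[classify(*decode(chrom))])
    return 1.0 / (len(PATHS) * m)


def generate(pop_size=100, pc=0.8, pm=0.01, threshold=0.0006,
             max_generations=200, seed=0):
    """Return T, a dict mapping each path to the set of data that trigger it.

    pop_size is assumed to be even so that individuals can be paired.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(3 * BITS)]
           for _ in range(pop_size)]
    T = {u: set() for u in PATHS}
    for _ in range(max_generations):
        for chrom in pop:  # run P on x and put x in T(u)
            x = decode(chrom)
            T[classify(*x)].add(x)
        fits = [fitness(ch, T) for ch in pop]
        set_fits = [1.0 / (len(PATHS) * len(s)) for s in T.values() if s]
        if max(set_fits) < threshold:  # ending condition
            break
        # fitness-proportional (roulette-wheel) selection
        parents = rng.choices(pop, weights=fits, k=pop_size)
        pop = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:  # single-point crossover
                cut = rng.randrange(1, 3 * BITS)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            pop += [c1, c2]
        # bit-flip mutation, applied independently to every gene
        pop = [[b ^ 1 if rng.random() < pm else b for b in ch] for ch in pop]
    return T


T = generate()
print({u: len(xs) for u, xs in T.items()})
```

Because the fitness of an individual shrinks as the set for its path grows, selection keeps shifting effort toward paths that have collected few data so far, which is the adaptive behaviour described above.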
We set the range of the input parameters a, b and c to [1, 500] and the population size to 100; individuals are binary strings; crossover is single-point; mutation is uniform; and the threshold is 0.0006. Running the genetic algorithm and the random testing method respectively, we get the results shown in Table 1. From the table, we can see that the genetic algorithm generates more test data. In particular, paths with more constraint conditions, such as the isosceles triangle (ISO) and the equilateral triangle (EQU), also receive more test data and are tested sufficiently. The algorithm not only generates a certain amount of test data for each path to ensure test sufficiency, but also reduces test data redundancy, which gives it a great advantage over random test data generation.

Path-oriented automatic test data generation based on the genetic algorithm can not only generate test data for a group of target paths, but can also generate optimal test data for a single specified target path. For example, let us test the path of the program below on which all the decision statements are true, and generate the corresponding test data.

Table 1. Test data generated based on the genetic algorithm
          PATH_T   RUN   INVALID   TRL    ISO   EQU
GA_GEN    5        65    2563      2576   603   347
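As a quick consistency check, equation (2) can be applied to the counts in Table 1. The sketch below assumes that the six values belong, in order, to the columns PATH_T, RUN, INVALID, TRL, ISO and EQU, and that n in equation (2) equals the PATH_T value of 5; both readings are assumptions about the table layout rather than statements from the paper.

```python
n_paths = 5  # assumption: n in equation (2) is the PATH_T value from Table 1
counts = {"INVALID": 2563, "TRL": 2576, "ISO": 603, "EQU": 347}

# Equation (2): fitness of the test data set of each path.
fitness = {u: 1.0 / (n_paths * m) for u, m in counts.items()}

hardest_path = max(fitness, key=fitness.get)
max_fitness = fitness[hardest_path]

# Compare with the threshold of 0.0006 used in the experiment.
print(hardest_path, max_fitness, max_fitness < 0.0006)
```

Under these assumptions the largest fitness belongs to the EQU set and is 1/1735, roughly 0.000576, which is below the threshold of 0.0006 and therefore consistent with the ending condition having been reached.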
    if (a >= b) { if (a == b) return 0; else return 1; } else return -1;

We again use binary coding and single-point crossover, with a crossover probability of 0.8, a mutation probability of 0.01, a population size of 20 and a maximum of 100 generations. The result is shown in Figure 2.
Fig. 2. Test data generation for a single path
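The single-path experiment can be sketched in Python as follows, using the stated parameters (crossover probability 0.8, mutation probability 0.01, population size 20, at most 100 generations). The paper does not give the fitness function for a single target path, so the distance-based fitness below is purely an assumption, as are the 9-bit unsigned encoding of each input, the bit-flip mutation, and the bookkeeping that remembers the best individual seen so far.

```python
import random

BITS = 9  # assumption: each input is a 9-bit unsigned integer


def program(a, b):
    """Program under test, transcribed from the text."""
    if a >= b:
        if a == b:
            return 0
        return 1
    return -1


def fitness(a, b):
    """Assumed fitness (not from the paper): 1 when a == b, smaller otherwise."""
    return 1.0 / (1.0 + abs(a - b))


def decode(chrom):
    """Decode the cascaded bit string into the two inputs (a, b)."""
    return tuple(int("".join(map(str, chrom[i:i + BITS])), 2)
                 for i in (0, BITS))


def search(pop_size=20, pc=0.8, pm=0.01, generations=100, seed=1):
    """Search for inputs on the path where both decisions are true."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)]
           for _ in range(pop_size)]
    best = max(pop, key=lambda ch: fitness(*decode(ch)))
    for _ in range(generations):
        weights = [fitness(*decode(ch)) for ch in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        pop = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:  # single-point crossover
                cut = rng.randrange(1, 2 * BITS)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # bit-flip mutation on both children
            pop += [[b ^ 1 if rng.random() < pm else b for b in ch]
                    for ch in (p1, p2)]
        best = max(pop + [best], key=lambda ch: fitness(*decode(ch)))
        if fitness(*decode(best)) == 1.0:  # target path is covered
            break
    return decode(best)


a, b = search()
print(a, b, program(a, b))
```

Any pair with a equal to b drives both decisions to true, so an input such as (1, 1) is one of many optimal solutions under this assumed fitness.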
From the graph we can see that the function converges very quickly and the fitness value soon reaches its optimum. The graph shows the optimal test data value (1, 1), which conforms to the coverage requirement.

5. Conclusions

In this paper, we use a genetic algorithm to achieve path-oriented automatic test data generation. According to the fitness, the algorithm adjusts the amount of test data dynamically, so that paths whose input domains are narrow are tested sufficiently and every execution path receives a certain amount of test data, which guarantees test adequacy. Furthermore, the algorithm reduces test data redundancy and improves the efficiency of test data generation. Through experiments we realize test data generation both for a group of target paths and for a single target path. The approach still has some flaws, such as the large amount of space occupied when the parameters are large, which we need to improve. Improving the fitness function is also an important direction for future research.

References

[1] G. J. Myers, The Art of Software Testing. 2nd ed.: John Wiley & Sons Inc; 2004.
[2] J. Holland, Adaptation in Natural and Artificial Systems: University of Michigan Press, Michigan, USA; 1975.
[3] Xue Renzuo, Software Reliability Engineering: Tsinghua University, Beijing, China; 2007.
[4] B. Korel, "Automated software test data generation", IEEE Transactions on Software Engineering; 1990, p. 870-879.
[5] A. Baresel, H. Sthamer, and M. Schmidt, "Fitness Function Design To Improve Evolutionary Structural Testing", in Proceedings of the Genetic and Evolutionary Computation Conference: Morgan Kaufmann Publishers Inc; 2002.