Information and Software Technology 43 (2001) 841–854
www.elsevier.com/locate/infsof
Evolutionary test environment for automatic structural testing
Joachim Wegener*, Andre Baresel, Harmen Sthamer
DaimlerChrysler AG, Research and Technology, Alt-Moabit 96a, D-10559 Berlin, Germany
Abstract
Testing is the most significant analytic quality assurance measure for software. Systematic design of test cases is crucial for the test quality. Structure-oriented test methods, which define test cases on the basis of the internal program structures, are widely used. A promising approach for the automation of structural test case design is evolutionary testing. Evolutionary testing searches for test data that fulfil a given structural test criterion by means of evolutionary computation. In this work, an evolutionary test environment has been developed that performs fully automatic test data generation for most structural test methods. The introduction of an approximation level for fitness evaluation of generated test data and the definition of an efficient test strategy for processing test goals increase the performance of evolutionary testing considerably. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Test automation; Structural test; Evolutionary test; Evolutionary computation
* Corresponding author. E-mail addresses: [email protected] (J. Wegener), [email protected] (A. Baresel), [email protected] (H. Sthamer).
1. Introduction
Testing is the most significant analytic quality assurance measure for software. Often, more than 30% of the entire development expenditure is allocated to testing. The crucial activity for the test quality is the test case design, since type and scope of the test are determined by selecting feasible test cases. Existing test case design methods can essentially be differentiated into black-box tests and white-box tests. With black-box tests, test cases are determined from the specification of the program under test, whereas with white-box tests, they are derived from the internal structure. In both cases, complete automation of the test case design is difficult. Automation of the black-box test is only meaningfully possible if a formal specification exists. A wide range of commercial tools is available to support white-box tests, e.g. TESSY [28], Cantata [5], Attol [1], and TCAT [27]. However, these tools are limited to program code instrumentation and coverage measurement. The test case design itself still relies on the tester. Ever since structural test criteria were defined, it has been desirable to automate test case determination and thereby achieve clear cost savings. A promising approach towards this is the evolutionary test, described since the mid 1990s in various papers, e.g. Refs. [12,26,29]. Evolutionary tests comprise the use of search methods for test data generation. The principle behind evolutionary testing is the conversion of the test goal into an optimisation problem that is then solved by the application of metaheuristic search methods, e.g. evolutionary algorithms. This work enhances our preceding work on evolutionary structural testing of Ada-based programs [12,26]. A test environment for programs written in C has been developed, in which the test case design for control-flow- and data-flow-oriented structural test methods is automated. The features of the test environment reach beyond previous work in this field. The test environment contains a powerful graphical user interface to specify the input domain of the test object. All test activities, such as:
• test organisation,
• instrumentation of the test object,
• test execution, and
• monitoring
are performed fully automatically. If no test oracle is available, the tester will be required to perform test evaluation personally. Furthermore, several improvements of the evolutionary test itself have been implemented in the evolutionary test environment:
• fitness evaluation of individuals is enhanced by an approximation level which is based on the test object's control-flow graph and which enables comparison of individuals executing different program paths, and
• seeding of good initial individuals and an efficient scheduling strategy for the evolutionary tests ensure a high overall test performance.
Section 2 of this paper is a short introduction to evolutionary testing and its breadth of application. It also describes the use of evolutionary testing to automate structural testing. Previous work is acknowledged and discussed. In Section 3, improvements and developments in evolutionary structural testing are described in detail. In Section 4, the developed test environment components are briefly presented. Section 5 comprises the first results obtained from the employment of the test environment. Finally, Section 6 summarises the main points and provides suggestions for future research within this field.
0950-5849/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0950-5849(01)00190-2
2. Evolutionary testing
Evolutionary testing designates the use of metaheuristic search methods for test case generation. The input domain of the test object forms the search space in which one searches for test data that fulfil the respective test goal. Due to the non-linearity of software (if-statements, loops, etc.) the conversion of test problems to optimisation tasks mostly results in complex, discontinuous, and non-linear search spaces. Neighbourhood search methods like hill climbing are not suitable in such cases. Therefore, metaheuristic search methods are employed, e.g. evolutionary algorithms, simulated annealing or tabu search. In this work, evolutionary algorithms will be used to generate test data, since their robustness and suitability for the solution of different test tasks have already been proven in preceding work, e.g. Refs. [26,34]. The only prerequisites for the application of evolutionary tests are an executable test object and its interface specification. In addition, for the automation of structural testing, the source code of the test object must be available to enable its instrumentation.
Fig. 1. Evolutionary algorithms.
2.1. Brief introduction to evolutionary algorithms
Evolutionary algorithms represent a class of adaptive search techniques and procedures based on the processes of natural genetics and Darwin's theory of biological evolution. They are characterised by an iterative procedure and work in parallel on a number of potential solutions, which form a population of individuals. Permissible solution values for the variables of the optimisation problem are encoded in each individual. The fundamental concept of evolutionary algorithms is to evolve successive generations of increasingly better combinations of those parameters that significantly affect the overall performance of a design. Starting with a selection of good individuals, the evolutionary algorithm tries to achieve the optimum solution by random exchange of information between increasingly fit samples (recombination) and the introduction of a probability of independent random change (mutation). The adaptation of the evolutionary algorithm is achieved by selection and reinsertion operators, which are based on fitness. The selection operators control which individuals are selected for reproduction depending on the individuals' fitness values. The reinsertion strategy determines how many and which individuals are taken from the parent and the offspring population to form the next generation. The fitness value is a numerical value that expresses the performance of an individual with regard to the current optimum so that different individuals can be compared. The notion of fitness is fundamental to the application of evolutionary algorithms; the degree of success in using them may depend critically on the definition of a fitness that changes neither too rapidly nor too slowly with the design parameters of the optimisation problem. The fitness function must guarantee that individuals can be differentiated according to their suitability for solving the optimisation problem. Fig. 1 provides an overview of a typical procedure for evolutionary algorithms.
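The interplay of selection, recombination, mutation and reinsertion can be illustrated with a minimal sketch. It is not the algorithm used in the test environment: truncation selection, one-point crossover, uniform mutation and elitist reinsertion are assumptions chosen for brevity, and all names (evolve, toy_fitness, p_mut) are hypothetical.

```python
import random


def evolve(fitness, n_vars, lo, hi, pop_size=20, generations=40, p_mut=0.2, seed=0):
    """Maximise `fitness` over integer vectors of length n_vars in [lo, hi]."""
    rng = random.Random(seed)
    # Initialisation: random individuals drawn from the permissible value range.
    population = [[rng.randint(lo, hi) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Evaluation: rank the current population by fitness (best first).
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Selection (assumed: truncation selection of the better half).
        parents = ranked[: pop_size // 2]
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination (assumed: one-point crossover).
            cut = rng.randint(1, n_vars - 1) if n_vars > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: each variable is replaced at random with probability p_mut.
            child = [rng.randint(lo, hi) if rng.random() < p_mut else gene
                     for gene in child]
            offspring.append(child)
        # Reinsertion (assumed: keep the fittest of parents plus offspring).
        population = sorted(ranked + offspring, key=fitness, reverse=True)[:pop_size]
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(individual):
    """Toy objective: the closer the two variables, the higher the fitness."""
    return -abs(individual[0] - individual[1])


best, history = evolve(toy_fitness, n_vars=2, lo=-1000, hi=1000)
```

Because the assumed reinsertion always keeps the best individual, the best fitness recorded in history never decreases. The price is reduced diversity, so a practical configuration would balance this against the risk of premature convergence.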
Firstly, a population of guesses to the solution of a problem is initialised, usually at random. Each individual within the population is evaluated by calculating the fitness. Usually, this results in a spread of solutions ranging in fitness from very poor to good. The remainder of the algorithm is iterated until the optimum is achieved, or another stopping condition is fulfilled. Pairs of individuals are selected from the population, according to the pre-defined selection strategy, and combined in some way to produce a new guess analogous to biological reproduction. Combination algorithms are many and varied [19]. Additionally, mutation is applied. New individuals are evaluated for their fitness. Survivors into the next generation are chosen from parents and offspring, often according to fitness. It is important, however, to maintain diversity in the population to prevent premature convergence to a sub-optimal solution.
2.2. Application to software testing
In order to automate software tests with the aid of evolutionary algorithms, the test goal must itself be transformed into an optimisation task. For this, a numeric representation of the test goal is necessary, from which a suitable fitness function for evaluation of the generated test data can be derived. If an appropriate fitness function can be defined, then the evolutionary test proceeds as follows. Usually, the initial population is generated at random. Each individual of the population represents a test datum with which the test object is executed. For each test datum the execution is monitored and the fitness value is determined for the corresponding individual. Next, population members are selected with regard to their fitness and subjected to combination and mutation to generate new offspring. It is important to ensure that the test data generated are in the input domain of the test object. Offspring individuals are then also evaluated by executing the test object with the corresponding test data. A new population is formed by combining offspring and parent individuals according to the survival procedures laid down. From here on, this process repeats itself, starting with selection, until the test objective is fulfilled or another given stopping condition is reached. Depending on which test goal is pursued, different fitness functions emerge for test data evaluation. If, for example, the temporal behaviour of an application is being tested, the fitness evaluation of the individuals is based on the execution times measured for the test data (e.g. Refs. [3,7,16,20,33]). For safety tests, the fitness values are derived from pre- and post-conditions of modules (e.g. Ref.
[30]), and for robustness tests of fault-tolerance mechanisms, the number of controlled errors can form the starting point for the fitness evaluation [25]. Further applications of evolutionary testing for functional test case design are described in Refs. [4,13,30].
2.3. Previous work on structural testing application
Previous work on the automation of structural testing by means of metaheuristic search techniques can be divided into two categories, depending on the construction of the fitness functions.
In the work of Watkins [32], Roper [24], Weichselbaum [35] and Pargas et al. [18], the fitness of an individual is determined on the basis of the coverage measured for the associated test data, i.e. test data sets that cover more program branches than others are assigned higher fitness values. Whereas Watkins, Roper and Weichselbaum measure the coverage acquired by a test datum on the basis of the control-flow graph, Pargas et al. use the control-dependence graph of the test object for this purpose. Roper, Weichselbaum and Pargas et al. concentrate on statement and branch testing. Weichselbaum addresses, in addition, condition testing. One criticism of the work mentioned is its exclusive use of coverage as the fitness criterion. As a result, the search is mainly directed at the execution of a few long, easily accessible program paths, which makes it more difficult to attain complete coverage. Watkins' work concentrates on the path test. The fitness function changes dynamically, depending on the test progress. Individuals that pass through unexecuted, or rarely executed, program paths obtain a higher fitness value than the individuals whose test data pass via frequently executed program paths, independent of the coverage reached.
The emphasis of the works of Xanthakis et al. [36], Sthamer [26], Jones et al. [12], McGraw et al. [15] as well as Tracey et al. [29] lies in the automation of statement and branch testing. In addition, McGraw et al. also address the condition test. In order to direct the search more strongly toward program structures not covered in previous test runs, the fitness function aligns itself with the required branch predicates. The aim is to generate test data that lead to an execution of previously unattained program branches and conditions. The test is divided into partial aims. Each partial aim represents a program structure that requires execution to achieve full coverage, e.g. a statement, a branch or a condition with its logical values. For each partial aim, an individual fitness function is formulated and a separate optimisation is undertaken to search for a test datum executing the partial aim. The objective is to compute a distance for each individual that indicates how far it is away from executing the program conditions in the desired way.
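Such predicate distances can be sketched in a few lines. This is an illustration under stated assumptions, not the exact functions of Refs. [12,26,29]: every distance is 0 exactly when the predicate holds, the offset k for relational predicates is an assumed constant, and all function names are hypothetical. The convention differs slightly from formulations in which x > y is scored by y − x and counts as fulfilled once the value drops below 0.

```python
def dist_equal(x, y):
    """Distance for the predicate x == y: 0 when fulfilled, else the gap."""
    return abs(x - y)


def dist_greater(x, y, k=1):
    """Distance for the predicate x > y.

    Returns 0 when fulfilled; otherwise the gap plus an assumed offset k,
    so that x == y is still reported as 'not yet fulfilled'.
    """
    return 0 if x > y else (y - x) + k


def hamming_distance(x, y):
    """Number of differing bits; assumes non-negative integers."""
    return bin(x ^ y).count("1")


def dist_or(d_a, d_b):
    """a OR b is fulfilled as soon as one side is: take the minimum."""
    return min(d_a, d_b)


def dist_and(d_a, d_b):
    """a AND b requires both sides: add the single distances."""
    return d_a + d_b
```

For instance, for the compound condition (x == y) and (x > 10) with x = 4 and y = 10, the sketch yields dist_and(dist_equal(4, 10), dist_greater(4, 10)), i.e. 6 + 7 = 13, and the value shrinks as the search moves both predicates towards fulfilment.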
The pool of test data determined through the optimisation of each partial aim then serves as the test data set for the coverage of the structural test criterion. The fitness function is derived from the predicates of the branching conditions of the test object. For example, if a branching condition x = y needs to be evaluated as True, then the fitness function may be defined as |x − y| [12,26,29], or as the Hamming distance [26]. The fitness function is minimised during optimisation. If an individual obtains the fitness value 0, a test datum is found which fulfils the branching condition: x and y have the same value. The partial aim is attained and the evolutionary test can proceed to the next partial aim. For a branch predicate of x > y the fitness function is formed by y − x and is similarly minimised. If the fitness function assumes a value < 0 then a test datum is found which fulfils the corresponding partial aim. For a multiple condition of the type a ∨ b the resulting fitness value of an individual is the minimum of the fitness values determined for the single predicates a and b. In the case of a ∧ b the fitness value of an individual is the sum of the fitness values determined for each single predicate (compare Ref. [29]).
In comparison to orienting the fitness on the obtained coverage, the analysis of the branching conditions for fitness evaluation enables a substantially more purposeful search for program structures that have not yet been accessed. Therefore, it is assumed that this approach will yield greater coverage, and better results, for more complex test objects than coverage-oriented approaches. Coverage-oriented approaches tend to concentrate on searching easily accessible and, as far as possible, long program paths. With branching condition-oriented methods, test control complexity increases substantially as the test has to be divided into partial aims that are individually subjected to an optimisation. Partial aims must be identified and thoroughly managed for this purpose. In addition, test data resulting from partial aims should be stored, and combined at the end of the test run into a single test data set. Separation of tests into a number of partial aims has also been reported in individual work on coverage-oriented methods, e.g. Ref. [18].
3. New approach to evolutionary structural testing
Structural testing is widespread in industrial practice and stipulated in many software-development standards, e.g. Refs. [6,11,23,31]. Usual test aims are the execution of all statements (statement coverage), all branches (branch coverage) or all conditions with the logical values True and False (condition coverage). The aim of applying evolutionary testing to structural testing is the generation of a quantity of test data, leading to the best possible coverage of the respective structural test criterion. A test environment has been developed to support all common control-flow and data-flow oriented test methods, whereas all previous work concentrates on selected structural test criteria (statement, branch, condition and path test). For this purpose, the structural test criteria are divided into four categories, depending on control-flow graph and required test purpose:
• node-oriented methods,
• path-oriented methods,
• node-path-oriented methods, and
• node-node-oriented methods.
Separation of the test into partial aims and definition of fitness functions for partial aims are performed in the same manner for each category. In order to achieve the largest possible coverage of the selected structural test criterion, each partial aim needs to be executed; for example, all statements of the test object must be passed through to achieve a high degree of statement coverage. For the evolutionary test, therefore, the test has to be divided into partial aims that result from the specified structural test criterion. Identification of partial aims is based on the control-flow graph of the program under test. As previously mentioned, the definition of a fitness function that represents the test aim accurately and supports the guidance of the search is a prerequisite for the successful application of evolutionary tests. For the definition of the fitness function, we build upon previous work for the consideration of the branching conditions (among others [12,26,29]). We extend these by introducing the idea of an approximation level, which indicates how many branching condition nodes still require execution in the desired way to achieve the required partial aim. This extension makes it possible to treat different paths through the program to the desired partial aim equally for the fitness evaluation. Unlike previous work, it is unnecessary to select a specific path to a distinct node through the control-flow graph. For this approach, only the execution of this specific node is of relevance.
3.1. Node-oriented methods
The node-oriented methods require the attainment of specific nodes in the control-flow graph. The statement test as well as the different variants of the condition test may be classified in this category. With condition testing [3,17], a special case applies for the fulfilment of the respective test criterion. In addition to the branch nodes, the necessary logical value allocations for the atomic predicates in the conditions must also be attained. For node-oriented methods, partial aims result from the nodes of the control-flow graph. The objective of the evolutionary test is to find a test data set which executes every desired node of the control-flow graph. For the statement test, all nodes need to be considered, and for the different variations of the condition test, only the branching nodes are relevant. Additionally, for condition testing, the predicates of the branching conditions must be evaluated.
For the simple condition test, for example, each atomic predicate contributes a True and a False partial aim, and for the multiple condition test, all combinations of logical values for the atomic predicates form independent partial aims.
3.1.1. Fitness function
For the node-oriented methods the fulfilment of a partial aim is independent of the path executed in the control-flow graph. This has been taken into account by our fitness function. The fitness functions of the partial aims consist of two components. In addition to the calculation of the distance in the branching nodes (see Section 2.3), which specifies how far away an individual is from fulfilling the respective branching condition (compare Refs. [12,26,29]), an approximation level is introduced as an additional element for the fitness evaluation of individuals. The approximation level, described in the next paragraph, allows the comparison of individuals that miss the partial aim in different branching nodes. The higher the attained level of approximation, the better the fitness of the individual. This extends our idea in Ref. [26] where a small fixed value was added to the calculated distance for every executed node belonging to a path attaining the target node. Therefore, individuals closer to the target node receive a higher fitness value as compared to those that branch away earlier.
The approximation level supplies a figure for an individual that gives the number of branching nodes lying between nodes covered by the individual and the target node. For this computation, only those branching nodes are taken into account that contain an outgoing edge that misses the target node. This is shown in Fig. 2: Individual 1 attains a lower approximation level than individuals 2 and 3, as it already branches away from the target node in node 1. The second individual obtains a higher approximation level since node 1 is executed as desired. It misses the target node in node 7. The individuals 2 and 3 attain the same approximation level, although they cover different paths through the control-flow graph. For both individuals there is only one branching condition left that needs to be fulfilled in the desired way to cover the target node as required.
Fig. 2. Monitoring execution of three different individuals (left); approximation levels for the nodes of the corresponding control-flow graph (right).
In order to determine the approximation level, the nodes of the control-flow graph are grouped together. Each node group has the following characteristics:
• It has exactly one entry node.
• It only contains nodes that are part of at least one path from the entry node to the target node.
• It only has exits that miss the partial aim or lead to the entry node of the following node group.
• It is minimal, i.e. further division of the node group is not possible without violating the other characteristics.
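The idea of counting only those branching nodes that can still miss the target can be sketched as follows. This is a deliberate simplification and not the node-group procedure: instead of forming node groups, it counts the minimum number of critical branching nodes that an individual still has to pass from the point where it left the way to the target. The control-flow graph, the node numbers and all function names are hypothetical.

```python
import heapq
from collections import deque


def nodes_reaching(cfg, target):
    """Return all nodes from which `target` is reachable (target included)."""
    preds = {}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in preds.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen


def remaining_critical_branches(cfg, target, executed_path):
    """Minimum number of critical branching nodes still to be passed.

    A branching node is critical if one of its outgoing edges misses the
    target. Assumes executed_path starts at a node that can reach the target.
    """
    if target in executed_path:
        return 0
    can_reach = nodes_reaching(cfg, target)
    critical = {n for n in can_reach
                if n != target and any(s not in can_reach for s in cfg[n])}

    def cost(node):
        return 1 if node in critical else 0

    # Last executed node from which the target was still reachable.
    start = [n for n in executed_path if n in can_reach][-1]
    best = {start: cost(start)}
    heap = [(best[start], start)]
    while heap:  # Dijkstra with node costs, restricted to useful nodes
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best[node]:
            continue
        for succ in cfg[node]:
            if succ in can_reach:
                new_dist = dist + cost(succ)
                if new_dist < best.get(succ, float("inf")):
                    best[succ] = new_dist
                    heapq.heappush(heap, (new_dist, succ))
    raise ValueError("target not reachable from executed path")


# Hypothetical graph: nodes 1 and 5 can miss target 6 by branching to exit 9.
cfg = {1: [2, 9], 2: [3, 4], 3: [5], 4: [5], 5: [6, 9], 6: [9], 9: []}
early_miss = remaining_critical_branches(cfg, 6, [1, 9])
late_miss_a = remaining_critical_branches(cfg, 6, [1, 2, 3, 5, 9])
late_miss_b = remaining_critical_branches(cfg, 6, [1, 2, 4, 5, 9])
```

In this toy graph the two individuals that miss the target only in node 5 receive the same count although they take different paths through nodes 3 and 4. An approximation level in the sense of the text would then be the total number of levels minus this count, so that a higher level is better.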
On the basis of the formed node groups it is determined how many approximation levels lie between the first node of the control-flow graph and the target node. For every node group, different paths within a group are analysed to ascertain how many branching nodes could lead to the target node being missed. The largest sums are added to obtain the overall number of approximation levels for the target node. All nodes of the control-flow graph are then assigned to the respective approximation level. The numbering begins with the first node of the control-flow graph and ends with the maximum approximation level for the node closest to the target node. When numbering nodes within a node group, different paths are treated as equal (see Fig. 3). In case a loop is contained in a node group, all nodes which could lead to a miss of the target node obtain the same approximation level, since the order in which the nodes will be executed cannot always be determined in advance. Without composing node groups, all possible program paths would require analysis to determine the maximum approximation level. Due to the large number of program paths for complex test objects, this could not be performed efficiently. The individual with the greatest level of approximation and the least distance to fulfilling the next branching condition obtains the highest fitness value:
fitness(partial aim, individual) = approximation_level(partial aim, individual) + (1 − distance_branching_condition(partial aim, individual))
where distance_branching_condition(partial aim, individual) is normalised. For the bit operators & and | the distance is determined through the Hamming distance [26]. For condition tests the fitness evaluation needs to be slightly extended. The evaluation of the atomic predicates in the target node has to be included. The evaluation of the atomic predicates takes place in the same way as for the distance calculations in the branching conditions.
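The combination of approximation level and normalised distance can be sketched as follows. The text only states that the distance is normalised; the particular mapping 1 − base^(−d) used here is an assumption, as are the function names.

```python
def normalise(raw_distance, base=1.001):
    """Map a non-negative raw distance into [0, 1); 0 stays 0.

    The exponential mapping is an assumed choice, not taken from the paper.
    """
    return 1.0 - base ** (-raw_distance)


def fitness(approximation_level, raw_distance):
    """Fitness to be maximised: approximation level plus (1 - normalised distance)."""
    return approximation_level + (1.0 - normalise(raw_distance))


# An individual on a higher approximation level beats one on a lower level,
# even if its distance in the next branching condition is much larger.
far_but_higher_level = fitness(2, 1000.0)
close_but_lower_level = fitness(1, 0.5)
```

Since the normalised distance stays below 1, the second summand lies in the interval (0, 1]. The approximation level therefore dominates the comparison, and the distance only ranks individuals that have reached the same level.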
For compound predicates the single distances are added and normalised.
Fig. 3. Assignment of approximation levels to nodes.
3.2. Path-oriented methods
Path-oriented methods require execution of certain paths in the control-flow graph. All variations of path tests from the reduced path test [10] to complete path coverage [9] belong to this category. Therefore, all paths through the control-flow graph which are necessary to fulfil the chosen structural test criterion are determined and identified as partial aims.
3.2.1. Fitness function
Establishing fitness functions for path-oriented testing methods is much simpler than for node-oriented methods because the execution of a certain path through the control-flow graph forms the partial aim for the evolutionary test. As for the node-oriented methods, the fitness function for path-oriented methods consists of the two components approximation level and distance calculation. Basically, there are two possibilities to evaluate the approximation level of an individual for the specified program path attainment. The covered program path is compared with the program path specified as partial aim either in such a way that:
• the length of the identical initial path section reflects the approximation level (compare Ref. [14]), or
• all identical path sections are considered for the approximation level.
In the second case, the length of all identical path sections provides the basis for the approximation level. The advantage of this evaluation method is that an individual that diverges from the target path at the beginning, but covers the desired path towards the end, will obtain a similarly high fitness value as an individual which covers the specified target path at the beginning, but diverges from it towards the end. The combination of two such individuals (recombination) can lead to a considerably better individual. In Fig. 4, for instance, the execution of the first individual (covering the six nodes 1, 3–5, 7, and 8 on the target path) will only obtain a high approximation level if all identical path sections are considered for the fitness evaluation. Otherwise, the second individual (covering five nodes 1–4, and 7) will obtain a higher approximation level than the first one.
Fig. 4. Execution of two individuals for a path-oriented test goal.
In both cases, fitness evaluation is supplemented by calculation of the distances to the target path in the branching nodes where the program path covered by the individual deviates from the target path. For the first evaluation method, this is the distance in the branching node at which the target path and the path covered by the individual diverge for the first time. For the second evaluation method, all branching nodes where the target path and the individual's path diverge are taken into consideration. The distances are accumulated and normalised.
3.3. Node-path-oriented methods
The node-path-oriented methods require the achievement of a specific node and from this node the execution of a specific path through the control-flow graph. The branch test, the different forms of segment coverage [22], as well as LCSAJ [8] (linear code sequence and jump) belong to this class of methods. The identification of partial aims for node-path-oriented methods is treated similarly to node-oriented methods. The simplest case is the branch test, where the partial aims are given by the nodes of the control-flow graph with their outgoing edges. For LCSAJ the partial aims result from the initial nodes of a linear code sequence and the path from this node to a jump statement.
3.3.1. Fitness function
Partial aims for node-path-oriented structural criteria comprise two requirements that need to be included in the
evaluation of the generated individuals. The attainment of a specific node is required on the one hand, and on the other hand a path that begins with this node has to be covered. Accordingly, the fitness evaluation of the individuals needs to represent both these components. For this the fitness function can be based on the fitness functions for node-oriented and path-oriented methods. Fitness calculations for individuals which do not reach the target node are carried out as for node-oriented methods. For individuals reaching the target node, the aforementioned fitness calculations for path-oriented methods are additionally applied, to guide the search in the direction of the desired path. In Fig. 5 the right-hand individual obtains a higher fitness value than the left-hand one since it covers the first target node and a section of the target path.
Fig. 5. Execution of two individuals for a node-path-oriented test goal (the individual on the right obtains a higher fitness value than the one on the left).
3.4. Node-node-oriented methods
Finally, the node-node-oriented methods aim to execute program paths that cover certain node combinations of the control-flow graph in a pre-determined sequence without specifying a concrete path between nodes. Most of the data-flow oriented methods like all-defs and all-uses fit into this category [21].
Fig. 6. Execution of two individuals for a node-node-oriented test goal.
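Whether an executed path attains such a node combination can be checked with a short sketch. The function name, the trace representation as a list of node numbers, and the optional forbidden set (standing in for nodes that must not occur between the two target nodes, e.g. a redefinition of the variable in a data-flow criterion) are assumptions made for illustration.

```python
def covers_node_pair(trace, first, second, forbidden=()):
    """True if `trace` executes `first` and later `second`.

    No node from `forbidden` may be executed between the two; the concrete
    path taken between `first` and `second` is otherwise irrelevant.
    """
    armed = False  # True once `first` has been executed and not invalidated
    for node in trace:
        if armed and node == second:
            return True
        if node == first:
            armed = True  # every occurrence of `first` (re)starts the check
        elif armed and node in forbidden:
            armed = False  # invalidated; a later `first` can start again
    return False


# Hypothetical traces through a control-flow graph with target pair (2, 5).
plain = covers_node_pair([1, 2, 4, 5], 2, 5)
blocked = covers_node_pair([1, 2, 4, 5], 2, 5, forbidden={4})
restarted = covers_node_pair([1, 2, 4, 2, 5], 2, 5, forbidden={4})
```

The check only decides whether a partial aim is attained. It does not replace the fitness evaluation, which still has to guide the search towards the first and then the second target node.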
Paths through the control-flow graph connecting the nodes under consideration are identified as partial aims. However, those paths assigning a new value to the variable, using the variable, or resulting in an invalid variable state have to be excluded for the data-flow test criteria. The all-defs criterion, for example, identifies all variable definitions and uses. Together with the control-flow information, node combinations are formed which contain a variable definition and a variable use for each variable. The identification of the partial aims for the other data-flow oriented criteria takes place analogously.
3.4.1. Fitness function
Fitness calculations for node-node-oriented methods also take place in two stages. Partial aims for the evolutionary test are made up of node combinations. After execution of the first target node, a second target node has to be covered, without, however, specifying a path through the control-flow graph as for node-path-oriented methods. The approximation of an individual to the first target node can be evaluated in the same manner as for node-oriented methods. For all individuals executing the first target node, an approximation to the second node is added which is also calculated using the fitness function for node-oriented methods. In Fig. 6 the second individual obtains a higher fitness value than the first one since it covers the first target node. A detailed definition of the target functions, and a comprehensive examination of the calculations of approximation levels can be found in Ref. [2]. The distance calculations for branching conditions are described in detail in Refs. [26,29].
4. Test automation
In order to automate test case design for different structural testing methods with evolutionary tests we have developed a tool environment which consists of six components:
• parser for analysis of test objects,
• graphical user interface for specification of the input domain of the test objects,
• instrumenter which captures program structures executed by the generated test data,
• test driver generator which generates a test bed running the test object with the generated test data,
• test control which includes identification and administration of the partial aims for the test and which guarantees an efficient test through the definition of a processing order and storage of initial values for the partial aims, and
• a toolbox of evolutionary algorithms to generate the test data.
Fig. 7 shows the information exchange between the tools.
Fig. 7. Components of the evolutionary test environment.
Fig. 8. Graphical user interface for test object interface specification, showing the interface of the C-function is_line_covered_by_rectangle( ).
4.1. Parser
The parser identifies the functions in the source files which form the possible test objects. It determines all necessary structural information on the test objects. Control-flow and data-flow analyses are carried out for every test object.
These analyses determine the interface, the control-flow graph, the contained branching conditions with their atomic predicates, as well as semantic information on the used data structures, e.g. the organisation of user-defined data types.

4.2. Graphical user interface for interface specification

To ensure efficient test data generation and to avoid the generation of inadmissible test data from the beginning, the tester may have to define the test object interface determined by the parser more precisely. For this the developed tool environment provides a graphical user interface that displays the test objects and their interfaces as they have been determined by the parser. The tester can limit the value ranges for the input parameters (as shown in the left window of Fig. 8 for the parameter rectangle.width) and enter logical dependencies between different input parameters (as shown in the right window of Fig. 8 for rectangle.width and rectangle.height). These will then be considered during test data generation. Furthermore, it is also possible to enter initial values for single or for all input parameters. As a result, test data of a previous test run or data of an already existing functional test, as well as specific value combinations for single input parameters, can be used as starting point for test data generation (seeding).

4.3. Instrumenter

The third component in the tool environment is the instrumenter that enables test run monitoring. In order to eliminate influences on the program behaviour the instrumentation has to take place in the branching conditions of the program. The insertion of instrumentation statements into the program branches suggested in previous work, e.g. Ref. [26], is not possible if side effects like if(*a++ && ++b && func_c(a)) occur within the branching conditions, since this could lead to a multiple execution of
the atomic predicates and their side effects. The instrumentation of the branching conditions is always the same, independent of the selected structural test criterion. The atomic predicates in the branching nodes of the test object are instrumented to measure the distances individuals are away from fulfilling the branching conditions (see Section 3). The instrumentation also provides information on the statements and program branches executed by an individual.

4.4. Test driver generator

The test driver generator generates a test bed that calls the test object with the generated individuals and returns the monitoring results provided by the execution of the instrumented test object to the test control. When the test object is called by the test driver, the individuals are mapped onto the interface of the test object. It is important that user specifications for the test object interface are taken into account. Individuals that do not represent a valid input are extracted and assigned a low fitness value.

4.5. Test control

The most complex component of the evolutionary test environment is the test control because it is responsible for several difficult tasks like the management of the partial aims with their processing status, the collection of suitable initial values for the optimisation of partial aims, and the recording of test data fulfilling partial aims. The test control identifies partial aims for the selected structural test criterion. Partial aims are determined on the basis of the control-flow graph provided by the parser. The test control manages the determined partial aims and regulates the test progress. One after the other the different partial aims are selected in order to search for test data with the evolutionary test. Independent optimisations are performed for every partial aim. Although only one partial aim is considered for the optimisation at a time, all generated individuals are evaluated with regard to all unachieved partial aims.
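The bookkeeping behind this evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `record_individual`, the `evaluate` callback, and the representation of partial aims as plain labels are assumptions made for illustration. Each individual is checked against every open partial aim; aims it happens to reach are closed, and for the remaining aims the best individual seen so far is kept.

```python
def record_individual(individual, open_aims, seeds, achieved, evaluate):
    """Evaluate one individual against every partial aim that is still open.

    evaluate(individual, aim) is a hypothetical callback returning
    (reached, fitness), where a higher fitness means a closer approach.
    """
    for aim in sorted(open_aims):          # sorted() copies, so open_aims may shrink
        reached, fitness = evaluate(individual, aim)
        if reached:
            achieved[aim] = individual     # test datum found, possibly by chance
            open_aims.discard(aim)
            seeds.pop(aim, None)
        elif aim not in seeds or fitness > seeds[aim][0]:
            seeds[aim] = (fitness, individual)   # best candidate so far for this aim


# Toy stand-in for the instrumented test object: an aim counts as reached
# when the individual equals the aim's number.
def toy_evaluate(individual, aim):
    return individual == aim, -abs(individual - aim)


open_aims, seeds, achieved = {3, 10}, {}, {}
for candidate in (7, 3, 9):
    record_individual(candidate, open_aims, seeds, achieved, toy_evaluate)

print(achieved)  # aims for which a test datum was found
print(seeds)     # best stored individual for each aim that is still open
```

In the toy run, aim 3 is closed by the second candidate regardless of which aim was being optimised at that moment, while aim 10 keeps the closest individual found so far.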
Thus partial aims reached by chance are identified, and individuals with good fitness values for one or more partial aims are noted and stored. Subsequent testing of these partial aims then uses the stored individuals as initial values (also compare Ref. [15]). This method is called seeding. It enhances the efficiency of the test because the optimisation does not start with an entirely randomly generated set of individuals.

To calculate the fitness values for the individuals, the test control determines the program paths executed by every individual, on the basis of the monitoring data and the test object's control-flow graph. Fitness values are then evaluated by applying the fitness functions described in Section 3 that take the distance measurement for the branching conditions as well as the approximation level into account.

The processing sequence of the partial aims that have not yet been attained is guided by the test control depending on the availability of suitable initial values. The partial aim for
which the individuals with the best fitness values are available is selected as the next one for the test. This ensures that the test quickly achieves a high coverage, because partial aims that are difficult to execute or infeasible do not slow down the overall testing process. When no initial values are available, or several equally good initial values for different partial aims exist, then a breadth-first search is carried out. If the search fails to find a test datum for a partial aim it will be marked as already processed and not fulfilled. It is possible to reset this status if an individual is accidentally found during the test of another partial aim that has attained a higher fitness value than those individuals already found for this partial aim. The partial aim will then be targeted again for a new test with better initial values.

Once all partial aims have been processed the test is finished. The test data of the separate partial aims are then compiled and displayed with the obtained coverage. On this basis, the tester is able to check whether program structures that were not covered are infeasible, or whether the evolutionary test was not able to generate suitable test data. In addition, the test control offers a simple application programming interface (API) to export test data found, and actual values for the output parameters of the test object, in order to support automatic test evaluation on the basis of test oracles. Moreover, it provides test and monitoring information for a visualisation of the test progress.

4.6. Genetic and evolutionary algorithms toolbox GEATbx

As toolbox we have applied the Genetic and Evolutionary Algorithm Toolbox for Use with Matlab (GEATbx) by Hartmut Pohlheim [19]. This is a very powerful tool that supports binary coding of individuals as well as integer or real number representations of individuals. It is possible to implement genetic algorithms as well as evolutionary strategies.
Almost any hybrid form of evolutionary algorithms is possible. The toolbox offers a large number of different operators for the components of evolutionary algorithms described in Section 2.1. The toolbox also enables the application of subpopulations, migration and competition between subpopulations, and possesses extensive visualisation functions for displaying optimisation progress. It is possible to specify admissible value domains for the parameters of an individual. The toolbox automatically ensures that these value domains are observed during the generation of individuals. The test driver only needs to check for dependencies between the single variables of an individual.

5. First experiments

Our tool environment has already been applied in initial experiments with typical real-world examples for automatic generation of test data for statement and branch tests which are usually performed on the unit level. Excellent results were obtained. For all test objects a complete coverage was achieved by the evolutionary test.
Table 1
Complexity measures for the different test objects

Test object                        Lines of code   Number of branches/loops   Cyclomatic complexity   Maximum nesting level   Myer's interval
Atof( )                            64              57/4                       16                      2                       27
Is_line_covered_by_rectangle( )    71              24/1                       8                       4                       6
Is_point_located_in_rectangle( )   5               5/0                        2                       1                       3
Search_field( )                    108             37/4                       13                      2                       10
Netflow( )                         154             153/10                     34                      7                       40
Complex_flow( )                    42              41/2                       13                      2                       10
Classify_triangle( )               35              38/0                       14                      2                       7
5.1. Test objects

Table 1 shows a selection of examined test objects with their characteristics. The number of branches corresponds to the number of partial aims for the test. The number of loops and the cyclomatic complexity give additional information on the test object's control flow. Furthermore, the maximum nesting level and Myer's interval are mentioned for each test object. The maximum nesting level could indicate the difficulty in reaching a partial aim with respect to the control flow, whereas Myer's interval shows the complexity of certain branching conditions.

Atof( ) is a typical C library function. It converts strings to the corresponding floating point value. Atof( ) contains several evaluations which check the input string for its validity. In the experiments the maximum string length was set to 10 characters in ASCII coding. Accordingly, the size of the search space is 255^10.

Is_line_covered_by_rectangle( ) is a computer graphics function which checks whether a given line is not covered, partially covered, or entirely covered by a given rectangle. It
has eight input parameters that define the co-ordinates for the rectangle and the line. For the four x-co-ordinates, values between 0 and 1599 were allowed, and for the four y-co-ordinates the value ranges were set to 0–1279.

The function Is_point_located_in_rectangle( ) is the simplest test object of the investigation. It is also from computer graphics. It checks if a point is located inside a rectangle. It has six input parameters for the co-ordinates of point and rectangle. For the co-ordinates, the same value ranges as for the previous function were chosen.

Search_field( ) is a function from a motor-control system which performs a local search on two large fields describing motor characteristics. It has ten input parameters representing the starting points for the search.

Netflow( ) is the most complex function. It optimises a net architecture for maximum data flow. It has ten loops with a maximum of four nested loops. The control-flow graph for Netflow( ) is presented in Fig. 11. The input is the description of the net to be optimised. The size of nets was limited to seven nodes and 11 edges resulting in 73 input parameters.

Complex_Flow( ) is an artificial test object that contains
Fig. 9. Mean coverage achieved by evolutionary and random testing in the experiments.
Fig. 10. Mean number of test data generated for the evolutionary and the random test in the experiments.
several difficult branching conditions. To achieve full coverage, combinations of different branch predicates need to be executed.

The Classify_Triangle( ) function is an implementation of the classic triangle classifier example used in a large number of testing papers. It is used in two different data type versions. The input domain is given either
by three floating point values or by three integer values.

5.2. Test results

In the experiments performed, evolutionary testing was compared to random testing for all test objects. Evolutionary
Fig. 11. Control-flow graph of the netflow( ) function (underlying arrows indicate the loops).
testing was carried out using an evolutionary algorithm with four or five subpopulations containing 40–60 individuals, depending on the complexity of the test object. Discrete recombination was applied and the mutation range varied for different subpopulations from small to mid-size. For random testing, much more test data had to be generated to achieve a high degree of coverage (compare Figs. 9 and 10).

Each experiment was repeated 10 times. The evolutionary test achieved full branch coverage in all experiments. One branch in the Netflow( ) function is infeasible (branch 42 → 44 in Fig. 11). Therefore, the highest possible coverage is 99%. For most test objects, random testing was unable to achieve full branch coverage reliably. The highest coverage was achieved for the two computer graphics examples: in all experiments random testing also reached full branch coverage (therefore, they are not listed in the figures). For the Complex_Flow( ) example random testing achieved full branch coverage in 50% of the experiments, and for the Search_field( ) example in 20%. For the other test objects, random testing reached full branch coverage in none of the experiments.

The mean coverage achieved by evolutionary testing and random testing in all the experiments is shown in Fig. 9. Fig. 10 shows the mean number of test data generated for the evolutionary test and the random test to reach the achieved coverage. Although between 5 and 63 times more test data were generated for random testing, the coverage reached is not as good as for evolutionary testing. The time for the evolutionary tests varied for the different test objects between 160 and 254 s. This results in a mean processing time of 4–6 s per partial aim.

Additional experiments without storing initial values for the partial aims exhibited a clear deterioration in overall test performance when compared to storing initial values.
By storing initial values during the test run and later seeding them, the number of individuals generated to cover the most complex partial aims could be reduced by more than 75%.

6. Conclusion, future work

The aim of this work is the automatic generation of test data for structural tests. For this a tool environment has been developed that applies evolutionary testing to C programs. Test data are generated by means of evolutionary algorithms. From the view of the evolutionary test, structural testing methods can be divided into four categories for which the partitioning of the test into partial aims and the fitness evaluation take place in a similar manner: node-oriented methods, path-oriented methods, node-path-oriented methods, and node-node-oriented methods.

Our definition of fitness functions is based on existing work that evaluates individuals according to the branching conditions of the test object. These approaches permit
a more goal-oriented search for uncovered program structures than methods that undertake the fitness evaluation by means of measured coverage.

By extending the fitness function with an approximation level it is possible to treat different paths to a partial aim equally during the fitness evaluation of generated individuals. The approximation level indicates the distance between nodes in the control-flow graph, executed by an individual, and the required partial aim. The introduction of the approximation level reduces the risk of selecting a certain path through the control-flow graph as partial aim that is difficult to execute or even infeasible. Other approaches randomly select a path to the partial aim whose branch conditions are then optimised one after the other, i.e. if a program path is very difficult to execute, the evolutionary test will slow down considerably, or the expected partial aim will not be reached at all. Our approach ranks all possible paths to the partial aim as equal. The fewer branching nodes there are between an undesired branching and the desired partial aim, the higher the approximation level of the individual and thus its fitness value. This behaviour of the fitness function enables the evolutionary test to concentrate on the generation of individuals for the most promising path to the partial aim. In principle, it cannot be ruled out that a path with several simple branching conditions is more likely to lead to a coverage of the expected partial aim, than a path with few but very complex branching conditions. Simple branching conditions, however, should be covered quickly by the generated individuals. This would lead to a rapid improvement of the approximation level. The search would then concentrate on this path even though it contains more branching nodes.

In order to guarantee an efficient overall test, the test control of the evolutionary test environment evaluates every individual with regard to every partial aim that has not been reached.
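One possible reading of the approximation level, applied to an executed path and an arbitrary partial aim, is sketched below in Python. This is an illustration under stated assumptions, not the definition from Section 3 or Ref. [2]: the control-flow graph is a plain successor dictionary, the approximation level is taken as the negated number of branching nodes still separating the point of divergence from the partial aim, and the normalised branch distance in [0, 1) is assumed to come from the instrumentation. Because the count uses the shortest remaining route, no particular path to the aim is preferred.

```python
import heapq


def branchings_to(cfg, target):
    """For each node, the fewest branching nodes on any route to `target`.

    A branching node has more than one successor; the node itself is counted,
    the target is not. Nodes that cannot reach the target are absent.
    """
    preds = {}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)
    cost = {target: 0}
    heap = [(0, target)]
    while heap:                                   # Dijkstra on the reversed graph
        dist, node = heapq.heappop(heap)
        if dist > cost[node]:
            continue
        for pred in preds.get(node, []):
            new = dist + (1 if len(cfg[pred]) > 1 else 0)
            if new < cost.get(pred, float("inf")):
                cost[pred] = new
                heapq.heappush(heap, (new, pred))
    return cost


def fitness(cfg, executed_path, target, normalised_distance):
    """Return (reached, fitness); a higher fitness means a closer approach."""
    if target in executed_path:
        return True, 0.0
    remaining = branchings_to(cfg, target)
    on_route = [n for n in executed_path if n in remaining]
    if not on_route:
        return False, float("-inf")               # aim not reachable from this path
    approximation_level = -remaining[on_route[-1]]   # fewer branchings left = higher
    return False, approximation_level + (1.0 - normalised_distance)


# Hypothetical graph: the aim "t" lies behind the branching nodes c1 and c2.
cfg = {"s": ["c1"], "c1": ["c2", "x"], "c2": ["t", "x"], "t": ["x"], "x": []}
print(fitness(cfg, ["s", "c1", "x"], "t", 0.1))        # diverges at c1
print(fitness(cfg, ["s", "c1", "c2", "x"], "t", 0.9))  # diverges at c2
```

In this example the second individual ranks higher although its branch distance is worse, because it diverges one branching node closer to the aim. Calling `fitness` once per open partial aim also reveals which aims the executed path has reached.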
Partial aims reached purely by chance are thus identified immediately. Individuals suitable for one or more partial aims are noted, stored, and used as seeds for the optimisation of these partial aims. The processing sequence of the partial aims is guided by the quality of the available initial values. In this way the test quickly achieves the highest possible coverage. Initial experiments utilising this strategy have proved successful, and the overall testing procedure has been considerably accelerated.

Future work aims at further improvements of the evolutionary structural test. For example, the occurrence of flags in the branching conditions of a test object makes a distance calculation which would support a guidance of the search impossible. In cases where the value of a flag is calculated before the branching condition, the application of program transformations that replace a flag by its semantic meaning seems possible. A further problem for the test is the short circuit evaluation in C which breaks off the evaluation of atomic predicates if logical dependencies are linked by && or ||, as soon as the value of the entire condition is definite.
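The effect can be reproduced in a few lines of Python, whose `and` operator short-circuits in the same way as `&&` in C. The sketch is illustrative only: the helper `probe`, the predicate labels, the bounds, and the distance formula for `value > bound` are assumptions, not the instrumentation used by the tool environment. Each atomic predicate is wrapped so that it is evaluated exactly once and its distance is recorded during that single evaluation.

```python
def probe(log, label, value, bound):
    """Evaluate the atomic predicate `value > bound` once and log its distance.

    The distance is 0 when the predicate holds and grows with the gap otherwise
    (an assumed distance measure for this illustration).
    """
    holds = value > bound
    log[label] = 0 if holds else bound - value + 1
    return holds


def condition(a, b, c):
    """Instrumented version of `A and B and C`; returns (outcome, distances)."""
    log = {}
    outcome = (probe(log, "A", a, 10)
               and probe(log, "B", b, 20)
               and probe(log, "C", c, 30))
    return outcome, log


print(condition(5, 25, 35))   # A fails: only A's distance is observed
print(condition(15, 25, 5))   # A and B hold: C's distance becomes visible
```

In the first call nothing is learned about B or C, however close they are to being fulfilled.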
This leads to an artificial narrowing of the search domain since the linked predicates can only be optimised one after the other. For the condition A && B && C, A and B need to be evaluated as True first, before the evaluation of the generated individual with regard to the predicate C is possible. The search for individuals that fulfil A && B leads to an artificial focusing on the individuals, without taking the predicate C into account. The search for individuals that fulfil all three predicates becomes more complicated. One possibility for solving this problem is the parallel evaluation of single predicates. For this, however, one has to ensure that single predicates do not contain any side effects.

By introducing data-flow analyses that determine which input parameters every partial aim relies on, testing efficiency can be further improved. Ideally, it is possible to drastically reduce the size of the search domain for the selected partial aim, which in turn accelerates the search considerably, if only a few input parameters are relevant for the attainment of the partial aim. Further research is necessary with regard to an improved consideration of loops in the fitness functions, and the configuration of evolutionary algorithms in dependency of the structural characteristics of the test objects.

At present, statement tests, branch tests, condition tests, and segment tests can be applied. Work on multiple-condition testing is drawing to a close. An automation of path tests and data-flow oriented testing methods will be finished by 2001. The test environment will also be extended for structural testing of object-oriented Java programs. Furthermore, a visualisation component for observing the testing progress will be included and the distribution of tests to several computers will be supported in the future.

References

[1] Attol Unit Test: http://www.rational.com/products/testrt/index.jsp, Rational Software.
[2] A. Baresel, Automatisierung von Strukturtests mit evolutionären Algorithmen (Automation of structural testing using evolutionary algorithms), Diploma Thesis, Humboldt University, Berlin, Germany, 2000.
[3] B. Beizer, Software Testing Techniques, Van Nostrand Reinhold, New York, 1983.
[4] E. Boden, G. Martino, Testing software using order-based genetic algorithms, Proceedings of the First Conference on Genetic Programming, Stanford University, USA, 1996, pp. 461–466.
[5] Cantata: http://www.iplbath.com/p4.htm, IPL.
[6] Capability Maturity Model for Software, Software Engineering Institute, Carnegie Mellon University.
[7] H.-G. Gross, B. Jones, D. Eyres, Evolutionary algorithms for the verification of execution time bounds for real-time software, IEE Informatics Colloquium on Applicable Modelling, Verification, and Analysis Techniques for Real-Time Systems, London, Great Britain (1999) 8/1–8/8.
[8] M. Hennel, D. Hedley, M. Woodward, Experience with an Algol 68 numerical algorithms testbed, Proceedings of the Symposium on Computer Software Engineering, Polytechnic Press, New York, 1976, pp. 457–463.
[9] W. Howden, Reliability of the path analysis testing strategy, IEEE Transactions on Software Engineering 2 (3) (1976) 208–215.
[10] W. Howden, An evaluation of the effectiveness of symbolic testing, Software-Practice and Experience 8 (1978) 381–397.
[11] IEC 65A Software for Computers in the Application of Industrial Safety-Related Systems (Sec. 122).
[12] B. Jones, H. Sthamer, D. Eyres, Automatic structural testing using genetic algorithms, Software Engineering Journal 11 (5) (1996) 299–306.
[13] B. Jones, H. Sthamer, X. Yang, D. Eyres, The automatic generation of software test data sets using adaptive search techniques, Proceedings of the Third International Conference on Software Quality Management (SQM '95), Sevilla, Spain (1995) 435–444.
[14] B. Korel, Automated software test data generation, IEEE Transactions on Software Engineering 16 (8) (1990) 870–879.
[15] G. McGraw, C. Michael, M. Schatz, Generating software test data by evolution, Technical Report RSTR-018-97-01, RST Corporation, Sterling, Virginia, USA, 1998.
[16] F. Mueller, J. Wegener, Comparison of static analysis and evolutionary testing for the verification of timing constraints, Proceedings of the Fourth IEEE Real-Time Technology and Applications Symposium, Denver, USA (1998).
[17] G. Myers, The Art of Software-Testing, Wiley, New York, 1979.
[18] R. Pargas, M. Harrold, R. Peck, Test-data generation using genetic algorithms, Software Testing, Verification and Reliability 9 (4) (1999) 263–282.
[19] H. Pohlheim, Genetic and Evolutionary Algorithm Toolbox for Use with Matlab - Documentation, 1999, http://www.geatbx.com/.
[20] P. Puschner, R. Nossal, Testing the results of static worst-case execution-time analysis, Proceedings of the 19th IEEE Real-Time Systems Symposium (RTSS '98), Madrid, Spain (1998) 134–143.
[21] S. Rapps, E. Weyuker, Data flow analysis techniques for test data selection, Proceedings of the Sixth International Conference on Software Engineering, Tokyo, Japan (1982) 272–277.
[22] E. Riedemann, Testmethoden für sequentielle und nebenläufige Software-Systeme (Test Methods for Sequential and Parallel Software Systems), B.G. Teubner, 1997.
[23] RTCA/DO-178B Software Considerations in Airborne Systems and Equipment Certification.
[24] M. Roper, Computer-Aided Software Testing using Genetic Algorithms, Proceedings of the 10th International Software Quality Week (QW '97), San Francisco, USA (1997).
[25] A. Schultz, J. Grefenstette, K. Jong, Test and evaluation by genetic algorithms, IEEE Expert 8 (5) (1993) 9–14.
[26] H.-H. Sthamer, The automatic generation of software test data using genetic algorithms, PhD Thesis, University of Glamorgan, Pontypridd, Wales, Great Britain, 1996.
[27] TCAT: http://www.soft.com/TestWorks, Software Research.
[28] TESSY: http://www.ats-software.de/html/prod_tessy.htm, Razorcat Development.
[29] N. Tracey, J. Clark, K. Mander, J. McDermid, An automated framework for structural test-data generation, Proceedings of the 13th IEEE Conference on Automated Software Engineering, Hawaii, USA (1998).
[30] N. Tracey, J. Clark, K. Mander, Automated program flaw finding using simulated annealing, Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '98), Clearwater Beach, Florida, USA (1998) 73–81.
[31] Vorgehensmodell zur Planung und Durchführung von Informationstechnik-Vorhaben des Bundesministeriums des Innern (Process Model for Planning and Realisation of Information Technology Projects of the German Secretary of Interior).
[32] A. Watkins, A tool for the automatic generation of test data using genetic algorithms, Proceedings of the Software Quality Conference '95, Dundee, Great Britain (1995) 300–309.
[33] J. Wegener, M. Grochtmann, Verifying timing constraints of real-time systems by means of evolutionary testing, Real-Time Systems 15 (3) (1998) 275–298.
[34] J. Wegener, H. Sthamer, B. Jones, D. Eyres, Testing real-time systems using genetic algorithms, Software Quality Journal 6 (2) (1997) 127–135.
[35] R. Weichselbaum, Software test automation by means of genetic algorithms, Proceedings of the Sixth International Conference on Software Testing, Analysis and Review (EuroSTAR '98), Munich, Germany (1998).
[36] S. Xanthakis, C. Ellis, C. Skourlas, A. LeGall, S. Katsikas, Application of genetic algorithms to software testing, Proceedings of the Fifth International Conference on Software Engineering, Toulouse, France (1992).