WCA: A weighting local search for constrained combinatorial test optimization


Journal Pre-proof

Yingjie Fu, Zhendong Lei, Shaowei Cai, Jinkun Lin, Haoran Wang

PII: S0950-5849(20)30038-0
DOI: https://doi.org/10.1016/j.infsof.2020.106288
Reference: INFSOF 106288

To appear in: Information and Software Technology

Received date: 12 September 2019
Revised date: 16 February 2020
Accepted date: 17 February 2020

Please cite this article as: Yingjie Fu, Zhendong Lei, Shaowei Cai, Jinkun Lin, Haoran Wang, WCA: A Weighting Local Search for Constrained Combinatorial Test Optimization, Information and Software Technology (2020), doi: https://doi.org/10.1016/j.infsof.2020.106288

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. Β© 2020 Published by Elsevier B.V.

WCA: A Weighting Local Search for Constrained Combinatorial Test Optimization

Yingjie Fu a,b, Zhendong Lei a,b, Shaowei Cai a,b,βˆ—, Jinkun Lin a and Haoran Wang a,b

a State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China
b School of Computer and Control Engineering, University of Chinese Academy of Sciences, China

ARTICLE INFO

ABSTRACT

Keywords: Combinatorial interaction testing; Covering array generation; Local search; Weighting mechanism; Search-based software testing

Context: Covering array generation (CAG) is the core task of combinatorial interaction testing (CIT), which is widely used to discover interaction faults in real-world systems. Since constraints between parameters are ubiquitous in real systems, constrained covering array generation (CCAG) better reflects the characteristics of practical applications and has attracted considerable research in the past few years.

Objective: In CIT, a covering array (CA) of smaller size means a lower cost of testing, particularly for systems where executing a test suite is time consuming. As constraints between parameters are ubiquitous in real systems, this work is dedicated to more efficient algorithms for CCAG. Specifically, we aim to develop a heuristic search algorithm for CCAG that generates CAs of smaller size in a limited time compared with existing algorithms.

Method: We propose a weighting local search algorithm named WCA, which associates weights with tuples and dynamically adjusts them during the search, helping the algorithm avoid search stagnation. As far as we know, this is the first weighting local search for solving CCAG.

Results: We apply WCA to a wide range of benchmarks, including real-world and synthetic ones. The results show that WCA achieves a significant improvement over three state-of-the-art competitors in 2-way and 3-way CCAG, in terms of both effectiveness and efficiency. The importance of weighting is also reflected by the experimental comparison between WCA and an alternative version of the algorithm without the weighting mechanism.

Conclusion: WCA is an effective heuristic algorithm for CCAG that obtains smaller CAs efficiently, and its weighting mechanism plays a crucial role.

1. Introduction

With the growing requirements of complicated and specific software applications in modern life, there has been an increasing demand for customizable software. Such software systems allow users to choose different options according to their needs, making the software more personalized. On the other hand, customizable software also brings great challenges, one of which is the difficulty of verification. Faults may be caused by a combination of any features, which implies an exponential search space for detecting faults introduced by feature combinations. For an intuitive impression, consider a software system with 10 possible features, each with 4 options: there are more than one million (4^10 = 1,048,576) configurations to test. Due to the scale of this search space, it is practically impossible to traverse and check every possible combination [1, 2]. An efficient sampling method is therefore urgently needed, and combinatorial interaction testing (CIT) fills this role. CIT has proved to be an efficient method for detecting interaction faults. It employs combinatorial optimization techniques to sample the configuration space, and thus significantly reduces the workload of testing. The main task of CIT is to build a covering array (CA) serving as a test suite [3]. A CA guarantees that each combination of 𝑑 parameters is covered at least once, where 𝑑 is the covering strength.

A simple example is given below. Consider a system with three parameters, π‘₯1 , π‘₯2 and π‘₯3 , each of which has some independent options (1, 2 for π‘₯1 ; 3, 4 for π‘₯2 ; 5, 6, 7 for π‘₯3 ), and no constraint is taken into account. There are 2 Γ— 2 Γ— 3 = 12 possible combinations in total. Nevertheless, when it comes to CIT with strength 2, this number can be reduced to 6. A 2-way covering array of this system with size 6 is presented in Table 1.

Table 1: A CIT instance

  π‘₯1  π‘₯2  π‘₯3
   1   4   5
   2   3   5
   1   3   6
   2   4   6
   1   3   7
   2   4   7

βˆ— Corresponding author.
[email protected] (Y. Fu); [email protected] (Z. Lei); [email protected] (S. Cai); [email protected] (J. Lin); [email protected] (H. Wang)

Meanwhile, as constraints between parameters are ubiquitous in real systems, CAG with constraints is more universal than CAG without, and is known as the constrained covering array generation (CCAG) problem. In CCAG, a configuration violating the constraints cannot be part of a CA. As a consequence, the methods designed for unconstrained CAG cannot be directly applied to CCAG, as this may lead to inaccuracies in the final covering array [4]. Numerous algorithms have been developed for solving

CCAG, which can be mainly categorized into three classes: constraint encoding algorithms [5, 6, 7], greedy algorithms [8, 9, 10] and meta-heuristic algorithms [11, 12, 13, 14, 15]. Constraint encoding algorithms encode a CCAG instance into a constraint optimization problem, which is then solved by a constraint solver. This approach is able to find small solutions for 2-way CCAG, but its solving ability is greatly limited for 𝑑-way CCAG with 𝑑 greater than 2. Greedy algorithms are usually able to generate covering arrays in a short time, but the quality of these covering arrays is often unsatisfactory. As the cost of testing a single configuration increases, algorithms that can obtain CAs of considerably smaller size are preferred [12].

A few meta-heuristic algorithms [11, 12, 13, 14] have been designed for CCAG, mainly based on the local search method. Local search has been widely acknowledged as one of the best approaches to NP-hard combinatorial optimization problems, such as Maximum Satisfiability [16, 17], Minimum Vertex Cover [18] and the Maximum Clique Problem [19]. Local search algorithms for CCAG, which is also an NP-hard combinatorial optimization problem [20, 21, 22], usually find better solutions than constraint encoding algorithms and greedy algorithms. In particular, TCA [15] is a well-performing local search algorithm for CCAG that shows significantly better performance than previous algorithms. It switches between a random mode and a greedy mode according to a certain probability, and adopts a tabu strategy [23] to keep the search from being trapped in local optima.

Most local search algorithms tend to select a neighboring solution that is better than the current one at each step, and take it as the new current solution. This makes them prone to being "blinded" by local optima, causing search stagnation and degraded performance.
A powerful method for avoiding such situations is to use weighting mechanisms. In fact, weighting mechanisms have achieved great success in solving constraint optimization problems including SAT [24, 25], MaxSAT [16, 26], Minimum Vertex Cover (MVC) [27, 28] and the Set Cover Problem (SCP) [29]. Local search based on constraint weighting is usually known as dynamic local search or weighting local search. Nevertheless, weighting mechanisms have not been applied to CCAG yet.

In this work, we propose the first weighting local search for CCAG, named WCA, which employs a weighting mechanism working on tuples. Experiments are conducted on classical CIT benchmarks, including real-world instances and synthetic instances. According to the experimental results, WCA shows great advantages over its state-of-the-art competitors, including TCA [15], CASA [12] and HHSA [14]. Moreover, our experimental analysis shows that the weighting mechanism plays a crucial role in the outstanding performance of WCA.

The main contributions of this work are as follows:
1. The first dynamic local search for CCAG, which associates an integer weight with each tuple and introduces a weight-related scoring function for local search operations.
2. A cache-based method for computing the weight-related scoring function.
3. Empirical evidence of the advantage of WCA in terms of both effectiveness and efficiency, and of the important role the weighting mechanism plays in WCA.

2. Preliminaries

In this section, we list some necessary definitions that appear in this paper, and present a typical local search framework for CCAG.

2.1. Definitions

A System under Test (SUT) is denoted as 𝑀 = βŸ¨π‘ƒ , 𝐢⟩, where 𝑃 is the set of parameters of the system and 𝐢 is the set of constraints between parameters. The set of values available for a parameter 𝑝𝑖 ∈ 𝑃 is 𝐷(𝑝𝑖 ).

Definition 1. Given an SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩, a test case 𝑑𝑐 = {(𝑝1 , 𝑣1 ), (𝑝2 , 𝑣2 ), β‹― , (π‘π‘˜ , π‘£π‘˜ )} is a complete configuration in which each parameter 𝑝𝑖 in 𝑃 is associated with a value 𝑣𝑖 from 𝐷(𝑝𝑖 ), where π‘˜ is the number of parameters.

Definition 2. Given an SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩ and the covering strength 𝑑, a tuple 𝜏 = {(𝑝𝑖1 , 𝑣𝑖1 ), (𝑝𝑖2 , 𝑣𝑖2 ), β‹― , (𝑝𝑖𝑑 , 𝑣𝑖𝑑 )} is a sub-configuration consisting of 𝑑 parameters. A tuple with 𝑑 parameters is a 𝑑-tuple; the term "𝑑-tuple" is sometimes also expressed as "𝑑-way interaction". A tuple 𝜏 is covered by a test case 𝑑𝑐 if and only if 𝜏 βŠ† 𝑑𝑐, that is, each parameter appearing in both 𝜏 and 𝑑𝑐 has the same value.

Definition 3. Given an SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩ and the covering strength 𝑑, a 𝑑-way covering array 𝐢𝐴(𝑀, 𝑑) is an array consisting of test cases as rows, such that every 𝑑-tuple is covered by at least one of the test cases. The number of test cases in 𝐢𝐴(𝑀, 𝑑) is regarded as its size.

For constrained CAG, a tuple or a test case is valid if and only if no constraint in 𝐢 is broken. In order to better describe the local search framework in our work, we introduce the concept of partial-CA, which is an extension of CA.

Definition 4. Given an SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩, a partial covering array (partial-CA) of 𝑀 is an array 𝛼 consisting of valid test cases. The cost of a partial-CA 𝛼, denoted as π‘π‘œπ‘ π‘‘(𝛼), is the number of uncovered tuples under 𝛼.

Different from the definition of covering array, a partial-CA 𝛼 is allowed to leave some tuples uncovered. A CA is therefore a special form of partial-CA, and the cost of a CA is always 0. Local search algorithms for CCAG maintain a partial-CA and modify it iteratively by performing local operations. The definition of local operation is given below.
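As an illustration of Definitions 2-4, the coverage test 𝜏 βŠ† 𝑑𝑐 and the cost of a partial-CA can be sketched in a few lines of Python. This is a naive enumeration over all 𝑑-tuples that ignores constraints; the dictionary representation of test cases and tuples is our own choice, not prescribed by the paper.

```python
from itertools import combinations, product

def covers(test_case, tup):
    # tau is covered by tc iff every (parameter, value) pair of tau
    # appears with the same value in tc (Definition 2).
    return all(test_case.get(p) == v for p, v in tup.items())

def cost(partial_ca, domains, t):
    # cost(alpha) (Definition 4): the number of t-tuples covered by no row.
    # Naive enumeration of all t-tuples; constraints are ignored here.
    uncovered = 0
    for params in combinations(sorted(domains), t):
        for values in product(*(domains[p] for p in params)):
            tup = dict(zip(params, values))
            if not any(covers(tc, tup) for tc in partial_ca):
                uncovered += 1
    return uncovered
```

On the Table 1 instance, the six listed test cases give cost 0 at strength 2, and dropping any row leaves some 2-tuples uncovered.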

Algorithm 1: Local Search Framework for CCAG
Input: SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩, covering strength 𝑑
Output: CA 𝛼 βˆ— , the best size 𝑁 βˆ—
 1  𝛼 ← Initialization();
 2  𝑁 ← the size of 𝛼;
 3  while the termination criterion is not met do
 4      if 𝛼 is a CA then
 5          𝛼 βˆ— ← 𝛼, 𝑁 βˆ— ← 𝑁;
 6          𝑁 ← 𝑁 βˆ’ 1;
 7          remove one row from 𝛼;
 8          continue;
        /* search for a covering array of size 𝑁 */
 9      𝑂𝑃 βˆ— ← a candidate local operation according to some scoring functions and heuristics;
10      apply 𝑂𝑃 βˆ— to 𝛼;
11  return 𝛼 βˆ— and 𝑁 βˆ— ;

Definition 5. Given a partial-CA 𝛼, a local operation, denoted as 𝑂𝑃 (𝑖, 𝑗, 𝑒), means to modify the value of one parameter 𝑝𝑗 in a test case 𝑑𝑐𝑖 from its current value to 𝑒. On this basis, a candidate local operation is a local operation that does not break any constraint or tabu rule.

2.2. A Local Search Framework for CCAG

A typical local search framework for CCAG (outlined in Algorithm 1) solves the problem as a series of subproblems of finding a CA with a given fixed size. When a CA of size 𝑁 is found, the algorithm removes one row from the current solution and goes on to search for a CA of size 𝑁 βˆ’ 1. To find a CA of a given size 𝑁, the algorithm starts with a partial-CA and performs local operations iteratively until all tuples are covered. In each iteration, it employs scoring functions to compare different local operations, and picks one of them to modify the current partial-CA. Scoring functions are therefore very important to local search algorithms. In this work, we propose a weight-related scoring function for CCAG.

Note that getting stuck in local optima is a form of stagnation that often occurs during the search. Generally, local optima are positions in the search space from which no step can achieve an improvement w.r.t. the evaluation function [30]. More specifically, for CCAG, local optima are partial-CAs from which no local operation leads to another partial-CA with a better evaluation function value.
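The shrink-and-search loop of Algorithm 1 can be written as a generic skeleton. In this sketch all problem-specific pieces (initialization, covering test, search step, row removal) are supplied by the caller as hypothetical callables of our own naming; the tiny toy problem in the test below only exercises the control flow, it is not a CCAG instance.

```python
def local_search_ccag(initialize, is_covering, step, remove_row, max_iters):
    # Skeleton of Algorithm 1: whenever the current array is a covering
    # array, record it and retry with one row fewer; otherwise perform
    # one local-search step on the current array.
    alpha = initialize()
    best, best_n = None, None
    for _ in range(max_iters):          # termination criterion
        if is_covering(alpha):
            best, best_n = [row[:] for row in alpha], len(alpha)
            alpha = remove_row(alpha)   # search for a smaller array next
            continue
        alpha = step(alpha)             # one local operation
    return best, best_n
```

The framework itself never inspects rows; everything problem-specific, including the scoring function discussed next, lives inside `step`.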

3. Weight-related Scoring Function and Weighting Mechanism

A main idea in our algorithm is a weight-related scoring function for the local search, which is based on a weighting mechanism on tuples. In this section, we first introduce the weight-related scoring function, then describe how the weighting mechanism works and how to calculate the weight-related scoring function efficiently.


Local search algorithms for CCAG maintain a partial-CA during the search, and in each search step pick a local operation to modify the current partial-CA. To choose the local operation to perform, previous local search algorithms for CCAG employ a common scoring function, which is simply the change in the cost of the partial-CA (the number of uncovered tuples) caused by the operation. This scoring function considers all valid tuples to be equally important during the search.

The previous scoring function is straightforward w.r.t. the objective function of the CCAG problem. Nevertheless, it does not take the search behavior of the algorithm into account. We propose to use weighting techniques to enhance the scoring function, guiding the search according to the search behavior. Generally speaking, the idea is to prefer covering the tuples that were more often uncovered in local optima during the past search. In other words, tuples that were more difficult to cover in the past search have higher priority to be covered. These considerations result in a weight-related scoring function, together with a weighting mechanism. To start with, we introduce the weight of each valid tuple.

Definition 6. The weight of a valid tuple 𝜏, denoted as 𝑀(𝜏), is a positive integer associated with 𝜏, which is initialized as 1 in the beginning and can be modified during the search.

With the weights of tuples, we define the weighted cost of a partial-CA as below, which is an extended version of the π‘π‘œπ‘ π‘‘ of a partial-CA. Definition 7. For a partial-CA 𝛼, the weighted cost, denoted as π‘€π‘π‘œπ‘ π‘‘(𝛼), is the total weight of tuples that are not covered by 𝛼.

Apparently, for any CA, its π‘π‘œπ‘ π‘‘ and π‘€π‘π‘œπ‘ π‘‘ are both 0. Thus, when the algorithm finds a partial-CA whose π‘€π‘π‘œπ‘ π‘‘ is 0, it has found a CA. According to the local search framework of our algorithm, it then removes one row from the CA and goes on to search for a CA of smaller size. Now we are ready to give the formal definition of the weight-related scoring function.

Definition 8. Given a partial-CA 𝛼, π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) is the weight-related scoring function of a local operation 𝑂𝑃 , and π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) = π‘€π‘π‘œπ‘ π‘‘(𝛼) βˆ’ π‘€π‘π‘œπ‘ π‘‘(𝛼 β€² ), where 𝛼 β€² is obtained by performing 𝑂𝑃 on 𝛼.
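As a reference point for Definitions 7 and 8, wscore can be computed naively by evaluating the weighted cost before and after an operation (WCA instead caches these values, as described in Section 3.3). The data representations below are our own; the test reproduces the example from Section 3.2, where the operation's wscore rises from 0 to 1 once the uncovered tuple's weight is increased.

```python
def wcost(partial_ca, tuples, weight):
    # Weighted cost of a partial-CA (Definition 7): the total weight of
    # tuples covered by no test case.
    def covered(tup):
        return any(all(tc.get(p) == v for p, v in tup) for tc in partial_ca)
    return sum(w for tup, w in zip(tuples, weight) if not covered(tup))

def wscore(partial_ca, tuples, weight, op):
    # wscore(OP) = wcost(alpha) - wcost(alpha') (Definition 8), computed
    # here by re-evaluating both costs from scratch.
    row, param, value = op              # OP(i, j, u): set parameter j of row i to u
    changed = [dict(tc) for tc in partial_ca]
    changed[row][param] = value
    return wcost(partial_ca, tuples, weight) - wcost(changed, tuples, weight)
```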

Note that although weight-related scoring functions have been used in other combinatorial optimization problems such as MaxSAT and Minimum Vertex Cover, this is the first weight-related scoring function for CCAG. The wscore function is used to choose the local operation to execute in our local search algorithm. Recall that the weights of all valid tuples are initialized as 1 in the beginning; to use the wscore function effectively in local search, we need a weighting mechanism that adjusts the weights during the search.

3.2. The Weighting Mechanism

We design a weighting mechanism to dynamically adjust the weights of tuples. It works as follows:

β€’ All tuple weights are initialized as 1 at the beginning of local search.
β€’ Whenever the search is stuck in a local optimum, the weight of each uncovered tuple is increased by 1, while those of covered tuples remain the same.

As the weights change during the search, the wscore values of local operations change as well. In this way, the weighting mechanism plays an important role in the behavior of the search.

Example: Consider the instance in Section 1, and suppose the algorithm is searching for a 2-way CA of size 5, all weights have the initial value 1, and no weight update has been performed. When the search reaches the partial-CA shown in Table 2, the tuples 𝜏1 = {(π‘₯1 , 1), (π‘₯2 , 4)}, 𝜏2 = {(π‘₯1 , 1), (π‘₯3 , 5)} and 𝜏3 = {(π‘₯2 , 4), (π‘₯3 , 5)} are not covered. Calculation shows that there is no candidate local operation with π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) > 0, which means the search is stuck in a local optimum. Therefore, our weighting mechanism increases the weight of each uncovered tuple by 1 while keeping the weights of other tuples unchanged. In this way, the wscore of some local operations may become positive. Consider the candidate local operation 𝑂𝑃 (1, 1, 1), which covers 𝜏1 but discards another tuple 𝜏 β€² = {(π‘₯1 , 2), (π‘₯3 , 7)}. In the local optimum, π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏1 ) = π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏 β€² ) = 1, thus π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) is 0. After updating the weights, however, π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏1 ) = 2, and the wscore of 𝑂𝑃 grows to 1 as a result.

Table 2: A possible local optimum for the instance

        π‘₯1  π‘₯2  π‘₯3
𝑑𝑐1      2   4   7
𝑑𝑐2      2   3   5
𝑑𝑐3      1   3   6
𝑑𝑐4      2   4   6
𝑑𝑐5      1   3   7
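The weight update itself is a single pass over the tuples; a minimal sketch, using our own tuple and weight representations:

```python
def increase_weights(partial_ca, tuples, weight):
    # Weight update at a local optimum (Section 3.2): each uncovered
    # tuple's weight grows by 1; covered tuples keep their weights.
    for i, tup in enumerate(tuples):
        covered = any(all(tc.get(p) == v for p, v in tup)
                      for tc in partial_ca)
        if not covered:
            weight[i] += 1
```

On the Table 2 local optimum, the three uncovered tuples gain weight while every covered tuple keeps weight 1.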

Generally, our algorithm picks a local operation to execute in each iteration, depending heavily on the wscore values. As stated before, the goal of the search is to obtain a CA 𝛼 with π‘€π‘π‘œπ‘ π‘‘(𝛼) = 0, and π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) indicates the reduction of π‘€π‘π‘œπ‘ π‘‘(𝛼) caused by 𝑂𝑃 . In WCA, candidate local operations with higher wscore are more likely to be accepted. Details are given in Section 4.

The insight behind the weighting mechanism is intuitive. When a "stuck" situation is observed, the weights of the uncovered tuples are increased, making the following search steps more inclined to cover them. By adjusting the weights at local optima, the algorithm avoids being trapped in certain local optima, or in similar ones in which some valid tuples are always uncovered. In this sense, the weighting mechanism increases the diversity of the search, enabling the algorithm to explore the search space more thoroughly.

3.3. Score Acquisition

In order to calculate the weight-related scoring function, we propose a cache-based score acquisition method, which fits our weighting mechanism naturally. In WCA, we maintain a matrix WSCORE with 𝑁 rows and 𝑀 columns, where 𝑁 is the number of test cases in the current partial-CA and 𝑀 = Σ𝑝𝑖 βˆˆπ‘ƒ |𝐷(𝑝𝑖 )|, i.e., the sum of the domain sizes of all parameters. We establish a one-to-one mapping from each value of each parameter to a number in [1, 𝑀] as follows: the π‘Ÿ-th value of parameter 𝑝𝑗 is mapped to the number π‘š = Σ𝑝𝑑 ∈{𝑝𝑖 βˆˆπ‘ƒ |𝑖<𝑗} |𝐷(𝑝𝑑 )| + π‘Ÿ. Then the wscore of operation 𝑂𝑃 (𝑖, 𝑗, 𝑒), which modifies 𝑝𝑗 in 𝑑𝑐𝑖 from its current value to another value 𝑒 mapped to π‘š, is stored in the cell WSCORE[𝑖][π‘š]. In each step of the search, the wscore of candidate local operations can thus be read directly from WSCORE without calculation.

There are two situations in which a WSCORE update is performed: when the current partial-CA is modified (by altering a value or removing a row), and when the weights of tuples increase. Before giving the specific update rules, we first introduce some essential definitions.

Definition 9. Given a current partial-CA 𝛼 and a tuple 𝜏, πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) is the set of local operations on 𝛼 that can change the state of 𝜏 from uncovered to covered.

Definition 10. Given a current partial-CA 𝛼 and a tuple 𝜏, π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) is the set of local operations on 𝛼 that can change the state of 𝜏 from covered to uncovered.

The number of test cases that cover a tuple 𝜏 is called the cover count of 𝜏. Note that πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) is empty when 𝜏 is already covered by 𝛼, because additional coverage would only increase the cover count rather than turn 𝜏 from uncovered to covered. Similarly, if 𝜏 is not covered by 𝛼, π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) is empty as well.
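The parameter-value-to-column mapping is just a prefix sum over domain sizes; a one-line sketch with the paper's 1-based indices:

```python
def column_index(domains, j, r):
    # Map the r-th value (1-based) of the j-th parameter (1-based) to its
    # WSCORE column: m = sum of |D(p_i)| over all i < j, plus r.
    return sum(len(d) for d in domains[:j - 1]) + r
```

For the running example (domains of sizes 2, 2 and 3, so M = 7), the mapping is a bijection onto 1..7.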
Update rules for modifications: Given a modification, the relevant tuples are those whose cover count is affected by it. More specifically, for a value alteration, a relevant tuple is one that contains the old or the new value of the changed position; for a row removal, all tuples covered by the removed test case are relevant. When a modification is performed, the cover counts of all relevant tuples change as well, but not all of these changes affect WSCORE. All relevant tuples are therefore checked, and those with inconsequential changes are skipped. The remaining relevant tuples are divided into 5 groups according to their cover count, corresponding to the 5 cases in Table 3.

These update rules may seem cumbersome, but the main idea behind them is intuitive. For example, suppose a tuple 𝜏 is not covered by the current partial-CA 𝛼.

Table 3: Update rules for modifications. Suppose the modification is applied on a partial-CA 𝛼, resulting in another partial-CA 𝛼 β€² .

Cover count of 𝜏 | Update rules
0 β†’ 1 | for all π‘œπ‘ ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) βˆͺ π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) βˆ’= π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏)
1 β†’ 2 | for all π‘œπ‘ ∈ π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) += π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏)
1 β†’ 0 | for all π‘œπ‘ ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) βˆͺ π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) += π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏)
2 β†’ 1 | for all π‘œπ‘ ∈ π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) βˆ’= π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏)
0 β†’ 0 | for all π‘œπ‘ ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) βˆ’ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) βˆ’= π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏);
        for all π‘œπ‘ ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) βˆ’ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏): π‘€π‘ π‘π‘œπ‘Ÿπ‘’(π‘œπ‘) += π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏)
𝛼, π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) = βˆ…, and πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) β‰  βˆ… in most cases. For each 𝑂𝑃 ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏), the ability to cover 𝜏 can lead to a positive contribution to π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ). However, once 𝜏 is covered by one local operation, resulting in a different partial-CA 𝛼 β€² , πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) becomes empty while π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) is not empty. That is to say all operations in πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) can not cover 𝜏 as a newly covered tuple any more, but all operations in π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) can turn 𝜏 into a newly uncovered tuple. Thus the wscore of operations in both πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏) and π‘ˆ π‘›π‘π‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) should be reduced by π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏). The last row of Table 3 indicates a special case that the cover count of a tuple before and after the modification are both 0. In this case, the change of WSCORE happens only when πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) β‰  πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏). Take the upper part as an example, given an uncovered tuple 𝜏 which can be covered by a test case 𝑑𝑐 with only a single local operation 𝑂𝑃 ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏), if the following operation leads to another parameter that have different values in 𝜏 and 𝑑𝑐, at least two local operations are required for 𝑑𝑐 to cover 𝜏, which means 𝑂𝑃 βˆ‰ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏). For the original partial-CA 𝛼, being able to cover 𝜏 forms a positive part of π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ), while 𝑂𝑃 loses this attribute after the modification. Similarly, the score of all local operations in πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏)βˆ’ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼 β€² , 𝜏) should be reduced by π‘€π‘’π‘–π‘”β„Žπ‘‘(𝜏). 
Update rules for weight increase: As elaborated in Section 3.2, when the current partial-CA is a local optimum, the weights of uncovered tuples are increased, after which WSCORE must be updated as well. In this case, we only need to update the wscore of those local operations that can cover uncovered tuples. Specifically, when the search in WCA gets stuck in a local optimum with the set of uncovered tuples 𝑆, the weight of each uncovered tuple is increased by 1, and the WSCORE update rule is as follows: for each uncovered tuple 𝜏 ∈ 𝑆 and each operation 𝑂𝑃 ∈ πΆπ‘œπ‘£π‘’π‘Ÿπ‘‚π‘ƒ (𝛼, 𝜏), π‘€π‘ π‘π‘œπ‘Ÿπ‘’(𝑂𝑃 ) += 1.


4. The WCA Algorithm

In this section, we propose a weighting-based local search algorithm named WCA for CCAG, which adopts the typical framework mentioned in Section 2.2, and mainly employs the weight-related scoring function to guide the search.

4.1. Constraints Handling

As mentioned in Section 1, constraints are ubiquitous in real-world applications and must be handled carefully. In WCA, constraints on the permissible combinations of parameter values are given in conjunctive normal form (CNF) over first-order logic literals. For example, the clause Β¬(π‘₯1 = 1) ∨ (π‘₯2 = 3) indicates that if π‘₯1 equals 1 then π‘₯2 must equal 3. This form of constraints can also be handled directly by ACTS, which serves as the initialization of WCA to generate a valid starting CA. We note that this constraint encoding was first proposed by Cohen et al. [4] and then adopted by Garvin et al. [12].

During the search process, each test case is a complete assignment to all parameters. It therefore suffices to check whether all constraints are satisfied under these parameter values. In fact, given a local operation, only the constraints related to the changed parameter need to be checked, since the status of the other constraints is not affected by the operation. As stated in Definition 5, local operations that violate constraints cannot become candidate local operations and thus cannot be executed.
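Checking a complete test case against such CNF constraints is a two-level all/any test. A small sketch; the literal encoding below is our own illustration, not the ACTS input format:

```python
def satisfies(test_case, clauses):
    # Constraints in CNF: each clause is a disjunction of literals, and
    # each literal (param, value, sign) reads "param == value" when sign
    # is True and "param != value" when sign is False.
    return all(any((test_case[p] == v) == sign for p, v, sign in clause)
               for clause in clauses)
```

For the clause Β¬(π‘₯1 = 1) ∨ (π‘₯2 = 3), any test case with π‘₯1 = 1 and π‘₯2 β‰  3 is rejected, and everything else passes.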

4.2. Description of WCA

In this subsection, we specify each component of WCA and give a detailed description of the important ones. The pseudo code of WCA is presented in Algorithm 2.

First, we introduce the tabu strategy [23], which has been used in TCA [15] and also has an important impact on the selection of candidate local operations in WCA. Its main idea is that once a parameter is modified in a test case of the current partial-CA, it should not be modified again in the same test case for the following 𝑑𝑑 steps, where 𝑑𝑑 is the tabu tenure. Operations under tabu cannot be picked as candidate local operations.
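The tabu bookkeeping amounts to remembering, per (row, parameter) pair, the first step at which it may be changed again; a minimal sketch (class and method names are our own):

```python
class Tabu:
    # Tabu strategy: after (row, param) is modified at some step, it
    # stays forbidden for the next `tenure` steps.
    def __init__(self, tenure):
        self.tenure = tenure
        self.expires = {}               # (row, param) -> first allowed step

    def forbid(self, row, param, step):
        self.expires[(row, param)] = step + self.tenure

    def allowed(self, row, param, step):
        return step >= self.expires.get((row, param), 0)
```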

Algorithm 2: The WCA Algorithm
Input: SUT 𝑀 = βŸ¨π‘ƒ , 𝐢⟩, covering strength 𝑑, step threshold 𝑇
Output: CA 𝛼 βˆ— , the best size 𝑁 βˆ—
 1  𝛼 ← ACTS Initialization();
 2  build the initial WSCORE by calculation;
 3  𝑁 ← the size of 𝛼;
 4  while the cutoff time is not reached do
 5      if 𝛼 is a CA then
 6          𝛼 βˆ— ← 𝛼, 𝑁 βˆ— ← 𝑁;
 7          𝑁 ← 𝑁 βˆ’ 1;
 8          randomly remove one row from 𝛼;
 9          update WSCORE for modification;
10          continue;
        /* search for a covering array of size 𝑁 */
11      𝛼 ← π‘€π‘’π‘–π‘”β„Žπ‘‘π‘–π‘›π‘”-π‘π‘Žπ‘ π‘’π‘‘ π‘ π‘’π‘Žπ‘Ÿπ‘β„Ž(𝛼);
12      if 𝛼 has not been improved for 𝑇 steps then
13          𝛼 ← π‘Ÿπ‘Žπ‘›π‘‘π‘œπ‘š(𝛼);
14          update WSCORE for modification;
15  return 𝛼 βˆ— and 𝑁 βˆ— ;

For the initialization part of our algorithm, we use the well-known ACTS [31] tool to construct a covering array that determines the initial size. ACTS is chosen because it is an effective greedy algorithm that can generate a valid CA very quickly. The best found CA, denoted by 𝛼 βˆ— , is initialized to the CA constructed by ACTS, and the best found size 𝑁 βˆ— is initialized to the size of 𝛼 βˆ— . In the implementation, the WSCORE matrix is also initialized based on this CA, and throughout the algorithm, whenever a modification happens or the weights of tuples increase, WSCORE is updated according to the rules stated in Section 3.3.

After initialization, a loop is executed until the cutoff time is reached. In each round of the loop, we first check whether the current partial-CA covers all valid tuples. If so, a valid CA of the current size has been found; the best found CA 𝛼 βˆ— and the best found size 𝑁 βˆ— are updated, and a randomly picked test case is then removed from the current CA 𝛼. Otherwise, WCA performs a local operation to modify 𝛼, aiming to find a valid CA of the current size. There are two modes for this task: a weighting-based search mode and a random mode. Most of the time, the task is done by the weighting-based search procedure.

Details of the weighting-based search mode are presented in Algorithm 3. We use two arrays to maintain the cover counts of all tuples and the set of uncovered tuples (those whose cover count is 0), which are updated immediately once any modification happens to the current CA. In this mode, uncovered tuples are scanned in a random order. For each uncovered tuple 𝜏, the wscore of all candidate operations in πΆπ‘Žπ‘›π‘‘π‘‚π‘ƒ (𝛼, 𝜏) (the set of candidate local operations on 𝛼 that can cover 𝜏 without violating any constraint or any tabu rule) are read from the WSCORE

Algorithm 3: The weighting-based search in WCA
Input: partial-CA α
Output: partial-CA α
1   S ← the set of uncovered tuples;
2   OP* ← Nil;
3   for all τ ∈ S do
4       CandOP(α, τ) ← candidate local operations that can cover τ without violating any constraint or any tabu rule;
5       if CandOP(α, τ) ≠ ∅ then
6           OP* ← a local operation in CandOP(α, τ) with the highest wscore;
7           if wscore(OP*) > 0 then
8               apply OP* on α;
9               update WSCORE for modification;
10              return α;
11  Increase the weights of uncovered tuples;
12  update WSCORE for weight increase;
13  if OP* = Nil then
        /* there is no candidate local operation that can cover uncovered tuples */
14      randomly pick one test case to cover an arbitrary tuple from S with no constraint or tabu rule violated;
15      update WSCORE for modification;
16  return α;

matrix, and the one with the highest wscore is chosen as OP*, with ties broken randomly. Once a candidate operation with wscore(OP*) > 0 is found, which means that OP* is able to reduce wcost(α), no more uncovered tuples are considered, and the chosen OP* is applied to modify α immediately. Otherwise, if there is no candidate operation with a positive wscore, the weights of the uncovered tuples are increased by 1. As for the rare case where no candidate local operation can lead to a newly covered tuple on α, one test case is chosen randomly to cover an arbitrary uncovered tuple. The random mode is activated only when the algorithm fails to improve the maintained partial-CA α for a long time. Specifically, if the algorithm fails to reduce the number of uncovered tuples of α within T steps (T is an algorithmic parameter), a random step is executed to create a disturbance and jump away from the current area. Random disturbance is a common approach to escape from search stagnation. In the random mode random(α) of WCA, for each row (test case) of α, we pick a parameter randomly and change its value to an arbitrary valid one.
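As a companion sketch, one pass of the weighting-based search mode (lines 1–12 of Algorithm 3) can be written as follows. Again, this is an illustrative sketch under simplifying assumptions, not the authors' code: `cand_ops`, `wscore` and `apply_op` are placeholder hooks, ties are broken by `max` rather than randomly, and the fallback branch for the case with no candidate operation at all (lines 13–15) is omitted.

```python
import random

def weighting_step(uncovered, weight, cand_ops, wscore, apply_op):
    """One pass of the weighting-based search mode (illustrative sketch).

    uncovered: list of currently uncovered tuples, scanned in random order.
    weight: dict mapping each tuple to its current (integer) weight.
    cand_ops(tau): operations covering tau without constraint/tabu violation.
    wscore(op): weighted score of op; > 0 means wcost(alpha) decreases.
    apply_op(op): performs op on the current partial-CA.
    """
    random.shuffle(uncovered)                # scan uncovered tuples randomly
    for tau in uncovered:
        ops = cand_ops(tau)
        if ops:
            best = max(ops, key=wscore)      # the paper breaks ties randomly
            if wscore(best) > 0:
                apply_op(best)               # improving operation: apply, stop
                return best
    for tau in uncovered:                    # no improving operation found:
        weight[tau] += 1                     # make uncovered tuples heavier
    return None
```

Removing the weight-increase loop at the end corresponds to the WCA-noweight variant evaluated in RQ3.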

5. Experiments

In this section, we conduct a series of experiments to evaluate the effectiveness and efficiency of WCA. A wide range of benchmarks, including real-world instances and

synthetic instances, are used to compare the performance of WCA with its competitors in terms of CA size and time consumption. In this work, we focus on experiments for 2-way and 3-way CCAG, since these covering strengths have been the main concerns in the previous literature.

5.1. Research Questions

As a test suite for revealing interaction faults of a system, a covering array is expected to be as small as possible, since smaller CAs always result in a lower cost of testing. Therefore, to evaluate the effectiveness of the algorithms, we mainly focus on the size of the CAs generated by each algorithm. To make the comparison fair, all algorithms are evaluated under the same time budget. An algorithm is considered superior if it finds valid CAs of smaller size than the other algorithms within the same time budget. Besides, when the generated CAs are of the same size, the time consumption of each algorithm is used to evaluate their efficiency. On these grounds, we compare WCA against its competitors and present the first research question below. RQ1: How does the WCA algorithm perform compared against the state-of-the-art solvers on 2-way and 3-way CCAG? It is worth noting that the time spent in generating CAs should be taken into account when estimating the total cost of a test process, especially for systems whose test suite execution takes only a very short time. Although local search algorithms can obtain much smaller CAs than other algorithms, their overwhelming time consumption limits their superiority in wider applications. Hence there is an urgent need to improve the efficiency of local search algorithms while preserving their capability of finding small CAs. On this basis, it is interesting to examine how WCA performs within a limited time budget. RQ2: Can WCA outperform the state-of-the-art solvers while using much shorter time? The weighting mechanism is a major component of WCA. To illustrate its importance, we compare WCA against an alternative version, WCA-noweight, the variant obtained by removing line 11 and line 12 from Algorithm 3. Experiments are conducted on 3-way CCAG to study the following research question. RQ3: How much does the weighting mechanism contribute to the effectiveness of WCA?

5.2. The State-of-the-art Competitors

We compare WCA with four state-of-the-art CCAG algorithms: TCA [15], CASA [12], HHSA [14], and ACTS [31]. TCA, CASA and HHSA are meta-heuristic algorithms and ACTS is a greedy algorithm. All of these algorithms are available online. TCA [15], proposed by Lin et al., is a two-mode meta-heuristic algorithm that combines a greedy mode and a random mode effectively. It is one of the best meta-heuristic algorithms on both 2-way and 3-way benchmarks. Experiments show that TCA can generate CAs of obviously smaller size than previous algorithms, including the greedy

construction algorithm ACTS and Cascade [7], as well as the meta-heuristic algorithm CASA. CASA [12], proposed by Garvin et al., is a meta-heuristic algorithm based on simulated annealing. By reorganizing the search process according to the structure of CCAG, CASA yields smaller CAs than previous simulated annealing algorithms while its run-time is reduced significantly. HHSA [14], proposed by Jia et al., is a hyper-heuristic algorithm that learns search strategies across a broad range of problem instances. It consists of a simulated annealing search as the outer layer and a reinforcement learning agent as the inner layer. ACTS [31] is a widely used greedy algorithm that constructs CAs using a strategy named In-Parameter-Order. ACTS can generate CAs of reasonable size quickly, but it does not pursue CAs of the smallest size.

5.3. Benchmarks

A broad range of benchmarks are used in our experiments to study the research questions presented in Section 5.1. All of them are publicly accessible and have been widely used to evaluate CCAG algorithms [4, 12, 13, 15, 32, 33, 34]. Five instances are extracted from non-trivial real-world systems. They are described as follows. • Apache, a popular HTTP server used on UNIX and Windows NT. • Bugzilla, a system for tracking bugs and defects in software development.

• GCC, the well-known GNU Compiler Collection, including front ends for many languages such as C, C++, Java and Ada, as well as their libraries. • SPINV, the widely used verifier component of the model checker SPIN. • SPINS, the simulation component of SPIN.

The other 30 instances are synthetically generated based on the abundance, type and complexity of the constraints in these 5 real-world instances. More details of the instances can be found in [12, 13].

5.4. Experimental Settings

All experiments are conducted on a computer cluster, each node of which is equipped with dual 56-core, 2.00GHz Intel Xeon E7-4830 CPUs, 35 MB L3 cache and 256 GB RAM, running Ubuntu 16.04.5 LTS. Considering the randomness of heuristic algorithms, we conduct 10 runs of each of WCA, TCA, HHSA and CASA on every 2-way and 3-way constrained instance; ACTS is run only once, since it is a deterministic greedy algorithm. The time budget of each run is set to 1000 CPU seconds. For WCA, the step threshold T in Algorithm 2 is set to 1000, and the tabu tenure tt is set to 4. For the other algorithms, we use their default settings. The specific algorithm

Figure 1: Statistical results comparing the time consumption of different algorithms on both 2-way and 3-way cases (box plots over WCA, TCA, CASA and HHSA). Rtime is the ratio of the time consumption of one algorithm to the minimum time consumption among all the algorithms.

Figure 2: Statistical results comparing the test suite size of different algorithms on both 2-way and 3-way cases (box plots over WCA, WCA(100s), TCA, ACTS, CASA and HHSA). Rsize is the ratio of the size of the CAs obtained by one algorithm to the best size among all the algorithms.

used in ACTS (for both the independent experiment and the initialization of WCA) is IPOG [35]. We use box plots to show the distributions of the results for the algorithms. The lower and upper bounds of the box are the 1st quartile and the 3rd quartile, respectively, and the line in the middle of the box indicates the median. The edges are placed 1.5 times the interquartile range beyond the quartiles. Whiskers extend to cover the rest of the distribution that lies between the edges but outside the box. Points beyond the upper or lower edge are called outliers,

whose values are represented by dots. For the purpose of normalization, for each instance, the ordinate represents the ratio of the result obtained by one algorithm to the best result among all the algorithms. The abscissa distinguishes the different algorithms. In particular, when comparing time consumption, we use a logarithmic scale for the ordinate. To measure the reliability of our statements, we also give the p-value of the Wilcoxon signed-rank test. The detailed results are presented in tables. For each instance solved by a meta-heuristic algorithm, we report

the smallest (best) size of the resulting CAs among the 10 runs, denoted as 'min', and the average size, denoted as 'avg'. We also report the average time spent in obtaining the resulting CAs over the 10 runs as 'time'. Note that there is only one size and one time for each instance for ACTS. If an algorithm fails to find any valid CA within the time budget in all runs, the corresponding values are marked by '—'. For each instance, we use boldface to indicate the best 'min' and 'avg' found among the algorithms. Besides, the best 'time' is shown in bold only among the algorithms that obtain the best 'min' and 'avg'. In addition, for each algorithm, we report the number of instances on which it achieves the best 'min' among all competitors, denoted by '#WinMin', and the number of instances on which it achieves the best 'avg', denoted by '#WinAvg'.

Table 4: Experimental results of WCA(1000s), TCA(1000s), ACTS, HHSA(1000s) and CASA(1000s) for 2-way constrained CAG on all benchmarks

Benchmark (|P|,|C|) | WCA avg (min), time | TCA avg (min), time | ACTS min, time | HHSA avg (min), time | CASA avg (min), time
Apache (172,7) | 30 (30), 0.7 | 30 (30), 1.9 | 33, 0.6 | 29.9 (27), 589.3 | 34.6 (32), 4.1
Bugzilla (52,5) | 16 (16), 0.5 | 16 (16), 0.1 | 19, 0.4 | 16 (16), 8.6 | 16.4 (16), 0.2
Gcc (199,40) | 15 (15), 7.1 | 16.1 (16), 386.6 | 23, 0.9 | 18.0 (17), 376.7 | 22.1 (19), 76.4
Spins (18,13) | 19 (19), 0.8 | 19 (19), 0.1 | 26, 0.4 | 19 (19), 14.4 | 19.8 (19), 0.2
Spinv (55,49) | 31 (31), 18.7 | 32.1 (32), 334.4 | 45, 1.0 | 31 (31), 136.8 | 40.2 (36), 3.2
Syn1 (97,24) | 36 (36), 2.2 | 36 (36), 106.4 | 48, 0.9 | 36.6 (36), 275.4 | 40.1 (38), 13.3
Syn2 (94,22) | 30 (30), 1.3 | 30 (30), 0.9 | 32, 1.0 | 29 (29), 206.3 | 31.8 (30), 2.0
Syn3 (29,10) | 18 (18), 0.5 | 18 (18), <0.1 | 19, 0.5 | 17 (17), 12.5 | 18.6 (18), <0.1
Syn4 (58,17) | 20 (20), 0.5 | 20 (20), 0.1 | 22, 0.5 | 19.2 (19), 76.6 | 21.9 (20), 0.5
Syn5 (174,39) | 42 (42), 353.4 | 43.1 (43), 320.4 | 54, 1.0 | 45.9 (44), 470.9 | 50.1 (45), 69.0
Syn6 (77,30) | 24 (24), 1.1 | 24 (24), 0.2 | 25, 0.8 | 24 (24), 93.6 | 24.2 (24), 0.6
Syn7 (30,15) | 9 (9), 0.5 | 9 (9), <0.1 | 12, 0.4 | 9 (9), 43.5 | 9 (9), <0.1
Syn8 (119,37) | 36 (36), 98.2 | 37 (37), 202.7 | 47, 0.8 | 39.0 (37), 467.8 | 41.5 (38), 22.4
Syn9 (61,37) | 20 (20), 1.2 | 20 (20), 0.1 | 22, 0.8 | 19 (19), 141.4 | 20.2 (20), 0.2
Syn10 (147,47) | 37.8 (37), 313.4 | 39.9 (39), 123.8 | 47, 1.2 | 43.1 (41), 287.0 | 44.2 (42), 30.9
Syn11 (96,32) | 37.3 (37), 397.8 | 38.9 (38), 169.5 | 47, 1.1 | 41.2 (41), 53.4 | 43.3 (41), 17.8
Syn12 (147,27) | 36 (36), 1.9 | 36 (36), 12.5 | 43, 0.8 | 36.9 (36), 615.3 | 41.7 (39), 18.5
Syn13 (133,26) | 36 (36), 1.2 | 36 (36), 1.0 | 40, 0.9 | 36 (36), 438.8 | 37.6 (36), 4.8
Syn14 (92,15) | 36 (36), 0.6 | 36 (36), 0.6 | 39, 0.6 | 36 (36), 189.5 | 38.2 (37), 4.0
Syn15 (58,22) | 30 (30), 2.1 | 30 (30), 0.7 | 32, 1.2 | 29 (29), 99.4 | 31.9 (30), 0.5
Syn16 (87,34) | 24 (24), 1.9 | 24 (24), 0.3 | 25, 1.0 | 24 (24), 117.1 | 24.8 (24), 0.6
Syn17 (137,29) | 36 (36), 1.1 | 36 (36), 14.7 | 41, 1.0 | 36.2 (36), 387.5 | 40.5 (38), 11.0
Syn18 (141,28) | 37.9 (37), 89.5 | 39.9 (39), 140.8 | 52, 0.7 | 41.0 (39), 301.7 | 42.4 (41), 27.6
Syn19 (197,43) | 39.9 (39), 306.7 | 43.9 (43), 210.4 | 51, 1.6 | 46.4 (46), 690.7 | 49.4 (47), 40.7
Syn20 (158,48) | 48.9 (48), 29.8 | 49.8 (49), 215.5 | 60, 1.1 | 53.1 (51), 725.2 | 53.4 (52), 120.4
Syn21 (85,46) | 36 (36), 0.9 | 36 (36), 0.4 | 39, 0.9 | 36 (36), 264.3 | 36.6 (36), 2.7
Syn22 (79,22) | 36 (36), 0.5 | 36 (36), 0.2 | 37, 0.5 | 36 (36), 193.8 | 36 (36), 0.5
Syn23 (27,15) | 12 (12), 0.7 | 12 (12), <0.1 | 14, 0.4 | 11.2 (11), 6.7 | 12.7 (12), <0.1
Syn24 (119,29) | 38.1 (38), 533.4 | 40.1 (40), 287.5 | 48, 0.9 | 41.4 (41), 183.7 | 43.1 (42), 32.8
Syn25 (134,27) | 43.8 (43), 323.0 | 45.6 (45), 125.8 | 52, 0.9 | 47.3 (46), 419.2 | 48.0 (47), 119.2
Syn26 (95,32) | 26.3 (26), 175.1 | 27 (27), 276.6 | 34, 1.0 | 29.8 (29), 149.9 | 32.9 (30), 2.8
Syn27 (62,20) | 36 (36), 1.0 | 36 (36), 0.2 | 37, 0.8 | 36 (36), 75.3 | 36.6 (36), 0.3
Syn28 (194,37) | 45 (45), 85.3 | 47 (47), 251.3 | 57, 1.2 | 48.7 (48), 807.5 | 51.4 (50), 79.3
Syn29 (144,22) | 25 (25), 0.8 | 25 (25), 6.7 | 29, 0.8 | 27.1 (26), 345.5 | 30.7 (29), 3.4
Syn30 (79,35) | 16 (16), 1.0 | 16 (16), 114.6 | 22, 0.9 | 17.6 (17), 89.7 | 19.7 (19), 1.0
#WinAvg. | 28 | 15 | 0 | 18 | 2
#WinMin. | 28 | 15 | 0 | 21 | 9
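The per-instance normalization behind the Rtime and Rsize ordinates in Figures 1 and 2 amounts to dividing each algorithm's result by the best result on that instance, so the best algorithm maps to 1.0. A minimal sketch (illustrative only, with made-up toy numbers in the usage below):

```python
def normalized_ratios(results):
    """Normalize per-instance results against the best algorithm.

    results: {algorithm: {instance: value}}, where lower values are
    better (CA size or time).  Returns, per algorithm, the list of
    value / best-on-that-instance ratios over the sorted instances.
    """
    instances = sorted(next(iter(results.values())))
    best = {i: min(res[i] for res in results.values()) for i in instances}
    return {alg: [res[i] / best[i] for i in instances]
            for alg, res in results.items()}
```

These per-instance ratios are what the box plots then aggregate across the 35 benchmark instances.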

5.5. Experimental Results

In this subsection, we present the experimental results to answer the research questions raised in Section 5.1. Results on comparing WCA with its state-of-the-art competitors for both 2-way and 3-way CCAG (RQ1): Figure 1 compares the time it takes for each algorithm to obtain its best 2-way and 3-way covering arrays within the same time budget (1000s). The ACTS algorithm is not taken into account, since greedy algorithms are known to be much faster than meta-heuristic algorithms. The comparison w.r.t. the size of the arrays generated by the different algorithms is presented in Figure 2. It can be observed that WCA performs better than its competitors when the cutoff time is 1000s, with p < 0.0025 for 2-way CCAG, and p < 0.00004 for 3-way CCAG. Detailed results are given in Table 4 and Table 5. For 2-way

Table 5: Experimental results of WCA(1000s), TCA(1000s), ACTS, HHSA(1000s) and CASA(1000s) for 3-way constrained CAG on all benchmarks

Benchmark (|P|,|C|) | WCA avg (min), time | TCA avg (min), time | ACTS min, time | HHSA avg (min), time | CASA avg (min), time
Apache (172,7) | 134.5 (133), 262.9 | 155.7 (154), 891.9 | 173, 8.4 | — | 248.1 (246), 920.7
Bugzilla (52,5) | 48 (48), 1.7 | 48 (48), 10.0 | 68, 0.5 | 58.6 (50), 600.7 | 64.6 (61), 36.6
Gcc (199,40) | 74.7 (69), 486.6 | 83.3 (81), 874.7 | 108, 9.2 | — | 140.1 (134), 943.8
Spins (18,13) | 80.4 (80), 3.9 | 80 (80), 3.6 | 98, 0.4 | 86.1 (79), 98.8 | 100.5 (94), 7.0
Spinv (55,49) | 195.2 (194), 47.9 | 200.5 (198), 229.2 | 286, 1.5 | — | 233.2 (224), 722.3
Syn1 (97,24) | 249.7 (249), 262.7 | 255.8 (254), 831.5 | 293, 3.0 | — | 367.1 (358), 888.6
Syn2 (94,22) | 131.8 (130), 275.4 | 143.8 (142), 520.1 | 174, 2.4 | — | 184.7 (174), 804.9
Syn3 (29,10) | 50 (50), 1.1 | 51 (51), 41.4 | 71, 0.5 | 58.8 (57), 166.7 | 61.1 (59), 2.7
Syn4 (58,17) | 80 (80), 2.8 | 80 (80), 40.2 | 102, 0.9 | 100 (100), 818.6 | 103.6 (96), 90.2
Syn5 (174,39) | 328.9 (328), 750.1 | 413.5 (410), 993.0 | 386, 12.9 | — | 1069.1 (1068), 954.1
Syn6 (77,30) | 96 (96), 4.5 | 96 (96), 138.6 | 119, 1.6 | — | 122.6 (118), 663.6
Syn7 (30,15) | 25.7 (25), 2.6 | 25.3 (25), 354.2 | 35, 0.5 | 26 (26), 108.6 | 27.8 (27), 4.8
Syn8 (119,37) | 256.3 (255), 509.4 | 270.8 (268), 894.5 | 326, 4.7 | — | 403.6 (389), 955.6
Syn9 (61,37) | 60 (60), 3.5 | 60 (60), 8.2 | 84, 1.5 | 81 (81), 720.0 | 76.6 (70), 389.7
Syn10 (147,47) | 276.9 (276), 688.0 | 328.6 (323), 989.8 | 329, 8.5 | — | 798.1 (795), 946.0
Syn11 (96,32) | 277.3 (276), 318.8 | 285.9 (285), 878.3 | 318, 2.9 | — | 409.1 (398), 870.9
Syn12 (147,27) | 216 (216), 72.2 | 238.4 (236), 966.9 | 263, 6.5 | — | 390.6 (367), 952.7
Syn13 (133,26) | 180 (180), 15.7 | 181.7 (180), 733.6 | 200, 4.5 | — | 291.0 (277), 931.9
Syn14 (92,15) | 216 (216), 6.7 | 216 (216), 156.7 | 244, 2.0 | — | 271.6 (262), 921.6
Syn15 (58,22) | 150 (150), 4.4 | 150 (150), 117.8 | 173, 1.4 | 160 (160), 962.5 | 169.3 (165), 508.4
Syn16 (87,34) | 96 (96), 5.2 | 96 (96), 99.3 | 117, 1.8 | — | 123.7 (119), 765.1
Syn17 (137,29) | 216 (216), 80.1 | 230.4 (227), 933.4 | 265, 5.7 | — | 353.2 (338), 924.4
Syn18 (141,28) | 275.6 (275), 701.2 | 304.4 (301), 973.6 | 344, 6.9 | — | 449.2 (446), 937.1
Syn19 (197,43) | 306.8 (306), 770.8 | 492.3 (486), 989.6 | 373, 20.9 | — | —
Syn20 (158,48) | 409.6 (409), 825.9 | 511.7 (502), 995.5 | 463, 12.6 | — | 1050.9 (1026), 873.1
Syn21 (85,46) | 216 (216), 5.4 | 216 (216), 88.1 | 235, 2.2 | — | 251.3 (244), 955.5
Syn22 (79,22) | 144 (144), 3.7 | 144 (144), 46.3 | 164, 1.4 | — | 170.7 (162), 534.3
Syn23 (27,15) | 36 (36), 1.4 | 36 (36), 4.6 | 48, 0.5 | 37.3 (36), 208.1 | 39.2 (37), 2.6
Syn24 (119,29) | 292.4 (292), 620.5 | 302.7 (299), 944.3 | 341, 4.8 | — | 462.6 (448), 952.9
Syn25 (134,27) | 351.6 (350), 643.4 | 394.8 (391), 990.5 | 404, 7.0 | — | 593.5 (570), 965.6
Syn26 (95,32) | 164.8 (164), 202.0 | 169.2 (167), 837.6 | 207, 2.6 | — | 220.9 (216), 854.0
Syn27 (62,20) | 180 (180), 3.6 | 180 (180), 45.3 | 204, 1.4 | — | 201.6 (194), 594.0
Syn28 (194,37) | 368.0 (367), 800.9 | 506.0 (502), 988.0 | 420, 20.8 | — | —
Syn29 (144,22) | 125 (125), 18.6 | 125.4 (125), 712.6 | 154, 5.0 | — | 193.1 (186), 913.7
Syn30 (79,35) | 65.3 (64), 132.0 | 69.7 (69), 437.2 | 100, 1.7 | — | 88.8 (82), 464.4
#WinAvg. | 33 | 13 | 0 | 0 | 0
#WinMin. | 34 | 14 | 0 | 2 | 0

Table 6: Experimental statistics of WCA(100s), TCA(1000s), ACTS, HHSA(1000s) and CASA(1000s) for 2-way and 3-way CCAG on all benchmarks

              | WCA(100s) | TCA(1000s) | ACTS | HHSA(1000s) | CASA(1000s)
2-way #WinAvg. | 27 | 15 | 0 | 18 | 2
2-way #WinMin. | 28 | 17 | 0 | 21 | 9
3-way #WinAvg. | 33 | 13 | 0 | 0 | 0
3-way #WinMin. | 34 | 14 | 0 | 2 | 0

CCAG, there are 28 out of 35 instances on which WCA stands out as the best algorithm on the metric of both average size and best size. These figures for TCA are both 15, for HHSA are 18 and 21, and for CASA are 2 and 9. As shown in Table 5, the advantages of WCA are more obvious on 3-way benchmarks. Within the same time budget, WCA significantly outperforms all its competitors on both real-world and synthetic instances. On the metric of the average size of CAs, it always finds better solutions on all instances except Spins and Syn7. According to the statistics, TCA seems to be the best among these state-of-the-art competitors in this comparison, but it is still inferior

Table 7: Experimental results of WCA(1000s) and WCA-noweight(1000s) for 3-way CCAG on all benchmarks

Benchmark (|P|,|C|) | WCA avg (min), time | WCA-noweight avg (min), time
Apache (172,7) | 134.5 (133), 262.9 | 169.4 (166), 134.2
Bugzilla (52,5) | 48 (48), 1.7 | 56.3 (56), 290.8
Gcc (199,40) | 74.7 (69), 486.6 | 99.6 (97), 647.1
Spins (18,13) | 80.4 (80), 3.9 | 94.7 (90), <0.1
Spinv (55,49) | 195.2 (194), 47.9 | 229.3 (228), 732.5
Syn1 (97,24) | 249.7 (249), 262.7 | 289.8 (286), 234.4
Syn2 (94,22) | 131.8 (130), 275.4 | 164.2 (163), 674.5
Syn3 (29,10) | 50 (50), 1.1 | 58.0 (57), 477.7
Syn4 (58,17) | 80 (80), 2.8 | 92.9 (91), 615.5
Syn5 (174,39) | 328.9 (328), 750.1 | 381.5 (380), 603.3
Syn6 (77,30) | 96 (96), 4.5 | 116.1 (114), 48.2
Syn7 (30,15) | 25.7 (25), 2.6 | 27 (27), 104.6
Syn8 (119,37) | 256.3 (255), 509.4 | 309.3 (306), 776.0
Syn9 (61,37) | 60 (60), 3.5 | 64.3 (63), 675.6
Syn10 (147,47) | 276.9 (276), 688.0 | 327.2 (324), 246.2
Syn11 (96,32) | 277.3 (276), 318.8 | 317.4 (316), 3.4
Syn12 (147,27) | 216 (216), 72.2 | 258.2 (254), 372.4
Syn13 (133,26) | 180 (180), 15.7 | 197.5 (194), 388.9
Syn14 (92,15) | 216 (216), 6.7 | 233.2 (231), 890.4
Syn15 (58,22) | 150 (150), 4.4 | 169.4 (167), 584.0
Syn16 (87,34) | 96 (96), 5.2 | 110.2 (109), 643.6
Syn17 (137,29) | 216 (216), 80.1 | 260.2 (258), 604.8
Syn18 (141,28) | 275.6 (275), 701.2 | 329.6 (327), 870.5
Syn19 (197,43) | 306.8 (306), 770.8 | 369.7 (367), 451.2
Syn20 (158,48) | 409.6 (409), 825.9 | 460.1 (459), 455.6
Syn21 (85,46) | 216 (216), 5.4 | 220.3 (219), 740.3
Syn22 (79,22) | 144 (144), 3.7 | 144.4 (144), 493.2
Syn23 (27,15) | 36 (36), 1.4 | 38 (38), 21.5
Syn24 (119,29) | 292.4 (292), 620.5 | 337.0 (335), 367.2
Syn25 (134,27) | 351.6 (350), 643.4 | 401.3 (400), 383.1
Syn26 (95,32) | 164.8 (164), 202.0 | 194.1 (192), 772.6
Syn27 (62,20) | 180 (180), 3.6 | 200.1 (198), 587.3
Syn28 (194,37) | 368.0 (367), 800.9 | 417.4 (415), 380.5
Syn29 (144,22) | 125 (125), 18.6 | 148.2 (144), 268.4
Syn30 (79,35) | 65.3 (64), 132.0 | 81.7 (80), 501.0
#WinAvg. | 35 | 0
#WinMin. | 35 | 1

to the WCA algorithm. Moreover, it can be seen that for 3-way CCAG, the average time used by WCA is much shorter than that used by the other heuristic solvers. This means that WCA can find better solutions while spending much less CPU time than its competitors. In particular, for the Gcc instance, the average size of the CAs found by WCA is 74.7, obtained in only 486.6 seconds, while TCA found 83.3 in 874.7 seconds, CASA found 140.1 in 943.8 seconds, and HHSA failed to generate any CA on this instance. We thus conclude that WCA outperforms all of its competitors on both 2-way and 3-way benchmarks. Besides, its superiority on 3-way benchmarks is more significant than that on 2-way benchmarks, which implies that WCA is more competitive as the strength t gets larger. Statistical results on comparing WCA with a smaller


cutoff time against its state-of-the-art competitors (RQ2): To evaluate the performance of WCA within a limited time budget, we compare the size of the covering arrays obtained by WCA in 100 seconds with those generated by the other algorithms in 1000 seconds. Figure 2 summarizes the size of the CAs found by all the algorithms, and more specific statistics are presented in Table 6. As shown in Figure 2, the sizes of the CAs generated by WCA within 100s and within 1000s are quite close. With regard to the median size, there is only a tiny difference between WCA(1000s) and WCA(100s). Therefore, a time budget of 100 seconds is enough for WCA to exploit its performance. When it comes to the comparison between WCA and the other algorithms, it can be clearly seen that even though its time budget is considerably shorter, WCA(100s) still has obvious advantages over its competitors in terms of medians,

quartiles and extreme values. As we can see from Table 6, for 3-way CCAG, WCA(100s) outperforms its competitors on all but two instances. It also obtains the best solutions on 27 out of 35 instances for 2-way CCAG on the metric of average size, while this figure is 18 for HHSA, 15 for TCA, and 2 for CASA. Therefore, WCA shows its superiority over the state-of-the-art solvers for the CCAG problem even within a limited time budget. The result of the Wilcoxon signed-rank test is p < 0.003 for 2-way cases and p < 0.00004 for 3-way cases. Empirical evidence of the effectiveness of the weighting mechanism (RQ3): To illustrate that the weighting mechanism plays an important role in WCA, we remove this component from WCA, resulting in an alternative algorithm called WCA-noweight. The experimental results are presented in Table 7. They show that the WCA algorithm, with the weighting mechanism, significantly outperforms WCA-noweight (p < 0.000001 for 3-way cases). It generates much better CAs than WCA-noweight on most instances, and worse on none. Besides, the time consumption of WCA is usually shorter than that of WCA-noweight. In particular, for the Spinv instance, the average size of the CAs generated by WCA is 195.2, obtained in about 48 seconds, while WCA-noweight can only find CAs of average size 229.3 in nearly 733 seconds. Summary on experiments: In this subsection, we compared WCA against the state-of-the-art algorithms on a broad range of benchmarks. The experimental results suggest that WCA is superior to its competitors on both real-world and synthetic instances. For 2-way and 3-way CCAG, WCA can generate much smaller covering arrays than the other meta-heuristic algorithms, even when its time budget is 10 times shorter. In addition, to illustrate the importance of the weighting mechanism, we compared WCA with an alternative version that works without weighting the tuples.
The experimental results show that the performance of WCA degrades dramatically without the weighting mechanism.

5.6. Threats to Validity

The performance of CCAG solvers depends on their particular implementations, such as the programming language and data structures used. Results may differ slightly if the implementation details change. However, it is unrealistic to pursue exact identity in all aspects. As our best effort, we use the source code provided by the authors and execute all algorithms in the same experimental environment with the same resources. Among all the algorithms, WCA, TCA, HHSA and CASA are implemented in C++, and the greedy algorithm ACTS is implemented in Java. Although there are some differences in their implementation details, since WCA can obtain smaller covering arrays in one tenth of the time on most instances, the gap between WCA and its competitors is large enough to illustrate the superiority of WCA. Meanwhile, the cutoff time of 1000s may seem too short to fully exploit the ability of the algorithms. In fact, TCA is the main competitor of WCA, and the cutoff time of 1000s is also taken as its termination criterion [15]. To our knowledge, the results

achieved by WCA within 1000s are mostly better even than those obtained by HHSA and CASA with much longer time budgets, even hours, as reported in the previous literature [12, 14].

6. Related Work

Combinatorial interaction testing (CIT) is a popular approach for the detection of option-combination faults. In the past few decades, as CIT has become more widely used, a growing number of studies have been conducted on it. The main task of CIT is covering array generation (CAG), on which techniques based on machine learning [36] and post-optimization [37] have appeared recently. Constrained covering array generation (CCAG) is an extension of CAG with constraints, which are quite common in real-world applications. Mainstream algorithms for CCAG are generally divided into three categories: constraint encoding algorithms, greedy algorithms and meta-heuristic algorithms. Constraint encoding algorithms are dedicated to encoding a CCAG problem into a constrained optimization problem, such as SAT and MaxSAT [5, 6], and then using a corresponding solver to solve it. These algorithms are effective for solving 2-way CCAG, but typically fail when the required covering strength is three or more. Research on greedy algorithms for CCAG has also made progress. Greedy algorithms are able to generate valid covering arrays in a very short time, while the size of the resulting arrays is not their main consideration. The low time consumption makes greedy algorithms advantageous especially when the execution of the test suites takes only a little time. Existing greedy algorithms are mainly based on two ideas, one-test-at-a-time (OTAT) and in-parameter-order (IPO). OTAT was first used in AETG [8], which has received extensive attention and has been improved by many works [7, 10, 38, 39, 40, 41]. IPO is a strategy that was first proposed to generate 2-way covering arrays [42]; a generalization of IPO was later proposed by Lei et al. to generate t-way CAs as well. ACTS [31], which is widely used in real-world systems, is a typical application of the IPO strategy [9].
Different from the two types mentioned above, meta-heuristic algorithms occupy an important position in solving CCAG because of their ability to obtain good solutions in a reasonable time. Many effective algorithms fall into this category, such as TCA [15], HHSA [14] and CASA [12]. In fact, most meta-heuristic algorithms are based on the well-known local search framework. The main process of local search algorithms is to iteratively pick candidate operations and execute them. A great challenge for local search algorithms is to avoid search stagnation, and many methods have been proposed to this end, such as tabu search and simulated annealing: tabu search marks recent operations by setting taboos, while simulated annealing accepts worse operations with a certain probability. Search-based software engineering [43] is a subarea of software engineering whose main idea is to use computational search techniques to tackle software engineering

problems. Many problems of practical significance in this area, such as test case generation [44], program refactoring [45], prioritization for regression testing [46], and module clustering [47], have been solved effectively by search-based algorithms. The weighting mechanism is a technique that can guide local search algorithms by dynamically updating weights. When an algorithm gets stuck in local optima, the weights are updated according to specific rules, guiding the subsequent search away from the current optima. Weighting techniques have achieved great success on many combinatorial optimization problems, such as SAT [24, 25], MaxSAT [16, 26], Minimum Vertex Cover (MVC) [27, 28] and the Set Cover Problem (SCP) [29]. As far as we know, the algorithm proposed in this paper is the first weighting local search for solving CCAG.

7. Conclusions and Future Work

In this work, we design a weighting mechanism in local search for CCAG, which associates an integer with each tuple as its weight and dynamically adjusts these weights during the search. By adjusting the weights of tuples, the weighting mechanism prevents the search from being stuck in the same or similar local optima. On this basis, we propose a weighting local search algorithm called WCA. Different from previous algorithms, WCA employs a weight-related scoring function, wscore, for operation selection, which dynamically highlights the importance of hard-to-cover tuples during the search. Experiments on a broad range of benchmarks are conducted to compare WCA with four state-of-the-art algorithms, and the results show that WCA significantly outperforms all its competitors. When the time budgets of all the algorithms are the same (1000s), the CAs obtained by WCA are generally smaller than those generated by the others. Moreover, even when WCA's time budget is reduced to 100s while those of its competitors remain 1000s, WCA still maintains its superiority. These results clearly demonstrate that WCA outperforms its state-of-the-art rivals in terms of both effectiveness and efficiency. In the future, we will carry out a more specific analysis of how the search process is guided by the weighting mechanism in local search algorithms for CCAG. We would also like to conduct further studies on the search space of CCAG for different instances to better understand the weighting mechanism and improve it. Besides, considering that the weighting mechanism is very effective for CCAG, we are interested in applying it to similar problems. The source code of WCA is publicly available at: https://github.com/superjessie/CIT-WCA

8. Acknowledgements

This work is supported by Beijing Academy of Artificial Intelligence (BAAI). Shaowei Cai is supported by the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2017150).

References

[1] D. R. Kuhn, D. R. Wallace, A. M. Gallo, Software fault interactions and implications for software testing, IEEE Trans. Software Eng. 30 (2004) 418–421.
[2] C. Yilmaz, M. B. Cohen, A. A. Porter, Covering arrays for efficient fault characterization in complex configuration spaces, IEEE Trans. Software Eng. 32 (2006) 20–34.
[3] C. Nie, H. Leung, A survey of combinatorial testing, ACM Comput. Surv. 43 (2011) 11.
[4] M. B. Cohen, M. B. Dwyer, J. Shi, Interaction testing of highly-configurable systems in the presence of constraints, in: Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2007, London, UK, July 9-12, 2007, pp. 129–139.
[5] M. Banbara, H. Matsunaka, N. Tamura, K. Inoue, Generating combinatorial test cases by efficient SAT encodings suitable for CDCL SAT solvers, in: Logic for Programming, Artificial Intelligence, and Reasoning - 17th International Conference, LPAR-17, Yogyakarta, Indonesia, October 10-15, 2010, Proceedings, pp. 112–126.
[6] A. Yamada, T. Kitamura, C. Artho, E. Choi, Y. Oiwa, A. Biere, Optimization of combinatorial testing by incremental SAT solving, in: 8th IEEE International Conference on Software Testing, Verification and Validation, ICST 2015, Graz, Austria, April 13-17, 2015, pp. 1–10.
[7] Z. Zhang, J. Yan, Y. Zhao, J. Zhang, Generating combinatorial test suite using combinatorial optimization, Journal of Systems and Software 98 (2014) 191–207.
[8] D. M. Cohen, S. R. Dalal, M. L. Fredman, G. C. Patton, The AETG system: An approach to testing based on combinatorial design, IEEE Trans. Software Eng. 23 (1997) 437–444.
[9] Y. Lei, R. Kacker, D. R. Kuhn, V. Okun, J. Lawrence, IPOG/IPOG-D: efficient test generation for multi-way combinatorial testing, Softw. Test., Verif. Reliab. 18 (2008) 125–148.
[10] A. Yamada, A. Biere, C. Artho, T. Kitamura, E.-H. Choi, Greedy combinatorial test case generation using unsatisfiable cores, in: 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, pp. 614–624.
[11] P. Galinier, S. Kpodjedo, G. Antoniol, A penalty-based tabu search for constrained covering arrays, in: Proceedings of the Genetic and Evolutionary Computation Conference, ACM, pp. 1288–1294.
[12] B. J. Garvin, M. B. Cohen, M. B. Dwyer, An improved meta-heuristic search for constrained interaction testing, in: Search Based Software Engineering, 2009 1st International Symposium on, pp. 13–22.
[13] B. J. Garvin, M. B. Cohen, M. B. Dwyer, Evaluating improvements to a meta-heuristic search for constrained interaction testing, Empirical Software Engineering 16 (2011) 61–102.
[14] Y. Jia, M. B. Cohen, M. Harman, J. Petke, Learning combinatorial interaction test generation strategies using hyperheuristic search, in: 37th International Conference on Software Engineering (ICSE 2015), 16-24 May 2015, Florence, Italy, pp. 540–550.
[15] J. Lin, C. Luo, S. Cai, K. Su, D. Hao, L. Zhang, TCA: An efficient two-mode meta-heuristic algorithm for combinatorial test generation (T), in: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, pp. 494–505.
[16] S. Cai, C. Luo, J. Thornton, K. Su, Tailoring local search for partial MaxSAT, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, QuΓ©bec City, QuΓ©bec, Canada, pp. 2623–2629.
[17] C. Luo, S. Cai, W. Wu, Z. Jie, K. Su, CCLS: an efficient local search algorithm for weighted maximum satisfiability, IEEE Trans. Computers 64 (2015) 1830–1843.
[18] S. Cai, J. Lin, C. Luo, Finding a small vertex cover in massive sparse graphs: construct, local search, and preprocess, Journal of Artificial Intelligence Research 59 (2017) 463–494.
[19] Y. Wang, S. Cai, M. Yin, Two efficient local search algorithms for maximum weight clique problem, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 805–811.


[20] P. Danziger, E. Mendelsohn, L. Moura, B. Stevens, Covering arrays avoiding forbidden edges, Theoretical Computer Science 410 (2009) 5403–5414.
[21] E. Maltais, L. Moura, Hardness results for covering arrays avoiding forbidden edges and error-locating arrays, Theoretical Computer Science 412 (2011) 6517–6530.
[22] L. Kampel, D. E. Simos, A survey on the state of the art of complexity problems for covering arrays, Theoretical Computer Science (2019).
[23] K. J. Nurmela, Upper bounds for covering arrays by tabu search, Discrete Applied Mathematics 138 (2004) 143–152.
[24] J. Thornton, Clause weighting local search for SAT, Journal of Automated Reasoning 35 (2005) 97–142.
[25] S. Cai, K. Su, Local search for Boolean satisfiability with configuration checking and subscore, Artificial Intelligence 204 (2013) 75–98.
[26] Z. Lei, S. Cai, Solving (weighted) partial MaxSAT by dynamic local search for SAT, in: IJCAI, pp. 1346–1352.
[27] S. Richter, M. Helmert, C. Gretton, A stochastic local search approach to vertex cover, in: Annual Conference on Artificial Intelligence, Springer, pp. 412–426.
[28] S. Cai, K. Su, A. Sattar, Local search with edge weighting and configuration checking heuristics for minimum vertex cover, Artificial Intelligence 175 (2011) 1672–1696.
[29] C. Gao, T. Weise, J. Li, A weighting-based local search heuristic algorithm for the set covering problem, in: 2014 IEEE Congress on Evolutionary Computation (CEC), IEEE, pp. 826–831.
[30] H. H. Hoos, T. StΓΌtzle, Stochastic local search: Foundations and applications, Elsevier, 2004.
[31] L. Yu, Y. Lei, M. N. Borazjany, R. Kacker, D. R. Kuhn, An efficient algorithm for constraint handling in combinatorial test generation, in: 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, Luxembourg, Luxembourg, March 18-22, 2013, pp. 242–251.
[32] M. B. Cohen, M. B. Dwyer, J. Shi, Constructing interaction test suites for highly-configurable systems in the presence of constraints: A greedy approach, IEEE Trans. Software Eng. 34 (2008) 633–650.
[33] D. R. Kuhn, V. Okun, Pseudo-exhaustive testing for software, in: 2006 30th Annual IEEE/NASA Software Engineering Workshop, IEEE, pp. 153–158.
[34] H. Mercan, C. Yilmaz, K. Kaya, CHiP: A configurable hybrid parallel covering array constructor, IEEE Trans. Software Eng. 45 (2019) 1270–1291.
[35] Y. Lei, R. Kacker, D. R. Kuhn, V. Okun, J. Lawrence, IPOG: A general strategy for t-way software testing, in: 14th Annual IEEE International Conference and Workshop on Engineering of Computer Based Systems (ECBS 2007), 26-29 March 2007, Tucson, Arizona, USA, pp. 549–556.
[36] L. Kampel, M. Wagner, I. S. Kotsireas, D. E. Simos, How to use Boltzmann machines and neural networks for covering array generation, in: International Conference on Learning and Intelligent Optimization, Springer, pp. 53–68.
[37] P. Nayeri, C. J. Colbourn, G. Konjevod, Randomized post-optimization of covering arrays, European Journal of Combinatorics 34 (2013) 91–103.
[38] R. C. Bryce, C. J. Colbourn, The density algorithm for pairwise interaction testing, Softw. Test., Verif. Reliab. 17 (2007) 159–182.
[39] R. C. Bryce, C. J. Colbourn, M. B. Cohen, A framework of greedy methods for constructing interaction test suites, in: 27th International Conference on Software Engineering (ICSE 2005), 15-21 May 2005, St. Louis, Missouri, USA, pp. 146–155.
[40] J. Czerwonka, Pairwise testing in the real world: Practical extensions to test-case scenarios, Microsoft Corporation, Software Testing Technical Articles (2008).
[41] Y.-W. Tung, W. S. Aldiwan, Automating test case generation for the new generation mission software system, in: Aerospace Conference Proceedings, 2000 IEEE, volume 1, pp. 431–437.
[42] Y. Lei, K. Tai, In-parameter-order: A test generation strategy for pairwise testing, in: 3rd IEEE International Symposium on High-Assurance Systems Engineering (HASE '98), 13-14 November 1998, Washington, D.C., USA, Proceedings, pp. 254–261.
[43] M. Harman, The current state and future of search based software engineering, in: International Conference on Software Engineering, ICSE 2007, Workshop on the Future of Software Engineering, FOSE 2007, May 23-25, 2007, Minneapolis, MN, USA, pp. 342–357.
[44] P. McMinn, Search-based software test data generation: a survey, Softw. Test., Verif. Reliab. 14 (2004) 105–156.
[45] M. Harman, L. Tratt, Pareto optimal search based refactoring at the design level, in: Genetic and Evolutionary Computation Conference, GECCO 2007, Proceedings, London, England, UK, July 7-11, 2007, pp. 1106–1113.
[46] Z. Li, M. Harman, R. M. Hierons, Search algorithms for regression test case prioritization, IEEE Trans. Software Eng. 33 (2007) 225–237.
[47] M. Harman, S. Swift, K. Mahdavi, An empirical study of the robustness of two module clustering fitness functions, in: Genetic and Evolutionary Computation Conference, GECCO 2005, Proceedings, Washington DC, USA, June 25-29, 2005, pp. 1029–1036.


Declaration of interests

β˜’ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

CRediT authorship contribution statement

Yingjie Fu: Methodology, Software, Validation, Formal analysis, Writing - Original Draft, Writing - Review & Editing, Visualization
Zhendong Lei: Methodology, Writing - Original Draft
Shaowei Cai: Conceptualization, Supervision, Project administration
Jinkun Lin: Investigation, Resources, Data Curation
Haoran Wang: Writing - Review & Editing