An approach to solving non-linear real constraints for symbolic execution


The Journal of Systems and Software 157 (2019) 110383


An approach to solving non-linear real constraints for symbolic execution

Saeed Amiri-Chimeh, Hassan Haghighi∗

Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran

Article info

Article history: Received 8 January 2019; Revised 31 May 2019; Accepted 17 July 2019; Available online 17 July 2019

Keywords: Constraint solving; Decision procedure; Non-linear arithmetic; Symbolic execution; Software testing; Test data generation

Abstract

Constraint solvers are well-known tools for solving many real-world problems such as theorem proving and real-time scheduling. One of the domains that strongly relies on constraint solvers is the technique of symbolic execution for automatic test data generation. Many researchers have tried to alleviate the shortcomings of the available constraint solvers to improve their applications in symbolic execution for test data generation. Despite many recent improvements, constraint solvers are still unable to efficiently deal with certain types of constraints. In particular, constraints that include non-linear real arithmetic are among the most challenging ones. In this paper, we propose a new approach to solving non-linear real constraints for symbolic execution. This approach emphasizes transforming constraints into functions with specific properties, named Satisfaction Functions. A satisfaction function is generated in such a way that by maximizing it, values that satisfy the corresponding constraint are obtained. We compared the performance of our technique with three constraint solvers that were known to be able to solve non-linear real constraints. The comparison was made with regard to speed and correctness. The results showed that our technique is comparable with the other methods in terms of speed and outperforms them in terms of correctness.

© 2019 Elsevier Inc. All rights reserved.

1. Introduction

As an automatic code analysis and software test data generation technique, Symbolic Execution (King, 1976; 1975) is mainly dependent on the capabilities of constraint solvers. Test data generation using symbolic execution consists of two main steps: constraint generation and constraint solving. The constraint generation step builds a unique path constraint over the symbolic inputs for every desired execution path of the Software Under Test (SUT). These constraints are built in such a way that they are satisfied only by input values that execute their corresponding paths. This is where constraint solving plays an important role in Symbolic Execution: by solving the corresponding constraint of a path, concrete assignments to the SUT's input parameters are generated that can be used for testing the path. Fig. 1 depicts this process for a simple example with the goal of reaching a possible bug. Symbolic Execution was introduced more than four decades ago, but only recent advances in constraint solving tools and techniques have made it a practical approach (Anand et al., 2013; Braione et al., 2017).



∗ Corresponding author.
E-mail addresses: [email protected] (S. Amiri-Chimeh), [email protected] (H. Haghighi).
https://doi.org/10.1016/j.jss.2019.07.045
0164-1212/© 2019 Elsevier Inc. All rights reserved.

Despite the advances of constraint solving techniques, one of the biggest obstacles to utilizing Symbolic Execution is still the constraint solvers' inability to deal with complex constraints (Anand et al., 2013; Thomé et al., 2017). In particular, constraints built on undecidable theories, such as non-linear real arithmetic, are problematic. Also, solving some of the constraints built on decidable theories is expensive. Table 1 shows the complexity of satisfying some types of constraints (Zhang, 2008). As a result, current constraint solvers have problems in dealing with constraints that include non-linear real or integer arithmetic. This is a big issue, since non-linear arithmetic is extensively used in many programs (Haller et al., 2012). Solving constraints built on non-linear arithmetic is undecidable; consequently, constraint solvers usually need relatively long run-times to solve constraints built on such theories, compared to other less complex theories. Another issue is that current constraint solvers don't necessarily return correct solutions (i.e., solutions that satisfy the given constraint) (Anand et al., 2013). Accordingly, two important performance criteria of constraint solvers in this regard are speed and correctness.

In this paper, we wish to improve the applicability of constraint solvers in Symbolic Execution by introducing a new approach to solving satisfiability modulo non-linear real arithmetic. The constraints obtained from programs that have floating-point input parameters usually have non-linear real arithmetic sub-constraints. Our goal is to propose a new approach for solving such sub-constraints. We show that this new approach, namely Smooth Modeling, is a relatively fast method that is more correct in solving constraints, compared to other related methods and tools (see Section 3).

Fig. 1. An overview of test data generation using symbolic execution.

Table 1. Complexity of constraint satisfaction.

Constraint type                                  Complexity
Boolean formulas                                 Decidable, NP-hard
Linear constraints over rationals                Decidable, linear-time
Linear constraints over rationals and integers   Decidable, NP-hard
Non-linear constraints over integers             Undecidable
Non-linear constraints over reals                Undecidable

Fig. 2. The satisfaction function and an approximative satisfaction function of the constraint ¬(f ∗ f ≤ f) ∧ (ln(f ∗ f + 1) ≤ 2).

This new approach to constraint solving is based on the definition of satisfaction functions. A satisfaction function is defined in correspondence to a constraint. The most important characteristic of a satisfaction function is that it returns 1 for input values that satisfy its corresponding constraint and returns 0 for input values that don't. Consider an arbitrary constraint C. We call the intersection of the domains of its mathematical functions DC. The corresponding satisfaction function of C is a function from DC to {0, 1}.

Approximative satisfaction functions are very similar to satisfaction functions, but with a few differences. These functions are smooth on larger intervals. To be more specific, an approximative satisfaction function is non-differentiable or has a discontinuous derivative only where the mathematical functions of its corresponding constraint are non-differentiable or have discontinuous derivatives. Unlike satisfaction functions, approximative satisfaction functions return values nearly equal to 1 for input values that satisfy their related constraint and values nearly equal to 0 for input values that don't. Accordingly, the approximative satisfaction function of the constraint C is from DC to [0, 1]. As an example, Fig. 2 shows the satisfaction function and an approximative satisfaction function of the constraint ¬(f ∗ f ≤ f) ∧ (ln(f ∗ f + 1) ≤ 2). The presented approximative satisfaction function in Fig. 2 is:

(exp(f² − f) ∗ exp(2 − ln(f² + 1))) / ((exp(f² − f) + 1) ∗ (exp(2 − ln(f² + 1)) + 1))        (1)

It can be inferred that an input value satisfies the given constraint if it is in the interval [−√(exp(2) − 1), 0] ∪ [1, √(exp(2) − 1)]. As can be seen in the figure, the output of the approximative satisfaction function is (nearly) equal to 1 on the same interval. We will explain how to create satisfaction functions and their corresponding approximative variants from constraints later in this paper.

Although a little unusual, we can look at the satisfaction function of the constraint C as the probability distribution of its satisfaction over DC. With this in mind, we can say it is possible to maximize the satisfaction function to find values that satisfy the corresponding constraint. Since, for any constraint C, any member of DC either satisfies the constraint or doesn't, the mentioned distribution is going to be a piece-wise non-smooth function whose range is the set {0, 1}. Solving non-smooth optimization problems is difficult; instead, we maximize the approximative satisfaction functions, which are smooth. It is worth mentioning that optimizing smooth approximations of non-smooth objective functions is a known practice that has proven to be effective (Nesterov, 2005).

Accordingly, to generate test data for a specific execution path of the SUT using Smooth Modeling, we first perform the constraint generation step of symbolic execution. This gives us the path constraint. Then, we generate the corresponding satisfaction function and its approximative version. Finally, we generate test data by maximizing the resulting approximative satisfaction function. Fig. 3 depicts the described process for a simple example. Subsequently, the following research questions will be investigated in this paper:

• RQ1: Is it possible to design an algorithm for automatically generating the satisfaction function (and its approximative version) from a constraint?
• RQ2: What is the performance of Smooth Modeling in terms of speed, compared to other related constraint solvers?
• RQ3: What is the performance of Smooth Modeling in terms of correctness, compared to other related constraint solvers?

To investigate these research questions, the following tasks were designed and performed:
1. Designing the algorithm mentioned in RQ1.
2. Choosing an optimizer to be used in maximizing approximative satisfaction functions.
3. Preparing sets of constraints to evaluate Smooth Modeling's performance.


Fig. 3. An overview of test data generation using Smooth Modeling.

4. Investigating other constraint solvers that are known to be able to deal with non-linear real arithmetic and mathematical functions. Three constraint solvers were chosen as competitors of Smooth Modeling: dReal, CORAL, and CHOCO. The rationale behind this selection is presented in Section 3.
5. Comparing the performance of Smooth Modeling to that of the other three constraint solvers in terms of speed and correctness (RQ2 and RQ3).

The collected results demonstrate that Smooth Modeling outperforms the other three constraint solvers in terms of correctness while being comparable to them in terms of speed.

The rest of this paper is organized as follows: Section 2 explains Symbolic Execution in detail. Section 3 reviews the related works of this research. Section 4 presents the approach of Smooth Modeling in full detail. Section 5 addresses the evaluation of Smooth Modeling, including methodology, results, and discussion. Finally, Section 6 mentions possible future works and concludes the paper.

2. Symbolic execution

Since our work depends on the symbolic execution of source code, in this section, we review this technique. The key idea behind symbolic execution (King, 1976) is to use symbolic values instead of concrete values for input parameters, and to represent the values of program variables as symbolic expressions. Symbolic execution maintains a symbolic state σ, which maps variables to symbolic expressions, and a path constraint PC, a first-order quantifier-free formula over symbolic expressions, for each program path. PC accumulates constraints on the inputs that cause the execution to follow the related path. At every conditional statement if(e) then S1 else S2, PC is updated with conditions over the inputs to choose between the alternative paths. A new path condition PC′ is created and set to PC ∧ ¬σ(e) ("else" branch), and PC is updated to PC ∧ σ(e) ("then" branch), where σ(e) denotes the symbolic predicate acquired by evaluating e in the symbolic state.
Note that unlike in concrete execution, both branches of a conditional statement can be taken in symbolic execution, resulting in two program paths. If either PC or PC′ becomes unsatisfiable, symbolic execution terminates along the corresponding path. When symbolic execution along a path stops (normally or with an error), the current PC can be solved by a constraint solver, which generates the test data. If the program is executed on these concrete inputs, it will take the same path as the symbolic execution. Symbolic execution of code containing loops or recursion may result in an infinite number of paths; therefore, in practice, one needs to put a boundary on the search, e.g., a timeout or a limit on the number of paths or exploration depth. To show how a constraint is generated using symbolic execution on a single path, consider the path ABDEF in Fig. 4.

Fig. 4. A sample source code and its corresponding control flow graph.

At the beginning of the symbolic execution, PC is TRUE and the symbolic state is {(f → f0)}. At node A, the symbolic state is updated to {(f → f0), (x → f0 ∗ f0)}. When execution reaches node B, the "else" branch is desired, so PC is updated to PC ∧ ¬σ(x <= f), which is equivalent to (f0 ∗ f0 > f0). Now, at node D, the symbolic state is updated to {(f → f0), (x → f0 ∗ f0 + 1)}, and lastly, at node E, the "then" branch is desired, so PC is updated to PC ∧ σ(ln(x) ≤ 2), which is equivalent to ((f0 ∗ f0 > f0) ∧ (ln(f0 ∗ f0 + 1) ≤ 2)). By solving this constraint, we generate concrete input data that causes the program to execute the path ABDEF.
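The σ/PC updates walked through above can be sketched in a few lines. The following is our own minimal illustration (helper names such as `symbolic_abdef` are ours, not from the paper's tooling), representing symbolic expressions as strings over the symbolic input f0:

```python
import math

# A minimal sketch of collecting the path constraint for path ABDEF of
# Fig. 4. The symbolic state maps variable names to expression strings
# over the symbolic input f0; PC is a list of predicates (a conjunction).

def symbolic_abdef():
    state = {"f": "f0"}                            # initial state {(f -> f0)}
    pc = []                                        # PC starts as TRUE
    state["x"] = f"({state['f']} * {state['f']})"  # node A: x = f * f
    pc.append(f"not ({state['x']} <= {state['f']})")   # node B: "else" branch
    state["x"] = f"({state['x']} + 1)"             # node D: x = x + 1
    pc.append(f"(math.log({state['x']}) <= 2)")    # node E: "then" branch
    return pc

def satisfies(pc, f0):
    """Check a concrete input against the collected path constraint."""
    return all(eval(p, {"math": math, "f0": f0}) for p in pc)

pc = symbolic_abdef()
print(pc)                  # ['not ((f0 * f0) <= f0)', '(math.log(((f0 * f0) + 1)) <= 2)']
print(satisfies(pc, 2.0))  # True: 4 > 2 and ln(5) <= 2
```

Solving the collected PC, as the paper describes, amounts to finding an f0 for which `satisfies(pc, f0)` holds.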


3. Related works

In recent years, constraint solving has advanced significantly, which has made Symbolic Execution a hot topic in the field of test data generation, decades after its introduction. However, as mentioned before, the ability of constraint solvers to deal with complex constraints is still limited, which in turn affects the applicability of Symbolic Execution in general. In this section, we mention tools and techniques related to solving complex constraints, specifically those that aim to solve constraints with non-linear real arithmetic. We first talk about SMT solving, an important concept in constraint solving. Then, we mention some well-known constraint solvers that can handle non-linearity. Finally, we name the tools that we compare to our new approach. We only chose tools that support mathematical functions, because one of the features of the new approach is its ability to deal with non-linearity and mathematical functions.

3.1. SMT solving

A constraint solver is a decision procedure for problems that are expressed as logical formulas. The most famous problem in this regard is the SAT problem: given a formula over Boolean variables, one should determine whether there exists an assignment to the variables such that the formula evaluates to TRUE. In many problems, constraint variables are of other types, such as real, integer, and bitvector; in other words, those constraints are built on other theories, such as real arithmetic. This is why most constraint solving problems are defined as Satisfiability Modulo Theories (SMT) problems. SMT generalizes SAT by adding equality reasoning, arithmetic, fixed-size bitvectors, arrays, quantifiers, and other first-order theories. In other words, an SMT instance is a SAT instance in which the Boolean variables are replaced by predicates from some first-order theory, such as real arithmetic.
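The SAT problem described above can be illustrated with a brute-force check over all assignments (our own sketch; real SAT solvers use far more efficient procedures such as CDCL):

```python
from itertools import product

# Brute-force SAT check: a formula is a function over a dict of Boolean
# variables; we enumerate all 2^n assignments and return the first model.
def is_satisfiable(formula, variables):
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None  # unsatisfiable

# (a or b) and not (a and b): satisfied when exactly one of a, b is True.
model = is_satisfiable(
    lambda v: (v["a"] or v["b"]) and not (v["a"] and v["b"]),
    ["a", "b"])
print(model)  # {'a': False, 'b': True}
```

SMT replaces the Boolean variables in such a formula with theory predicates (e.g., x² + y² < 4 over the reals), which is what makes the search space continuous and, for non-linear real arithmetic, so much harder.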
Accordingly, many Symbolic Execution tools use SMT solvers to solve path constraints and generate test data. The broad application of SMT solving has led to the emergence of SMT-LIB, "an international initiative aimed at facilitating research and development in Satisfiability Modulo Theories" (Barrett et al., 2017). Some goals of the SMT-LIB initiative are providing specifications for background theories and providing a language for expressing SMT formulas, known as the SMT-LIB input/output language (Barrett et al., 2010). As mentioned in Section 2, path constraints in Symbolic Execution are quantifier-free formulas over symbolic input variables. Also, we focus on programs with floating-point variables in this study. Therefore, among the different theories in SMT, Quantifier-Free Non-linear Real Arithmetic (QF-NRA) is the one related to the scope of this research. Unfortunately, not many SMT solvers support this theory. Moreover, those that do support QF-NRA do not necessarily support the use of predefined mathematical functions.

3.2. Constraint solvers

3.2.1. Yices 2

Yices (Dutertre, 2014) is one of the tools that have been used in Symbolic Execution studies. This tool has been developed at SRI International and is free for non-commercial use. Yices is an SMT solver that checks the satisfiability of formulas and supports equality, real and integer arithmetic, bitvectors, and different types of scalars and tuples. The second version of this tool supports both linear and non-linear arithmetic. Unfortunately, Yices 2 does not support mathematical functions, which are an essential part of this research. Yices has been integrated into Symbolic

Pathfinder (Păsăreanu and Rungta, 2010), which is a well-known Symbolic Execution framework developed by NASA.

3.2.2. dReal

dReal (Gao et al., 2013) is an automated reasoning tool. It focuses on solving problems that can be expressed as first-order logic formulas over real variables. In particular, its advantage over other SMT solvers is its ability to handle problems that involve non-linear real functions: dReal supports trigonometric, logarithmic, and exponential functions. dReal implements the framework of δ-complete decision procedures. It returns "unsat" or "δ-sat" on input formulas, where δ, specified by the user, is a rational number that represents the precision. When the answer is "unsat", dReal produces a proof of unsatisfiability; when "δ-sat" is returned, dReal provides a solution such that a δ-perturbed form of the formula is satisfied. dReal has been integrated into Symbolic Pathfinder, specifically for solving constraints over floating-point variables and constraints that have non-linear real functions.

3.2.3. Z3

Z3 (de Moura and Bjørner, 2008) is another SMT solver, developed by Microsoft Research. Z3 supports linear real and integer arithmetic, fixed-size bitvectors, uninterpreted functions, arrays, and quantifiers. Recently, by using nlsat (Jovanović and de Moura, 2012), Z3 has become able to deal with non-linearity, but it still does not support constraints that have non-linear real functions. Z3 has been used in PEX (Tillmann and de Halleux, 2008), an automatic white-box test data generation tool for the .NET framework that implements symbolic execution. PEX uses Z3 to reason about the feasibility of execution paths.

3.2.4. CVC4

CVC4 (Barrett et al., 2011) is an automatic theorem prover for SMT. CVC4 currently supports equality, real and integer linear arithmetic, bit-vectors, arrays, tuples, records, user-defined inductive datatypes, strings, finite sets, and separation logic.
In addition, CVC4 supports quantifiers through heuristic instantiation. CVC4 has also been used in Symbolic Pathfinder. However, the support for non-linear arithmetic is currently very weak in CVC4: in its current version, non-linearity is handled by abstracting monomials as unique new variables. It is planned, though, to implement nlsat in CVC4 to make it more powerful with respect to handling constraints built on non-linear arithmetic.

3.2.5. CORAL

CORAL (Souza et al., 2011) is a constraint solver that relies on metaheuristic algorithms. It has been developed specifically for solving constraints that have floating-point variables and mathematical functions. CORAL looks at constraint solving as a search problem and provides two search strategies: Random Search and Particle Swarm Optimization (PSO). It has been demonstrated that PSO significantly outperforms random search in CORAL (Souza et al., 2011). CORAL has been integrated into Symbolic Pathfinder alongside dReal to empower this tool in dealing with constraints that have non-linear real functions.

3.2.6. CHOCO

CHOCO (Prud'homme et al., 2017) is a free Java library for Constraint Programming. The user models their problem in a declarative way by stating the set of constraints that need to be satisfied. Then, the problem is solved by alternating constraint filtering algorithms with a search mechanism. CHOCO was initially developed in 1999 but changed significantly in 2003, when it was implemented in the Java programming language and became more portable and learnable. Also, CHOCO


Fig. 5. The sample source code.

has been integrated into Symbolic Pathfinder and supports constraints over real, integer, Boolean, and set variables. It is also capable of dealing with non-linear real functions by utilizing IBEX (available at http://www.ibex-lib.org/), a C++ library for constraint processing over real numbers. CHOCO solves constraints by first removing some of the values that don't satisfy the constraint, using a constraint filtering algorithm. Then, a search algorithm explores the solution space, taking the found unsatisfying values into account. This process is repeated until a satisfying value is found or the unsatisfiability of the constraint is proven.

3.3. Selected competitors

Among the mentioned constraint solvers, only dReal, CORAL, and CHOCO support mathematical functions. Therefore, these three solvers were selected to be compared with our new approach to constraint solving.

Fig. 6. The control flow graph of the program in Fig. 5.

4. Smooth Modeling

In this section, we explain Smooth Modeling and its application in test data generation. Initially, we present an example to give an overview of this method. Then, we describe the algorithms used in Smooth Modeling. Lastly, we describe how this method can generate test data for programs with more complicated arithmetic constraints.

4.1. Example

Here, we show how Smooth Modeling works and point out its similarities to and differences from classic symbolic execution. Consider the program presented in Fig. 5. This program has two variables and includes conditional statements with polynomial expressions. Assume that the test designer is suspicious about some execution paths, and therefore, we need to generate test data for those execution paths.

4.1.1. Symbolic Execution

Fig. 6 depicts the control flow graph of the program in Fig. 5. Our goal is to generate test data for the ABDE execution path. If we symbolically execute the source code in Fig. 5, we obtain the following constraint for the ABDE execution path:

(x² + y² ≤ 9) ∧ (x² + y² < 4)        (2)
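To make the constraint-solving task concrete, constraint (2) can be treated as a Python predicate and attacked with naive random search (our own illustration; this kind of baseline is exactly what dedicated constraint solvers improve upon):

```python
import random

# Naive random-search "solver" for constraint (2): sample inputs until one
# satisfies the predicate. The seed is fixed only to make the run repeatable.
random.seed(1)
pred = lambda x, y: (x * x + y * y <= 9) and (x * x + y * y < 4)

solution = None
for _ in range(10000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    if pred(x, y):
        solution = (x, y)
        break

print(solution is not None)  # True: satisfying points fill the disc of radius 2
```

Random search works here because the satisfying region is large; for constraints with tiny or lower-dimensional solution sets (e.g., equalities), such sampling almost never succeeds, which motivates the guided maximization described next.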

4.1.2. Test data generation

In classic symbolic execution, generated constraints are solved by a constraint solver. That way, actual values that satisfy the constraints and cause the execution of the desired paths are generated. As said before, constraint solvers are challenged when requested to solve constraints that have non-linear arithmetic. In Smooth Modeling, we do not use current constraint solving techniques; instead, we generate a function called the satisfaction function from the path constraint, which is then used for test data generation.


Consider an arbitrary constraint C. Also, let DC be the intersection of the domains of the mathematical functions in C. The satisfaction function of the constraint C has the following properties:

• It is a function from DC to {0, 1}.
• It returns 1 for input values that satisfy its corresponding path constraint and returns 0 for input values that don't.

As said before, we can look at the satisfaction function of C as the probability distribution of its satisfaction over DC. With this in mind, it is possible to find values that satisfy a path constraint by maximizing its corresponding satisfaction function. Since any member of DC either satisfies C or doesn't, the mentioned distribution is going to be a piece-wise function whose range is the set {0, 1}. Finding the maximum of such a discrete non-differentiable function is hard. Therefore, we maximize a continuous smooth approximation of the satisfaction function to generate values that satisfy C. An approximative satisfaction function of C returns values nearly equal to 1 for input values that satisfy C. Similarly, it returns values nearly equal to 0 for input values that don't satisfy C. Accordingly, the approximative satisfaction function of C is from DC to [0, 1]. Later in this section, we present an algorithm called GenerateSatFunction, which automatically generates the satisfaction function and the approximative satisfaction function from a given constraint. For example, if we use GenerateSatFunction on (x² + y² ≤ 9) ∧ (x² + y² < 4), we get the approximative satisfaction function shown in Formula 3. The surface of this function is depicted in Fig. 7.





1 / (exp(3 ∗ x² + 3 ∗ y² − 12) + 1) ∗ 1 / (exp(3 ∗ x² + 3 ∗ y² − 27) + 1)        (3)
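Formula (3) can be maximized with any local optimizer. The sketch below uses a simple hill climber of our own (illustrative only, not the optimizer evaluated in the paper) and recovers a point inside the satisfying disc:

```python
import math, random

# Direct transcription of Formula (3), the approximative satisfaction
# function of (x^2 + y^2 <= 9) and (x^2 + y^2 < 4).
def sat_approx(x, y):
    s = 3 * x * x + 3 * y * y
    return 1.0 / ((math.exp(s - 12) + 1) * (math.exp(s - 27) + 1))

def hill_climb(f, x, y, step=0.5, iters=5000):
    best = f(x, y)
    for _ in range(iters):
        nx = x + random.uniform(-step, step)
        ny = y + random.uniform(-step, step)
        v = f(nx, ny)
        if v > best:                 # keep a proposal only if it improves
            x, y, best = nx, ny, v
    return x, y, best

random.seed(0)
x, y, v = hill_climb(sat_approx, 2.5, 2.5)
# The climb drives (x, y) into the disc x^2 + y^2 < 4, where the
# approximative satisfaction function is close to 1, so the result
# satisfies the original path constraint.
print(x * x + y * y < 4)
```

The key point is that even where the exact satisfaction function is flat (0 almost everywhere outside the disc), the smooth approximation still has a non-zero gradient pulling the search toward the satisfying region.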

As can be seen, the maximum of this function occurs at (0, 0). Accordingly, by setting x = 0 and y = 0, we can satisfy (x² + y² ≤ 9) ∧ (x² + y² < 4). Therefore, by maximizing this approximative satisfaction function, we could generate test data that executes the ABDE path of Fig. 6.

Fig. 7. The surface of the function in Formula 3, which is an approximative satisfaction function of (x² + y² ≤ 9) ∧ (x² + y² < 4).

Fig. 8. The graph of Sigmoid for three different values of c.

4.2. Algorithms

In this section, we present the GenerateSatFunction algorithm for generating the satisfaction function of a path constraint and obtaining a smooth approximation of it. Initially, we present an algorithm for generating satisfaction functions from arithmetic conditions. This algorithm, called Condition2SatFunction, is an important module of GenerateSatFunction. By an arithmetic condition, we mean an expression that only consists of arithmetic, equality, and inequality operators. For example, the constraint x² − y < 5 is a simple arithmetic condition because it only contains arithmetic operators and an inequality. Then, we present an algorithm called CombineSatFunctions, which combines satisfaction functions based on the Boolean operators between constraints. This algorithm is another important module of GenerateSatFunction. Finally, we explain how to use all of the presented algorithms to automatically generate the satisfaction function of a given path constraint, and we present the pseudo-code of GenerateSatFunction. It must be noted that we have assumed all constraints that include ≤, ≥, or ≠ have been rewritten using the following rules:

• (a ≠ b) → ¬(a = b)
• (a ≥ b) → (a > b) ∨ (a = b)
• (a ≤ b) → (a < b) ∨ (a = b)
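These rewrite rules are purely syntactic and can be applied recursively; the sketch below is our own illustration (constraints encoded as nested tuples, a representation the paper does not prescribe):

```python
# Rewrite <=, >= and != in terms of <, >, = and Boolean operators.
# Constraints are nested tuples, e.g. ('<=', 'a', 'b') or ('and', C1, C2).
def rewrite(c):
    op = c[0]
    if op in ('and', 'or'):
        return (op, rewrite(c[1]), rewrite(c[2]))
    if op == 'not':
        return ('not', rewrite(c[1]))
    if op == '!=':                      # (a != b) -> not (a = b)
        return ('not', ('=', c[1], c[2]))
    if op == '>=':                      # (a >= b) -> (a > b) or (a = b)
        return ('or', ('>', c[1], c[2]), ('=', c[1], c[2]))
    if op == '<=':                      # (a <= b) -> (a < b) or (a = b)
        return ('or', ('<', c[1], c[2]), ('=', c[1], c[2]))
    return c                            # <, > and = are kept as-is

print(rewrite(('and', ('<=', 'x*x', '9'), ('!=', 'y', '0'))))
```

After this normalization, only <, >, =, ∧, ∨, and ¬ remain, which is exactly the operator set handled by Condition2SatFunction and CombineSatFunctions below.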

4.2.1. Mapping arithmetic conditions

First, we explain how to map arithmetic conditions to their corresponding satisfaction functions. Then, we present some examples. Finally, we present the pseudo-code of this mapping. To generate the satisfaction function of an arithmetic inequality, we use the Step function, which is from R to {0, 0.5, 1}. The Step function returns 0 for negative values and 1 for positive values. Depending on the value of Step at 0, there are different variants of the function; we use the Heaviside Step, which returns 0.5 for 0. Accordingly, we use Step as defined in Eq. (4).



Step(x) = 0    if x < 0
          0.5  if x = 0
          1    if x > 0        (4)
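Eq. (4) translates directly into code; this small sketch (our own, with a hypothetical `step` helper name) also shows its use as the exact satisfaction function of a strict inequality:

```python
# Direct transcription of Eq. (4), the Heaviside variant of Step.
def step(x):
    if x < 0:
        return 0
    if x == 0:
        return 0.5
    return 1

# Step(B - A) acts as the satisfaction function of A < B:
a, b = 1.5, 2.0
print(step(b - a))  # 1: a < b holds
print(step(a - b))  # 0: b < a fails
```

The 0.5 value at exactly 0 never marks a satisfying input; it simply fixes the behavior of the step at the boundary, where a strict inequality is not satisfied.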

Now, we can say that any x ∈ R satisfies x > 0 if and only if Step(x) = 1. Similarly, we can say that x satisfies x < 0 if and only if Step(−x) = 1. This way, Step can be used as the satisfaction function of an arithmetic inequality. In general, if we assume A and B to be two arithmetic expressions, Step(B − A) is the satisfaction function of A < B and Step(A − B) is the satisfaction function of A > B. As for the approximative satisfaction function of an arithmetic inequality, we use the Sigmoid function. The Sigmoid function is an S-shaped function that can be considered a smooth approximation of Step. We use Sigmoid as defined in Eq. (5).

Sigmoid(x) = 1 / (1 + exp(−c ∗ x)) = exp(c ∗ x) / (1 + exp(c ∗ x))        (5)

where c is a scalar that controls the approximation. Greater values of c make Sigmoid a more accurate approximation of Step. Based on our initial experiments, we set c = 1 by default. Fig. 8 depicts the graph of Sigmoid for three different values of c. To generate the satisfaction function of an arithmetic equality, we use the Zero function, which is again from R to {0, 1}. The Zero function returns 1 only for 0 and returns 0 otherwise. The definition of Zero can be seen in Eq. (6).



Zero(x) = 1  if x = 0
          0  if x ≠ 0        (6)

We can say that any x ∈ R satisfies x = 0 if and only if Zero(x ) = 1. This way, Zero can be the satisfaction function of the arithmetic equality. In general, if we assume A and B to be two arithmetic expressions, Zero(A − B ) is the satisfaction function of A = B. As for the approximative satisfaction function of the arithmetic equality, we use the Gaussian function. The Gaussian function can be considered as a smooth approximation of Zero. We use the Gaussian function as it is defined in Eq. (7).

Gaussian(x) = exp(−(c ∗ x)² / 2)        (7)

where c is a scalar that controls the approximation. Greater values for c make Gaussian a more accurate approximation of Zero. Based on our initial experiments, we set c = 0.1 by default. Fig. 9 depicts the graph of Gaussian for three different values of c.

Fig. 9. The graph of Gaussian for three different values of c.

Table 2 summarizes the usage of Gaussian and Sigmoid in generating approximative satisfaction functions.

Table 2
Using Gaussian and Sigmoid for generating approximative satisfaction functions.

Arithmetic condition   Conditional expression   Exact satisfaction function   Approximative satisfaction function
Equal to               A(x) = B(x)              Zero(A(x) − B(x))             Gaussian(A(x) − B(x))
Less than              A(x) < B(x)              Step(B(x) − A(x))             Sigmoid(B(x) − A(x))
Greater than           A(x) > B(x)              Step(A(x) − B(x))             Sigmoid(A(x) − B(x))

Here, we present two examples to show how Sigmoid and Gaussian can be used to make satisfaction functions of arithmetic conditions.

Example 1: Assume that we want to generate the satisfaction function of x² < 4. As mentioned in Table 2, we can use Step(4 − x²) as the satisfaction function of this arithmetic condition. Additionally, we can use Sigmoid(4 − x²) = 1 / (1 + exp(−(4 − x²))) as its approximative satisfaction function. Fig. 10 depicts the graphs of these two functions. As can be seen in Fig. 10, Step(4 − x²) returns 1 if x ∈ (−2, 2); otherwise, it returns 0. Also, we can see how Sigmoid(4 − x²) approximates Step(4 − x²). Accordingly, by maximizing Sigmoid(4 − x²), whose maximum occurs at x = 0, we can find an input value that satisfies the corresponding arithmetic condition, i.e., x² < 4.

Fig. 10. Satisfaction function and approximative satisfaction function of x² < 4.

Example 2: Assume that we want to generate the satisfaction function of the constraint x² + y² = 9. As mentioned in Table 2, we can use Zero(x² + y² − 9) as the satisfaction function of this arithmetic condition. Additionally, we can use Gaussian(x² + y² − 9) = exp(−(x² + y² − 9)² / 2) as its approximative satisfaction function. Fig. 11 depicts the graph of this approximative satisfaction function. As can be seen in Fig. 11, the maximum of Gaussian(x² + y² − 9) occurs on a circle with radius 3 centered at (0, 0). Every point on this circle also satisfies the corresponding arithmetic equality condition.

Condition2SatFunction in Algorithm 1 is a simple implementation of Table 2. This algorithm generates the approximative satisfaction function of a given arithmetic condition. It can be modified to generate the exact satisfaction function by simply replacing "Gaussian" with "Zero" and "Sigmoid" with "Step".

Algorithm 1: Condition2SatFunction - Generating the approximative satisfaction function string for a given arithmetic condition.
input : The arithmetic condition operator and its left-side and right-side expressions as strings
output: The corresponding approximative satisfaction function of the given arithmetic condition as a string
1 if operator is '=' then
2     return 'Gaussian(' + left + '−' + right + ')'
3 else
4     if operator is '<' then
5         return 'Sigmoid(' + right + '−' + left + ')'
6     else
7         return 'Sigmoid(' + left + '−' + right + ')'
8     end
9 end

4.2.2. Mapping Boolean operators
Assume that we have two arbitrary constraints, called C1 and C2. Also, let SF1 and SF2 be the satisfaction functions of these constraints, respectively. Now, we want to find the satisfaction functions of the following constraints:
• C1 ∧ C2
• C1 ∨ C2
• ¬C1

First, we explain how to obtain the satisfaction function of such constraints. Then, we demonstrate a few examples. Finally, we present the pseudo-code of CombineSatFunctions, which combines satisfaction functions based on the Boolean operators between the corresponding constraints. Before getting into details, we need to set a few definitions. Let's assume that the input space of C1 is called D1. Also, assume that the set of all values that satisfy C1 is called S1, and the corresponding satisfaction function of C1 is SF1. Similarly, we can define D2, S2, and SF2 for C2. Referring to the concept of satisfaction functions, we can say that:

SF1(x) = 1 ⇔ x ∈ S1
SF1(x) = 0 ⇔ x ∈ D1 − S1
SF2(x) = 1 ⇔ x ∈ S2
SF2(x) = 0 ⇔ x ∈ D2 − S2    (8)
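As a concrete companion to Table 2, the exact and approximative building blocks can be written down directly. The following Python sketch is ours (the paper's actual implementation is in C#); the default c values follow the ones quoted in the text:

```python
import math

def step(x):
    # Exact building block for inequalities: 1 on the satisfying side, 0 elsewhere.
    return 1.0 if x > 0 else 0.0

def zero(x):
    # Exact building block for equalities: 1 only when the difference is exactly 0.
    return 1.0 if x == 0 else 0.0

def sigmoid(x, c=1.0):
    # Smooth approximation of step; a larger c gives a sharper transition.
    return 1.0 / (1.0 + math.exp(-c * x))

def gaussian(x, c=0.1):
    # Smooth approximation of zero, Eq. (7); a larger c gives a narrower peak.
    return math.exp(-((c * x) ** 2) / 2.0)

# x**2 < 4 maps to sigmoid(4 - x**2): near 1 inside (-2, 2), near 0 outside.
print(round(sigmoid(4 - 0.0 ** 2), 3))   # x = 0 satisfies the condition
print(round(sigmoid(4 - 3.0 ** 2), 3))   # x = 3 violates it

# x**2 + y**2 = 9 maps to gaussian(x**2 + y**2 - 9): exactly 1 on the circle.
print(gaussian(3.0 ** 2 + 0.0 ** 2 - 9))
```

Maximizing sigmoid(4 − x²) drives x into (−2, 2), which is exactly the satisfying set of x² < 4.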


Fig. 11. Approximative satisfaction function of the constraint x² + y² = 9.

Table 3
Combination of satisfaction functions based on different Boolean operators.

Boolean operator   Constraint   Satisfaction function
Conjunction        C1 ∧ C2      SF1 ∗ SF2
Disjunction        C1 ∨ C2      SF1 + SF2 − SF1 ∗ SF2
Negation           ¬C1          1 − SF1

As mentioned before, we can look at a satisfaction function as the probability distribution of its corresponding constraint's satisfaction over its input space. In other words, we can say that:

P(x ∈ S1) = SF1(x)
P(x ∈ S2) = SF2(x)    (9)

Conjunction: We know that a value such as x ∈ D1 ∩ D2 satisfies C1 ∧ C2 if and only if x ∈ S1 ∩ S2. As a result, we can say that the probability of x satisfying C1 ∧ C2 is P(x ∈ S1 ∩ S2). Also, we have:

P(x ∈ S1 ∩ S2) = P(x ∈ S1) ∗ P(x ∈ S2) = SF1(x) ∗ SF2(x)    (10)

This means that x satisfies C1 ∧ C2 if and only if SF1(x) ∗ SF2(x) = 1. Therefore, we can say that SF1 ∗ SF2 is the satisfaction function of C1 ∧ C2.

Disjunction: We know that a value such as x ∈ D1 ∩ D2 satisfies C1 ∨ C2 if and only if x ∈ S1 ∪ S2. As a result, we can say that the probability of x satisfying C1 ∨ C2 is P(x ∈ S1 ∪ S2). Also, we have:

P(x ∈ S1 ∪ S2) = P(x ∈ S1) + P(x ∈ S2) − P(x ∈ S1 ∩ S2) = SF1(x) + SF2(x) − SF1(x) ∗ SF2(x)    (11)

This means that x satisfies C1 ∨ C2 if and only if SF1(x) + SF2(x) − SF1(x) ∗ SF2(x) = 1. Therefore, we can say that SF1 + SF2 − SF1 ∗ SF2 is the satisfaction function of C1 ∨ C2.

Negation: A value such as x ∈ D1 satisfies ¬C1 if x ∉ S1. As a result, we can say that the probability of x satisfying ¬C1 is P(x ∉ S1). Also, we have:

P(x ∉ S1) = 1 − P(x ∈ S1) = 1 − SF1(x)    (12)

This means that x satisfies ¬C1 if and only if 1 − SF1(x) = 1. Therefore, considering the concept of satisfaction functions, we can say that 1 − SF1 is the satisfaction function of ¬C1.

Table 3 summarizes the combination of satisfaction functions based on different Boolean operators. The CombineSatFunctions function, which is a simple implementation of Table 3, is presented in Algorithm 2.

Algorithm 2: CombineSatFunctions - Combining left and right satisfaction functions to generate new satisfaction functions.
input : Satisfaction functions of two constraints and a Boolean operator
output: The corresponding satisfaction function of the constraint that is created from the application of the given Boolean operator on the corresponding constraints of the input satisfaction functions
1 if operator is '∧' then
2     return leftSatFunc + '*' + rightSatFunc
3 else
4     if operator is '∨' then
5         return leftSatFunc + '+' + rightSatFunc + '−' + leftSatFunc + '*' + rightSatFunc
6     else
7         return '1−' + leftSatFunc
8     end
9 end

As an example, consider the constraint in Formula (13).

(x² < 4) ∨ ((x − 4)² < 1)    (13)

Using Condition2SatFunction, we can say that the approximative satisfaction function of x² < 4 is:

Sigmoid(4 − x²) = 1 / (1 + exp(−(4 − x²)))    (14)

Also, the approximative satisfaction function of (x − 4)² < 1 is:

Sigmoid(1 − (x − 4)²) = 1 / (1 + exp(−(1 − (x − 4)²)))    (15)

Now, considering the mentioned rules in Table 3, the approximative satisfaction function of (x² < 4) ∨ ((x − 4)² < 1) is:

Sigmoid(4 − x²) + Sigmoid(1 − (x − 4)²) − Sigmoid(4 − x²) ∗ Sigmoid(1 − (x − 4)²)
= 1 / (1 + exp(−(4 − x²))) + 1 / (1 + exp(−(1 − (x − 4)²))) − 1 / (1 + exp(−(4 − x²))) ∗ 1 / (1 + exp(−(1 − (x − 4)²)))    (16)

Fig. 12 depicts the graph of this function.
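The combination rules of Table 3 can be exercised numerically. This Python sketch (function names are ours) builds the approximative satisfaction function of Formula (13) and evaluates it at satisfying and non-satisfying points:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conj(sf1, sf2):
    # Conjunction rule from Table 3: SF1 * SF2
    return lambda x: sf1(x) * sf2(x)

def disj(sf1, sf2):
    # Disjunction rule from Table 3: SF1 + SF2 - SF1 * SF2
    return lambda x: sf1(x) + sf2(x) - sf1(x) * sf2(x)

def neg(sf1):
    # Negation rule from Table 3: 1 - SF1
    return lambda x: 1.0 - sf1(x)

# Approximative satisfaction function of (x**2 < 4) or ((x - 4)**2 < 1), Eq. (16).
sf = disj(lambda x: sigmoid(4 - x ** 2),
          lambda x: sigmoid(1 - (x - 4) ** 2))

for x in (0.0, 4.0, 2.5):
    print(x, round(sf(x), 3))
```

The function is near 1 at x = 0 and x = 4 (inside (−2, 2) ∪ (3, 5)) and clearly lower at x = 2.5, which satisfies neither clause.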


Fig. 12. Approximative satisfaction function of (x² < 4) ∨ ((x − 4)² < 1).

By analyzing the constraint (x² < 4) ∨ ((x − 4)² < 1), it is known that any value in (−2, 2) ∪ (3, 5) satisfies the constraint. Correspondingly, the value of the function depicted in Fig. 12 is near 1 in the same ranges. By finding the maximum of this function, which occurs at x = 0, we obtain an input that satisfies the corresponding constraint.

4.2.3. Satisfaction function generation
Now, we explain how to use Condition2SatFunction and CombineSatFunctions to design GenerateSatFunction, which generates the satisfaction function of any given constraint. First, we need to parse the path constraint according to the operators =, <, >, ∧, ∨, and ¬. The result is a parse tree in which the leaves are arithmetic expressions. The direct parents of these expressions are conditional arithmetic operators (=, <, >). The other nodes of this tree are Boolean operators (∧, ∨, ¬). As an example, consider the following constraint:

(x² + y² > 9) ∧ (cos(x ∗ y) > 0)    (17)

The parse tree of this constraint is depicted in Fig. 13.

Fig. 13. The parse tree of (x² + y² > 9) ∧ (cos(x ∗ y) > 0).

We present a recursive bottom-up approach for generating the corresponding satisfaction function from the constraint's parse tree. We start from the arithmetic conditions at the bottom of the tree (e.g., node B and its children in Fig. 13). We use Condition2SatFunction to generate satisfaction functions for all of these small sub-trees. Then, we use CombineSatFunctions to combine these generated satisfaction functions. As a result, we gradually generate satisfaction functions for bigger sub-trees. The algorithm terminates when the satisfaction function of the whole tree is generated. This routine, called RecursivelyGenerateSatFunction, is presented in Algorithm 3.

Algorithm 3: RecursivelyGenerateSatFunction - Traversing the parse tree of the path constraint.
input : Root of a parse tree
output: Corresponding satisfaction function of the input parse tree
1 if operator type of root is arithmetic condition then
2     satFunc ← Condition2SatFunction(left child of root, right child of root, operator of root)
3 else
4     leftSatFunc ← RecursivelyGenerateSatFunction(left child of root)
5     rightSatFunc ← RecursivelyGenerateSatFunction(right child of root)
6     satFunc ← CombineSatFunctions(leftSatFunc, rightSatFunc, operator of root)
7 end
8 return satFunc

Consequently, GenerateSatFunction must first parse the constraint and then call RecursivelyGenerateSatFunction with the root of the constraint's parse tree as its input. Algorithm 4 presents GenerateSatFunction.

Algorithm 4: GenerateSatFunction.
input : The constraint
output: The corresponding satisfaction function
1 root ← Parse(constraint)
2 satFunc ← RecursivelyGenerateSatFunction(root)
3 return satFunc

If we run GenerateSatFunction for the constraint given in Formula (17), whose parse tree is depicted in Fig. 13, the following steps will be taken:
1. The path constraint is parsed (Algorithm 4, line 1).
2. RecursivelyGenerateSatFunction is called on node A (Algorithm 4, line 2).
3. RecursivelyGenerateSatFunction is called on node B (Algorithm 3, line 4).
4. Condition2SatFunction is called on node B and its children. Consequently, Sigmoid(x² + y² − 9) is generated (Algorithm 3, line 2).
5. RecursivelyGenerateSatFunction is called on node C (Algorithm 3, line 5).
6. Condition2SatFunction is called on node C and its children. Consequently, Sigmoid(cos(x ∗ y)) is generated (Algorithm 3, line 2).
7. CombineSatFunctions is called on the operator of node A and the previously generated satisfaction functions. Consequently, Sigmoid(x² + y² − 9) ∗ Sigmoid(cos(x ∗ y)) is generated (Algorithm 3, line 6).

At the end, GenerateSatFunction generates Sigmoid(x² + y² − 9) ∗ Sigmoid(cos(x ∗ y)) as the satisfaction function of the constraint given in Formula (17). The surface of this function is depicted in Fig. 14.

Fig. 14. Approximative satisfaction function of (x² + y² > 9) ∧ (cos(x ∗ y) > 0).

5. Evaluation
In this section, we evaluate Smooth Modeling. In Section 5.1, we describe the evaluation method. In Section 5.2, we investigate the performance of Smooth Modeling in solving different types of constraints. In this regard, we compare the performance of Smooth Modeling with that of three other constraint solvers. As said before, we have to use an optimization algorithm to find maximizing values for satisfaction functions. These values satisfy the corresponding path constraints and can be used as test data. Therefore, the performance of the optimization algorithm affects the test data generation process. In Section 5.3, we present a sensitivity analysis of the optimization algorithm with respect to different values of some of Smooth Modeling's parameters.

5.1. Evaluation method
The method of the conducted evaluation includes the following steps:
1. Implementation: To automatically generate the approximative satisfaction function of any given path constraint, we implemented all of the algorithms that have been mentioned in Section 4. Also, we chose an optimization algorithm for maximizing approximative satisfaction functions and generating satisfying data. For more details about the implementations, see Section 5.1.1.
2. Constraint set preparation: We needed a set of constraints to compare the performance of Smooth Modeling with other constraint solvers. Therefore, we prepared a set of 150 non-linear real constraints with various types of mathematical functions. Also, we considered 9 real-world programs and extracted their path constraints, which provided us with 25 more constraints. For more details about these two constraint sets, see Section 5.1.2.
3. Execution and data collection: We measured the performance of Smooth Modeling in solving the prepared constraints. The performance of dReal, CHOCO and CORAL was also measured for the same constraints. Section 5.1.3 provides more details about how we used these tools. We considered two metrics as the indicators of performance:
(a) Average time needed for solving the constraints. The average was calculated over all constraints that the solver could successfully solve.
(b) Percentage of constraints that have been solved.
We used a machine with an Intel Core i5-4200M processor (2.50 GHz) and 6 GB of RAM for executing Smooth Modeling, dReal, CHOCO and CORAL on the prepared constraint sets.
4. Analysis: Based on the obtained results, we compared the performance of the three mentioned tools with that of Smooth Modeling.

The source code of all the implementations and the prepared constraint sets can be found at https://github.com/SACHAM0RA/Smooth-Modeling.

5.1.1. Implementation
To generate the satisfaction functions of any given path constraint, we implemented all of the algorithms that have been presented in Section 4. We used the C# language and the .NET Framework to do this task.
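As an illustrative companion to Algorithms 1-4 (a sketch of ours in Python rather than the authors' C#, with a tuple-based parse tree and an indicative string format), the whole generation pipeline can be exercised end to end:

```python
def condition_to_sat_function(left, right, op):
    # Algorithm 1 (approximative version): map an arithmetic condition to a string.
    if op == '=':
        return 'Gaussian(' + left + '-' + right + ')'
    if op == '<':
        return 'Sigmoid(' + right + '-' + left + ')'
    return 'Sigmoid(' + left + '-' + right + ')'

def combine_sat_functions(left_sf, right_sf, op):
    # Algorithm 2: combine satisfaction functions per Table 3.
    if op == 'and':
        return left_sf + '*' + right_sf
    if op == 'or':
        return left_sf + '+' + right_sf + '-' + left_sf + '*' + right_sf
    return '1-' + left_sf   # negation: right_sf is unused

def generate_sat_function(node):
    # Algorithms 3 and 4: recursive bottom-up traversal of the parse tree.
    op, left, right = node
    if op in ('=', '<', '>'):                 # leaf arithmetic condition
        return condition_to_sat_function(left, right, op)
    left_sf = generate_sat_function(left)
    right_sf = generate_sat_function(right) if right is not None else None
    return combine_sat_functions(left_sf, right_sf, op)

# Parse tree of (x^2 + y^2 > 9) and (cos(x*y) > 0), as in Fig. 13.
tree = ('and', ('>', 'x^2+y^2', '9'), ('>', 'cos(x*y)', '0'))
print(generate_sat_function(tree))
```

Running this prints `Sigmoid(x^2+y^2-9)*Sigmoid(cos(x*y)-0)`, matching the walk-through of Formula (17).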
Another key task was to choose an optimization algorithm to maximize the approximative satisfaction functions globally and, consequently, generate the data that satisfy the related constraints. At first, we chose Newton's method. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point (where the gradient is 0). In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the objective function. Newton's method greatly exploits the smoothness of the objective function and seems to be an appropriate choice for maximizing approximative satisfaction functions. However, our initial experiments showed that computing the Hessian matrix of an approximative satisfaction function is an expensive task. Therefore, we decided to use a quasi-Newton algorithm instead. Quasi-Newton algorithms are an alternative to Newton's method when the Jacobian or the Hessian is unavailable or too expensive to compute at every iteration. In quasi-Newton methods, the Hessian matrix does not need to be computed; instead, the Hessian is updated by analyzing successive gradient vectors. It must be noted that we used a multi-start approach to avoid reporting local maximums as false positives. As for the termination criteria of the optimization algorithm, we considered the satisfaction of the corresponding constraint, in addition to the first-order optimality conditions and a timeout that was set to 5 s. We used MATLAB R2017a to implement the optimization algorithm.

Table 4
Some sample constraints from categorizedSet.

cos(x²) ∗ sin(y + z) == 0.5
x ∗ 2^(y−z) == 20²
(x + y)² > (y + z)²
cos(x ∗ y) + sin(y ∗ z) > 0.34
x² ∗ y³ + x³ == 25
log(√x) ∗ 20 + y^x == 2^20
4 ∗ √x + √y ∗ z > 75

5.1.2. Constraint set
We investigated the constraint sets of other similar studies (Souza et al., 2011; Gao et al., 2013). We recognized that almost all of the mathematical functions that were used in these studies could be categorized into trigonometric, polynomial, exponential, square root, and logarithmic functions. Accordingly, we designed a set of 150 constraints for our experiment that includes trigonometric, polynomial, exponential, square root, and logarithmic functions. We tried to make this constraint set as fair as possible; hence, all categories were given the same size. Also, the number of equality and inequality constraints is the same in each category. It should be noted that these constraints don't have Boolean operators and are all single-term constraints. We named this constraint set categorizedSet. Some of the constraints of this set are presented in Table 4.

Moreover, we considered 9 real-world programs and extracted 25 path constraints from them. It should be noted that we used the 0-1-more approach (Ammann and Offutt, 2008) to extract path constraints from the programs that had loops. We named the set of these 25 constraints programsSet. The considered programs are mentioned in Table 5. It is worth mentioning that we collected programsSet for the following reasons:
• The set categorizedSet only included clauses (i.e., constraints with no logical connective). We wanted to see how our approach performs on constraints that have multiple clauses and different Boolean operators.
• All constraints in categorizedSet are non-linear. We wanted to see how our approach performs on combinations of linear and non-linear constraints.
• We wanted to see if the results from categorizedSet are supported by the results from programsSet.

5.1.3. Execution and data collection
We measured the performance of Smooth Modeling, dReal, CHOCO and CORAL in solving the constraints of both categorizedSet and programsSet. Here, we mention some details on how we configured the tools and assessed their operation.

Constraint representation: Some of the tools required the input constraints to be expressed in a certain way:
• In test data generation using Smooth Modeling, the optimizer acts as the test data generator. The input of the optimizer is the approximative satisfaction function of the given constraint. As mentioned before, the implementation of the algorithms presented in Section 4 automated the generation of approximative satisfaction functions.
• dReal is an SMT solving tool; therefore, we created an smt2 file (Barrett et al., 2017) for each constraint. Constraints had to be written as prefix expressions in smt2 files.
• CORAL has a special input language² into which every constraint had to be translated.
• CHOCO is a Java library for constraint programming; thus, the constraints didn't need any specific preparation. The common infix expression of the constraints would suffice.

We ignored the overhead of converting constraints into their related representations. Therefore, the results presented in Section 5.2 are only constraint solving times for each tool. As an example, Table 6 shows the representation of cos(x(1) ∗ x(2)) > 0 for every tool.

Solution precision: Due to the hardware limitations of floating-point representations, it is impossible to find theoretically accurate solutions for all constraints. This matter is more crucial regarding the equality constraints. As an example, consider the constraint x² = 2. One solution to this constraint is x = √2. It is clear that this solution cannot be accurately represented by the hardware since √2 has infinitely many decimals. Therefore, every constraint solver that deals with such constraints allows users to set a precision parameter. We configured all of the mentioned tools to represent solutions with a precision of 10 decimal places.

Satisfaction definition: An important issue regarding the equality constraints is determining when a generated solution is close enough to the actual answer. This is important because most of the constraint solving techniques have an iterative nature. For example, consider the constraint sin(x) = 0.5. If we set x1 = 0.52335672, then sin(x1) = 0.49979035905. Also, if we set x2 = 0.5235987755, then sin(x2) = 0.49999999999. Both x1 and x2 are very close to the actual answer, which is π/6; however, x2 seems to be a better one. This means that we had to define a tolerance within which solutions are considered satisfying. In this experiment, we determined that a set of assignments X satisfies A(X) = B(X) if |A(X) − B(X)| < 1e−2.

Solutions: Another important concern is how the tools report their solutions. If a constraint has n variables, then every solution of Smooth Modeling and CORAL consists of n assignments, one for each of the constraint variables. We could simply check these assignments to see if they satisfy the constraint. Differently, solutions of CHOCO and dReal consist of n intervals, each corresponding to one of the variables. In this study, our ultimate objective was to find concrete values that satisfy the constraint; therefore, for every solution of CHOCO and dReal, we chose a random value from every interval and checked if the resulting set of assignments satisfied the constraint.

² Available at http://pan.cin.ufpe.br/coral/InputLanguage.html.
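To make the optimization step concrete: the paper maximizes each approximative satisfaction function with a multi-start quasi-Newton method in MATLAB. As a rough stand-in, the sketch below (ours, not the authors' setup) uses plain multi-start gradient ascent with a numeric gradient, and stops when the paper's satisfaction tolerance |A(X) − B(X)| < 1e−2 is met:

```python
import math

def gaussian(x, c=0.1):
    return math.exp(-((c * x) ** 2) / 2.0)

def sat_function(p):
    # Approximative satisfaction function of x^2 + y^2 = 9.
    x, y = p
    return gaussian(x * x + y * y - 9.0)

def satisfied(p, tol=1e-2):
    # The paper's satisfaction test for equalities: |A(X) - B(X)| < 1e-2.
    x, y = p
    return abs(x * x + y * y - 9.0) < tol

def grad(f, p, h=1e-6):
    # Central-difference numeric gradient (stand-in for analytic gradients).
    g = []
    for i in range(len(p)):
        hi = list(p); lo = list(p)
        hi[i] += h; lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return g

def solve(starts, rate=0.5, iters=2000):
    # Multi-start ascent: return the first start that reaches a satisfying point.
    for p in starts:
        p = list(p)
        for _ in range(iters):
            if satisfied(p):
                return p
            g = grad(sat_function, p)
            p = [pi + rate * gi for pi, gi in zip(p, g)]
    return None

print(solve([(1.0, 1.0), (0.5, -2.0)]))
```

The ascent climbs the Gaussian surface until the point lands (within tolerance) on the circle x² + y² = 9; the returned assignments are then usable as test data.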

Table 5
Programs that were used in the preparation of programsSet.

Program name                   Description
squareRoot                     Babylonian method for square root
pointIsOnLine                  Check if the given point lies on the given line
inscribedCircleAreaLessThan    Check if the area of the inscribed circle in a hexagon is less than a value
countIntegers                  Count integers in a range which are divisible by their Euler totient value
polarOnPoly                    Given a point in polar coordinates, check if it is on the curve of x²
quadEqRoots                    Calculate the roots of a quadratic equation
circleIntersect                Check if two circles touch each other or not
triangleType                   Classify a triangle, given its sides
areaOfIncircleLessThan         Check if the area of the incircle of a triangle is less than a value

Fig. 15. Size of the intersections of solved constraints of each tool.

Table 6
Representation of cos(x(1) ∗ x(2)) > 0 for every tool.

Tool              Constraint representation
Smooth Modeling   sigmoid(cos(x(1) ∗ x(2)))
dReal             (> (cos (* x1 x2)) 0.0)
CORAL             DGT(SIN(MUL(DVAR(ID1), DVAR(ID2))), DCONST(0))
CHOCO             cos(x1 ∗ x2) > 0
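For illustration, converting an infix parse tree to the prefix form required by the smt2 files amounts to a small recursive print. This sketch is ours and only indicative; it is not dReal's actual front end:

```python
def to_prefix(node):
    # Render a tuple-based expression tree as an smt2-style prefix string.
    if isinstance(node, str):
        return node
    op, *args = node
    return '(' + op + ' ' + ' '.join(to_prefix(a) for a in args) + ')'

# cos(x1 * x2) > 0 becomes the prefix form used in the smt2 input files.
expr = ('>', ('cos', ('*', 'x1', 'x2')), '0.0')
print(to_prefix(expr))
```

This prints `(> (cos (* x1 x2)) 0.0)`, the shape of the dReal row above.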

Table 7
The number of constraints from categorizedSet that a solver (row) could solve and another solver (column) could not solve.

                  Smooth Modeling   CHOCO   dReal   CORAL
Smooth Modeling   –                 39      36      22
CHOCO             10                –       14      21
dReal             10                18      –       17
CORAL             11                39      32      –

5.2. Evaluation results

Here, we report the collected experimental results.

5.2.1. Number of solved constraints
From all 150 constraints of categorizedSet, Smooth Modeling generated satisfying assignments for 138 constraints (92%). In the second place, CORAL generated satisfying assignments for 127 constraints (84%). In the third place, dReal satisfied 113 constraints (75%). Lastly, CHOCO generated 109 satisfying assignments (72%). Among the 25 constraints of programsSet, Smooth Modeling and CORAL both solved 23 constraints (92%). Also, dReal and CHOCO solved 22 (88%) and 21 (84%) constraints, respectively. Therefore, it can be inferred that the obtained results from programsSet are consistent with those of categorizedSet. Fig. 15 shows the number of constraints solved by each tool and each combination of different tools. Considering this figure, the following points are worth mentioning:
• There were 69 constraints (46%) that all tools solved.
• For every constraint, there was at least one tool that solved it.
• Smooth Modeling and CORAL exclusively solved 2 constraints, whereas dReal and CHOCO did not solve any constraints exclusively.

For a better one-to-one comparison, Table 7 shows the number of constraints from categorizedSet that a solver could solve and another solver could not. For example, there were 39 constraints that Smooth Modeling solved but CHOCO could not solve. As can be seen in this table, Smooth Modeling outperforms the other solvers in every comparison.

Table 8 presents the number of constraints that each tool solved for each constraint type.

Table 8
The number of constraints that each tool solved for each constraint type.

                 Smooth Modeling   CHOCO   dReal   CORAL
Polynomial       28                26      22      29
Trigonometric    30                18      17      18
Exponential      25                25      27      23
Square Root      26                21      25      28
Logarithmic      29                19      22      29
Total            138               109     113     127

The highlights of the presented data are as follows:
• Out of 30 polynomial constraints, CORAL solved 29 and was more successful than the other tools with respect to this constraint type. Smooth Modeling was very close to CORAL by solving 28 polynomial constraints.
• For trigonometric constraints, Smooth Modeling drastically outperformed the other tools by solving all 30 trigonometric constraints.
• For exponential constraints, dReal solved 27 constraints. Smooth Modeling and CHOCO gained the second place by solving 25 constraints each.
• For square root constraints, CORAL solved 28 constraints and gained the first place. Smooth Modeling gained the second place by solving 26 constraints.
• For logarithmic constraints, Smooth Modeling and CORAL both solved 29 constraints and performed better than the others.

These results show that our approach was overall more successful in satisfying the selected constraints. In particular, Smooth Modeling did very well in solving trigonometric constraints. Moreover, we tried to analyze the 12 constraints that Smooth Modeling could not solve. The main reasons why Smooth Modeling could not solve these constraints are summarized as follows:
• The method was not successful in finding a real initial solution. Thus, the optimization algorithm never actually started (4 constraints).
• The optimization algorithm repeatedly got stuck in local maximums of the given approximative satisfaction function (2 constraints).
• Initial solutions were always located in flat areas of the given approximative satisfaction function. Therefore, the optimization algorithm always terminated after a few iterations and reported an incorrect solution from these areas (6 constraints).

We proposed some possible solutions for these issues in Section 6.

5.2.2. Time
Table 9 shows how much time each tool required in order to generate satisfying data for the solved constraints from categorizedSet. It must be noted that for every constraint, every solver was executed 10 times and the average time was recorded. As can be seen, there is little difference between the performance of dReal and Smooth Modeling, with Smooth Modeling slightly slower. To be more precise, Smooth Modeling was 1.81 times slower than dReal. CHOCO and CORAL were much slower than dReal and Smooth Modeling: CHOCO and CORAL were 13.02 times and 39.21 times slower than Smooth Modeling, respectively.

Regarding the constraints from programsSet, the average required times for Smooth Modeling, dReal, CHOCO, and CORAL were respectively 0.0095, 0.0145, 0.2928, and 0.7576 s. The only major difference between these results and those obtained from categorizedSet is that Smooth Modeling was slightly faster than dReal.

Considering the constraint categories, Table 10 shows how much time each tool required in order to generate satisfying data for the solved constraints of each category. As can be seen in this table, Smooth Modeling and dReal were drastically faster than the others in every category. dReal was slightly faster than Smooth Modeling for polynomial, square root, and logarithmic constraints. On the other hand, Smooth Modeling was faster in the trigonometric and exponential categories.

5.3. Sensitivity analysis
In this section, we study the effect of some parameters on the performance of Smooth Modeling.

5.3.1. Impact of the number of iterations on constraint satisfaction
Fig. 16 demonstrates the relationship between the number of solved constraints and the number of iterations of the optimization process. The number of iterations is the sum of two parts. The first part is the number of iterations taken to find a proper initial solution for the optimization. The second part is the number of iterations that the quasi-Newton method took to optimize the approximative satisfaction function. As can be inferred from Fig. 16, more than 90% (135) of the constraints were solved after 41 iterations. Therefore, if it is required to set a boundary on the number of iterations, 41 seems to be the appropriate choice according to the conducted experiment.

Fig. 16. The relationship between the number of solved constraints and the number of iterations.

5.3.2. Impact of Sigmoid and Gaussian formulas on constraint satisfaction
As mentioned in Section 4, Sigmoid and Gaussian are the building blocks of the approximative satisfaction functions. Eqs. (5) and (7) show the definitions of these functions as we used them in the experiment. In both formulas, there is a scalar c that controls the approximation level. To distinguish c in Sigmoid and Gaussian, we use two separate parameters named csigmoid and cgaussian. If we set csigmoid = 0, then

Sigmoid(x) = 1 / (1 + exp(−x ∗ 0)) = 0.5    (18)

Now, what happens if we set csigmoid to very big values? We can say that for very big values of csigmoid,

Sigmoid(x) = lim_{c→∞} 1 / (1 + exp(−c ∗ x)) = { 0 if x < 0; 0.5 if x = 0; 1 if x > 0 } = Step(x)    (19)

As can be seen, greater values for csigmoid make Sigmoid a more accurate approximation of the Step function. On the other hand, smaller values for csigmoid make Sigmoid more like a constant function.

Table 9
The required time (in seconds) to solve constraints by each tool.

Tool              Number of solved constraints   Total time   Average time
dReal             113                            1.34         0.0118
Smooth Modeling   138                            2.96         0.0214
CHOCO             109                            30.39        0.2788
CORAL             127                            106.58       0.8392

Fig. 17. Curves of two approximative satisfaction functions of (x² > 9) ∧ ((x − 2)² < 81) that have different values of csigmoid.

Table 10
The average time (in seconds) each tool required in order to generate satisfying data for the solved constraints of each category.

                Smooth Modeling   CHOCO      dReal      CORAL
Polynomial      0.011936          0.280473   0.011909   0.826206
Trigonometric   0.008803          0.280444   0.014352   0.812777
Exponential     0.009520          0.2885     0.012111   0.769130
Square Root     0.03481           0.2754     0.0114     0.881071
Logarithmic     0.034             0.266005   0.010227   0.883793

The same can be said about Gaussian. If we set cgaussian = 0, then

Gaussian(x) = exp(−(0 ∗ x)² / 2) = 1    (20)

For very big values of cgaussian,

Gaussian(x) = lim_{c→∞} exp(−(c ∗ x)² / 2) = { 1 if x = 0; 0 if x ≠ 0 } = Zero(x)    (21)

Greater values for cgaussian make Gaussian a more accurate approximation of the Zero function. On the other hand, smaller values for cgaussian make Gaussian more like a constant function. As an example, Fig. 17 shows the curves of two approximative satisfaction functions of (x² > 9) ∧ ((x − 2)² < 81) that have different values of csigmoid.

The data presented in Section 5.2 were obtained with csigmoid equal to 1 and cgaussian equal to 0.1. However, our further experiments showed that these values have a great impact on the performance of Smooth Modeling. For example, of the 12 constraints that Smooth Modeling could not solve when cgaussian was set to 0.1, we solved 6 only by changing the value of cgaussian to 0.01. On the other hand, setting cgaussian to 0.01 caused Smooth Modeling to report incorrect solutions for some of the previously satisfied constraints.

In general, greater values of csigmoid and cgaussian make the approximative satisfaction functions less approximative (i.e., more accurate models of the satisfying areas of the input space). On the other hand, lower values make the optimization easier for the chosen optimization algorithm and lead to shorter solving times. Therefore, when setting the values of these parameters, we should take this trade-off into consideration. Further studies on how to choose optimal values for csigmoid and cgaussian are required in future works.
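This trade-off is easy to see numerically. In the following sketch (ours), a large cgaussian makes the surface numerically flat far from the satisfying set, which starves a gradient-based optimizer, while a small cgaussian keeps a usable gradient at the cost of a looser model:

```python
import math

def gaussian(x, c):
    # Eq. (7): exp(-(c * x)^2 / 2), where x is the difference A - B.
    return math.exp(-((c * x) ** 2) / 2.0)

# Far from the satisfying point (the difference A - B is 50):
print(gaussian(50, c=1.0))    # sharp model: numerically flat, gradient near 0
print(gaussian(50, c=0.01))   # loose model: still carries gradient information
```

The first value underflows toward 0, while the second stays well above 0.5, so an ascent started far from the solution only makes progress with the smaller c.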

6. Conclusion and future works

Symbolic execution generates path constraints for programs under test. These constraints are solved by constraint solvers, and the solutions are used as test data for the related execution paths. In this process, solving non-linear constraints is challenging and problematic. This paper proposes a new approach, called Smooth Modeling, for dealing with constraints that involve non-linear arithmetic and contain mathematical functions.

The core idea of Smooth Modeling is to model every constraint as a (mostly) smooth function, called an approximative satisfaction function. These functions are defined on the domain of the corresponding constraints and have the range [0, 1]. Approximative satisfaction functions return values near 1 for inputs that satisfy their corresponding constraints. Similarly, they return values near 0 for inputs that do not satisfy their corresponding constraints. Therefore, maximizing an approximative satisfaction function leads to the generation of data that satisfy its corresponding constraint.

The conducted experiments showed that Smooth Modeling could satisfy constraints that have trigonometric, exponential, square root and logarithmic functions on floating-point variables. The performance of Smooth Modeling was comparable to well-known constraint solvers regarding speed. Also, Smooth Modeling outperformed the other solvers regarding the percentage of solved constraints.

In future works, we plan to address the following concerns:


Fig. 18. abs(x) and sigmoid(x) · x + (1 − sigmoid(x)) · (−x).







• Initial solutions: The Quasi-Newton algorithm needs an initial solution to start the optimization from. Sometimes, finding proper initial solutions takes a long time, since constraints are not always defined over the whole input space. For example, √x > 10 is not defined for negative real inputs. Therefore, detecting the domain of the given constraint might speed up the generation of the initial solution. Also, using metaheuristic optimization algorithms, such as genetic algorithms and particle swarm optimization, might be beneficial, since they do not depend on a single starting solution.

• Local maxima: Depending on the mathematical functions used in a given constraint, the approximative satisfaction function of the constraint may have local maxima that do not necessarily satisfy the corresponding constraint. For example, the constraint x³ − 2x² > 0.25 involves the function x³ − 2x², which has a local maximum at x = 0; as a result, sigmoid(x³ − 2x² − 0.25) has a local maximum at x = 0, which is not a satisfying input. We tried to alleviate this issue with a multi-start optimization. However, other global optimization algorithms, such as metaheuristic algorithms, might be more helpful.

• Total smoothness: Sigmoid and Gaussian are smooth functions. As a result, it can be shown that the approximative satisfaction function of a constraint is completely smooth if and only if all the functions used in the constraint are smooth. For example, the approximative satisfaction function of abs(x) > 15 is not differentiable at x = 0 and therefore not smooth. Solving non-smooth optimization problems is much harder than solving smooth ones. Therefore, finding a way to generate totally smooth approximative satisfaction functions may improve the performance of Smooth Modeling. One possible approach is to replace non-smooth functions in a constraint with approximative smooth versions. For example, we could replace abs(x) with sigmoid(x) · x + (1 − sigmoid(x)) · (−x). Fig. 18 depicts these functions.
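As a rough illustration of this replacement (ours, not the paper's; the steepness value c = 5 is an arbitrary choice playing the role of c_sigmoid), the following Python sketch compares abs(x) with the smooth blend sigmoid(x) · x + (1 − sigmoid(x)) · (−x):

```python
import math

def sigmoid(t, c=5.0):
    # Steepness c = 5 is an illustrative choice; larger values make the
    # blend track abs(x) more closely near x = 0.
    return 1.0 / (1.0 + math.exp(-c * t))

def smooth_abs(x):
    """Smooth stand-in for abs(x): blends x and -x through the sigmoid,
    so the result is differentiable everywhere, including at x = 0."""
    return sigmoid(x) * x + (1.0 - sigmoid(x)) * (-x)

# Away from zero the blend is nearly exact; near zero it dips slightly
# below abs(x), which is the price paid for smoothness.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={x:+.1f}  abs={abs(x):.3f}  smooth_abs={smooth_abs(x):.3f}")
```

Note that smooth_abs is symmetric (smooth_abs(−x) = smooth_abs(x), since sigmoid(−t) = 1 − sigmoid(t)) and converges to abs(x) as |x| grows, so substituting it into a constraint such as abs(x) > 15 changes the satisfaction function only negligibly in the region that matters.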



Saeed Amiri-Chimeh received his B.S.
degree in computer science from the Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran, in 2014. He received his M.Sc. degree in software engineering from the Faculty of Computer Science and Engineering at Shahid Beheshti University in 2017. He is currently a Ph.D. student in the same faculty. His research interests are in the areas of procedural content generation, artificial intelligence, and software testing.

Hassan Haghighi is an associate professor at the Faculty of Computer Science and Engineering, Shahid Beheshti University, Iran. He received his Ph.D. degree in software engineering from Sharif University of Technology, Iran, in 2009. His main research interests include using formal methods in the software development life cycle, and he has more than 50 papers in this area.