Information Processing Letters 28 (1988) 157-163 North-Holland
4 July 1988
A STRUCTURAL TEST SELECTION CRITERION *

Hasan URAL and Bo YANG
Department of Computer Science, University of Ottawa, Ottawa, Ontario, Canada K1N 9B4

Communicated by David Gries
Received 30 October 1987
Revised 18 January 1988
Keywords: Program testing, test selection criterion, well-formed program, program flowgraph, output-input path, output-free path, redundant use

* This work was partly supported by the National Science and Engineering Research Council of Canada under Grant No. A0976.

0020-0190/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland). Volume 28, Number 3.

1. Introduction

Software testing aims at revealing the existence of errors in a program by executing it in a controlled environment over a finite set of test cases. Strategies for selecting such test cases generally fall into two categories, relating to either functional or structural testing. In functional testing [3,4,7], the specification of a program is used to select test cases, whereas in structural testing [6-11] the flowgraph of a program is used in selecting tests.

Each structural testing strategy defines a test selection criterion in terms of control or data flow information contained in a program's flowgraph. The well-known control flow based structural test selection criteria are the statement, branch, and path coverage criteria [7]. For example, the branch coverage criterion requires the selection of test data that result in traversing those paths which cover all edges in the flowgraph at least once. Some recent data flow based structural test selection criteria are Rapps and Weyuker's all du-paths criterion [10,12], Ntafos' required k-tuples criterion [8], and Laski and Korel's (ordered) data context criterion [6]. These three criteria (and their derivatives) focus on tracing the flow of data through the associations in a flowgraph between assignments of values to variables and the uses of these variables in either assigning values to other variables or determining the outcome of conditional branching.

From the point of view of testing embedded functions in a program, structural testing strategies based on data flow are perhaps better suited than those based on control flow, since the former identify data dependencies and hence implicitly require testing of functional segments [4].

It is important to note that Rapps and Weyuker's all du-paths criterion considers individual associations between definitions and uses of variables. On the other hand, Ntafos' required k-tuples criterion considers the chains of associations between definitions and uses of variables that affect each other. Although the intent of the authors of these two criteria was perhaps to trace the effects of input variables on the influenced output variables, neither criterion identifies all input variables that influence a specific output variable. It was Korel who suggested that such identification may be helpful in providing a better understanding of the program and in checking the consistency of the program with its specification [5].

In this paper we present a structural test selection criterion that is based on the analysis of the effects of program inputs on program outputs. This criterion requires that each critical association between each input variable and the output variable that is influenced by this input variable be examined during testing. We then prove that this criterion is 'stronger' than the all du-paths criterion.

In Section 2 we review some related terminology and introduce definitions of some new terms used throughout this paper. Section 3 formally defines our criterion and explores some of its properties. In Section 4 we demonstrate the strength of the new criterion by comparing it with the existing criteria. Section 5 concludes the paper.

2. Terminology

We define some terms used throughout this paper. The concepts related to our criterion are independent of the details of the programming language in which the source code is given.

A program is either a main program or a single subprogram (i.e., a procedure or function) and has a single entry (from which the program is invoked) and a single exit (at which the execution of the program is terminated). A program is represented by a corresponding flowgraph $G(V, E)$, where $V$ is a set of nodes, each representing a statement or a sequence of statements such that if the first one is executed, the rest are executed subsequently, and $E$ is a set of edges which represent the control flow between nodes. In a flowgraph, the node corresponding to the entry of the program is called the entry node. Similarly, the node corresponding to the exit of the program is called the exit node.

A variable occurrence in a program is said to be a definition if it is:
(1) on the left-hand side of an assignment, or
(2) in an input statement from which it obtains a value, or
(3) an output or input/output parameter in a subprogram call.

A variable occurrence in a program is said to be a use if it is not a definition. A use is further classified as a computational-use (c-use) or a predicate-use (p-use).
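The definition/use distinction above can be illustrated mechanically. The following is only a sketch under stated assumptions: it uses Python statements as a stand-in for the source language (the paper's concepts are language independent), and the helper name `defs_and_uses` is hypothetical. It relies on Python's standard `ast` module, in which a name that receives a value carries a `Store` context and any other occurrence does not.

```python
import ast
from typing import Set, Tuple


def defs_and_uses(statement: str) -> Tuple[Set[str], Set[str]]:
    """Return (defined names, used names) for a single Python statement."""
    defs: Set[str] = set()
    uses: Set[str] = set()
    for node in ast.walk(ast.parse(statement)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)   # occurrence receives a value: a definition
            else:
                uses.add(node.id)   # any other occurrence: a use
    return defs, uses


# A hypothetical assignment: L is defined, x and y are used.
print(defs_and_uses("L = (x - 1) * y / 2"))
```

Note that a name may be both defined and used by one statement (for example `x = x + 1`), which is why the two sets are kept separate.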
A c-use directly affects the computation being performed (e.g., on the right-hand side of an assignment statement) or allows one to see the result of some earlier definition (e.g., in the list of variables of an output statement). A p-use directly affects the control flow of the program (e.g., in a predicate portion of a conditional transfer statement) [10].

An input of the program is the definition of a variable that:
(1) occurs in an input statement, or
(2) is defined by an assignment statement whose right-hand side contains only constants, or
(3) is defined by a subprogram call whose parameter list contains the variable as an output parameter (i.e., the variable is not defined before the call, but is assigned by the called subprogram).

An output of the program is either a c-use of a variable in an output statement or an output statement that contains only constants.

A redundant use (which may be a c-use or a p-use) is a use that has no effect on program outputs.

A path is a finite sequence of nodes $(n_1, n_2, \ldots, n_m)$, where $m \ge 2$ and there is an edge $(n_i, n_{i+1})$ for $1 \le i \le m-1$. A loop-free path is a path $(n_1, n_2, \ldots, n_m)$ in which $n_i \ne n_j$ whenever $i \ne j$ for $1 \le i, j \le m$. A complete path is a path whose first node is the entry node and whose last node is the exit node of a flowgraph $G$. A path $(n_1, n_2, \ldots, n_{m-1}, n_m)$ is a def-clear path with respect to a variable $x$ from node $n_1$ to node $n_m$ or to edge $(n_{m-1}, n_m)$ if there are no definitions of $x$ from node $n_2$ to node $n_{m-1}$ (inclusive). An output-free path is a path in which none of its nodes contains an output.

The use of a variable $x$ is affected by the definition of a variable $y$ if either:
(a) $x$ and $y$ are the same variable and the use of $x$ is reached by the definition of $y$ through a def-clear path w.r.t. $y$, or
(b) the definition of a variable $z$ is given in terms of the use of $y$ at a node that is reached from the definition of $y$ through a def-clear path w.r.t. $y$, and the use of $x$ is affected by the definition of $z$.

An input $I$ influences an output $O$ if
(a) $O$ is a c-use of a variable in an output statement and the c-use is affected by $I$, or
(b) $O$ is an output statement containing only constants and the last p-use that leads the control flow to $O$ is affected by $I$, or
(c) input $I_i$ influences $O$, and the last p-use (if any) which leads the control flow to $I_i$ is affected by $I$.

An output-input path is a path $(n_1, n_2, \ldots, n_m)$ where $n_1$ contains an input $I$ and $n_m$ contains an output $O$ that is influenced by $I$. An external p-use is a p-use that is not in any output-input path. An extended output-input path is an output-input path followed by a path that terminates with an edge containing an external p-use. A simple cycle is a path in which all nodes except the first and the last are distinct.

For short, we call an output-input path or an extended output-input path an OI-path. A simple OI-path is an OI-path that contains:
(a) zero or two iterations of each loop that is entered and exited at the same node and in whose body there is a definition of a variable $x$ succeeding a c-use of $x$, and
(b) zero or one iteration of any other loop.

By basic test coverage we mean that every edge of the flowgraph is covered at least once by some test.

A well-formed program is a program containing at least one input statement and one output statement, and in which:
- there is no redundant use,
- every definition of a variable reaches some use of that variable,
- every use of a variable is reached by some definition of that variable,
- all statements are reachable,
- every definition within an output-input path must affect the value of the output of that path or the control flow that leads to that output. Otherwise, the program can be reformed to avoid such a case.

3. All simple OI-paths criterion

Let us make the following assumptions:
(1) the program to be tested is well-formed;
(2) output statements preceding the first input statement of the program are ignored;
(3) if a complete path is an output-free path and, in addition, if the absence of output in the path is not treated as an anomaly, then the first statement in the path is considered to contain an output-input path;
(4) in a flowgraph of the program, there are no edges from a node to itself.

Then, the all simple OI-paths criterion is defined as follows:

For a given flowgraph $G$, select those complete paths that cover each simple OI-path of $G$ at least once.

Properties of this test selection criterion are given by the following theorem.

3.1. Theorem. Every edge of a flowgraph $G$ is covered by some simple OI-path.
Proof. We assume the opposite, i.e., there exists an edge $e$ that cannot be covered by any simple OI-path of $G$. Since the program is well-formed, edge $e$ must be reachable through some path from the entry node. There is an edge $e'$ associated with a predicate $p$, such that whenever $e'$ is traversed, $e$ must be traversed subsequently. Edges $e$ and $e'$ could possibly be the same. Let $x$ be a variable contained in $p$. Also, because the program is well-formed, there exists some definition of $x$ (denoted by $d$) that reaches $e'$ along a def-clear path from $d$ to $e'$, i.e., $d \to e'$. If $d$ is defined in terms of some other variable(s), say $y$, then there must be some input $I$ from which there is a path to $e'$: $I \to d \to e'$, where $I$ affects the use of $y$ and $d$ affects $e'$ (therefore, the truth value of $p$ is affected by $I$); otherwise $d$ itself is the input, so we have $d \to e'$.

There are two cases for the assumption of the proof:
(a) from $e'$ there exists a path reaching an output $O$;
(b) from $e'$ there exists no path reaching any output.

In case (a) we have the path $I \to d \to e' \to O$ or (when $d$ itself is the input) $d \to e' \to O$. Since there is no redundant use in a well-formed program, the above path must be an output-input path. By removing some iterations of each loop within the path, we can reduce the path to a simple output-input path that retains the same coverage of nodes and edges as the output-input path. Thus, $e'$ (and therefore $e$) is covered by the simple output-input path; a contradiction.

In case (b) we have the path $I \to d \to e' \to \mathit{exit}$ or (when $d$ itself is the input) $d \to e' \to \mathit{exit}$, where every path from $e'$ to the exit of the program is an output-free path. If there exists a path from the program entry to $I$ (or $d$) containing an output-input path as a portion, then $e'$ (and therefore $e$) is in an extended output-input path. Otherwise, by assumption (3), every path that covers $e'$ (and therefore $e$) is an extended output-input path. In either case, $e'$ (and therefore $e$) is covered by at least one extended output-input path. As in the proof of case (a), the extended output-input path can be reduced to a simple extended output-input path which also covers $e'$ (and therefore $e$). This contradicts our assumption. □

3.2. Corollary. Every use in a flowgraph $G$ is covered by some simple OI-path.

Proof. Since any use is either a p-use (associated with some edge) or a c-use (associated with some node), and from Theorem 3.1 all edges and therefore all nodes are covered by some simple output-input path or some simple extended output-input path, the claim of the corollary holds. □
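The path notions of Section 2 and the edge coverage that Theorem 3.1 refers to can be checked mechanically on small flowgraphs. The sketch below is illustrative only: the encoding of a flowgraph as a dictionary of successor lists, the per-node definition sets, the diamond-shaped example graph and all function names are assumptions made here, not part of the paper. It does not compute simple OI-paths; it only tests whether a given set of paths covers every edge.

```python
from typing import Dict, List, Sequence, Set

Flowgraph = Dict[str, List[str]]   # node -> list of successor nodes
Defs = Dict[str, Set[str]]         # node -> variables defined at that node


def is_path(g: Flowgraph, nodes: Sequence[str]) -> bool:
    """At least two nodes, with an edge between each consecutive pair."""
    return len(nodes) >= 2 and all(
        b in g.get(a, []) for a, b in zip(nodes, nodes[1:])
    )


def is_loop_free(nodes: Sequence[str]) -> bool:
    """No node occurs twice."""
    return len(set(nodes)) == len(nodes)


def is_def_clear(g: Flowgraph, defs: Defs, nodes: Sequence[str], var: str) -> bool:
    """No definition of var at n_2 .. n_{m-1}."""
    return is_path(g, nodes) and all(
        var not in defs.get(n, set()) for n in nodes[1:-1]
    )


def covers_all_edges(g: Flowgraph, paths: Sequence[Sequence[str]]) -> bool:
    """True if every edge of g lies on at least one of the given paths."""
    edges = {(a, b) for a, succs in g.items() for b in succs}
    covered = {(a, b) for p in paths for a, b in zip(p, p[1:])}
    return edges <= covered


# Hypothetical diamond-shaped flowgraph; x is defined at n1 and redefined at n3.
G: Flowgraph = {
    "entry": ["n1"], "n1": ["n2", "n3"], "n2": ["n4"],
    "n3": ["n4"], "n4": ["exit"], "exit": [],
}
DEFS: Defs = {"n1": {"x"}, "n3": {"x"}}
P_LEFT = ["entry", "n1", "n2", "n4", "exit"]
P_RIGHT = ["entry", "n1", "n3", "n4", "exit"]

print(is_def_clear(G, DEFS, ["n1", "n2", "n4"], "x"))   # x is not redefined at n2
print(covers_all_edges(G, [P_LEFT, P_RIGHT]))
```

In this toy graph both complete paths are needed to cover every edge; either one alone leaves one branch of the diamond uncovered.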
3.3. Corollary. One can always find a set of simple OI-paths that form a basic test coverage of the program.

Proof. The proof is immediate from Theorem 3.1. □

4. Comparison of the criterion with other criteria

Comparisons between a pair of structural test selection criteria are based on the following relation [1,2,10,12]:

Test selection criterion $A$ includes test selection criterion $B$ if, for any given flowgraph, any set of complete paths that satisfies $A$ also satisfies $B$. $A$ includes $B$ is represented by $A \Rightarrow B$.

Test selection criterion $A$ strictly includes test selection criterion $B$ if, for any given flowgraph, $A$ includes $B$ but $B$ does not include $A$. $A$ strictly includes $B$ is represented by $A \to B$.

Test selection criteria $A$ and $B$ are incomparable if neither $A \Rightarrow B$ nor $B \Rightarrow A$.

In this paper we are only interested in the nontrivial case where $A \to B$.

Comparisons of structural test selection criteria performed in [1,2,10,11,12] revealed that although test cases that satisfy one criterion are likely to satisfy other criteria, the three data flow based test selection criteria mentioned in Section 1 are incomparable. In this section we prove that our criterion strictly includes all du-paths. It is obvious that the all paths criterion strictly includes the all simple OI-paths criterion, since the latter only requires simple paths to be traversed.

4.1. Lemma. If the use of a variable $y$ (denoted by $u_y$) of a du-path¹ w.r.t. $y$ is covered by a simple output-input path, $I \to O$ say, then the du-path is covered by some simple output-input path.

¹ A path $(n_1, n_2, \ldots, n_{m-1}, n_m)$ is a du-path w.r.t. a variable $y$ if $n_1$ has a definition of $y$ and either: (1) $n_m$ has a c-use of $y$ and $(n_1, n_2, \ldots, n_m)$ is a def-clear loop-free path w.r.t. $y$ or a def-clear simple path w.r.t. $y$, or (2) $(n_{m-1}, n_m)$ has a p-use of $y$ and $(n_1, n_2, \ldots, n_{m-1})$ is a def-clear loop-free path w.r.t. $y$ [10].

Proof. From the given condition $I \to u_y \to O$, we can divide the proof into three parts according to the location of the definition of variable $y$ (denoted $d_y$) that reaches $u_y$ through the du-path (denoted by $d_y \to u_y$).

1: In case $I \to d_y \to u_y \to O$, the proof is immediate: the du-path is covered by the same simple output-input path.

2: In case [displayed formula missing] we apply the following arguments:
(a) If $d_y$ itself is an input, $d_y \to u_y \to O$ can be reduced to a simple path by removing some iterations of each loop (if any) within path $u_y \to O$. Since there is no redundant use in a well-formed program, $u_y$, and therefore $d_y$, must influence $O$, i.e., the new path from $d_y$ to $O$ is a simple output-input path.
(b) If $d_y$ is not an input, then it must be defined in terms of some other variable, $x$ say, and we have the following path: $d_x \to (u_x, d_y) \to u_y \to O$, where $(u_x, d_y)$ represents the node in which $x$ is used to define $y$. Note that $d_x \to u_x$ is a du-path, and therefore the above path still falls under the same case (i.e., case 2), for which we reapply the above arguments.
Since the path extended in (b) is a simple path and the program is well-formed, the above arguments succeed at (a) after finitely many iterations. This completes the proof for case 2.

3: In case $d_y \to I \to u_y \to O$, the proof can be done by arguments similar to those in the proof for case 2. □
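The du-path definition used in Lemma 4.1 (footnote 1) can be turned into a small membership test. The sketch below is one possible reading and not the authors' implementation: the dictionary encodings, the example graph and the function names are hypothetical, a "simple path" is taken to mean a path whose nodes are distinct except that the first and last may coincide, and in both cases the code requires that no node strictly between the first and last node of the path defines the variable.

```python
from typing import Dict, List, Sequence, Set, Tuple

Succ = Dict[str, List[str]]                  # node -> successor nodes
NodeVars = Dict[str, Set[str]]               # node -> variables (defs or c-uses)
EdgeVars = Dict[Tuple[str, str], Set[str]]   # edge -> variables with a p-use


def is_path(succ: Succ, nodes: Sequence[str]) -> bool:
    return len(nodes) >= 2 and all(
        b in succ.get(a, []) for a, b in zip(nodes, nodes[1:])
    )


def is_loop_free(nodes: Sequence[str]) -> bool:
    return len(set(nodes)) == len(nodes)


def is_simple(nodes: Sequence[str]) -> bool:
    # All nodes distinct, except that the first and last may be the same node.
    return is_loop_free(nodes[:-1]) and is_loop_free(nodes[1:])


def is_du_path(succ: Succ, defs: NodeVars, c_uses: NodeVars,
               p_uses: EdgeVars, nodes: Sequence[str], var: str) -> bool:
    """Check the du-path conditions for `var` along `nodes`."""
    if not is_path(succ, nodes) or var not in defs.get(nodes[0], set()):
        return False
    if any(var in defs.get(n, set()) for n in nodes[1:-1]):
        return False                          # not def-clear w.r.t. var
    last_edge = (nodes[-2], nodes[-1])
    c_case = var in c_uses.get(nodes[-1], set()) and is_simple(nodes)
    p_case = var in p_uses.get(last_edge, set()) and is_loop_free(nodes[:-1])
    return c_case or p_case


# Hypothetical example: y is defined at a and c, c-used at d, p-used on b's branches.
SUCC: Succ = {"a": ["b"], "b": ["c", "d"], "c": ["d"], "d": []}
DEFS: NodeVars = {"a": {"y"}, "c": {"y"}}
C_USES: NodeVars = {"d": {"y"}}
P_USES: EdgeVars = {("b", "c"): {"y"}, ("b", "d"): {"y"}}

print(is_du_path(SUCC, DEFS, C_USES, P_USES, ["a", "b", "d"], "y"))
print(is_du_path(SUCC, DEFS, C_USES, P_USES, ["a", "b", "c", "d"], "y"))
```

The second call fails the def-clear condition because node c redefines y before the c-use at d is reached.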
4.2. Lemma. If the use of a variable $y$ of a du-path w.r.t. $y$ is covered by a simple extended output-input path, then the du-path must be covered by some simple extended output-input path.

Proof. From the given condition $I \to O \to u_y \to \mathit{exit}$ we can divide the proof into two parts according to the location of $d_y$, which reaches $u_y$ through the du-path (denoted $d_y \to u_y$).

1: In case $I \to O \to d_y \to u_y \to \mathit{exit}$, the proof is immediate: the du-path is covered by the same simple extended output-input path.

2: In case $d_y \to O \to u_y \to \mathit{exit}$, since the program is well-formed, $d_y$ must influence $O$ in some way. Thus, there is a use of $y$, let us call it $u'_y$, which lies somewhere along the path $d_y \to O$, i.e., $d_y \to u'_y \to O$. By definition, $d_y \to u'_y$ is a du-path w.r.t. $y$. From Lemma 4.1 we know that this du-path is covered by some simple output-input path to $O$, $I \to d_y \to u'_y \to O$, so the du-path $d_y \to O \to u_y$ is covered by an extension of this simple output-input path (i.e., a simple extended output-input path), and the proof of the lemma is completed. □

4.3. Theorem. All simple OI-paths $\to$ all du-paths.

Proof. All simple OI-paths $\Rightarrow$ all du-paths can immediately be proven using Corollary 3.2 and Lemmas 4.1 and 4.2. The strictness of the inclusion can be proven by considering the program given below, and its flowgraph shown in Fig. 1.

    n_entry   begin
    n_1         input(x);
    n_2         if even(x) then
    n_3           L := x/2
                else
    n_4           input(y);
                  L := (x-1) * y/2;
    n_5         z := f(L);
    n_6         if not integer(z) then
    n_7           z := round(z);
    n_8         output('z = ', z);
    n_exit    end.

Fig. 1. [Flowgraph of the program above, from n_entry to n_exit; figure not reproduced. Recoverable annotations: use(x), def(L); use(x), def(y), use(x), use(y), def(L); use(z), def(z).]
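To experiment with the example program, it can be transcribed into Python together with an encoding of its flowgraph. Everything specific in this sketch is an assumption: the successor relation is read off the program listing rather than taken from Fig. 1, the inputs are passed as parameters instead of being read, the function `f` is not specified in the paper and is therefore supplied by the caller (`quarter` is a made-up stand-in), and Python's `round` rounds halves to the nearest even integer, which may differ from the `round` intended in the original.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed successor relation, reconstructed from the program listing.
FLOWGRAPH: Dict[str, List[str]] = {
    "n_entry": ["n1"], "n1": ["n2"], "n2": ["n3", "n4"],
    "n3": ["n5"], "n4": ["n5"], "n5": ["n6"],
    "n6": ["n7", "n8"], "n7": ["n8"], "n8": ["n_exit"], "n_exit": [],
}


def complete_paths(g: Dict[str, List[str]], node: str = "n_entry") -> List[List[str]]:
    """Enumerate entry-to-exit paths of an acyclic flowgraph by depth-first search."""
    if not g[node]:
        return [[node]]
    return [[node] + rest for nxt in g[node] for rest in complete_paths(g, nxt)]


def run(x: int, y: Optional[float],
        f: Callable[[float], float]) -> Tuple[List[str], float]:
    """Execute the example program; return the nodes visited and the output z."""
    trace = ["n_entry", "n1", "n2"]
    if x % 2 == 0:
        trace.append("n3")
        L = x / 2
    else:
        trace.append("n4")          # y is only needed on this branch
        L = (x - 1) * y / 2
    trace.append("n5")
    z = f(L)
    trace.append("n6")
    if not float(z).is_integer():
        trace.append("n7")
        z = round(z)
    trace += ["n8", "n_exit"]
    return trace, z


def quarter(v: float) -> float:
    """Hypothetical stand-in for the unspecified function f."""
    return v / 4


for x, y in [(8, None), (6, None), (3, 4.0), (3, 3.0)]:
    print(x, y, run(x, y, quarter))
```

With this particular choice of `f`, the four sample inputs drive execution along four different complete paths of the assumed flowgraph.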
Let
We have presented a new structural test selection criterion that is based on the identification of all input variables that influence each output variable in a source program. This criterion requires that each critical association between each input variable and the output variable that is influenced by this input variable be examined at least once during testing. The complexity of the proposed criterion is the same as the complexity of the all du-paths criterion [12], but it strictly includes all du-paths. That is, any set of test cases satisfying all simple OI-paths also satisfies all du-paths, but not vice versa. In addition, more insight can be gained into the consistency of a program with its specification if the program is tested over a set of test cases satisfying all simple OI-paths.

We are currently studying a family of structural test selection criteria that are derivatives of the criterion presented in this paper. Also, we are in the process of proving that some of these criteria strictly include both the required k-tuples and the (ordered) context coverage criteria.

It should be pointed out that, like other structural test selection criteria, our criterion is also hampered by its reliance on the syntactic selection of paths to be traversed during testing. It is well known that not all paths in a program are feasible (executable) and that the identification of feasible or infeasible paths is an undecidable problem. That is, the syntactic information in a flowgraph, representing the source code of a program, is not sufficient to determine whether a particular path is feasible (or infeasible). Thus, a set of paths selected to satisfy a structural criterion often contains infeasible paths. There seems to be an interplay between the selection of paths to satisfy a certain criterion and the search for input values to enable the program to follow each selected path. A recent study [9] exploits this interplay by using prefixes of previously selected paths to select subsequent paths. This is an interesting approach that needs further study.
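The gap between syntactically selected paths and feasible paths can be seen on a toy example. The sketch below is a hypothetical illustration, not taken from the paper: a two-decision program whose predicates are correlated, so that two of its four syntactic paths can never be executed. Sampling inputs, as done here, cannot establish infeasibility in general (the problem is undecidable); it merely makes the effect visible in a case where the contradiction between the predicates is obvious.

```python
from itertools import product
from typing import Set, Tuple

Path = Tuple[str, str, str, str]


def trace(x: int) -> Path:
    """Record which branch each of two correlated decisions takes."""
    first = "then1" if x > 0 else "else1"
    second = "then2" if x <= 0 else "else2"   # contradicts the first predicate
    return ("p1", first, "p2", second)


# All four branch combinations are paths of the flowgraph...
syntactic: Set[Path] = {
    ("p1", a, "p2", b)
    for a, b in product(("then1", "else1"), ("then2", "else2"))
}
# ...but only some of them are ever observed on a sample of inputs.
observed: Set[Path] = {trace(x) for x in range(-50, 51)}
never_executed = syntactic - observed

print(sorted(never_executed))
```

A purely structural criterion would still ask for the two unobserved paths to be covered, which is the difficulty described above.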
The authors are grateful to C&W. Masapati for his critical reading of the manuscript and to R.L. Probert for his valuable suggestions.
References
[1] L.A. Clarke et al., A comparison of data flow path selection criteria, in: 8th Internat. Conf. on Software Engineering (1985) 244-251.
[2] L.A. Clarke et al., An investigation of data flow path selection criteria, in: Workshop on Software Testing, Banff, Canada (1986) 23-32.
[3] W.E. Howden, A functional approach to program testing and analysis, IEEE Trans. Software Engineering SE-12 (10) (1986) 997-1005.
[4] W.E. Howden, Functional Program Testing and Analysis (McGraw-Hill, New York, 1987).
[5] B. Korel, The program dependence graph in static program testing, Inform. Process. Lett. 24 (2) (1987) 103-108.
[6] J.W. Laski and B. Korel, A data flow oriented program testing method, IEEE Trans. Software Engineering SE-9 (3) (1983) 347-354.
[7] E. Miller and W.E. Howden, Software Testing and Validation Techniques, IEEE Tutorial (IEEE Computer Society Press, Los Alamitos, CA, 2nd ed., 1981).
[8] S.C. Ntafos, On required element testing, IEEE Trans. Software Engineering SE-10 (6) (1984) 795-803.
[9] R.E. Prather and J.P. Myers, Jr., The path prefix software testing strategy, IEEE Trans. Software Engineering SE-13 (7) (1987) 761-766.
[10] S. Rapps and E.J. Weyuker, Selecting software test data using data flow information, IEEE Trans. Software Engineering SE-11 (4) (1985) 367-375.
[11] M.D. Weiser, J.D. Gannon and P.R. McMullin, Comparison of structural test coverage metrics, IEEE Software 2 (2) (1985) 80-85.
[12] E.J. Weyuker, The complexity of data flow criteria for test data selection, Inform. Process. Lett. 19 (2) (1984) 103-109.