Coverage testing software architectural design in SDL

Computer Networks 42 (2003) 359–374
www.elsevier.com/locate/comnet

W. Eric Wong a,*, Tatiana Sugeta a,1, J. Jenny Li b, Jose C. Maldonado c

a Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA
b Network Software Research, Avaya Research Labs, Basking Ridge, NJ 07920, USA
c Department of Computer Science, University of São Paulo at São Carlos, São Carlos, SP, Brazil

* Corresponding author. Tel.: +972-883-6619; fax: +972-883-2349. E-mail address: [email protected] (W.E. Wong). URL: http://www.utdallas.edu/~ewong.
1 Visiting student from the University of São Paulo at São Carlos, Brazil.

Abstract

Statistical data analysis shows that early fault detection can cut cost significantly. With improved technology for automatic code generation from architectural design specifications, it becomes even more important to have, from the beginning, a highly reliable and dependable architectural design. To ensure this, we have to predict the "quality" of the system early in the development process. The use of traces or execution histories as an aid to testing and analysis is well established for programming languages like C and C++, but it is rarely applied in the field of software specification for designs. We propose a solution by applying our source-code-level technology to coverage testing of software designs represented in a high-level specification and description language such as SDL. A sophisticated dominator analysis is applied to provide hints on how to generate efficient test cases to increase, as much as possible with as few tests as possible, the control-flow- and data-flow-based coverage of the SDL specification being tested. We extend source-code-based coverage testing to the software design specification level for specification validation. A coverage analysis tool, CATSDL, with user-friendly interfaces was developed to support our method. An illustration is provided to demonstrate the feasibility of using our method to validate architectural designs efficiently, in terms of higher testing coverage and lower cost.

© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Architectural design; SDL; Control-flow- and data-flow-based coverage testing; CATSDL

1. Introduction

The analysis of statistical data on faults shows that early detection of errors can cut costs significantly. Recent improvements in technology allow

automatic code generation from architectural design specifications. Therefore, having a highly reliable and dependable architectural design from the beginning assumes greater importance. An ideal scenario would be to conduct as much testing on the design as possible before coding based on it starts. The reason is simple: compared with the late stage of the development process, where the final system has already been implemented and integrated, it is much cheaper at the design stage not only to fix bugs, if any, but also to make changes (either major or




minor), if necessary. With this in mind, two important questions have to be answered: how to represent an architectural design so that it can be tested to ensure its quality, and what kind of testing methods and/or criteria should be used to test this design.

In this paper, the ITU-T Specification and Description Language (SDL) is used to represent the architectural design of a software system for the following reasons. SDL satisfies the requirements for an executable architecture description language [17]. It allows dynamic creation and termination of process instances and their corresponding communication paths during execution. As a result, SDL is capable of modeling the architectures of dynamic distributed systems in which the number of components and connectors may vary during system execution. In addition, SDL can represent all four views of software architectures [20]. For example, SDL uses delay and non-delay channels to indicate the relative physical locations of components [5]. Moreover, because SDL specifications are executable, the dynamic execution trace (including the exact execution counts of a given part of an SDL specification, such as a decision, a state input, or a context variable) generated during simulation² can be collected for coverage measurement with respect to both control-flow- and data-flow-based criteria.

An architectural design in SDL-92 can be viewed, for example, as a collection of blocks³ and processes communicating with each other by exchanging signals through channels [10]. The top level is a system-level specification. Each system contains some blocks; each block contains either blocks or processes; blocks communicate through channels; each channel can be either delaying or non-delaying; each process of a block is defined by an extended finite state machine (EFSM); processes communicate through channels. Altogether, an SDL specification provides a process view of a system's architectural design.

² In this paper, "executing an SDL specification" has the same meaning as "simulating an SDL specification." We also use "SDL specifications" and "SDL code" interchangeably. In addition, because our coverage analysis tool CATSDL uses Telelogic Tau as the simulator (Section 4) for generating an execution trace with respect to each test case, the version of SDL used by our tool has to be consistent with that supported by Tau. With this in mind, we used SDL-92 instead of SDL-2000.

³ Here, a block is a structuring element in SDL. It is different from a block defined for the block coverage criterion. Refer to Section 3 for more details.

The textual representation of SDL specifications can be viewed as "programs" in a specification and description language, just like programs in C. As a result, all the testing methods applied to C programs, including random testing and functional testing (both black box oriented) as well as control-flow-based and data-flow-based white box coverage testing, can also be applied to SDL specifications. Many studies [12,15,31] have been conducted to compare the fault detection ability of these testing methods. The results are somewhat mixed because they depend not only on how test cases are generated but also on the programs and the types of faults being studied. White box-based coverage testing methods can detect certain faults that cannot be caught by random testing and functional testing. However, this does not imply that we should ignore the latter. To the contrary, we believe all these testing methods are complementary. For example, one may start with functional testing to construct a test set and then improve this set by following some control-flow-based and/or data-flow-based coverage criteria. This is one of the reasons why we propose to extend source-code-based coverage testing to the software design specification level for specification validation.

Researchers have proposed different techniques to generate test cases from specifications, but the tests so generated are used to test the implemented system that is based on these specifications, rather than the specifications themselves [7,18,19,25,29]. This is different from our coverage testing, which is applied directly to the SDL specifications to validate the architectural designs instead of the final implemented systems. The benefit of doing so is stated at the beginning of this section. Another significant difference between those studies and ours is that the former assume the underlying SDL specification itself is correct. This can be a very big assumption in some cases. What if the specification is not correct? Do we still want to start the


implementation even before we have a good idea of whether the underlying specification behaves as we expect? For some well-known communication protocols, one might argue that the specifications have already been validated and hence there is no need to conduct any additional coverage-based testing on them. While this may be the case for certain SDL specifications, it is certainly not true for others.

Coverage-based testing of SDL specifications is also different from model checking [1,4,8]. The latter searches through all reachable states (like conducting an "exhaustive" test) while verifying whether the model holds certain properties, whereas the former provides a good indicator of progress during testing, along with guidelines on how the existing test set can be improved to increase testers' confidence in the "correctness" of the specifications being tested. Once again, these two techniques are complementary rather than contradictory.

While conducting coverage testing on SDL specifications, we propose that in addition to reporting the current coverage with respect to a set of tests, we should also utilize the coverage information to identify which parts of the specification have not been covered, with respect to a selected coverage criterion, by the existing test cases. A sophisticated dominator analysis [2] is applied to these parts to provide useful hints on how to generate efficient test cases to increase, as much as possible with as few tests as possible, the coverage of the specification being tested.

Although it is essential to report the cumulative coverage obtained by executing a suite of test cases on an SDL specification, it is also important to be able to decompose this overall cumulative coverage with respect to a selected subset of test cases on a selected part of the specification being tested. For example, instead of reporting the block coverage of tests t1, t2, t3, t4, and t5 on an SDL specification with three processes P1, P2, and P3, one might only be interested in the block coverage of tests t1 and t3 on process P1. Such decomposed information can help practitioners conduct smart debugging and gain better comprehension of SDL specifications. Refer to Section 4.3 for more discussion. Finally, a coverage analysis tool, CATSDL, with user-friendly interfaces was developed to support our method.


The rest of the paper is organized as follows. Section 2 presents an overview of related studies. Section 3 describes five different coverage criteria: two control-flow-based and three data-flow-based. A coverage analysis tool for SDL, CATSDL, and the underlying methodology used to implement it are explained in Section 4. Section 5 illustrates how hints provided by CATSDL can help generate tests that increase coverage effectively. Finally, in Section 6 we offer our conclusions and recommendations for future research.

2. Related studies

Model checking, which has been used to analyze specifications of software systems, is a formal verification technique based on state exploration [1,4,8]. It relies on building a finite model of a system and checking that a desired property holds in that model by visiting all reachable states and verifying that related invariants and corresponding temporal logic expressions are satisfied. One of the technical challenges in applying model checking is devising algorithms and data structures for large search spaces, because model checking can suffer from state explosion due to an exhaustive search of the state space. Stated differently, using model checking to verify properties of software is like conducting an "exhaustive" test, which might not always be possible in practice. A number of state-saving techniques have been proposed to counter the complexity of state exploration. For example, Sidorova and Steffen [26] suggested a technique for closing SDL components with a chaotic environment without considering all possible combinations of values in the input queues.

Several studies have applied control-flow- and data-flow-based test selection criteria to system specifications in SDL for generating tests [7,18,19,29]. These tests are then used to test the implemented system based on the specifications, with the assumption that the underlying SDL specification itself is correct. The focus is on the consistency between the specification and the implementation, i.e., on determining whether an implementation of a system has the desired behavior



with respect to the flow of control and the data dependencies expressed in its corresponding specification. This is conformance testing (or protocol conformance testing): testing the control- and data-flow aspects of the implementation against its specification. The coverage testing discussed in this paper is very different, as the emphasis is on testing the architectural design expressed in SDL rather than the implementation. More specifically, it is white box-based specification testing for architectural design.

Other test generation methods have also been explored. For example, Bourhfir et al. [6] presented a tool to generate test cases for a protocol modeled by an SDL system. An EFSM was extracted from the system for each SDL process. Test cases were generated with respect to these EFSMs separately, without considering the possible communications among them. An algorithm similar to the one used for reachability analysis was applied to these tests in order to generate a partial product (a reachability graph with certain marked transitions) for each SDL process (which can be viewed as a communicating EFSM, or CEFSM) in the system context. Finally, test cases for each CEFSM were generated from the corresponding partial product. Once test cases were generated, input/output sequences were extracted and used to test the conformance of the implementation against the specification. Coverage of these test cases was measured using two criteria: unique input output (UIO) and all-def-use. A significant problem with this approach, with respect to the two criteria mentioned above, is that constructing a partial product can be very expensive. For example, as reported by the authors, for a CEFSM with 11 states and 24 transitions, the partial product has 1027 states and 3992 transitions. As a result, no test cases could be generated for this CEFSM. A solution is to use different criteria, such as the all-uses criterion discussed in Section 3. Two other examples are Grabowski et al. [14], who proposed a method for automatic generation of concurrent test cases in the TTCN format based on SDL system specifications and MSC test purposes, and Ek et al. [11], who applied the tool SDT/ITEX for automatic TTCN test generation with constraints and dynamic behavior tables. A comparison among several

different test generation algorithms can be found in [13].

Researchers have also studied how to generate tests from UML specifications. For example, Offutt and Abdurazik [25] use change event enabled transitions to define four levels of testing: the transition coverage level, the full predicate coverage level, the transition-pair coverage level, and the complete sequence level. However, these tests are once again intended to be executed on an implementation of the specification, not on the UML specification itself. In [25], tests generated from UML specifications were executed on a program of 400 lines of C code.

An important point worth noting is that although the papers referenced in the previous three paragraphs are related to the subject discussed here, their focus (either test generation from SDL specifications or the application of such tests to the implemented system based on a given SDL specification) differs from ours in a major way: the focus of our paper is on how to effectively increase the coverage obtained when testing SDL specifications themselves. Refer to Section 4.4 for more discussion.

A different approach to analyzing the quality of an SDL specification is to develop a set of metrics to identify which parts of the specification are more likely to be fault-prone [34]. One significant problem with this approach is that the metrics themselves need to be validated before they can be used. Besides, these metrics only measure static complexity based on the syntax of the SDL specification being analyzed; they do not take any dynamic testing information into account. As a result, it is possible that not all stress points so identified are fault-prone. For example, one part of an SDL specification (say Sa) can have a higher static complexity than another part (say Sb); however, Sa is not necessarily more fault-prone than Sb if Sa has also been tested more thoroughly than Sb. One good solution to this problem is to use coverage data collected during testing of the SDL specification to calibrate the fault-proneness computed using the static complexity metrics. More specifically, the fault-proneness of a certain part of an SDL specification might be decreased depending on how well this part is tested. Testing coverage with respect to different criteria on the various


parts of an SDL specification can be used as a good indicator (e.g., an additional sub-metric of the metrics used in [34]) to decide how much such calibration should be performed.

There are commercial tools, such as the Tau suite from Telelogic [28], for simulating SDL specifications and verifying them against certain properties such as the absence of deadlocks. Tau also provides a facility for test case generation from an SDL specification. However, such simulation and test case generation are black box oriented, with a focus on functionality; Tau does not provide any white box-based testing coverage measurement. In contrast, our coverage-based testing of SDL specifications gives a precise measurement of what has been tested so far and what is still missing with respect to two control-flow-based and three data-flow-based testing criteria. The coverage analysis tool we implemented, CATSDL, works with Tau to provide this measurement. Refer to Section 4 for more details.

3. An overview of control-flow- and data-flow-based coverage criteria

Specifications in SDL can be viewed as "programs" in a specification and description language, just like conventional programs in C, C++, or Java. Thus we can adopt the coverage criteria defined for the latter [16,24], with minor modifications if necessary, and apply them to SDL specifications. We propose to conduct coverage testing on SDL processes with respect to five different criteria: two control-flow-based and three data-flow-based.

To be consistent with the terminology used for testing C programs, we name the two control-flow-based criteria for SDL the block coverage criterion and the decision coverage criterion. Block coverage for SDL specifications is similar to the well-known basic block coverage for C programs, where a basic block, also known as a block, is a sequence of consecutive statements or expressions containing no transfers of control except at the end, so that if one element of it is executed, all are. This of course assumes that the underlying hardware does not fail during the execution of a


block. As for SDL, the "block" in "block coverage" means a subset of an SDL specification that will always be executed together. It has a different meaning from the "block structure" in SDL, which represents either a local specification within a system specification or a remote specification to which the system specification must contain a reference that then becomes a specification of the next abstraction level. Although this might cause some inconvenience, there should be no ambiguity about which "block" is referred to, i.e., whether it means a block in block coverage for testing SDL or a block as a structuring element in SDL, once the surrounding context is considered. Hereafter, "block" refers to the former unless otherwise specified.

For explanatory purposes, we use the alternating bit protocol, a simple form of the "sliding window protocol" with a window size of 1 [27], as the example. It can be used to provide reliable communication over unreliable network channels through a one-bit sequence number (which alternates between 0 and 1) in each message. The protocol consists of a sender and a receiver that exchange messages through two channels, Medium1 and Medium2 (Fig. 1). When the sender sends a message (containing a protocol bit, 0 or 1) to the receiver through Medium1, it sends the message repeatedly (with the corresponding protocol bit) until it receives an acknowledgment from the receiver that contains the same protocol bit as the message being sent.

Fig. 1. An SDL specification for the alternating bit protocol at the system level.



When the receiver receives a message, it sends an acknowledgment to the sender through Medium2 and includes the protocol bit of the message received. The first time the message is received, the protocol delivers the message for processing; subsequent messages with the same bit are simply acknowledged. When the sender receives an acknowledgment containing the same bit as the message it is currently transmitting, it stops transmitting that message, flips the protocol bit, and repeats the protocol for the next message. This implies that the sender associates each message with a protocol bit, alternated between 0 and 1, to differentiate consecutive messages.

Let us focus on the sender process of the alternating bit protocol as shown in Fig. 2. An example of a block is the specification grouped by the brace. This block contains an input "medium_error," an output "dm(m, i)," and a state denoted by "-". It specifies that the sender process receives a "medium_error" message, resends the message with the corresponding protocol bit, and then waits for the acknowledgment from the receiver. The specification in this block will be executed in such a way that if one part of it is executed, all of it is.
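To make the protocol logic concrete, the following is a minimal, runnable Python sketch of the sender and receiver behavior just described. It models the protocol itself rather than the SDL specification of Fig. 2, and all names (Channel, Sender, Receiver) are ours; the lossy medium is reduced to an explicit retransmission call.

```python
from collections import deque

class Channel:
    """A FIFO channel; message loss or corruption is not modeled here."""
    def __init__(self):
        self.queue = deque()
    def send(self, item):
        self.queue.append(item)
    def recv(self):
        return self.queue.popleft()

class Sender:
    def __init__(self, medium1):
        self.bit = 0              # protocol bit, alternates between 0 and 1
        self.current = None       # message currently being transmitted
        self.medium1 = medium1

    def put(self, message):
        """Accept the next message and start transmitting it."""
        self.current = message
        self.medium1.send((message, self.bit))

    def on_medium_error(self):
        """Resend the current message with its protocol bit."""
        self.medium1.send((self.current, self.bit))

    def on_ack(self, j):
        """Flip the bit and stop transmitting only on a matching ack."""
        if j == self.bit:
            self.bit = 1 - self.bit
            self.current = None   # ready for the next message

class Receiver:
    def __init__(self, medium2):
        self.expected = 0         # bit of the next new message
        self.medium2 = medium2
        self.delivered = []

    def on_message(self, message, bit):
        """Acknowledge every copy; deliver only the first one."""
        self.medium2.send(bit)    # ack carries the same protocol bit
        if bit == self.expected:  # first copy of this message
            self.delivered.append(message)
            self.expected = 1 - self.expected

# Duplicate copies are acknowledged but delivered only once:
medium1, medium2 = Channel(), Channel()
sender, receiver = Sender(medium1), Receiver(medium2)
for msg in ["a", "b"]:
    sender.put(msg)
    sender.on_medium_error()               # retransmission of the same copy
    while medium1.queue:
        receiver.on_message(*medium1.recv())
    while medium2.queue:
        sender.on_ack(medium2.recv())
assert receiver.delivered == ["a", "b"]
```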

The "decision" coverage⁴ in SDL considers not only "data-related decisions" but also "event alternatives" [22]. In Fig. 2, the branching of blocks is caused either by state transitions due to different inputs or by decision matching. For example, following state "wait_am" are two branches corresponding to inputs "medium_error" and "am(j)" for two different conditions, namely, the sender process receives a medium_error message or an acknowledgment with a protocol bit "j". The latter is followed by a decision construct, represented by a diamond in SDL, that tests whether "j = i," i.e., whether the protocol bit in the acknowledgment from the receiver is the same as that contained in the message being sent by the sender. This introduces two additional branches: the TRUE branch and the ELSE branch that covers all remaining conditions. Altogether, there are four branches to be tested based on the decision criterion.


Fig. 2. The SDL specification for the sender process.

⁴ In [22], the criterion is named the "branching" coverage criterion because it considers more than just the decision construct represented by a diamond. However, to be consistent with the name (i.e., decision coverage) used for C programs, we also name this criterion the "decision" coverage criterion.
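To make the branch counting concrete, here is a hedged Python sketch of the decision criterion applied to the wait_am state described above; the branch labels are our own encoding, not CATSDL output.

```python
# Branches of the sender's "wait_am" state under the decision criterion:
# two state-input alternatives plus the two outcomes of the "j = i" test.
branches = {
    ("wait_am", "input medium_error"),
    ("wait_am", "input am(j)"),
    ("j = i", "TRUE"),
    ("j = i", "ELSE"),
}

def decision_coverage(executed):
    """Fraction of the decision branches exercised by a trace."""
    return len(branches & executed) / len(branches)

# A test that receives one matching acknowledgment exercises two branches:
executed = {("wait_am", "input am(j)"), ("j = i", "TRUE")}
print(decision_coverage(executed))    # 0.5, i.e., two of the four branches
```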


Three data-flow-based coverage criteria, p-use, c-use, and all-uses, which have been used for testing programs written in languages such as C [16,24], are also used in coverage testing SDL specifications. Below we give a brief description of these criteria; a more complete definition can be found in [29]. Let bi and bj be two points in an SDL specification at which variable x is defined and used, respectively. This definition and use are referred to as di(x) and uj(x), respectively. The pair (di(x), uj(x)) is a def-use pair. A def-use pair is a p-use or a c-use pair depending on whether bj is a predicate or not, respectively. Any such pair is feasible if there exists a test case t such that the execution of the SDL specification on t causes control to transfer from bi to bj without defining x at any point other than bi. Such a path is also called a definition-clear path with respect to x. We refer to the set of all p- and c-uses as all-uses.

In an SDL specification, the data flow in a process represented by an EFSM can be traced through the relationships among input signals, context variables, and output signals. Refer to the sender process in Fig. 2: an example of a c-use with respect to m consists of its definition at "put(m)", which specifies that the sender process receives a message m, and the corresponding use at "d(m, i)", which specifies that the sender sends m (along with a protocol bit i) via a channel to the receiver. Similarly, an example of a p-use with respect to j has its definition at "am(j)", which specifies that the sender receives a protocol bit j (part of the acknowledgment) from the receiver, and its use at "j = i" to make sure j is the right protocol bit.

A test set T may be evaluated against the all-uses (p-use or c-use) criterion by computing the ratio of the number of all-uses (p-uses or c-uses) covered to the total number of all-uses (p-uses or c-uses). In theory, the c-use, p-use, and all-uses criteria are based on "feasible" def-use pairs [9,24]. However, feasibility is an undecidable problem. Although we can apply some simple analysis techniques to eliminate certain infeasible def-use pairs from consideration, in practice it is impossible to manually identify and eliminate all infeasible def-use pairs while computing coverage with respect to the data-flow-based criteria. Thus, in our coverage-based testing some of the def-use pairs may be infeasible. As a result, the coverage so computed may differ slightly from the coverage that would be obtained if no infeasible def-use pairs were included. Nevertheless, because the syntax of an SDL specification is much simpler than that of a C program, the number of infeasible def-use pairs, if any, in an SDL specification is much smaller than in a C program. This implies that the impact of infeasible def-use pairs on SDL coverage is much less than on C programs. The same applies to the block and decision criteria.
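These data-flow measures reduce to bookkeeping over def-use pairs. The Python sketch below illustrates the idea under an invented trace format (it is not CATSDL's implementation): a pair counts as covered when the trace reaches the use from the definition along a definition-clear path.

```python
# Def-use pairs from the sender process, written as (definition, use, var):
c_use_pairs = {("put(m)", "d(m, i)", "m")}      # use in a computation
p_use_pairs = {("am(j)", "j = i", "j")}         # use in a predicate
all_uses = c_use_pairs | p_use_pairs

def pairs_covered(pairs, trace):
    """Pairs (d, u, x) joined in `trace` by a definition-clear path.

    `trace` is the executed sequence of (point, variables_defined_here).
    """
    covered = set()
    for d, u, x in pairs:
        live = False                       # is the definition at d live?
        for point, defined in trace:
            if point == d:
                live = True                # x defined (or redefined) at d
            elif live and point == u:
                covered.add((d, u, x))     # u reached with d's value of x
                break
            elif x in defined:
                live = False               # x redefined: path not clear
    return covered

# One execution of the sender covers both pairs, giving all-uses = 1.0:
trace = [("put(m)", {"m"}), ("d(m, i)", set()),
         ("am(j)", {"j"}), ("j = i", set())]
print(len(pairs_covered(all_uses, trace)) / len(all_uses))   # 1.0
```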

4. CATSDL: a coverage analysis tool for SDL specifications

A fundamental problem with applying the coverage criteria discussed in the previous section to testing SDL specifications is that we need a coverage analysis tool to support the process. It is impossible to manually collect and analyze coverage data, even for a small SDL


specification, because doing so is very time consuming and mistake prone. With this in mind, we implemented CATSDL, a coverage analysis tool⁵ that aids in testing software architectural design written in SDL. Using CATSDL involves three main activities: instrumenting an SDL specification, collecting the dynamic execution trace during SDL simulation, and determining how well the SDL specification has been tested.

Given an SDL specification, CATSDL parses it to generate a parse tree, which is used to determine which parts of the specification should be grouped together as a block with respect to the block coverage. A control-flow graph is generated for each SDL process. Instrumentation is also done on each of these processes by inserting a probe (a user-defined function) in each block. The resulting instrumented specification is then exported for simulation to an SDL simulator (Telelogic Tau [28] in our case). A file is created to record the trace information during the simulation. As subsequent simulation continues, the trace information is appended to the trace file, which is then exported back to CATSDL for coverage analysis. CATSDL measures how thoroughly an SDL specification has been tested by a set of tests and identifies which parts of the specification are not well tested.

CATSDL can be used to measure the adequacy of a test set with respect to five different coverage criteria (block, decision, c-use, p-use, and all-uses) and to identify areas in an SDL specification that require further testing. These areas can be precisely displayed by CATSDL in different colors. For a given area, its color is determined by its weight, which is computed based on (at least) how many areas will be covered if this area is covered. This information can be very useful in helping testers develop new tests that increase the current level of coverage effectively. Refer to Section 4.1 for more explanation.


⁵ A primitive prototype, vSuds-SDL [21], with a limited set of features was once implemented by the vSuds team at Telcordia Technologies (formerly Bellcore).
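The instrument-simulate-analyze cycle can be pictured with a short Python sketch. The probe signature, trace record format, and file handling below are our assumptions for illustration; in CATSDL the probe is a user-defined SDL function and the trace format is internal to the tool.

```python
TRACE_FILE = "sdl_trace.log"     # illustrative trace file name

def probe(process_name, block_id):
    """What a probe inserted into a block would do during simulation:
    append one record to the trace file each time the block is entered."""
    with open(TRACE_FILE, "a") as f:
        f.write(f"{process_name} {block_id}\n")

def block_coverage(trace_file, blocks_per_process):
    """Cumulative block coverage over all simulations recorded so far."""
    hit = set()
    with open(trace_file) as f:
        for line in f:
            name, block_id = line.split()
            hit.add((name, int(block_id)))
    total = sum(len(blocks) for blocks in blocks_per_process.values())
    return len(hit) / total

# Say the sender process has 9 blocks and one test hits three of them:
for block in (1, 2, 6):
    probe("Sender", block)
print(block_coverage(TRACE_FILE, {"Sender": range(1, 10)}))   # 0.333...
```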



CATSDL can generate coverage reports that reveal the current coverage measures for each criterion, indicating how adequate the existing test set is and providing a high-level view of progress during testing. Such reports can be tailored according to a customer's needs, using only the information collected from selected tests on certain parts of the specification. Refer to Section 4.3 for how such customized information can help programmers in better debugging, program comprehension, etc. Below, we explain in detail some of the important features of CATSDL.

4.1. Improving coverage through a dominator analysis

Studies have shown that it is important to generate a test set with high coverage while testing C programs so that more faults are likely to be detected [23,31,32]. The same applies to testing SDL specifications. Without loss of generality, we use the pseudo control-flow graph in Fig. 3 to illustrate how different areas of an uncovered SDL specification (blocks in this illustration) can be assigned different weights computed by a dominator analysis [2]. The objective is to generate a test that covers one of the areas with the highest weight, if possible, before areas with lower weights are covered, so that the maximum coverage (block coverage in this illustration) can be added in each single execution. In this way, we provide useful hints on how to generate efficient test cases to increase, as much as possible with as few tests as possible, the control-flow-based (such

Fig. 3. A sample control-flow graph. Each node represents a block. The loops are given implicitly by reusing the name of a block. For example, the transition from block 6 back to 1 indicates a loop.

as block and decision) and data-flow-based coverage (such as c-use, p-use, and all-uses) of the specification being tested.

A block A dominates a block B if covering B implies the coverage of A; that is, a test execution⁶ cannot reach block B without going through block A. The weight of a given block is the number of blocks that have not been covered but will be if that block is covered. For example, to arrive at block 18 in Fig. 3 requires the execution also to go through blocks 1, 2, 4, 7, 12, and 13. This implies block 18 is dominated by blocks 1, 2, 4, 7, 12, and 13. It also means blocks 1, 2, 4, 7, 12, and 13 will be covered (if they have not been) by a test execution that covers block 18. Similarly, arriving at block 6 in Fig. 3 requires only going through blocks 1 and 2; i.e., block 6 is dominated only by blocks 1 and 2. Consequently, blocks 1 and 2 will be covered (if they have not been) by a test execution that covers block 6. Assuming none of the blocks is covered so far, we say that block 18 has a weight of at least 7 because covering it will increase the coverage by at least seven blocks, whereas block 6 has a weight of only 3 because its coverage only guarantees an increase of at least three blocks. An important note is that while assigning the weight to a given block, we use a conservative approach by counting only the blocks that will definitely be covered as a consequence of covering this given block. This is why we use the phrase "at least" in the previous statements. Of course, covering a block of weight 7 (like block 18 in our example) may end up covering more than seven blocks (e.g., the execution can go through blocks 1, 2, 6, 1, 2, 4, 7, 12, 13 before it reaches block 18). Nevertheless, the additional blocks (block 6 in our example) are not guaranteed to be covered. Because block 18 has a higher weight than block 6, block 18 is given a higher priority to be covered than block 6 (tests that cover block 18 have a higher priority to be executed than tests that only cover block 6). Applying this priority, the maximum block coverage can be added in each single execution. In this way, the block coverage can be increased as much as possible with as few tests as possible.

⁶ Assuming that the underlying hardware does not fail during the execution of a test.
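The weight computation itself is straightforward once dominators are known: covering a block guarantees covering that block plus every block that dominates it, so a block's weight is the number of those blocks still uncovered. Below is a minimal Python sketch assuming the standard iterative dominator algorithm; the small example graph is ours, not the graph of Fig. 3.

```python
def dominators(cfg, entry):
    """Iterative dominator computation: dom(n) = {n} ∪ ⋂ dom(pred(n))."""
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def weights(dom, covered):
    """Uncovered blocks guaranteed to be covered by covering each block."""
    return {n: len(dom[n] - covered) for n in dom}

# Entry 1 branches to 2 and 3, which rejoin at 4 and reach 5:
cfg = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
dom = dominators(cfg, entry=1)
print(weights(dom, covered=set()))
# {1: 1, 2: 2, 3: 2, 4: 2, 5: 3} -- a test reaching block 5 pays off most
print(weights(dom, covered={1, 4, 5}))      # recomputed after that test
# {1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
```

Recomputing the weights against an updated covered set, as the last call does, corresponds to the recursive reweighting described next.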


The execution of certain tests may change the weights of blocks that are not covered by those tests. For example, after the execution of a test (say ta) that covers block 18 (which also guarantees the coverage of blocks 1, 2, 4, 7, 12, and 13), the execution of another test (say tb) to cover block 6 will increase the coverage by only one block (namely, block 6 itself). This is because blocks 1 and 2 have already been covered by test ta, and the execution of other tests (such as tb) will not change this fact. Under this scenario, block 6 will have the same weight as block 3, even though the former had a higher weight than the latter before test ta was executed; this implies that tests that cover block 6 now have the same priority as those that cover block 3 (assuming these tests are designed to cover block 6 or block 3). It also implies that after a test (ta in our case) is executed to cover block 18, the priority, in terms of increasing coverage, of executing another test (tb in our case) to cover block 6 is reduced. This procedure is applied recursively after the execution of each test to recompute the weight of each block that has not been covered by the current tests, as well as the priority of the tests to be executed to cover such blocks. The objective is to continue this recursive procedure until all the blocks, if possible, are covered by at least one test (100% block coverage), or to stop after a predefined time-out has been reached. In the latter case, although the tests executed might not give 100% block coverage, they still provide as high a block coverage as possible.

Fig. 4 shows how the initial weight (before any test is executed) of each block of the sender process in the alternating bit protocol is displayed by CATSDL. The color spectrum at the top of the window indicates the minimum number of blocks that will be covered for each weight. Part (a) displays the process in its textual representation, whereas part (b) is the corresponding control-flow graph. From this figure, blocks 6 and 7 have the highest weight, 7, and are highlighted in red. A tester should try to generate a test that covers either block 6 or block 7 first in order to increase the coverage effectively. Because weights in Fig. 4 are displayed


Fig. 4. The initial display of the sender process with respect to the block criterion: (a) the textual representation and (b) a control-flow graph.

in different colors, it is better viewed in color. This is also the case for other figures in the remainder of the paper. Interested readers can contact the authors to obtain color versions of these figures.

4.2. Visualizing coverage in an SDL specification and its control-flow graph

In addition to deciding what kind of coverage data should be collected and analyzed, it is also important to study how such data should be



Fig. 5. A display of the sender process with respect to the decision criterion: (a) the first layer textual representation; (b) the second layer textual representation; (c) the first layer control-flow graph and (d) the second layer control-flow graph.

presented. For each SDL process, CATSDL provides two views: one of its textual representation (e.g., Fig. 4(a)) and the other of its control-flow graph (e.g., Fig. 4(b)). The former displays the SDL source code of a process, whereas the latter makes its flow of control more evident. Note that if the cursor is moved onto a block in a control-flow graph, the SDL code of that block is displayed. Together, these two provide different views of an SDL process and are complementary to each other. Also displayed in each view is the weight information. A part shown in white has already been covered and thus has a zero

weight. Each non-white part has an associated weight (referring to the color spectrum at the top) indicating how much coverage can be improved if it is covered. Fig. 4 gives these two views with respect to the block coverage. Because it is an initial display, no test has been executed yet; that is, no blocks have been covered, and hence none of them is in white.

Fig. 5 displays the sender process with respect to the decision criterion after two tests are executed. A two-layer representation is used. The first layer (Fig. 5(a) and (c)) shows all the decisions. For each decision, its weight is the



maximum weight of its branches. Suppose a decision has two branches: one has a weight of 2 and the other a weight of 5. The weight of this decision is then 5. The two decisions in part (c) are in different colors because they have different weights. A decision has a zero weight if and only if all its branches are covered. Such decisions are highlighted in white. Double-clicking on a decision goes to the second layer (parts (b) and (d)) to view all the branches associated with it. There are two branches in part (d): the one from block 5 to block 6 is covered (and hence in white), and the other, from block 5 to block 7, is not covered yet (and hence in a non-white background; it is in red in this particular example).

Similar to how a decision and its branches are displayed, a two-layer representation is also used to display c-uses and p-uses. For the c-use criterion, the first layer shows all the definitions that have at least one c-use. Double-clicking on a definition goes to the second layer to view all the c-uses associated with this definition. For each definition, its weight is the maximum weight of its c-uses. Suppose a definition has three c-uses (C1, C2, and C3) whose weights are 3, 1, and 5, respectively (i.e., covering C1, C2, and C3 guarantees that at least 3, 1, and 5 c-uses will be covered); the weight of this definition is then 5. A definition has a zero weight if and only if all its c-uses are covered. Such definitions are displayed with a white background. A similar representation applies to the p-use criterion, except that the second layer displays each p-use associated with a given definition, where a p-use in a control-flow graph is represented by the two blocks of an edge: the first contains the predicate and the second is one of the destinations of this predicate. The same also applies to the all-uses criterion, where the weight of a definition is the maximum weight of its p-uses or c-uses.

Fig. 6. A coverage report with respect to each process: (a) block coverage per SDL process and (b) coverage with respect to different criteria.

4.3. Customizing a coverage report

As discussed earlier, it is important to decompose an overall cumulative coverage obtained by executing a set of tests on an SDL specification into coverage with respect to a subset of selected tests on certain parts of the specification. For example, one might be interested in the block coverage with respect to each SDL process, to discover how much code of each process has been tested. The block coverage of each process can also serve as an indicator of how well a process is tested in comparison with others. CATSDL provides its users with an option to report coverage with respect to each process, such as the one in Fig. 6(a).

It is also important to know the coverage measurement with respect to each criterion. This information can help testers decide whether a "saturation effect" has been reached with respect to a given criterion. If so, the coverage should be measured with respect to another, stronger criterion (e.g., moving from the block criterion to the decision criterion, or from the decision criterion to the all-uses criterion). CATSDL can report coverage with respect to five different criteria: block, decision, c-use, p-use, and all-uses. Fig. 6(b) gives such a report.

It is also essential to understand how the execution slice with respect to each test case can be constructed, where an execution slice is the set of blocks (as defined for block coverage), decisions, c-uses, p-uses, or all-uses executed by the corresponding test.
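Because CATSDL reports coverage per test case, an execution slice is essentially that per-test coverage read as a set. The Python sketch below reuses tests t1 and t3 and process P1 from the example in Section 1; the block numbers and the pass/fail roles are invented for illustration.

```python
# Execution slices: the set of entities (blocks here) each test executes.
slices = {
    "t1": {("P1", 1), ("P1", 2), ("P1", 6)},
    "t3": {("P1", 1), ("P1", 2), ("P1", 4), ("P1", 7)},
}

# A simple dicing-style heuristic in the spirit of [3,30,33]: blocks
# executed by a failed test (say t3) but not by a passed one (say t1)
# are a good place to begin looking for a fault.
suspicious = slices["t3"] - slices["t1"]
print(sorted(suspicious))        # [('P1', 4), ('P1', 7)]
```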



Fig. 7. The updated display of the sender process with respect to the block criterion after executing "Test Case 0001," which covers block 6, one of the initial "hot spots": (a) the textual representation and (b) a control-flow graph.

Such slices can be used, following methodologies similar to those developed for C programs [3,30,33], to help practitioners conduct smart debugging and gain better comprehension of SDL specifications. For a given test, its execution slice can easily be constructed if we know the coverage of that test. Such coverage is available from CATSDL; an example appears in Fig. 8. Note also that coverage with respect to each test can be visualized in the SDL specification and in its control-flow graph just as in Fig. 4.

4.4. Developing a testing strategy

To achieve high coverage, we can use test cases generated at two different stages. First, test cases can be generated by using any test generation algorithm (such as those discussed in Section 2) or

Fig. 8. Block coverage per test case.

based on black box-based functional testing or testers' expert knowledge of the SDL specification being tested. These tests can be in either the Tree and Tabular Combined Notation, ISO 9646 (TTCN) format or the Message Sequence Chart (MSC) format. Note that TTCN is a programming language that helps build test cases for any application, as long as one can interact with the application by sending and receiving messages. MSC, on the other hand, is generally used to provide snapshots of the behavior of a system, such as examples of what happens when certain messages are sent between certain instances of a system. TTCN allows such things as variables, control flow, data types, etc., all of which are essentially missing in MSCs, which only address values, not data. Test cases so generated can be exported to an SDL simulator (Telelogic Tau in our case) for SDL simulation. A conversion of these tests to the Machine Processable (MP) interchange format (defined in the TTCN standard ISO 9646-3) may be necessary. These test cases can serve as a good starting set for coverage-based measurement. If the resulting coverage does not meet our goal (e.g., we want 90% block coverage, but the test cases generated at the first stage only give 60%), we can use the hints provided by CATSDL to generate additional test cases to effectively



increase the coverage (from 60% to 90% in our example). Test cases generated at this second stage can be saved in either the TTCN format or the MSC format.

An important point worth noting is that although the weights computed from the coverage point of view provide useful hints to help generate test cases so that the maximum coverage can be added in each single execution, these weights are not used to identify the "core functionality," nor are they used to prioritize functional criticality. If certain parts of an SDL specification are very critical from the functionality point of view, a good strategy is to make sure such parts are tested first (regardless of their weights) before any other parts of the specification are covered. The coverage of these parts can be verified by examining the display (such as Fig. 7) provided by CATSDL. These functionally critical parts can be covered first before applying the weights (which are recomputed after the critical parts have been tested) to the remaining uncovered parts in order to decide which of them should be covered first. Of course, identifying which parts are critical to the core functionality requires some expert knowledge of the specification being tested.

Another important concept is the incremental approach. Studies such as [9,24] have reported that the block coverage criterion is in general easier to satisfy than the decision criterion, which in turn is easier to satisfy than the all-uses criterion. Although this conclusion is based on testing C programs, the same should also apply to testing SDL specifications. Hence, a good incremental testing strategy is to apply the block criterion first, followed by the decision criterion, resources permitting, and finally by the data-flow (c-use, p-use, and all-uses) criteria.

5. An illustration of the use of CATSDL in effective test generation

One of the most important facilities of CATSDL is to provide useful hints, through a dominator analysis, to help generate tests that increase the coverage as much as possible in each single execution. Referring to Fig. 4, two blocks (blocks 6 and 7) are identified as "hot spots" highlighted in red, as they have the highest weight, 7. This implies that a test that covers either one of these blocks will increase the number of covered blocks by at least seven. Since such a weight is computed by a conservative approach, the actual improvement in block coverage can be more than seven blocks. Indeed, this is the case: after executing "Test Case 0001," which covers block 6 (one of the "hot spots"), the block coverage increases to 80% (Fig. 8). The updated display of the sender process with respect to the block criterion is in Fig. 7. In this figure, we can clearly see that many blocks are in white because they are covered by "Test Case 0001." Also, the hot spots, i.e., the remaining uncovered blocks with the highest weights, move to different locations to provide different hints on how to generate the next effective test after "Test Case 0001." The difference between the weights in the initial display and those in the updated display after executing "Test Case 0001" can be clearly identified by comparing Figs. 4 and 7. The highest weight in our case is reduced from 7 to 1. This is consistent with the common understanding in coverage-based testing that it becomes more and more difficult to further increase the coverage after a few tests have been executed.

For the purpose of comparison, suppose that instead of taking advantage of the hints provided by CATSDL to generate a test that covers a hot spot, a tester generates another test ("Test Case 0002" in Fig. 8) for which the highest weight among the blocks covered is only 5. We notice a clear difference between the coverage (block coverage in our case) achieved by executing "Test Case 0001" (80%) and that achieved by "Test Case 0002" (only 56%). A similar difference is also found with respect to the decision, c-use, p-use, and all-uses criteria.

6. Conclusion and future research

We propose white box-based coverage testing of software architectural design in SDL. The coverage is measured directly on SDL specifications instead of on the systems implemented based on these specifications. Five different coverage criteria are used. Two of them are control-flow-based



(block and decision) and three are data-flow-based (c-use, p-use, and all-uses). We emphasize that the focus of coverage testing should not be limited to measuring coverage only. We should also think one step ahead about how coverage data collected during testing can help practitioners in subsequent activities such as debugging and understanding SDL specifications. A coverage analysis tool, CATSDL, was implemented to aid in testing software architectural design written in SDL. In addition to reporting the current coverage with respect to a set of tests on an SDL specification, CATSDL provides the following important features:

• detailed coverage reports with respect to each test case, each coverage criterion, and each SDL process;
• customized reports obtained by decomposing an overall cumulative coverage with respect to selected tests on certain parts of a given SDL specification;
• an easy way to construct an execution slice (simply by converting the coverage data collected during testing into another format) with respect to any test or group of tests, to support various heuristics (similar to those developed for C programs) for smart debugging and better understanding of the underlying SDL specification;
• user-friendly interfaces for visualizing control-flow-based and data-flow-based coverage in an SDL specification as well as in its control-flow graph;
• hints, through a dominator analysis, to guide testers in generating effective tests that increase coverage as much as possible with as few tests as possible.

Finally, for future research an interesting and important avenue of study is to apply CATSDL to several ongoing industry projects to examine how much time testers can save by using the hints provided by our tool to generate tests achieving high coverage, in comparison with an approach that does not use CATSDL. In addition, it is also important to incorporate such hints into a

process for automatic test generation. We have already developed a test generation prototype, TestGen [22], for this purpose. Our next step is to integrate TestGen with technologies that can automatically (or semi-automatically) create SDL specifications for software architectural designs that are represented in different models, or not yet represented by a formal model. The scalability of CATSDL also needs to be studied to provide more information on how it can be applied in a real-life context. Other interesting questions are: can we make a good prediction, based on the experience of testing an SDL specification, about how difficult it would be to test the corresponding implementation? Can such a prediction help us design for testability, at least from the coverage point of view?

Acknowledgements

This work is supported in part by the University of Texas at Dallas and in part by CAPES under grant 1692/01-1. The authors want to thank Telelogic for providing a free education license of the Tau suite. Thanks as well to the vSuds research team at Telcordia Technologies (formerly Bellcore), the technical support group for Tau at Telelogic (in particular Mr. Erik Mats), and Mr. Auri Vincenzi at USP for many useful discussions.

References

[1] P. Ammann, P. Black, W. Majurski, Using model checking to generate tests from specifications, in: Proceedings of the International Conference on Formal Engineering Methods, Brisbane, Australia, December 1998, pp. 46–54.
[2] H. Agrawal, Dominators, super blocks, and program coverage, in: Proceedings of the 21st Symposium on Principles of Programming Languages, Portland, OR, January 1994, pp. 25–34.
[3] H. Agrawal, J.J. Li, W.E. Wong, et al., Mining system tests to aid software maintenance, IEEE Computer 31 (7) (1998) 64–73.
[4] J.M. Atlee, J. Gannon, State-based model checking of event-driven system requirements, IEEE Transactions on Software Engineering 19 (1) (1993) 24–40.

[5] F. Belina, D. Hogrefe, A. Sarma, SDL with Applications from Protocol Specification, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[6] C. Bourhfir, R. Dssouli, E.M. Aboulhamid, N. Rico, A test case generation tool for conformance testing of SDL specifications, in: Proceedings of the 9th SDL Forum, Montreal, Canada, Elsevier, Amsterdam, June 1999, pp. 405–419.
[7] L. Bromstrup, D. Hogrefe, TESDL: experience with generating test cases from SDL specifications, in: Proceedings of the 4th SDL Forum, Elsevier, Amsterdam, 1989, pp. 267–279.
[8] E. Clarke, J. Wing, Formal methods: state-of-the-art and future directions, ACM Computing Surveys 28 (4) (1996) 626–643.
[9] L.A. Clarke, A. Podgurski, D.J. Richardson, S.J. Zeil, A formal evaluation of data flow path selection criteria, IEEE Transactions on Software Engineering 15 (11) (1989) 1318–1332.
[10] J. Ellsberger, D. Hogrefe, A. Sarma, SDL: Formal Object-Oriented Language for Communicating Systems, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[11] A. Ek, J. Grabowski, D. Hogrefe, R. Jerome, B. Koch, M. Schmitt, Towards the industrial use of validation techniques and automatic test generation methods for SDL specifications, in: Proceedings of the 8th SDL Forum, Evry, France, Elsevier, Amsterdam, September 1997.
[12] P.G. Frankl, E.J. Weyuker, A formal analysis of the fault-detecting ability of testing methods, IEEE Transactions on Software Engineering 19 (3) (1993) 202–213.
[13] N. Goga, Comparing TorX, Autolink, TGV and UIO test algorithms, in: Proceedings of the 10th SDL Forum, Copenhagen, Denmark, June 2001, Lecture Notes in Computer Science, vol. 2078, Springer, Berlin, 2001, pp. 379–402.
[14] J. Grabowski, B. Koch, M. Schmitt, D. Hogrefe, SDL and MSC-based test generation for distributed test architectures, in: Proceedings of the 9th SDL Forum, Montreal, Canada, Elsevier, Amsterdam, June 1999, pp. 389–404.
[15] R.G. Hamlet, R. Taylor, Partition testing does not inspire confidence, IEEE Transactions on Software Engineering 16 (12) (1990) 1402–1411.
[16] J.R. Horgan, S.A. London, Data flow coverage and the C language, in: Proceedings of the 4th Symposium on Software Testing, Analysis, and Verification, Victoria, BC, Canada, October 1991, pp. 87–97.
[17] D.C. Luckham, J.J. Kenney, L.M. Augustin, J. Vera, D. Bryan, W. Mann, Specification and analysis of system architecture using Rapide, IEEE Transactions on Software Engineering 21 (4) (1995) 336–354.
[18] G. Luo, A. Das, G. Bochmann, Software test selection based on SDL specification with Save, in: Proceedings of the 5th SDL Forum, Glasgow, Elsevier, Amsterdam, 1991, pp. 313–324.
[19] F. Kristoffersen, L. Verhaard, M. Zeeberg, Test derivation for SDL based on ACTs, in: Proceedings of FORTE'92, Lannion, France, 1992, pp. 373–388.


[20] P.B. Kruchten, The 4+1 view model of architecture, IEEE Software 12 (6) (1995) 42–50.
[21] J.J. Li, J.R. Horgan, A tool for diagnosis and testing software design specifications, in: Proceedings of the International Conference on Dependable Systems and Networks, New York, NY, June 2000.
[22] J.J. Li, W.E. Wong, Automatic test generation from communicating extended finite state machine (CEFSM)-based models, in: Proceedings of the International Symposium on Object-Oriented Real-time Distributed Computing, Washington, DC, May 2002, pp. 181–185.
[23] P. Piwowarski, M. Ohba, J. Caruso, Coverage measurement experience during function test, in: Proceedings of the International Conference on Software Engineering, Baltimore, MD, May 1993, pp. 287–301.
[24] S. Rapps, E.J. Weyuker, Selecting software test data using data flow information, IEEE Transactions on Software Engineering SE-11 (4) (1985) 367–375.
[25] J. Offutt, A. Abdurazik, Generating tests from UML specifications, in: Proceedings of the International Conference on the Unified Modeling Language, Fort Collins, CO, October 1999, pp. 416–429.
[26] N. Sidorova, M. Steffen, Verifying large SDL-specifications using model checking, in: Proceedings of the 10th SDL Forum, Copenhagen, Denmark, June 2001, Lecture Notes in Computer Science, vol. 2078, Springer, Berlin, 2001, pp. 403–420.
[27] A.S. Tanenbaum, Computer Networks, Prentice-Hall, Upper Saddle River, NJ, 1996.
[28] Telelogic Tau 4.3, Telelogic AB, Malmö, Sweden, 2002.
[29] H. Ural, K. Saleh, A. Williams, Test generation based on control and data dependencies within system specifications in SDL, Computer Communications 23 (7) (2000) 609–627.
[30] N. Wilde, M.S. Scully, Software reconnaissance: mapping program features to code, Software Maintenance: Research and Practice 7 (1) (1995) 49–62.
[31] W.E. Wong, J.R. Horgan, S. London, A.P. Mathur, Effect of test set size and block coverage on fault detection effectiveness, in: Proceedings of the International Symposium on Software Reliability Engineering, Monterey, CA, November 1994, pp. 230–238.
[32] W.E. Wong, J.R. Horgan, S. London, A.P. Mathur, Effect of test set minimization on fault detection effectiveness, Software – Practice and Experience 28 (4) (1998) 347–369.
[33] W.E. Wong, S.S. Gokhale, J.R. Horgan, K.S. Trivedi, Locating program features using execution slices, in: Proceedings of the Symposium on Application-Specific Systems and Software Engineering Technology, Richardson, TX, March 1999.
[34] W. Zage, D. Zage, J.M. McGrew, N. Sood, Using design metrics to identify stress points in SDL designs, Technical Report SERC-TR-176-P, Ball State University, Muncie, Indiana, 1998.



W. Eric Wong received his B.S. in Computer Science from Eastern Michigan University, and his M.S. and Ph.D. in Computer Science from Purdue University. In 1997, Dr. Wong received the Quality Assurance Special Achievement Recognition from Johnson Space Center, NASA. Currently, he is an Associate Professor in Computer Science at the University of Texas at Dallas. Before joining UTD, he was with Telcordia Technologies (formerly Bellcore) as a Senior Research Scientist and a project manager in charge of the initiative for Dependable Telecom Software Development. Dr. Wong's research interests include software testing, reliability, metrics, program comprehension, and QoS for program- and architectural-design-based as well as network-communication-based assessment and diagnosis. He served as program chair for Mutation 2000, and as program co-chair and general co-chair for the IEEE International Conference on Computer Communications and Networks in 2002 and 2003, respectively. He has also served as a program committee member of several international conferences. He is a senior member of the IEEE.

Tatiana Sugeta received her B.S. in Computer Science in 1996 from Maringá State University, Brazil, and her M.S. in Computer Science in 1999 from the University of São Paulo, Brazil, where she is currently a Ph.D. student in Computer Science. From March 2002 to February 2003 she was a visiting scholar in Computer Science at the University of Texas at Dallas. Her research interests are in the areas of software testing and formal methods.

J. Jenny Li is currently a researcher at Avaya Labs Research (formerly part of Lucent Bell Labs). Her research focus includes network reliability, multimedia over wireless, and automatic testing. Prior to joining Avaya in 2001, she was a research scientist in the Software Environment Research department at Bellcore (now Telcordia). Dr. Li received her Ph.D. from the University of Waterloo, Canada, in 1996. She worked as a member of the Scientific Staff at the Data Packet Network Division of Bell-Northern Research Ltd. (now Nortel) before attending the University of Waterloo in the fall of 1991.

Jose C. Maldonado received his B.S. in Electrical Engineering/Electronics in 1978 from the University of São Paulo, Brazil, his M.S. in Telecommunications/Digital Systems in 1983 from the National Space Research Institute (INPE), Brazil, and his D.S. degree in Electrical Engineering/Automation and Control in 1991 from the University of Campinas, Brazil. He worked at INPE from 1979 to 1985 as a researcher. In 1985 he joined the Computer Science Department of the Institute of Mathematics and Computer Science at the University of São Paulo (ICMC-USP), where he is currently the Head of the Department. His research interests are in the areas of software engineering environments, software testing and validation, reliability, and software quality evaluation. He is a member of the Brazilian Computer Society.