Event-driven web application testing based on model-based mutation testing


Information and Software Technology 67 (2015) 159–179

Contents lists available at ScienceDirect

Information and Software Technology journal homepage: www.elsevier.com/locate/infsof

Elahe Habibi, Seyed-Hassan Mirian-Hosseinabadi ⇑
Sharif University of Technology, Computer Engineering Department, Tehran, Iran

Article info

Article history:
Received 12 September 2014
Received in revised form 6 July 2015
Accepted 7 July 2015
Available online 13 July 2015

Keywords:
Event-driven software
Web application
Event
Test case generation
Functional graph
Mutation analysis

Abstract

Context: Event-Driven Software (EDS) is a class of software whose behavior is driven by incoming events. Web and desktop applications that respond to user-initiated events on their Graphical User Interface (GUI), or embedded software responding to events and signals received from the equipment in its operating environment are examples of EDS. Testing EDS poses great challenges to software testers. One of these challenges is the need to generate a huge number of possible event sequences that could sufficiently cover the EDS's state space.

Objective: In this paper, we introduce a new six-stage testing procedure for event-driven web applications to overcome EDS testing challenges.

Method: The stages of the testing procedure include dividing the application based on its structure, creating functional graphs for each section, creating mutants from functional graphs, choosing coverage criteria to produce test paths, merging event sequences to make longer ones, and deriving and running test cases. We have analyzed our proposed testing procedure with the help of four metrics consisting of Fault Detection Density (FDD), Fault Detection Effectiveness (FDE), Mutation Score, and Unique Fault.

Results: Using this procedure, we have prepared prioritized test cases and also discovered a list of unique faults by running the suggested test cases on a sample real-world web application called Academic E-mail System.

Conclusion: We propose that our suggested testing procedure has some advantages such as creating functional graphs with requirements document, resolving the problem of removing infeasible test cases with these graphs and conditions on the "add edge" operator before creating mutants. But the suggested testing procedure, like any other method, had some drawbacks. Because most of the stages in the approach were performed manually, the testing time was increased.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Technologies that are currently widely used in software systems from sophisticated GUIs to distributed and network communication frameworks have become ever more complex. Each of these technologies brings along changes not only to the way software is used but also to the way it is designed and developed. For example, the advent of GUIs led to the birth of a new software behavior and design model, i.e. Event-Driven Software (EDS) as opposed to the old procedure-oriented software [1].

⇑ Corresponding author at: Department of Computer Engineering, Sharif University of Technology, Azadi Ave., Room: 807, CE New Building, Tehran, Iran. Tel.: +98 21 66166629 (O).
E-mail addresses: [email protected] (E. Habibi), [email protected] (S.-H. Mirian-Hosseinabadi).
http://dx.doi.org/10.1016/j.infsof.2015.07.003
0950-5849/© 2015 Elsevier B.V. All rights reserved.

A procedure-oriented program asks for a specific input from the user and then creates the desired output. In such applications, the user cannot select any other options and should merely provide input to the program in a predetermined order [1]. In contrast, EDS provides the user with different options for the way data is entered. An "event" is defined as an action taken by a user while using an EDS. The application reacts to the action by changing its state and then waits for the next event. So, in this environment, users are no longer bound to entering data in a predefined order, which gives them much more flexibility and freedom in interacting with the program [2]. However, this flexibility comes at the cost of much more complex software behavior, allowing numerous permutations of events that could potentially cause unpredictable interactions and perhaps software failures [3–5]. This in turn further complicates the way such software can be tested. Various approaches have been suggested for EDS testing.


In this paper, we present a new testing procedure for event-driven web applications. This procedure offers a method to tackle the complexity of EDS testing in six stages. During the initial stage, we break down the application into different sections by applying the structural division method. In the second stage, graphs are created from the functionalities of each section. In the third stage, mutated graphs are derived from the resulting graphs. In the fourth stage, the coverage criteria are selected to produce test paths. In the fifth stage, longer sequences are created by merging event sequences. In the final stage, the derived test cases are run on the target software, a sample e-mail system, to find the faults. In addition, we evaluate our testing procedure to measure its effectiveness and ability in finding faults through four metrics, i.e. Fault Detection Density (FDD), Fault Detection Effectiveness (FDE), Mutation Score and Unique Fault.

2. Related work

In this section, we present a categorized survey of several existing works in EDS testing. These related works are classified according to the different strategies they have chosen for their test case generation methods.

2.1. Generating event-flow graph

An event-flow graph consists of nodes, which represent events, and edges, which show the order of occurrence of the two attached events. On this basis, each event can show behavioral changes in EDS, e.g. changing the color of a button on a page in a web application [6]. Memon [6] has presented a technique based on the event-flow graph. In that paper, the event-flow graph is compared with other graphs, including a Finite State Machine (FSM), Complete Interaction Sequence (CIS) [4] and Genetic Model [7]. The suggested solution is categorized as a semi-automatic method, because it uses the GUIRipper tool for dividing the application and generating the event-flow graph. Creating overlapped sections when dividing the application [4] and detecting only enabled features on the page are among the problems of using the GUIRipper.

Belli et al. [8] have introduced an Event Sequence Graph (ESG) to model the behavior of a GUI. In this model, faults can be detected by some refinement of the graph. Test cases can then be created and run using the GATE tool, which works with the ESG matrix and user input.

Herbold et al. [9] have suggested usage-based testing for EDS. This method has three layers for finding when and which functionality of the application is used by the user. In the first layer users' actions are recorded. These recorded actions are converted to events in the following layer. Finally, in the last layer the usage profile is created from the events. To reduce the number of event sequences, the probability of the next event is annotated on the edges of the usage profile. An inappropriate selection of users often causes problems for this technique.

Herbold [10] has also introduced three new approaches for test case generation based on usage-based testing. In the first approach test paths are selected with high probability. However, the number of valid sequences will grow exponentially. In the second approach a hybrid method, combining the first approach and the random walk technique, is used to reduce the number of event sequences. The third approach, for bigger scales, uses a heuristic greedy strategy to provide more gain in selecting test paths.

In addition, Herbold and Harms [11] have developed the AutoQUEST platform for EDS testing. On this platform many testing techniques, such as usage-based testing, are implemented. One of the main goals of AutoQUEST is to provide

a testing technique independent from platforms. AutoQUEST has four components: a "Core Library" section for implementing the abstract event API, a "Testing Techniques" section which consists of coverage criteria and test case generation, a "Plug-in" section to gather up related classes and a "Front End" section to interact with users.

Tonella et al. [12] have proposed a dynamic analysis for the extraction of a web application model. For this model the HTML code created by the server and the input values created by user actions on the web application are required. In that paper the model created by the ReWeb tool is presented as a Markov chain with probability values on the edges. In addition, the use of the TestWeb tool, which includes Test Generator and Test Executer, makes their approach a semi-automatic method. In the test section the test criterion can be selected by the tester. In this method, as in [9], if the input values are not selected completely by the user, the model does not cover all functionalities of the web application.

Yuan et al. [13,14] have developed a new style based on the "run-time state as feedback" technique. This technique requires an initial test suite to execute an application. During the execution several states of the application are recorded. After gathering up the states, an Event Semantic Interaction (ESI) graph of the target application is created using the recorded items and formal predicates. Finally, test cases are created by edge-pair coverage in the ESI graph. This technique is suitable for testing predictable events [15]. In addition, Yuan et al. [16] have used evolutionary algorithms to have a complete set of states and fewer infeasible test cases.

In the context of formal predicates, Xie et al. [17] have suggested a new abstract model for GUI. The first step of the model is converting the event-flow graph to the Event Interaction Group (EIG). In the second step formal predicates of the Minimal Effective Event Context of an event are produced using a mutation technique; and finally, test cases are extracted from the mutants.

2.2. Generating covering array

In this strategy, CA(N; t, k, v) is defined as a covering array, where "v" stands for the number of events; "t" indicates the interactions between events (e.g. 2-way for the interaction between two events); "k" is the length of the event sequence that comes from the graph; and "N" is the total number of test cases [18].

Yuan et al. [18] and Yilmaz et al. [19] have introduced a method with the purpose of creating long event sequences instead of short ones. First, an event interaction graph is created using the GUIRipper tool. Next, infeasible sequences are detected by grouping similar events. Afterwards, the covering array of event sequences is created; and finally, test cases can be run again to produce more new event sequences. Yuan et al. [20] have also suggested combinatorial testing based on the covering array. This method can reduce the exponential creation of test cases by using the location of events in the sequences, but it cannot produce a perfect test due to the absence of events other than those selected.

In the method proposed by Yuan et al. [18] and Yilmaz et al. [19], stages such as grouping similar events or separating infeasible event sequences require manual work [15]. Huang et al. [21] have used a genetic algorithm [7] to automate these stages. In this algorithm, a function of conditions for detecting infeasible event sequences is defined. However, the modified algorithm depends on this function being properly defined; otherwise it may fail. The method works properly for small programs, but is not suitable for large ones [15].

2.3. Generating mutant graphs

Testing methods based on the syntactic structure can be divided into two categories. In one category valid strings of the program


are examined while in the other category the focus is mainly on creating invalid strings (although valid strings are also permitted). This testing strategy is known as mutation testing. There are four key concepts in mutation testing [22–24]:

• Ground string: A string that belongs to the grammar.
• Mutation operator: A rule that specifies syntactic variations of strings generated from a grammar.
• Mutant: A program that is the result of applying the mutation operators. The number of mutants depends on the variation of the operators.
• Killing a mutant: A mutant is killed if and only if the output of running the mutant is different from the output of the program.

Mutation testing is grouped into four different approaches: (a) program-based,¹ (b) integration,² (c) input-based³ and (d) model-based. In the model-based approach, the syntactic structure of the program is modeled using a state machine. Each mutant can be created by defining graph-based mutation operators, e.g. add or remove edges [22,24].

Beyazıt et al. [25] have compared the state-based graph with the event-flow graph by applying three different operators: addition, removal, and inversion of edges. In that paper, edge coverage has been used as the coverage criterion for producing test cases. The results of their comparison show that the power of finding errors in the event-flow graph is almost the same as in the state-based graph. In addition, the remove operator was found to be a weak operator for detecting errors in both graphs.

Belli et al. [26] have introduced two more operators (adding and removing nodes), in addition to the earlier defined operators. In that paper, edge coverage for the state-based graph and edge-pair coverage for the event-flow graph have been chosen as the coverage criteria. The results show that the state-based graph is (about 40%) more powerful at finding errors than the event-flow graph. Using further operators and different types of coverage criterion can explain the difference between the results produced by this method and the results of the other methods.

Belli et al. [27] have also presented a framework for mutation testing. In this framework, all earlier operators are used and both edge-pair and 3-length paths are defined to produce test paths. In addition, the framework describes how to validate a model made invalid by the actions of the mutation operators. This validation is done through checking the start and the end nodes and removing non-useful nodes.⁴

Mutation testing has also been applied to other models such as a sequence model in object-oriented programs [28], automata in real-time applications [29], or the component model [30]. Operators introduced in these papers are similar to those in previous methods. Based on the type of the domain in which the application under test is meant to be used, the results may have different meanings.

¹ Program-based mutation focuses on the code of the program.
² Integration mutation focuses on mutating the connections between components.
³ Input-based mutation focuses on testing for invalid data.
⁴ Nodes that are not reachable from the start node.

3. Event-driven web application testing procedure

As far as their structure is concerned, web applications have seen the greatest amount of change and improvement in the way they are built. For instance, static web applications have been replaced with dynamic web applications. The dynamism basically comes from the changes that occur to them over time, for instance changes in news content or user interactions (e.g. web games) [31]. A dynamic page designed using Flash⁵ or AJAX⁶ technologies can have unpredictable events. In these pages only one part of the page may be changed after selecting an object [33]. The suggested testing procedure is focused on dynamic web applications.

The fundamental EDS testing problem is detecting infeasible test cases. Some of the mentioned methods have tried to remove infeasible test cases by applying automatic algorithms (e.g. [21]). Therefore, EDS testing needs a technique that avoids producing infeasible test cases. Generating event sequences using behavioral and functional graphs helps to eliminate infeasible test cases, and that is why we set up our approach based on these types of graphs. On the other hand, using one graph to describe the whole application can result in the complex and improper design of test cases. Therefore, an application under test should be divided into smaller groups of components where each object is placed in exactly one group. In other words, each group with its objects must be independent. In our suggested testing procedure, the application is divided into sections based on the structure and functionality of each object.

Our proposed testing procedure consists of the following six stages:

1. Dividing the application based on its structure.
2. Creating a functional graph for each section.
3. Creating mutant graphs from functional graphs.
4. Selecting the coverage criteria and producing test paths.
5. Choosing candidates for merging event sequences.
6. Deriving and running test cases.

These stages are shown in Fig. 1.

Fig. 1. The stages of suggested testing procedure.

3.1. Dividing the application based on its structure

In the first step, four lists of items should be prepared for the application:

1. List of Objects: identifying all the objects and their functionalities with the help of analysis and requirement documents.
2. List of Events⁷: identifying all the possible events with the help of the objects' list and analysis documents such as use case or class diagram.
3. List of Results: "what are the results of running an event?"; specifying the results of each identified event with the help of analysis and design documents.
4. List of affected parts: "which part of the application is affected or updated by the running of a certain event?"; identifying all affected parts for each event.

Depending on the type of technology used in the page, such as Ajax or Flash, web pages can be updated asynchronously by exchanging small amounts of data with the server. This implies that it is possible to update some parts of a web page, without reloading the whole page. On this basis, we can divide the objects of the web application into two groups:

1. Inside-page objects: Objects that make changes to some parts of the page.
2. Outside-page objects: Objects that make changes to the entire page.

When Inside-page objects are selected only certain parts of the page are updated. Depending on the functionality of each object and the updated parts, different groups of objects, called sections, can be defined. There are some guidelines to meet this goal:

1. The objects with the same results of events should be grouped together based on the list of results.
2. With respect to the list of affected parts, the groups should be edited or created.
3. New groups that have similar functionalities on their members can be joined together or be created in different sections.

The web application can be divided into various sections separated by a number or a name. In contrast, when an Outside-page object is selected the entire web page is reloaded. The functional state-based graphs are created based on Inside-page objects in the subsequent stage. However, some of the well-known Inside-page events, such as opening a menu, are stateless. So, Inside-page objects can be divided into state-based Inside-page objects and stateless Inside-page objects. The output of the first stage includes Inside-page and Outside-page events. However, only the state-based Inside-page events are required for the next step.

⁵ Flash is a multimedia and software platform used for creating vector graphics, animation, games and rich Internet applications (RIAs) [32].
⁶ AJAX (Asynchronous JavaScript and XML) is a group of interrelated Web development techniques used on the client-side to create asynchronous Web applications [32].
⁷ Executing an object in a web page is known as an event.

Fig. 2. Functional graph.

3.2. Creating a functional graph for each section

At this stage event-flow graphs, namely functional graphs, are created from the functionalities of each section. These functionalities can be extracted from the requirement documents. Each node in

Table 1
Types of graph coverage criteria [24].

Abbreviation | Name | Description
NC | Node Coverage | Test set T satisfies node coverage on graph G iff for every syntactically reachable node n in N, there is some path p in path(T) such that p visits n. Test Requirement (TR)* contains each reachable node in G
EC | Edge Coverage | TR contains each reachable path of length up to 1, inclusive, in G
EPC | Edge Pair Coverage | TR contains each reachable path of length up to 2, inclusive, in G
CPC | Complete Path Coverage | TR contains all paths in G
SPC | Specified Path Coverage | TR contains a set S of test paths, where S is supplied as a parameter
PPC | Prime Path Coverage | Simple path: A path from node ni to nj is simple if no node appears more than once, except possibly the first and last nodes are the same. Prime path: A simple path that does not appear as a proper subpath of any other simple path. TR contains each prime path in G

Terms:
- Test requirements (TR): Describe properties of test paths
- Test criterion: The rules that define test requirements
- Satisfaction: Given a set TR of test requirements for a criterion C, a set of tests T satisfies C on a graph if and only if for every test requirement tr in TR, there is a test path in path(T) that meets the test requirement tr
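The definitions in Table 1 can be made concrete with a small sketch. The Python code below is only an illustration, not the tool used in the paper: the four-state graph, the state names and the helper names (`paths_up_to`, `simple_paths`, `prime_paths`) are assumptions made for this example. It enumerates the test requirements for EC and EPC (paths of length up to 1 and 2, where length counts edges) and the prime paths required by PPC.

```python
# Hypothetical functional graph: state -> list of successor states.
# Every state appears as a key so that all nodes are enumerated.
GRAPH = {
    "Inbox": ["Trash", "Compose"],
    "Compose": ["Sent"],
    "Trash": ["Inbox"],
    "Sent": [],
}

def paths_up_to(graph, max_edges):
    """Paths with at most max_edges edges (1 -> EC, 2 -> EPC requirements)."""
    result = [(node,) for node in graph]
    frontier = list(result)
    for _ in range(max_edges):
        frontier = [p + (nxt,) for p in frontier for nxt in graph.get(p[-1], [])]
        result.extend(frontier)
    return result

def simple_paths(graph):
    """All simple paths; only the first node may reappear, and only at the end."""
    found = []

    def extend(path):
        found.append(path)
        if len(path) > 1 and path[0] == path[-1]:
            return  # a closed cycle cannot be extended further
        for nxt in graph.get(path[-1], []):
            if nxt not in path or nxt == path[0]:
                extend(path + (nxt,))

    for node in graph:
        extend((node,))
    return found

def is_proper_subpath(small, big):
    """True if small occurs contiguously inside the strictly longer path big."""
    if len(small) >= len(big):
        return False
    return any(big[i:i + len(small)] == small
               for i in range(len(big) - len(small) + 1))

def prime_paths(graph):
    """Simple paths that are not a proper subpath of any other simple path."""
    paths = simple_paths(graph)
    return sorted(p for p in paths
                  if not any(is_proper_subpath(p, q) for q in paths))

edge_requirements = paths_up_to(GRAPH, 1)
edge_pair_requirements = paths_up_to(GRAPH, 2)
primes = prime_paths(GRAPH)
```

The brute-force enumeration is exponential in the worst case, which is acceptable for the small per-section graphs discussed here but would need a smarter algorithm for large models.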


Fig. 3. Graph coverage criteria subsumption (Adapted from Ammann and Offutt [24]).

Table 2
Test path tools.

Name | Characteristics
PESTT¹ | 1. Supports codes written in Java 2. Produces control-flow graph of the code 3. Supports all graph coverage criteria
TestOptimal² | 1. Supports web and Java applications 2. Produces event-flow graph without the code 3. Supports shortest path coverage
SpecExplore³ | 1. Supports codes written in C# 2. Produces control-flow graph of the code 3. Supports complete path coverage
Web Graph Coverage⁴ | 1. No language is defined 2. Produces requirements and test paths from the entry graph 3. Supports all graph coverage criteria

¹ PESTT testing tool, available at: http://www.methodsandtools.com/tools/pesttunittesting.php.
² TestOptimal, model based testing, available at: http://TestOptimal.com.
³ SpecExplore, model based testing, available at: http://msdn.microsoft.com/en-us/library/ee620411.aspx.
⁴ Graph coverage testing tool, available at: http://cs.gmu.edu/~offutt/softwaretest.

the graph represents one state of the program, and the event labelling an edge describes the relation between the two states. Fig. 2 is an example of the functional graph representing the four different states of an E-mail application. According to the number of sections, the output of the second stage can be one or more functional graphs.

3.3. Creating mutant graphs from functional graphs

Different mutants can be created to discover the behavior of each event in every state of the program. In this approach, based on the definition of mutation testing,⁸ the fourth approach (model-based mutation testing) is used to generate mutant graphs by applying mutation operators. In our testing procedure we need mutation operators to add events to the functional graphs. Since operators such as delete node, add node, delete edge or reverse edge cannot create new events, add edge is chosen as the mutation operator. As a result, new edges can be produced by replacing the source and the target state of each event with another state in the graph. However, changing the states may have some limitations. In some cases the source or the target of the event is mandatory and cannot be exchanged with any other state. For example, the e-mail moves to the "Sent folder" by the "Compose" event. This target is mandatory and cannot be exchanged with any other folder such as the "Inbox folder". These restrictions can be easily extracted from the functional graphs. Thus, we can eliminate the infeasible test cases by detecting all restrictions. According to the number of created edges, the output of the third stage can be one or more mutant graphs for each functional graph.

3.4. Producing test paths

The goal of the fourth stage is to produce test paths from the functional and mutant graphs. At this stage a proper coverage criterion (Table 1) and an automatic tool are required. As shown in Fig. 3 [24], Complete Path Coverage (CPC) can produce all test paths from a graph. Nevertheless, the number of paths obtained can be infinite when the graph contains a loop. Thus, to satisfy all the coverage criteria, Prime Path Coverage (PPC) and Edge-Pair Coverage (EPC), the two subsets of CPC, should be selected as the coverage criteria.

There are many different tools to extract the test requirements and test paths from the graph. As can be seen in Table 2, we listed four tools with different characteristics (the features of other tools are a subset of these characteristics). According to Table 2, PESTT and SpecExplore are code-based and not suitable for the suggested testing procedure. In addition, TestOptimal does not support PPC and EPC. So, Web Graph Coverage is used as the tool to produce test paths in our approach.

3.4.1. The Web Graph Coverage

The Web Graph Coverage has a simple user interface to extract the test requirements and test paths from the graph. An internet connection is necessary while working with the tool.⁹

3.5. Choosing candidates for merging event sequences

In the earlier stage all event sequences are extracted from state-based Inside-page objects. Moreover, the interactions between these sequences and other objects, including stateless Inside-page and Outside-page objects, need to be tested.

⁸ Section 2.3: Generating mutant graphs.
⁹ See Appendix A.


Fig. 4. Merging representative set.

Fig. 5. E-mail system as case study.

Table 3
Sample of events and objects list.

List of objects | List of events
Inbox folder | Move to folders
Trash button | Delete
Search box | Search
Junk folder | Move to folders
New folder | Move to folders
Send button | Send
Read button | Read
Flagged button | Flag
Columns menu | Add information email columns
Sort menu | Sort the letters
Print button | Print the email
Select-unread button | Select unread
Select-current page button | Select current page
Reply button | Reply
Filter menu | Open filter menu
Filter–all button | Filter all
Filter–flagged button | Filter flagged
Mark message menu | Opening mark messages
Display borders | Change the size of borders
Filter–unanswered button | Filter unanswered
Select-all button | Select all
Address book button | Click the address book
Log out button | Click on the log out

However, a huge number of sequences are created by combining all sequences with these two groups of objects. So, we need to choose candidate test paths which represent the functionalities of each section.

3.5.1. Choosing the representative set of the inside-page events

The most significant criterion for choosing the representative set is to have all objects of the section in the set. In addition, different criteria for the prioritization of test cases can be useful to this end.

The paths with the highest priority are selected as the representatives of each section. To produce the representative set we use three parameters: the number of test requirements covered by each test path, the number of prerequisites for running the test path, and the run-time cost of each test path [34–36]. The priority of test paths is calculated as follows:

1. All test paths are ordered from highest to lowest based on the number of covered test requirements.
2. Among the paths with the highest number of covered test requirements, those with fewer prerequisites receive a higher priority than other paths.
3. The paths with lower run-time cost are selected in the last step (a path with fewer prerequisites also has a lower run-time cost).

Since the final states have an important role in a test case, the representative set should cover at least each of the final states. For instance, if the graph has three final states, the set must contain at least one test path for each of these three states, even if these paths have low priority.

Table 4
Sample of events and related results.

Table 5
Sample of events and affected parts.
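The three prioritization steps and the final-state rule can be sketched as a sort with a composite key. The following Python code is a minimal sketch under stated assumptions: the `RankedPath` record, the `size` parameter (how many top-ranked paths to keep) and the sample values are hypothetical and do not come from the paper, and the requirement that every object of the section appears in the set is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RankedPath:
    """Hypothetical record for one test path and its three ranking parameters."""
    states: tuple
    covered_requirements: int
    prerequisites: int
    runtime_cost: float

    @property
    def final_state(self):
        return self.states[-1]

def prioritize(paths):
    """Most covered requirements first, then fewer prerequisites, then lower cost."""
    return sorted(paths, key=lambda p: (-p.covered_requirements,
                                        p.prerequisites,
                                        p.runtime_cost))

def representative_set(paths, size, final_states):
    """Keep the top `size` paths, then add the best path for any missing final state."""
    ranked = prioritize(paths)
    chosen = ranked[:size]
    for state in final_states:
        if not any(p.final_state == state for p in chosen):
            candidate = next((p for p in ranked if p.final_state == state), None)
            if candidate is not None:
                chosen.append(candidate)  # included even though its priority is low
    return chosen

# Illustrative data only.
a = RankedPath(("Inbox", "Trash"), 5, 1, 2.0)
b = RankedPath(("Inbox", "Compose", "Sent"), 5, 2, 3.0)
c = RankedPath(("Inbox", "Compose", "Draft"), 2, 1, 1.0)
d = RankedPath(("Draft", "Sent"), 3, 0, 1.0)

selected = representative_set([c, d, b, a], size=2,
                              final_states=["Trash", "Sent", "Draft"])
```

In this example the low-priority path `c` is still selected, because it is the only path that ends in the final state "Draft".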

3.5.2. Merging representative sets with other objects

All of the representative sets should be merged with the stateless Inside-page and Outside-page events to create new test paths. So, there are three different cases:


Fig. 6. Dividing the E-mail application based on its structure.

Table 6
Number of sections and objects.

Outside-page objects no. | Stateless inside-page objects no. | State-based inside-page objects no. | Outside-page section no. | Inside-page section no.
9 | 13 | 24 | 1 | 4

Fig. 7. The functional graph of inside-page Section 1.

1. Merging the representative set of each section with all the stateless Inside-page events.
2. Merging the representative set of each section with all the Outside-page events.
3. Merging the representative set of each section with each other.

3.5.2.1. Merging the representative set with the stateless inside-page events

In order to discover the behavior of each event, we need a procedure to merge stateless Inside-page events with all of the representative sets. The merge algorithm in pseudo code and an example of the algorithm are presented below:


Table 7
The example of limitations in the functional graph of Section 1.

Event name | Description
Move to | All nodes of the graph of Section 1 can be placed on Multipoint lines
Delete | All nodes of the graph of Section 1 can be placed on Multipoint line. The "Trash" as destination is mandatory
Create message | All nodes of the graph of Section 1 can be placed on Multipoint line. The "Compose" as destination is mandatory
Search | All nodes of the graph of Section 1 can be placed on Multipoint line. The "Searched" as destination is mandatory
Send | The "Sent" as destination is mandatory. The "Draft" and "Compose" as Source are mandatory. No mutant is created by this event
Save | The "Draft" as destination and the "Compose" as Source is mandatory. No mutant is created by this event

(The column "Source and destination of the event" consists of diagrams in the original table and is not reproduced here.)

Some points:
• The "Compose" node can be only accessed through "Send" and "Save" event. So this node should not be placed on any sources or destination (just the mentioned above).
• The "Searched" node can be only accessed through "Search" event. So this node should not be placed on any sources.
• The "End" node has no functionality and it is not useful.

Table 8. Detail of created mutants.

| | Inside-page Section 1 | Inside-page Section 2 | Inside-page Section 3 | Inside-page Section 4 | Total |
|---|---|---|---|---|---|
| Number of mutants | 48 | 26 | 17 | 17 | 108 |

Fig. 8. A sample of the mutant graph for the inside-page Section 1.

Table 9. Number of test paths for 4 functional graphs.

| | Inside-page Section 1 | Inside-page Section 2 | Inside-page Section 3 | Inside-page Section 4 | Total |
|---|---|---|---|---|---|
| Number of test paths | 40 | 43 | 16 | 23 | 131 |

Table 10. Number of test paths for 108 mutants.

| | Inside-page Section 1 | Inside-page Section 2 | Inside-page Section 3 | Inside-page Section 4 | Total |
|---|---|---|---|---|---|
| Number of test paths | 570 | 241 | 102 | 149 | 1062 |


Inputs:
R: A path in the representative set. (Array)
S: Stateless inside-page events. (Circular linked list)

    var Memory = GetMemory(sizeof(R + S));
    var Cartesian = GetMemory(sizeof(R × S));
    var stop = 0;
    var CLen = 0;
    for (each path R in the representative set) {
        // repeat until every pair of the Cartesian product R × S is recorded
        while (Cartesian is not (Cartesian product of R, S ---> R × S)) {
            for (int i = 0, j = 0, k = 0; i < |R| + |S|; i++, j++) {
                if (j >= R.length) {
                    // R is exhausted: append the stateless events not used yet
                    if (S[k] not used in Memory) {
                        Memory[i] = S[k];
                        k++;
                    }
                } else {
                    Memory[i] = R[j];
                    while (stop < S.length) {
                        if (set of (R[j] and S[k]) is not in Cartesian & S[k] not used in Memory) {
                            i++;
                            Memory[i] = S[k];
                            Cartesian[CLen] = set(R[j], S[k]);
                            k++;
                            CLen++;
                            break;
                        } else {
                            // if k has reached the end, it starts again from 0 (circular list)
                            k++;
                            stop++;
                        }
                    }
                    stop = 0;
                }
            }
            print(Memory);
            empty(Memory);
        }
    }

3.5.2.2. Merging the representative set with the outside-page events
Unlike stateless Inside-page events, running Outside-page objects in different orders does not make sense. In other words, it is not possible to run any other object after an Outside-page event. Consequently, a different algorithm is needed. In this algorithm, the location of each Outside-page event in the paths of the representative set is considered. The merge algorithm, in pseudocode, and an example of the algorithm are presented below:

Inputs:
R: A path in the representative set.
O: One of the Outside-page events.

    var Memory = GetMemory(up to (sizeof(R) + 1));
    int Memory_Len;
    for (each path R in the representative set) {
        for (each outside-page event O) {
            for (int i = 0; i < Length(R); i++) {
                // copy the prefix R[0..i] of the path
                for (int j = 0, k = 0; j <= i; j++, k++) {
                    Memory[k] = R[j];
                    Memory_Len = k;
                }
                // close the prefix with the outside-page event
                Memory[Memory_Len + 1] = O;
                print(Memory);
            }
        }
    }
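The outside-page merge can also be sketched in Python. This is a minimal illustration, not the authors' implementation: every prefix of a path is closed with one outside-page event, and sequences that repeat because two paths share a prefix are dropped. The function name `merge_with_outside` is an assumption of this sketch.

```python
def merge_with_outside(paths, outside_events):
    """Append each outside-page event to every non-empty prefix of each path."""
    merged = []
    seen = set()
    for path in paths:
        for event in outside_events:
            for cut in range(1, len(path) + 1):
                candidate = tuple(path[:cut]) + (event,)
                # Paths sharing a prefix produce the same sequence twice.
                if candidate not in seen:
                    seen.add(candidate)
                    merged.append(list(candidate))
    return merged

single = merge_with_outside([["a", "b", "c", "a", "d"]], [1])
shared_prefix = merge_with_outside([["a", "b"], ["a", "c"]], [1])
print(single)
print(shared_prefix)
```

For a path of length m and a single outside-page event, the sketch yields m sequences, since nothing can follow an outside-page event.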

The Example:
Inputs: R: [a, b, c, a, d] and O: [1]
Outputs: [a, 1], [a, b, 1], [a, b, c, 1], [a, b, c, a, 1], [a, b, c, a, d, 1]

If the paths in the representative set share initial events (e.g. two paths both start with [1, 2, ...]), some of the new event sequences will be duplicated, so these duplicates need to be removed from the results.

3.5.2.3. Merging the representative set of each section with other sections
To estimate the full number of event sequences produced by merging two paths from the representative sets, we use the k-combination formula.10 For example, if the first and the second path each contain six events, paths with 12 events are obtained. The number of test paths can be computed using the following formula:

$$\frac{(\text{number of events in the resulting test path})!}{(\text{number of events in the 1st path})! \times (\text{number of events in the 2nd path})!} = \frac{12!}{6! \times 6!} = 924$$
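As a quick check of this count, the formula is the binomial coefficient for choosing which positions of the merged path are taken by the events of the first path, with the internal order of each path preserved. A minimal Python sketch (the function name `merged_path_count` is an assumption of this sketch):

```python
from math import comb, factorial

def merged_path_count(len_first, len_second):
    """Number of order-preserving interleavings of two event paths."""
    return comb(len_first + len_second, len_first)

count = merged_path_count(6, 6)
print(count)
```

The same value follows from evaluating the factorials directly, `factorial(12) // (factorial(6) * factorial(6))`.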

The Example (merging with the stateless inside-page events, Section 3.5.2.1):
Inputs: R: [a, b, c, a, d] and S: [1, 2, 3, 4]
Outputs: [a, 1, b, 2, c, 3, a, 4, d], [a, 2, b, 3, c, 4, a, d, 1], [a, 3, b, 4, c, 1, a, d, 2], [a, b, 1, c, 2, a, d, 3, 4], [a, b, c, a, d, 4, 1, 2, 3]
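The stateless merge can be sketched in Python as follows. This is one reading of the pseudocode, not the authors' implementation: each pass walks the path, places after each path event the next stateless event (in circular order) whose pairing with that event has not been recorded yet, and appends the stateless events left over at the end. Passes repeat until every (path event, stateless event) pair has been recorded. The function name `merge_with_stateless` and the assumption that the stateless events are distinct belong to this sketch.

```python
def merge_with_stateless(path, stateless):
    """Interleave stateless events into `path`, one output path per pass."""
    all_pairs = {(r, s) for r in path for s in stateless}
    seen_pairs = set()
    n = len(stateless)
    merged_paths = []
    while seen_pairs != all_pairs:
        merged, used, k = [], set(), 0
        for r in path:
            merged.append(r)
            # Scan the circular list once, starting at position k.
            for step in range(n):
                s = stateless[(k + step) % n]
                if (r, s) not in seen_pairs and s not in used:
                    merged.append(s)
                    used.add(s)
                    seen_pairs.add((r, s))
                    k = (k + step + 1) % n
                    break
        # Append the stateless events not placed yet, in circular order.
        for step in range(n):
            s = stateless[(k + step) % n]
            if s not in used:
                merged.append(s)
                used.add(s)
        merged_paths.append(merged)
    return merged_paths

result = merge_with_stateless(["a", "b", "c", "a", "d"], [1, 2, 3, 4])
for merged in result:
    print(merged)
```

Each pass records at least one new pair, so the loop terminates after at most as many passes as there are pairs.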

According to the example, merging the two paths produces many test paths. However, many of these paths have similar functionality. Notably, the final states have an important effect on the functionality of the graph. These final states create similar

10 $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.

Table 11. Candidates for merging event sequences.

| Name | Number of test paths | Candidate test path | Number of covered test requirements | Number of prerequisites for running |
|---|---|---|---|---|
| Inside-page Section 1 (representative set) | 7 | [1, 6, 7, 1, 6, 8] | 4 | 0 |
| | | [1, 6, 8, 1, 4, 2] | 4 | 0 |
| | | [1, 6, 7, 2, 7, 8] | 4 | 0 |
| | | [1, 6, 7, 1, 3, 1] | 4 | 0 |
| | | [1, 3, 1, 6, 8, 2] | 4 | 0 |
| | | [1, 5, 1, 6, 8, 1] | 4 | 1 |
| | | [1, 6, 7, 2, 9] | 3 | 0 |
| Inside-page Section 2 (representative set) | 3 | [1, 5, 6, 3, 2, 5, 4, 5] | 4 | 0 |
| | | [1, 3, 2, 5, 4, 6] | 3 | 0 |
| | | [1, 3, 2, 5, 4, 7] | 3 | 0 |
| Inside-page Section 3 (representative set) | 4 | [1, 2, 4, 1, 5, 1] | 4 | 0 |
| | | [1–3, 1, 2] | 3 | 0 |
| | | [1, 5, 1, 2, 4] | 3 | 0 |
| | | [1–3, 1–3] | 3 | 0 |
| Inside-page Section 4 (representative set) | 5 | [1, 3, 5, 1, 4, 3, 5, 1] | 4 | 0 |
| | | [1, 4, 3, 5, 1, 4, 6, 1] | 4 | 1 |
| | | [1, 4, 3, 5, 1–3] | 3 | 0 |
| | | [1, 3, 5, 1, 4, 3, 2] | 3 | 0 |
| | | [1, 4, 2, 1, 4] | 4 | 0 |

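Table 12. Merging representative sets.

| Merging step | Number of test paths |
|---|---|
| Merging representative sets with stateless inside-page events | 126 |
| Merging representative sets with outside-page events | 421 |
| Merging representative sets with each other: 2-combinations | 262 |
| Merging representative sets with each other: 3-combinations | 389 |
| Merging representative sets with each other: 4-combinations | 420 |
| Total | 1618 |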

Table 13. An example of a test case.

| Test path | Expected results | Actual results | Pass/fail |
|---|---|---|---|
| [1, 5, 3, 1, 7, 8] {Inbox–Move To Folder–Search–Move To Inbox–Move To Draft–Send} | Moving an E-mail to the draft folder is not allowed. | A mail can be moved to the draft folder by hand and then it can be sent. | Failed |

Table 14. Evaluating the suggested testing procedure against five methods.

| Name | FDD (%) | FDE (%) | Mutation score (%) | Unique faults no. |
|---|---|---|---|---|
| Usage profile [9] | 4.3 | 100 | – | 1 |
| Event-driven web-based mutation [25] | 0.33 | 10 | 4 | 10 |
| Event-driven mutation [26] | 0.28 | 8 | 8 | 25 |
| Framework for mutation [27] | 0.63 | 3 | 10 | 31 |
| Event-based model [8] | 5 | 7 | – | 19 |
| Suggested testing procedure | 0.85 | 4 | 66 | 92 |

functionalities in test paths. On this basis, we can reduce the number of test paths. As can be seen in Fig. 4, we sample only six new test paths from the 924 test paths created by merging two test paths of Inside-page events. The events of graph #1 and graph #2 are shown with an underline and a double underline, respectively. The final nodes of each graph are shown in color. As a result, test paths #2, #4 and #6 can be grouped in one category and test paths #1, #3 and #5 in the other. Test paths with the same order of final nodes have the same functionality, so we can choose one test path from each category. Consequently, merging two paths with two final nodes exercises the program in two different functionalities, determined by the transposition of the final nodes.

With this level of merging, the interaction between the events is covered. For larger scenarios, interactions between more than two sets can also be considered. By merging the paths in n sets, n × (n − 1) different functionalities can be provided.

3.6. Deriving and running test cases
All the test paths extracted from the functional graphs, the mutants and the merging stage are inputs to the final stage, which generates test cases. The test cases are executed and the actual results are compared with the expected results. The output of the final stage is a list of faults detected by running the test cases.


4. Case study and results
Traditional web applications were designed on the client/server model. In this design, the entire page is reloaded on each user request. However, this behavior, tied to server-side processing, leads to redundant data transmission and increased waiting time for users [37,32]. Rich web applications, first introduced in 2002 by the Macromedia company, were created to integrate web applications with desktop software and enhance the performance of these applications [38]. These web-based applications, using technologies such as Ajax, have made it possible to exchange data dynamically [37]. In traditional web applications the server updates the entire page on each user request. In this case the program is locked, so no more requests can be posted to the server. By contrast, in rich web applications requests are sent by the Ajax engine. The engine renders the user input and shows the answers to the user. If the engine requires more data, the server handles the requests. On this basis, the user is allowed to interact with other parts of the program [32]. Rich web applications are grouped into four different categories: (1) scripting-based, (2) plug-in based, (3) browser-based and (4) web-based desktop [37,38]. In this paper, an example of the first category of rich web applications is used as a case study, namely the E-mail system. The E-mail system is a program for managing mail via the Internet. This program has features such as receiving, sending and deleting letters, punctuating, grouping letters by creating new folders and many others. An open source, free E-mail program designed with Ajax technology has been chosen from http://roundcube.net. Our case study, the Academic E-mail system, is a running instance of the E-mail application available at http://mail.ce.sharif.edu (Fig. 5).

4.1. Dividing the application based on its structure
For this stage of the testing procedure, samples of the required lists, including objects, events, results of events and the affected parts, are presented in Tables 3–5. Table 3 presents a sample of objects and their related events. As can be seen in Table 4, four groups, each with a different color, have been created based on similar related results. Notably, similar results correspond to similar main actions. For instance, moving to a certain folder in group #1, or working with emails such as reading or flagging in group #2, are the main actions. In the next step, we can form more groups with the help of the affected parts (Table 5). In this step, we also changed the groups of some events. For example, “Open filter menu” should be joined with another group because the affected parts have a higher priority than the related results. At this level, we created groups whose members all have both similar related results and similar affected parts. It should be noted that “Add information email columns” and “Sort the letters” have the same affected parts as groups #3 and #4, but cannot join them due to the lack of similarity in the related results. Finally, with respect to similar functionalities, groups should be merged, edited or created. In our example, before applying the last step, the “Search” event belongs to group #4. However, there are functional reasons for merging this event with group #1: users often use this feature for moving the specified emails to a folder or finding somebody to compose to. In comparison, filtering is used to sort

Fig. 9. FDD: evaluating suggested testing procedure with five methods.

Fig. 10. FDE: evaluating suggested testing procedure with five methods.

Fig. 11. Unique faults no.: evaluating suggested testing procedure with five methods.

Fig. 12. Mutation score: evaluating suggested testing procedure with five methods.

Table 15. Total time of the suggested testing procedure and the five selected methods.

| Method | Number of test cases | Total required time (h) |
|---|---|---|
| Suggested testing procedure | 2811 | 116.5 |
| Event-based model [8] | 223 | 44 |
| Framework for mutation [27] | 4516 | 158 |
| Event-driven mutation [26] | 4236 | 156.5 |
| Event-driven web-based mutation [25] | 3021 | 124 |
| Usage profile [9] | 23 | 22 |

Table 16. Evaluating the levels of the suggested testing procedure against each other.

| Name | FDD (%) | FDE (%) | Unique fault no. |
|---|---|---|---|
| 2nd level (functional graphs) | 5 | 13 | 4 |
| 3rd level (mutated graphs) | 2 | 3 | 18 |
| 5th level (merging with stateless inside-page objects) | 3 | 25 | 1 |
| 5th level (merging with outside-page objects) | 1 | 22 | 17 |
| 5th level (merging representative sets – 2-combination) | 0.6 | 62 | 37 |
| 5th level (merging representative sets – 3-combination) | 1.4 | 18 | 65 |
| 5th level (merging representative sets – 4-combination) | 3.6 | 6 | 74 |

the incoming emails. On the other hand, users can search within the filtering, but two different types of filtering, in group #4, cannot be applied at the same time. Therefore, by analyzing the functionalities of events, the “Search” event is joined to group #1, and “Add information email columns” and “Sort the letters” can be placed in group #6. The results of the structural division of all events on the first page of the E-mail application are shown in Fig. 6 and Table 6.

4.2. Creating the functional graph for each section
Fig. 7 shows one of the sections of the application and its functional graph. According to the number of Inside-page sections, four functional graphs have been created from 24 state-based Inside-page events.

4.3. Creating mutant graphs from functional graphs
By considering “add edge” as the mutation operator, applying the lists of limitations (e.g. Table 7), and changing the source and the target state of each event in the four functional graphs, 108 mutants are created (Table 8). Fig. 8 shows a sample of the mutant graph for the Inside-page Section 1.

4.4. Producing test paths
The numbers of test paths for the 4 functional graphs and the 108 mutants are shown in Tables 9 and 10, respectively.

4.5. Choosing candidates for merging event sequences
To select the candidates we used three parameters: the number of test requirements covered by each test path, the number of prerequisites for running the test path, and the run-time cost of each test path. Since in our study the run-time cost was almost the same for all test paths, we ignored this parameter. The results can be seen in Table 11. The results of merging the representative sets with other objects are shown in Table 12. According to the four created sections, we could have four different combinations of representative sets. As a result, two different functionalities were produced by merging two representative sets. Moreover, merging three sets with three final nodes produced 6 different functionalities (2334 test paths) and four sets with four final nodes produced 12 different functionalities (5040 test paths). It should be noted that, to reduce the number of test paths in the 3- and 4-combinations, we chose only one of the 6 and 12 functionalities, respectively.

4.6. Deriving and running test cases
By applying our suggested testing procedure to the E-mail program, 2811 test cases were produced. An example of a test case is shown in Table 13. As a result, 92 unique faults were discovered while running the test cases. The most important faults with their results are listed below:
- Moving E-mails to the Draft Folder and the Send Folder by hand: E-mails can be sent.
- Choosing a filter and switching between the folders: an incorrect number of E-mails in the filter is shown.
- Selecting the event “E-mail: Open in New Window”: E-mails are still marked as Unread.
- Making no changes in the Reply and the Forwarded E-mails: E-mails cannot be saved in the Draft folder.

5. Evaluation
To evaluate effectiveness, we applied four criteria to the suggested testing procedure. Moreover, we considered two different evaluations of our approach. First, the suggested testing procedure is compared with five methods introduced in the “related work” section. These methods have been run on the E-mail program.11 Then, the stages of the testing procedure that play a role in producing test paths are compared with each other. In addition, we calculated the total required time for applying the selected methods and our suggested testing procedure.12 The four criteria used are as follows:

1. FDD: Fault Detection Density is the ratio of the total number of faults detected by the test cases (#tf_i for test case i) to the product of the total number of test cases (#T) and the total number of unique faults (#FUnique) [39]. FDD is calculated as below:

$$\mathrm{FDD} = \frac{\#(tf_1 + \cdots + tf_n)}{\#T \times \#F_{Unique}} \qquad (5\text{-}1)$$

11 See Appendix B.
12 See Appendix C.


Fig. 13. FDD: evaluating levels of suggested testing procedure with each other.

Fig. 14. FDE: evaluating levels of suggested testing procedure with each other.

2. FDE: Fault Detection Effectiveness is the ratio of the total number of unique faults (#FUnique) to the total number of faults (#FTotal)13 [18]. FDE is calculated as below:

$$\mathrm{FDE} = \frac{\#F_{Unique}}{\#F_{Total}} \qquad (5\text{-}2)$$

3. Mutation Score: The mutation score, used for evaluating mutated graphs, is the ratio of the number of killed mutants (#MutantKilled) to the total number of mutants (#MutantTotal) excluding equivalent mutants (#MutantEquivalent) [27]. The mutation score is calculated as below:

$$\text{Mutation Score} = \frac{\#Mutant_{Killed}}{\#Mutant_{Total} - \#Mutant_{Equivalent}} \qquad (5\text{-}3)$$

4. Number of Unique Faults: A unique fault can be discovered by multiple test cases, but is counted only once.

5.1. Evaluation of selected methods and the suggested testing procedure
Table 14 shows the values of each criterion achieved by the five methods and the suggested testing procedure. The values are also presented in Figs. 9–12. According to Fig. 9, the Event Based Model has the highest FDD value, which means that this method has a good density in detecting faults compared to the others. The Usage Profile method also has a high value. However, this method has a fundamental problem: some of the important functions may not be modeled, so some of the errors cannot be detected. For instance, if the selected users do not choose the Flag Filter, the possible error in

13 A fault that is detected by more than one test case.

this filter can never be found. Thus, the FDD value of the Usage Profile method is not valid for the evaluation. From the FDD criterion, it can be concluded that each test case in the Suggested Testing Procedure is better able to find faults than those of the three remaining methods. The problem mentioned for the Usage Profile method also applies to the FDE criterion. As shown in Fig. 10, our approach has a low FDE value. This value implies that we discover more repeated faults as the number of test cases increases. However, the total number of unique faults in Fig. 11 demonstrates the power of our approach. It can be seen from Fig. 12 that the mutation scores of the three mentioned methods are much smaller than our score. So, we can conclude that creating one graph for the entire program in these methods causes some operators to have incorrect actions. We overcome this problem by dividing the program based on its structure and then creating the graph and many mutants for each subdivision. Table 15 presents the number of test cases and total required time for each method. By comparing the growth of total required time with the number of test cases, we can conclude that the total required time for our suggested testing procedure is higher than that of the pure event-based methods, Usage Profile [9] and Event-based model [8]. However, it is equal to or slightly lower than the three other mutation based approaches. For instance, with 10,000 test cases, we would need nearly 400 h and 350 h to apply the suggested testing procedure and the Framework for Mutation [27], respectively, to the application. We can also compare the time required in the step of running test cases for each method (Appendix C, Table C.2). On this item, the results show that our suggested testing procedure is similar to the others except Usage Profile [9]. It should be noted that the time required for running test cases in the Usage Profile [9] approach is considerably lower than the others because its test paths are extracted simply from the graph.
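The 10,000-test-case figures can be approximated from Table 15 with a minimal Python sketch. It assumes that the total time grows linearly with the number of test cases, which is an assumption of this sketch rather than a model stated in the text; the function name `extrapolate_hours` is likewise hypothetical.

```python
def extrapolate_hours(measured_cases, measured_hours, target_cases):
    """Scale the measured total time linearly to another number of test cases."""
    return measured_hours / measured_cases * target_cases

# Measured values taken from Table 15.
suggested = extrapolate_hours(2811, 116.5, 10_000)
framework = extrapolate_hours(4516, 158, 10_000)
print(round(suggested), round(framework))
```

Under this linear assumption the two estimates come out close to the rounded values of nearly 400 h and 350 h quoted above.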


Fig. 15. Unique faults no.: evaluating levels of suggested testing procedure with each other.

5.2. Evaluation of the suggested testing procedure
Table 16 presents the values of FDD, FDE and the number of unique faults for the stages of the suggested testing procedure. The values are also shown in Figs. 13–15. According to Fig. 13, the functional graph stage has the highest FDD value. This implies that at this level faults are detected by most of the test cases. For the FDE criterion in Fig. 14, the steps with 2-, 3- and 4-combinations have decreasing values. As shown in Fig. 14, the number of faults found by additional test cases is inversely related to the number of unique faults. So, we conclude that in these steps more repeated faults have been found. On this basis, we can infer that fewer unique faults are detected by running more event interactions, and the rest of the failed test cases result in repeated faults. As can be seen in Fig. 15, the number of faults rises when the number of test cases increases. For example, at the level of the representative sets (2-, 3- and 4-combinations), the number of unique faults has increased with the test cases. Another significant point that can be concluded from Fig. 15 is the role of each stage in finding the faults: each stage has detected at least one new unique fault.

Fig. A.1. Choosing the graph coverage section.

Fig. A.2. Entering the information.

Fig. A.3. Choosing the coverage criteria.

Fig. A.4. Test requirement results.

Fig. A.5. Test paths and test requirement results.

Table B.1. Levels of a model for usage-based testing.

| Layers | Description |
|---|---|
| 1st layer | – Choosing 10 users for recording their actions – Using a record and replay tool, namely vTest, to record actions and to produce scripts. vTest is available at: www.verisium.com/ – Fig. B.1: a sample of the user’s script using the E-mail application |
| 2nd layer | – The events are extracted using the created scripts – Fig. B.2: a sample of extracting events |
| 3rd layer | – Analyzing the events and the scripts – Creating the Markov Chain model (the usage profile) – Fig. B.3: the Markov Chain model |


6. Conclusions and future work
One of the main challenges in EDS testing, as seen in all of the reviewed methods, is the elimination of infeasible test cases. It was also observed that each of the presented methods can be applied to a specific group of EDS. Few methods have focused on functional testing of web applications built with Ajax technology, due to the dissimilar nature of these programs. Thus, the proposed approach focuses on this type of EDS. We propose that our suggested testing procedure has the following advantages: (1) dividing the application based on its structure to produce independent sections; (2) creating functional graphs from the requirements document; (3) creating mutant graphs derived from functional graphs to obtain event sequences; and (4) resolving the problem of removing infeasible test cases through functional graphs and conditions on the “add edge” operator when creating mutants. In addition, in our approach test cases are reduced without removing essential functionalities by using the functional graph and prioritization criteria for choosing the representative set. Our proposed test procedure is considered a semi-automatic method because we used the “Web Coverage Graph” tool. It should be noted that some of the faults detected in the sample E-mail application have now been corrected in newer versions by its vendor.

The suggested testing procedure, like any other method, has some drawbacks. Most of the stages in the approach, namely the 1st, 2nd, 3rd and 6th stages, are performed manually. As noted earlier, one of the strong points of the proposed testing procedure is that it tests all of the main functions of the program. However, an inappropriate structural division may lead to removing some of the main functions of the application being tested. This stage of the suggested testing procedure is the most important one, so a lack of precision while collecting events, objects or sections may cause numerous errors in the subsequent stages. Among these concerns, automating the entire operation of the proposed method will be the biggest challenge. In addition, using the code of the program to free the structural division stage from knowledge of the implementation of objects, and extending the approach to other areas of EDS, are possible future works of this paper. Another issue that generally affects these types of event-driven web applications is the rate of data transfer via the Internet. In other words, with a poor rate, many errors can be detected. So, this temporal characteristic can also be one of the future directions of work for improving our testing procedure.

Fig. B.1. A sample of user’s script.

Fig. B.2. A sample of extracting events.


Fig. B.3. The Markov Chain model.

Table B.2. The results of the method of Herbold et al. [9].

| Coverage criteria | Test cases no. | Unique faults | Total faults | FDD (%) | FDE (%) |
|---|---|---|---|---|---|
| Prime path | 12 | 1 | | | |
| Edge-pair | 11 | 0 | | | |
| Both criteria | | | 1 | 4.3 | 100 |

6.1. Threats to validity
There are some threats to the validity of the design of the suggested testing procedure. Modeling the application and extracting test cases from the model is the main goal of the suggested testing procedure. To create a model of the main objects of the application and to avoid complexity, the application was divided into subsections based on its structure and its functionality in the first stage of our procedure. The functionalities of each object can be obtained from analysis models, and their structures are specified at the implementation level. So, the created sections can be validated against the analysis and implementation documents of the application. The lack of analysis documents is the main threat in identifying the objects and their related events. Nonetheless, we tried to validate our research with the help of the analyst and through direct interaction with the programmer of the application. Afterward, we produced the functional graphs based on the information provided by the first stage. In the third stage, we used a model-based mutation technique and mutation operators to produce mutants. This technique is based on the literature found in Ammann and Offutt [24]. Then, we used graph coverage criteria to extract event sequences for the graphs created in the second and the third stages. The selected coverage criteria are also presented in Ammann and Offutt [24]. At the fifth stage, to test all the objects and their event sequences on the application, merging algorithms were prepared based on the behavior of the objects provided by the first stage. A possible extension of our work is to merge the combination of stateless and state-based Inside-page objects with Outside-page objects. This would give more test cases and thus increase the precision of the test. Another threat to our testing procedure is the growth in the number of test cases in the merging stage. Care should be taken to ensure that the quality of the testing does not decline when test cases are reduced. We used the prioritization algorithm to mitigate this threat. At the last stage, the expected results are extracted from the analysis and requirements documents of the application. Then, the test cases are executed and the actual results are compared with the expected results. Again, the extraction of the expected results from analysis and requirements documents could threaten the validity of our method; we tried to mitigate this threat in the same way as for the first stage.

Acknowledgement
We would like to thank Mr. A. Jalali for guidance and editing and Dr. L. Habibi for her help on formatting. This research has been supported by a Grant from Sharif University of Technology – Iran.

Table B.3. The results of the method of Beyazıt et al. [25].

| Method | Coverage criteria | Test cases no. | Unique faults | Total faults | FDD (%) | FDE (%) |
|---|---|---|---|---|---|---|
| [25] | Edge coverage for each mutant | 3021 | 10 | 100 | 0.33 | 10 |

| Method | Mutation operator | Number of mutantTotal | Number of mutantKilled | Number of mutantEquivalent |
|---|---|---|---|---|
| [25] | Add edge | 13 | 0 | 0 |
| | Remove edge | 2384 | 100 | 45 |
| | Invert edge | 58 | 0 | 0 |
| | Total | 2455 | 100 | 45 |

Mutation score (%): 4
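The values reported for the method of Beyazıt et al. [25] can be recomputed from the raw counts with Eqs. (5-1) to (5-3). The following Python sketch is only an illustration; the function names are assumptions, and the reported percentages appear to be rounded or truncated, so the comparison is made after rounding.

```python
def fdd(total_faults, test_cases, unique_faults):
    """Fault Detection Density, Eq. (5-1), as a fraction."""
    return total_faults / (test_cases * unique_faults)

def fde(unique_faults, total_faults):
    """Fault Detection Effectiveness, Eq. (5-2), as a fraction."""
    return unique_faults / total_faults

def mutation_score(killed, total, equivalent):
    """Mutation score, Eq. (5-3), as a fraction."""
    return killed / (total - equivalent)

# Raw counts reported for [25]: 3021 test cases, 10 unique faults,
# 100 total faults, 2455 mutants, 100 killed, 45 equivalent.
fdd_pct = 100 * fdd(100, 3021, 10)
fde_pct = 100 * fde(10, 100)
score_pct = 100 * mutation_score(100, 2455, 45)
print(round(fdd_pct, 2), round(fde_pct), round(score_pct))
```

The same three functions apply unchanged to the counts of the other compared methods.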


Table B.4. The results of the method of Belli et al. [26].

| Method | Coverage criteria | Test cases no. | Unique faults | Total faults | FDD (%) | FDE (%) |
|---|---|---|---|---|---|---|
| [26] | Edge-pair for each mutant | 4236 | 25 | 305 | 0.28 | 8 |

| Method | Mutation operator | Number of mutantTotal | Number of mutantKilled | Number of mutantEquivalent |
|---|---|---|---|---|
| [26] | Add edge | 13 | 0 | 0 |
| | Remove edge | 2384 | 171 | 45 |
| | Invert edge | 58 | 16 | 0 |
| | Add node | No node can be added | | |
| | Remove node | 47 | 11 | 0 |
| | Total | 2502 | 198 | 45 |

Mutation score (%): 8

Table B.5. The results of the method of Beyazıt [27].

| Method | Coverage criteria | Test cases no. | Unique faults | Total faults | FDD (%) | FDE (%) |
|---|---|---|---|---|---|---|
| [27] | Edge-pair and up to 3-length path for each mutant | 4516 | 31 | 895 | 0.63 | 3 |

| Method | Mutation operator | Number of mutantTotal | Number of mutantKilled | Number of mutantEquivalent |
|---|---|---|---|---|
| [27] | Add edge | 13 | 0 | 0 |
| | Remove edge | 2384 | 194 | 45 |
| | Invert edge | 58 | 30 | 0 |
| | Add node | No node can be added | | |
| | Remove node | 47 | 42 | 0 |
| | Total | 2502 | 266 | 45 |

Mutation score (%): 10

Fig. B.4. The ESG.


Fig. B.5. The ESG-Complementary.

Appendix B. Applying the selected methods on the E-mail application

Table B.6. The results of the method of Belli et al. [8].

| Coverage criteria | Test cases no. | Unique faults | Total faults | FDD (%) | FDE (%) |
|---|---|---|---|---|---|
| Edge-pair | 223 | 19 | 244 | 5 | 7 |

Appendix A. User manual – graph coverage web application
The Web Graph Coverage tool has a simple user interface for extracting the test requirements and test paths from a graph. An Internet connection is necessary while working with the tool. The user manual of this tool is as follows:
1. Choose the Graph Coverage section from the bottom left of the page http://cs.gmu.edu/~offutt/softwaretest/ (Fig. A.1).
2. Enter the information: in the first box, the edges of the graph should be entered. For example, if node #1 has a connection to node #2 (1 → 2), “1 2” is written in the box. In the second box the initial nodes, and in the third box the final nodes, are specified (Fig. A.2).
3. Choose the coverage criteria: the tool includes all the criteria defined in Table 1, with test paths or without test paths. The test path option also returns the test requirements (Fig. A.3).
4. The results of the test requirement option, and of both test requirements and test paths, are shown as tables (Figs. A.4 and A.5).

The suggested testing procedure is compared with five selected methods introduced in the “related work” section. These methods have been run on the E-mail program. The results of each method are presented below.
1. Herbold et al. [9], “A Model For Usage-Based Testing Of Event-Driven Software”. This method has three layers for finding when and which functionality of the application is used by the user (Table B.1 and Figs. B.1–B.3). According to Table B.2, the Usage Profile method has a good value in both the FDD and the FDE criteria.
2. Beyazıt et al. [25], “Event-Based Mutation Testing vs. State-Based Mutation Testing–Comparison Using A Web-Based System”.
3. Belli et al. [26], “Event-Based Mutation Testing vs. State-Based Mutation Testing-An Experimental Comparison”.
4. Beyazit [27], “A Formal Framework For Mutation Testing”.
The event-flow model of the E-mail application, with 49 nodes and 2384 edges, was created for these three methods (items 2–4). The results of running the three methods on the E-mail application are listed in Tables B.3–B.5.
5. Belli et al. [8], “Event-based modelling, analysis and testing of user interactions: approach and case study”.

Table C.1 Total time of suggested testing procedure.

| Stage | Stage of suggested testing procedure | Total required time (h) | Description |
|---|---|---|---|
| 1st stage | Dividing the application based on its structure | 1 | Nearly 1 h for dividing a simple web application (E-mail) |
| 2nd stage | Creating a functional graph for each section | 4 | Nearly 1 h for creating the graph of each section |
| 3rd stage | Creating mutant graphs from functional graphs | 8 | Nearly 2 h for creating all mutants of each section |
| 4th stage | Selecting the coverage criteria and producing test paths | 8 | Nearly 8 h for creating test paths of 108 mutants and 4 functional graphs by Graph Coverage Web Application |
| 5th stage | Choosing candidates for merging event sequences | 15.5 | Nearly 1 h for sorting test paths based on prioritization criteria; nearly 5 h for merging the Representative Set with the Stateless Inside-page events; nearly 3.5 h for merging the Representative Set with the Outside-Page events; nearly 6 h for merging the Representative Set of Each Section with Other Sections |
| 6th stage | Deriving and running test cases | 80 | Nearly 80 h for creating and running test cases and also for detecting faults |
| Total | | 116.5 | |
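The totals in Table C.1 can be re-derived with a few lines of Python. This is only an illustrative check of the arithmetic: the hour values are copied from the table, while the variable and function names (`stage_hours`, `fifth_stage_parts`, `total_hours`) are hypothetical.

```python
# Hours per stage, copied from Table C.1.
stage_hours = {
    "1st stage": 1.0,
    "2nd stage": 4.0,
    "3rd stage": 8.0,
    "4th stage": 8.0,
    "5th stage": 15.5,
    "6th stage": 80.0,
}

# The four sub-steps listed in the description of the 5th stage:
# sorting, and three kinds of merging.
fifth_stage_parts = [1.0, 5.0, 3.5, 6.0]


def total_hours(hours_by_stage):
    """Sum the hours of all stages."""
    return sum(hours_by_stage.values())


fifth_stage_total = sum(fifth_stage_parts)
procedure_total = total_hours(stage_hours)

print(f"5th stage: {fifth_stage_total} h")
print(f"Whole procedure: {procedure_total} h")
```

The sub-steps of the 5th stage add up to the 15.5 h reported for that stage, and the six stages add up to the 116.5 h reported as the total.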


Table C.2 Total time of suggested testing procedure and the five selected methods (details).

| Name of method | Number of test cases | Total required time (h) | Description |
|---|---|---|---|
| Usage profile [9] | 23 | 22 | Nearly 5 h for recording the actions of 10 users; nearly 15 h for analyzing the scripts and extracting events; nearly 4 h for creating the usage profile; nearly 2 h for creating and running test cases |
| Event-driven web-based mutation [25] | 3021 | 124 | Nearly 30 h for creating 2455 mutants; nearly 94 h for creating and running test cases |
| Event-driven mutation [26] | 4236 | 156.5 | Nearly 33 h for creating 2502 mutants; nearly 123.5 h for creating and running test cases |
| Framework for mutation [27] | 4516 | 158 | Nearly 33 h for creating 2502 mutants (we spent 0 h on this step because it is the same as in the previous method); nearly 125 h for creating and running test cases |
| Event-based model [8] | 223 | 44 | Nearly 30 h for creating the ESG; nearly 14 h for creating and running test cases |
| Suggested testing procedure | 2811 | 116.5 | See Table C.1 |

The ESGs and the ESG-Complementary of the E-mail application were created based on the rules of the model (Figs. B.4 and B.5). However, we did not have access to the GATE tool, so to produce test paths we used the Web Graph Coverage tool with 2-length paths as the coverage criterion. The results are shown in Table B.6.

Appendix C. Estimation of the time

In this section, the total time required to run the suggested testing procedure on the E-mail application is presented in Table C.1. The total time is calculated for each stage of the procedure individually. In addition, we compared our total time with that of the five methods mentioned in the text (Table C.2). As Table C.2 shows, the total required time of the methods increases with the number of test cases.

References

[1] X.L. Ding, C. Ding, Y.H. Hou, An approach to event-driven software testing, Trans. Tianjin Univ. 8 (4) (2002) 265–268.
[2] Kent D. Lee, Python Programming Fundamentals, Springer, 2010.
[3] Atif M. Memon, Developing testing techniques for event-driven pervasive computing applications, in: Proceedings of The OOPSLA 2004 Workshop on Building Software for Pervasive Computing (BSPC 2004), 2004.
[4] Ping Li, Toan Huynh, Marek Reformat, James Miller, A practical approach to testing GUI systems, Empirical Softw. Eng. 12 (4) (2007) 331–357.
[5] Atif M. Memon, Martha E. Pollack, Mary Lou Soffa, Hierarchical GUI test case generation using automated planning, IEEE Trans. Softw. Eng. 27 (2) (2001) 144–155.
[6] Atif M. Memon, An event-flow model of GUI-based applications for testing, Softw. Test. Verif. Reliab. 17 (3) (2007) 137–157.
[7] Roy P. Pargas, Mary Jean Harrold, Robert R. Peck, Test-data generation using genetic algorithms, Softw. Test. Verif. Reliab. 9 (4) (1999) 263–282.
[8] Fevzi Belli, Christof J. Budnik, Lee White, Event-based modelling, analysis and testing of user interactions: approach and case study, Softw. Test. Verif. Reliab. 16 (1) (2006) 3–32.
[9] Steffen Herbold, Jens Grabowski, Stephan Waack, A model for usage-based testing of event-driven software, in: 5th International Conference on Secure Software Integration & Reliability Improvement Companion (SSIRI-C), 2011, IEEE, 2011, pp. 172–178.
[10] Steffen Herbold, Usage-based Testing of Event-driven Software, PhD diss., Niedersächsische Staats-und Universitätsbibliothek Göttingen, 2012.
[11] Steffen Herbold, Patrick Harms, AutoQUEST–automated quality engineering of event-driven software, in: IEEE Sixth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2013, IEEE, 2013, pp. 134–139.
[12] Paolo Tonella, Filippo Ricca, Statistical testing of web applications, J. Softw. Maintenance Evol.: Res. Pract. 16 (1–2) (2004) 103–127.
[13] Xun Yuan, Atif M. Memon, Generating event sequence-based test cases using GUI runtime state feedback, IEEE Trans. Softw. Eng. 36 (1) (2010) 81–95.
[14] Xun Yuan, Atif M. Memon, Using GUI run-time state as feedback to generate test cases, in: 29th International Conference on Software Engineering, 2007, ICSE 2007, IEEE, 2007, pp. 396–405.
[15] A. Isabella, Emi Retna, Study Paper on Test Case generation for GUI Based Testing, 2012. Available from preprint arXiv:1202.4527.

[16] Xun Yuan, Myra B. Cohen, Atif M. Memon, Towards dynamic adaptive automated test generation for graphical user interfaces, in: International Conference on Software Testing, Verification and Validation Workshops, 2009, ICSTW'09, IEEE, 2009, pp. 263–266.
[17] Qing Xie, Atif M. Memon, Using a pilot study to derive a GUI model for automated testing, ACM Trans. Softw. Eng. Methodol. (TOSEM) 18 (2) (2008) 7.
[18] Xun Yuan, Myra Cohen, Atif M. Memon, Covering array sampling of input event sequences for automated GUI testing, in: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, ACM, 2007, pp. 405–408.
[19] Cemal Yilmaz, Myra B. Cohen, Adam A. Porter, Covering arrays for efficient fault characterization in complex configuration spaces, IEEE Trans. Softw. Eng. 32 (1) (2006) 20–34.
[20] Xun Yuan, Myra B. Cohen, Atif M. Memon, GUI interaction testing: incorporating event context, IEEE Trans. Softw. Eng. 37 (4) (2011) 559–574.
[21] Si Huang, Myra B. Cohen, Atif M. Memon, Repairing GUI test suites using a genetic algorithm, in: Third International Conference on Software Testing, Verification and Validation (ICST), 2010, IEEE, 2010, pp. 245–254.
[22] Megha Jhamb, Abhishek Singhal, Abhay Bansal, A survey on different approaches for efficient mutation testing, Int. J. Sci. Res. Publ. 4 (3) (2013) 513.
[23] A. Jefferson Offutt, A practical system for mutation testing: help for the common programmer, in: Proceedings of the International Test Conference, 1994, IEEE, 1994, pp. 824–830.
[24] Paul Ammann, Jeff Offutt, Introduction to software testing, Cambridge University Press, 2008.
[25] Mutlu Beyazıt, Timon Sebastian Deistler, Nida Gökçe, Event-based mutation testing vs. state-based mutation testing–comparison using a web-based system, in: Modellbasiertes Testen und Testautomatisierung (MOTES 2010), GI Jahrestagung (2), Lecture Notes in Informatics, Gesellschaft für Informatik, 2010, pp. 327–332.
[26] Fevzi Belli, Mutlu Beyazit, Event-based mutation testing vs. state-based mutation testing-an experimental comparison, in: 2011 IEEE 35th Annual Computer Software and Applications Conference (COMPSAC), IEEE, 2011, pp. 650–655.
[27] Fevzi Belli, Mutlu Beyazit, A formal framework for mutation testing, in: Fourth International Conference on Secure Software Integration and Reliability Improvement (SSIRI), 2010, IEEE, 2010, pp. 121–130.
[28] S. Brahma, A. Punganti, P.K. Pattanaik, S. Prasad, R. Mall, Model-based mutation testing of object-oriented programs, 2nd International Conference on IT & Business Intelligence, vol. 32, IEEE, 2010.
[29] Bernhard K. Aichernig, Florian Lorber, Dejan Ničković, Time for mutants – model-based mutation testing with timed automata, in: Tests and Proofs, Springer, Berlin Heidelberg, 2013, pp. 20–38.
[30] Fevzi Belli, Axel Hollmann, Sascha Padberg, Communication sequence graphs for mutation-oriented integration testing, in: Third IEEE International Conference on Secure Software Integration and Reliability Improvement, 2009, SSIRI 2009, IEEE, 2009, pp. 387–392.
[31] Jonathan Robert Okin, The Information Revolution: The Not-for-Dummies Guide to the History, Technology, and Use of the World Wide Web, Ironbound Pr, 2005.
[32] Linda Dailey Paulson, Building rich web applications with Ajax, Computer 38 (10) (2005) 14–17.
[33] Wikipedia, Dynamic Web Pages, 2013.
[34] Gregg Rothermel, Roland H. Untch, Chengyun Chu, Mary Jean Harrold, Test case prioritization: an empirical study, in: Proceedings of the IEEE International Conference on Software Maintenance, 1999, ICSM'99, IEEE, 1999, pp. 179–188.
[35] Kalyana Rao Konda, Measuring defect removal accurately, Softw. Test Perform. 2 (6) (2005) 35–39.
[36] Siripong Roongruangsuwan, Jirapun Daengdej, A test case prioritization method with practical weight factors, J. Softw. Eng. 4 (2010) 193–214.

[37] Jason Farrell, George S. Nezlek, Rich internet applications the next stage of application development, in: 29th International Conference on Information Technology Interfaces, 2007, ITI 2007, IEEE, 2007, pp. 413–418.
[38] Alessandro Bozzon, Sara Comai, Piero Fraternali, Giovanni Toffetti Carughi, Conceptual modeling and code generation for rich internet applications, in: Proceedings of the 6th International Conference on Web Engineering, ACM, 2006, pp. 353–360.
[39] Renee C. Bryce, Sreedevi Sampath, Atif M. Memon, Developing a single model and test prioritization strategies for event-driven software, IEEE Trans. Softw. Eng. 37 (1) (2011) 48–64.