
Change impact analysis and changeability assessment for a change proposal: An empirical study☆,☆☆

Xiaobing Sun a,d,∗, Hareton Leung c, Bin Li a,d, Bixin Li b

a School of Information Engineering, Yangzhou University, Yangzhou, China
b School of Computer Science and Engineering, Southeast University, Nanjing, China
c Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
d State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Article info

Article history: Received 25 August 2013; received in revised form 8 April 2014; accepted 15 May 2014; available online xxx.

Keywords: Change impact analysis; Changeability assessment; Empirical study

Abstract

Software change is a fundamental ingredient of software maintenance and evolution. Effectively supporting software modification is essential to provide reliable, high-quality evolution of software systems, as even a slight change may cause unpredictable and undesirable effects on other parts of the software. To address this issue, this work uses change impact analysis (CIA) to guide software modification. CIA can be used to help make a correct decision on a change proposal, that is, changeability assessment, and to implement effective changes for the proposal. In this article, we conduct an empirical study on three Java open-source systems to show how CIA can be used during software modification. The results indicate that: (1) assessing the changeability of a change proposal directly from the impact results of the CIA is not accurate, from the precision perspective; (2) the proposed impactness metric is an effective indicator of the changeability of a change proposal; and (3) CIA can make the change implementation process more efficient and easier.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Software change is a fundamental ingredient of software maintenance and evolution. Effectively supporting software modification is essential to provide reliable, high-quality evolution of software systems, as even a slight change may cause unpredictable and undesirable effects on other parts of the system. One of the most critical issues of the software maintenance process is to predict the impact of a change proposal (Schneidewind, 1987; Nesi, 1998; Kemerer and Slaughter, 1999; Fioravanti and Nesi, 2001; Lucia et al., 2002).

☆ A preliminary version of this article was accepted by COMPSAC 2012 as a short research-track paper. This work extends it and provides wider experimental evidence for the proposed method.
☆☆ This work is supported partially by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 13KJB520027, partially by the Open Funds of the State Key Laboratory for Novel Software Technology of Nanjing University under Grant No. KFKT2014B13, partially by the Program for New Century Excellent Talents of Yangzhou University, and partially by the Cultivating Fund for Science and Technology Innovation of Yangzhou University under Grant No. 2013CXJ025.
∗ Corresponding author. Tel.: +86 18252740912. E-mail addresses: [email protected], [email protected] (X. Sun), [email protected] (H. Leung), [email protected] (B. Li), [email protected] (B. Li).

In order to deal with a change proposal, some predictive measurement of its change ripples should be conducted. There has been a large amount of research on measurement and metrics for software development (Chidamber and Kemerer, 1994; Briand et al., 1999b; Gopal et al., 2002; Olague et al., 2007; Habra et al., 2008), but only a little on software maintenance (Bandi et al., 2003; Schneidewind, 2000). Accurate measurement is a prerequisite for all engineering disciplines, and software maintenance is no exception. Given a change proposal, software maintenance must address three problems: making a preliminary estimation of the ripple effects of the modification, determining whether to accept, reject, or further evaluate the change proposal, and implementing changes according to the change proposal. In this article, we integrate change impact analysis (CIA) and changeability assessment to perform a predictive measurement for the proposed change. CIA, often simply called impact analysis, is an approach to identifying the potential effects caused by changes made to software (Bohner and Arnold, 1996). CIA starts with a set of proposed changed elements in a software system, called the change set, and attempts to determine a possibly larger set of elements, called the impact set, that requires attention or maintenance effort (Bohner and Arnold, 1996). The impact set can facilitate the change implementation process (Li et al., 2012).


Specifically, maintainers can check the elements in the impact set to see whether they need further consideration. However, a critical threat to CIA is the accuracy of its impact set: the impact set may contain false-positives (elements in the estimated impact set that are not really impacted) and miss false-negatives (really impacted elements that are not identified) (Li et al., 2012). Our work computes a ranked list of potentially impacted elements, which helps maintainers estimate the probability that an impacted method is a false-positive. Moreover, our work can reduce false-negatives by choosing an appropriate impact set. Such impact results provide a balanced approach to CIA. In addition, changeability assessment evaluates the ease of implementing a change proposal (Board, in press); based on it, we can make a decision on the change proposal before actually implementing the change. There has been some work that used CIA to assess the changeability of a proposed change (Chaumun et al., 1999); however, as the impact set computed by CIA is often inaccurate, the computed changeability of the proposed change is also inaccurate (Sun et al., 2012).

The focus of this study is CIA and changeability assessment for code-level change proposals. As the class is the basic element of object-oriented programming, one of the most popular development paradigms, we assume that the change proposal is composed of a set of proposed changed classes. Given the class-level change proposal, we use FCA–CIA (Formal Concept Analysis–Change Impact Analysis) to calculate a ranked list of the potential impact set from these proposed changed classes, since FCA–CIA has been shown to be effective in computing change effects (Sun et al., 2012; Li et al., 2013). FCA–CIA is a cross-level CIA: it starts from proposed class-level changes and produces a list of potentially impacted methods, ranked according to an impact factor metric that corresponds to the priority with which these methods should be inspected. We then use an impactness metric based on the results of FCA–CIA to indicate the changeability of the change proposal; impactness measures the degree to which the proposed changes may affect the original system. Finally, we use the impact results from FCA–CIA to facilitate change implementation according to the change proposal. FCA–CIA has been evaluated in our previous work (Li et al., 2013). However, the following issues have not been addressed:

• Can CIA be directly used for changeability assessment?
• How can CIA be used for changeability assessment and change implementation?

To answer these two questions, we conducted empirical studies based on three open-source systems. The main contributions of this article are threefold:

• Our study shows that the impact results produced by CIA are too inaccurate for changeability assessment, which implies that other metrics should be developed for this purpose.
• The empirical studies show that the proposed impactness metric based on CIA is effective for changeability assessment, and can help users make a correct decision on accepting or rejecting a change proposal.
• Based on a user study involving 16 students fixing four bugs in the jEdit subject program, the results on time performance and the users' perception indicate that CIA can make the change implementation process more efficient and effective. To the best of our knowledge, there is no other such evaluation in the literature.

The rest of the article is organized as follows. In the next section, we discuss the background that supports CIA and changeability assessment.

Table 1
Formal context for the example program: rows are the classes C1–C6, columns are the methods M1–M12, and a cross (×) in row c and column m indicates that class c depends on method m.

Section 3 presents our work on CIA and changeability assessment for software modification. In Section 4, we conduct empirical studies to validate the effectiveness of our approach. Section 5 introduces related work in the field of CIA and changeability assessment. Finally, we present our conclusions and future work in Section 6.

2. Background

FCA–CIA is performed on a concept lattice; in this section, we introduce the necessary background. Concept lattice analysis, also called formal concept analysis (FCA), applies mathematics to the relation between entities and entity properties to infer a hierarchy of concepts (Ganter and Wille, 1986). For every binary relation between entities and their properties, a lattice can be constructed that provides insight into the structure of the original relation (Ganter and Wille, 1986). Over the last few years, FCA has been shown to be a powerful code-analysis technique for software maintenance (Snelting and Tip, 2000; Tilley et al., 2005); for more details on FCA, refer to Ganter and Wille (1986).

Typically, FCA follows three steps: (1) a formal context with formal objects and formal attributes is provided; (2) the concept lattice is generated by applying a concept-lattice construction algorithm to the formal context obtained in Step (1) (Ganter and Wille, 1986); (3) analysis (for example, CIA or refactoring (Snelting and Tip, 2000; Tilley et al., 2005)) is conducted based on properties (for example, the hierarchical property) of the concept lattice.

A formal context can be represented by a relation table in which rows are headed by classes and columns by methods. A cross in row o and column a means that formal object o (corresponding to class c) has formal attribute a (corresponding to method m); in other words, class c depends on method m, defined as follows (a code sketch of this dependence computation is given after the definition):

Definition 1 (Dependence between class and method). Given a class c and a method m in a program, class c depends on method m if and only if at least one of the following conditions is satisfied:

1. m belongs to c;
2. m belongs to a superclass of c;
3. c depends on another method k that calls m;
4. c depends on another method k that is called by m.
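To make the definition concrete, here is a minimal sketch (ours, not the authors' tool) that computes the methods a class depends on under a simplified program model: each class lists its own methods and superclasses, and a call map records the callees of each method; rules (3) and (4) are applied to a fixed point. The data is illustrative, not the program of Fig. 1.

# Minimal sketch of Definition 1 over a simplified program model.
classes = {
    "C1": {"methods": {"M1", "M2"}, "supers": []},
    "C2": {"methods": {"M3"}, "supers": ["C1"]},
}
calls = {"M1": {"M4"}, "M5": {"M4"}}  # caller -> set of callees

def dependence_set(c):
    dep = set(classes[c]["methods"])                 # rule 1: own methods
    for s in classes[c]["supers"]:
        dep |= classes[s]["methods"]                 # rule 2: superclass methods
    changed = True
    while changed:                                   # rules 3 and 4, to a fixed point
        changed = False
        for k in list(dep):
            for m in calls.get(k, ()):               # rule 3: k in dep calls m
                if m not in dep:
                    dep.add(m); changed = True
        for m, callees in calls.items():             # rule 4: m calls some k in dep
            if m not in dep and callees & dep:
                dep.add(m); changed = True
    return dep

print(sorted(dependence_set("C2")))  # ['M1', 'M2', 'M3', 'M4', 'M5']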

Table 1 shows the relation table between classes and methods for the Java program of Fig. 1; such a table forms the formal context to be analyzed. Applying the FCA technique to the formal context in Table 1 generates a set of formal concepts, composed of sets of classes sharing sets of methods, as shown in Fig. 2. A formal concept is a pair consisting of a set of formal objects (the extent) and a set of formal attributes (the intent) such that the extent consists of all formal objects that depend on the given formal attributes, and the intent consists of all formal attributes depended on by the given formal objects. The graphical representation of the concept lattice uses the simple labeling approach to represent the formal concepts in a more compact and readable form (Ganter and Wille, 1986).
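As an illustration of steps (1) and (2), the following sketch enumerates the formal concepts of a small class-method context by closing subsets of classes under the intent and extent maps; the context is a toy one, not that of Table 1.

# Minimal sketch of formal-concept enumeration for a toy context.
from itertools import chain, combinations

context = {"C1": {"M1", "M2"}, "C2": {"M2", "M3"}, "C3": {"M3"}}
all_classes = set(context)
all_methods = set(chain.from_iterable(context.values()))

def intent(cs):
    """Methods depended on by every class in cs."""
    return set.intersection(*(context[c] for c in cs)) if cs else set(all_methods)

def extent(ms):
    """Classes that depend on every method in ms."""
    return {c for c in all_classes if ms <= context[c]}

def concepts():
    """Every concept arises as (extent(intent(S)), intent(S)) for S a subset of classes."""
    found = []
    for r in range(len(all_classes) + 1):
        for s in combinations(sorted(all_classes), r):
            i = intent(set(s))
            pair = (frozenset(extent(i)), frozenset(i))
            if pair not in found:
                found.append(pair)
    return found

for e, i in concepts():
    print(sorted(e), sorted(i))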


This labeled representation is called the lattice of class and method dependence (LoCMD), as shown in Fig. 2. On the LoCMD, nodes are associated with formal concepts, while edges correspond to the containment relation between formal concepts. Each lattice node of the LoCMD in Fig. 2 is marked with an I Set (representing its intent) and an E Set (representing its extent).

Fig. 1. A simple Java example program.

3. Approach to manage a change proposal

Changes made to software often have unpredictable and undesirable ripple effects on other parts of the software. These ripple effects may sometimes be confined to a small scope, or may affect a large part of the system. Given a change proposal, there are therefore three necessary activities for software modification:

1. Predict the change ripples that may be induced by the change proposal.
2. Assess the change proposal and decide whether to accept or reject it.
3. Implement the change proposal.

In this section, we present how to perform these three activities effectively. For the first activity, we use CIA to estimate the change ripples that may be caused by the change proposal. For the second, we define a metric to estimate the degree to which the changes may affect the whole system. For the last, we use CIA to guide the software modification. The process of the three activities is shown in Fig. 3. We assume that the change proposal has already been mapped onto source code by a feature location technique (Rajlich and Wilde, 2002; Dit et al., 2013b); as the output of many feature location techniques is a set of classes (Dit et al., 2013b), we assume that the proposed change set is also composed of a set of classes. Given the original system and the proposed class-level change set, we first employ formal concept analysis to construct an intermediate representation of the program, the LoCMD introduced above. Then, we perform CIA on the LoCMD to compute the impact set from the proposed change set. Finally, we assess the changeability of the system based on the impact set produced by the impact analysis. According to the changeability results, maintainers make a decision on the change proposal.

Fig. 2. LoCMD for the simple Java example program.

Fig. 3. The process of CIA and changeability assessment.


Table 2
Impact set of {C1, C2, C5} changes.

Priority   Node   IM            IF
1          co5    M2, M3, M5    2.3
2          co1    M1            2.2
2          co2    M6            2.2
4          co7    M7, M9        1.5
4          co8    M8            1.5
4          co10   M10           1.5
7          co3    M4            1.3
7          co4    M11           1.3

If the decision is to accept the change proposal, the impact set of the CIA can be used to guide the change implementation. As CIA and changeability assessment have been realized in our previous work (Sun et al., 2012; Li et al., 2013), we use the example of Fig. 1 to present the working principle of these activities.

3.1. Change impact analysis

CIA can be used to predict the ripple effects induced by the proposed changes. Our study focuses on using FCA–CIA to predict the impact set. FCA–CIA is a cross-level technique: it computes a finer, method-level impact set from a given coarser, class-level change set. Most current approaches compute the impact set from a single element of the proposed change set, whereas FCA–CIA takes all the elements in the change set as a whole; it is therefore particularly suitable for multiple changes. Specifically, FCA–CIA performs reachability analysis on the LoCMD, which structures methods into a hierarchical order. For CIA, an IF (impact factor) metric is defined on the lattice nodes of the LoCMD:

IF_j = n + \sum_{i=1}^{n} \frac{1}{\min(\mathrm{dist}(j, i)) + 1} \qquad (1)

where n is the number of proposed changed classes from which lattice node j is upward reachable, and min(dist(j, i)) is the least number of edges that must be traversed upward from lattice node i to node j. IF represents the probability that a method is affected by the proposed changed classes; the methods in the impact set can then be ranked according to this metric.

FCA–CIA is conducted in a straightforward upward manner on the LoCMD. Its input is a set of proposed changed classes, and its output is a ranked list of potentially impacted methods. Given the proposed changed classes {c1, c2, ..., cn}, FCA–CIA proceeds as follows:

1. Determine the set of lattice nodes {co1, co2, ..., cok} labeled by {c1, c2, ..., cn} on the LoCMD.
2. Compute the set of lattice nodes upward reachable from {co1, co2, ..., cok}.
3. Calculate the impact factor of the lattice nodes obtained in the previous two steps.
4. Prioritize the lattice nodes according to their impact factor values, and provide the potentially impacted methods labeling these ordered nodes to maintainers for inspection.

We give only a brief introduction here; more details can be found in Li et al. (2013), and a sketch of the ranking computation is given below.
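The sketch below is ours, not the authors' tool: it assumes the LoCMD is available as a DAG whose upward edges, class labels, and method labels are given explicitly, and the toy lattice is illustrative rather than the LoCMD of Fig. 2. It implements Eq. (1) by upward breadth-first search from each changed class's node.

# Minimal sketch of the FCA-CIA ranking step (Eq. (1)).
from collections import deque

parents = {"co3": ["co1"], "co4": ["co1", "co2"], "co1": ["co0"], "co2": ["co0"]}
class_node = {"C1": "co3", "C2": "co4", "C5": "co4"}   # node labelled by each class
node_methods = {"co0": {"M1"}, "co1": {"M2"}, "co2": {"M3"}, "co3": set(), "co4": {"M4"}}

def upward_dists(start):
    """BFS upward from `start`: node -> least number of edges (min dist)."""
    dist, q = {start: 0}, deque([start])
    while q:
        u = q.popleft()
        for p in parents.get(u, ()):
            if p not in dist:
                dist[p] = dist[u] + 1
                q.append(p)
    return dist

def impact_factors(changed_classes):
    """IF_j = n + sum over reaching changed classes i of 1/(min dist(j, i) + 1)."""
    reach = {}
    for c in changed_classes:
        for node, d in upward_dists(class_node[c]).items():
            reach.setdefault(node, []).append(d)
    return {node: len(ds) + sum(1.0 / (d + 1) for d in ds) for node, ds in reach.items()}

ifs = impact_factors(["C1", "C2", "C5"])
for node in sorted(ifs, key=ifs.get, reverse=True):   # rank by descending IF
    print(node, round(ifs[node], 2), sorted(node_methods[node]))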

As an example, assume that {C1, C2, C5} are proposed to be changed in the Java example of Fig. 1. According to the impact factor definition, we can compute the impact factor of the upward-reachable concepts labeling the potentially impacted methods, as shown in Table 2. The columns of this table give the priority with which to check the potentially impacted methods (Priority), the lattice nodes reachable from the concepts labeled by the changed classes (Node), the potentially impacted methods labeling these reachable concepts (IM), and their impact factor values (IF). The impacted methods are prioritized as presented in Column 1: methods {M2, M3, M5} have the highest impact factor value (2.3), which suggests that they are the most probably impacted methods and should be checked and modified first.

3.2. Changeability assessment

Changeability is one of the critical features of software, because changes are continuously introduced to cope with new requirements, existing faults, change requests, and so on. Changeability assessment evaluates the degree to which a change proposal may affect other parts of the original system; the result can help determine whether a change proposal should be accepted, or which change schedule is more suitable to employ. If a change proposal affects a large part of the system, the cost of absorbing it is high, and we may reject the proposal, consider another change schedule, or even redevelop the system; if it affects only a small part of the system, we may accept it. Changeability assessment thus needs a metric that measures the ease with which a system can absorb a change proposal, and it is most useful in two cases: evaluating whether to implement a given change proposal, and comparing alternative change proposals to select the one that produces fewer ripple effects on the system. Here we use a metric, impactness, to perform the changeability assessment of a change proposal. Impactness measures the degree to which a change proposal may affect the original system. It is defined on the impact set as follows:

Impactness = \frac{\sum_{i=1}^{m} w_i\,IF_i}{\sum_{j=1}^{n} w_j\,IF_j} \times 100\% \qquad (2)

where IF_i and IF_j are the IF values of methods i and j in the impact set, and w_i and w_j are their nonnegative weights; a method that is more probably a false-positive receives a lower weight. m is the number of methods potentially impacted by the proposed changed classes, so the numerator of Eq. (2) represents the whole impact of the proposed changes. In the denominator, n is the number of methods potentially impacted when all classes in the system are assumed to be changed, so the denominator represents the impact in that extreme case. The range of impactness is thus between 0 and 1. The smaller the impactness of the changes, the smaller the risk of absorbing them. When the impactness approaches 0, the proposed changed classes have little impact on the system; when it approaches 1, almost all of the system is affected by the proposed changes, and we may decide to reject the given change proposal and search for an alternative change schedule.

The impactness metric directly handles the two typical cases of changeability assessment. Given a single change proposal, the impactness result can guide maintainers in deciding whether to implement it; given several alternative change proposals, maintainers can select the one with the smallest impactness value. In Eq. (2), w_i and w_j are closely related to the precision of an impacted method with a given IF value: if method i has a high IF value, w_i should be high. Since a method with a high IF value is more probably affected, a rough way to set w_i is to use IF_i itself. Eq. (2) then becomes:

Impactness = \frac{\sum_{i=1}^{m} IF_i^2}{\sum_{j=1}^{n} IF_j^2} \times 100\% \qquad (3)
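A minimal sketch of Eq. (3) follows, under the assumption that the IF values for the proposal's impact set and for the all-classes-changed impact set are already available (for example, computed as in the previous sketch); the numbers below are toy values, not those of the running example.

# Minimal sketch of Eq. (3): squared-IF mass of the proposal's impact set,
# relative to the mass when every class in the system is assumed changed.
def impactness(if_proposal, if_all_classes):
    num = sum(f * f for f in if_proposal.values())
    den = sum(f * f for f in if_all_classes.values())
    return num / den  # in [0, 1]; multiply by 100 for a percentage

if_proposal = {"M1": 2.2, "M2": 2.3, "M4": 1.3}
if_all_classes = {"M1": 2.8, "M2": 2.9, "M3": 2.1, "M4": 2.0, "M5": 1.7}
print(f"impactness = {impactness(if_proposal, if_all_classes):.1%}")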


We next use the running example to illustrate the computation of impactness. Assume that {C1, C2, C5} are proposed to be changed in the Java example of Fig. 1, and we assess the changeability of this change proposal. First, we compute the numerator of (3), which represents the impact of these three classes; the result is 19.9. Then, we compute the denominator, which represents the ripple effects when all six classes are changed; the result is 30.9. The impactness of the proposed changes is thus 50.1%, which indicates that the three proposed changed classes may affect about half of the system. If we instead evaluated the changeability of the change proposal directly from the impact set of the CIA, then according to the impact results in Table 2, 11 methods are probably impacted; such an assessment suggests that almost all of the program may be affected, and accordingly the change proposal should be reviewed and a better alternative considered. However, some methods predicted by the CIA are false-positives, which may lead to a wrong assessment result. Our approach therefore takes the probable accuracy of the impacted methods into account through their IF values, which should improve the effectiveness of the changeability assessment result.
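For readers tracing the arithmetic against Table 2, the stated numerator of 19.9 is reproduced exactly by summing the IF values of the eleven impacted methods with unit weights, i.e., by Eq. (2) with w_i = 1 (the squared form of Eq. (3) would give a larger value):

\sum_i w_i\,IF_i = 3(2.3) + 2(2.2) + 4(1.5) + 2(1.3) = 6.9 + 4.4 + 6.0 + 2.6 = 19.9, \quad w_i = 1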

Table 3
Research subjects.

Name      Version   Class   Method   KLoC
ArgoUML   0.22      1439    11000    149
JabRef    2.6       577     4604     74
jEdit     4.3       503     6413     104
3.3. Change implementation

When the changeability assessment result suggests that the change proposal be accepted, maintainers can implement it. The proposed change set includes the elements that definitely need changes, but these changes may induce side effects that require corresponding secondary changes to keep the system consistent. Thus we should not only modify the elements in the proposed change set, but also inspect the elements that are probably impacted by it; the change implementation can hence be guided by the impact set from the CIA. FCA–CIA computes a ranked list of potentially impacted methods from the proposed class-level changes, ordered by impact factor values that correspond to the priority with which the methods should be inspected and modified. Change implementation then proceeds as follows: maintainers implement the proposed changes, first inspecting and modifying the methods with the highest IF values; the method with the next highest IF value is then selected, and the process is repeated until all the methods in the impact set have been checked.

Take the Java program in Fig. 1 as an example, and assume {C1, C2, C5} are proposed to be changed. We use FCA–CIA to compute the impact set shown in Table 2. During change implementation, maintainers can use the impact results in this table to guide the modification: {M2, M3, M5} have the highest impact factor value, so maintainers check these three methods first to see whether they need modification; then the method with the next highest IF value is checked, and the process is repeated.
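A minimal sketch of this inspection loop follows. The early-stopping rule is an assumed heuristic (inspection can stop once it repeatedly stops yielding real changes, as the participants in our user study did); it is not part of FCA–CIA itself, and needs_change stands in for the maintainer's manual judgment.

# Minimal sketch of IF-ordered change implementation with assumed early stopping.
def implement_changes(ranked_impact, needs_change, patience=3):
    """ranked_impact: list of (method, IF) sorted by descending IF.
    needs_change: predicate deciding, on inspection, whether a method
    really has to be modified. Returns the methods actually changed."""
    changed, misses = [], 0
    for method, _if in ranked_impact:
        if needs_change(method):
            changed.append(method)
            misses = 0
        else:
            misses += 1
            if misses >= patience:   # several false-positives in a row: stop early
                break
    return changed

ranked = [("M2", 2.3), ("M3", 2.3), ("M5", 2.3), ("M1", 2.2), ("M6", 2.2), ("M4", 1.3)]
print(implement_changes(ranked, needs_change=lambda m: m in {"M2", "M5"}))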

4. Evaluation

Our study aims to help maintainers make a more informed decision on a change proposal and modify the software more efficiently to complete it. To evaluate our approach, we conducted empirical studies that investigate the following research questions:

RQ1: Can CIA predict an accurate impact set for changeability assessment in practical software modification?
RQ2: Is impactness an effective indicator for changeability assessment?
RQ3: Can CIA help to achieve better performance in software modification?

The rationale behind RQ1 is that the impact set computed by CIA should be as close as possible to the actual changes implemented to accomplish the change proposal. Some work has used CIA to assess the changeability of a change proposal (Chaumun et al., 1999), and we want to check whether the impact results of CIA can be used directly for changeability assessment. We also want to check whether the impactness metric is a good changeability indicator for a change proposal (RQ2). Finally, as CIA predicts a ranked list of impact results for a change proposal, we investigate RQ3 to see whether the impact results predicted by CIA help maintainers make the changes needed to accomplish a change proposal.

4.1. Subject programs

We selected three Java subject programs from open-source projects for our studies. The subjects were taken from the benchmarks¹ provided by Dit et al. (2013), as shown in Table 3, which lists basic information about the projects: the name of the project (Name), the evaluated version (Version), the number of classes (Class), the number of methods (Method), and kilo-lines of code (KLoC). The first subject program is ArgoUML², an open-source UML modeling tool that supports standard UML 1.4 diagrams. The second subject is JabRef³, a graphical application for managing bibliographical databases. The final subject is jEdit⁴, a text editor written in Java. The choice of these three systems is motivated by the need to have: (1) systems of different sizes that are neither too small nor too large, allowing maintainers to comprehend and implement changes to the system; (2) systems from different problem domains that are general enough to represent real-world software; and (3) sufficient historical data to facilitate comprehension and evaluation.

For each subject system, a set of bug reports was mined from its bug tracking system, such as Bugzilla⁵. When collecting the bug reports and the corresponding sets of proposed changed classes, we only included bugs that involved more than three modified classes, since FCA–CIA is more suitable for tackling multiple proposed changed classes. Specific details on the identification of the bug reports and changed classes can be found in Poshyvanyk et al. (2009). The number of bugs for each subject program is four.

4.2. Methods and measures

To answer RQ1, we used precision and recall, two widely used metrics from information retrieval (van Rijsbergen, 1979), to validate the accuracy of the CIA technique. Precision (P) is an inverse measure of false-positives, while recall (R) is an inverse measure of false-negatives. False-positives in the impact set are entities identified by the CIA technique that are not really impacted, and false-negatives are really impacted entities that the impact set fails to identify.

¹ http://www.cs.wm.edu/semeru/data/benchmarks
² http://argouml.tigris.org/
³ http://sourceforge.net/projects/jabref/
⁴ http://www.jedit.org/
⁵ http://bugzilla.mozilla.org/


Table 4
Size of the proposed change set of the subject programs.

System    Bug1   Bug2   Bug3   Bug4
ArgoUML   5      4      6      3
JabRef    2      3      3      4
jEdit     3      3      4      42

Table 5
Bugs fixed in the jEdit subject.

Bug ID    Bug description
2220033   Relative position in status bar is inaccurate
1942313   Expanding a fold twice moves caret unexpectedly
1721208   Keyboard not working properly
1593576   Autoindenting should copy exact whitespaces

These two metrics measure, in an a posteriori way, the extent to which the predicted results match the actual results. They are defined as follows:

P = \frac{|Actual\ Set \cap Estimated\ Set|}{|Estimated\ Set|} \times 100\% \qquad (4)

R = \frac{|Actual\ Set \cap Estimated\ Set|}{|Actual\ Set|} \times 100\% \qquad (5)

where Actual Set is the set of methods really changed to fix the bugs, and Estimated Set is the set of methods that may be impacted by the proposed change set according to the CIA technique. The proposed change set consists of the classes used to fix each bug, mined from the software repositories; the size of the proposed change set for each bug is shown in Table 4. With the proposed change set, we applied FCA–CIA to compute the Estimated Set (Li et al., 2013), and then computed precision and recall from the Actual Set and the Estimated Set.

For RQ2, to validate the effectiveness of the impactness metric, we propose ActualImpact% to measure the degree to which the actual class changes affect the system:

ActualImpact\% = \frac{|Actual\ Set|}{|Method\ Set|} \times 100\% \qquad (6)

where Actual Set is the set of methods actually changed during version evolution, and Method Set includes all the methods of the system. For the impactness metric, we use Eq. (3) to indicate the changeability of the change proposal; we then compare ActualImpact% with impactness to see whether our metric reflects the actual impact in a real change environment. The validation of impactness is likewise conducted a posteriori.
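The measures of Eqs. (4)–(6) are straightforward to compute from the mined change data; a minimal sketch with toy sets (not the study's data) follows.

# Minimal sketch of Eqs. (4)-(6).
def precision(actual, estimated):
    return len(actual & estimated) / len(estimated) if estimated else 0.0

def recall(actual, estimated):
    return len(actual & estimated) / len(actual) if actual else 0.0

def actual_impact(actual, all_methods):
    return len(actual) / len(all_methods)

actual = {"M1", "M2", "M5"}
estimated = {"M1", "M2", "M3", "M4"}
print(f"P = {precision(actual, estimated):.0%}, R = {recall(actual, estimated):.0%}")
print(f"ActualImpact% = {actual_impact(actual, set(f'M{i}' for i in range(1, 21))):.0%}")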

Finally, we conducted a user study to answer RQ3. It involved 16 participants (4 graduate students and 12 senior undergraduate students) from our department of computer science and software engineering, divided into two groups (G1 and G2) based on a pre-evaluation survey of their capability (for example, software development experience and familiarity with Java and Eclipse) so as to allow a fair comparison of the two groups on the same set of tasks. The change implementation task in our study is bug fixing. The general survey of the participants' experience is shown in Fig. 4.

G1 was the experimental group, which used the impact results from FCA–CIA for the bug-fixing tasks; participants in G1 were not allowed to use the Eclipse IDE to perform bug correction. They used the proposed changed classes to fix the bugs and complete the proposed changes. G2 was the control group, which used the Eclipse IDE or their own experience to perform the same set of tasks. For the study on time performance, considering the complexity of the subject systems and the participants' familiarity with them, we chose jEdit as the subject system. The selected bugs, shown in Table 5, form the task set of our study. The participants performed the work individually in our lab and were given half a day to accomplish their tasks; each participant was assigned one bug to fix. After each task, they filled in a post-study questionnaire giving feedback on the difficulty of the task, the time needed to complete it, and their experience of bug fixing. All participants were asked to rate the difficulty of the bug-fixing tasks (1 being very easy and 5 being very hard).

4.3. Empirical results and analysis

In this section, we report the results of the empirical studies.

4.3.1. Study 1

The purpose of RQ1 is to see whether CIA can offer accurate changeability assessment results. Some traditional changeability assessment approaches rely on the impact set results, including their size and accuracy, to evaluate the changeability of a change proposal; some researchers assume that the bigger the impact set, the worse the changeability. We need to check whether this assumption is correct. First, we examine whether the size of the impact set is a reliable basis for assessment. Table 6 shows the size of the impact set for the different bugs; the impact sets for different bugs differ. Table 4 shows the size of the proposed change set for each bug fix. The proposed change sets of some bugs have the same size, for example, Bug2 and Bug3 of the JabRef subject and Bug1 and Bug2 of the jEdit subject. Yet their corresponding impact sets in Table 6 differ in size: 37 and 43 for Bug2 and Bug3 of JabRef, and 34 and 31 for Bug1 and Bug2 of jEdit. So from the size of the impact set alone, it is difficult to give an accurate changeability assessment.

In addition, we computed the precision and recall of the impact set to see whether the CIA produces accurate results. Table 7 shows the precision and recall results for the different bug fixes. The recall results are high, all above 70%, showing that the CIA can estimate most of the change effects.

Fig. 4. Programming experience of the participants.

Table 6
Size of the impact set of the subject programs.

System    Bug1   Bug2   Bug3   Bug4
ArgoUML   57     49     82     31
JabRef    26     37     43     46
jEdit     34     31     51     32


Table 7
Precision and recall of the impact set for the different bug predictions.

          Bug1         Bug2         Bug3         Bug4
System    P     R      P     R      P     R      P     R
ArgoUML   9%    82%    7%    80%    10%   84%    12%   79%
JabRef    11%   75%    14%   73%    7%    76%    8%    81%
jEdit     13%   77%    11%   89%    9%    82%    15%   79%

However, the precision results in Table 7 are not as satisfactory as the recall results: all are 15% or lower, and many are below 10%. This shows that most of the elements in the impact set are false-positives, i.e., elements identified by the CIA technique that are not really impacted; reducing such false-positives remains one of the main pursuits of the CIA research community. Hence, from the size and the precision of the impact results of CIA, we conclude that CIA alone cannot produce an accurate changeability assessment.

4.3.2. Study 2

For RQ2, i.e., whether impactness can be used effectively for changeability assessment, we computed impactness and ActualImpact%. Fig. 5 shows the impactness and ActualImpact% values for the different bug fixes on the three subject programs. The impactness estimates for all change proposals are higher than the ActualImpact% values; for example, the impactness of Bug1 of ArgoUML is 36% while its ActualImpact% is 27%. For the other bugs of these subject programs, the relative difference between the two is less than 10%. In other words, although the impact induced by the change proposals is overestimated, the difference is not large. In addition, Fig. 5 shows that the variation tendency of the impactness values is in accord with that of the ActualImpact% values; that is, given two or more change proposals, our changeability assessment model can accurately reflect which proposal has less impact on the original system. Despite the overestimation, impactness can thus be regarded as a conservative evaluation of changeability, deviating about 10% above the actual impact on the system; for example, when the impactness of a change proposal is 50%, its actual impact on the whole system may be a little lower.

4.3.3. Study 3

For RQ3, we would like to see whether the impact set provided by CIA can be used effectively in practice. We measured the time performance of the two groups when fixing the jEdit bugs. Table 8 shows the time each participant spent on bug fixing; as the times could not be recorded exactly, we report time ranges. The results show that, for each bug, the time of the experimental group (G1), assisted by CIA, is less than that of the control group (G2) using Eclipse.

Table 8
Performance of the participants in fixing jEdit bugs.

Bug        G1 participant   G1 time       G2 participant   G2 time
2220033    P1               0–30 min      P9               90–120 min
           P2               0–30 min      P10              60–90 min
1942313    P3               30–60 min     P11              90–120 min
           P4               0–30 min      P12              90–120 min
1721208    P5               30–60 min     P13              60–90 min
           P6               60–90 min     P14              120–150 min
1593576    P7               0–30 min      P15              90–120 min
           P8               0–30 min      P16              90–120 min
AVG                         15–45 min                      90–120 min

Fig. 5. Impactness and ActualImpact% for the different bug predictions.

The average time performance of the two groups is shown in the last row of Table 8: the average time of G1 is 15–45 min and that of G2 is 90–120 min. This illustrates that CIA can be used for efficient change implementation (in this case, bug fixing). In addition, all participants were asked to rate the difficulty of their bug-fixing tasks; the results are shown in Table 9. G2 rated the difficulty at 4 on average, while G1 rated it at 2.4 on average.

Table 9
Rated difficulty of the bug fixing tasks (number of participants per rating).

Group   1   2   3   4   5   AVG
G1      0   5   3   0   0   2.4
G2      0   0   2   4   2   4


Thus, from the users' perception of the difficulty involved in bug fixing, CIA helps to perform the bug-fixing task more easily. Most participants in G2 stated that they relied on structural analysis and their own experience to implement the changes, and that it was hard to decide when the changes were finished. Most participants in G1 used the results from FCA–CIA to facilitate change implementation: they first finished the changes in the proposed change set, then used FCA–CIA to estimate an ordered method-level impact set, and finally inspected the methods in the impact set in the order of their IF values to see whether they needed changes. Usually the remaining 40–50% of the methods in the impact set needed no changes when checked, at which point the change process could be terminated. Hence, FCA–CIA effectively facilitates change implementation. From both the time performance and the users' perception, the results strongly suggest that CIA can be used effectively in practical change implementation.

4.4. Threats to validity

Like any empirical validation, our study has its limitations; in the following, we discuss some threats to the validity of our empirical study. First, we conducted empirical studies on only three subject programs, so we cannot guarantee that the results generalize to other, more complex or arbitrary subjects. However, our subjects are selected from open-source projects, have been widely employed in experimental studies, and belong to different problem domains that are general enough to represent real-world software systems. Moreover, the historical repositories of these subjects are available online, which makes it easy for others to repeat the study.

A second concern is the CIA technique selected for the changeability assessment study. We used FCA–CIA to compute the impact set (Li et al., 2013) and assessed its effectiveness for changeability assessment; other CIA techniques may be more suitable for changeability assessment. However, in our previous evaluation, FCA–CIA was effective in predicting change effects compared to other CIA techniques (Li et al., 2013).

A third threat to the validity of our study is the difference in the capabilities of the participants. Specifically, we cannot eliminate the threat that the experimental group and the control group had different capabilities. To alleviate this, we allocated participants to the groups based on a survey of their capabilities, such as their project experience and familiarity with Java and Eclipse.

Another threat concerns the evaluation of RQ3, which is intended to show that CIA can help achieve better performance in software modification. Software modification may include other tasks, for example, implementing new requirements and change requests; in our study, software modification covers only bug fixing. However, bug fixing is one of the typical software modification tasks.

Finally, we used the IF value in place of the weight in the impactness definition, which may lead to the overestimation of impactness. Selecting different weight values may lead to different impactness results; for example, the weights might be chosen from a different perspective, such as the precision associated with different IF values.

5. Related work

In this section, we introduce related work from two aspects: (1) CIA, and (2) changeability assessment.

5.1. Change impact analysis

Change impact analysis can be performed based on static analysis (Abdi et al., 2009; Sun et al., 2010, 2011; Poshyvanyk et al., 2009; Petrenko and Rajlich, 2009; Kagdi et al., 2010) or dynamic analysis (Orso and Harrold, 2003; Law and Rothermel, 2003; Apiwattanapong et al., 2005). Some CIA techniques also combine static and dynamic analysis (Ren et al., 2004; Zhang et al., 2008; Cavallaro and Monga, 2009; Gethers et al., 2012). FCA–CIA focuses on static analysis of the program, so we introduce only related work on static CIA. Static CIA techniques take all possible behaviors and inputs into account and are often performed by analyzing the syntactic and semantic dependences of the program (Bohner and Arnold, 1996; Li et al., 2012). Static analysis includes structural analysis, textual analysis, and historical analysis.

Structural dependencies between program entities are crucial to CIA: if a program entity changes, other dependent entities might also have to change (Petrenko and Rajlich, 2009; Sun et al., 2010, 2011, 2011a). Structural static analysis usually analyzes these dependencies to build an intermediate representation of the program, and then computes the impact set by transitive closure on the representation. FCA–CIA also performs structural static analysis: it uses FCA to produce an intermediate representation of the program, the LoCMD. The LoCMD is a concept lattice that describes the dependences between classes and methods of object-oriented Java programs. It is a compact and effective representation, in which one node can express both the classes in its extent and the methods in its intent; thus the LoCMD is smaller than a traditional dependence graph (Horwitz et al., 1990; Ferrante et al., 1987; Sun et al., 2011a), in which each class and method is expressed by its own node. Although our representation loses some information (the different dependency types attached to the edges of a traditional dependence graph), it meets the demands of CIA. Additionally, the LoCMD organizes methods into a hierarchical order, which helps define the probability of each method belonging to the impact set; with these impact results, maintainers can focus their inspection on the elements most probably affected. Concept analysis has also been combined with program slicing to perform fine-grained CIA at the statement level (Tonella, 2003); FCA–CIA, by contrast, is cross-level, computing a method-level impact set from a class-level proposed change set.

Textual analysis extracts conceptual dependence (conceptual coupling) from the textual, non-structural information in the code (Poshyvanyk et al., 2009; Gethers and Poshyvanyk, 2010). These coupling measures provide a new perspective compared to traditional structural coupling measures (Briand et al., 1999a). Conceptual coupling measures the degree to which the identifiers and comments of different classes relate to each other, and such measures can rank the classes of a software system based on different types of dependencies among them (Poshyvanyk et al., 2009; Gethers and Poshyvanyk, 2010). Classes strongly coupled with a proposed changed class are considered the most probably impacted classes and need inspection. The granularity analyzed is the class level, while FCA–CIA is cross-level, estimating a method-level impact set from a class-level proposed change set.

Historical analysis mines information from multiple evolutionary versions in software historical repositories (Zimmermann et al., 2005; Canfora and Cerulo, 2006; Kagdi et al., 2010; Nguyen et al., 2011; Gethers et al., 2012). With this technique, evolutionary dependencies between program entities that cannot be distilled by traditional program analysis techniques


can be mined from these repositories. Evolutionary dependencies suggest that entities that have (historically) changed together in software repositories, i.e., co-changes, may need to change together when one or some of them are changed during future software evolution; CIA is then computed based on these co-change dependencies. Such CIA techniques need multiple versions of the subject program from the software historical repository and usually generate the impact set at the coarse class (file) level, whereas FCA–CIA needs only the current version of the program and produces the impact set at the method level.

5.2. Changeability assessment

Changeability is an important non-functional quality attribute of software maintainability (Board, in press). Research into software changeability assessment includes changeability predictors based on measurable factors of software maintenance activity (Riaz et al., 2009). Current studies on changeability assessment use design metrics as changeability indicators, e.g., cohesion, complexity, and coupling (Genero et al., 2001; Bandi et al., 2003; Riaz et al., 2009; Chaumun et al., 2000). In addition, Fluri (2007) proposed a changeability assessment model based on a taxonomy of change types and a classification of these change types in terms of change significance levels for consecutive versions of software entities; according to this model, each source code entity is classified as low, medium, or high, and maintainers select an appropriate modification strategy according to the changeability of the source code. However, these studies focus mainly on system quality and rarely consider the concrete change proposal when predicting the changeability of the system, even though the maintenance effort and cost of different change proposals may differ during change implementation. Some studies have therefore used CIA to estimate the impact set of a change proposal and assess its changeability (Chaumun et al., 1999; Sun et al., 2012). For example, Chaumun et al. (1999) proposed a changeability assessment model that relies on computing the impact of proposed class changes; the change impact model considers the types of dependence between classes, and changeability is then predicted from the impact results of the proposed class changes. In our previous work, changeability assessment is also based on the impact results computed by CIA, and an impactness metric indicates the ability of the software to absorb a change proposal (Sun et al., 2012). In this article, we extend the approach of Sun et al. (2012) and provide wider experimental evidence for the impactness metric for changeability assessment: on the one hand, we show the effectiveness of changeability assessment on more general subject programs (ArgoUML, JabRef, jEdit); on the other hand, we conduct user studies with the jEdit subject program to show the effectiveness of CIA in practice.

6. Conclusion and future work

Given a change proposal, the expected change ripples and the changeability of the system to absorb the proposal should be estimated. This article used CIA to assess the changeability of a change proposal and to guide its implementation, and empirical studies were conducted to evaluate our approach. The results show that the raw impact results of CIA alone are not an effective basis for changeability assessment.
In addition, the proposed impactness metric based on FCA–CIA is effective for changeability assessment of a change proposal. The results from our case studies based on the jEdit subject program show that CIA can make the change implementation process more efficient and easier in practice.


Though we have shown the effectiveness of our approach on some real-world subject programs, it may not generalize to other programs. We will conduct more experiments on more complex and large-scale programs to evaluate the generality of our research. We also want to extend the user study to generate stronger evidence for the effectiveness of FCA–CIA usage by programmers. In addition, we will further investigate the selection of the weights in the impactness formula to assess the changeability of a system more accurately. Finally, we will study other design metrics, such as coupling between object classes, lack of cohesion of methods, and depth of inheritance tree, as indicators of software changeability.

Acknowledgments

The authors would like to thank Qiandong Zhang for his cooperation in developing the FCA–CIA tool and building the experimental environment. Special thanks to the participants of our empirical study.

References

Abdi, M.K., Lounis, H., Sahraoui, H.A., 2009. Predicting change impact in object-oriented applications with Bayesian networks. In: Proceedings of the IEEE Conference on Computer Software and Applications, pp. 234–239.
Abdi, M.K., Lounis, H., Sahraoui, H.A., 2009. A probabilistic approach for change impact prediction in object-oriented systems. In: Proceedings of the Workshops of the 5th Conference on Artificial Intelligence Applications and Innovations, pp. 89–200.
Apiwattanapong, T., Orso, A., Harrold, M.J., 2005. Efficient and precise dynamic impact analysis using execute-after sequences. In: Proceedings of the International Conference on Software Engineering, pp. 432–441.
Bandi, R.K., Vaishnavi, V.K., Turk, D.E., 2003. Predicting maintenance performance using object-oriented design complexity metrics. IEEE Trans. Softw. Eng. 29 (1), 77–87.
Board. IEEE Standard Glossary of Software Engineering Terminology – IEEE Std 610.12-1990.
Bohner, S., Arnold, R., 1996. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA, USA.
Briand, L.C., Daly, J., Wüst, J., 1999a. A unified framework for coupling measurement in object oriented systems. IEEE Trans. Softw. Eng. 25 (1), 91–121.
Briand, L.C., Daly, J.W., Wüst, J.K., 1999b. A unified framework for coupling measurement in object-oriented systems. IEEE Trans. Softw. Eng. 25 (1), 91–121.
Canfora, G., Cerulo, L., 2006. Fine grained indexing of software repositories to support impact analysis. In: Proceedings of the International Workshop on Mining Software Repositories, pp. 105–111.
Cavallaro, L., Monga, M., 2009. Unweaving the impact of aspect changes in AspectJ. In: Proceedings of the Workshop on Foundations of Aspect-Oriented Languages, pp. 13–18.
Chaumun, M.A., Kabaili, H., Keller, R.K., Lustman, F., 1999. A change impact model for changeability assessment in object-oriented software systems. In: Proceedings of the European Conference on Software Maintenance and Reengineering, pp. 130–138.
Chaumun, M.A., Kabaili, H., Keller, R.K., Lustman, F., Denis, G.S., 2000. Design properties and object-oriented software changeability. In: Proceedings of the Conference on Software Maintenance and Reengineering, p. 45.
Chidamber, S.R., Kemerer, C.F., 1994. A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 20 (6), 476–493.
Dit, B., Holtzhauer, A., Poshyvanyk, D., Kagdi, H., 2013. A dataset from change history to support evaluation of software maintenance tasks. In: Proceedings of the 10th Working Conference on Mining Software Repositories, pp. 131–134.
Dit, B., Revelle, M., Gethers, M., Poshyvanyk, D., 2013b. Feature location in source code: a taxonomy and survey. J. Softw.: Evol. Process 25 (1), 53–95.
Ferrante, J., Ottenstein, K., Warren, J., 1987. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9 (3), 319–349.
Fioravanti, F., Nesi, P., 2001. Estimation and prediction metrics for adaptive maintenance effort of object-oriented systems. IEEE Trans. Softw. Eng. 27 (12), 1062–1084.
Fluri, B., 2007. Assessing changeability by investigating the propagation of change types. In: Proceedings of the 29th International Conference on Software Engineering, pp. 97–98.
Ganter, B., Wille, R., 1986. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin.
Genero, M., Olivas, J., Piattini, M., Romero, F., 2001. Using metrics to predict OO information systems maintainability. In: Proceedings of the 13th International Conference on Advanced Information Systems Engineering, pp. 388–401.
Gethers, M., Dit, B., Kagdi, H., Poshyvanyk, D., 2012. Integrated impact analysis for managing software changes. In: Proceedings of the 2012 International Conference on Software Engineering, pp. 430–440.


Gethers, M., Poshyvanyk, D., 2010. Using relational topic models to capture coupling among classes in object-oriented software systems. In: Proceedings of the 2010 IEEE International Conference on Software Maintenance, pp. 1–10.
Gopal, A., Krishnan, M.S., Mukhopadhyay, T., Goldenson, D.R., 2002. Measurement programs in software development: determinants of success. IEEE Trans. Softw. Eng. 28 (9), 863–875.
Habra, N., Abran, A., Lopez, M., Sellami, A., 2008. A framework for the design and verification of software measurement methods. J. Syst. Softw. 81 (5), 633–648.
Horwitz, S., Reps, T., Binkley, D., 1990. Interprocedural slicing using dependence graphs. ACM Trans. Program. Lang. Syst. 12 (1), 27–60.
Kagdi, H., Gethers, M., Poshyvanyk, D., Collard, M., 2010. Blending conceptual and evolutionary couplings to support change impact analysis in source code. In: Proceedings of the IEEE Working Conference on Reverse Engineering, pp. 119–128.
Kemerer, C.F., Slaughter, S., 1999. An empirical approach to studying software evolution. IEEE Trans. Softw. Eng. 25 (4), 493–509.
Law, J., Rothermel, G., 2003. Whole program path-based dynamic impact analysis. In: Proceedings of the International Conference on Software Engineering, pp. 308–318.
Li, B., Sun, X., Keung, J., 2013. FCA–CIA: an approach of using FCA to support cross-level change impact analysis for object-oriented Java programs. Inform. Softw. Technol. 55 (8), 1437–1449.
Li, B., Sun, X., Leung, H., Zhang, S., 2012. A survey of code-based change impact analysis techniques. Softw. Test. Verif. Reliab., http://dx.doi.org/10.1002/stvr.1475.
Lucia, A.D., Pompella, E., Stefanucci, S., 2002. Effort estimation for corrective software maintenance. In: Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering, pp. 409–416.
Nesi, P., 1998. Managing OO projects better. IEEE Softw. 15 (4), 50–60.
Nguyen, A.T., Nguyen, T.T., Al-Kofahi, J., Nguyen, H.V., Nguyen, T.N., 2011. A topic-based approach for narrowing the search space of buggy files from a bug report. In: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pp. 263–272.
Olague, H.M., Etzkorn, L.H., Gholston, S., Quattlebaum, S., 2007. Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes. IEEE Trans. Softw. Eng. 33 (6), 402–419.
Orso, A., Harrold, M.J., 2003. Leveraging field data for impact analysis and regression testing. In: Proceedings of the ACM SIGSOFT Symposium on Foundations of Software Engineering, pp. 128–137.
Petrenko, M., Rajlich, V., 2009. Variable granularity for improving precision of impact analysis. In: Proceedings of the International Conference on Program Comprehension, pp. 10–19.
Poshyvanyk, D., Marcus, A., Ferenc, R., Gyimothy, T., 2009. Using information retrieval based coupling measures for impact analysis. Empir. Softw. Eng. 14 (1), 5–32.
Rajlich, V., Wilde, N., 2002. The role of concepts in program comprehension. In: Proceedings of the 10th International Workshop on Program Comprehension, pp. 271–278.
Ren, X., Shah, F., Tip, F., Ryder, B.G., Chesley, O., 2004. Chianti: a tool for change impact analysis of Java programs. In: Proceedings of the International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 432–448.
Riaz, M., Mendes, E., Tempero, E., 2009. A systematic review of software maintainability prediction and metrics. In: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, pp. 367–377.
Schneidewind, N.F., 1987. The state of software maintenance. IEEE Trans. Softw. Eng. 13 (3), 303–310.
Schneidewind, N.F., 2000. Software quality control and prediction model for maintenance. Ann. Softw. Eng. 9 (1–4), 79–101.
Snelting, G., Tip, F., 2000. Reengineering class hierarchies using concept analysis. ACM Trans. Program. Lang. Syst. 22 (3), 540–582.
Sun, X., Li, B., Li, B., Wen, W., 2012. A comparative study of static CIA techniques. In: Proceedings of the Fourth Asia-Pacific Symposium on Internetware, pp. 23:1–23:8.
Sun, X., Li, B., Zhang, Q., 2012. A change proposal driven approach for changeability assessment using FCA-based impact analysis. In: Proceedings of the 36th Annual IEEE Computer Software and Applications Conference, pp. 328–333.
Sun, X., Li, B., Zhang, S., Tao, C., 2011a. HSM-based change impact analysis of object-oriented Java programs. Chin. J. Electr. 20 (2), 247–251.
Sun, X.B., Li, B.X., Tao, C.Q., Wen, W.Z., Zhang, S., 2010. Change impact analysis based on a taxonomy of change types. In: Proceedings of the International Conference on Computer Software and Applications, pp. 373–382.
Sun, X.B., Li, B.X., Zhang, S., Tao, C.Q., Chen, X., Wen, W.Z., 2011b. Using lattice of class and method dependence for change impact analysis of object-oriented programs. In: Proceedings of the Symposium on Applied Computing, pp. 1444–1449.
Tilley, T., Cole, R., Becker, P., Eklund, P., 2005. A survey of formal concept analysis support for software engineering activities. In: Ganter, B., Stumme, G., Wille, R. (Eds.), Formal Concept Analysis. Lecture Notes in Computer Science, vol. 3626. Springer, Berlin/Heidelberg, pp. 250–271.
Tonella, P., 2003. Using a concept lattice of decomposition slices for program understanding and impact analysis. IEEE Trans. Softw. Eng. 29 (6), 495–509.
van Rijsbergen, C.J., 1979. Information Retrieval. Butterworths, London.
Zhang, S., Gu, Z., Lin, Y., Zhao, J.J., 2008. Change impact analysis for AspectJ programs. In: Proceedings of the International Conference on Software Maintenance, pp. 87–96.
Zimmermann, T., Weissgerber, P., Diehl, S., Zeller, A., 2005. Mining version histories to guide software changes. IEEE Trans. Softw. Eng. 31 (6), 429–445.

Xiaobing Sun is an assistant professor in the School of Information Engineering at Yangzhou University. He received his Ph.D. from Southeast University in 2012. His research interests include change comprehension, analysis, and testing. He has published more than 30 papers in refereed international journals (STVR, IST, IJSEKE, ADES, etc.) and conferences (COMPSAC, ASE, QSIC, etc.). He is a CCF and ACM member.

Hareton Leung joined Hong Kong Polytechnic University in 1994 and is now Director of the Lab for Software Development and Management. He serves on the Editorial Board of the Software Quality Journal. He is a Fellow of the Hong Kong Computer Society, Chairperson of its Quality Management Division (QMSID), and ex-Chairperson of HKSPIN.

Bin Li is a professor in the School of Information Engineering at Yangzhou University. His research interests include software modelling, data mining, and intelligent analysis. He has published more than 100 papers in international journals and conferences.

Bixin Li is a Professor in the School of Computer Science and Engineering at Southeast University, Nanjing, China. His research interests include program slicing and its applications; software evolution and maintenance; and software modeling, analysis, testing, and verification. He has published over 100 articles in refereed conferences and journals. He leads a young research group named ISEU.
