MSR4SM: Using topic models to effectively mine software repositories for software maintenance tasks

Information and Software Technology 66 (2015) 1–12

Contents lists available at ScienceDirect

Information and Software Technology
Journal homepage: www.elsevier.com/locate/infsof

Xiaobing Sun a,b,*, Bixin Li c, Hareton Leung d, Bin Li a, Yun Li a

a School of Information Engineering, Yangzhou University, Yangzhou, China
b State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
c School of Computer Science and Engineering, Southeast University, Nanjing, China
d Department of Computing, Hong Kong Polytechnic University, Hong Kong, China

Article info

Article history: Received 14 October 2014; Received in revised form 11 May 2015; Accepted 13 May 2015; Available online 21 May 2015

Keywords: Software maintenance; Mining software historical repositories; Topic model; Empirical study

Abstract

Context: Mining software repositories has emerged as a research direction over the past decade, achieving substantial success in both research and practice in supporting various software maintenance tasks. Software repositories include bug repositories, communication archives, source control repositories, etc. When these repositories are used to support software maintenance, the inclusion of irrelevant information in each repository can lead to decreased effectiveness or even wrong results.

Objective: This article aims at selecting the relevant information from each of the repositories to improve the effectiveness of software maintenance tasks.

Method: For a maintenance task at hand, maintainers need to implement the maintenance request on the current system. In this article, we propose an approach, MSR4SM, that extracts the relevant information from each software repository based on the maintenance request and the current system. That is, if the information in a software repository is relevant to either the maintenance request or the current system, it should be included when performing the current maintenance task. MSR4SM uses a topic model to extract the topics from these software repositories; the relevant information in each repository is then extracted based on the topics.

Results: MSR4SM is evaluated on two software maintenance tasks, feature location and change impact analysis, using four subject systems: jEdit, ArgoUML, Rhino and KOffice. The empirical results show that the effectiveness of traditional software repositories based maintenance tasks can be greatly improved by MSR4SM.

Conclusions: There is a lot of irrelevant information in software repositories. Before using them to implement a maintenance task at hand, we need to preprocess them; the effectiveness of the software maintenance tasks can then be improved.

© 2015 Elsevier B.V. All rights reserved.

* A preliminary edition of this article was accepted by ICIS 2014 as a research track paper. This work extends and provides wider experimental evidence of the proposed method. This work is supported partially by the Natural Science Foundation of China under Grant Nos. 61402396 and 61472344, partially by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 13KJB520027, and partially by the Open Funds of the State Key Laboratory for Novel Software Technology of Nanjing University under Grant No. KFKT2014B13.
Corresponding author at: School of Information Engineering, Yangzhou University, Yangzhou, China. E-mail addresses: [email protected] (X. Sun), [email protected] (B. Li), [email protected] (H. Leung), [email protected] (B. Li), [email protected] (Y. Li).

http://dx.doi.org/10.1016/j.infsof.2015.05.003
0950-5849/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Software maintenance has been recognized as the most difficult, costly and labor-intensive activity in the software development life cycle [1]. Effectively supporting software maintenance is essential to the reliable, high-quality evolution of software systems. The nature of software maintenance is to deal with the changes that occur during software evolution [2]. To effectively manage and control these changes, software repositories such as source code changes, bug repositories, communication archives, deployment logs, and execution logs are used to record various information about them [3–5]. Hence, these repositories store a sea of data about the changes made during software development and evolution. Given a new software maintenance request, the data recorded in these repositories can be used to understand, analyze, implement and verify the new request. The software engineering (SE) community has analyzed such repositories to perform software maintenance tasks, for example, bug prediction, testing, and impact analysis [5–9]. These studies have shown that interesting and practical results can be obtained from mining software repositories, allowing maintainers or managers to better support software evolution and ultimately improve software quality.

Software repositories such as source control repositories, bug repositories, and archived communications are commonly used to record information about the evolution and progress of the software. The SE community analyzes and explores the rich data available in software repositories to uncover interesting and actionable information (for example, co-change coupling) about software systems [5,10]. Traditional software maintenance tasks usually use the information in software repositories directly, with little examination of its relevance. In practice, a software system may have a long history of evolution, and the software repositories contain extensive information about the current system. The problem is how far back, and how much, of the information in each repository should be used to support maintenance of the current software. In addition, using different versions of the software may yield different results for the same software maintenance technique [5,11], and the inclusion of irrelevant information in software repositories may lead to decreased effectiveness or even wrong results [12,13].

There are many data mining and information retrieval techniques for mining software repositories [14,15]. However, if the quality of the target data is poor, even the best data mining or information retrieval techniques may be of little use [16]. This also applies to mining software data. Hence, the main research question becomes: "What information should be exploited from each of the repositories to support software maintenance tasks?" To the best of our knowledge, there is still little work addressing this issue.

In this article, we investigate how to effectively use the information in each repository for software maintenance. Since irrelevant information in a software repository is noise that interferes with the effectiveness of software maintenance tasks, the repositories need to be preprocessed. That is, we need to remove the irrelevant information and select only the relevant information in each software repository to support various software maintenance tasks. To select the relevant information from each repository, we require that the extracted information be relevant to the maintenance request and the current system; the relevant information can then be used more effectively to support the implementation of the change request. As the rich data in software repositories can be viewed as unstructured text, we need an effective approach to retrieve the relevant information. Topic models are a popular way to analyze unstructured text in other domains such as the social sciences and computer vision [17,18]; they aim to uncover relationships between words and documents. Here, we propose a preprocessing step, applied before software repositories are used directly, that employs a topic model to help select the relevant information from each of the software repositories.
For example, for a particular maintenance task, it is better to consider bug reports #1, #3, and #5 because they are relevant to the maintenance request, and to ignore #2 and #4 because they are irrelevant. After this preprocessing step of extracting relevant information with the topic model, the effectiveness of traditional software repositories based techniques for software maintenance tasks is expected to improve. The main contributions of this article are summarized as follows:

- We introduce an approach, MSR4SM, which uses a topic model to preprocess the information in software repositories before using them to support software maintenance. Based on our approach, irrelevant information in each of the repositories is removed.
- We conduct empirical studies on two open source software systems, jEdit and Rhino, to evaluate the effectiveness of a feature location technique. The empirical results show that MSR4SM can improve the effectiveness of traditional software repositories based feature location.
- We conduct empirical studies on three open source software systems, jEdit, ArgoUML and KOffice, to evaluate the effectiveness of a change impact analysis technique. The empirical results show that MSR4SM can improve the effectiveness of traditional software repositories based change impact analysis.

The rest of the article is organized as follows: in the next section, we introduce the topic model used to support information retrieval from software repositories, the background of software repositories, and some software maintenance tasks that are performed based on software repositories. In Section 3, we present MSR4SM, which extracts the relevant information from each of the software repositories. In Section 4, empirical studies are conducted to show the effectiveness of MSR4SM. Related work in mining software repositories is discussed in Section 5. Finally, we conclude and identify future work in Section 6.

2. Background

MSR4SM uses a topic model to preprocess the software repositories and extract the information relevant to a software maintenance task. In this section, we first introduce the topic model. Then, we discuss software repositories and two software maintenance tasks based on software repositories.

2.1. Topic model

As the information stored in software repositories is mostly unstructured text, various ways to process such unstructured information have been proposed. An increasingly popular way is to use topic models, which focus on uncovering relationships between words and documents [19]. Topic models originated in the fields of natural language processing and information retrieval, where they are used to index, search, and cluster large collections of unstructured and unlabeled documents [19]. A topic is a collection of terms that co-occur frequently in the documents of the corpus. One of the most commonly used topic models in the software engineering community is Latent Dirichlet Allocation (LDA) [4,20]. It has been widely applied to support various software maintenance tasks: feature location, change impact analysis, bug localization, and many others [21–23].

LDA is a probabilistic generative topic model for collections of discrete data such as text corpora [19]. It models each document as a mixture of K corpus-wide topics, and each topic as a mixture of the terms in the corpus [19]. More specifically, there is a set of topics that describes the entire corpus; each document can contain more than one of these topics, and each term in the corpus can belong to more than one of them. Hence, LDA is able to discover a set of ideas or themes that describe the entire corpus well. LDA assumes that the documents were generated using the probability distribution of the topics, and that the words in the documents were generated probabilistically in a similar way [19]. With LDA, such latent topics can be mined. In this article, we use LDA to extract the latent topics from various software artifacts in software repositories.
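To make this concrete, the following minimal sketch extracts latent topics from a handful of repository documents with an off-the-shelf LDA implementation (the gensim library). The documents and parameter values are hypothetical illustrations, not part of MSR4SM itself, which tunes the LDA parameters with LDA-GA (Section 4.3):

import warnings
from gensim import corpora, models

# Each repository artifact (bug report, email thread, commit log) is one
# document, already tokenized and stripped of stop words and keywords.
documents = [
    ["editor", "crash", "save", "buffer"],
    ["render", "diagram", "export", "svg"],
    ["buffer", "overflow", "save", "patch"],
]

dictionary = corpora.Dictionary(documents)                 # term <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in documents]    # bag-of-words vectors

# num_topics, alpha, passes are illustrative; a real run tunes them per system.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      alpha="auto", passes=10, random_state=1)

for topic_id, terms in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [term for term, _ in terms])           # top terms per latent topic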


2.2. Software repositories based software maintenance

In this article, we focus on software maintenance tasks that are supported by mining the software repositories related to a project. These repositories mainly include source control repositories, bug repositories, and communication archives; they are described in Table 1. Historical analysis of the software repositories is performed by mining information from multiple evolutionary versions of the software. However, as the project evolves, some information in the repositories may become outdated. Therefore, the information should be preprocessed to better support software maintenance. In this article, we focus on selecting the useful or relevant information from each of the software repositories to support practical software maintenance tasks. The relevant information from software repositories is considered to be that related to the current software system and the maintenance request proposed by users.

In the following, we discuss two software maintenance tasks (feature location and change impact analysis) whose input is the maintenance request and the current software system. Both tasks can be performed by mining software repositories. For each task, we choose a representative technique for a brief introduction; these techniques are then used in the empirical studies in Section 4.

2.2.1. Software repositories based feature location

Software maintenance is achieved by adding new features to programs, improving existing functionalities, etc. To implement these functionalities, software maintainers need to identify the locations in the source code that correspond to a specific functionality, an activity called feature location [8]. Many feature location techniques have been proposed in recent years, for example, feature location based on dynamic analysis [24], static structural analysis [25], software repositories analysis [26], or their combinations [27]. Here, we focus on how feature location is performed based on software repositories analysis, and we select CVSSearch as a representative technique [26].

CVSSearch is a feature location technique that uses textual and historical information from CVS repositories [26]. It operates on the text of CVS commit messages. CVS commit messages describe the changes made to the lines of code being committed, and those comments typically hold true for many future revisions of the software. That is, a historical context for all of the lines in the latest version of a system is formed by associating each line with all of the commit messages in the software repositories. If a line of code was changed in multiple commits, it is associated with all the CVS comments from those commits. CVSSearch takes a request as input and returns all lines of code whose associated comments contain at least one of the request words. The textual search in the software repositories is performed with grep. The returned results are ranked by a score indicating how well they match the request. Hence, based on the request (feature) proposed by the user, CVSSearch can identify the lines of code corresponding to the request.
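The sketch below illustrates the core of this idea, assuming a hypothetical mapping from source lines to the commit messages that touched them; the score (count of matched request words) is a simplification of CVSSearch's ranking, not its exact formula:

# Hypothetical (file, line) -> associated commit messages mapping; sketch only.
line_comments = {
    ("Buffer.java", 120): ["fix crash when saving buffer", "refactor save path"],
    ("Buffer.java", 200): ["update copyright header"],
    ("View.java", 75):    ["fix save dialog crash"],
}

def locate(request):
    """Return lines whose commit comments share words with the request,
    ranked by the number of matched request words (a simplified score)."""
    words = set(request.lower().split())
    scored = []
    for line, comments in line_comments.items():
        comment_words = set(" ".join(comments).lower().split())
        score = len(words & comment_words)
        if score > 0:
            scored.append((score, line))
    return [line for score, line in sorted(scored, reverse=True)]

print(locate("buffer crash on save"))  # [('Buffer.java', 120), ('View.java', 75)]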
2.2.2. Software repositories based change impact analysis

Changes made to software are inevitable during software maintenance and evolution. When changes are made, they may have unexpected effects and may cause inconsistencies in other parts of the original software. Software change impact analysis determines the effects of proposed changes on other parts of the software [28].


Table 1
Software repositories.

Source control repositories: These repositories record the development history of a project. They track all the changes to the source code along with the meta-data associated with each change, for example, who performed the change, when it was performed, and a short message describing the change. CVS and Subversion belong to this category.

Bug repositories: These repositories track the resolution history of bug reports or feature requests that are reported by users and developers of the project. Bugzilla is an example of this type of repository.

Communication archives: These repositories track discussions and communications about various aspects of the project over its lifetime. Mailing lists and emails belong to the communication archives.

As discussed in Section 2.2.1, given a maintenance request, developers first perform feature location, i.e., locate the source code of the request, and then perform change impact analysis on the results of feature location. That is, the full extent of the change is handled by change impact analysis, which starts with the source code identified by feature location and then finds all the source code affected by the change. Many change impact analysis techniques have also been proposed in recent years [28]. Some are based on dynamic analysis [29], static structural analysis [30], software repositories analysis [31], or their combinations [11]. In this article, we focus on how change impact analysis is performed based on software repositories analysis, and we select ROSE as a representative technique [31].

ROSE (Reengineering of Software Evolution) applies data mining to version histories to guide change impact analysis [31]. It is based on the assumption that evolutionary dependencies between program entities that cannot be extracted by traditional program analysis techniques can be mined from the repositories. Evolutionary dependencies suggest that entities that have (historically) changed together in the software repositories, i.e., co-changes, may need to change together again when one or more of them is changed during future software evolution. ROSE identifies the co-change coupling between the files (or program entities) that are changed together in the software repository. It performs change impact analysis by computing multidimensional association rules that distinguish between different kinds of changes. An association rule has a probabilistic interpretation based on the amount of evidence in the transactions it is derived from. This amount of evidence is determined by two measures: support and confidence. The support measure is the number of transactions the rule has been derived from, while the confidence measure is the strength of the consequence, i.e., the relative frequency of the given consequence across all alternatives. The input of ROSE can be coarser file or class level changes, or finer method level changes. Its output is the corresponding likely impacted entities at the same granularity level as the change set. The returned results are ranked by their confidence, indicating how well they match the association rules. Hence, based on the source code changes, ROSE can identify the impacts of these changes.
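As an illustration, the sketch below mines simple single-antecedent co-change rules from a list of commit transactions and ranks candidate impacts by confidence. The transactions are hypothetical, and ROSE itself computes richer multidimensional rules, so this is only a simplified sketch of the support/confidence idea:

from collections import Counter

# Hypothetical commit transactions: each is the set of classes changed together.
transactions = [
    {"Parser", "Lexer"}, {"Parser", "Lexer", "Ast"},
    {"Parser", "Ast"},   {"Gui"},
    {"Parser", "Lexer"}, {"Lexer", "Ast"},
]

def impacted(changed, min_support=2, min_confidence=0.5):
    """Rank entities co-changed with `changed` using rules changed -> entity."""
    co_change = Counter()                                  # support per entity
    containing = [t for t in transactions if changed in t]
    for t in containing:
        co_change.update(t - {changed})
    results = []
    for entity, support in co_change.items():
        confidence = support / len(containing)             # relative evidence
        if support >= min_support and confidence >= min_confidence:
            results.append((entity, support, confidence))
    return sorted(results, key=lambda r: -r[2])            # rank by confidence

print(impacted("Parser"))  # [('Lexer', 3, 0.75), ('Ast', 2, 0.5)]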

3. Our approach: MSR4SM

In this article, our focus is to provide an effective way to use software repositories in support of software maintenance tasks. This is achieved by preprocessing the information in the software repositories.

Specifically, we only extract the information relevant to the maintenance request and the current software from each of the repositories. The process of MSR4SM is shown in Fig. 1; it can be seen as a preprocessing step for traditional software repositories based techniques. Given a maintenance request, we need to analyze the relevant information in the software repositories to support comprehension and implementation of this request. The data sources of MSR4SM include the maintenance request, the software repositories, and the current software. We extract the current software from the software repositories because the current software usually needs some changes to accomplish the change request.

A software maintenance request is composed of a textual description, which needs to be tokenized, i.e., the request is turned into tokens and irrelevant and unimportant words are removed. The other data sources can be seen as unstructured text, so we use LDA to extract their latent topics. Before analyzing the software repositories, some preprocessing operations are needed for effective use of LDA. First, irrelevant and unimportant words are removed, for example, stop words (e.g., the), committer names (e.g., Xiaobing), punctuation and numeric characters, characters relevant to the syntax of the programming language (e.g., &, !), and programming language keywords (e.g., if, else, for). After this, we use LDA to generate the topics for each repository. Then, we conduct frequency analysis and similarity analysis among these preprocessed data. Finally, the relevant information in each software repository is obtained based on the feedback from the frequency analysis and similarity analysis. Thus, developers can get the relevant information from each software repository that is related to the maintenance request and the current software.

For example, when a bug report or email thread shares similar words with the software maintenance request, we consider that bug report or email thread a relevant data source for analyzing the current maintenance request. In addition, there is also feedback from this bug report or email thread to its corresponding versions in the source control repositories; that is, the version evolution corresponding to this bug report or email thread is also considered a useful data source. Similarly, in the source control repositories, if the topics extracted from a version also occur in the current software, that version is extracted as relevant information for analysis. Hence, the relevant information extracted from the software repositories includes the relevant bug reports (bug reports #1, #3, etc.), useful communication archives (email threads #2, #4, etc.), and some evolutionary code versions (1.1 → 1.2, 2.2 → 2.3, etc.) related to the maintenance request and the current software.
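The sketch below shows one plausible form of this preprocessing; the stop-word, keyword, and committer-name lists are abbreviated stand-ins, not the filters actually used by MSR4SM:

import re

# Abbreviated filter lists; a real implementation would use fuller ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "and"}
LANGUAGE_KEYWORDS = {"if", "else", "for", "while", "return"}
COMMITTER_NAMES = {"xiaobing"}

def preprocess(text):
    """Tokenize repository text and drop words that carry no topic signal."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())   # strips punctuation/numbers
    return [t for t in tokens
            if len(t) > 1
            and t not in STOP_WORDS
            and t not in LANGUAGE_KEYWORDS
            and t not in COMMITTER_NAMES]

print(preprocess("Fix the crash in save(); for details see Xiaobing's patch #42"))
# ['fix', 'crash', 'in', 'save', 'details', 'see', 'patch']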
3.1. Extracting relevant information from bug repository and communication archive

First, we extract the relevant information from the bug repositories and communication archives to see what information in each of these repositories is relevant to the maintenance request. The maintenance request and the repositories are handled differently. The bug repository and communication archive are mostly composed of unstructured text, so we use the LDA model to extract their latent topics. For the bug repository, each bug report is extracted as a document. For a communication archive such as a mailing list, the title and content of each email thread can be seen as a document. The topics extracted from each document in each repository are then represented as a set of topics, each composed of a set of words. The maintenance request, in contrast, is a short textual description, so it is tokenized.

During tokenization, the maintenance request text is turned into tokens, yielding a set of words that represents the maintenance request. After extracting the words from these data sources, we conduct a frequency analysis between them to reveal their relationship. Specifically, we choose the relevant information in each repository for the task at hand by measuring how many words of the maintenance request are also present in the LDA representation of a document. The relevant bug reports and email threads in the bug repository and communication archive are thereby extracted; they are denoted Bug_Req and Comm_Req, respectively.
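As an illustration of this frequency analysis, the following sketch uses hypothetical topic-word sets standing in for the LDA representations of bug reports and email threads; the overlap threshold is an assumed parameter:

# Hypothetical LDA word sets per document (bug report or email thread).
documents = {
    "bug#1": {"crash", "save", "buffer", "editor"},
    "bug#2": {"icon", "toolbar", "theme"},
    "mail#4": {"save", "dialog", "crash"},
}

request_words = {"crash", "on", "save"}  # tokenized maintenance request

def relevant(docs, request, min_overlap=2):
    """Keep documents whose LDA words overlap the request words enough."""
    return {doc for doc, words in docs.items()
            if len(words & request) >= min_overlap}

print(relevant(documents, request_words))  # {'bug#1', 'mail#4'} (order may vary)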

3.2. Extracting relevant information from source control repository

In addition to the bug repository and communication archive, there is another, larger data source: the source control repository, which tracks and records all the changes to the source code. It therefore provides key information about software evolution. However, as software evolves, some information may become outdated and turn into "noise" for the analysis of the current software [32,33]. Hence, we need to extract the relevant versions in the source control repository to support software maintenance tasks.

In the bug repository, different bug entries correspond to particular versions of the system, and each bug contains an official patch file with modifications [34,35]. So changed source code versions can be mined from their corresponding bug reports in the bug repository. To collect links between bugs and changed source code versions, we use heuristics based on regular expressions matching the bug ID (#1, #3, etc.) and keywords (Bug, Feature, etc.) in the change logs [34]. In the communication archive, different email threads may correspond to communications about different changes in various versions of the system [36,37]. To collect links between email threads and changed source code versions, we first select all the code versions that changed on the same date as the email threads; then we again use heuristics based on regular expressions matching the email keywords (Bug, Feature, etc.) in the change logs. In this way, we extract the corresponding version evolution of the system.

In the previous step, we obtained the bug reports and email threads that are relevant to the maintenance request. The changes in the consecutive source code versions corresponding to these bug reports or email threads are important and should be included as relevant information for the request analysis. Consecutive source code versions are the before-change version and the revised version of the software in a change commit [35]. We extract consecutive versions because many software maintenance tasks are performed based on the differences between consecutive versions in source control repositories. The results of this step are formalized as follows:

SCR_Req = {(v_i, v_j) | Map(Rep) → (v_i, v_j), Rep ∈ {Bug_Req, Comm_Req}}   (2)

In (2), Map(Rep) represents the version changes that correspond to a bug report in the bug repository or an email thread in the communication archive. This kind of source code version is thus produced based on the maintenance request.

As the maintenance request is usually implemented in the current system, we also need versions that are similar to the current system to support its comprehension and analysis. So we also extract the source code versions that are similar to the current software. As most of these data sources consist of large amounts of unstructured text, we use the LDA model to extract topics from them. For each source control repository, a version of the whole software can be modeled with LDA, and we can again use Rep = {w_1, w_2, ..., w_k} to describe each version.
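The regular-expression linking heuristic described above can be sketched as follows; the change-log entries and the pattern are illustrative, not the exact expressions of [34]:

import re

# Hypothetical change-log entries: (consecutive version pair, commit message).
change_log = [
    (("1.1", "1.2"), "Bug #1: fix crash when saving buffer"),
    (("1.2", "1.3"), "Cleanup whitespace"),
    (("2.2", "2.3"), "Feature #5: export diagrams as SVG"),
]

# Match a bug/feature keyword followed by an issue id, e.g. "Bug #1".
LINK_PATTERN = re.compile(r"\b(Bug|Feature)\b\s*#(\d+)", re.IGNORECASE)

def linked_versions(log):
    """Map issue ids to the consecutive version pairs that mention them."""
    links = {}
    for versions, message in log:
        match = LINK_PATTERN.search(message)
        if match:
            links.setdefault(match.group(2), []).append(versions)
    return links

print(linked_versions(change_log))  # {'1': [('1.1', '1.2')], '5': [('2.2', '2.3')]}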

Fig. 1. Process of MSR4SM.

For the current system, which is also modeled with LDA, we use Curr = {w_1, w_2, ..., w_j}. Then, we compute the similarity between them. There are many ways to compute a similarity metric, for example, cosine similarity, correlation coefficients, and association coefficients [38]. Any similarity measure can be used in our approach. For example, we can use cosine similarity, computed as follows:

Cosine = (Rep · Curr) / (||Rep|| ||Curr||)   (3)

||Rep|| and ||Curr|| represent the sizes of Rep and Curr, respectively. The higher the Cosine value, the more similar Rep and Curr are. After computing the similarity results, we extract the source code versions from the repository that are relevant to the current software. Since some software maintenance tasks, such as change impact analysis, use the co-change coupling between consecutive versions for analysis, we also extract the consecutive versions that are relevant to the current software. The results of this step are formalized as follows:

SCR_Curr = {(v_i, v_{i+1}), (v_{i-1}, v_i) | Cosine(v_i, Curr) > θ}   (4)

where θ is the similarity threshold. Eq. (4) shows that if any software version v_i in the source control repository is similar to the current system, that version and its previous (v_{i-1}) and subsequent (v_{i+1}) versions are all included in the results. Hence, the final set of source code versions, produced based on the maintenance request and the current system, is the union of SCR_Curr and SCR_Req.
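To make the version selection concrete, here is a small sketch that treats each version's LDA representation as a vector of word weights; the vectors and the threshold θ are hypothetical:

import math

def cosine(a, b):
    """Cosine similarity between two word-weight vectors (Eq. (3))."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical LDA word weights for two historical versions and the current one.
versions = {
    "1.4": {"parser": 0.6, "lexer": 0.3, "applet": 0.1},
    "1.5": {"parser": 0.5, "lexer": 0.4, "regexp": 0.1},
}
current = {"parser": 0.5, "lexer": 0.4, "regexp": 0.1}

theta = 0.5  # similarity threshold (50%)
similar = [v for v, words in versions.items() if cosine(words, current) > theta]
print(similar)  # ['1.4', '1.5']; their neighbors form SCR_Curr per Eq. (4)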

Finally, all the relevant information has been extracted from the various software repositories. We believe that such preprocessing of the software repositories can effectively improve the effectiveness of traditional software repositories based software maintenance tasks.

4. Empirical studies

MSR4SM aims to improve the effectiveness of traditional software repositories based software maintenance tasks by preprocessing the software repositories. In the context of our studies, we address the following two research questions (RQs):

RQ1: Does MSR4SM improve the effectiveness of a traditional software repositories based feature location technique?
RQ2: Does MSR4SM improve the effectiveness of a traditional software repositories based change impact analysis technique?

The quality focus of our studies is the improved effectiveness of the feature location and change impact analysis techniques. The perspective is that, given a maintenance request, a software developer can quickly and accurately identify the initial location and the relevant source code entities of the maintenance request. With regard to effectiveness, it is desirable to have a feature location (change impact analysis) technique that provides all, and only, the relevant entities (impacted entities), i.e., that prevents false positives and false negatives in the results as much as possible.

4.1. Subjects

Our studies are conducted on four open source software systems, namely jEdit, ArgoUML, Rhino and KOffice. These systems represent different sizes, development environments, and application domains. They were selected because their empirical benchmarks have been well established and widely used for the evaluation of software maintenance tasks by Poshyvanyk et al. [11,27]. Specifics of these subject systems are shown in Table 2. The first subject, Rhino (http://www.mozilla.org/rhino/), is a JavaScript engine developed entirely in Java and managed by the Mozilla Foundation as open source software. The second subject is jEdit (http://www.jedit.org/), a text editor written in Java. The third is ArgoUML (http://argouml.tigris.org/), an open source UML modeling tool that supports standard UML 1.4 diagrams. The final subject is KOffice (http://kde.org/applications/office/), a free and open source office and graphics suite by KDE for Unix-like systems and Windows.

4.2. Measures

4.2.1. Feature location

The measures used to evaluate feature location follow those of Dit and Revelle et al. [27,39]. In their evaluation, the position of the first relevant method was used as the primary measure [27]. It is an accepted metric for evaluating the effectiveness of feature location techniques [27,39,40]. Feature location results that rank relevant elements near the top of the list are more effective because they reduce the number of false positives a developer has to consider. Descriptive statistics of the effectiveness measure for each system are reported that summarize the data in terms of mean, median, minimum, maximum, lower quartile, and upper quartile.


Table 2
Subject systems.

Subject    Version   Classes   Methods   KLoC   History
Rhino      1.5       138       1,870     32     1.4–1.5
jEdit      4.3       483       6,400     109    4.2–4.3
ArgoUML    0.28      1,995     13,300    367    0.24–0.28
KOffice    2.1       6,500     104,600   231    2.0–2.1

We used violin plots to represent the results. (A violin plot is a box plot with a rotated kernel density plot on each side [41].) In addition, feature location may occasionally filter out all of a feature's gold set methods, i.e., the mapped methods that are relevant to implementing the feature. Therefore, the percentage of features for which a technique can locate at least one relevant method is also reported. If a feature location technique (A) ranks one of a feature's relevant methods closer to the top of the list than another technique (B), A is considered more effective. In our study, feature location based on MSR4SM is compared in this way to traditional software repositories based feature location.

In addition, to test whether the effectiveness of the traditional software repositories based approach is improved by MSR4SM, the null and alternative hypotheses (H0 and Ha) are evaluated at a 0.05 level of significance. As there are multiple comparisons against a baseline technique, known as many-to-one comparisons, we used Dunnett's test to check whether the difference between the effectiveness measures of two feature location techniques is statistically significant [42]. Dunnett's test is a multiple comparison procedure that compares each of a number of treatments with a single control. In our study, we applied the T̃ procedure [43] for the calculation of Dunnett-type contrasts, which can be effectively used in a software engineering context [44]. Since a technique may not rank any of a feature's gold set methods, it may have no data to be paired with the data from another feature location technique; therefore, only cases where both feature location techniques rank a method are used for the test. For each pair of techniques we analyze the 95% confidence interval to test whether the corresponding null hypothesis can be rejected. If the upper boundary of the interval is less than zero for techniques A and B, we claim that the metric value is lower in A than in B. Similarly, if the lower boundary of the interval is greater than zero for A and B, we claim that the metric value is higher in A than in B. Finally, if the lower boundary of the interval is less than zero and the upper boundary is greater than zero, we conclude that the data does not provide enough evidence to reject the null hypothesis.

We also estimated the magnitude of the difference between the employed metrics using Cliff's Delta (d), a non-parametric effect size measure [45]. The effect size is small for |d| < 0.33 (positive as well as negative values), medium for 0.33 ≤ |d| < 0.474, and large for |d| ≥ 0.474 [45].

4.2.2. Change impact analysis

To evaluate the effectiveness of change impact analysis, we used precision and recall, two widely used information retrieval metrics [46]. They are defined as follows:

Precision = |Actual Set ∩ Estimated Set| / |Estimated Set| × 100%   (5)

Recall = |Actual Set ∩ Estimated Set| / |Actual Set| × 100%   (6)

Actual Set is the set of elements in the ground-truth impact set, and Estimated Set is the set of elements estimated by the change impact analysis technique. As precision and recall are mutually constrained, the F-measure, the harmonic mean of precision and recall, combines the two. It is defined as follows:

F-measure = 2 × Precision × Recall / (Precision + Recall)   (7)
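These measures are straightforward to compute; the sketch below implements Eqs. (5)-(7) and, for completeness, a simple Cliff's Delta over two samples (all inputs are hypothetical):

def precision_recall_f(actual, estimated):
    """Precision, recall (Eqs. (5)-(6)) and F-measure (Eq. (7)) in percent."""
    hits = len(actual & estimated)
    precision = 100.0 * hits / len(estimated) if estimated else 0.0
    recall = 100.0 * hits / len(actual) if actual else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def cliffs_delta(xs, ys):
    """Cliff's Delta: P(x > y) - P(x < y) over all pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

actual = {"Parser", "Lexer", "Ast"}
estimated = {"Parser", "Lexer", "Gui"}
print(precision_recall_f(actual, estimated))  # (66.66..., 66.66..., 66.66...)
print(cliffs_delta([1, 2, 3], [4, 5, 6]))     # -1.0 (large effect)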

Descriptive statistics of the precision, recall and F results are also reported, summarizing the data in terms of mean, median, minimum, maximum, lower quartile, and upper quartile; we again used violin plots to represent them. In addition, to test whether the accuracy of the traditional software repositories based change impact analysis approach is improved by MSR4SM, the null and alternative hypotheses (H0 and Ha) are also evaluated at a 0.05 level of significance. We again used Dunnett's test to check whether the difference between the precision, recall and F measures of two change impact analysis techniques is statistically significant, and Cliff's Delta to estimate the magnitude of the difference between the employed metrics.

4.3. Procedure

In this subsection, we discuss the procedure used to evaluate the effectiveness of the improved feature location and change impact analysis techniques. MSR4SM has several parameters: alpha, beta, and the number of topics for topic extraction, and the Cosine threshold for the similarity between the current system and its historical versions. For alpha, beta, and the number of topics, we used LDA-GA, proposed by Panichella et al., to estimate their values [21]. LDA-GA uses genetic algorithms to determine a near-optimal configuration for LDA in the context of software engineering tasks. It is based on the conjecture that the higher the clustering quality produced by LDA, the higher the accuracy when it is used for software engineering tasks. With this approach, the values of the LDA parameters can be obtained; in particular, the number of topics for jEdit, ArgoUML, Rhino and KOffice is 102, 324, 55, and 496, respectively.

In our approach, the Cosine parameter affects the number of versions extracted from the source control repositories. In our studies, we believe that the similarity between historical versions and the current system should be at least 50%. At 90% similarity, too few versions are selected (some results show that only one version is left at a 90% similarity value), which greatly diminishes the advantages of software repositories based software maintenance tasks. Hence, we chose 50%, 60%, 70% and 80% for investigation in our studies. In the following, we introduce the detailed procedures for feature location and change impact analysis, respectively.

4.3.1. Feature location

CVSSearch is selected as the representative feature location technique for the first study. The evaluation procedure with the benchmark follows Dit et al. [27]. In the feature location study, jEdit and Rhino were used as the subject systems. To perform feature location, we need to provide the features and their corresponding gold sets. For Rhino, the text of the specification was used to formulate IR requests; there are 241 features in all. For the gold set benchmark of each feature, the mappings of source code to features made available by Eaddy et al. [47] were used.


For jEdit, the SVN commits with issues from its bug tracking system between releases 4.2 and 4.3 were analyzed. The title and description of each issue were used as the feature request, and the changes associated with each SVN commit were used to build the gold set of methods modified during that commit. From the benchmark of Dit et al. [27], we chose 150 features.

With the feature requests, we used the feature location techniques (CVSSearch and CVSSearch′ improved by MSR4SM) to estimate the locations of the requests. The output of CVSSearch is the lines of code corresponding to the requests; however, the gold set in the empirical benchmark consists of the methods corresponding to the requests. Hence, we map the lines of code to their enclosing methods. Finally, we compute the position of the first relevant method, the percentage of features for which a technique can locate at least one relevant method, and the percentage of times CVSSearch′ is more effective, to evaluate the feature location techniques.

4.3.2. Change impact analysis

ROSE is selected as the representative change impact analysis technique for the second study, with jEdit, ArgoUML and KOffice as the subject systems. To perform change impact analysis, we need to provide the change set and its corresponding actual set impacted by the change set. The source code changes in the software repositories, i.e., commits, are used for evaluation purposes. For a particular version of a subject system (e.g., jEdit 4.3), a set of bug reports is mined from its bug tracking system, such as Bugzilla (http://bugzilla.mozilla.org/). The set of classes changed to fix each bug is then mined from the repositories; specific details on the identification of the bug reports and changed classes can be found in [48]. For each bug with n corresponding modified classes, we selected one class as the change set and the other classes as the ground-truth impact set, so for each bug, change impact analysis is performed n times. When collecting the bug reports and the sets of changed classes, we only included bugs that contained more than two modified classes, because there should be at least one class in the change set and at least one other class in the ground-truth impact set. As a result, we chose 123 bugs for these three subject systems, with a total of 469 corresponding changed classes. In addition, we selected a set of commits in a history period after the selected version as the actual changes; each commit in this set is considered an actual impact set, i.e., the ground truth, for evaluation purposes.

Given the change set, we apply ROSE and ROSE′ improved by MSR4SM to compute their respective class-level impact sets. There are two thresholds when using ROSE, namely confidence and support. The confidence threshold should be set high (i.e., 0.8) so that the learned rules are precise enough. In addition, the co-changed elements should appear in a limited percentage of the software history repository; therefore, we set a low support value, i.e., 3, which is suitable for the change impact analysis environment [31]. Finally, we obtain the precision, recall and F results to evaluate the change impact analysis techniques.

4.4. Results

In this subsection, we discuss the results of our studies for feature location and change impact analysis, respectively.
4.4.1. Feature location

First, we consider the empirical results of the feature location techniques in identifying the first relevant method of the 241 features of Rhino and the 150 features of jEdit. We use three measures to evaluate MSR4SM.


Fig. 2 shows the violin plots with the descriptive statistics of the effectiveness measure for the different feature location techniques on Rhino and jEdit. The techniques are CVSSearch and CVSSearch′ with different similarity values (Cosine = 50%, 60%, 70%, 80%). In the violin plots, low values, which represent the positions of relevant methods, suggest potentially less effort for developers to locate relevant methods, because the ranks are among the first results returned by the feature location techniques. The y-axis represents the effectiveness measure. The results show that, with MSR4SM preprocessing the software repositories, the effectiveness of the feature location technique (CVSSearch) is improved for both Rhino and jEdit. This shows that the irrelevant information in each software repository can affect the effectiveness of feature location. Another interesting observation is that for Cosine = 70% and 80%, there is little difference in effectiveness. So a higher similarity value between the current system and historical versions does not always improve the effectiveness of feature location; this will be further validated in our future studies.

In addition, we also report the percentage of features for which a feature location technique can locate at least one relevant method. The results are shown in Table 3. We notice that, for Cosine = 50% or 60%, CVSSearch′ can locate all of a feature's methods, as CVSSearch does. When Cosine = 70% or 80%, CVSSearch′ sometimes cannot locate the relevant methods.

To quantify the improvement due to MSR4SM, we report the percentage of times the effectiveness measure of each CVSSearch′ variant is better than that of CVSSearch. The results show that CVSSearch′ is always better than CVSSearch when Cosine = 60%, 70%, 80%, and fewer than 5% of the CVSSearch′ (Cosine = 50%) results are worse than CVSSearch. Table 4 shows the percentage of times the effectiveness measure of each CVSSearch′ variant is better than that of CVSSearch. After extracting the relevant information with MSR4SM, the percentage of times CVSSearch′ is better is at least 16% higher than for the original approach directly based on software repositories. Table 4 also shows a phenomenon similar to that in Fig. 2: when Cosine = 80%, the percentage of times for CVSSearch′ is lower than for Cosine = 70%. This is because, with a high similarity value, some useful information may be filtered out by MSR4SM. Hence, when using MSR4SM, the similarity measure between the current system and its historical versions should not be set too high.

Finally, we check whether the effectiveness of the traditional software repositories based approach is improved by MSR4SM. The Dunnett test and Cliff's Delta results are shown in Table 5. From the statistical results, we see that when Cosine = 60%, 70%, 80%, the test (p-values) for CVSSearch′ based on MSR4SM as compared to CVSSearch is less than 0.05, with a high effect size, so we should reject the null hypothesis. Moreover, the upper boundary of the interval is less than zero. This indicates that feature location based on the information extracted by MSR4SM can really improve the effectiveness of the traditional software repositories based feature location technique. For Cosine = 50%, the p-values are bigger than 0.05, with a small effect size, so we should accept the null hypothesis. Moreover, as the lower boundary of the interval is less than zero and the upper boundary is greater than zero, we conclude that the test does not provide enough evidence to reject the null hypothesis.


Fig. 2. Effectiveness of different feature location techniques.

Table 3
The percentage of features for which a feature location technique locates at least one relevant method.

System   CVSSearch   CVSSearch′ (50%)   CVSSearch′ (60%)   CVSSearch′ (70%)   CVSSearch′ (80%)
Rhino    100%        100%               100%               87%                74%
jEdit    100%        100%               100%               91%                81%

Table 4
The percentage of times improved by MSR4SM.

System   CVSSearch′ (50%)   CVSSearch′ (60%)   CVSSearch′ (70%)   CVSSearch′ (80%)
Rhino    16%                32%                35%                26%
jEdit    20%                34%                35%                18%

Table 5
Dunnett test and Cliff's Delta results of feature location.

System   Baseline FL   FL technique       p-value   Lower       Upper      Cliff's d
Rhino    CVSSearch     CVSSearch′ (50%)   0.839     −54.2251    27.7251    −0.1215
Rhino    CVSSearch     CVSSearch′ (60%)   0.014     −89.8084    −7.8582    −0.4774
Rhino    CVSSearch     CVSSearch′ (70%)   0.000     −116.1834   −34.2332   −0.8056
Rhino    CVSSearch     CVSSearch′ (80%)   0.000     −118.5584   −36.6082   −0.8281
jEdit    CVSSearch     CVSSearch′ (50%)   0.178     −24.8603    3.1937     −0.2691
jEdit    CVSSearch     CVSSearch′ (60%)   0.000     −42.9020    −14.8480   −0.6302
jEdit    CVSSearch     CVSSearch′ (70%)   0.000     −65.0270    −36.9730   −0.9757
jEdit    CVSSearch     CVSSearch′ (80%)   0.000     −67.1103    −39.0563   −0.9913

Therefore, from the statistical results, CVSSearch′ with Cosine = 60%, 70%, 80% clearly improves the effectiveness of CVSSearch.

To sum up, MSR4SM does help improve the feature location effectiveness of CVSSearch, in particular when Cosine = 60%, 70%, 80%. An important factor affecting MSR4SM is the Cosine measure: the higher the Cosine value, the better the effectiveness of CVSSearch′. But when Cosine = 70% or 80%, CVSSearch′ cannot identify all of a feature's methods. So in practice, we can use Cosine = 60%, at which CVSSearch′ can effectively identify all of a feature's methods.

4.4.2. Change impact analysis

In this subsection, we discuss the empirical results of the change impact analysis techniques on jEdit, ArgoUML and KOffice. We use the precision, recall and F measures to evaluate MSR4SM.

We first consider the precision results of ROSE and ROSE′ with different similarity values after preprocessing the software repositories. The results are shown in Fig. 3, which presents the violin plots with the descriptive statistics of the precision results for the different change impact analysis techniques: ROSE and ROSE′ with different similarity values (Cosine = 50%, 60%, 70%, 80%). In the violin plots, high values, which represent the inverse measure of false positives, suggest potentially less effort for developers to identify the actually impacted classes. The y-axis represents the precision measure. From the results, we see that all the precision values of ROSE′ are improved by MSR4SM. This shows that there is indeed irrelevant information in software repositories accumulated during software maintenance and evolution; filtering this irrelevant information improves the precision of change impact analysis.

In addition, during the preprocessing of the software repositories, some relevant information may be filtered out by MSR4SM, so we need to check whether the recall of ROSE is seriously decreased after the filtering. The recall results are shown in Fig. 4, again as violin plots with the descriptive statistics for the different change impact analysis techniques. The results show that most of the recall values decrease after the information filtering based on MSR4SM. However, the decrease is not as large as the increase in the precision values. Hence, we can conclude that only a small amount of relevant information is filtered out during the process, which has little effect on software maintenance and evolution.

As discussed above, the precision of ROSE is increased by MSR4SM, but the recall is decreased. We show the F results in Fig. 5. The results show that all of the F values of ROSE are improved by MSR4SM.


Fig. 3. Precision of different change impact analysis techniques.

Fig. 4. Recall of different change impact analysis techniques.

Fig. 5. F-measure of different change impact analysis techniques.

But when Cosine = 50%, the improvement in the F value is not obvious.

Finally, we check whether the effectiveness of the traditional software repositories based change impact analysis technique (ROSE) is improved by MSR4SM. The Dunnett test and Cliff's Delta results are shown in Table 6. We first consider the precision results. The statistical results show that all the precision p-values for ROSE′ based on MSR4SM as compared to ROSE are less than 0.05, with a medium/high effect size. Hence, the corresponding hypothesis should be rejected. Moreover, the upper boundary of the interval is less than zero.

This indicates that MSR4SM can really improve the precision of traditional software repositories based change impact analysis. For the recall results, when Cosine = 50%, 60%, 70%, the p-values are bigger than 0.05, with a small effect size. Moreover, as the lower boundary of the interval is less than zero and the upper boundary is greater than zero, we conclude that the test does not provide enough evidence to reject the null hypothesis. This shows that there is no significant difference between the recall results of ROSE′ and ROSE. But when Cosine = 80%, the test results show an obvious difference between them, and the lower boundary of the interval is greater than zero.


Table 6
Dunnett test and Cliff's Delta results of change impact analysis.

Measure     Baseline CIA   CIA technique   p-value   Lower     Upper     Cliff's d
Precision   ROSE           ROSE′ (50%)     0.010     −0.0568   −0.0049   −0.4325
Precision   ROSE           ROSE′ (60%)     0.000     −0.1254   −0.0764   −0.9602
Precision   ROSE           ROSE′ (70%)     0.000     −0.1853   −0.1341   −1
Precision   ROSE           ROSE′ (80%)     0.000     −0.2082   −0.1624   −1
Recall      ROSE           ROSE′ (50%)     1.000     −0.0382   0.0447    0.0354
Recall      ROSE           ROSE′ (60%)     1.000     −0.0352   0.0481    0.0667
Recall      ROSE           ROSE′ (70%)     0.721     −0.0193   0.0640    0.2076
Recall      ROSE           ROSE′ (80%)     0.003     0.0129    0.0966    0.4836
F-measure   ROSE           ROSE′ (50%)     0.008     −0.0688   −0.0068   −0.4360
F-measure   ROSE           ROSE′ (60%)     0.000     −0.1394   −0.0833   −0.9542
F-measure   ROSE           ROSE′ (70%)     0.000     −0.1856   −0.1300   −1
F-measure   ROSE           ROSE′ (80%)     0.000     −0.1924   −0.1409   −1

We can conclude that the recall of ROSE′ is decreased, which shows that some useful and relevant information is also filtered out by MSR4SM. Finally, we consider the statistical results for the F-measure: all the F values are improved independent of the Cosine value.

To sum up, MSR4SM does help improve the change impact analysis accuracy of ROSE. In spite of some useful and relevant information being filtered out by MSR4SM, the accuracy of ROSE is not seriously affected, in particular when Cosine = 50%, 60%, 70%. So when using MSR4SM, we can set the Cosine value between 50% and 70%; this improves the precision of the impact results with little effect on the recall results.

4.5. Threats to validity

Like any empirical validation, our study has several threats to validity. First, we applied our approach to only four subject programs, so we cannot guarantee that the results of our empirical studies generalize to other subjects. However, our subjects are selected from open source projects and are widely employed in empirical studies of feature location and change impact analysis [11,27].

A second concern is the choice of the representative feature location and change impact analysis techniques. In this study, CVSSearch and ROSE are selected; other techniques may produce different empirical results. In addition, CVSSearch was originally designed for CVS but is applied to SVN data in our study, which is also a threat. For ROSE, there are two thresholds (confidence and support), set to 0.8 and 3, respectively, following the empirical results of Zimmermann et al. [49]; other settings may produce different impact results for ROSE.

A third concern is the gold set for feature location, and the change set and actual set for change impact analysis. To measure the effectiveness of feature location, we must provide the gold set. For Rhino, the gold set was manually extracted by other researchers who were not system experts; for jEdit, the gold set was extracted from SVN commits. Thus, relevant methods could be missing from the gold sets of each system. However, such evaluation approaches were also used by other researchers [11,27,47]. To perform change impact analysis, the change set must first be provided. In our study, we obtained the change set and actual impact set by mining the bug repositories.

During this process, we only included bugs that contained more than two modified classes. The change impact analysis techniques are thus evaluated in an environment that differs from the practical change impact analysis environment; however, such a method was also used in other studies [48,49]. In addition, to measure the precision and recall of change impact analysis, the actual impact set must be identified. It is extracted by selecting a set of commits over a short period after the selected version. In practice, the actual impacts may differ, even for the same change set, so we cannot easily obtain the real impact set of a given change set [48]. However, such an evaluation approach was also used in other studies [11,50,51].

A fourth concern is the measure of effectiveness for the feature location and change impact analysis techniques. For feature location, we used the position of the first relevant method as the primary measure; other measures may lead to different conclusions. However, the position of the first relevant method is a widely used measure for feature location in existing studies [27]. For change impact analysis, we used the precision, recall and F measures. Other accuracy metrics may produce different results for these techniques. However, these measures have been widely used and accepted in the change impact analysis research community [28,50].

A fifth concern is the parameters used in our studies. For example, for the number of LDA topics, we used LDA-GA, proposed by Panichella et al. [21]. Other approaches to estimating the parameters may generate different results.

A fifth concern is the parameters used in our studies. For example, for the number of LDA topics, we used LDA-GA, proposed by Panichella et al. [21]; other approaches to estimating the parameters may generate different results.
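
As one illustration of what such an estimate involves, the sketch below is a much simplified stand-in for LDA-GA (which searches LDA configurations with a genetic algorithm, not a grid search): a plain grid search over candidate topic counts scored by topic coherence. The library calls follow gensim's API; `docs` (a list of token lists), the candidate values, and all names are our own assumptions for illustration.

```python
# Simplified sketch: pick the number of topics k that maximizes
# topic coherence over a small set of candidates.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

def pick_num_topics(docs, candidates=(10, 20, 50, 100)):
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]
    best_k, best_score = None, float("-inf")
    for k in candidates:
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=k, random_state=0, passes=5)
        score = CoherenceModel(model=lda, texts=docs,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```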

5. Related work

Mining software repositories has been a hot topic in the SE community in recent years [52–55]. In this section, we discuss related work on effectively mining software repositories to support software maintenance.

In our previous work, we addressed issues similar to those in this work [56], but there are some key differences, especially in the empirical study. First, we provide more details on how to select relevant information from each of the software repositories based on the defined measures, for example, the Cosine measure, which denotes the similarity between the current version and a historical version. The other difference is a wider experiment to further show the effectiveness of our approach: on the one hand, we evaluate our approach in a new study of another important software maintenance task, i.e., feature location; on the other hand, we perform our studies on a benchmark with more subject programs and metrics, together with statistical analyses, to fully evaluate the proposed approach.

Herzig et al. performed empirical studies of tangled changes, which introduce noise into software repositories [57]. Their results show that about 20% of all bug fixes consist of multiple tangled changes. They also proposed an untangling algorithm that reduces the impact of tangled changes on existing repository-based techniques.

Kiefer et al. proposed an approach to mine software repositories with iSPARQL and a software evolution ontology [58]. Their approach can accomplish a sizable number of tasks sought in software repository mining projects, for example, assessing the amount of change between versions or detecting bad code smells. Their motivation differs from ours: in this article, we focus on extracting relevant information from each of the software repositories to support various software maintenance tasks.

Keivanloo et al. proposed an open and collaborative platform for sharing software datasets in software repositories [59]. The platform supports data extraction and on-the-fly inter-dataset
integration from various version control, issue tracking, and quality evaluation repositories. Their focus is on integrating the information in software repositories, whereas in this article we focus on extracting relevant information based on the maintenance request and the current system, and then leveraging the extracted information to support software maintenance.

Antoniol et al. presented an approach to recover time-variant information from software repositories [60]. They use Linear Predictive Coding and Cepstrum coefficients to model time-varying software artifact histories; based on their approach, artifacts that evolved in the same way or with similar evolution patterns can be identified. In contrast, in this article we select the information from each software repository that is relevant to the maintenance request and the current software system.

Yu et al. proposed a novel evolutionary-programming-based asymmetric weighted least squares support vector machine ensemble learning methodology for software repository mining [61]. They focus on classifying the information in software repositories, and the classification results are then used for software maintenance tasks such as fault prediction. In this article, we do not classify the data; instead, we preprocess the data to remove irrelevant information from each of the software repositories.

Thomas et al. proposed using topic models to study software evolution [62]. They focused on investigating whether the topics correspond well to actual code changes, and their results show the effectiveness of topic models as tools for studying the evolution of a software system. Their studies provide good evidence for using topic models to mine topics from software repositories.

6. Conclusion and future work

This article proposed MSR4SM, which improves the effectiveness of traditional software-repository-based software maintenance tasks. MSR4SM facilitates software maintenance by extracting, from each of the software repositories, the information that is relevant to the maintenance request and the current system. The generated software repositories eliminate information that has become outdated during software evolution, thus improving the effectiveness of traditional repository-based software maintenance tasks. Finally, we evaluated our approach on two software maintenance tasks, i.e., feature location and change impact analysis, on four systems (jEdit, ArgoUML, Rhino and KOffice). The empirical results show that MSR4SM improves the effectiveness of traditional repository-based feature location and change impact analysis techniques.

Though we have shown the effectiveness of MSR4SM through empirical studies of feature location and change impact analysis, this does not imply its generality for other software maintenance tasks, for example, bad smell detection [63,9] or traceability recovery [64]. Hence, we will conduct experiments on more subject systems to evaluate the generality of MSR4SM. In addition, in future work we will use a longer period of historical versions to study the impact of frequency analysis on MSR4SM. Finally, we will conduct more empirical studies to recommend a reasonable range of Cosine values to further improve the effectiveness of MSR4SM.

Acknowledgment

The authors would like to sincerely thank the anonymous reviewers, whose useful suggestions made the article clearer and stronger.


References

[1] V. Rajlich, Software evolution and maintenance, in: Proceedings of the Future of Software Engineering, FOSE 2014, Hyderabad, India, May 31–June 7, 2014, pp. 133–144.
[2] S. Wang, D. Lo, L. Jiang, Understanding widespread changes: a taxonomic study, in: 17th European Conference on Software Maintenance and Reengineering, CSMR 2013, Genova, Italy, March 5–8, 2013, pp. 5–14.
[3] S. Wang, D. Lo, Version history, similar report, and structure: putting them together for improved bug localization, in: 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, June 2–3, 2014, pp. 53–63.
[4] S.W. Thomas, Mining software repositories using topic models, in: Proceedings of the 33rd International Conference on Software Engineering, 2011, pp. 1138–1139.
[5] H.H. Kagdi, M.L. Collard, J.I. Maletic, A survey and taxonomy of approaches for mining software repositories in the context of software evolution, J. Softw. Mainten. 19 (2) (2007) 77–131.
[6] J. Anderson, S. Salem, H. Do, Improving the effectiveness of test suite through mining historical data, in: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, May 31–June 1, 2014, Hyderabad, India, pp. 142–151.
[7] M. Torchiano, F. Ricca, Impact analysis by means of unstructured knowledge in the context of bug repositories, in: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, 2010, pp. 47:1–47:4.
[8] B. Dit, M. Revelle, M. Gethers, D. Poshyvanyk, Feature location in source code: a taxonomy and survey, J. Softw.: Evol. Process 25 (1) (2013) 53–95.
[9] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk, Detecting bad smells in source code using change history information, in: IEEE/ACM International Conference on Automated Software Engineering, 2013, pp. 268–278.
[10] A. Mockus, R.T. Fielding, J.D. Herbsleb, Two case studies of open source software development: Apache and Mozilla, ACM Trans. Softw. Eng. Methodol. 11 (3) (2002) 309–346.
[11] H.H. Kagdi, M. Gethers, D. Poshyvanyk, Integrating conceptual and logical couplings for change impact analysis in software, Empir. Softw. Eng. 18 (5) (2013) 933–969.
[12] D.M. Germán, A study of the contributors of PostgreSQL, in: Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR 2006, Shanghai, China, May 22–23, 2006, pp. 163–164.
[13] E. Kouters, B. Vasilescu, A. Serebrenik, M.G.J. van den Brand, Who's who in Gnome: using LSA to merge software repository identities, in: 28th IEEE International Conference on Software Maintenance, ICSM 2012, Trento, Italy, September 23–28, 2012, pp. 592–595.
[14] J. Han, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. ISBN 1558609016.
[15] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, New York, NY, USA, 2008. ISBN 0521865719, 9780521865715.
[16] D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999. ISBN 1-55860-529-0.
[17] D. Li, Y. Ding, X. Shuai, J. Bollen, J. Tang, S. Chen, J. Zhu, G. Rocha, Adding community and dynamic to topic models, J. Informetrics 6 (2) (2012) 237–253.
[18] K. Barnard, P. Duygulu, D.A. Forsyth, N. de Freitas, D.M. Blei, M.I. Jordan, Matching words and pictures, J. Mach. Learn. Res. 3 (2003) 1107–1135.
[19] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022.
[20] S.W. Thomas, A.E. Hassan, D. Blostein, Mining unstructured software repositories, in: Evolving Software Systems, 2014, pp. 139–162.
[21] A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia, How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms, in: 35th International Conference on Software Engineering, 2013, pp. 522–531.
[22] S.K. Lukins, N.A. Kraft, L.H. Etzkorn, Bug localization using latent Dirichlet allocation, Inf. Softw. Technol. 52 (9) (2010) 972–990.
[23] L.R. Biggers, C. Bocovich, R. Capshaw, B.P. Eddy, L.H. Etzkorn, N.A. Kraft, Configuring latent Dirichlet allocation based feature location, Empir. Softw. Eng. 19 (3) (2014) 465–500.
[24] G. Antoniol, Y. Guéhéneuc, Feature identification: an epidemiological metaphor, IEEE Trans. Softw. Eng. 32 (9) (2006) 627–641.
[25] X. Peng, Z. Xing, X. Tan, Y. Yu, W. Zhao, Improving feature location using structural similarity and iterative graph mapping, J. Syst. Softw. 86 (3) (2013) 664–676.
[26] A. Chen, E. Chou, J. Wong, A.Y. Yao, Q. Zhang, S. Zhang, A. Michail, CVSSearch: searching through source code using CVS comments, in: ICSM, 2001, p. 364.
[27] B. Dit, M. Revelle, D. Poshyvanyk, Integrating information retrieval, execution and link analysis algorithms to improve feature location in software, Empir. Softw. Eng. 18 (2) (2013) 277–309.
[28] B. Li, X. Sun, H. Leung, S. Zhang, A survey of code-based change impact analysis techniques, Softw. Test., Verif. Reliab. 23 (8) (2013) 613–646.
[29] J. Law, G. Rothermel, Whole program path-based dynamic impact analysis, in: Proceedings of the International Conference on Software Engineering, 2003, pp. 308–318.
[30] S. Zhang, Z. Gu, Y. Lin, J.J. Zhao, Change impact analysis for AspectJ programs, in: Proceedings of the International Conference on Software Maintenance, 2008, pp. 87–96.
[31] T. Zimmermann, A. Zeller, P. Weissgerber, S. Diehl, Mining version histories to guide software changes, IEEE Trans. Softw. Eng. 31 (6) (2005) 429–445.
[32] A. Jermakovics, A. Sillitti, G. Succi, Mining and visualizing developer networks from version control systems, in: Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE 2011, May 21, 2011, Waikiki, Honolulu, HI, USA, pp. 24–31.
[33] C. Lewis, Z. Lin, C. Sadowski, X. Zhu, R. Ou, E.J. Whitehead Jr., Does bug prediction support human developers? Findings from a Google case study, in: 35th International Conference on Software Engineering, ICSE '13, May 18–26, 2013, San Francisco, CA, USA, pp. 372–381.
[34] M. Fischer, M. Pinzger, H.C. Gall, Populating a release history database from version control and bug tracking systems, in: 19th International Conference on Software Maintenance (ICSM 2003), The Architecture of Existing Systems, 22–26 September 2003, Amsterdam, The Netherlands, p. 23.
[35] S. Kim, H. Zhang, R. Wu, L. Gong, Dealing with noise in defect prediction, in: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, May 21–28, 2011, Waikiki, Honolulu, HI, USA, pp. 481–490.
[36] A. Guzzi, A. Bacchelli, M. Lanza, M. Pinzger, A. van Deursen, Communication in open source software development mailing lists, in: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR '13, May 18–19, 2013, San Francisco, CA, USA, pp. 277–286.
[37] A. Bacchelli, M. Lanza, R. Robbes, Linking e-mails and source code artifacts, in: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, ICSE 2010, 1–8 May 2010, vol. 1, Cape Town, South Africa, pp. 375–384.
[38] M. Shtern, V. Tzerpos, Clustering methodologies for software engineering, Adv. Softw. Eng. (2012).
[39] M. Revelle, B. Dit, D. Poshyvanyk, Using data fusion and web mining to support feature location in software, in: The 18th IEEE International Conference on Program Comprehension, ICPC 2010, June 30–July 2, 2010, Braga, Minho, Portugal, pp. 14–23.
[40] D. Liu, A. Marcus, D. Poshyvanyk, V. Rajlich, Feature location via information retrieval based filtering of a single scenario execution trace, in: 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), November 5–9, 2007, Atlanta, GA, USA, pp. 234–243.
[41] J.L. Hintze, R.D. Nelson, Violin plots: a box plot-density trace synergism, Am. Statist. 52 (2) (1998) 181–184.
[42] C.W. Dunnett, A multiple comparison procedure for comparing several treatments with a control, J. Am. Statist. Assoc. 50 (272) (1955) 1096–1121.
[43] F. Konietschke, L.A. Hothorn, E. Brunner, Rank-based multiple test procedures and simultaneous confidence intervals, Electron. J. Statist. 6 (2012) 738–759.
[44] S. Wang, D. Lo, B. Vasilescu, A. Serebrenik, EnTagRec: an enhanced tag recommendation system for software information sites, in: 30th IEEE International Conference on Software Maintenance and Evolution, September 29–October 3, 2014, Victoria, BC, Canada, pp. 291–300.
[45] R.J. Grissom, J.J. Kim, Effect Sizes for Research: A Broad Practical Approach, Lawrence Erlbaum Associates, 2005.
[46] C.J. van Rijsbergen, Information Retrieval, Butterworth, London, 1979.
[47] M. Eaddy, T. Zimmermann, K.D. Sherwood, V. Garg, G.C. Murphy, N. Nagappan, A.V. Aho, Do crosscutting concerns cause defects?, IEEE Trans. Softw. Eng. 34 (4) (2008) 497–515.
[48] D. Poshyvanyk, A. Marcus, R. Ferenc, T. Gyimothy, Using information retrieval based coupling measures for impact analysis, Empir. Softw. Eng. 14 (1) (2009) 5–32.
[49] T. Zimmermann, P. Weißgerber, S. Diehl, A. Zeller, Mining version histories to guide software changes, IEEE Trans. Softw. Eng. 31 (6) (2005) 429–445.
[50] B. Li, X. Sun, J. Keung, FCA-CIA: an approach of using FCA to support cross-level change impact analysis for object oriented Java programs, Inf. Softw. Technol. 55 (8) (2013) 1437–1449.
[51] X. Sun, B. Li, B. Li, W. Wen, A comparative study of static CIA techniques, in: Proceedings of the Fourth Asia–Pacific Symposium on Internetware, 2012, Article 23.
[52] N. Ali, Y.-G. Gueheneuc, G. Antoniol, Trustrace: mining software repositories to improve the accuracy of requirement traceability links, IEEE Trans. Softw. Eng. 39 (5) (2013) 725–741. ISSN 0098-5589.
[53] T. Wang, H. Wang, G. Yin, C.X. Ling, X. Li, P. Zou, Mining software profile across multiple repositories for hierarchical categorization, in: Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM '13, 2013, pp. 240–249. ISBN 978-0-7695-4981-1.
[54] J. Hu, X. Sun, D. Lo, B. Li, Modeling the evolution of development topics using dynamic topic models, in: 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, March 2–6, 2015, Montreal, QC, Canada, pp. 3–12.
[55] S.W. Thomas, Mining software repositories using topic models, in: Proceedings of the 33rd International Conference on Software Engineering, ICSE '11, 2011, pp. 1138–1139. ISBN 978-1-4503-0445-0.
[56] X. Sun, B. Li, Y. Li, Y. Chen, What information in software historical repositories do we need to support software maintenance tasks? An approach based on topic model, in: Computer and Information Science, Springer International Publishing, 2015, pp. 27–37.
[57] K. Herzig, A. Zeller, The impact of tangled code changes, in: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR '13, 2013, pp. 121–130. ISBN 978-1-4673-2936-1.
[58] C. Kiefer, A. Bernstein, J. Tappolet, Mining software repositories with iSPARQL and a software evolution ontology, in: Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07, 2007, pp. 10–. ISBN 0-7695-2950-X.
[59] I. Keivanloo, C. Forbes, A. Hmood, M. Erfani, C. Neal, G. Peristerakis, J. Rilling, A linked data platform for mining software repositories, in: Proceedings of the 9th IEEE Working Conference on Mining Software Repositories, 2012, pp. 32–35. ISBN 978-1-4673-1761-0.
[60] G. Antoniol, V.F. Rollo, G. Venturi, Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories, in: Proceedings of the 2005 International Workshop on Mining Software Repositories, 2005, pp. 1–5. ISBN 1-59593-123-6.
[61] L. Yu, An evolutionary programming based asymmetric weighted least squares support vector machine ensemble learning methodology for software repository mining, Inf. Sci. 191 (2012) 31–46. ISSN 0020-0255.
[62] S.W. Thomas, B. Adams, A.E. Hassan, D. Blostein, Studying software evolution using topic models, Sci. Comput. Program. 80 (2014) 457–479.
[63] F.A. Fontana, P. Braione, M. Zanoni, Automatic detection of bad smells in code: an experimental assessment, J. Object Technol. 11 (2) (2012) 5:1–38.
[64] A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, S. Panichella, Applying a smoothing filter to improve IR-based traceability recovery processes: an empirical investigation, Inf. Softw. Technol. 55 (4) (2013) 741–754.