Concept location using program dependencies and information retrieval (DepIR)


Information and Software Technology 55 (2013) 651–659

Contents lists available at SciVerse ScienceDirect

Information and Software Technology journal homepage: www.elsevier.com/locate/infsof

Maksym Petrenko a,⇑, Václav Rajlich b

a Earth System Science Interdisciplinary Center, University of Maryland, College Park, USA
b Department of Computer Science, Wayne State University, Detroit, USA

Article info

Article history: Received 21 June 2012; Received in revised form 28 September 2012; Accepted 30 September 2012; Available online 13 October 2012

Keywords: IR; Dependency search; Concept location

Abstract

Context: The functionality of a software system is most often expressed in terms of concepts from its problem or solution domains. The process of finding where these concepts are implemented in the source code is known as concept location and it is a prerequisite of software change.
Objective: We investigate a static approach to concept location named DepIR that combines program dependency search (DepS) with information retrieval-based search (IR). In this approach, programmers explore the static program dependencies of the source code components retrieved by the IR search engine.
Method: The paper presents an empirical study that compares DepIR with its constituent techniques. The evaluation is based on an empirical method of reenactment that emulates the steps of concept location for 50 past changes mined from software repositories of five software systems.
Results: The results of the study indicate that DepIR significantly outperforms both DepS and IR.
Conclusion: DepIR allows developers to perform concept location efficiently. It allows finding concepts even with queries that do not rank the relevant software components highly. Since formulating a good query is not always easy, this tolerance of lower-quality queries significantly broadens the usability of DepIR compared to the traditional IR.

© 2012 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +1-301-614-5830. E-mail addresses: [email protected] (M. Petrenko), [email protected] (V. Rajlich).
0950-5849/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.infsof.2012.09.013

1. Introduction

Concept (or feature) location is one of the most common program comprehension activities and it identifies parts of the source code that implement particular concepts. Concept location has been addressed in the context of many software engineering tasks [1,2]; it is an important and well defined part of the software change process [3]. In this process, concept location starts with the change request, which can be a new feature request or a bug report, and identifies the location in the code where the core of the change needs to be made. While concept location identifies the specific code snippet where the change starts, the full extent of the software change is established via impact analysis, which uses the results of concept location as an input and identifies all the remaining locations in source code that should be modified. These two activities are recognized to be complementary to each other yet they are methodologically different [4,5].

Developers start concept location by extracting relevant concepts from the change request. Then, they locate them in the source code, which is a search process that may be difficult and time consuming in large software. Concept location tools and techniques assist developers in this activity. Concept location tools should make use of a variety of available information in assisting the developer during the search.

In this paper, we propose a concept location technique, named Dependency Search combined with Information Retrieval (DepIR), which guides developers in two ways: (1) by supporting searching the textual information in the source code (i.e., identifiers and comments) via the use of an Information Retrieval based search engine (IR); and (2) by supporting dependency search (DepS) to navigate the source code based on automatically inferred static program dependencies.

We evaluate DepIR using the process of automated reenactment, where an algorithm repeats past changes and simulates actions of programmers during concept location while using a specific concept location technique. The results of the study in five systems indicate that DepIR offers significant advantages when compared to the traditional DepS and IR approaches.

The rest of the paper is organized in the following way. Section 2 summarizes the previous work. Section 3 reviews the relevant background and presents the methodology and the tool support behind DepIR. In Section 4, DepIR is evaluated through an empirical study in open source software, while Section 5 contains conclusions and future work.


2. Related work

Based on types of analyses and underlying information, existing approaches to concept location can be classified as static, dynamic, and hybrid [6]. In this work, we study techniques that belong to the category of static concept location [2,7]. One of the most popular approaches to static concept location is textual pattern matching, in which programmers use a tool (e.g., grep [8]) to search the source code for patterns that match queries formulated as regular expressions. Petrenko et al. described how ontology fragments can be used to formulate and improve the quality of these queries [9]. IR-based approaches provide an alternative to the pattern matching tools and rely on queries formulated in natural language; we provide a detailed review and discussion of IR approaches in Section 2.1. Other static techniques are based on using program dependencies that guide the search process; we discuss these techniques in Section 2.2.

2.1. Concept location using information retrieval

Information retrieval-based concept location (IR) is an extension and improvement over the traditional text search, and is based on the analysis of relationships between words and documents in large bodies of text (i.e., statically parsed documents from source code) [10,11]. The following steps summarize the IR:

1. Indexing the software system: The source code of the software system is parsed to extract software components at a specific granularity (e.g., methods or classes). Then, a corpus of the software system is created, where the documents in the corpus correspond to the source code of the extracted components. After this, each document is analyzed to identify meaningful words; this may include splitting composite identifiers, removing language-specific stop-words, and so forth.
2. Formulating a query: The developers identify a set of terms that are relevant to the target concept (e.g., ‘save document’, ‘add bookmark’, etc.). One or more of these terms form the initial query.
3. Ranking retrieved documents: Once the query is formulated and executed, the IR engine identifies relevant documents (i.e., parts of the source code) and presents them to the developers. Each document is assigned an IR relevance score on the scale from 0 (not relevant) to 1 (most relevant) and the results returned by the search engine are ranked in descending order based on this score.
4. Investigating the retrieved documents: The retrieved documents are inspected according to their ranks. If an inspected document is a part of the target concept, the search procedure is over. Otherwise, the investigation continues to a document with the next highest rank.
5. Reformulating the query: While reviewing a document, if the programmers acquire knowledge that can help to formulate a better query, the query is updated and all the documents are re-ranked.

The effectiveness of the search depends on developers’ knowledge of the system and its domain, and their ability to formulate good queries, as well as on the quality of the source code comments and identifiers. Framing concept location as a text search problem simplifies the solution, but it ignores much of the inherent structure of the source code. DepIR addresses this issue by utilizing the dependencies between software elements to navigate the results (see Section 3 for details).

2.2. Concept location by searching program dependencies

Source code components (e.g., classes, methods, fields, etc.) relate to one another via data and control flow. These relationships between the software components can be modeled as a graph, where the nodes represent the components and the edges represent dependencies between these components (e.g., method calls, inheritance, declarations of abstract data types, etc.). Program dependency search (DepS) is a concept location technique where the programmers traverse such a graph of a software system in a depth-first manner, starting from a known location [12]. The following steps describe the process of DepS. The first step is performed once, while the second step iterates until the concept is located.

1. Identifying a starting point: DepS starts with an entry point of the system; for example, in a Java system, this can be one of the ‘main’ or ‘init’ methods. Alternatively, the search can start at a node known by the programmer to be relevant to the current task.
2. Browsing program dependencies: From the starting point, programmers inspect the nodes of the dependency graph in a depth-first order and decide whether these nodes implement the target concept. The node that is currently inspected is referred to as the focus node. If the node contains the concept, the search stops. If the programmers decide that the focus node does not implement the concept but it is related to this concept, the search continues to one of the adjacent nodes in the dependency graph. On the other hand, if the focus node is irrelevant to the concept, the search backtracks to an earlier focus node.

One of the advantages of DepS is that it does not require knowledge of vocabulary of the system, as it operates on the structural information contained in the dependency graph alone; thus, it is suitable for locating concepts even in unknown systems with absent comments and poor identifier naming conventions. On the other hand, finding a good starting point for the search and choosing between the neighbors of a node are difficult problems.
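As an illustration only, the DepS procedure can be sketched in Python as a depth-first traversal in which a callback stands in for the programmer's judgment of each focus node. The graph, the method names, and the three verdict labels below are hypothetical; they are not taken from any of the studied systems or tools.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical verdicts a programmer may give about the focus node.
IMPLEMENTS = "implements"   # the node contains the concept: stop
RELATED = "related"         # not the concept, but worth following
IRRELEVANT = "irrelevant"   # backtrack to an earlier focus node


def dependency_search(
    graph: Dict[str, List[str]],
    start: str,
    judge: Callable[[str], str],
) -> Tuple[Optional[str], List[str]]:
    """Depth-first DepS. Returns the located node (or None) and the
    list of nodes inspected, in inspection order."""
    inspected: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> Optional[str]:
        visited.add(node)
        inspected.append(node)
        verdict = judge(node)
        if verdict == IMPLEMENTS:
            return node
        if verdict == RELATED:
            # Continue to adjacent nodes that were not inspected yet.
            for neighbor in graph.get(node, []):
                if neighbor not in visited:
                    found = visit(neighbor)
                    if found is not None:
                        return found
        # Irrelevant node or exhausted neighbors: backtrack.
        return None

    return visit(start), inspected


# Toy dependency graph and toy judgments (assumptions for this sketch).
TOY_GRAPH = {
    "main": ["initGui", "loadConfig"],
    "initGui": ["createMenu", "installDropHandler"],
    "installDropHandler": ["importFile"],
}
TOY_VERDICTS = {
    "main": RELATED,
    "initGui": RELATED,
    "createMenu": IRRELEVANT,
    "installDropHandler": RELATED,
    "importFile": IMPLEMENTS,
    "loadConfig": IRRELEVANT,
}

found, inspected = dependency_search(TOY_GRAPH, "main", TOY_VERDICTS.__getitem__)
print(found, inspected)
```

In this toy run the search inspects createMenu, judges it irrelevant, and backtracks to initGui before reaching the target, which mirrors the backtracking described in step 2. The number of inspected nodes depends entirely on the order in which neighbors are tried, which is the difficulty noted above.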

2.3. Hybrid concept location techniques

Hybrid static methods combine different types of static information. Zhao et al. described a fully automatic Sniafl concept location approach that combines structural and textual information [13]. In this approach, each feature implemented in a program is assigned a set of potentially relevant methods, based on call relationships of the methods as well as their IR similarity to the description of the feature. Then, methods truly relevant to a specific feature are determined as a set-theoretical difference between the methods in its potentially relevant set and potentially relevant sets of all other features. In this way, Sniafl uses dependency information to identify methods that implement the concept but do not contain IR search terms. As Sniafl requires description of all features implemented in the system, it may be less practical in systems where such descriptions are not available or are incomplete; this is different from DepIR, which needs only a description of a single feature of interest (i.e., the change request).

Shao et al. discussed an approach where IR ranks of relevant methods are increased if their direct neighbors in the dependency graph are also relevant [14]. A similar approach is discussed by Hayashi et al., where IR ranks of relevant methods, collected by a dynamic tracing technique, are adjusted based on the number and the distance in the dependency graph of their relevant neighbors, both direct and transitive [15].


Unlike DepIR, these approaches do not support inspecting methods that have a low IR rank; in other words, these approaches do not allow programmers to explore the dependency graph, but rather use its dependencies only to adjust IR ranks.

Eaddy et al. explain the project “CERBERUS”, where a limited number of methods returned by dynamic tracing and IR techniques are used as “seeds” for a static code analysis tool [16,17]. The tool analyzes program dependencies to identify all program elements that would have to be either removed or modified if the seed methods are removed. Together with the seeds, the identified methods comprise a subgraph of the complete dependency graph, representing a potential implementation of the concept. Revelle and Poshyvanyk discuss a similar approach, where the subgraph is computed by identifying methods that have a high textual similarity to the seeds and appear in a dynamically created trace [18]. Ratanotayanon et al. present a tool where the dependency graph is used to expand a set of seed methods that are obtained by ranking CVS commits with IR [19]. Hayashi et al. discuss an approach where a set of seed methods acquired by IR is expanded by neighbors in the dependency graph that are neighbors of the seed methods in a supplied ontology fragment [20]. Shepherd et al. used natural language processing (NLP) techniques to expand user queries with terms gleaned from source code artifacts and dependencies, in order to produce potentially relevant subgraphs of the dependency graph [21]. These approaches use source code dependencies to expand an initial set of relevant methods suggested by IR or a programmer. Nonetheless, to explore the expanded set of methods, the programmers still need to use other concept location techniques where they navigate program dependencies.

Concept slicing techniques decompose a program into a set of modules or executable slices that represent conceptually cohesive fragments of the program’s domain [22,23]. The techniques start by defining a domain model and identifying a set of seed statements based on the concept assignment process, where the concepts from the domain model are mapped onto the source code statements. A program dependency graph is then used to construct program slices that originate at the seed statements and are relevant to the concepts in the domain model. As in the previous group of approaches, other techniques have to be used to inspect the slices and locate the concept. Furthermore, in contrast to the IR-based techniques like DepIR, the concept assignment process typically requires that concepts are mapped onto non-overlapping segments of code, and this makes concept slicing challenging when overlapping concepts are present [24].

Hill et al. describe a tool, “Dora”, that uses the call graph to expand an initial set of user-supplied seed methods based on an IR query [25]. First, the tool builds a subgraph of the call graph by recursively including those neighbors of the seed methods that have an IR relevance score exceeding the ‘exploration threshold’. In this subgraph, the methods with an IR score exceeding an even higher ‘relevance threshold’ are presented to the user as relevant to the concept. Dora does not require user input when exploring the dependency graph, but it requires a manual identification of the seed methods and is less suitable for identifying concepts in methods that have a low IR relevance score.

Recent research has studied the information needs of developers during software maintenance. One such study revealed that in most cases, when faced with changes to existing software, developers “began by searching for relevant code both manually and using search tools . . . When developers found relevant code, they followed its incoming and outgoing dependencies, often returning to it and navigating its other dependencies” [26].
The findings are consistent with later studies that found that developers spend much of their time on “finding focus points” and then “expanding the focus points” during software change [27]. These and other studies [28] strengthen the view that static concept location is an instance of information seeking behavior [29]. Previous studies have also found that “during a program investigation task, a methodical investigation of the code of a system is more effective than an opportunistic approach” [30].

3. DepIR

DepIR is a hybrid concept location technique that leverages the strengths of each constituent technique against the weaknesses of the other. Compared to the stand-alone DepS and IR, DepIR gives the users more options for conducting the search and hence it allows locating concepts that can be missed by these two techniques alone.

One of the weaknesses of IR is that it constrains the user to navigate the software in the order of the results returned for a given user query. This ordering, while semantically relevant, is void of structural information. Thus, two software components A and B, which are structurally strongly related, may not appear adjacent in the ranked list of results. While investigating A, the developer should be aware of or at least pointed to B. In DepIR, structural relationships are preserved and presented to the user together with the text retrieval results.

On the other hand, when searching the dependency graph using DepS, some dependencies are hard to detect with available program analysis tools, for example database dependencies, reflection, and so forth. Hence, the dependency graph may not contain a path to the node implementing the concept, making this node (location) hard to reach. Moreover, in large systems, the path in the graph from the entry point could be long and developers may need to backtrack several times. DepIR overcomes these limitations by using IR to query source code and identify a relevant starting point for the search, and also by ranking neighbors of nodes in the dependency graph by their relevance to the user query.

DepIR combines IR with DepS. It guides developers by maintaining a global context and a local context. The global context relates to the change request and is captured in the user query that is handled by information retrieval. The local context relates to the source code dependencies and allows browsing to the immediate neighbors.
Although a different dependency graph can be used, we describe DepIR based on the Method Dependency Graph (MDG). In the MDG, the nodes of the graph are program methods and the dependencies are calls between these methods, as identified by traversing the Eclipse Abstract Syntax Tree (AST) [31] and extracting method bindings [32]. The following steps summarize DepIR and are illustrated by Fig. 1:

1. Initial indexing of documents and extraction of program dependencies: The source code of the software system is parsed and all methods and their dependencies defined in the code are extracted to build the MDG as explained above. The corpus of the software system is created and indexed with an IR technique as explained in Section 2.1. This part of the process is fully automatic (i.e., no user input is required) and is performed every time major changes are applied to the system.
2. Ranking methods: The developers formulate a text query that describes the target concept based on their knowledge of the code and the description of the change request. During concept location, they may learn new facts about the code and return to this step and reformulate the query. Once the query is formulated, the IR engine ranks all methods in descending order based on their relevance (i.e., textual similarity) to the query.
3. Selecting a promising method: The programmer inspects these ordered results and determines their relevance to the change task. For each method, the following decisions and actions are made:
   a. If the method is the place that will change, then concept location ends and the developer continues with the next phase of the software change (not discussed in this paper).
   b. If the method is irrelevant to the task at hand, the programmers inspect the method with the next highest rank, or return to step #2 and reformulate the query.
   c. If the method is related to the concept’s implementation, yet it is not where changes are to be made, then the programmers consider this method to be the “focus method” and continue with step #4.
4. Navigating program dependencies: The developers explore the MDG by investigating methods in the neighborhood of the focus method. The adjacent methods are ranked based on their similarity to the user query. For each neighbor, the developer has the following options:
   a. If the method will be the one that changes, then concept location ends and the developer continues with the next phase of the software change (not discussed here).
   b. If the method is not the one that will change, then navigate to an adjacent method or backtrack.
   c. Return to step #2 and reformulate the query.
   d. Stop the navigation; return to the IR search results from step #3 and continue to investigate the list of retrieved methods.

DepIR keeps track of the entire search and navigation path to guide the developer and to avoid repetitions. Once the programmer examines a method during concept location and decides that it does not implement the target concept, DepIR marks the method and notifies the programmers if they attempt to visit it a second time. This ensures that the same method is not inspected twice if there are cycles in the MDG.

Fig. 1. Example of DepIR workflow: (1) Indexing methods and extracting their dependencies; circles represent methods and lines represent dependencies. (2) Formulating a query and ranking the methods; numbers indicate the ranks of the methods. (3) Investigating the results of the query; the method with the highest rank is shaded in gray. (4) Exploring neighborhood of the highly-ranked methods using dependency search, as highlighted by the bold edges, and locating the concept in the methods shaded in black.

3.1. Example of locating a concept using DepIR

A practical example of DepIR is locating the concept ‘Open a file by dragging and dropping it into the editor’ in jEdit. Note that jEdit is also used in the case study of Section 4.
The concept location begins by formulating the following DepIR query: ‘open file using drag and drop’. The top ten methods returned by IR and their relevance scores are isDragInProgress (0.75), isDragEnabled (0.71), startDragAndDrop (0.66), setDragInProgress (0.64), drop (0.59), setDragEnabled (0.58), dragEnter (0.48), two overloaded openFile (0.47) methods in the jEdit class, and the insideSelection (0.46) method in the SelectionManager class. Upon inspecting the top ranked methods in the TexEdit class, the programmer concluded that although these methods are relevant to the ‘drag-and-drop’ concept, they are irrelevant to the ‘open file’ concept. After inspecting one of the openFile (0.46) methods in the jEdit class, the programmer discovered that this method does not implement any functionality but rather calls one of the other five overloaded openFile methods in the same class. Since the name of these methods suggested that one of them could be used to open a file using drag-and-drop, the programmer decided to follow program dependencies between them. After visiting two methods with IR relevance scores of 0.28 and 0.14, the programmer discovered the openFile method (0.26) that had two potentially relevant callers: the importFile (0.13) method in the TextAreaTransferHandler class and the run (0.26) method in the DraggedURLLoader class. Upon inspecting the higher-ranked run (0.26) method in the DraggedURLLoader class, the programmer decided this method is irrelevant and proceeded to inspect the importFile (0.13) method in the TextAreaTransferHandler class. After the inspection, he concluded the method is the location of the concept.

3.2. Tool support

We implemented tool support for DepIR for Java programs in JRipples, an interactive Eclipse plug-in that supports IR operations based on Lucene (http://lucene.apache.org), an open-source IR engine [33]. JRipples also builds the MDG and supports its navigation, keeping track of parts of the code the developers inspected. JRipples and its source code are freely available at http://jripples.sourceforge.net.

4. Evaluation

We conducted an exploratory case study where we compared and evaluated the proposed concept location approach DepIR against the basic IR and DepS approaches. Our hypothesis is that using DepIR, concept location can be performed faster than by using IR only or DepS only.

4.1. Design of empirical study

Empirical studies of concept location are impacted by the fact that concept location is just one part of a larger process, i.e. software change.
In our empirical studies, we chose to use reenactment, a case study technique where information about past software changes is extracted from software repositories and the process of these changes is reenacted using the new technique under study, for example DepIR. During reenactment, steps of concept location can be done manually by a human or automatically by an algorithm that simulates actions of a programmer. Automated reenactment was applied in [10], where several concept location tools were used to produce ranked lists of methods relevant to past change requests in open-source software; best ranks of the methods known to be changed for these requests were used as a measure of effectiveness of the tools. Automatic reenactment makes it possible to collect and analyze many data points, strengthening the validity of our observations. In turn, it introduces some specific threats to validity, which we discuss later. An example of the other variant of reenactment, manual reenactment, is in [9]. In it, programmers recorded knowledge that they used while manually locating concepts for a set of change requests already implemented in Mozilla.

Reenactment is not unique to concept location. Examples of reenactment can also be found in the research on impact analysis [32,34–36] and software development [37]. The word “reenactment” originates from [37]; other papers use a different terminology for the same idea.

Previous work on concept location used the number of methods a developer needs to inspect before locating a concept as a measure of effort [10,38]. We adopt this measure in our study because it is a proven measure and can be used in combination with automatic reenactment.

In our study, we used automated reenactment to compare DepIR to IR and DepS, evaluating the potential performance of each technique in 50 cases (i.e., 10 change requests in five systems), based on historical information extracted from software repositories. For each case, we randomly selected a change request (e.g., a bug description or feature request) such that we could identify (based on the patch, release notes, or commit comments) its corresponding change set: the exact methods in source code that were changed to satisfy this request.

We used the change requests as the input to a specially designed algorithm that simulates a user’s actions at each step of the concept location process, while using a particular concept location technique. After this, the change sets were used to verify and assess the results produced by the algorithm for each of the cases and each of the studied techniques. It is important to note that the reenactment algorithm simulated “perfect” users, meaning that at each step in the process it makes the choice that would find the target method fastest. Therefore, the results of the simulation indicate a best case scenario estimate of the effort needed to locate the target method for each change request with each technique.

4.2. Objects of the study

Our goal for selecting objects of the study was to identify systems of considerable size and complexity such that concept location would not be trivial. We selected five open source software systems from the popular repository Sourceforge.com. All of the selected systems use bug trackers, and their commit comments include the IDs of the bugs that were fixed, which allowed us to determine the change sets for each of those bugs. jEdit (http://sourceforge.net/projects/jEdit/) is a text and source code editor. DrJava (http://sourceforge.net/projects/drjava/) is a lightweight Java IDE. JabRef (http://sourceforge.net/projects/jabref/) is a bibliographical manager. Adempiere (http://sourceforge.net/projects/adempiere/) is an enterprise resource planning application. Megamek (http://sourceforge.net/projects/megamek/) is an interactive game. Table 1 summarizes the main properties of these systems. For each of the five systems, we selected 10 change requests; see the summary in Table 3.

Table 1
Properties of the systems used in the study.

Software     Classes   Methods   Revisions total   Revisions with bug ID
Adempiere    4116      74,930    5581              954
DrJava       1147      20,193    3197              440
JabRef       632       5768      2300              35
jEdit        536       7296      3796              500
Megamek      1268      12,793    5910              1664

4.3. Collected data

In this study, for each bug in the five systems, we considered the available bug description as the starting point. We extracted the change set for each bug fix from the repository and extracted a snapshot of the system prior to the bug fix. The goal of concept location in each of these cases is to find one of the methods in the change set. Since repositories do not store information regarding which of the modified methods was the one identified during concept location (i.e., the first to change), we repeated our reenactment iteratively, treating each of the modified methods as the target. We assessed the efficiency of the studied techniques for a particular revision based on the smallest number of methods required by these techniques to locate each of the targets. We computed this number based on the reenactment summarized in Fig. 2; the following explains the reenactment in detail.

4.3.1. DepS

The DepS technique calls for a starting point. In the absence of any change specific information, the obvious start is the main() method. We adopted this strategy and, in order to simulate user choice during DepS, we computed the shortest path in the MDG from main() to the target method. The length (n) of the path is the effort measure we report for this technique. The shortest path was determined using Dijkstra’s algorithm [39]. In other words, an ideal user would start the search on the MDG at main() and would have to inspect n methods before finding the target method. In order to assess how different the ideal case is from the average case, we recorded statistics about the number of neighbors of the methods on the shortest path (see Table 3). The idea is that the more neighbors these methods have, the harder concept location is, as it may require inspecting more methods to decide which path to take.

4.3.2. IR

The IR technique requires the formulation of a textual query. In the absence of any additional information, one can use the change request description as the initial query. We used the bug description as the initial query. Since formulating subsequent queries requires user intervention, we did not reformulate the query during the simulation. We computed the rank of the target methods in the ordered list of results retrieved by the IR engine. This is equal to the number of methods that would need to be investigated by traversing the ranked list of methods to find the target. This is the effort measure we collect in this case.

4.3.3. DepIR

In the case of DepIR, we use the same initial query and the ranked list of retrieved methods as for IR. We identified the 10 methods with the highest ranks, treating them as the possible entry points for the DepS. For each of these entry points, we identified the shortest path [39] in the MDG to the target method. We determined the effort for each entry point by adding its rank to the number of the edges on its shortest path to the target method. The minimum of the effort measure for the 10 starting points represents the minimum effort needed to locate the concept using DepIR. Once again, we did not reformulate the query. As in the case of DepS, we recorded statistics about the number of neighbors of the methods on the shortest path (see Table 3). The assumption and bias here is the same as in the case of DepS.
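The three effort measures described in Sections 4.3.1 to 4.3.3 can be illustrated with a minimal Python sketch. It is not the reenactment implementation used in the study: the toy graph, the method names, and the function names are hypothetical; dependencies are assumed to be traversable in both directions; the DepS path length is assumed to be counted in edges; and breadth-first search stands in for Dijkstra's algorithm, which yields the same shortest paths when every edge has unit cost.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an adjacency map in which every dependency can be followed
    in both directions (an assumption of this sketch)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_edges(graph: Graph, source: str, target: str) -> Optional[int]:
    """Number of edges on a shortest path, or None if unreachable (BFS)."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                if neighbor == target:
                    return dist[neighbor]
                queue.append(neighbor)
    return None


def deps_effort(graph: Graph, target: str, entry: str = "main") -> Optional[int]:
    """DepS: length of the shortest path from the entry point to the target."""
    return shortest_path_edges(graph, entry, target)


def ir_effort(ranking: List[str], target: str) -> int:
    """IR: 1-based rank of the target in the ranked list."""
    return ranking.index(target) + 1


def depir_effort(graph: Graph, ranking: List[str], target: str,
                 top_k: int = 10) -> Optional[int]:
    """DepIR: minimum over the top_k entry points of
    (rank of entry point + edges on its shortest path to the target)."""
    best: Optional[int] = None
    for rank, entry in enumerate(ranking[:top_k], start=1):
        edges = shortest_path_edges(graph, entry, target)
        if edges is None:
            continue  # target not reachable from this entry point
        effort = rank + edges
        if best is None or effort < best:
            best = effort
    return best


# Toy data (hypothetical): a small dependency graph and an IR ranking.
toy_graph = undirected([
    ("main", "a"), ("a", "b"), ("b", "c"), ("c", "target"),
    ("a", "d"), ("d", "e"),
])
toy_ranking = ["d", "c", "b", "e", "a", "main", "target"]

print(deps_effort(toy_graph, "target"))                # 4
print(ir_effort(toy_ranking, "target"))                # 7
print(depir_effort(toy_graph, toy_ranking, "target"))  # 3
```

In the toy data the target is ranked last by IR, yet the method ranked second is one dependency away from it, so the combined effort is 2 + 1 = 3. When the target itself has rank 1, the DepIR and IR measures coincide under this definition.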


[Figure 2 graphic: three panels, (a) Dependency search, (b) Information retrieval, (c) DepIR (dependency search started in the node with rank 2). Nodes in panels (b) and (c) are labeled with IR ranks 1 to 12; "M" marks the main() method.]

Fig. 2. Reenactment of concept location techniques. The gray shading indicates methods inspected to locate the concept, black shading represents a method containing the concept, numbers are IR ranks of the methods, bold edges are dependencies followed while exploring the dependency graph, and ‘‘M’’ marks the main() method. (a) Dependency search is reenacted as the shortest path in the dependency graph between the main() method and the method containing the concept. (b) Reenactment of IR explores methods in the order suggested by their ranks until the concept is located. (c) Reenactment of DepIR inspects methods in the order suggested by their ranks until a method is selected as the starting point of dependency search; after this, the shortest path is computed from this starting method to the concept.

Table 2
Aggregate results for the five systems (inspected methods).

Statistics   DepS   IR    DepIR
Average      4.3    190   2.98
Median       4      10    3
StDev        0.97   647   1.23

4.4. Results and discussion

Table 3 summarizes the results for each technique and change, while Table 2 presents aggregate results for all 50 changes. Fig. 3 shows the first four columns of Table 3 in graph form. We analyzed the average effort for each of the compared approaches. Since we cannot assume that our measurements follow the normal distribution, we used the nonparametric Wilcoxon matched pairs test [40]. We found support for the following observations.

On average, DepIR required a significantly (W = 0, p < 0.001) smaller effort than pure IR (median 3 vs. 10, average 2.98 vs. 190). Since DepIR and IR used the same set of queries, whenever the target method received the highest rank (in six cases out of 50), both techniques performed the same. On the other hand, DepIR performed well even in cases where the performance of IR was less than perfect, indicating that DepIR is a more robust heuristic. In particular, even though in 26 cases IR required inspecting no more than 10 methods, there were eight cases where IR necessitated inspecting over 100 methods (Table 3); DepIR, in contrast, never required inspecting more than five methods. A closer examination of these outlier cases showed that their change requests were formulated either in terms that are used extensively throughout the code or in terms not used in the code at all. For example, the change request for revision 7074 in jEdit contains terms that are common for this text editor, such as 'buffer', 'character', and 'menu'; as a result, when using this change request, IR ranks highly many methods that contain these terms but are not relevant to the target concept. These observations indicate that DepIR is especially useful when it is difficult to formulate a good initial query, and also when the vocabulary of the system is not suitable for IR (e.g., in the presence of synonyms, shared terms, etc.): as long as the top IR results are not far in the MDG from the target method, developers will still locate the target method quickly, relying on the navigation ability of DepIR. In other words, when using DepIR, developers may spend less effort and attention formulating "perfect" queries.

The improvement over DepS is less impressive (median 3 vs. 4, average 2.98 vs. 4.3), but it is statistically significant (W = 112, p < 0.001). DepS outperforms DepIR by at most two methods in

[Figure 3 graphic: number of inspected methods (logarithmic axis, 1 to 10000) for Dep Search, IR, and DepIR, plotted per revision and grouped by system in the order Adempiere, Megamek, DrJava, JabRef, jEdit.]

Fig. 3. Plot with effort data for the 50 changes and the three techniques.

seven out of 50 cases, with three cases in DrJava and four in jEdit. A closer analysis of the case study logs showed that the main() method of jEdit resides in the same class as a collection of methods that provide centralized access to the common libraries and preferences of jEdit. Because of this design, the client methods that call the library and preference methods are very close in the MDG to main(). The outlier cases required modification of such client methods, resulting in the observed performance of DepS. We also found a similar design in DrJava, where the main() method is used to load common libraries and initialize program preferences; this design peculiarity also resulted in the outlier cases.

Table 3
Case study results.

          Inspected methods  Neighbors on DepS search path    Neighbors on DepIR search path
Revision  DepS  IR    DepIR  Min  Max   Mean  Median  StDev   Min   Max   Mean  Median  StDev
Adempiere
7387      5     13    4      2    107   35    17      34.8    1     6650  2218  4       2217.7
7630      6     21    5      2    2793  572   8       571.8   1     6664  2378  1424    2377.8
8052      5     3     3      2    140   58    46      57.8    Located with IR part only
8136      5     2     2      2    1156  306   33      305.8   Located with IR part only
8189      5     7     3      2    54    22    17      21.8    28    2836  1432  1432    1432.0
8228      5     1     1      2    1128  298   32      297.8   Located with IR part only
8275      5     6     3      2    610   169   33      168.8   5     331   168   168     168.0
8739      5     7     5      2    414   112   17      111.8   2     27    10    7       9.5
9289      5     4     4      2    1049  279   33      278.8   39    1049  544   544     544.0
9400      5     1     1      2    97    33    17      32.8    Located with IR part only
DrJava
1294      4     2     2      10   45    24    17      24.0    Located with IR part only
1578      4     44    5      3    111   42    13      42.0    3     111   33    9.5     32.8
4305      4     53    5      7    188   68    9       68.0    2     14    9     11      8.8
4637      3     3     3      8    228   118   118     118.0   Located with IR part only
4744      3     4     3      7    309   158   158     158.0   5     6     5     5.5     5.0
4810      6     585   2      3    116   29    8       28.6    6     6     6     6       6.0
4927      4     4     3      8    248   89    11      89.0    2     4     3     3       3.0
4928      4     2     2      7    237   86    16      86.0    3     3     3     3       3.0
4966      3     11    5      8    14    11    11      11.0    7     123   39    13.5    39.0
838       5     2108  5      3    72    22    7       21.8    3     72    21    5       20.8
JabRef
1002      3     1     1      17   47    32    32      32.0    Located with IR part only
1336      5     3     3      1    82    29    17      28.8    9     88    48    48.5    48.0
1698      5     99    4      1    405   106   10      105.8   6     405   140   10      140.0
1728      4     11    3      1    91    35    14      34.7    5     409   207   207     207.0
1737      4     3     2      1    411   142   14      141.7   22    22    22    22      22.0
1745      4     14    3      1    413   142   14      141.7   10    92    51    51      51.0
1812      5     27    4      1    55    19    10      18.8    5     95    35    7       35.0
1831      4     6     4      1    424   146   15      145.7   1     8     4     5       3.7
1909      4     53    3      1    58    24    15      23.7    9     9     9     9       9.0
2383      3     104   2      1    15    8     8       7.5     15    15    15    15      15.0
jEdit
11176     3     1     1      49   274   161   161     161.0   Located with IR part only
12522     3     1014  5      7    49    28    28      28.0    1     287   77    11.5    76.8
13899     5     10    2      14   102   51    45      51.0    4     4     4     4       4.0
14898     6     17    2      4    107   38    15      37.8    23    23    23    23      23.0
15732     3     253   5      53   107   80    80      80.0    2     107   34    14.5    33.8
5312      3     416   3      42   239   140   140     8.4     3     4     3     3.5     18.0
5571      3     1     1      9    44    26    26      26.0    Located with IR part only
6686      3     14    4      42   237   139   139     139.0   3     237   82    8       82.0
7074      4     3997  3      11   248   100   42      100.0   6     11    8     8.5     8.0
7629      3     485   4      46   252   149   149     149.0   10    77    36    21      36.0
Megamek
5085      5     3     3      13   45    28    27      28.0    Located with IR part only
5573      3     2     2      13   48    30    30      30.0    Located with IR part only
5583      5     4     3      7    100   33    12      33.0    38    268   153   153     153.0
5650      3     2     2      13   48    30    30      30.0    Located with IR part only
5656      5     28    2      13   168   68    46      68.0    61    61    61    61      61.0
5700      5     31    3      13   101   51    46      51.0    15    70    42    42.5    42.0
5912      5     10    2      13   314   98    33      98.0    13    13    13    13      13.0
5934      5     11    3      13   314   98    33      98.0    7     77    42    42      42.0
6459      6     1     1      2    75    31    14      30.8    Located with IR part only
6473      5     20    3      13   359   109   33      109.0   51    359   205   205     205.0

Based on our observations, we argue that DepIR has a bigger advantage over DepS in well-structured systems that have a more complex (i.e., less shallow) dependency graph. Furthermore, in 25 out of 50 cases, methods on the search paths of DepIR had on average fewer neighbors than methods on the search paths of DepS; additionally, in 13 cases, DepIR efficiently located relevant methods using IR only and did not require inspecting the dependency graph. Since programmers exploring the MDG might have to inspect some irrelevant neighbors before finding a neighbor that leads to the concept, the reduced number of neighbors on the search paths in DepIR reflects an additional reduction in concept location effort offered by this technique.
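The matched-pairs comparison of IR and DepIR reported earlier in this section can be retraced with a short sketch of the Wilcoxon signed-rank statistic. The function below is an illustrative implementation, not the statistical package used in the study; it assumes that zero differences are dropped and that tied absolute differences share their average rank, and it returns the smaller of the two rank sums. The IR and DepIR effort values are transcribed from Table 3 in the table's row order (Adempiere, DrJava, JabRef, jEdit, Megamek).

```python
def wilcoxon_w(xs, ys):
    """Return min(W+, W-) for paired samples.

    Assumptions of this sketch: zero differences are discarded and tied
    absolute differences receive the mean of the ranks they occupy.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(diffs, key=abs)
    w_plus = 0.0
    w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return min(w_plus, w_minus)


# Inspected methods per change, transcribed from Table 3.
IR = [13, 21, 3, 2, 7, 1, 6, 7, 4, 1,
      2, 44, 53, 3, 4, 585, 4, 2, 11, 2108,
      1, 3, 99, 11, 3, 14, 27, 6, 53, 104,
      1, 1014, 10, 17, 253, 416, 1, 14, 3997, 485,
      3, 2, 4, 2, 28, 31, 10, 11, 1, 20]
DEPIR = [4, 5, 3, 2, 3, 1, 3, 5, 4, 1,
         2, 5, 5, 3, 3, 2, 3, 2, 5, 5,
         1, 3, 4, 3, 2, 3, 4, 4, 3, 2,
         1, 5, 2, 2, 5, 3, 1, 4, 3, 4,
         3, 2, 3, 2, 2, 3, 2, 3, 1, 3]

print(wilcoxon_w(IR, DEPIR))  # 0.0
```

Because no change in Table 3 has a DepIR effort above its IR effort, the rank sum of the negative differences is zero however ties and zero differences are handled, which agrees with the reported W = 0.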


When developers have good knowledge of the system, they can usually write good queries, and the textual search gets them quickly to the desired part of the code. Based on our empirical observations, we speculate that DepIR pays off when the user query is weak and, instead of returning the target methods at the top of the list, ranks other methods higher. In these situations, navigating the dependencies provides shortcuts through the search results and leads to more efficient concept location.

4.5. Threats to validity

In this study, we evaluated the performance of the three concept location approaches based on automated reenactment of past changes. Similarly to other empirical techniques, reenactment has its own advantages, disadvantages, and biases that affected the results of our study and may limit their interpretation. Unlike studies that examine concept location by observing programmers, automated reenactment does not require human participants and, in this way, it lowers threats to internal validity associated with the personal skills and preferences of subject programmers. These include familiarity with certain concept location techniques, programming experience, degree of understanding of the change requests and object software, and so forth [9,41,42]. Further, studies using automated reenactment are not biased by program learning, where knowledge learned by programmers while working with one change request affects their work with other change requests [9]. On the other hand, reenactment involves a threat to construct validity because it always assumes the best choices made by the simulated users, while real programmers often make less than perfect choices. Further, in the case study, most of the change requests modified more than one method.
SVN repositories do not provide information on which of the fixed methods was located during the original concept location, nor do they provide the order of fixing; also, the repositories do not record the classes that were inspected by the programmers but not modified. Therefore, we measured the effort needed by the studied techniques by modeling the ideal concept location process, where the simulation algorithm was required to locate any of the changed methods by inspecting the minimal number of methods. Moreover, in reality, programmers might be less accurate in recognizing relevant neighbors while exploring the dependency graph, selecting the shortest path to the concept, choosing the starting point of DepS, and so forth; this may result in a concept location effort that is higher than the measured minimum. Additionally, programmers may choose to compose different search queries or to refine their queries during the search; as these queries can be more or less accurate than the change request-based queries used in the study, the effort of concept location using these queries may also differ.

The JRipples tool used in the study extracts only certain types of dependencies. Capturing additional dependencies (e.g., dependencies defined in text files, database dependencies, reflection, etc.) might further improve the effectiveness of DepS and DepIR.

The IR and DepIR techniques relied on the text of change requests, assuming this text provides a correct and accurate description of the required updates. Inaccuracies in the text may decrease the relevance of the ranking and, consequently, increase the effort of IR and DepIR. Nonetheless, as observed in the study, the quality of the queries had only a minor effect, if any, on the performance of DepIR. Furthermore, for the reenactment of DepIR, we allowed only the 10 highest-ranked methods to be selected as starting points of DepS.
However, since any starting method ranked lower than 10th would require inspecting more than 10 methods, and since in all studied cases DepIR required inspecting no more than five methods, we conclude that this assumption had no effect on the results of the study.

We assessed the effort required to locate the concepts by recording the number of inspected methods, which may pose a threat to external validity. Depending on the goals of a particular maintenance process, other measures of effort, such as the time that programmers needed to locate the concepts, might provide a more suitable insight into the difference between the compared techniques. Furthermore, different change requests and software systems might require a different effort to locate the concepts. A specific software system might have classes with more or less descriptive terms, which can impact the effectiveness of both IR and DepIR.

In this study, all of the selected change requests described bugs, that is, program functionalities that can be viewed as unwanted features [10,38]. Other types of changes, such as adding a new functionality or modifying an existing functionality, may require a different effort. Furthermore, the selected change requests described explicit features. We expect that DepS and DepIR will have an advantage over IR when locating implicit features, since such features are implied but not explicitly expressed in the code and, therefore, can be hard or impossible to locate by querying source code for specific terms [43].

The systems we studied are open source and may not be representative of software developed by other processes. However, it should be noted that, for example, Megamek was developed by 29 programmers and 75% of its code was contributed by a core of five developers, where each of these core developers contributed over 5% of the total code. Adempiere was developed by 53 developers and a core of six developers contributed 80% of the total code base, where each of the core developers contributed over 5% of the code.
In this way, Adempiere and Megamek display characteristics similar to industrial software, where the bulk of the code is developed by a relatively small and stable team of developers; this is different from a more typical open-source software process, where the code is produced by large and spontaneous groups of independent contributors [44,45].

5. Conclusions and future work

In this paper, we proposed and discussed DepIR, a concept location technique that combines program dependency search and information retrieval-based search. Our case study indicates that concept location using DepIR requires a significantly smaller effort than concept location using DepS or IR alone. Furthermore, the comparative analysis of the results of concept location using IR and DepIR indicates that DepIR allows finding concepts even with queries that do not rank the relevant methods highly. Since formulating a good query is not always trivial, this tolerance of lower-quality queries significantly broadens the usability of DepIR.

In the future, we plan to investigate which specific software characteristics make DepIR a better choice than other concept location techniques. We would also like to investigate whether the concept location process can be further improved by combining DepIR with the recent techniques that, based on IR, generate a sub-graph of the dependency graph as discussed in Section 4, i.e., [17–21].

Acknowledgments

We would like to thank Andrian Marcus and Denys Poshyvanyk for their extensive input on IR techniques and help with the early versions of this paper. This work was partially supported by the


Grant CCF-0820133 from the National Science Foundation (NSF). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

References

[1] N. Wilde, J.A. Gomez, T. Gust, D. Strasburg, Locating user functionality in old code, in: IEEE International Conference on Software Maintenance (ICSM’92), Orlando, FL, 1992, pp. 200–205.
[2] T.J. Biggerstaff, B.G. Mitbander, D.E. Webster, The concept assignment problem in program understanding, in: 15th IEEE/ACM International Conference on Software Engineering (ICSE’94), 1994, pp. 482–498.
[3] V. Rajlich, Software Engineering: The Current Practice, Chapman and Hall/CRC, 2011.
[4] V. Rajlich, Changing the paradigm of software engineering, in: Communications of ACM, 2006, pp. 67–70.
[5] S. Bohner, R. Arnold, Software Change Impact Analysis, IEEE Computer Society, Los Alamitos, CA, 1996.
[6] B. Dit, M. Revelle, M. Gethers, D. Poshyvanyk, Feature location in source code: a taxonomy and survey, Journal of Software Maintenance and Evolution: Research and Practice, 2011, http://dx.doi.org/10.1002/smr.567.
[7] A. Marcus, V. Rajlich, J. Buchta, M. Petrenko, A. Sergeyev, Static techniques for concept location in object-oriented code, in: 13th IEEE International Workshop on Program Comprehension (IWPC’05), 2005, pp. 33–42.
[8] A.V. Aho, Pattern matching in strings, in: Formal Language Theory: Perspectives and Open Problems, Academic Press, New York, 1980, pp. 325–347.
[9] M. Petrenko, V. Rajlich, R. Vanciu, Partial domain comprehension in software evolution and maintenance, in: IEEE International Conference on Software Comprehension, 2008, pp. 13–22.
[10] D. Poshyvanyk, Y.G. Guéhéneuc, A. Marcus, G. Antoniol, V. Rajlich, Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval, IEEE Transactions on Software Engineering 33 (2007) 420–432.
[11] A. Marcus, A. Sergeyev, V. Rajlich, J. Maletic, An information retrieval approach to concept location in source code, in: 11th IEEE Working Conference on Reverse Engineering (WCRE’04), Delft, The Netherlands, 2004, pp. 214–223.
[12] K. Chen, V. Rajlich, Case study of feature location using dependency graph, in: IEEE International Workshop on Program Comprehension, IEEE Computer Society, 2000, pp. 241–249.
[13] W. Zhao, L. Zhang, Y. Liu, J. Sun, F. Yang, SNIAFL: towards a static noninteractive approach to feature location, ACM Transactions on Software Engineering and Methodologies (TOSEM) 15 (2006) 195–226.
[14] P. Shao, R.K. Smith, Feature location by IR modules and call graph, in: Annual Southeast Regional Conference, 2009.
[15] S. Hayashi, K. Sekine, M. Saeki, iFL: an interactive environment for understanding feature implementations, in: IEEE International Conference on Software Maintenance, 2010, pp. 1–5.
[16] E. Hill, L. Pollock, K. Vijay-Shanker, Investigating how to effectively combine static concern location techniques, in: 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation, 2011.
[17] M. Eaddy, A.V. Aho, G. Antoniol, Y.-G. Gueheneuc, CERBERUS: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis, in: IEEE International Conference on Program Comprehension, 2008, pp. 53–62.
[18] M. Revelle, D. Poshyvanyk, An exploratory study on assessing feature location techniques, in: IEEE International Conference on Program Comprehension, 2010, pp. 218–222.
[19] S. Ratanotayanon, H.J. Choi, S.E. Sim, Using transitive changesets to support feature location, in: IEEE/ACM International Conference on Automated Software Engineering, 2010, pp. 341–344.
[20] S. Hayashi, T. Yoshikawa, M. Saeki, Sentence-to-code traceability recovery with domain ontologies, in: Asia Pacific Software Engineering Conference, 2010, pp. 385–394.
[21] D. Shepherd, Z. Fry, E. Gibson, L. Pollock, K. Vijay-Shanker, Using natural language program analysis to locate and understand action-oriented concerns, in: International Conference on Aspect Oriented Software Development (AOSD’07), 2007, pp. 212–224.
[22] N.E. Gold, M. Harman, D. Binkley, R.M. Hierons, Unifying program slicing and concept assignment for higher-level executable source code extraction, Journal of Software Practice and Experience 35 (2005) 977–1006.
[23] R. Al-Ekram, K. Kontogiannis, Source code modularization using lattice of concept slices, in: Eighth Euromicro Working Conference on Software Maintenance and Reengineering, 2004, pp. 195–203.
[24] N.E. Gold, M. Harman, Z. Li, K. Mahdavi, Allowing overlapping boundaries in source code using a search based approach to concept binding, in: IEEE International Conference on Software Maintenance, 2006, pp. 310–319.
[25] E. Hill, L. Pollock, K. Vijay-Shanker, Exploring the neighborhood with Dora to expedite software maintenance, in: IEEE/ACM International Conference on Automated Software Engineering, 2007, pp. 14–23.
[26] A.J. Ko, B.A. Myers, M.J. Coblenz, H.H. Aung, An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks, IEEE Transactions on Software Engineering (TSE) 32 (2006) 971–987.
[27] J. Sillito, G.C. Murphy, K. De Volder, Asking and answering questions during a programming change task, IEEE Transactions on Software Engineering 34 (2008) 1–18.
[28] J. Sillito, K. De Volder, B. Fisher, G.C. Murphy, Managing software change tasks: an exploratory study, in: International Symposium on Empirical Software Engineering (ISESE 2005), Noosa Heads, Australia, 2005, pp. 23–32.
[29] G. Marchionini, Information Seeking in Electronic Environments, Cambridge University Press, Cambridge, United Kingdom, 1997.
[30] M.P. Robillard, W. Coelho, G.C. Murphy, How effective developers investigate source code: an exploratory study, IEEE Transactions on Software Engineering 30 (2004) 889–903.
[31] T. Kuhn, O. Thomann, Abstract Syntax Tree, in: Eclipse Corner Articles, 2007.
[32] M. Petrenko, V. Rajlich, Variable granularity for improving precision of impact analysis, in: IEEE International Conference on Program Comprehension, 2009, pp. 10–19.
[33] D. Poshyvanyk, A. Marcus, Y. Dong, JIRiSS – an eclipse plug-in for source code exploration, in: 14th IEEE International Conference on Program Comprehension (ICPC’06), Athens, Greece, 2006, pp. 252–255.
[34] A.E. Hassan, R.C. Holt, Replaying development history to assess the effectiveness of change propagation tools, Empirical Software Engineering 11 (2006) 335–367.
[35] G. Canfora, L. Cerulo, Impact analysis by mining software and change request repositories, in: 11th IEEE International Symposium on Software Metrics, 2005, pp. 9–29.
[36] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, Identifying the starting impact set of a maintenance request: a case study, in: European Conference on Software Maintenance and Reengineering, IEEE Computer Society, 2000, pp. 227–230.
[37] C. Jensen, W. Scacchi, Discovering, modeling, and reenacting open source software development processes, new trends in software process modeling, Series in Software Engineering and Knowledge Engineering 18 (2006) 1–20.
[38] D. Liu, A. Marcus, D. Poshyvanyk, V. Rajlich, Feature location via information retrieval based filtering of a single scenario execution trace, in: 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE’07), Atlanta, Georgia, 2007, pp. 234–243.
[39] E.W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1 (1959) 269–271.
[40] W.J. Conover, Practical Nonparametric Statistics, third ed., Wiley, New York, NY, 1999.
[41] K.B. McKeithen, J.S. Reitman, H.H. Rueter, S.C. Hirtle, Knowledge organisation and skill differences in computer programmers, Cognitive Psychology 13 (1981) 307–325.
[42] D.N. Perkins, F. Martin, Fragile knowledge and neglected strategies in novice programmers, in: Empirical Studies of Programmers, 1986, pp. 213–229.
[43] V. Rajlich, Intensions are a key to program comprehension, in: IEEE International Conference on Program Comprehension (ICPC’09), 2009, pp. 1–9.
[44] H.K. Wright, M. Kim, D.E. Perry, Validity concerns in software engineering research, in: FSE/SDP Workshop on Future of Software Engineering Research, 2010, pp. 411–414.
[45] A. Mockus, R.T. Fielding, J.D. Herbsleb, Two case studies of open source software development: Apache and Mozilla, ACM Transactions on Software Engineering and Methodology 11 (2002) 309–346.