Software rejuvenation via a multi-agent approach

The Journal of Systems and Software 104 (2015) 41–59
Heliomar Santos a, João Felipe Pimentel a, Viviane Torres Da Silva b,#,∗, Leonardo Murta a

a Universidade Federal Fluminense (UFF), Av. Gal. Milton Tavares de Souza, s/n°, Niterói, RJ, 24210-240, Brazil
b IBM Research, Av. Pasteur 138, Urca, RJ, 22290-240, Brazil

Article info

Article history: Received 12 February 2014; Revised 5 February 2015; Accepted 9 February 2015; Available online 16 February 2015.

Keywords: Software rejuvenation; Refactoring; Multi-agent systems

Abstract

Development teams usually devote a large amount of time and effort to maintaining existing software. Since many of these maintenance tasks are not planned, the software tends to degrade over time, with side effects mainly on its non-functional requirements. This paper proposes the use of a multi-agent system to perform perfective maintenance tasks in a software product through refactorings. The software developer chooses the quality attribute that the agents should improve, and the agents autonomously search the code for opportunities to apply perfective maintenance, apply it, and evaluate whether the source code quality has improved. The main contributions of the approach are: (i) the refactorings are autonomously performed by software agents during idle development time; (ii) all changes are stored in isolated branches to facilitate communication with the developers; (iii) the refactorings are applied only when the program semantics is preserved; (iv) the agents are able to learn the most suitable sequence of refactorings to improve a specific quality attribute; and (v) the approach can be extended with other metrics and refactorings. This paper also presents a set of experimental studies that provide evidence of the benefits of our approach for software rejuvenation. © 2015 Elsevier Inc. All rights reserved.

1. Introduction

A robust software design is expected to embrace changes, supporting consecutive requirement evolutions and other modifications. However, internal and external factors may contribute to the degradation of the software design quality over time (Gurp and Bosch, 2002). Amongst these factors, two are of special interest in the context of this work:

• Labor turnover: During the lifetime of software, people retire, are fired, or are hired, and sometimes the original development team is completely replaced by a brand new team, leading to loss of organizational memory.
• Tight schedule: When software is developed and maintained under pressure, usually due to poorly estimated deadlines, time to market suppresses quality, causing technical debt (Kerievsky, 2004) in the product.



Corresponding author. Tel.: +55 21981267377. E-mail addresses: [email protected] (H. Santos), [email protected] (J.F. Pimentel), [email protected], [email protected] (V. Torres Da Silva), [email protected] (L. Murta). # On leave from Universidade Federal Fluminense (UFF), Niterói, Rio de Janeiro, 24210-240, Brazil. http://dx.doi.org/10.1016/j.jss.2015.02.017 0164-1212/© 2015 Elsevier Inc. All rights reserved.

These factors together contribute over time to software design degeneration, even when best practices are in place. The resulting effect is that, after patching a software product for a long time, the incorporation of new features becomes harder (Pressman, 2005). This is the most visible result of software aging (Parnas, 1994). The usual strategy to fight software aging, besides applying best practices during development, is to perform perfective maintenance (ISO, 2001) from time to time. However, perfective maintenance usually demands deep product knowledge and high effort, which conflict with labor turnover and tight schedules, forcing companies to postpone it indefinitely. In addition, performing perfective maintenance by hand is an error-prone and expensive activity (Murphy-Hill et al., 2012). It is worth mentioning that around 250 billion lines of code were under maintenance in 2000 (Sommerville, 2006). Moreover, maintenance is responsible for 90% of the cost (Erlikh, 2000) and 60% of the effort (Pressman, 2005) of the software lifecycle. However, only 21% of the maintenance effort is devoted to bug fixing (Bennett and Rajlich, 2000). This means that even high-quality systems are subject to considerable investments during maintenance. This scenario becomes even worse if we consider current practices of software development, such as software ecosystems (Messerschmitt and Szyperski, 2005), which lead to huge code bases; early and often releases (Raymond, 2001), which anticipate maintenance activities; and offshore outsourcing (Sengupta et al., 2006), which leads to geographically distributed teams.


As a response to this problem, several research efforts have focused on automatic ways to perform perfective maintenance, usually via refactoring, a technique to improve software code quality without changing its observable behavior (Fowler et al., 1999). They contribute by identifying refactoring opportunities (Sahraoui et al., 2000; Simon et al., 2001; Tourwé and Mens, 2003; O’Keeffe and Ó Cinnéide, 2008; Abdeen et al., 2013; Ouni et al., 2013), measuring the improvement provided by refactoring the code base (Chidamber and Kemerer, 1994; Mancoridis et al., 1998; Bansiya and Davis, 2002; Harman and Clark, 2004; O’Keeffe and Ó Cinnéide, 2006; Romano and Pinzger, 2011; Abdeen et al., 2013), and identifying the refactoring strategies that best improve specific source code metrics (Sahraoui et al., 2000; Du Bois et al., 2004). However, most approaches (Wiggerts, 1997; Mancoridis et al., 1998; Doval et al., 1999; Mancoridis et al., 1999; Harman et al., 2002; Seng et al., 2006; Harman and Tratt, 2007; Abdeen et al., 2009; Bavota et al., 2010; Räihä, 2010; Praditwong et al., 2011; Bavota et al., 2012; de Barros, 2012; Ouni et al., 2012; Abdeen et al., 2013) are classified as indirect (Harman and Tratt, 2007), as they extract models from the source code, work over these models, and suggest a sequence of refactorings without applying them. The few direct approaches (Ryan, 1999; O’Keeffe and Ó Cinnéide, 2006; O’Keeffe and Ó Cinnéide, 2008), which in fact apply the refactorings over the source code, suffer from three main problems: communication of design changes to the developers, scalability, and preservation of the program semantics. These problems are worse in the aforementioned context, with large systems being developed by distributed teams.
An approach that automatically applies refactorings to the code without communicating the changes to the developers makes obsolete not only all documentation about the software but also the knowledge the developers have about it. Moreover, an approach that does not preserve the program semantics is not reliable. After applying the refactorings, the software's observable behavior must be exactly the same as before (Fowler et al., 1999). In addition, the approach must be scalable, performing safe refactorings (i.e., refactorings that do not change the program semantics) in big projects. Considering a company that used to postpone perfective maintenance tasks indefinitely due to labor turnover and tight schedules, an approach that performs those perfective maintenance tasks safely, no matter the size of the project, would be valuable. Big projects usually require more people than small projects. As people only work part of the day and development machines are usually not shared between different people, those machines remain idle the rest of the time. So, the bigger the project, the more idle machines a company has. The computational power of those machines could be used in parallel to scale approaches that perform safe refactorings on big projects. Therefore, in this paper we introduce a new approach, named Peixe-Espada, which consists of a multi-agent system (MAS) (Bond and Gasser, 1988) that runs during idle development time to perform autonomous perfective maintenance tasks in the software product via refactorings. The software developer chooses the quality attribute he/she wants to improve, and the agents autonomously search the code for opportunities to apply perfective maintenance tasks, apply them, and evaluate whether the source code quality has improved.
In our approach, all configurations that have improved the source code according to a quality attribute are automatically stored in an isolated branch of the configuration management (CM) repository. This strategy makes it easier for the development team to figure out what happened (via diff) and to decide whether the changes should be incorporated into the mainline (via merge). The MAS uses the feedback about the source code quality improvement (or not) to learn the most suitable sequence of refactorings to improve each quality attribute. The use of a MAS approach allows for massive parallelism over idle development computers (standard PCs), which decreases the time spent on perfective maintenance. The main characteristics of the proposed architecture are:

• Autonomy: Peixe-Espada is able to autonomously perform perfective maintenance by employing software agents. Since agents autonomously execute the perfective maintenance, developers are not overloaded with this task and can spend their time on tasks that cannot be automated.
• Distribution: Peixe-Espada is based on a distributed MAS solution that uses the power of the available computers to execute the perfective maintenance tasks. Several perfective improvements over the existing source code can be done in parallel, decreasing the time spent on perfective maintenance.
• Learning: Peixe-Espada's behavior can improve over time, since it receives feedback about which sequence of refactorings is more suitable to improve a specific quality attribute. Peixe-Espada evaluates the feedback of applying the refactorings and reconfigures the sequence of refactorings to be applied in future perfective maintenance tasks.
• Isolation: The refactorings made in the source code are stored in isolated branches, and the development team can choose whether to apply them to the mainline.
• Idle computer exploitation: The perfective maintenance is done during idle development time. Therefore, the time spent on maintenance does not increase the time spent on developing the software; it only increases its quality.
• Extensibility: Peixe-Espada works with a catalog of metrics and refactorings that can be easily extended in the future. The current version includes six metrics, three quality attributes (composite metrics), and five refactorings.
• Semantics preservation: Peixe-Espada applies refactorings only when the program semantics is preserved. Any situation that may change the program semantics is summarily discarded.

Peixe-Espada was evaluated with four projects of different sizes.
The autonomy of Peixe-Espada is demonstrated by a set of software agents able to apply refactorings to each project, improving three quality attributes without the need to interact with developers. The distribution ability of Peixe-Espada was evaluated by comparing the performance of four computers executing in parallel against one computer when applying refactorings to each project. We demonstrate that the distributed approach with four computers was on average 3.77 times faster. In order to evaluate the ability of Peixe-Espada to learn, we explore the feedback provided by the agents that applied the refactorings. We demonstrate that the sequence of refactorings suggested by the learning mechanism almost always provided better results than the execution of random sequences of refactorings. In order to demonstrate that Peixe-Espada works in isolation and does not directly change the current version of the project, we show that different refactored branches are created when refactoring the projects and present some examples of refactorings committed to branches in a given project. In addition, since it is possible to define the time periods in which Peixe-Espada should run, we show that it is able to run on idle computers. Finally, the extensibility and semantics preservation features are obtained by design, as our approach provides specific interfaces for measurement and refactorings and adopts the RefactorIt1 library for applying refactorings. In fact, we are currently extending our MAS with seventeen additional metrics, three additional quality attributes, and six additional refactorings. The remainder of this work is organized as follows: Section 2 presents the Peixe-Espada approach. Section 3 shows some experimental studies performed during the evaluation of our work. Section 4 describes the related work and, finally, Section 5 concludes

1 http://sourceforge.net/projects/refactorit.


the paper, summarizing the contributions of our approach and discussing some limitations and future work.

2. Peixe-Espada approach

The main goal of Peixe-Espada is to autonomously perform perfective maintenance of a project by refactoring the source code, aiming at improving some quality attributes. As defined by Fowler et al. (1999): “Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure”. Empirical studies on the use of refactoring have found that it can improve maintainability (Kolb et al., 2005; Moser et al., 2006). Not only does existing work suggest that refactoring is useful and important, but it also suggests that refactoring is a frequent practice (Murphy-Hill et al., 2012). In order to create an autonomous and distributed solution, we have implemented an architecture that consists of a MAS where agents distributed among several idle computers autonomously identify the parts of the code that should be improved, apply the refactorings, and evaluate whether the source code quality has improved. The software developers are only responsible for selecting the quality attributes that should be improved, running the MAS, and accepting or rejecting the modifications made by the agents. This section is organized as follows. Section 2.1 presents an overview of the Peixe-Espada architecture. Section 2.2 describes in detail the goals and tasks of the agents that compose our architecture. Section 2.3 defines the relationships between the metrics, quality attributes, and refactorings. Section 2.4 discusses how to achieve massive parallelism via MAS, and Section 2.5 presents the implementation details.

2.1. The MAS architecture in a nutshell

The Peixe-Espada architecture is composed of two main agent roles: the orchestrator agent and the worker agent.
The orchestrator agent is responsible for improving the project code according to a specific quality attribute. It delegates tasks to worker agents, which are responsible for measuring the code, executing the refactorings, managing the repository, and coordinating themselves. The goal model illustrated in Fig. 1 depicts how the goal of the orchestrator agents is decomposed into the goals of the worker agents. The orchestrator agents run on a server (web application), while the worker agents run on the idle developers' computers (desktop application). After learning which quality attribute the developer wants to improve, the first task executed by a worker agent is to check out the current version of the source code, called the current configuration from now on. After that, it asks the orchestrator agent for a sequence of refactorings and performs the first refactoring in different parts of the current configuration, generating new versions of the source code, called derived configurations from now on. Subsequently, the derived configurations are evaluated in comparison to the current configuration, and information about successful and unsuccessful refactoring trials is sent to the orchestrator agent. By successful refactoring trials we mean the refactorings that were able to improve the desired quality attribute.


The orchestrator agent uses this information to learn about the success of the applied refactorings and to improve further rejuvenation cycles. The orchestrator agent is able to identify the refactorings that have the highest probability of improving each specific quality attribute, which reduces the generation of useless derived configurations in subsequent executions. Moreover, if none of the refactorings improves the current configuration in terms of the selected quality attribute, or if the idle period finishes, the work cycle ends. Finally, each successful derived configuration is independently committed to a special branch of the CM repository, and the cycle resumes processing the remaining refactorings. The use of branches avoids disrupting the development team with unexpected changes in the source code. The development team can analyze each commit with diff tools, merging them back or discarding some of them. It is important to notice that a commit contains one and only one refactoring trial, together with contextual information, such as the refactoring name and the improved quality attribute. In Fig. 2 we depict the deployment diagram of our approach. As stated before, the orchestrator agent works on a server, together with the Peixe-Espada web application. The orchestrator agent can access a database where the metrics, quality attributes, and information learned about the sequences of refactorings are stored. The worker agents execute on the idle developers' computers, together with a desktop application, and can access the CM repository to store the successful derived configurations. Section 2.5 details the usage of both the web and desktop applications provided with Peixe-Espada.

2.2. Agents' goals and tasks

The worker agent is specialized into local manager, measurer, refactoring, and configuration manager agents, as illustrated in Fig. 3. To make clear the goals and tasks of each agent, Figs. 4–7 illustrate them using i∗ diagrams (Yu et al., 2011).
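As a rough illustration, the work cycle just described (check out, refactor, measure, commit only the successful trials, and report feedback) can be sketched as follows. The sketch collapses the agents into plain functions and reduces a configuration to a single measured quality value; all names and numbers are illustrative assumptions, not the authors' implementation.

```python
# Minimal runnable sketch of one rejuvenation cycle. A "configuration" is
# reduced to its measured quality-attribute value; each refactoring is
# modeled only by its (invented) effect on that value.

def cycle(current_value, refactoring_sequence, apply_refactoring):
    """Try each refactoring in order; keep a derived configuration only if it
    improves the quality attribute, and record feedback per refactoring."""
    feedback = {}
    for refactoring in refactoring_sequence:
        derived_value = apply_refactoring(refactoring, current_value)
        improved = derived_value > current_value
        feedback[refactoring] = "improved" if improved else "not improved"
        if improved:
            current_value = derived_value  # would be committed to a branch
    return current_value, feedback

# Toy, made-up effect of each refactoring on the measured quality attribute:
effects = {"Encapsulate Fields": +0.10, "Pull Up Methods": +0.05,
           "Push Down Fields": -0.02}
final, fb = cycle(1.00, list(effects),
                  lambda r, v: round(v + effects[r], 2))
print(final)  # 1.15 (two improvements kept, one trial discarded)
```

The feedback dictionary stands in for the success/failure report each local manager sends to the orchestrator agent at the end of the cycle.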
Note that the orchestrator agent works as a global manager of the whole MAS, while the local manager agent works as a local coordinator on a specific developer's computer. The local manager is the only worker agent allowed to communicate with the orchestrator agent. In a usual operation cycle, as detailed in Fig. 8, after learning which quality attribute the developer selected to be improved, each local manager agent requests the configuration manager agent to check out the current configuration and asks the orchestrator agent to provide the metrics needed to calculate the quality attribute that should be improved. After that, the local manager agent requests the measurer agent to extract the metrics and calculate the quality attribute for the current configuration. This allows the MAS to know the current status of the software under improvement. Currently, Peixe-Espada works with metrics, quality attributes, and normalizations defined by Bansiya and Davis (2002), as described in Section 2.3. However, its design allows for future extensions to accommodate additional metrics and quality attributes, if necessary. Subsequently, the local manager agent asks the orchestrator agent for the refactoring sequence. The orchestrator agent consults a database where it stores information about the improvements made by each refactoring regarding a given quality attribute in past interactions. It calculates the ratio between improved and applied

Fig. 1. Modeling the goal decomposition.


Fig. 2. Deployment diagram.

Fig. 3. Entity decomposition.

refactorings for each quality attribute. Therefore, it is able to sort the refactorings to be applied for each quality attribute. The refactoring with the highest ratio is the first one to be used. The orchestrator agent takes extra care when sorting the refactorings because the refactoring sequence may affect the results. For example, should Peixe-Espada apply Pull Up Fields and then Encapsulate Fields, a private field with getters and setters would appear in the superclass. However, should Peixe-Espada apply Encapsulate Fields and then Pull Up Fields, a protected field would appear in the superclass and getters and setters would appear in the subclass. Although all refactorings are always provided to the local manager agent, providing them in the correct sequence may speed up the process as a whole. A more detailed example of refactoring selection and sequencing is provided at the end of Section 2.3. After receiving the refactoring sequence, the local manager requests the refactoring agent to perform the refactorings in that sequence. The refactoring agent divides the refactoring into two main phases: search and apply. The search phase searches for refactoring opportunities, i.e., it searches for elements that present the symptom that should be refactored by walking through the Abstract Syntax Tree (AST) (O’Keeffe and Ó Cinnéide, 2008) extracted from the source code. This search results in sets of programming elements (i.e., packages, classes, methods, attributes, etc.) that are the input to a particular refactoring. Complementarily, the apply phase transforms the elements according to specific rules defined by the refactoring. Table 3 in Section 2.3 presents more details about the search and apply phases. Note that not all detected symptoms lead to successfully applied refactorings, due to conflicts that can arise during code rewriting. These conflicts consist of situations where it is not possible to apply the refactoring without causing errors in the source code. However, some conflicts can be resolved by rewriting other parts of the source code. Because of that, the application of a single refactoring over a specific symptom can produce multiple resolutions, culminating in distinct derived configurations. For example, suppose the application of the Pull Up Methods refactoring over a given class. All methods of this class that do not override other methods in the hierarchy are considered symptoms for this specific refactoring scenario. For each symptom, at least two resolutions exist: moving only the method signature to the superclass or moving both the signature and the method body to the superclass. However, this second resolution is unsuccessful and should be aborted in situations where the method depends on other methods of the class. After that, the local manager agent requests the measurer agent to measure the current quality attribute in all derived configurations produced in the previous step and to compare the best measurement with the measurement of the current configuration.
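The search phase can be illustrated on a toy class model. The sketch below is an assumption made for illustration (plain dictionaries, not the RefactorIt AST): it finds Pull Up Methods symptoms, i.e., methods whose class has a non-library superclass and that do not override anything in the superclass hierarchy.

```python
# Toy class model: name -> superclass, library flag, and declared methods.
classes = {
    "Shape":  {"super": None,    "library": False, "methods": {"area"}},
    "Circle": {"super": "Shape", "library": False,
               "methods": {"area", "radius"}},
}

def inherited_methods(cls):
    """All method names declared anywhere up the superclass chain."""
    sup, acc = classes[cls]["super"], set()
    while sup is not None:
        acc |= classes[sup]["methods"]
        sup = classes[sup]["super"]
    return acc

def pull_up_method_symptoms(cls):
    """Methods of cls that are candidates for Pull Up Methods."""
    info = classes[cls]
    sup = info["super"]
    if sup is None or classes[sup]["library"]:
        return set()                                   # no eligible superclass
    return info["methods"] - inherited_methods(cls)    # exclude overrides

print(pull_up_method_symptoms("Circle"))  # {'radius'}; 'area' overrides Shape
```

In the real system the sets returned here would feed the apply phase, which then tries the possible resolutions for each symptom.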
Finally, the local manager agent decides whether to discard the derived configurations in response to an inefficient or faulty refactoring or to promote the


Fig. 4. Local manager agent.

Fig. 5. Measurer agent.

Fig. 7. Refactoring agent.

Fig. 6. Configuration manager agent.

best one in terms of the measured quality attribute to be the new current configuration. If the local manager agent decides to promote a derived configuration to be the new current configuration, it asks the configuration manager agent to create a new branch (if it has not already been created) and to commit the derived configuration into this branch. Independently of this decision, the local manager agent informs the orchestrator agent whether the quality attribute was improved, worsened, or not changed for each refactoring. This last step is essential to allow the orchestrator agent to provide better refactoring sequences for each quality attribute in further iterations. It is important to observe that, although this process may end up with multiple derived configurations with different values for the quality attributes, the selection of the best derived configurations and their subsequent merge into the code base is postponed. After a complete rejuvenation cycle, the developer is expected to analyze the committed derived configurations and merge the ones he/she considers adequate. We provide a tool to help in this task, described in Section 2.5 and exemplified in Section 3.5.

2.3. Metrics, quality attributes, and refactorings

The set of metrics currently available in Peixe-Espada is listed in Table 1. Table 2 presents the formulas used to calculate some quality attributes, according to Bansiya and Davis (2002), that use the metrics presented in Table 1. The refactorings currently available in Peixe-Espada are presented in Table 3. Notice that we have chosen refactorings that can be automated, i.e., ones that do not depend on information provided by the developer to be applied (O’Keeffe and Ó Cinnéide, 2008). The design of the refactoring agent allows for future extensions to accommodate additional refactorings, if necessary. In order to relate the quality attributes, metrics, and refactorings presented in Tables 1–3, let us consider the following example. A software developer wants to improve the effectiveness of a given project. According to Table 2, one of the metrics used to measure the effectiveness of a project is DAM, which measures the proportion of private attributes of a given class. Suppose there is at least one class in the project that has few private attributes but several public ones; the DAM of this class is thus very low. If the Encapsulate Fields refactoring is applied to this class, the public attributes are transformed into private attributes. This increases the number of private attributes of the class and consequently increases its DAM. Now, if we measure the effectiveness of the project following Table 2 after applying Encapsulate Fields, we will notice that this quality attribute has increased. Thus, if we increase DAM by applying the Encapsulate Fields refactoring, we will probably increase the effectiveness of the project.
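The DAM portion of this example is easy to make concrete. The sketch below uses a toy class representation (an assumption for illustration, not Peixe-Espada's code) to show how Encapsulate Fields raises DAM, defined in Table 1 as the ratio of private (protected) attributes to all attributes declared in the class.

```python
# DAM on a toy class model: attributes as (name, visibility) pairs.

def dam(attributes):
    """Ratio of private/protected attributes to all declared attributes."""
    nonpublic = sum(1 for _, vis in attributes
                    if vis in ("private", "protected"))
    return nonpublic / len(attributes) if attributes else 1.0

before = [("id", "private"), ("name", "public"), ("email", "public")]
print(dam(before))  # 1/3: low DAM, so a symptom for Encapsulate Fields

# Encapsulate Fields: every public attribute becomes private
# (getters/setters would also be generated and references rewritten).
after = [(name, "private") for name, _ in before]
print(dam(after))   # 1.0
```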


Fig. 8. UML activity diagram showing the interaction among agents.

Table 1. Metrics collected by Peixe-Espada (design property: derived design metric and description).

• Abstraction: Average number of ancestors (ANA). This metric measures the average number of classes from which a class inherits information. It is computed by determining the number of classes along all paths from the “root” class(es) to all classes in an inheritance structure.
• Encapsulation: Data access metric (DAM). This metric is the ratio of the number of private (protected) attributes to the total number of attributes declared in the class.
• Coupling: Direct class coupling (DCC). This metric is a count of the different classes that a class is directly related to. The metric includes classes that are directly related by attribute declarations and message passing (parameters) in methods.
• Composition: Measure of aggregation (MOA). This metric measures the extent of the part-whole relationships, realized by using attributes. It counts the number of data declarations whose types are user-defined classes.
• Inheritance: Measure of functional abstraction (MFA). This metric is the ratio of the number of methods inherited by a class to the total number of methods accessible by member methods of the class.
• Polymorphism: Number of polymorphic methods (NOP). This metric is a count of the methods that can exhibit polymorphic behavior.

Table 2. Quality attributes computed by Peixe-Espada.

Flexibility = +0.25 × DAM - 0.25 × DCC + 0.50 × MOA + 0.50 × NOP
Extendibility = +0.50 × ANA - 0.50 × DCC + 0.50 × MFA + 0.50 × NOP
Effectiveness = +0.20 × ANA + 0.20 × DAM + 0.20 × MOA + 0.20 × MFA + 0.20 × NOP
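These weighted sums are straightforward to evaluate once the metric values are in hand. The sketch below computes the Table 2 formulas; the metric values are invented for illustration, and the normalization step mentioned in Section 2.2 is omitted.

```python
# Table 2 formulas as weight tables over the Table 1 metrics.
WEIGHTS = {
    "flexibility":   {"DAM": 0.25, "DCC": -0.25, "MOA": 0.50, "NOP": 0.50},
    "extendibility": {"ANA": 0.50, "DCC": -0.50, "MFA": 0.50, "NOP": 0.50},
    "effectiveness": {"ANA": 0.20, "DAM": 0.20, "MOA": 0.20,
                      "MFA": 0.20, "NOP": 0.20},
}

def quality(attribute, metrics):
    """Weighted sum of metric values for one configuration."""
    return sum(w * metrics.get(name, 0.0)
               for name, w in WEIGHTS[attribute].items())

# Made-up metric values for a single configuration:
m = {"ANA": 1.0, "DAM": 0.5, "DCC": 2.0, "MOA": 1.0, "MFA": 0.8, "NOP": 3.0}
print(round(quality("effectiveness", m), 2))  # 0.2*(1+0.5+1+0.8+3) = 1.26
print(quality("flexibility", m))              # 0.125 - 0.5 + 0.5 + 1.5 = 1.625
```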

Let us now consider that a software developer wants to improve the effectiveness of a project called IDUFF. He/she uses our approach to select the orchestrator agent dedicated to this quality attribute. Next, the orchestrator agent provides the local agent with the formula used to measure the quality attribute in the current configuration, in our example (+0.20 × ANA + 0.20 × DAM + 0.20 × MOA + 0.20 × MFA + 0.20 × NOP). The

measurer agent is thus responsible for computing five individual metrics (ANA, DAM, MOA, MFA, and NOP) and, in the end, computing the whole formula. Next, the local manager asks the orchestrator agent for the refactoring sequence. Suppose that the orchestrator agent has already improved the effectiveness quality attribute in two projects named PE and BCEL. The results of previously improving effectiveness


Table 3. Refactorings provided by Peixe-Espada.

Encapsulate Fields (Fowler et al., 1999)
Search phase. Symptoms: public attributes. Targets: public attributes and their uses.
Apply phase: Change the attributes' access modifier to private; create getter and setter methods; and update references to use the created methods.

Pull Up Methods (Fowler et al., 1999)
Search phase. Symptoms: a method whose class has a superclass that does not belong to an external library; the method must not override any method of the superclass hierarchy. Targets: methods and their classes/superclasses.
Apply phase: Move the method to the superclass or move only its signature; change the method's access modifier to public or protected, or remove the modifier.

Pull Up Fields (Fowler et al., 1999)
Search phase. Symptoms: a field whose class has a superclass that does not belong to an external library; no other field of the superclass can have the same name. Targets: fields and their classes/superclasses.
Apply phase: Move the field to the superclass; change the field's access modifier to public or protected, or remove the modifier.

Push Down Methods (Fowler et al., 1999)
Search phase. Symptoms: a method whose class has a subclass that does not belong to an external library; the method must not be overridden by any subclass of the class hierarchy. Targets: methods and their classes/subclasses.
Apply phase: Move the method to the subclass or move only its signature; change the method's access modifier to public or protected, or remove the modifier.

Push Down Fields (Sahraoui et al., 2000)
Search phase. Symptoms: a field whose class has a subclass that does not belong to an external library; no other field of the subclass can have the same name. Targets: fields and their classes/subclasses.
Apply phase: Move the field to the subclass; change the field's access modifier to public or protected, or remove the modifier.
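The apply phase of Encapsulate Fields can be sketched on a toy field record (an illustration, not RefactorIt's transformation): the modifier flips to private and JavaBeans-style accessor names are generated; in the real refactoring, references to the field are also rewritten to use the new methods.

```python
# Apply phase of Encapsulate Fields on a toy field record.

def encapsulate_field(field):
    """Return the field made private, with generated accessor names."""
    name = field["name"]
    cap = name[0].upper() + name[1:]
    return {
        "name": name,
        "visibility": "private",
        "getter": "get" + cap,   # callers would be rewritten to use these
        "setter": "set" + cap,
    }

print(encapsulate_field({"name": "owner", "visibility": "public"}))
# {'name': 'owner', 'visibility': 'private',
#  'getter': 'getOwner', 'setter': 'setOwner'}
```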

Table 4. Learning rates for effectiveness before refactoring IDUFF (success rate per refactoring).

PREV. PE: Pull Up Methods 48.14%; Pull Up Fields 0%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 57.14%
PREV. BCEL: Pull Up Methods 51.28%; Pull Up Fields 4.44%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 34.48%
Combined: Pull Up Methods 50.00%; Pull Up Fields 3.77%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 41.86%
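Deriving a refactoring sequence from these rates amounts to sorting by the improved/applied ratio, highest first. The sketch below reproduces the ordering for the Combined row of Table 4; the data structure is an assumption for illustration, not the orchestrator agent's database schema.

```python
# "Combined" row of Table 4, in percent.
combined_rates = {
    "Pull Up Methods":   50.00,
    "Pull Up Fields":     3.77,
    "Push Down Methods":  0.00,
    "Push Down Fields":   0.00,
    "Encapsulate Fields": 41.86,
}

# Highest success rate first; Python's sort is stable, so the two 0% entries
# keep their insertion order (either order is acceptable, as noted in the text).
sequence = sorted(combined_rates, key=combined_rates.get, reverse=True)
print(sequence)
# ['Pull Up Methods', 'Encapsulate Fields', 'Pull Up Fields',
#  'Push Down Methods', 'Push Down Fields']
```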

Table 5. Learning rates for effectiveness after refactoring IDUFF (success rate per refactoring).

PREV. PE: Pull Up Methods 48.14%; Pull Up Fields 0%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 57.14%
PREV. BCEL: Pull Up Methods 51.28%; Pull Up Fields 4.44%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 34.48%
IDUFF: Pull Up Methods 0%; Pull Up Fields 0%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 54.55%
Combined: Pull Up Methods 44.44%; Pull Up Fields 3.64%; Push Down Methods 0%; Push Down Fields 0%; Encapsulate Fields 44.87%

Strategy

Project

in these projects can be seen in Table 4. After calculating the ratio between improved and applied refactorings in each project, the orchestrator agent concludes, based on its previous experiences, that the (current) best sequence of refactorings for effectiveness is: Pull Up Methods, Encapsulate Fields, Pull Up Fields, Push Down Fields, and Push Down Methods (or Push Down Methods and Push Down Fields). Such sequence is provided to the local agent that requests the refactoring agent to perform the refactorings in IDUFF in this order. After applying the possible refactorings, the measurer agent measures the derived configurations and informs the results to the local manager. Finally, the local manager choses the derived configuration that will become the new current configuration. The local manager also informs the orchestrator agent about refactorings that improved, worsened, or not changed of effectiveness, as illustrated in Table 5. The orchestrator agent updates its database with the new information and changes the sequence of refactoring for effectiveness: Encapsulate Fields, Pull Up Methods, Pull Up Fields, Push Down Fields, and Push Down Methods. In order to add a penalty for refactoring types with a great amount of invalid symptoms, the local manager also informs the orchestrator agent of a failure for every 10 invalid symptoms. This way it learns it is not worth selecting refactorings that waste too much time trying to find a valid symptom, that is, a symptom with only resolvable conflicts. 2.4. Massive parallelism via MAS One of the key aspects of our approach is how the agents collaborate as a distributed system to reach their goal. On account of the

Fig. 9. Strategies for parallel work.

complexity of the steps involved in software rejuvenation, parallel work is fundamental. Our MAS allows three different strategies for parallel work, as depicted in Fig. 9: improving different projects in parallel, improving different quality attributes in parallel, and speeding up the improvement of a quality attribute of a specific project.

Usually, organizations develop different products concomitantly. With our approach, it is possible to improve different projects at the same time, spending part of the total idle computational power on each project. For example, in a setting with five developers' computers, up to five projects can be improved in parallel. Each project will have an individual orchestrator agent running on the server, bound to at least one worker agent running on each developer computer (see


H. Santos et al. / The Journal of Systems and Software 104 (2015) 41–59

Fig. 9a). In this strategy, there is no intra-project parallelism, and the expected result is equivalent to deploying five independent instances of Peixe-Espada, leading to a speedup of five times.

On the other hand, a project can be improved according to different quality attributes. In our approach, different worker agents can work in parallel to improve different quality attributes of the same project. In other words, this strategy dedicates part of the idle computational power to improving each quality attribute of a given project. For example, in a setting with five developer computers, up to five orchestrator agents can be configured to improve different quality attributes of the same project, and at least one worker agent can be bound to each orchestrator agent (see Fig. 9b). This strategy may lead to conflicting results, as each orchestrator agent is focused on improving only one quality attribute and may worsen the others. The developer is responsible for checking which configurations actually deserve to be merged into the code base. These are probably the dominant configurations, which improved all selected quality attributes. We provide a tool to help in this task, described in Section 2.5 and exemplified in Section 3.5.

Finally, different worker agents can work for the same orchestrator agent. In this strategy, each worker agent receives a distinct refactoring sequence, but all the sequences aim at improving the same quality attribute of the same project. At the end, the improvements of each worker agent are committed to a respective branch. For example, in a setting with five developer computers, an orchestrator agent can be configured to improve a specific quality attribute of a specific project, and five worker agents can be bound to this orchestrator agent (see Fig. 9c). This strategy uses parallelism to explore different sequences of refactorings concomitantly, all aiming at improving the same quality attribute.
Similarly to the previous strategy, the developer is responsible for checking which configurations actually deserve to be merged into the code base. Section 3.2 exemplifies this strategy in action.

It is important to notice that these strategies can be combined to meet specific software rejuvenation goals. For example, an organization with 32 developers responsible for four mainstream products can configure Peixe-Espada with eight orchestrator agents. Each pair of orchestrator agents can be responsible for improving two different quality attributes of the same project. Moreover, four worker agents may be bound to each orchestrator agent. This way, 32 worker agents would be working in parallel to improve two quality attributes of four different projects.

2.5. Implementation details

The implementation of our approach is composed of a web and a desktop application. The web application allows registering software projects under configuration management, creating orchestrator

agents via the binding between a software project and a pre-registered quality attribute, and invoking the desktop application via Java Web Start technology (Marinilli, 2001). On the other hand, the desktop application controls the schedule of the idle time at each developer's computer, as shown in Fig. 10a. After scheduling the idle period, it is possible to choose a specific orchestrator agent, which is related to a quality attribute to be improved in the software project. This selection is presented in Fig. 10b.

The monitoring of the local manager agents can also be done via the web application, as shown in Fig. 11. It presents all orchestrator agents, their local manager agents, and information about the improvement, worsening, or not changing of quality attributes for each applied refactoring. The nomenclature used in Fig. 11 is: total cycles (C), improving cycles (IC), worsening cycles (WC), and not changing cycles (NCH).

The local manager agents provide detailed local outputs regarding their operations. Thus, the steps of work and the decisions made can be tracked if necessary. Fig. 12 shows a fragment of the output provided by these agents. This output shows agents getting the refactoring sequence, checking out the source code, and getting metrics to calculate a quality attribute. After this, they start the refactorings with the first refactoring of the suggested sequence. Finally, at the end of the scheduled time, the worker agents report the performed changes to the orchestrator agent. Some examples of this final report are: "Branch created at http://.../branches/effectiveness_1" and "Field br.uff.ic.model.Person.name encapsulated".

The implementation of Peixe-Espada was not based on any available MAS framework or platform, since the agents have a fixed set of tasks that are executed according to the messages they receive.
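The agent topology assembled through these applications, combining the parallelism strategies of Section 2.4, can be sketched as a simple round-robin assignment. The function and names below are hypothetical, not Peixe-Espada's API.

```python
# Each (project, quality attribute) pair gets one orchestrator agent;
# the available developer computers are bound to the orchestrators
# round-robin as worker agents.
from itertools import cycle, product

def build_topology(projects, attributes, computers):
    orchestrators = [f"{p}/{a}" for p, a in product(projects, attributes)]
    binding = {o: [] for o in orchestrators}
    for computer, orch in zip(computers, cycle(orchestrators)):
        binding[orch].append(computer)
    return binding

# The 32-developer example from Section 2.4: 4 products x 2 quality
# attributes -> 8 orchestrators, each bound to 4 worker agents.
topology = build_topology(
    [f"P{i}" for i in range(1, 5)],
    ["effectiveness", "extendibility"],
    [f"pc{i}" for i in range(1, 33)],
)
```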
Since our approach deals only with single-objective refactorings, the results of a specific refactoring may improve the desired quality attribute but negatively affect others. For this reason, we implemented a standalone tool (see Fig. 13) to calculate all the quality attributes for all committed configurations in a branch, allowing cherry-picking just the desired ones. An example of this tool in action is provided in Section 3.5.

3. Evaluation

In order to evaluate our approach in action, we applied Peixe-Espada over four different projects with different sizes. We revisited some of the main characteristics of the proposed MAS architecture discussed in Section 1, such as distribution, learning, and isolation, and planned different evaluations for each of them. Section 3.1 describes the chosen projects that act as subjects of our evaluation and the setup adopted in the evaluation. Sections 3.2 and 3.3 describe the distribution and the learning evaluation

Fig. 10. The scheduling of idle time (a) and the choice of an orchestrator agent (b).


Fig. 11. Agents monitoring.

Fig. 12. Agent output in the desktop application.



Table 7
Computer's specification.

Component | Specification
CPU       | Core i7-3630QM
Freq.     | 2.4–3.4 GHz
RAM       | 16 GB DDR3-1600
SSD       | CT512M4SSD2 512 GB 6 Gb/s
System    | Windows 8
Java      | 1.7.0 u45

processes, respectively. Section 3.4 describes the isolation evaluation, by showing the way Peixe-Espada interacts with the CM system. Section 3.5 presents how we deal with refactorings that improve a quality attribute but negatively affect other quality attributes. Finally, Section 3.6 presents some threats to the validity of our evaluations.

3.1. Environment

The current Peixe-Espada implementation imposes some technological requirements on the projects to allow them to be refactored and measured. These requirements are: to be a Maven project and to be versioned using Subversion. Moreover, we added some additional requirements to provide a broader analysis in our experiment: projects should have distinct sizes, be developed by different teams, and be real projects. In the following, we describe the chosen projects and their characteristics, as shown in Table 6. These characteristics comprise the date of the version used in the experiment, the revision number, the size of the projects in terms of total lines of code (LOCs), the number of developers, the total number of files, and the total number of Java files.

We are using Peixe-Espada (PE) itself as a subject of our experiment. Its main dependency is the Oceano-Core (OC) module, which deals with CM communication, metrics gathering, and quality-attribute computation. The third system used to evaluate Peixe-Espada is the Academic Management System of Fluminense Federal University (IDUFF – http://id.uff.br), which is responsible for managing the academic information of around 30,000 students. This system was first released in 2008 and is in constant evolution. Finally, our fourth system is the Byte Code Engineering Library (BCEL – http://jakarta.apache.org/bcel). It gives users a convenient way to analyze, create, and manipulate binary Java class files (bytecodes). For the learning experiments, we also use previous versions of Peixe-Espada (PREV. PE) and Byte Code Engineering Library (PREV. BCEL), as described in Section 3.3. We applied Peixe-Espada to each project and each quality attribute in an isolated manner, using just one computer specification (characterized in Table 7) to be able to extract comparable information without worrying about different computer performances. The topology used in the experiments is depicted in Fig. 9c.

Fig. 13. Measuring all quality attributes.

3.2. Distribution evaluation

For the distribution evaluation process, we simulated the execution of four computers in parallel for a fixed time, all improving the same quality attribute. This experiment was repeated for each quality attribute and each project. The fixed time represents the idle time available to apply the refactorings. In this experiment, we want to determine if N agents applying a random subset of refactorings in a given time, representing the idle time, are able to improve the quality attribute more after a merge than a single agent in the same time. We decided to use a period of time that was not long enough to try all refactorings and not so short that no refactoring could be applied; hence, we decided to use 1 h for smaller projects (PE and OC) and 2 h for larger projects (IDUFF and BCEL). Moreover, we configured the worker agents to select a maximum of 50 symptoms per refactoring type and to skip the current refactoring type after 10 unsuccessful refactorings.

For example, consider that the worker agent finds 5824 possible symptoms for Push Down Fields on IDUFF. Some of these symptoms are not valid, since it is not possible to refactor them without compromising the program semantics. The worker agent randomly selects 50 valid symptoms and tries to refactor them sequentially. If a refactoring does not change the quality attribute, makes it worse, or is not valid anymore due to a previous refactoring in the same area, it is marked as unsuccessful. After 10 unsuccessful refactorings, or after processing the 50 symptoms, the worker agent stops working on Push Down Fields and tries the next available refactoring type.

The number of refactorings tried and applied by an agent depends on many factors, such as project size, quality attribute, refactoring order, number of symptoms, available time, and the conditions to skip the refactoring type after N valid symptoms or M unsuccessful refactorings.
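The worker agent's trial loop described above can be sketched as follows. Here `try_refactor` is a stand-in for the real refactoring engine: it returns the quality-attribute delta, or `None` when the symptom is no longer valid.

```python
import random

MAX_SYMPTOMS = 50   # valid symptoms selected per refactoring type
MAX_FAILURES = 10   # unsuccessful refactorings before skipping the type

def process_refactoring_type(symptoms, try_refactor, rng=random):
    applied, failures = [], 0
    for symptom in rng.sample(symptoms, min(MAX_SYMPTOMS, len(symptoms))):
        delta = try_refactor(symptom)
        if delta is not None and delta > 0:
            applied.append(symptom)       # refactoring improved the attribute
        else:
            failures += 1                 # worsened, unchanged, or invalid
            if failures >= MAX_FAILURES:
                break                     # give up on this refactoring type
    return applied
```

With a refactoring engine that never improves the attribute, the loop stops after exactly 10 tries; with one that always improves it, all 50 selected symptoms are applied.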
Bigger projects require more time for searching symptoms and calculating the quality attributes. Some metrics used by the quality attributes take longer to compute than others. While DCC requires an average of 3:48 min to be calculated for IDUFF, DAM requires an average of 2:30 min, MFA requires an average of 45 s, and

Table 6
Characteristics of projects analyzed in the experiment.

Characteristic | PE       | OC       | IDUFF     | BCEL      | PREV. PE | PREV. BCEL
Date           | 7/Apr/12 | 7/Apr/12 | 30/Mar/11 | 06/Jun/14 | 6/Apr/11 | 24/Apr/14
Revision       | 1164     | 1164     | 21,093    | 1,600,809 | 965      | 1,589,745
#LOCs          | 12,405   | 43,280   | 241,818   | 82,104    | 8148     | 65,260
#Developers    | 4        | 7        | 27        | 9         | 3        | 9
#Files         | 126      | 586      | 1746      | 521       | 81       | 521
#Java Files    | 97       | 371      | 1037      | 444       | 64       | 445


Fig. 14. Distribution results for effectiveness.

NOP, MOA, and ANA require less than 1 s. When a refactoring type has many symptoms for a project, leading to the same refactoring being applied multiple times, subsequent refactoring types will have less available time. During our evaluations, with the conditions to skip the refactoring type after 50 valid symptoms or 10 unsuccessful tries, the average number of refactorings tried for each refactoring type that did not present a timeout was 20.61 for Pull Up Methods, 10.03 for Push Down Methods, 15.49 for Encapsulate Fields, 8.15 for Pull Up Fields, and 7.73 for Push Down Fields.

Fig. 14 shows the merge of execution results for each project improving effectiveness. Each merge just adds the non-overlapping results of refactorings. For example, if execution 1 applies refactorings A, B, and C, and execution 2 applies refactorings B, C, and D, the merge of execution 2 just introduces the improvement provided by refactoring D, as B and C have already been considered in execution 1. Since the executions occur in parallel, there is no order for the merge. For this reason, we sorted the executions from maximum to minimum refactoring contribution. The First Measure value in Fig. 14 represents the quality attribute value before applying the refactorings, and the Measure value represents the absolute improvement over the First Measure.

It is possible to notice that the use of multiple worker agents in parallel does improve the quality attribute, compared to a single execution during the same time. The average improvement of effectiveness for all projects combined in the distributed approach, with four computers running in parallel, was 6.58. The average improvement of effectiveness for a single execution was 1.75. This way, the distributed approach was 3.76 times better, providing an almost linear speedup. The sub-linear speedup is explained by the overlap of refactorings applied by different worker agents.
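A minimal sketch of this merge logic, using illustrative per-refactoring improvement values:

```python
def merge_executions(executions):
    """executions: list of dicts mapping a refactoring id to the
    improvement it produced; returns the merged improvements and the
    cumulative total after each merged execution."""
    # Merge from maximum to minimum contribution, as done for Fig. 14.
    ordered = sorted(executions, key=lambda e: sum(e.values()), reverse=True)
    merged, timeline = {}, []
    for execution in ordered:
        for refactoring, gain in execution.items():
            if refactoring not in merged:      # overlapped refactorings add nothing
                merged[refactoring] = gain
        timeline.append(sum(merged.values()))  # cumulative improvement so far
    return merged, timeline

# Execution 1 applies A, B, and C; execution 2 applies B, C, and D:
# merging execution 2 only adds D's improvement.
result, curve = merge_executions([{"A": 2, "B": 1, "C": 1},
                                  {"B": 1, "C": 1, "D": 1}])
```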
In other words, as worker agents randomly select a refactoring and search for symptoms, two or more worker agents may apply the same refactoring over the same symptom, leading to rework. Although this redundancy is possible, the obtained results (speedup of 3.76 times) show that it is rare, due to the abundant number of symptoms.

Fig. 15 shows the merge results for improving extendibility. Even with the small improvement of IDUFF, it is possible to notice that the use of multiple worker agents improves the quality attributes. One of the worker agents that tried to refactor IDUFF could not apply any successful refactoring in the fixed time of 2 h. All refactorings that it tried either worsened or did not change the quality attribute. The average improvement of extendibility for all projects combined in the distributed approach, with four computers running

Fig. 15. Distribution results for extendibility.


Fig. 16. Distribution results for flexibility.

Fig. 17. 40 worker agents improving effectiveness for PE.

in parallel, was 18.62. The average improvement of extendibility for a single execution was 4.78. The distributed approach was 3.9 times better, confirming our finding of a near-linear speedup.

Finally, Fig. 16 shows the merge results for improving flexibility. Again, one of the IDUFF worker agents could not apply any successful refactorings in the fixed time of 2 h. This occurred for extendibility and flexibility mostly because the time required for gathering the metrics that compose these quality attributes is longer than the time required for gathering the metrics that compose effectiveness. The average improvement of flexibility for all projects combined in the distributed approach, with four computers running in parallel, was 8.31. The average improvement of flexibility for a single execution was 2.27. The distributed approach was 3.66 times better, slightly below a linear speedup.

The overlap of refactorings for flexibility was bigger than the overlap for the previous quality attributes, so it is possible to see that the addition of the latter agents barely improves anything. As new agents are added, the chances increase of having an agent just doing rework. Fig. 17 presents this behavior by showing the use of 40 agents trying to improve effectiveness for PE. As it is possible to notice, no improvement is perceived after adding the 31st agent. In other words, in this situation Peixe-Espada's performance would benefit from having up to 30 computers running in parallel. However, only the first added computers would provide an almost linear speedup. Moreover, it would be useless to add extra computers after the first 30, as the refactoring overlap would be 100% and they would be doing only rework.

3.3. Learning evaluation

As discussed before, in order to accelerate the rejuvenation cycle, Peixe-Espada uses information about refactoring success (the ratio between improved and applied refactorings for each quality attribute) learned from previous executions to suggest the refactoring sequence. In order to evaluate this learning capability, we answered two main questions: (1) can learning help in a future version of the same project? and (2) can learning help to improve other projects? Aiming at answering the first question, we decided to learn each quality attribute from old revisions of PE and BCEL (referred to as PREV. PE and PREV. BCEL in Section 3.1), execute again with the learning information on a more recent revision of these projects (PE and BCEL), and compare the results obtained with and without learning. Aiming at answering the second question, we combined the learning results


Table 8
Learning rates.

Quality attribute | Project    | Pull Up Methods | Pull Up Fields | Push Down Methods | Push Down Fields | Encapsulate Fields
Effectiveness     | PREV. PE   | 48.14%          | 0%             | 0%                | 0%               | 57.14%
Effectiveness     | PREV. BCEL | 51.28%          | 4.44%          | 0%                | 0%               | 34.48%
Effectiveness     | Combined   | 50.00%          | 3.77%          | 0%                | 0%               | 41.86%
Extendibility     | PREV. PE   | 90.19%          | 0%             | 0%                | 7.69%            | 0%
Extendibility     | PREV. BCEL | 50.00%          | 2.22%          | 0%                | 0%               | 0%
Extendibility     | Combined   | 72.04%          | 1.88%          | 0%                | 2.85%            | 0%
Flexibility       | PREV. PE   | 75.92%          | 0%             | 0%                | 0%               | 57.14%
Flexibility       | PREV. BCEL | 37.03%          | 2.17%          | 0%                | 0%               | 26.92%
Flexibility       | Combined   | 62.96%          | 1.85%          | 0%                | 0%               | 37.50%

from PREV. PE and PREV. BCEL and used this information in a new execution of OC and IDUFF. Then, we compared these results with the results without learning information. We will refer to the results obtained with learning as LR (learning result) and to the results obtained without learning as RR (random result). The results obtained without learning are referred to as RR because the refactoring order is random. Because of the randomness, RR is calculated as an average of four executions without learning. In order to learn about all refactoring types, we did not limit the time of the worker agents that acquired the learning data. All the other constraints were kept, such as the selection of 50 valid symptoms and the skipping of the refactoring type after 10 unsuccessful refactorings. The worker agent that used the learning results had the same time limit as the worker agents without learning: 1 h for PE and OC and 2 h for IDUFF and BCEL.

Table 8 presents the learning obtained for each quality attribute. It is possible to see in the table that there is no positive learning for Push Down Methods. This occurred due to the constraint that skips the refactoring type after 10 failures, making the agent learn that it is a bad refactoring for effectiveness, extendibility, and flexibility. However, this is not always the case, since the Push Down Methods refactorings were able to improve extendibility and flexibility for OC in the previous experiment. By looking at this table, it is possible to infer the refactoring sequence Peixe-Espada will use in the future for each quality attribute. For example, an agent improving extendibility for IDUFF would use the combined results, so the orchestrator agent could suggest the following refactoring sequence: start with Pull Up Methods (72.04% success rate), then switch to Push Down Fields (2.85% success rate), then switch to Pull Up Fields (1.88% success rate), and then switch to Push Down Methods and Encapsulate Fields in any sequence.
On the other hand, an agent improving extendibility for BCEL would use the specific results for BCEL, so the orchestrator agent could suggest the following refactoring sequence: start with Pull Up Methods (50.00% success rate), then switch to Pull Up Fields (2.22% success rate), and then switch to any other refactoring in any sequence.

Fig. 18 compares LR with RR improving effectiveness. It is possible to see that LR was better than RR for all projects. However, there was

Fig. 18. Learning results for effectiveness.

Fig. 19. Learning results for extendibility.

Fig. 20. Learning results for flexibility.

an individual RR that improved the quality attributes more than the LR, indicating that the learning did not provide an optimal sequence. We believe that the accuracy of the LR could be improved if the learning were based on more than two projects. In this specific situation, OC and IDUFF could strongly benefit from the learning obtained during the execution of PE and BCEL. This result is important, indicating that it is possible to reuse knowledge obtained by running Peixe-Espada over other projects.

Fig. 19 compares LR with RR improving extendibility, and Fig. 20 provides the same comparison for flexibility. The learning seems to provide an appropriate sequence for the refactorings, as the LR were always better than the RR. Moreover, the projects that received learning from previous versions (PE and BCEL) presented outstanding results. Although the improvement using learning information from other projects was minor for these two quality attributes, it still occurred (see OC and IDUFF).

Revisiting our two research questions, we can say that: (1) a project can benefit greatly from learning obtained from its previous versions, and (2) projects can also benefit, to a smaller extent, from learning obtained from other projects. However, as we could observe, the benefits of learning are maximized if it is done on a per-project basis. This way, when Peixe-Espada is applied for the first time over a project, it starts using the information learned from other projects and, as soon as it learns from the project itself, it switches to the project-specific learning information.
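The switch from cross-project to project-specific learning can be sketched as below. The fallback rule and dictionary layout are our reading of the text, with the rates taken from the Combined and PREV. BCEL rows of Table 8 for extendibility.

```python
def suggest_sequence(project, per_project_rates, combined_rates):
    # Prefer rates learned from the project itself; otherwise fall back
    # to rates combined from other projects.
    rates = per_project_rates.get(project, combined_rates)
    return sorted(rates, key=rates.get, reverse=True)

combined = {"Pull Up Methods": 0.7204, "Push Down Fields": 0.0285,
            "Pull Up Fields": 0.0188, "Push Down Methods": 0.0,
            "Encapsulate Fields": 0.0}
bcel = {"Pull Up Methods": 0.50, "Pull Up Fields": 0.0222,
        "Push Down Methods": 0.0, "Push Down Fields": 0.0,
        "Encapsulate Fields": 0.0}
learned = {"BCEL": bcel}  # IDUFF has no extendibility data of its own
```

With these tables, IDUFF gets the combined sequence (Pull Up Methods, then Push Down Fields, then Pull Up Fields), while BCEL gets its specific sequence (Pull Up Methods, then Pull Up Fields), matching the two examples above.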



Fig. 21. Pull Up Methods refactoring commit diff.

3.4. Isolation evaluation

As previously discussed, all successful refactorings are committed to branches, guaranteeing isolation among the worker agents. Because of that, developers can choose to merge the refactored project branches into the mainline or discard them, or even accept just some commits of each branch and discard the others. In this section, we show some examples of commits performed by Peixe-Espada while refactoring IDUFF. To do so, we are using the Unified Format defined by GNU Diff as patch notation, enhanced with a color notation: green for added lines (marked with "+") and red for deleted lines (marked with "-"). It is important to notice that there are plenty of high-level tools able to interpret this Unified Format, providing comfortable ways to visualize and merge specific changes. To support the refactoring identification, Peixe-Espada also adds to the commit message the name of the refactoring performed in the commit, some context information about how the refactoring was applied, and how much the quality attribute improved.

Using Pull Up Methods as an example, Fig. 21 shows that the candidatou method, implemented in InscricaoOnlineCoordenacaoMB and in InscricaoPresencialCoordenacaoMB, was moved to the BaseMB superclass. The metadata information added in the commit message is shown in Fig. 22. As it is possible to notice in the metadata, in order to pull the candidatou method up, Peixe-Espada had to decide either to delete similar methods in other subclasses or to keep them. This decision was presented as a conflict. To solve the conflict, Peixe-Espada deleted the method from InscricaoPresencialCoordenacaoMB. It is also important to notice that during the application of this refactoring,

Fig. 22. Pull Up Methods refactoring commit message.



Fig. 23. Encapsulate Fields refactoring commit diff.

Fig. 24. Encapsulate Fields refactoring commit message.

Peixe-Espada tried both possibilities to solve the conflict and selected the one that best improved the desired quality attribute. This refactoring is positive, since both subclasses had the same implementation of this method, leading to code duplication. After the refactoring, the measure of functional abstraction (MFA) increased and, consequently, so did the effectiveness quality attribute. As this refactoring may have a side effect on the cohesion of the BaseMB class, the developer may create an intermediate superclass between the two subclasses and BaseMB during the merge, and move the candidatou method to this class, without affecting the cohesion of the BaseMB class.

In addition, Fig. 23 depicts a commit of the Encapsulate Fields refactoring, which encapsulated the boletimService field in the AlunoService class. To do so, the field access modifier was changed to private and the accessor methods getBoletimService and setBoletimService were created. Fortunately, no other changes were necessary, because there were no external usages of this field. As a result, the encapsulation metric (DAM) increased and, consequently, both flexibility and effectiveness increased. Fig. 24 presents the commit message.

It is interesting to mention that, by definition (Bansiya and Davis, 2002), the inheritance design property, among other properties, positively influences the extendibility quality attribute. Moreover, the MFA design metric, which is used to compute the inheritance design property, measures the proportion of inherited methods among all defined methods. This way, as the proportion of inherited methods increases, the extendibility quality attribute also increases. This leads to an intriguing but explainable situation: when Peixe-Espada applies the Encapsulate Fields refactoring, two accessor methods are created (get and set) and, consequently, the proportion of inherited methods decreases, also decreasing the extendibility quality attribute.
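Reading MFA as the ratio of inherited methods to all methods of a class, a toy computation with hypothetical method counts makes this drop concrete:

```python
def mfa(inherited, defined):
    # Ratio of inherited methods to all methods of the class
    # (a simplified reading of the QMOOD MFA metric).
    total = inherited + defined
    return inherited / total if total else 0.0

before = mfa(inherited=8, defined=2)  # 8 of 10 methods inherited
after = mfa(inherited=8, defined=4)   # Encapsulate Fields added a get/set pair
```

Adding the two locally defined accessors leaves the number of inherited methods unchanged, so their proportion, and with it MFA and extendibility, necessarily decreases.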
On the one hand, accessor methods would never improve the extendibility of a system, as it is very uncommon to inherit such methods. On the other hand, they pollute the class interface with additional methods that are not useful for the purpose of inheritance and, consequently, extendibility. However, it is worth noticing that, besides slightly decreasing extendibility, the Encapsulate Fields refactoring significantly improves the DAM design metric, consequently improving the flexibility, understandability, and effectiveness quality attributes.

Conversely, in the situation depicted in Fig. 25, related to the Pull Up Fields refactoring, the PAGINA_VOLUME_LIVRO field of the PaginaMB class was moved to the BaseMB superclass. This refactoring was committed to the branch because it decreases direct class coupling (DCC) and, thus, improves flexibility and extendibility. However, this refactoring is undesired when considering the architectural constraints of the project, since some diploma-related code was moved to a generic class. Therefore, the developer should not merge this change back. Fig. 26 shows the message of this refactoring.

Regarding the Push Down Methods refactoring, Fig. 27 shows a situation where the getSelectItems method from BaseComboBoxBean was moved to its ComboBoxDiploma subclass. At that moment, this refactoring was positive, because the getSelectItems method was used only in the ComboBoxDiploma class. However, this may disrupt future implementations, because other subclasses of BaseComboBoxBean may need to access the selected items. Fig. 28 shows the message of this refactoring.

Correspondingly, Fig. 29 shows a case of the Push Down Fields refactoring where the mostrar_detalhes_pontuacao field was moved from InscricaoBaseService to its InscricaoPresencialBaseService subclass. This refactoring alleviated the superclass, transferring the field to a place where it is in fact used. However, it is difficult to foresee whether this change produces side effects without domain knowledge. This is a clear situation where only humans can balance the pros and cons and decide if the code should be merged into the mainline or discarded. Fig. 30 presents the commit message of this refactoring.

3.5. Dealing with side effects

As discussed before, our approach focuses on improving a single quality attribute.
Because of the isolation provided by committing the successful refactorings to isolated branches, the developer can select later which refactorings are in fact going to be applied to the

H. Santos et al. / The Journal of Systems and Software 104 (2015) 41–59

Fig. 25. Pull Up Fields refactoring commit diff.

Fig. 26. Pull Up Fields refactoring commit message.

Fig. 27. Push Down Methods refactoring commit diff.

Fig. 28. Push Down Methods refactoring commit message.




Fig. 29. Push Down Fields refactoring commit diff.

Fig. 30. Push Down Fields refactoring commit message.

code base. Although successful for the selected quality attribute, these refactorings may have side effects on other quality attributes. To avoid improving one quality attribute while drastically worsening others, we provide a tool (see Section 2.5) to calculate all quality attributes for all committed refactorings in a given branch, supporting cherry-picking just the ones that do not negatively affect other quality attributes.

We applied this tool to calculate all quality attributes for the BCEL project during the distribution evaluation for effectiveness, as shown in Fig. 13. Fig. 31 shows the evolution of each quality attribute during the application of the refactorings that improve effectiveness by agent 2. While most refactorings improve all quality attributes due to the improvement of NOP, revision 1295 decreases both extendibility and flexibility due to the increase of DCC. In this situation, the developer may opt to accept all refactorings but this one, resulting in the graph shown in Fig. 32. We cherry-picked commit by commit in this example, aiming at better traceability and understandability, but it is possible to cherry-pick a subsequence of commits at once.

Fig. 31. Agent 2 improving effectiveness on BCEL.

Since all refactorings from agents 1, 3, and 4 did not worsen other quality attributes, we present in Fig. 33 the result of cherry-picking all the commits of these agents at once, as revisions 3227, 3228, and 3229, respectively. To merge the contributions of agent 1 with the already merged contributions of agent 2, we had to solve two conflicts manually.
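The cherry-picking policy just illustrated, skipping any commit that worsens some quality attribute, can be sketched as below. All values except revision 1295 (the commit from the text that worsened extendibility and flexibility) are hypothetical.

```python
def cherry_pick(commits):
    """commits: list of (revision, {quality attribute: delta}) pairs in
    branch order; keep only commits that worsen no quality attribute."""
    return [revision for revision, deltas in commits
            if all(delta >= 0 for delta in deltas.values())]

branch = [
    (1293, {"effectiveness": 0.4, "extendibility": 0.1, "flexibility": 0.1}),
    (1295, {"effectiveness": 0.2, "extendibility": -0.3, "flexibility": -0.2}),
    (1297, {"effectiveness": 0.5, "extendibility": 0.0, "flexibility": 0.2}),
]
kept = cherry_pick(branch)  # revision 1295 is skipped
```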


Fig. 32. Cherry-picked commits from agent 2.

Fig. 33. Cherry-picked commits from all agents.

3.6. Threats to validity

Like every experimental evaluation, ours suffers from some threats to validity. The small number of projects (four) does not allow us to generalize the evidence presented in this section. In addition, our study is observational and based on historical data, leading to assumptions that need to be further validated. Moreover, we built our approach upon metrics and quality attributes described in the literature (Bansiya and Davis, 2002), which are subject to their imprecisions and mistakes. Furthermore, our evaluation considered only three quality attributes and a limited list of refactorings (five refactoring types). Finally, our experiment considered only the technical benefits of refactorings in terms of the adopted metrics, but some refactorings may have side effects on architectural constraints or company policies.

4. Related work

We could identify a broad range of related work, focusing on different stages of the Peixe-Espada lifecycle. Some related work contributes by identifying refactoring opportunities (Sahraoui et al., 2000; Simon et al., 2001; Tourwé and Mens, 2003; O'Keeffe and Ó Cinnéide, 2008; Abdeen et al., 2013; Ouni et al., 2013), measuring the code base after the application of refactorings (Chidamber and Kemerer, 1994; Mancoridis et al., 1998; Bansiya and Davis, 2002; Harman and Clark, 2004; O'Keeffe and Ó Cinnéide, 2006; Romano and Pinzger, 2011; Abdeen et al., 2013), or identifying the refactorings that best improve specific source code metrics (Sahraoui et al., 2000; Du Bois et al., 2004). However, most related work suggests a sequence of refactorings that best improves the code base, possibly applying these refactorings over it. Harman and Tratt (2007) classify these approaches according to their output as indirect and direct.
Indirect approaches (Wiggerts, 1997; Mancoridis et al., 1998; Doval et al., 1999; Mancoridis et al., 1999; Harman et al., 2002; Seng et al., 2006; Harman and Tratt, 2007; Abdeen et al., 2009; Bavota et al., 2010; Räihä, 2010; Praditwong et al., 2011; Bavota et al., 2012; de Barros, 2012; Ouni et al., 2012; Abdeen et al., 2013) extract models from the source code, work over these models, and suggest a sequence of refactorings as output. On the other hand, direct approaches (Ryan, 1999; O’Keeffe and Ó Cinnéide, 2006; O’Keeffe and Ó Cinnéide, 2008) apply the refactorings directly over the source code and produce new revisions of the source code as output. As Peixe-Espada works directly over the source code stored in the CM system, the works most related to ours are the direct approaches. We differ from existing direct approaches by focusing on aspects they list as challenges: communication of design changes to the developers, scalability, and preservation of program semantics. In this section we discuss these approaches and contrast them with Peixe-Espada.

One of the key challenges in refactoring source code is identifying refactoring opportunities (i.e., symptoms), usually encoded in bad smells, anti-patterns, or anomalies. The usual approaches for identifying refactoring opportunities consist of searching the code base for specific structures (Tourwé and Mens, 2003), usually adopting an AST, running manual inspections using visualizations over the code base (Simon et al., 2001), or using rules composed of metric thresholds (Sahraoui et al., 2000). Ouni et al. (2013) innovate by looking at bug fix examples obtained from CM repositories. These examples are used for automatically inferring refactoring opportunity rules based on metric thresholds. Abdeen et al. (2013) work on an element of object-oriented software neglected in terms of refactorings: interfaces. They use metrics to identify interfaces with duplicate methods and with unused methods. Their experiments show that these anomalies are reliable symptoms of poor interface design. Peixe-Espada currently uses the first approach, extracting the AST from the source code and searching for specific structures in it. However, it learns from previous executions which refactorings actually help to improve specific quality attributes, and uses this knowledge for prioritizing structures to search in the AST in future executions. After refactoring the code base, it is necessary to check whether the refactoring provided the expected benefits.
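To make the first strategy concrete, the sketch below parses source code into an AST and searches it for a simple symptom: method bodies exceeding a statement threshold, a naive Extract Method opportunity. Peixe-Espada works over Java ASTs; Python's `ast` module and the threshold of five statements are used here purely as illustrative stand-ins:

```python
import ast
import textwrap

def long_method_symptoms(source: str, max_statements: int = 5):
    """Walk the AST and flag functions whose bodies exceed the
    statement threshold -- a simplistic Extract Method symptom."""
    tree = ast.parse(source)
    return [(node.name, len(node.body))
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
            and len(node.body) > max_statements]

code = textwrap.dedent("""
    def short():
        return 1

    def long():
        a = 1
        b = 2
        c = 3
        d = 4
        e = 5
        return a + b + c + d + e
""")
print(long_method_symptoms(code))  # [('long', 6)]
```

A real symptom detector would of course combine several such structural queries, one per refactoring type, over the full project AST.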
One of the ways to assess this is by measuring the code base (Harman and Clark, 2004). Multiple works provide metrics to this end, focusing on different abstraction levels. At the low design level, CBO (coupling between object classes) and LCOM (lack of cohesion in methods), metrics of the Chidamber and Kemerer (1994) suite (C&K), are widely used by refactoring approaches for measuring coupling and cohesion. Another metric broadly used by clustering approaches is MQ (modularization quality) (Mancoridis et al., 1998), which conciliates coupling and cohesion. When dealing with interfaces, IUC (interface usage cohesion) (Romano and Pinzger, 2011) shows a strong correlation with source code changes, and can be seen as a candidate for replacing C&K metrics. Moreover, Abdeen et al. (2013) identified that developers readily accept some interface design principles (Gamma et al., 1994; Martin, 2000), but pay less attention to the impact of interface design on the cohesion of its implementing classes, reinforcing the motivation for using specific metrics for interfaces. However, as it is usually desirable to select metrics that better represent high-level quality needs (Sahraoui et al., 2000), Bansiya and Davis (2002) introduced a set of high-level metrics called QMOOD (quality model for object-oriented design). These high-level metrics represent quality attributes and are described in terms of lower-level design metrics, encoding the conciliation of multiple design decisions. As our contribution does not reside in providing new metrics for evaluating refactorings, and as QMOOD aims at conciliating lower-level design metrics into quality attributes, we decided to use this metric set in our approach.
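For illustration, the sketch below computes a simplified variant of MQ (after Mancoridis et al., 1998) over a small module dependency graph: each cluster contributes a factor that grows with intra-cluster edges (cohesion) and shrinks with inter-cluster edges (coupling). The graph, clusterings, and exact cluster-factor formula are illustrative simplifications, not code from the cited work:

```python
def modularization_quality(edges, clustering):
    """Simplified MQ: sum over clusters of 2*intra / (2*intra + inter),
    where intra counts edges inside the cluster and inter counts
    edges crossing its boundary."""
    mq = 0.0
    for c in set(clustering.values()):
        intra = sum(1 for a, b in edges
                    if clustering[a] == c and clustering[b] == c)
        inter = sum(1 for a, b in edges
                    if (clustering[a] == c) != (clustering[b] == c))
        if intra:  # a cluster with no internal edges contributes 0
            mq += 2 * intra / (2 * intra + inter)
    return mq

# Modules A and B call each other; C depends on D; A also uses C.
edges = [("A", "B"), ("B", "A"), ("C", "D"), ("A", "C")]
cohesive = {"A": 1, "B": 1, "C": 2, "D": 2}   # groups related modules
scattered = {"A": 1, "B": 2, "C": 1, "D": 2}  # splits related modules
print(modularization_quality(edges, cohesive) >
      modularization_quality(edges, scattered))  # True
```

The clustering that groups related modules scores higher, which is exactly the property optimization-based remodularization approaches exploit as a fitness function.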
It is important to notice that QMOOD has already been adopted by other approaches (O’Keeffe and Ó Cinnéide, 2006; O’Keeffe and Ó Cinnéide, 2008) and, like them, we do not focus on the subjective concepts behind QMOOD, but on the underlying mechanics provided by our MAS to improve the measured values. These mechanics would remain valid if other metrics were used in place of QMOOD. Moreover, some approaches identify which refactorings best improve specific metrics. For instance, Du Bois et al. (2004) analyzed how refactorings that redistribute responsibilities within and between classes, such as extract methods, move methods, and extract
classes, affect design metrics such as coupling and cohesion. As a result, they provide a list of guidelines for manually searching for symptoms and applying refactorings. Sahraoui et al. (2000) not only identified the refactorings that better improve specific metrics but also the thresholds on metrics that indicate refactoring opportunities. Tahvildari et al. (2003) decomposed maintainability soft-goals (e.g., good performance into good space performance and good time performance) and indicated how design patterns, encoded in a set of transformations (Tahvildari and Kontogiannis, 2002), positively or negatively affect these goals. Stroggylos and Spinellis (2007) analyzed commits that aimed at refactoring the code base and observed a significant change of certain metrics for the worse. This controversial result may indicate that the metric set is not able to capture the improvements provided by the refactoring or that the refactoring is inappropriate for the situation. In order to identify which refactorings better improve specific metrics, Peixe-Espada learns from past experiences by registering the success rate of each refactoring regarding each metric. This allows Peixe-Espada to prioritize refactorings in the future when the user wants to improve the same metric.

Most of the indirect approaches (Harman and Tratt, 2007) deal with software remodularization. This problem has been formalized in terms of a clustering problem (Wiggerts, 1997) and, later, in terms of a single-objective optimization problem (Mancoridis et al., 1998). Basically, modules (usually classes, but sometimes methods (Seng et al., 2006)) should be grouped into clusters (usually packages). Multiple approaches emerged using different single-objective optimization strategies (Doval et al., 1999; Mancoridis et al., 1999; Harman et al., 2002; Abdeen et al., 2009; Bavota et al., 2010), amongst many others (Räihä, 2010).
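The learning step described above for Peixe-Espada — registering, per metric, how often each refactoring actually improved the measurement, and prioritizing accordingly — can be sketched as follows (the class and sample data are hypothetical, not the tool's actual implementation):

```python
from collections import defaultdict

class RefactoringKnowledge:
    """Registers, per (refactoring, metric) pair, how often the
    refactoring improved the metric, and ranks refactorings by
    that observed success rate."""
    def __init__(self):
        self.applied = defaultdict(int)
        self.improved = defaultdict(int)

    def register(self, refactoring, metric, improved):
        key = (refactoring, metric)
        self.applied[key] += 1
        if improved:
            self.improved[key] += 1

    def success_rate(self, refactoring, metric):
        key = (refactoring, metric)
        return self.improved[key] / self.applied[key] if self.applied[key] else 0.0

    def prioritize(self, refactorings, metric):
        # Highest observed success rate first.
        return sorted(refactorings,
                      key=lambda r: self.success_rate(r, metric),
                      reverse=True)

kb = RefactoringKnowledge()
kb.register("Pull Up Method", "flexibility", True)
kb.register("Pull Up Method", "flexibility", True)
kb.register("Encapsulate Field", "flexibility", True)
kb.register("Encapsulate Field", "flexibility", False)
print(kb.prioritize(["Encapsulate Field", "Pull Up Method"], "flexibility"))
# ['Pull Up Method', 'Encapsulate Field']
```

Unseen (refactoring, metric) pairs default to a rate of zero, so previously successful refactorings are tried first while new ones are not excluded.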
In general, these approaches work over high-level models of the software, usually extracted from the source code, and consider only a specific type of refactoring (moving classes amongst packages) and specific design metrics (cohesion and coupling). Differently from Peixe-Espada, they do not autonomously materialize the refactorings back to the source code. Moreover, they do not allow extensibility in terms of refactorings and metrics, nor distributed execution over multiple idle computers. Recently, this clustering problem was reformulated in terms of a multi-objective optimization problem (Praditwong et al., 2011), following a strategy already adopted for identifying the best sequence of refactorings (Harman and Tratt, 2007). The optimization objectives include increasing cluster cohesion, reducing cluster coupling, and avoiding clusters of discrepant sizes. Subsequently, de Barros (2012) evaluated the efficiency and effectiveness of using multi-objective optimization for this problem, and Bavota et al. (2012) recognized that some remodularizations can inadvertently affect the semantics of the software, transferring the final decision to the developer. Ouni et al. (2012) incorporated the semantics concerns as two additional objectives that should be maximized within clusters: vocabulary-based similarity and dependency-based similarity. Recently, Abdeen et al. (2013) introduced a constraint to respect the original design as much as possible, avoiding disruptive changes that would significantly affect software comprehension. Like these approaches, Peixe-Espada recognizes the importance of preserving semantics when applying refactorings. However, differently from Peixe-Espada, none of these approaches actually materializes the refactorings back to the source code. Peixe-Espada does that incrementally, refactoring by refactoring, demarcating each refactoring with an individual commit in an isolated branch.
This allows users to decide, after the fact, which commits should be merged back to the main development line or discarded. Amongst the direct approaches (Harman and Tratt, 2007), O’Keeffe and Ó Cinnéide (2006, 2008) strongly inspired our work by demonstrating that object-oriented source code can be automatically refactored. They applied hill climbing and simulated annealing optimization techniques for automatically refactoring the code base aiming

at improving specific quality attributes. In a similar way to Peixe-Espada, they not only suggest refactorings, but also apply the refactorings and generate new revisions of the source code. Moreover, they also adopted QMOOD quality attributes to evaluate the new revisions. They highlight two challenges that we tackle in our paper (O’Keeffe and Ó Cinnéide, 2008): communication of design changes to the developers and scalability. The communication challenge is one of the main barriers for adopting such approaches, since multiple refactorings over a large system would negatively affect understandability. Regarding scalability, they evaluated their approach over two small-scale systems (less than 50 classes), and the measurement of each generated revision took 1 s. This clearly does not scale to medium and large systems, with hundreds or thousands of classes, considering that multiple measurements are necessary during the improvement process. Thus, our approach contributes by providing an incremental and controlled way of communicating the design changes to developers via CM. This allows code review and semi-automatic selection of the refactorings that should be incorporated in the main development line by cherry-picking specific commits from an isolated branch. Moreover, we also contribute by providing a distributed MAS to search for symptoms, apply refactorings, measure their benefits, and learn which refactorings better improve each metric. As shown in our experiments, this mechanism can significantly boost the performance of our approach.

Ryan (1999) also introduced a direct approach for refactoring source code. Differently from O’Keeffe and Ó Cinnéide (2006, 2008), his work is highly concerned with parallelization. In this case, the parallelization is a natural consequence of adopting the genetic programming optimization technique.
However, applying genetic programming directly over the source code can lead to side effects, as its crossover and mutation operations do not preserve program semantics. Conversely, Peixe-Espada only generates new revisions of the source code by applying safe transformations over the previous revision. Any transformation that may generate side effects (e.g., moving a method to a class that already has a method with the same signature) is discarded.

5. Conclusions and future work

Our work presented some alternative techniques to help alleviate the premature aging of software. This is done by continuously monitoring and improving quality attributes of projects in parallel, registering each improvement as a commit in a branch of the CM repository, and boosting future rejuvenation sessions via the learning feature of the MAS. Our experiments found some initial evidence of relationships between refactorings and quality attributes. For example, the Encapsulate Fields refactoring improves flexibility, but does not improve extendibility. On the other hand, Pull Up Methods is a good refactoring for flexibility, extendibility, and effectiveness. Another innovation of our approach is its ability to run in parallel during the idle time of the development computers. This, together with the learning feature of the MAS, can be used to boost current optimization-based approaches. Finally, the use of CM systems not only to access the versions to be improved but also to store the improved versions is another advantage of our approach. Allowing the developer to discern which improvements should be incorporated in the product is an important step toward relieving users' fear of knowledge disruption. Our current implementation is limited to Java projects. Moreover, these projects must adhere to the Maven project structure and be managed by Subversion.
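The signature-clash precondition for Move Method discussed at the end of Section 4 amounts to a simple guard before applying the transformation; the signature encoding below is a hypothetical simplification, not the tool's representation:

```python
def can_move_method(method_signature: str, target_class_signatures: set) -> bool:
    """Guard for Move Method: refuse the transformation when the
    target class already declares a method with the same signature,
    since applying it could change or break program behavior."""
    return method_signature not in target_class_signatures

# Signatures encoded as "name(paramTypes)" for illustration only.
target = {"toString()", "compare(Order,Order)"}
print(can_move_method("total()", target))     # True: safe to move
print(can_move_method("toString()", target))  # False: discarded
```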
Furthermore, we did not analyze the specific effects of our approach over projects that adopt techniques such as reflection, design patterns, and aspect orientation. As future work, we foresee the improvement of the Search and Apply tasks by combining our approach with existing search-based software engineering solutions, considering that we face a multi-objective
optimization problem. Another natural evolution of our work is the addition of other automatic refactorings and quality attributes, and support for other programming languages. We are also considering the development of self-tuning algorithms to change the parallelization strategy on demand, and the inclusion of runtime metrics, which can be collected during test executions. Finally, it would be interesting to trace all changes of a refactoring to discover which characteristics create distinct behaviors in the same refactoring when applying it to distinct projects, even under the same quality attribute.

Acknowledgments

We would like to thank STI/UFF for allowing us to use the IDUFF source code in our experiment. Moreover, we are deeply thankful to André van der Hoek, all members of the GEMS team, and the Journal of Systems and Software anonymous reviewers, who provided useful insights to this research. Finally, we would like to thank FAPERJ, CNPq, and CAPES for the financial support.

References

Abdeen, H., Ducasse, S., Sahraoui, H., Alloui, I., 2009. Automatic package coupling and cycle minimization. In: Working Conference on Reverse Engineering (WCRE). Washington, DC, USA, pp. 103–112.
Abdeen, H., Sahraoui, H., Shata, O., 2013. How we design interfaces, and how to assess it. In: IEEE International Conference on Software Maintenance (ICSM), pp. 80–89.
Abdeen, H., Sahraoui, H., Shata, O., Anquetil, N., Ducasse, S., 2013. Towards automatically improving package structure while respecting original design decisions. In: Working Conference on Reverse Engineering (WCRE), pp. 212–221.
Abdeen, H., Shata, O., Erradi, A., 2013. Software interfaces: On the impact of interface design anomalies. In: International Conference on Computer Science and Information Technology (CSIT), pp. 184–193.
Bansiya, J., Davis, C.G., 2002. A hierarchical model for object-oriented design quality assessment. IEEE Trans. Softw. Eng. 28 (1), 4–17.
Bavota, G., Carnevale, F., Lucia, A.D., Penta, M.D., Oliveto, R., 2012. Putting the developer in-the-loop: An interactive GA for software re-modularization. In: Fraser, G., de Souza, J.T. (Eds.), Search Based Software Engineering. Springer, Berlin, Heidelberg, pp. 75–89.
Bavota, G., De Lucia, A., Marcus, A., Oliveto, R., 2010. Software re-modularization based on structural and semantic metrics. In: Working Conference on Reverse Engineering (WCRE), pp. 195–204.
Bennett, K.H., Rajlich, V.T., 2000. Software maintenance and evolution: A roadmap. In: International Conference on Software Engineering (ICSE), pp. 73–87.
Bond, A.H., Gasser, L.G., 1988. Readings in Distributed Artificial Intelligence. M. Kaufmann, San Mateo, CA.
Chidamber, S.R., Kemerer, C.F., 1994. A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 20 (6), 476–493.
de Barros, M.O., 2012. An analysis of the effects of composite objectives in multiobjective software module clustering. In: Genetic and Evolutionary Computation Conference (GECCO). New York, NY, USA, pp. 1205–1212.
Doval, D., Mancoridis, S., Mitchell, B.S., 1999. Automatic clustering of software systems using a genetic algorithm. In: Software Technology and Engineering Practice (STEP), pp. 73–81.
Du Bois, B., Demeyer, S., Verelst, J., 2004. Refactoring – improving coupling and cohesion of existing code. In: Working Conference on Reverse Engineering (WCRE), pp. 144–151.
Erlikh, L., 2000. Leveraging legacy system dollars for E-business. IT Professional 2 (3), 17–23.
Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D., 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley.
Gamma, E., Helm, R., Johnson, R., Vlissides, J.M., 1994. Design Patterns: Elements of Reusable Object-Oriented Software, 1st ed. Addison-Wesley Publishing Co., USA.
Gurp, J.V., Bosch, J., 2002. Design erosion: Problems & causes. J. Syst. Softw. 61 (2), 105–119.
Harman, M., Clark, J., 2004. Metrics are fitness functions too. In: International Software Metrics Symposium (METRICS), pp. 58–69.
Harman, M., Hierons, R.M., Proctor, M., 2002. A new representation and crossover operator for search-based optimization of software modularization. In: Genetic and Evolutionary Computation Conference (GECCO), vol. 2. San Francisco, CA, USA, pp. 1351–1358.
Harman, M., Tratt, L., 2007. Pareto optimal search based refactoring at the design level. In: Genetic and Evolutionary Computation Conference (GECCO). New York, NY, USA, pp. 1106–1113.
ISO, 2001. ISO/IEC 9126 – Software Engineering – Product Quality. International Organization for Standardization, Geneva, Switzerland.
Kerievsky, J., 2004. Refactoring to Patterns. Addison-Wesley.
Kolb, R., Muthig, D., Patzke, T., Yamauchi, K., 2005. A case study in refactoring a legacy component for reuse in a product line. In: IEEE International Conference on Software Maintenance (ICSM), pp. 369–378.
Mancoridis, S., Mitchell, B.S., Chen, Y., Gansner, E.R., 1999. Bunch: A clustering tool for the recovery and maintenance of software system structures. In: IEEE International Conference on Software Maintenance (ICSM), pp. 50–59.
Mancoridis, S., Mitchell, B.S., Rorres, C., Chen, Y., Gansner, E.R., 1998. Using automatic clustering to produce high-level system organizations of source code. In: International Workshop on Program Comprehension (IWPC). Washington, DC, USA, pp. 45–52.
Marinilli, M., 2001. Java Deployment with JNLP and WebStart. Sams.
Martin, R.C., 2000. Design principles and design patterns. Object Mentor, 1–34.
Messerschmitt, D.G., Szyperski, C., 2005. Software Ecosystem: Understanding an Indispensable Technology and Industry. The MIT Press, Cambridge, MA.
Moser, R., Sillitti, A., Abrahamsson, P., Succi, G., 2006. Does refactoring improve reusability? In: Morisio, M. (Ed.), Reuse of Off-the-Shelf Components. Springer, Berlin, Heidelberg, pp. 287–297.
Murphy-Hill, E., Parnin, C., Black, A., 2012. How we refactor, and how we know it. IEEE Trans. Softw. Eng. 38 (1), 5–18.
O’Keeffe, M., Ó Cinnéide, M., 2006. Search-based software maintenance. In: European Conference on Software Maintenance and Reengineering (CSMR), pp. 10–260.
O’Keeffe, M., Ó Cinnéide, M., 2008. Search-based refactoring for software maintenance. J. Syst. Softw. 81 (4), 502–516.
Ouni, A., Kessentini, M., Sahraoui, H., Boukadoum, M., 2013. Maintainability defects detection and correction: A multi-objective approach. Automated Softw. Eng. 20 (1), 47–79.
Ouni, A., Kessentini, M., Sahraoui, H., Hamdi, M.S., 2012. Search-based refactoring: Towards semantics preservation. In: IEEE International Conference on Software Maintenance (ICSM), pp. 347–356.
Parnas, D.L., 1994. Software aging. In: International Conference on Software Engineering (ICSE). Los Alamitos, CA, USA, pp. 279–287.
Praditwong, K., Harman, M., Yao, X., 2011. Software module clustering as a multi-objective search problem. IEEE Trans. Softw. Eng. 37 (2), 264–282.
Pressman, R.S., 2005. Software Engineering: A Practitioner’s Approach, 6th ed. McGraw-Hill, USA.
Räihä, O., 2010. A survey on search-based software design. Comput. Sci. Rev. 4 (4), 203–249.
Raymond, E., 2001. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, Rev. ed. O’Reilly, Beijing; Cambridge, MA.
Romano, D., Pinzger, M., 2011. Using source code metrics to predict change-prone Java interfaces. In: IEEE International Conference on Software Maintenance (ICSM), pp. 303–312.
Ryan, C., 1999. Automatic Re-engineering of Software Using Genetic Programming. Springer Science & Business Media.
Sahraoui, H.A., Godin, R., Miceli, T., 2000. Can metrics help to bridge the gap between the improvement of OO design quality and its automation? In: IEEE International Conference on Software Maintenance (ICSM), pp. 154–162.
Seng, O., Stammel, J., Burkhart, D., 2006. Search-based determination of refactorings for improving the class structure of object-oriented systems. In: Genetic and Evolutionary Computation Conference (GECCO). Washington, DC, USA, pp. 1909–1916.
Sengupta, B., Chandra, S., Sinha, V., 2006. A research agenda for distributed software development. In: International Conference on Software Engineering (ICSE). New York, NY, USA, pp. 731–740.
Simon, F., Steinbrückner, F., Lewerentz, C., 2001. Metrics based refactoring. In: European Conference on Software Maintenance and Reengineering (CSMR). Lisbon, Portugal, pp. 30–38.
Sommerville, I., 2006. Software Engineering, 8th ed. Addison-Wesley.
Stroggylos, K., Spinellis, D., 2007. Refactoring – does it improve software quality? In: International Workshop on Software Quality. Washington, DC, USA, p. 10.
Tahvildari, L., Kontogiannis, K., 2002. A software transformation framework for quality-driven object-oriented re-engineering. In: IEEE International Conference on Software Maintenance (ICSM), pp. 596–605.
Tahvildari, L., Kontogiannis, K., Mylopoulos, J., 2003. Quality-driven software reengineering. J. Syst. Softw. 66 (3), 225–239.
Tourwé, T., Mens, T., 2003. Identifying refactoring opportunities using logic meta programming. In: European Conference on Software Maintenance and Reengineering (CSMR), pp. 91–100.
Wiggerts, T.A., 1997. Using clustering algorithms in legacy systems remodularization. In: Working Conference on Reverse Engineering (WCRE), pp. 33–43.
Yu, E., Giorgini, P., Maiden, N., Mylopoulos, J., 2011. Social Modeling for Requirements Engineering. The MIT Press.

Heliomar Santos graduated in Computer Science at Universidade Federal Fluminense (UFF) in 2008 and received his Master's degree from UFF in 2011.

João Felipe Pimentel graduated in Computer Science at Universidade Federal Fluminense in 2014 and is now a Master's student at the same university.

Viviane Torres Da Silva graduated as a Computer Engineer at Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio) in 1998, and received her Master's degree in 2000 and her PhD in 2004, both from PUC-Rio. She is now working at IBM Research Brazil (on leave from Universidade Federal Fluminense, where she is a professor).

Leonardo Murta graduated as a Computer and System Engineer at COPPE/UFRJ in 1999, and received his Master's degree in 2002 and his PhD in 2006, both from COPPE/UFRJ. He is a professor at the Computer Institute of the Universidade Federal Fluminense.