Information and Software Technology 114 (2019) 107–120
A novel approach for automatic remodularization of software systems using extended ant colony optimization algorithm
Bright Gee Varghese R∗, Kumudha Raimond, Jeno Lovesum
Karunya Institute of Technology and Sciences, Coimbatore, India
Article info
Keywords: Remodularization; Ant colony optimization; Turbo modularization quality; Software system; Code dependency
Abstract
Context: Software modularization is extremely important for streamlining the inner structure of program modules without affecting their core functionality. As a system evolves during the maintenance stage, the original design of the software package deteriorates, making it hard to understand and maintain. Many existing approaches automatically remodularize software using optimization techniques to ease maintenance and improve system quality. However, their outcomes are often sub-optimal and depend on problem-specific operators, which in turn increases the time complexity of arriving at a solution. Beyond these limitations, issues such as time complexity, scalability and performance need to be addressed.
Objective: In this paper, an efficient automatic software remodularization approach using extended Ant Colony Optimization (ACO) is proposed to remodularize software systems.
Method: The proposed approach has two main phases: optimized traversal of the software system using ACO to find the order in which software files are processed, and remodularization of the software system using the proposed extended ACO.
Results: We evaluated the proposed approach on seven software systems. Performance is evaluated using Turbo modularization quality (Turbo MQ), which supports module dependency graphs (MDGs) with edge weights. The time complexity of remodularization is evaluated based on the number of Turbo MQ evaluations.
Conclusion: Compared with existing methodologies such as the Genetic Algorithm (GA), Hill Climbing (HC) and Interactive Genetic Algorithms (I-GAs), the proposed approach achieves higher Turbo MQ values with lower time complexity on the evaluated software systems.
∗ Corresponding author. E-mail addresses: [email protected] (B.G. Varghese R), kraimond@karunya.edu (K. Raimond), [email protected] (J. Lovesum).
https://doi.org/10.1016/j.infsof.2019.06.002
Received 12 November 2018; Received in revised form 18 May 2019; Accepted 15 June 2019; Available online 16 June 2019
0950-5849/© 2019 Elsevier B.V. All rights reserved.

1. Introduction
The initial design of any large software system is characterized by maximum cohesion and minimal coupling among its modules. Generally, software systems comprise modules that communicate with one another to achieve the system's intended purpose. The principle of modularity implies partitioning software into components according to functionality and responsibility, with high cohesion and low coupling. A wide range of research has been done to modularize complex software systems. For example, the concept of modularization has been followed in systems developed using object-oriented programming, component-oriented programming, reflective programming, aspect-oriented programming, context-oriented programming, model-driven engineering, feature-oriented programming and event-based programming. The common point is that these large software systems are always subject to revisions, either to find and fix mistakes or to improve the system by offering further features according to upcoming requirements. Such changes, made to software that is developed or mature, can lower the cohesiveness and increase the coupling among the modules, resulting in poor modularization. This makes the resulting system harder to manage and possibly more error-prone. During the maintenance phase, reduced cohesion and increased coupling violate the principles of software modularization, which makes maintenance tedious for the software developer, and performing remodularization manually is itself a tedious task. Different remodularization strategies are used to overcome this challenge of reduced cohesion and increased coupling. Most large-scale software systems evolve and hence become highly complex, fault-prone and difficult for software engineers to maintain [5,28]. In general, most of the changes made during system evolution, such as introducing new features or fixing bugs, occur within strict deadlines [29]. As a result, code changes can have a severe negative impact on the quality of the system design, such as the distribution of classes in packages. To address this issue, one of the most commonly used techniques is
software remodularization, also termed software restructuring, which can be used to improve the existing decomposition of systems [3,17,21,22]. There has been much work on techniques and tools for software remodularization [1,3,5,21–27]. Most existing works on software remodularization focus on the problem of clustering modules rather than on improving existing modularizations. Further, earlier works have not considered time complexity and performance during the phase of change. Most works consider only cohesion and coupling as the main metrics for improving the quality of existing modules by deciding which package each class should belong to. Although most current methodologies are capable of providing remodularization solutions, a few issues still need to be addressed; the most important are time complexity and performance. Many research works are being carried out to address this software design issue. Researchers have treated software remodularization as an optimization problem, and the application of optimization algorithms to it has gained wide attention in recent years. The aim is to establish an optimal balance between the cohesion and coupling of the system. Much of the past research on assessing software clustering results relied on small experimental studies, subjective input from engineers, or measuring the similarity of a clustering result against a small collection of reference systems [25]. Mancoridis et al. [1] developed a tool, Bunch, which incorporates HC and GA to address the software remodularization problem. It does not consider the strength of the interconnections between modules, which affects the quality of remodularization.
Further, this approach yielded some isolated modules (modules with only one file), which in turn increases the number of modules with zero cohesion and high coupling. Mancoridis et al. [30] re-evaluated the Bunch tool to detect isolated modules, which yielded better remodularization but with poorer time convergence. Some other existing works [2–5,12,14,15,18,30] on software remodularization are based on GAs and simulated annealing. In this work, a novel approach is proposed to solve remodularization, and on the evaluated software systems its results are shown to be more efficient, with respect to convergence time and quality, than those of existing works in Bunch [34] and I-GAs [17]. The rest of the paper is organized as follows: Section 2 presents related work and Section 3 explains the problem formulation. The proposed methodology, consisting of the architecture and algorithm, is described in Section 4; experimental results and performance evaluations are analysed in Section 5; and the conclusion and future work are given in Section 6.
2. Related works
Nowadays, large software systems have become complex, and it is difficult to preserve their original structure through the maintenance phase of the software development lifecycle. The changes made during maintenance may have a negative impact on the quality of the system design. To handle this issue, the widely used technique is software remodularization, which divides the software product into packages containing classes or files. Re-modularizing a system is very difficult because it evolves continuously, and several studies have addressed this problem. In the last two decades, several research works have been carried out to support automatic maintenance of software. Traditional optimization techniques, such as linear or dynamic programming, are impractical for large-scale software systems because of their high computational complexity; hence researchers use metaheuristic search techniques to find local and global solutions.

To deal with the challenge of software modularization, Mancoridis et al. [1,13] applied a search-based procedure. This approach is based on the HC and GA optimization techniques, which help to maximize cohesion and minimize coupling. Similar techniques have also been used by Mitchell et al. [5,19] in a tool that supports automatic system decomposition with the elimination of isolated modules; however, subsystem decomposition was not considered by this approach. Harman et al. [2] used GA to carry out subsystem decomposition of a software system, with a fitness function that combines quality metrics such as coupling, cohesion and complexity. Likewise, Seng et al. [4] treated re-modularization as an optimization goal using GA. Their objective is to start from the existing subsystem decomposition and find a decomposition with improved metric values and fewer violations of the design principles. However, this methodology did not consider the dependencies between packages. Hence, Abdeen et al. [3] proposed a heuristic methodology based on simulated annealing to automatically optimize the dependencies between the packages of a software system. This technique works by moving classes among the original packages. Abdeen et al. also proposed a complementary set of coupling and cohesion metrics that assess package organization in large legacy object-oriented software. Many large object-oriented software systems comprise thousands of classes organized into a number of packages. Although many works target the quality of a single class, only a handful deal with the quality of package organization and relationships, and they did not consider many objectives for optimization.

Jaimes et al. [6] proposed a non-dominated sorting GA method for many-objective optimization problems, in which an evolutionary algorithm based on a non-dominated sorting approach is used. Initially, only practical problems involving two and three objectives were addressed; later, evolutionary multi-objective algorithms for handling many-objective (four or more objectives) optimization problems were proposed by Ketal [7], where GA is applied to solve the multi-objective problem. Ramirez et al. [8] presented a new GA-based methodology for discovering the software architecture from a high-level analysis, which is needed when a multi-objective function is considered during the solution search. Nevertheless, such methods assume the availability of an entire set of constraints before remodularization begins, which is usually difficult to achieve, especially for large systems. Bavota et al. [17] proposed different variants of I-GAs (IC-IGA, R-IGA, MGA, IC-IMGA and R-IMGA), a remodularization approach that allows developers to assess automatically generated re-modularizations. To evaluate its performance, the authors used two industrial projects, SMOS and GESA: SMOS is software for secondary schools that offers many features intended to simplify communication between the school and the students' parents, and GESA automates the most important activities in the management of university courses, such as timetable creation and classroom allocation. In this approach, the intra-module dependency and modularization quality are evaluated automatically, and a feedback mechanism adds penalties for files that should not be in a particular module. However, it yielded several isolated modules, and its computational time is very high since it involves many variables. This issue was addressed in the work of Price et al. [11], a straightforward population-based algorithm. Both Differential Evolution (DE) and Artificial Bee Colony (ABC) algorithms have been used in various software engineering problems [9,10] and require less computational time.

To handle complex software systems, remodularization is inevitable to ease the maintenance phase. Earlier works have used several techniques and approaches to handle remodularization [1,2,7,8,16,19,24,25]. However, such approaches assume the availability of a whole set of constraints before remodularization starts, which is challenging to achieve in large systems. While automatic re-modularization approaches have proved very effective in strengthening the cohesiveness and lessening the coupling of software modules, they do not take developers' expertise into account when deciding whether to group various elements together. Bright et al. [35] used GA with a feedback-based approach to perform remodularization effectively. The limitation is that GA produces different optimal solutions in each run for the same input dataset, since GA belongs to the non-deterministic class of algorithms. The results are rather sub-optimal and depend on problem-specific operators, which in turn increases the time complexity of arriving at a solution.
3. Problem formulation
From the literature survey, it is clear that earlier works have not considered the convergence time of remodularization during the maintenance phase of the software. When a large software system evolves by incorporating changes, reverse engineering it consumes considerable convergence time. The structure of a software system can be represented as an MDG. Since the ant colony approach [20,31–33] has been used to solve several graph problems, ACO is a suitable candidate for the remodularizing process. Therefore, to achieve good remodularization with high cohesion and low coupling, the basic principles of modularization, an extended ACO for automatic remodularization of software systems has been proposed and implemented to obtain a good remodularized structure with less time complexity. To carry out remodularization, it is necessary to understand the dependencies among the different modules, and to determine the intensity of coupling among files of different modules and of cohesion among the files within each module. Static code analysis tools can be used to determine the dependency matrix of the modules, and the files can be grouped based on adjacency and connectivity. For smaller software systems, this can be achieved manually or with conventional algorithms; however, large and complex systems are very difficult to handle in a similar fashion. An efficient approach is required to solve this problem with less time and computational overhead. Based on these considerations, this work designs an approach using extended ACO and a novel remodularizing procedure. Fig. 1 depicts the procedure used to remodularize the software system in the proposed approach. The objective function is to maximize Turbo MQ, proposed by Mitchell and employed in Bunch [25].

Low coupling and high cohesion are considered characteristics of well-designed software systems. If μ_i is the number of intra-edges in the ith module and ε_{i,j} is the number of inter-edges between modules i and j, then Turbo MQ can be calculated as

$$\text{Turbo } MQ = \sum_{i=1}^{k} MF_i \qquad (1)$$

where MF_i is the ith module factor:

$$MF_i = \begin{cases} 0, & \mu_i = 0 \\[4pt] \dfrac{2\mu_i}{2\mu_i + \sum_{j=1,\, j \neq i}^{k} \left(\varepsilon_{i,j} + \varepsilon_{j,i}\right)}, & \text{otherwise} \end{cases} \qquad (2)$$

Turbo MQ is intended to extract an acceptable design by maximizing it, i.e., minimizing the coupling among modules and increasing the cohesion within modules. The larger the Turbo MQ, the closer the achieved partition is to well-structured code.

Fig. 1. Illustration of proposed approach in software remodularization.

4. Proposed approach
The proposed approach has two phases, as shown in Fig. 2. The first phase performs an optimized traversal of the software system using ACO to find the sequence of files to be visited. During the second phase, files are remodularized using the extended ACO approach to obtain the remodularized structure. An overview of ACO and its relevance to the problem is given in Section 4.1. The first phase, the second phase, the detailed steps and the evaluation metrics of the proposed approach are described in Sections 4.2, 4.3, 4.4 and 4.5, respectively.

4.1. An overview of ACO
ACO is a probabilistic technique for solving computational problems that can be reduced to finding paths in a graph [31]. Artificial ants are multi-agent methods inspired by the behaviour of real ants. The ants communicate indirectly through the pheromone they deposit on the edges of the graph (for example, the graph of a Traveling Salesman Problem (TSP)) while building solutions. Combinations of artificial ants and local search algorithms are used in various optimization tasks that involve some sort of graph, e.g., vehicle routing and internet routing. The ants build solutions as follows: each ant starts from a source point (a vertex of the construction graph) chosen at random and moves along the edges of the graph at each construction step. Each ant keeps a memory of the path it has taken, and in each subsequent step it chooses among the edges that do not lead to vertices it has already visited. An ant finishes building a solution once it has visited all the vertices of the graph. The probabilistic choice rule is defined by the pheromone values and the heuristic information: the higher the pheromone and heuristic values associated with an edge, the higher the probability that an ant chooses that edge. Once all ants have completed their tours, the pheromone on the edges is updated: each pheromone value is first decreased by a certain percentage, and then each edge receives additional pheromone proportional to the quality of the solutions to which it belongs. Several research works have shown that ACO can be applied to finding optimized solutions [31,32]. Its success in tackling the TSP has encouraged its transfer to a large number of other combinatorial optimization problems [33].

4.2. Optimized traversal of software system using ACO
This section describes the optimized traversal of the software system to be remodularized. In this approach, an ant is used as a simple computational agent in the ACO algorithm. An ant starts from a random location, say the ith file, to find the order in which all the files are to be processed for remodularization. To move from file i to another file j, all the adjacent files (file dependencies) are found and the cost of visiting every dependent file is calculated. Then, the probability of moving from file i to the next file j is calculated as

$$\rho_{ij} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k} [\tau_{ik}]^{\alpha}\,[\eta_{ik}]^{\beta}} \qquad (3)$$

where the sum runs over the files k adjacent to file i, τ_ij is the pheromone between i and j, η_ij is the cost of connection between i and j, α is a factor that controls the influence of τ_ij, and β is a factor that controls the influence of η_ij.

Using this formula, each ant visits all the files and finally returns a list of the files in the order in which they are to be visited.

4.3. Remodularizing files using extended ACO
The outcome of Section 4.2 is an ordering of all the files of the software system produced by ACO. This section explains how the files in an ant's ordering are grouped together to form modules. Fig. 3 portrays the detailed flowchart for forming the remodularized structure of the software system. The procedure starts by creating a new module, say Module-0. The first file from the ant is taken as the current file and inserted into Module-0. If more files are available in the ant, the next file is accessed. If the current file has either a forward or a backward dependency with the next file, the next file is inserted into the same module, marked as the current file, and the following file is accessed if more files are available in the ant. This continues as long as each newly accessed file depends on the current file, with the last accessed file becoming the current file each time. If the current and next files have no dependency and more files remain to be processed, the next file becomes the current file and the following file is accessed. If these two files depend on each other, a new module is created, marked as the current module, and both files are added to it; otherwise, the next file is added to a collection called 'leftOverList'. This continues until no more files remain to be processed. The files in 'leftOverList' are then inserted into the modules with which they have the maximum dependency, and the fitness, Turbo MQ, is calculated. The whole process is repeated for the successive ants, and the pheromone is updated based on the best-performing ant.
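The grouping procedure of Section 4.3 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' Java implementation: `weights` is a hypothetical dictionary mapping a (source, target) file pair to its dependency count, and a dependency in either direction counts as "having a dependency". The flowchart is ambiguous about a file that depends on neither of its neighbours in the ordering; the sketch sends such a file to 'leftOverList', and ties in the final 'leftOverList' assignment go to the lowest-numbered module. Both choices are assumptions.

```python
def dependency(weights, a, b):
    """Total dependency between two files, counting both directions."""
    return weights.get((a, b), 0) + weights.get((b, a), 0)


def group_files(order, weights):
    """Group one ant's ordered file list into modules (one reading of Section 4.3)."""
    modules = [[order[0]]]   # Module-0 starts with the first file of the ant
    placed = {order[0]}
    left_over = []           # the 'leftOverList'
    current = order[0]
    for nxt in order[1:]:
        if dependency(weights, current, nxt) > 0:
            if current in placed:
                # a placed current file is always in the newest module: extend it
                modules[-1].append(nxt)
            else:
                # current was unplaced: open a new module holding both files
                modules.append([current, nxt])
                placed.add(current)
            placed.add(nxt)
        elif current not in placed:
            # assumption: current depends on neither neighbour, so defer it
            left_over.append(current)
        current = nxt        # the last accessed file becomes the current file
    if current not in placed:
        left_over.append(current)
    # Attach each deferred file to the module it depends on most
    # (modules grow as files are attached; max() keeps the first of equal scores)
    for f in left_over:
        best = max(range(len(modules)),
                   key=lambda m: sum(dependency(weights, f, g) for g in modules[m]))
        modules[best].append(f)
    return modules


weights = {("A", "B"): 1, ("B", "C"): 1, ("D", "E"): 1, ("F", "A"): 2, ("F", "D"): 1}
print(group_files(["A", "B", "C", "D", "E", "F"], weights))
# prints [['A', 'B', 'C', 'F'], ['D', 'E']]
```

In the usage example, F depends on neither E nor any later file, so it ends up in 'leftOverList' and is attached to the module containing A, with which it has the larger dependency (2 versus 1).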
Fig. 2. Proposed architecture for software remodularization.
Fig. 3. Flowchart for remodularizing files using extended ACO.
Fig. 4. Cost of travel from one file to another.

4.4. Detailed steps of proposed approach
The steps of the proposed approach are given below. Steps 4 to 11 give the order in which files are to be visited using ACO, Steps 12 to 19 describe the remodularizing procedure using the extended ACO, and Steps 20 and 21 repeat the procedure for all ants and update the pheromone.

Step 1. Generate the MDG by using a static source code analysis tool.
Step 2. Generate the dependency matrix by parsing the MDG.
Step 3. Initialize the pheromone in every node, e.g., to 0.01.
Step 4. Start the ants from random locations (files).
Step 5. Calculate the cost of travel from one file, i, to another file, j, as follows.
a. Find all the adjacent files (direct file dependencies) of file i.
b. Find the cost to visit every adjacent file:

$$Cost_{iJ} = \frac{DW_{iJ} \cdot DM}{\sum_{j=1}^{M} DW_{ij}} \qquad (4)$$

where Cost_iJ is the cost to visit file J from file i, DM is the dependency mode (1 for the forward direction and 0.5 for the backward direction), and DW_ij is the dependency weightage from file i to file j, with j = 1, …, J, …, M.

In Fig. 4, the adjacent files of F1 are F2, F3, F4, F5 and F6. The value on each connection line is the number of times one file communicates with the other. The weightage for a forward dependency is 1 and for a backward dependency is 0.5. Fig. 4 shows the individual cost to visit each adjacent file from F1.

Step 6. Find the probability to move from file i to file j:

$$\rho_{ij} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k} [\tau_{ik}]^{\alpha}\,[\eta_{ik}]^{\beta}} \qquad (5)$$

where the sum runs over the files k adjacent to file i, τ_ij is the pheromone between the ith file and the jth file, η_ij is the cost of connection between the ith file and the jth file, α is a factor that controls the influence of τ_ij, and β is a factor that controls the influence of η_ij. The sample results for F1 are ρ12 = 0.19, ρ13 = 0.42, ρ14 = 0.19, ρ15 = 0.19 and ρ16 = 0.01.
Step 7. Find the cumulative probabilities. For F1 they are:

| Index | [0] | [1] | [2] | [3] | [4] | [5] |
|---|---|---|---|---|---|---|
| Cumulative probability | 0 | 0.19 | 0.61 | 0.8 | 0.99 | 1 |

Step 8. Generate a random value between 0.0 and 1.0.
Step 9. Find the nearest lower and upper bounds of the generated random value among the cumulative probabilities.
Step 10. The index of the lower bound gives the next file to be visited. For example, a random value of 0.5 lies between [1] = 0.19 and [2] = 0.61, so index 1 selects the second adjacent file, F3.
Step 11. Repeat Steps 5 to 10 until the ant has covered all the files to be visited.
Step 12. Read the ordered files from the first ant.
Step 13. Take the first file from the ant and keep it as the current file.
Step 14. Create a new module, mark it as the current module and add the current file to it.
Step 15. If more files are available in the ant, take the next file from the ant and keep it as the next file.
Step 16. If no more files are available in the ant, continue from Step 19.
Step 17. If the current file has a dependency with the next file, then
a. Add the next file to the current module.
b. If more files are available in the ant, mark the last visited file as the current file and access the next file from the ant.
c. If no more files are available in the ant, go to Step 18(b)(ii).
d. Continue from Step 17.
Step 18. If the current file does not have a dependency with the next file, then
a. If more files are available to be processed:
i. Mark the recently accessed file as the current file and access the next file from the ant.
ii. If the current file has a dependency with the next file, create a new module and mark it as the current module. Insert the current file and the next file into the current module. Continue from Step 17(b).
iii. If the current file does not have a dependency with the next file, insert the next file into a separate container named 'leftOverList'.
iv. Continue from Step 18(a).
b. If no more files are available to be processed:
i. Insert the recently accessed file into 'leftOverList'.
ii. Check the dependency of every file in 'leftOverList' with the newly generated modules and insert each such file into the module with which it has the maximum dependency.
Step 19. Calculate the Turbo MQ value for the remodularizing outcome of the current ant according to Eqs. (1) and (2).
Step 20. If more ants are available, read the ordered files from the next ant and continue from Step 13.
Step 21. If no more ants are available, update the pheromone based on the highest Turbo MQ as shown in Eq. (6), and repeat the process from Step 4 until the Turbo MQ values no longer vary.

$$\tau_{ij} = \begin{cases} (1-\rho)\,\tau_{ij} + \rho \cdot \Delta\tau_{ij}, & \text{if } (i,j) \text{ belongs to the best solution} \\[4pt] \tau_{ij}, & \text{otherwise} \end{cases} \qquad (6)$$

$$\Delta\tau_{ij}^{k} = \begin{cases} Q / L_k, & \text{if ant } k \text{ uses edge } (i,j) \text{ in its tour} \\[4pt] 0, & \text{otherwise} \end{cases} \qquad (7)$$

where ρ is the pheromone decay parameter (distinct from the transition probability ρ_ij of Eq. (5)), Q is a constant, the pheromone increase factor, and L_k is the cost of the kth ant's tour, taken as L_k = Q/max(Turbo MQ).

4.5. Evaluation of remodularized software system using fitness function
Several quality metrics exist in the literature to assess the performance of a remodularization approach. One such metric is Basic MQ, proposed by Mancoridis et al. [1]:

$$\text{Basic } MQ = \begin{cases} \dfrac{\sum_{i=1}^{k} A_i}{k} - \dfrac{\sum_{i,j=1}^{k} E_{i,j}}{\frac{k(k-1)}{2}}, & k > 1 \\[8pt] A_1, & k = 1 \end{cases} \qquad (8)$$
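Returning to Section 4.4, the first-phase traversal (Steps 4 to 11, Eqs. (4) and (5)) and the pheromone update of Step 21 (Eq. (6)) can be sketched in Python as below. This is a hedged sketch, not the authors' Java code, and several details are assumptions because the paper does not spell them out: `weights` is a hypothetical dictionary of dependency counts keyed by (source, target) file; η_ij in Eq. (5) is taken to be the Eq. (4) cost; the sum in Eq. (5) runs over the still-unvisited adjacent files; a pair that depends on each other in both directions adds its forward weight and half its backward weight; an ant with no unvisited neighbour jumps to a random unvisited file; and the edges of the best solution are the consecutive file pairs in the best ant's tour. The `deposit` argument stands for Δτ_ij of Eq. (7).

```python
import bisect
import random
from itertools import accumulate

TAU0 = 0.01  # initial pheromone (Step 3)


def visit_cost(weights, i, neighbours):
    """Eq. (4): forward dependencies count fully (DM = 1), backward ones by half (DM = 0.5)."""
    total = sum(weights.get((i, j), 0) + weights.get((j, i), 0) for j in neighbours)
    return {j: (weights.get((i, j), 0) * 1.0 + weights.get((j, i), 0) * 0.5) / total
            for j in neighbours}


def choose_next(i, candidates, weights, tau, alpha, beta, rng):
    """Steps 6-10: Eq. (5) probabilities, cumulative array and a roulette-wheel draw."""
    cost = visit_cost(weights, i, candidates)
    scores = [tau.get((i, j), TAU0) ** alpha * cost[j] ** beta for j in candidates]
    total = sum(scores)
    cumulative = [0.0] + [s / total for s in accumulate(scores)]  # Step 7
    r = rng.random()                                              # Step 8
    idx = bisect.bisect_right(cumulative, r) - 1                  # Steps 9-10: lower bound
    return candidates[min(idx, len(candidates) - 1)]              # guard against rounding


def build_tour(files, weights, tau, alpha=1.0, beta=1.0, rng=random):
    """Steps 4-11: one ant visits every file and returns the visiting order."""
    current = rng.choice(files)                                   # Step 4
    tour, unvisited = [current], set(files) - {current}
    while unvisited:
        candidates = [j for j in files if j in unvisited
                      and weights.get((current, j), 0) + weights.get((j, current), 0) > 0]
        if candidates:
            current = choose_next(current, candidates, weights, tau, alpha, beta, rng)
        else:
            # Assumption: no unvisited neighbour, so jump to a random unvisited file
            current = rng.choice(sorted(unvisited))
        tour.append(current)
        unvisited.remove(current)
    return tour


def update_pheromone(tau, best_tour, decay, deposit):
    """Eq. (6): reinforce only the edges (consecutive file pairs) of the best tour."""
    for i, j in zip(best_tour, best_tour[1:]):
        tau[(i, j)] = (1 - decay) * tau.get((i, j), TAU0) + decay * deposit
```

Keeping the cumulative array and searching it with `bisect.bisect_right` mirrors Steps 7 to 10: the index of the largest cumulative value not exceeding the random draw identifies the chosen file. Passing a seeded `random.Random` instance as `rng` makes a tour reproducible.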
Table 1. Benchmark data set used for software remodularization.

| S.No | Software system | No. of files | No. of connections |
|---|---|---|---|
| 1 | Compiler | 13 | 32 |
| 2 | Cia | 38 | 636 |
| 3 | acqCIGNA | 114 | 188 |
| 4 | grappa | 86 | 295 |
| 5 | boxer | 18 | 29 |
| 6 | SMOS | 111 | 737 |
| 7 | GESA | 257 | 2599 |
5.3. Remodularizing of files using extended ACO
During the first phase, 'n' ants are created, based on the number of files in the software system, to find the order in which files are visited; this gives 'n' sets of ordered files. The order of files visited by one of the ants for the boxer software system is shown in Fig. 5. The ant starts from a random file, say 'stackOfAnyWithTop', moves to the next file using Eq. (5), and continues until all the files are visited. The result in Fig. 5 is processed in the second phase to group the files based on the dependency matrix, and a sample result is shown in Fig. 6. It gives three modules and a 'leftOverList', shown in the red rectangle. During the second phase, the extended ACO approach groups the files in 'leftOverList' with the remaining modules, module-0, module-1, module-2, module-3 and module-4, based on their strongest association. Since the files in 'leftOverList' have dependencies only with module-1, they are inserted into module-1. The sample result is shown in Fig. 7. The performance of the remodularized structure is evaluated using the fitness function Turbo MQ, which is the sum of all MFs.
where A_i is the intra-connectivity of the ith module, E_{i,j} is the inter-connectivity between two different modules i and j, and k is the number of modules. The Basic MQ measurement captures the trade-off between inter-connectivity and intra-connectivity by rewarding the creation of modules with higher cohesion while penalizing the coupling among modules. Basic MQ is bounded between −1 (no cohesion within the modules) and 1 (no coupling among the modules). However, this fitness function has two substantial drawbacks [1]. First, its performance is poor because its computational complexity is O(V³), which restricts its application to smaller software systems (i.e., software systems below 75 modules). Second, Basic MQ cannot handle MDGs that have edge weights [19]. To overcome these two drawbacks, the Turbo MQ measurement [16,19] was proposed. Turbo MQ supports MDGs that have edge weights and is much faster than Basic MQ; since its computational complexity is O(V), it can be used for smaller as well as larger software systems. The Turbo MQ measurement for a software system with k modules is evaluated by summing the MF of each module, as given in Eq. (1). MF ranges from zero to one, where zero means no cohesion with maximum coupling and one means high cohesion with zero coupling. The higher the Turbo MQ, the better the remodularized structure.
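Eqs. (1) and (2) can be sketched in Python as below, for a weighted MDG given as (source, target, weight) edges and a file-to-module assignment. This is an illustrative sketch, not the Bunch or authors' implementation; the function names are hypothetical. Each inter-edge adds its weight to the denominator of both modules it connects, which is how the sum over ε_{i,j} + ε_{j,i} in Eq. (2) plays out. The usage example reproduces the module factors reported for Fig. 7 in Section 5.3: module-0 has twenty-four intra-edges and one inter-edge each to module-1 and module-2, while the intra-edge counts of 1 and 2 for module-1 and module-2 are inferred from the reported MFs of 0.667 and 0.8 and are not stated in the text.

```python
from collections import defaultdict


def module_factor(mu, inter):
    """Eq. (2): MF_i from intra-edge weight mu_i and the summed inter-edge weight."""
    return 0.0 if mu == 0 else 2 * mu / (2 * mu + inter)


def turbo_mq(edges, module_of):
    """Eq. (1): sum of module factors for a weighted MDG and a file-to-module map."""
    intra = defaultdict(float)   # mu_i
    inter = defaultdict(float)   # sum over j != i of (eps_ij + eps_ji)
    for src, dst, w in edges:
        a, b = module_of[src], module_of[dst]
        if a == b:
            intra[a] += w
        else:                    # an inter-edge counts against both modules
            inter[a] += w
            inter[b] += w
    return sum(module_factor(intra[m], inter[m]) for m in set(module_of.values()))


# Fig. 7 example; mu = 1 and mu = 2 for modules 1 and 2 are inferred, not stated
mfs = [module_factor(24, 2), module_factor(1, 1), module_factor(2, 1)]
print([round(x, 3) for x in mfs], round(sum(mfs), 2))   # [0.96, 0.667, 0.8] 2.43
```

Because each module factor depends only on edge totals accumulated in one pass over the edges, re-scoring a candidate partition is cheap, which is the property that makes Turbo MQ usable on larger MDGs.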
For the remodularized structure shown in Fig. 7, the module factors are MF0 = 0.96 (module-0), MF1 = 0.667 (module-1) and MF2 = 0.8 (module-2), giving Turbo MQ = 2.43. Module-0 has twenty-four intra-edges, which makes it highly cohesive, and very little coupling with the remaining modules. Module-1 has one inter-edge and one inter-connection with module-0. Module-2 has better cohesion than module-1 and one inter-edge with module-0. The final restructured system yields minimal coupling among the modules and better cohesion within them. The pheromone is then updated based on the best-performing ant, and this process is iterated until the Turbo MQ value no longer varies.

5.4. Comparison with Bunch tool
Bunch is a tool that performs automatic clustering of modules using MDGs. It supports two types of clustering: automatic clustering and user-directed clustering. Automatic clustering, performed without user involvement, often yields sub-optimal results, whereas in user-directed clustering users can form a good clustering using their knowledge. Bunch provides two variants, HC and GA. The proposed approach is evaluated against Bunch using five benchmark datasets. Table 4 compares the number of Turbo MQ evaluations and the Turbo MQ results of the proposed approach and Bunch. The experiments were carried out on the software systems shown in Table 3 and compared with Bunch on performance metrics such as the number of modules, the number of Turbo MQ evaluations and the Turbo MQ value. The number of Turbo MQ evaluations is calculated from the number of files in the software system and the number of iterations needed to attain convergence. Fig. 8 depicts the improved results of the proposed approach compared with Bunch-HC and Bunch-GA: the new approach performs remodularization with fewer Turbo MQ evaluations and a better Turbo MQ value.
The approach outperforms Bunch in terms of source-level dependencies of software systems, such as variable access and procedure invocation [30]. On the smallest software system, Compiler, which has only 13 files and 32 connections, the approach matches Bunch – HC (a Turbo MQ of 1.5); on larger software systems with more files and connections, it outperforms Bunch. In every iteration, the best Turbo MQ value is retained, and the process continues until the Turbo MQ value converges. As Table 4 shows, the proposed approach requires fewer Turbo MQ evaluations than the Bunch tool, so it converges in less time than Bunch.
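The iterate-until-convergence loop and the counting of Turbo MQ evaluations can be illustrated with a generic ant-colony search. This is not the authors' extended ACO, whose details are not reproduced in this section; it is a minimal Python sketch under stated assumptions: the number of modules is capped at a fixed `n_modules`, files are assigned in index order rather than in the ACO-derived traversal order of Fig. 5, each ant places each file in a module with probability proportional to a pheromone value, and after every iteration all pheromone evaporates by a factor (1 − `rho`) before the iteration's best ant deposits an amount equal to its Turbo MQ. The search stops once the best Turbo MQ has not improved for `patience` consecutive iterations. All names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def turbo_mq(edges, assignment):
    """Turbo MQ: sum of MF_i = 2*mu_i / (2*mu_i + eps_i) over modules with intra-edges."""
    intra, inter = defaultdict(float), defaultdict(float)
    for src, dst, weight in edges:
        a, b = assignment[src], assignment[dst]
        if a == b:
            intra[a] += weight
        else:
            inter[a] += weight
            inter[b] += weight
    return sum(2 * mu / (2 * mu + inter[m]) for m, mu in intra.items() if mu > 0)


def aco_remodularize(n_files, edges, n_modules, n_ants=10, rho=0.1,
                     patience=5, max_iter=200, seed=0):
    """Hypothetical ant-colony search for a file-to-module assignment."""
    rng = random.Random(seed)
    # tau[f][m]: pheromone for placing file f in module m.
    tau = [[1.0] * n_modules for _ in range(n_files)]
    best, best_score, evaluations, stale = None, -1.0, 0, 0
    for _ in range(max_iter):
        iter_best, iter_score = None, -1.0
        for _ant in range(n_ants):
            # Each ant places every file with probability proportional to pheromone.
            assignment = [rng.choices(range(n_modules), weights=tau[f])[0]
                          for f in range(n_files)]
            score = turbo_mq(edges, assignment)
            evaluations += 1  # one Turbo MQ evaluation per ant
            if score > iter_score:
                iter_best, iter_score = assignment, score
        # Evaporation on every trail, then a deposit by the iteration's best ant.
        for f in range(n_files):
            tau[f] = [t * (1 - rho) for t in tau[f]]
            tau[f][iter_best[f]] += iter_score
        if iter_score > best_score + 1e-12:
            best, best_score, stale = iter_best, iter_score, 0
        else:
            stale += 1
            if stale >= patience:  # the best Turbo MQ no longer changes
                break
    return best, best_score, evaluations


# Two triangles of files joined by one edge (hypothetical toy MDG).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
best, score, evals = aco_remodularize(6, edges, n_modules=2)
print("assignment:", best, "Turbo MQ:", round(score, 3), "evaluations:", evals)
```

The returned evaluation count is the number of Turbo MQ computations (ants × iterations), analogous to the quantity compared in Table 4; it grows with both the size of the system and the number of iterations needed to converge.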
5. Experiments and result analysis
The proposed approach is implemented in core Java. This section describes the experiments carried out to evaluate its performance.
5.1. Benchmark datasets
The proposed method has been applied to, and compared with, existing tools such as Bunch and I-GAs. The same benchmark datasets used by the Bunch [5] and I-GAs [17] tools are used for the evaluation, and their details are shown in Table 1.
5.2. Generation of MDG and matrix
The goal of software modularization is to partition a graph of source-level entities and relations into a collection of modules. In the preparatory phase of the remodularization procedure, the MDG is generated by using static source code analysis tools to parse the code and assemble a repository of information about the entities and relations in the software system. In the MDG, source code components are represented as nodes and the dependencies between them as edges. A series of scripts is then run over this repository to generate the module dependency matrix, which is fed as input to the ACO. A sample dependency matrix for the boxer software system is shown in Table 2. A non-zero value in Table 2 indicates that the corresponding file is associated with another file, and zero indicates no association.
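The MDG-generation step can be sketched for a toy system. The paper does not name its static analysis tools or scripts, so this example swaps in Python's standard `ast` module and treats each import of another module of the same system as a dependency; the module names and source snippets are hypothetical. The result is a 0/1 dependency matrix of the same kind as Table 2.

```python
import ast

# Hypothetical toy system: module name -> source text.
SOURCES = {
    "main": "import scanner\nimport layout\n",
    "scanner": "from lexer import tokenize\n",
    "lexer": "import re\n",
    "layout": "import scanner\n",
}


def extract_dependencies(sources):
    """Return (importer, imported) pairs between modules of the system."""
    deps = set()
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                # Keep only dependencies on other modules of the same system.
                if target in sources and target != name:
                    deps.add((name, target))
    return deps


def dependency_matrix(sources, deps):
    """0/1 matrix with rows and columns in sorted module-name order."""
    names = sorted(sources)
    index = {n: i for i, n in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for src, dst in deps:
        matrix[index[src]][index[dst]] = 1
    return names, matrix


deps = extract_dependencies(SOURCES)
names, matrix = dependency_matrix(SOURCES, deps)
for name, row in zip(names, matrix):
    print(f"{name:8s}", row)
```

The import of the standard-library module `re` is dropped because it lies outside the analysed system; a real pipeline would typically also weight repeated references rather than record only 0/1 values.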
B.G. Varghese R, K. Raimond and J. Lovesum
Information and Software Technology 114 (2019) 107–120
Table 2 Dependency Matrix for boxer software system.
File                    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
main (0)                0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
boxParserClass (1)      1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
boxClass (2)            1  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0
edgeClass (3)           1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Fonts (4)               1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Globals (5)             1  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
Event (6)               1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
hashedBoxes (7)         1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
GenerateProlog (8)      1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
colorTable (9)          1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Error (10)              0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
boxScannerClass (11)    0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
stackOfAnyWithTop (12)  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
MathLib (13)            0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
hashGlobals (14)        0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
stackOfAny (15)         0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
nodeOfAny (16)          0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
Lexer (17)              0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
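Table 2 can be turned back into an MDG edge list for the clustering step. In this sketch, entry (i, j) is read as a unit-weight edge from file i to file j; the direction convention and the helper names are assumptions, since the text only states that a non-zero value indicates an association. The non-zero positions are copied from Table 2.

```python
# File names of the boxer system, indexed as in Table 2.
FILES = ["main", "boxParserClass", "boxClass", "edgeClass", "Fonts", "Globals",
         "Event", "hashedBoxes", "GenerateProlog", "colorTable", "Error",
         "boxScannerClass", "stackOfAnyWithTop", "MathLib", "hashGlobals",
         "stackOfAny", "nodeOfAny", "Lexer"]

# Column indices of the non-zero entries in each row of Table 2 (row 0 is all zeros).
NONZERO = {1: [0], 2: [0, 1, 5, 8], 3: [0, 1, 2], 4: [0, 2], 5: [0, 1, 8],
           6: [0], 7: [0, 1, 2], 8: [0], 9: [0, 1, 2], 10: [1], 11: [1],
           12: [1], 13: [2], 14: [7], 15: [12], 16: [15], 17: [11]}


def build_matrix(n, nonzero):
    """Dense n x n dependency matrix: 1 marks an association, 0 none."""
    matrix = [[0] * n for _ in range(n)]
    for i, cols in nonzero.items():
        for j in cols:
            matrix[i][j] = 1
    return matrix


def matrix_to_mdg(matrix, names):
    """One weighted edge (source, target, weight) per non-zero entry (i, j)."""
    return [(names[i], names[j], w)
            for i, row in enumerate(matrix)
            for j, w in enumerate(row) if w != 0]


matrix = build_matrix(len(FILES), NONZERO)
mdg = matrix_to_mdg(matrix, FILES)

degree = {name: 0 for name in FILES}  # number of edges each file takes part in
for src, dst, _ in mdg:
    degree[src] += 1
    degree[dst] += 1

print(len(mdg), "edges")
print({name: d for name, d in degree.items() if d == max(degree.values())})
```

The matrix has 29 non-zero entries, so the boxer MDG has 29 edges; main, boxParserClass and boxClass each take part in nine of them.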
Fig. 5. Order of visited files.
Fig. 6. Result of initial remodularizing process.
Fig. 7. Final remodularized files.
Table 3 Comparison of proposed approach with Bunch tool (HC = Bunch – HC, GA = Bunch – GA).
                  No. of modules        No. of isolated modules   Turbo MQ values
Software system   HC   GA   Proposed    HC   GA   Proposed        HC     GA     Proposed
Compiler          4    5    4           0    0    0               1.5    1.43   1.5
Cia               4    5    7           0    0    0               2.04   1.79   4.2
acqCIGNA          12   22   19          0    0    0               6.66   2.25   12.84
grappa            5    15   24          0    0    0               6.6    1.65   10.88
boxer             2    2    3           0    0    0               1.44   1.52   2.43
Fig. 8. Comparison of the proposed approach with Bunch – HC and Bunch – GA.
Fig. 9 shows the time convergence analysis of the proposed approach and Bunch, measured as the number of Turbo MQ evaluations. Fig. 10 compares the number of modules generated by the proposed approach and by the Bunch tool for the various software systems. The proposed approach yields at least as many modules as Bunch – HC on every system, and more modules than Bunch – GA on Cia, grappa and boxer, in order to reduce the coupling among the modules. Because the proposed approach tries to move highly cohesive files into the same module, it may increase the number of modules; this can be advantageous for software maintenance and evolution.
Fig. 9. No. of Turbo MQ evaluations for various software systems.
Fig. 10. No. of modules generated by Bunch and proposed approach.
Table 4 Time convergence comparison of proposed approach with Bunch.
                  No. of Turbo MQ evaluations
Software system   Bunch – HC   Bunch – GA   Proposed approach
Compiler          3.16E+02     6.77E+05     1.43E+02
Cia               6.10E+03     8.67E+06     5.66E+03
acqCIGNA          1.17E+05     1.30E+08     2.85E+03
grappa            6.55E+04     7.40E+07     3.27E+03
boxer             1.07E+03     1.95E+06     6.30E+02
5.5. Comparison with I-GAs tool
The proposed approach is also evaluated against the existing I-GAs tool on two benchmark datasets. The authors of I-GAs evaluated its performance on two standard industry-oriented projects, SMOS and GESA. The proposed approach and I-GAs are compared on the number of modules, the number of isolated modules formed and the Turbo MQ values. The results for the SMOS and GESA software systems are shown in Table 5. An isolated module has zero cohesion, so its MF value is zero; since I-GAs generate many isolated clusters, their Turbo MQ values are affected negatively. The results show that the proposed approach yields fewer modules, no isolated modules and better Turbo MQ values.
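The effect of isolated modules on Turbo MQ can be seen on a toy graph. The graph below, two triangles of files joined by one edge, is hypothetical (it is not GESA or SMOS), and the sketch assumes the usual MF definition, under which a module without intra-edges contributes MF = 0. Splitting one file off into an isolated module both adds a zero term and turns that file's edges into inter-edges of its former module, so Turbo MQ drops.

```python
from collections import defaultdict


def turbo_mq(edges, clustering):
    """Sum of MF_i = 2*mu_i / (2*mu_i + eps_i); MF_i = 0 if module i has no intra-edges."""
    intra, inter = defaultdict(float), defaultdict(float)
    for src, dst, weight in edges:
        a, b = clustering[src], clustering[dst]
        if a == b:
            intra[a] += weight
        else:
            inter[a] += weight
            inter[b] += weight
    return sum(
        2 * intra[m] / (2 * intra[m] + inter[m])
        for m in set(clustering.values())
        if intra[m] > 0
    )


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2 -> 3 (hypothetical).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]

two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_isolated = {**two_modules, 5: "C"}   # file 5 moved into its own module
all_isolated = {f: f for f in range(6)}  # every file is an isolated module

for name, clustering in [("two modules", two_modules),
                         ("file 5 isolated", one_isolated),
                         ("all isolated", all_isolated)]:
    print(f"{name}: Turbo MQ = {turbo_mq(edges, clustering):.3f}")
```

Turbo MQ falls from 12/7 (about 1.71) for the two-triangle clustering to 44/35 (about 1.26) once file 5 is isolated, and to 0 when every file is isolated.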
Fig. 11. Comparison of proposed approach with I-GAs.
Fig. 11 shows the Turbo MQ values of I-GAs and the proposed approach. Fig. 12 compares the number of modules generated by the proposed approach and by I-GAs for the two software systems. The proposed approach yields a better remodularization with fewer modules for both the GESA and SMOS software systems.
Fig. 12. No. of modules generated by I-GAs and proposed approach.
Table 5 Comparison of proposed approach with I-GAs for GESA and SMOS software systems.
                    No. of modules   No. of isolated modules   Turbo MQ values
Algorithm           GESA   SMOS      GESA   SMOS               GESA   SMOS
GA                  149    54        72     22                 3.94   3.4
IC-IGA              79     35        22     9                  4.86   3.67
R-IGA               113    42        57     15                 4.64   3.78
MGA                 155    63        83     32                 2.04   1.91
IC-IMGA             90     37        25     10                 4.19   2.66
R-IMGA              155    63        91     37                 2.14   1.97
Proposed approach   28     13        0      0                  6.13   3.83
Fig. 13. Structure of boxer software system before remodularization.
5.6. Analysis of remodularization structure
The structure of the sample boxer software system is shown in Fig. 13. The remodularized structures of the boxer system obtained by Bunch – HC, Bunch – GA and the proposed approach are shown in Figs. 14–16, respectively. Fig. 16 shows that the structure produced by the proposed approach has a better Turbo MQ value than those produced by Bunch – HC and Bunch – GA. Table 6 analyses the MF and Turbo MQ values of the proposed approach against Bunch – HC and Bunch – GA.
Table 6 Analysis of MF and Turbo MQ.
Approach            MF module-0   MF module-1   MF module-2   Turbo MQ
Bunch – HC          0.75          0.69          Nil           1.44
Bunch – GA          0.85          0.67          Nil           1.52
Proposed approach   0.96          0.67          0.80          2.43
Fig. 14. Remodularized structure of boxer software system using Bunch – HC Algorithm.
Fig. 15. Remodularized structure of boxer software system using Bunch – GA.
Bunch – HC: It yields two modules, module-0 and module-1, with twelve and eight intra-edges, respectively, and eight inter-edges between them. The MFs for module-0 and module-1 are 0.75 and 0.69, respectively, giving a Turbo MQ of 1.44 for boxer. The number of Turbo MQ evaluations is 1.07E+03.
Bunch – GA: It also generates two modules, module-0 and module-1, with seventeen and six intra-edges, respectively. Bunch – GA produces less coupling between the two modules than Bunch – HC does. The MFs for module-0 and module-1 are 0.85 and 0.67, respectively, giving a Turbo MQ of 1.52 for boxer. The number of Turbo MQ evaluations is 1.95E+06 (Table 4), so Bunch – GA converges far more slowly than Bunch – HC.
Proposed approach: It yields three modules, module-0, module-1 and module-2. Its cohesion is better than that of Bunch – HC and Bunch – GA because it tries to move highly cohesive files into the same module. The MFs for module-0, module-1 and module-2 are 0.96, 0.667 and 0.80, respectively, which yields a Turbo MQ value of 2.43, higher than those of Bunch – HC and Bunch – GA. Although the proposed approach produces more modules, the coupling among its modules is lower than with Bunch – HC and Bunch – GA. Its number of Turbo MQ evaluations is 6.30E+02, fewer than for Bunch – HC and Bunch – GA, so it converges faster than both.
Fig. 16. Remodularized structure of boxer software system using proposed algorithm.
6. Conclusion and future work
Modularizing a software system helps to organize development more effectively, integrate modifications easily, carry out testing and debugging efficiently, and perform maintenance without adversely affecting the working of the software. Maintaining high cohesion and low coupling, the basic principles of modularization, is therefore essential. In this work, an extended ACO for automatic remodularization of software systems has been proposed and its performance demonstrated. The proposed approach was evaluated on seven software systems, and the results show that it is more efficient than the existing approaches Bunch – HC, Bunch – GA and I-GAs: on the evaluated software systems, it yielded better Turbo MQ values with lower time complexity. In future work, semantic relationships and the history of software maintenance will be used to calculate connection strength, which may further improve the quality of the optimization process while preserving the original structure as far as possible.
Conflict of interest
None.
References
[1] S. Mancoridis, B.S. Mitchell, C. Rorres, Y.F. Chen, E.R. Gansner, Using automatic clustering to produce high-level system organizations of source code, in: Proceedings of the International Workshop on Program Comprehension, 1998, pp. 45–55.
[2] M. Harman, R.M. Hierons, M. Proctor, A new representation and crossover operator for search based optimization of software modularization, in: Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers Inc., 2002.
[3] H. Abdeen, S. Ducasse, H.A. Sahraoui, I. Alloui, Automatic package coupling and cycle minimization, in: Proceedings of the 16th Working Conference on Reverse Engineering, IEEE, 2009, pp. 103–112.
[4] O. Seng, M. Bauer, M. Biehl, G. Pache, Search based improvement of subsystem decompositions, in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), ACM Press, 2005, pp. 1045–1051.
[5] B.S. Mitchell, S. Mancoridis, On the automatic modularization of software systems using the Bunch tool, IEEE Trans. Softw. Eng. 32 (3) (2006) 193–208.
[6] A.L. Jaimes, C.A. Coello Coello, J.E.U. Barrientos, Online objective reduction to deal with many-objective problems, in: Proceedings of the 5th International Conference on Evolutionary Multi-Criterion Optimization, 2009, pp. 423–437.
[7] K. Praditwong, M. Harman, X. Yao, Software module clustering as a multi-objective search problem, IEEE Trans. Softw. Eng. 37 (2) (2011) 264–282.
[8] A. Ramírez, J.R. Romero, S. Ventura, An approach for the evolutionary discovery of software architectures, Inf. Sci. 30 (1) (2015) 234–255.
[9] The artificial bee colony algorithm by offering a new implementation, Inf. Sci. 291 (2015) 115–127.
[10] S.S. Dahiya, J.K. Chhabra, S. Kumar, Application of artificial bee colony algorithm to software testing, in: 21st Australian Software Engineering Conference, Auckland, 2010, pp. 149–154.
[11] X. Wang, L. Tang, An adaptive multi-population differential evolution algorithm for continuous multi-objective optimization, Inf. Sci. 348 (20) (2016) 124–141.
[12] W. Mkaouer, M. Kessentini, A. Shaout, P. Koligheu, S. Bechikh, K. Deb, A. Ouni, Many-objective software remodularization using NSGA-III, ACM Trans. Softw. Eng. Methodol. 24 (3) (2015) 1–45.
[13] S.M. Brian, S. Mancoridis, Using heuristic search techniques to extract design abstractions from source code, in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '02), San Francisco, CA, USA, 2002, pp. 1375–1382.
[14] O. Raiha, E. Makinen, T. Poranen, Using simulated annealing for producing software architectures, in: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers (GECCO '09), ACM, New York, NY, USA, 2009, pp. 2131–2136.
[15] M. Barros, An analysis of the effects of composite objectives in multiobjective software module clustering, in: Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation (GECCO '12), 2012, pp. 1205–1212.
[16] M. Tajgardan, H. Izadkhah, S. Lotfi, Software systems clustering using estimation of distribution approach, Journal of Applied Computer Science Methods (2017) 99–113.
[17] G. Bavota, F. Carnevale, A. De Lucia, M. Di Penta, R. Oliveto, Putting the developer in-the-loop: an interactive GA for software re-modularization, in: Proceedings of the International Symposium on Search Based Software Engineering, 2012, pp. 75–89.
[18] H. Takagi, Interactive evolutionary computation: fusion of the capacities of EC optimization and human evaluation, Proceedings of the IEEE 89 (2001) 1275–1296.
[19] B.S. Mitchell, A Heuristic Search Approach to Solving the Software Clustering Problem, Ph.D. thesis, Drexel University, Philadelphia, 2002.
[20] E. Sherkat, M. Rahgozar, M. Asadpour, Structural link prediction based on ant colony approach in social networks, Physica A: Statistical Mechanics and its Applications (2015) 80–94.
[21] T.A. Wiggerts, Using clustering algorithms in legacy systems remodularization, in: Proceedings of the 4th Working Conference on Reverse Engineering, IEEE CS Press, Amsterdam, The Netherlands, 1997, p. 33.
[22] N. Anquetil, T. Lethbridge, Experiments with clustering as a software remodularization method, in: Proceedings of the 6th Working Conference on Reverse Engineering, IEEE CS Press, Atlanta, Georgia, USA, 1999, pp. 235–255.
[23] O. Maqbool, H.A. Babri, Hierarchical clustering for software architecture recovery, IEEE Trans. Softw. Eng. 33 (11) (2007) 759–780.
[24] M. Shtern, V. Tzerpos, Methods for selecting and improving software clustering algorithms, in: Proceedings of the 17th IEEE International Conference on Program Comprehension, IEEE CS Press, Vancouver, Canada, 2009, pp. 248–252.
[25] B.S. Mitchell, S. Mancoridis, On the evaluation of the Bunch search-based software modularization algorithm, Soft Comput. Fus. Found. Meth. Appl. 12 (1) (2008) 77–93.
[26] H. Abdeen, H. Sahraoui, O. Shata, N. Anquetil, S. Ducasse, Towards automatically improving package structure while respecting original design decisions, in: Proceedings of WCRE, 2013, pp. 212–221.
[27] G. Bavota, A.D. Lucia, A. Marcus, R. Oliveto, Software remodularization based on structural and semantic metrics, in: Proceedings of WCRE, 2010, pp. 195–204.
[28] M. Fowler, K. Beck, J. Brant, W. Opdyke, D. Roberts, Refactoring – Improving the Design of Existing Code, first ed., Addison-Wesley, 1999.
[29] W.F. Opdyke, Refactoring: A Program Restructuring Aid in Designing Object-Oriented Application Frameworks, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1992.
[30] S. Mancoridis, B.S. Mitchell, Y. Chen, E.R. Gansner, Bunch: a clustering tool for the recovery and maintenance of software system structures, in: Proceedings of the IEEE International Conference on Software Maintenance (ICSM'99), 1999.
[31] M. Dorigo, Ant colony system: a cooperative learning approach to the traveling salesman problem, IEEE Trans. Evol. Comput. 1 (1) (1997).
[32] J. Pan, D. Wang, An ant colony optimization algorithm for multiple travelling salesman problem, in: Proceedings of the First International Conference on Innovative Computing, Information and Control (ICICIC'06), Volume I, 2006.
[33] W.J. Gutjahr, A graph-based ant system and its convergence, Future Gener. Comput. Syst. 16 (8) (2000) 873–888.
[34] Available from: https://www.cs.drexel.edu/~spiros/bunch/ (2019).
[35] M. Rajalakshmi, R. Bright Gee Varghese, An interactive approach for software system re-modularization based on chronicle data, Int. J. Res. Comput. Appl. Robotics (2014) 65–69.