Information and Software Technology 118 (2020) 106203
Toward recursion aware complexity metrics☆
Gordana Rakić a,∗, Melinda Tóth b, Zoran Budimac a
a Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovica 4, Novi Sad 21000, Serbia
b Department of Programming Languages and Compilers, Faculty of Informatics, Eötvös Loránd University, Pázmány Péter stny. 1/C, Budapest H-1117, Hungary
Keywords: Software maintainability, Source code readability, Source code comprehension, Debugging, Source code complexity, Complexity metrics
Abstract

Context: Software developers spend a significant amount of time on reading, comprehending, and debugging source code. Numerous software metrics can make us aware of incomprehensible functions or of flaws in their collaboration. Invocation chains, especially recursive ones, affect solution complexity, readability, and understandability. Even though decomposed and recursive solutions are characterized as short and clear in comparison with iterative ones, they hide the complexity of the observed problem and solution. As the collaboration between functions can strongly depend on context, difficulties are usually detected in debugging, testing, or by static analysis, while metric support is still very weak.
Objective: We introduce a new complexity metric, called Overall Path Complexity (OPC), which is aware of (recursive) call chains in the observed source code. As invocations are the basic collaboration mechanism and recursion is broadly accepted, the OPC metric is intended to be applicable independently of programming language and paradigm.
Method: We propose four different versions of the OPC calculation algorithm and explore and discuss their suitability. We validated the proposed metrics based on a framework specially designed for the evaluation and validation of software complexity metrics, and accordingly performed theoretical, empirical, and practical validation. Practical validation was performed on toy examples and industrial cases (47,012 LOC, 2899 functions, and 758 recursive paths) written in Erlang.
Result: Based on our analysis, we selected the most suitable of the four proposed OPC calculation formulas and showed that the new metric expresses properties of the software beyond those captured by other available metrics, which was confirmed by low correlation with them.
Conclusion: We introduced the OPC metric, calculated on the Overall Control Flow Graph as an extension of Cyclomatic Complexity with added awareness of (recursive) invocations. The values of the new metric can lead us to the problematic fragments of the code or of the execution paths.
1. Introduction

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) work actively in the field of worldwide standardization. ISO/IEC 25000¹ is a series of standards under the general title "Systems and Software Quality Requirements and Evaluation" (SQuaRE). Software quality standardization mainly refers to the presence of software quality attributes and characteristics in a final product. A software quality attribute can be described as an inherent property of a software product that can be quantitatively or qualitatively distinguished. A category of software quality attributes that bear on software quality is defined as a software quality characteristic. Thus, software quality attributes are grouped into characteristics and sub-characteristics of software quality.
ISO 25010² defines the following eight software product quality characteristics: functional suitability, reliability, performance efficiency, usability, security, compatibility, maintainability, and portability. Each of these characteristics is composed of a set of related sub-characteristics.
¹ ISO 25000, 2019, https://www.iso.org/obp/ui/#iso:std:iso-iec:25000:ed1:v1:en.
² ISO 25010, 2019, https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed1:v1:en.
☆ This joint work was partially supported by the Ministry of Education, Science, and Technological Development, Republic of Serbia, through project no. OI 174023, and by the Hungarian Government through the Thematic Excellence Programme, Industry and Digitization Subprogramme, NRDI Office, 2019.
∗ Corresponding author. E-mail addresses: [email protected] (G. Rakić), [email protected] (M. Tóth), [email protected] (Z. Budimac).
https://doi.org/10.1016/j.infsof.2019.106203
Received 20 March 2019; Received in revised form 12 October 2019; Accepted 16 October 2019; Available online 16 October 2019.
Quality attributes or characteristics recognized by software quality standardization relate mainly to the fulfilment of users' requirements. From the producer's viewpoint, quality is usually related to maintainability. Maintainability of a software product expresses the ease of making changes during the product life cycle. It strongly depends on the readability and clarity of the source code. These two characteristics are usually reflections of source code complexity, which can include code size, data organization, and data usage. The standardization describes maintainability as the degree of effectiveness and efficiency with which the product can be modified. It is reflected by modularity, reusability, analyzability, changeability, modification stability, testability, and compliance.
Software metrics are used to express software quality. The relation between maintainability, maintenance costs, and the complexity described above has been observed for decades [1,2]. Accordingly, we can say that the maintainability level, including the ease of coding, debugging, testing, and changing the product, is affected by different complexity attributes [3,4]. Usually, complexity metrics are used to derive maintainability metrics [5].
There are numerous complexity metrics. Some of them are traditional ones, like Cyclomatic Complexity (CC), while newer ones are designed to fulfil specific needs. Following this trend, some complexity metrics are language/paradigm specific, while others are language/paradigm independent. Furthermore, some metrics are elementary, usually created from scratch and applicable directly to source code or to its representations, while others are derived from existing metrics. For example, Weighted Method per Class is calculated based on the value of CC [6]. More insights are available in Sections 3 and 4.
However, there are some gaps in the field of complexity metrics. One of them is the weak consideration of the communication between functions (procedures, methods, etc.), function calls, and particularly recursive function calls. We usually use recursion to logically simplify the problem solution and, consequently, the algorithm implementation, which is completely independent of the implementation language or paradigm. Furthermore, in some programming languages/paradigms (for example, functional ones), recursion is the main control mechanism for iteration. We measured the ratio of recursive functions in the observed real-life projects (Section 7.3.2).
Functional languages and functional programming concepts are often applied in industry by large IT companies. The application areas include parallel, distributed, and decentralized data processing, financial and telecom software, etc. Functional languages are becoming widespread in the open-source community as well. In the GitHub pull request statistics ranking list, which contains the 50 most represented languages, Scala is 12th and Elixir is 17th, followed by several other functional languages [7]. For example, Haskell is in the 28th position on the list and, according to the [8] statistics, more than 150 companies are using it. On the Elixir-stat's list [8] there are more than four hundred companies. Some examples are the Glasgow Haskell Compiler³ and other industrial applications written in Haskell⁴, and Bet365⁵. The chat server of Facebook⁶ and the AXD telecom switch [9] by Ericsson are written in Erlang.
However, functional programming concepts are welcome in other programming languages as well. For example, the well-known lambda constructs are first-class citizens of the mainstream imperative, object-oriented languages, such as Java, C++, C#, etc. We can also mention the popular multi-paradigm language Scala, which is a kind of functional programming language built on top of the Java Virtual Machine.
³ GHC: Glasgow Haskell Compiler, 2019, https://wiki.haskell.org/GHC.
⁴ Haskell in industry, 2019, https://wiki.haskell.org/Haskell_in_industry.
⁵ Bet365, 2019, https://github.com/bet365.
⁶ Erlang at Facebook, 2019, https://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf.
Problem decomposition and partial and recursive solutions improve the readability and understandability of source code. If a solution is built of short chains of function calls and only self-recursions, this will probably positively affect the clarity of the solution. However, if we use chains of recursive calls, this may affect the readability and understandability of the code, including the debugging process and maintenance. Introducing a new, language- and paradigm-independent software metric aware of recursive complexity is a step toward filling the gap in the field of code complexity measurement. This is especially true in functional programming, where recursion plays an important role. Since today's languages (e.g., Java) are becoming hybrid by supporting different programming paradigms, including the functional one, this contribution brings additional value.
In this article, we discuss the possible effects of recursive calls on code complexity, readability, understandability, and product maintainability. We assume that in maintenance, especially in corrective changes, deep understanding and comprehension of the solution and the debugging process take substantial time and effort. We then propose a new software metric aware of (recursive) function calls and explore its usability.
In Section 2, we introduce the basic terms to be used and describe some available software metrics that our new metrics rely on. We also illustrate the motivation for introducing a new metric. Then we provide a description of the current state of the field of complexity metrics (Section 3) and related work (Section 4). Finally, definitions, demonstration, and validation of the new metrics, with results, follow in Sections 5–7, respectively. The last section presents conclusions and describes planned future work.

2. Background

Software product metrics, as a basic set of static analysis techniques, are calculated on some static internal representation of the source code. Examples of internal representations are different kinds of graphs and trees. One of the most widely used graph-based internal representations of source code is the Control Flow Graph (CFG) [10]. Let G = (V, E) be a directed graph, where the nodes V correspond to basic blocks and the edges E ⊆ V × V connect two nodes vᵢ, vⱼ ∈ V iff vⱼ can be executed immediately after vᵢ. This graph G is called a Control Flow Graph (CFG). Each node vᵢ ∈ V which has no incoming edges (∄v ∈ V : (v, vᵢ) ∈ E) represents a start node.
Paths in the control flow are one of the main parameters in measuring the complexity of the observed software. If we observe the Control Flow Graph (CFG) representing a separate function, the number of paths can be calculated by the Cyclomatic Complexity (CC) algorithm. CC is the most popular complexity metric and is defined as the maximum number of linearly independent circuits in the CFG. If we observe a block of the source code, which is usually a function or some part of it, cyclomatic complexity may be defined as the count of the linearly independent paths through that block of the code [11]. More precisely, let G = (V, E) be a CFG, a directed graph containing the basic blocks of the program. An edge between two basic blocks exists if the control may pass from one block to the other. The cyclomatic number of the graph, or cyclomatic complexity (CC) [11], is then defined as

CC = e − v + 2p

where e is the number of edges of the graph, v is the number of nodes of the graph, and p is the number of connected components (exit nodes).
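As a minimal executable illustration of this formula, the following Erlang sketch assumes the CFG is given simply as node and edge lists together with the number of connected components; this representation is an assumption made only for the sketch.

%% Cyclomatic complexity of a CFG given as node and edge lists, with P
%% connected components, following CC = e - v + 2p. Illustrative sketch only.
-module(cc_sketch).
-export([cc/3]).

cc(Nodes, Edges, P) ->
    length(Edges) - length(Nodes) + 2 * P.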
This definition of CC is based on the observation of structured code with one entry and one exit point per program block. We may allow more exit points from the observed block and attach each exit point to the entry point by adding a new branch. Then we can use another definition of the complexity number of this CFG [11]:

CC = e − v + p

A more straightforward way to calculate Cyclomatic Complexity is to count the basic predicates, i.e., the nodes representing simple conditions with two outgoing edges.
Fig. 1. Sum example: version 1.
In this case, we can define Cyclomatic Complexity (CC) as the number of predicates (NoP) increased by 1:

CC = NoP + 1

To illustrate our motivation for introducing a new metric, let us observe a simple example program (Fig. 1) that sums all the numbers between two given values (including them). If the given values are equal, it returns their sum. Based on the Control Flow Graph representing a single function (Fig. 1), the number of paths can be calculated by the CC algorithm or by some of its extensions. Cyclomatic Complexity (CC) is a proper metric to express the complexity of this code fragment because the whole complexity is placed in one function. This function contains three branches: an if branch with a simple condition, an else if branch with another simple condition, and an else branch without a condition. Two of the three branches contain a while loop with a simple condition. Therefore, the number of basic predicates is 4, and the total value of CC for this function is 5.
One CFG usually represents the control flow within a single procedure/function/method (later on: function), where the nodes are basic blocks or statements and the directed edges follow the possible execution flows. In this kind of graph representation of source code, one can observe functions separately, without information about the dependencies and communication between them based on invocations. If we need this information, we have to incorporate information about these dependencies into the CFG by creating the Overall Control Flow Graph (OCFG). This graph is also called the Interprocedural (or Inlined) Control Flow Graph (ICFG) [12,13]. Since we will use it independently of paradigms and languages, OCFG is the more appropriate term for the representation to be used. Even though these are the same representations, their construction has some specific differences, as a result of the covered paradigms.
The OCFG is built from the union of the control flow graphs representing all functions. Additionally, for each function, all call nodes are connected to the entry nodes of the functions they invoke, while exit nodes are attached to the return nodes corresponding to these calls. Therefore, the OCFG is the union of two different static representations of the source code: the Control Flow Graph (CFG) and the Static Call Graph (SCG), where the calls are moved to the lower level. They are attached at the level of function call statements/expressions, not just connecting the higher-level functions as in the SCG.
Implemented on the CFG, the CC metric can easily be applied to functions observed separately. If we need to follow the complexity of several interconnected blocks, we can inline the CFG of these functions at the place of each function call and generate the OCFG. By applying the algorithm for the calculation of CC to the OCFG, we design a new software metric which gives a broader view of code complexity. CC calculated in this way can be called Interprocedural (or Inlined) Cyclomatic Complexity (ICC).
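A minimal sketch of this inlining idea, assuming the program is modeled simply as a map from each function name to its CC value and its list of callees (an illustrative simplification of the OCFG-based calculation, not the implementation used later in the paper), looks as follows:

%% Naive ICC: a function's CC plus the ICC of every function it calls,
%% on a model FunName => #{cc => CC, calls => [Callee]}. This only
%% terminates for call structures without recursion; a recursive call would
%% make the inlining loop forever, which is exactly the problem discussed
%% for the recursive sum (Fig. 3) below.
-module(icc_naive).
-export([icc/2]).

icc(F, Funs) ->
    #{cc := CC, calls := Callees} = maps:get(F, Funs),
    CC + lists:sum([icc(G, Funs) || G <- Callees]).

For the sum/sum_maximum version discussed below (Fig. 2), the model Funs = #{sum => #{cc => 3, calls => [sum_maximum, sum_maximum]}, sum_maximum => #{cc => 2, calls => []}} gives icc(sum, Funs) = 3 + 2 + 2 = 7, the value derived in the text.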
Fig. 2. Sum example: version 2.
Fig. 3. Sum example: version 3.
ICC gives us the number of paths that cross the borders of the observed block, following the invocations. Let us now consider what would happen in our example if we extracted the duplicated fragment into a function and invoked it through function calls in the original function (Fig. 2). If we observe these functions separately, we can measure the complexity of each implementation separately by CC. Still, we lose the information about the overall complexity of the full functionality, because we divided the logic into two functions. While CC still expresses the complexity of the separate functions, during inspection and debugging one has to make one more step to switch from one function to the other.
Here we can observe an OCFG instead of a CFG. If we calculate CC on this graph, we will have information about the CC of each function (CC(sum) = 3 and CC(sum_maximum) = 2). The overall complexity expressed by ICC can be calculated from CC: for each function call, we increase the value by the complexity (ICC) of the called function. As there are no function calls in the function sum_maximum, ICC(sum_maximum) = CC(sum_maximum) = 2. The ICC value of the sum function will be affected by this value, as for each call of sum_maximum the ICC will be increased by 2. Therefore, ICC(sum) = CC(sum) + 2 ∗ ICC(sum_maximum) = 3 + 2 ∗ 2 = 7. While CC represents the local complexity of a function, the ICC value can be used as an expression of the effort needed to inspect or debug the full implemented functionality.
Problems with these metrics begin when, instead of a simple function call, we have a recursive call in the function, due to the replacement of the iteration by recursion (Fig. 3). For this implementation of the sum function, the value of CC is still 3.
If we tried to inline the CFG for each recursive function call in order to calculate ICC, we would get an infinite loop of inlining: in the CFG of the sum function we would inline the CFG of the sum function at the point of the function call, and so on. If we do not know anything about the input data, which is the case in static analysis, we do not know when these inlinings would be completed. However, we have to apply ICC on a finite OCFG and at the same time be aware of recursive calls when calculating the overall complexity.
For example, we can calculate ICC by inlining each called function only once for each call. The rationale behind this is that during the debugging process we will, most probably, try to trace the execution by entering each sub-chain at least once, which means a step into the called function and inspecting it again, which has the same complexity as a pass on the first level. This means that we have to inspect the sum function on the first level, whose complexity is equal to CC(sum) = 3, and that we will make a step into both function calls. Once we enter the called function, we have the same functionality to inspect. As there are two invocations, this adds the same complexity (3) twice. Thus we get ICC(sum) = 9. In conclusion, we can use the following formula to calculate the ICC value for a given function f:

ICC(f) = CC(f) + Σ_{g ∈ recCallChain} CC(g) + Σ_{g ∉ recCallChain} ICC(g)
This formula and the calculated value ICC(sum) = 9 still do not capture the full complexity, because the information about the complexity of recursive chains is still not included. However, an iteration expressed by recursion affects the complexity more strongly than a non-recursive call, and it is necessary to rate it additionally.
Fig. 4. Sum example: implementations in two languages represented by the same graph.
Listing 1. Fibonacci.
Listing 2. Recursive implementation of the sum function.
Listing 3. Tail recursive implementation of the sum function.
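By way of illustration, the following Erlang functions have the shape the text ascribes to Listings 1–3 (three clauses and two recursive calls for fib; one clause-level branching and a single recursive call for each sum variant). They are a plausible reconstruction, not necessarily the authors' exact listings.

%% Plausible reconstruction of Listings 1-3 (illustrative only).
-module(sum_listings).
-export([fib/1, sum/2, sum_acc/3]).

%% Listing 1 (Fibonacci): three branches, two recursive calls.
fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N - 1) + fib(N - 2).

%% Listing 2 (primitive/body recursion): one clause-level branching,
%% a single recursive call.
sum(A, B) when A > B -> 0;
sum(A, B)            -> A + sum(A + 1, B).

%% Listing 3 (tail recursion): again one branching and a single recursive
%% call, with an accumulator.
sum_acc(A, B, Acc) when A > B -> Acc;
sum_acc(A, B, Acc)            -> sum_acc(A + 1, B, Acc + A).

Under this reading, both sum variants have CC = 2 and one self call, which is consistent with the OPC values reported for Listings 2 and 3 in Section 6, and fib has CC = 3 with two self calls, consistent with the higher values reported for Listing 1.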
Therefore, we should increase the complexity by some value (at least 1, see the discussion in Section 5) for each recursive call. Thus, we get a complexity value of at least 11. Our assumption is that this complexity grows further with the increase of the number of invocations and, especially, with the length of the (recursive) invocation chains.
The motivation was illustrated by examples written in a Java-like language. However, the observed problem is language- and paradigm-independent, as recursion is one of the basic concepts in problem solving and programming. Fig. 4 illustrates how equivalent implementations written in two different languages (an object-oriented Java-like language and the functional language Erlang) can be observed through the same representation and expressed and evaluated completely analogously [14].
In the last example we had two recursive invocations, which affects complexity more strongly than in the case of only one recursive call. For example, the sum function in Listing 2 is not as complex as the fib function in the Fibonacci implementation (Listing 1). Furthermore, the difference in complexity is affected by two parameters: (1) the sum function contains only one recursive invocation, while fib contains two of them, and (2) in the sum function the recursive call is nested in one of two branches, while in the case of fib we have three branches. Therefore, we have to mention that we expect complexity to be affected differently in cases where recursive invocations are combined with other control-flow structures.
Finally, different types of recursion have to be considered. In Listings 3 and 2 we illustrated the tail recursive and the primitive recursive implementation of the sum function, respectively. We expect the complexity of these two implementations to be the same, because both of them contain one branching on the function clause level and a single recursive call.

3. State of the art
One of the first efforts in complexity measurement was made approximately 40 years ago, and over time a large set of metrics has been introduced to evaluate complexity. The authors of [15] split the subject of measurement into three categories: program size, control structures, and data structure and data flow. They analyzed numerous available metrics. Furthermore, they gave an overview of hybrid metrics, i.e., metrics derived from several measures. Finally, the mentioned paper provides an analysis of the effects of the discussed metrics on software maintenance.
In parallel with the evolution of complexity metrics, researchers and practitioners have been working on the evaluation of the created techniques. Weyuker's set of properties [16] has been widely used for decades for the evaluation of complexity metrics. It has evolved over time to support novel constructs, languages, and paradigms in terms of the observed properties (e.g., [17]). Furthermore, this set of properties, in its original form or in some of its adaptations, has been included in frameworks for the evaluation of complexity metrics (e.g., [18]).
Cyclomatic complexity (CC), which is widely used nowadays, was proposed by Thomas McCabe [11]. It belongs to the family of control flow complexity metrics. Originally, CC was invented to express the complexity of testing a program or, more precisely, a single function. As described in Section 2, it expresses the number of linearly independent paths through the function and calculates the value based on the number of control flows in the program. It is supposed to be applied to a function and can be calculated based on the source code, the syntax tree, or the control flow graph of the function. However, software debugging, testing, and maintenance are affected by the complexity of the communication between functions, which should be considered as a factor affecting complexity.
A property-based comparative analysis of control-flow complexity measures is provided in [19]. The author starts from Weyuker's properties with the aim of formalizing the properties that a control-flow metric should possess. It is shown that CC satisfies most of the suggested properties. On the other hand, in [20] it was shown that the CC metric can underestimate or overestimate the complexity of a program. It also introduced the Control Flow Pattern to overcome the main obstacle in control flow complexity measurement.
Some extensions of complexity metrics have been defined. Many of them were introduced in order to satisfy the requirements of a specific programming paradigm or even a single programming language. With the development of object-oriented metrics, whole families of metrics were invented to express different characteristics of the relations between classes and methods [21]. The role of these metrics in maintainability prediction and improvement is also important [22]. Weighted Method per Class was defined as a metric derived from Cyclomatic Complexity [6]; it is calculated as the sum of the CC values of all methods in the class.
Furthermore, development in functional languages requires modifications of the CC calculation algorithm, or at least a slightly different approach to its calculation [23]. Finally, authors writing about the measurement of programs written in functional programming languages consider recursion an important factor related to complexity and maintainability. Kiraly Roland [23] discusses only the number of branches of recursion, i.e., a number that reflects how many times a function calls itself.
The other direction in measurement tends to become independent of the programming language, or at least of the programming paradigm. Sipos et al. [24] derive a new multi-paradigm complexity metric. It is calculated based on the complexity of the control structure of the program, the complexity of the data types, and the complexity of the data access.
Still, complexity and maintainability metrics, and software metrics in general, contain some imperfections, which result in their weak applicability in real-life projects. Therefore, there are always new metrics and new combinations of existing metrics with a tendency to fill the existing gaps. One such metric is the Maintainability Index (MI), which was designed to express the maintainability cost of the software. It is based on the values of one or more basic size and complexity metrics such as CC, Halstead [25], or Lines Of Code (LOC).
Cognitive metrics are an emerging wave of metrics focused on the complexity of tasks in software maintenance. They are measured by observing characteristics of the source code that might affect coding, testing, debugging, or modifying the source code. One of the first results in this direction is the Cognitive Functional Size (CFS) [26]. The complexity is estimated based on the complexity of basic control structures (BCS), where a constant value is assigned to each BCS.
A significant contribution to cognitive-based complexity measurement has been made by Sanjay Misra and a group of collaborators. Their early results, starting in 2006, introduced a Cognitive Information Complexity Measure (CICM) [27,28], a Cognitive Weight Complexity Measure (CWCM) [29], a Modified Cognitive Complexity Measure (MCCM) [30], a Cognitive Program Complexity Measure (CPCM) [31], and a New Cognitive Complexity of Program (NCCoP) [32]. In 2007 the focus moved to object-oriented programs and their characteristics, with the same idea and from the same perspective. Following this direction, a Class Complexity (CC) [33] and a Weighted Class Complexity (WCC) [34] were introduced. This work later continued in the direction of the measurement of object-oriented design [35], which is outside the scope of this paper because of its paradigm-specific nature. Following the direction of universal metrics, Misra and co-authors introduced a Unique Complexity Metric (UCM) [36], which measures the complexity of a function by taking into account all statement-level factors affecting control flow, including function calls and recursions, a Cognitive Functional Size (CFS) [37], and a Multi-paradigm Complexity Metric [38]. Misra et al. further introduced metric suites, a general one [39] and an OO-focused one [40]. Both are based on available metrics to enable overall complexity measurement and observation, and to provide metric and tool support for assessing the cognitive complexity of Java solutions [41].
The OO-focused suite contains the following metrics: Method Complexity (MC), Message Complexity (or Coupling Weight for a Class, CWC), Attribute Complexity (AC), Class Complexity (CLC) and Code Complexity (CC), Average Method Complexity (AMC), Average Class Complexity (ACC), Average Coupling Factor (ACF), and Average Attribute per Class (AAC).
All these contributions were followed by significant work in the area of the evaluation of complexity metrics. On the first level, multiple complexity metrics have been evaluated based on Weyuker's properties, and vice versa [42,43]. On the second level, Weyuker's properties have been examined [43,44], modified, and extended [17]. They are included in a complexity metrics evaluation framework [18]. De Silva and Kodagoda [45] observe McCabe's CC, the Halstead metrics, and CFS through Weyuker's properties and conclude that CFS is the most effective and comprehensive.
Finally, [46] provides an analysis and an improvement of cognitive metrics in terms of their sensitivity to recursive methods.
In this article, we concentrate on a gap in control flow metrics, such as the CC metric, by including recursive function calls in the observation, as some cognitive measures do. However, our observation of recursion is based on control-flow paths, which brings an important novelty that is analysed in the following sections. Finally, we use the framework introduced by Misra and co-authors [18] to evaluate our work.

4. Related work

Our assumption is that in control-flow measurements it is not enough to observe the complexity of a specific function; we also need to express the complexity of the overall solution of the problem. We need to consider the complexity of the functions solving parts of the decomposed problem, but also to calculate the complexity of their integration, collaboration, and communication in terms of (recursive) function calls. In this article, we try to express the complexity of the overall problem solution and the logic built into it. There have been previous research efforts with a similar intention.
As one of the earliest works in this field, Maurice Halstead [25] introduced one of the oldest sets of product size and complexity metrics. The Halstead metrics reflect the size and complexity of the program, calculated based on the number of operators and operands. The metrics from Halstead's set that are of interest to us are program volume (V), difficulty (D) and effort (E). Halstead's program volume (V) expresses the amount of information to be absorbed in order to understand the program; program difficulty (D) expresses how difficult it is to understand the program after the necessary information has been absorbed; while program effort (E) reflects how much effort should be invested in rewriting the program. Even though the measurement objectives are not common with ours, the general objectives are comparable, while the major difference is in the calculation. Halstead's calculation is based on the number of operators and operands, without consideration of the internal complexity of the logic.
Chapin and Lau [47] derived a metric to express the effective number of executed modules and to compare it to the actual number of distinct modules. Even though the goals of these two research efforts might seem very similar, nowadays software solutions are observed from different perspectives, through new quality characteristics, and by taking into account contemporary languages and technologies. However, a specific difference between the two approaches is that the Effective Size Metric expresses the number of executed modules independently of their complexity, while we calculate the complexity of the possible passes through them.
Gorla et al. [48] present an investigation of how style and program complexity metrics (including CC) influence debugging time, finding a strong correlation between some characteristics of COBOL programs and debugging time. The authors also developed an estimator of the debug time (a kind of metric) that is a quadratic function of 9 program characteristics, related mainly to its control structure. This metric was designed with a very similar general objective to ours, but in the period when it was created the characteristics of program solutions, and of their development and maintenance, differed from those of today.
All cognitive metrics have many similarities to our approach.
Cognitive Functional Size (CFS) [26] is estimated based on the complexity of basic control structures (BCS), including sequences, iterations, branches, function calls, recursions, parallel paths, and interrupts. A constant value is assigned to each BCS. However, the significant difference in comparison to our approach is that this metric does not differentiate between (recursive) function calls based on their nature, such as the length of the chain. On the other hand, this metric takes into account parallel constructs that we do not observe, and derives its final values by including information about inputs and outputs in the calculation.
All innovations in this direction are based on a similar basic idea with specific modifications. For example, the CWCM [29] is calculated only based on the cognitive weights of the BCSs, CICM [27,28] includes identifiers, operators, and lines of code in the calculation, while MCCM [30] also adds the total number of occurrences of operands and operators to the weights of the BCSs. Furthermore, UCM [36] measures the complexity of a function by taking into account all statement-level factors affecting control flow, including function calls and recursions. These factors of complexity add some (usually constant) values, multiplied by the complexity of the statement calculated based on the number of operators and operands participating in it, to the full complexity of the observed function. Still, this metric operates at the level of a single function, without taking into consideration the nature of the called functions and their complexity, which is the main difference in comparison to our OPC metric. Finally, the Improved Cognitive Based Complexity Metric (ICBCM) [46] pays additional attention to recursions, as it multiplies the weights of all factors in a recursive function by the weight assigned to the recursive call. However, none of these metrics takes into account the length of a (recursive) invocation chain while observing the function call statement.

5. Definition of new metrics

As described in the Background section (Section 2), the interesting points in our observation are the relations between functions, i.e., the links between the separate corresponding Control Flow Graphs. We can use different metrics to observe different aspects of the complexity of the relations between functions. Still, this complexity is usually calculated on a higher level of abstraction, from the dependency viewpoint of the observed function. One such metric is the number of incoming or outgoing links of a single function. Another option is to observe this complexity through derived metrics such as the different cognitive metrics (Sections 3 and 4). However, there are no widely used metrics that give a global picture by observing paths through the program following connections on both levels. A metric like this could express the complexity of the debugging process observed from a certain point in the program.
Let us observe more than one function interconnected by function calls. Each function call is one node in a single branch of the OCFG. From this viewpoint, there are two important directions to follow:
• Control-flow complexity at the local (CFG) and global (OCFG) level, where the global control-flow complexity is equivalent to the local one but extended; therefore, any complexity algorithm can be used. Its importance is widely recognized.
• The number and the length of open and closed (recursive) chains of calls, following the path through the function calls.
The length of a chain of calls is very important. If the program contains long chains of calls, the readability and clarity of this code will be very low. This problem is widely recognized as a "bad smell" [49]. The problem becomes even more complicated if these long chains are closed. Such closed chains represent recursions. In this case, the usage of recursion loses its value, as it is usually used to replace longer, more complex, and more complicated iterative code. Hence, we identified the length of each (recursive) call chain as an additional factor affecting complexity at the level of code inspection and debugging, and thus the maintainability of the source code.
An additional question is how strongly it affects real complexity. Therefore, we defined four versions of the new metric algorithm and studied their suitability in order to select the most appropriate one. The new metric algorithm is defined in four steps:
1. The weaknesses of existing complexity metrics are analyzed, and the factors that additionally affect complexity are considered and defined (Sections 2, 5 and 8).
2. Four different versions of the algorithm involving the determined factors are defined (Section 5.2).
3. The defined algorithms are compared and evaluated (Sections 6, 7.3.1 and 7.3.2).
4. The best single choice is selected based on the former observations (Sections 6, 7.3.1 and 7.3.2).
For practical validation purposes, the potential new metrics were integrated into the RefactorErl tool [50,51], an existing tool for software quality assurance and refactoring dedicated to software products written in Erlang. However, the algorithms, their implementations, and the further considerations are applicable independently of programming language and paradigm. Erlang and RefactorErl were carefully selected for the metric evaluation purposes because of the nature of the problem and its expressiveness when evaluating programs written in functional languages.
Erlang [52] is a dynamically typed functional programming language. Characteristically for functional languages, iteration is mainly expressed by recursion in the source code. Therefore, when we want to measure the complexity of the source code, recursive complexity has to be considered as well. RefactorErl provides a useful framework and querying interface to calculate metrics. Moreover, it represents the source code in a Semantic Program Graph [53], containing both syntactic and semantic information about the program, e.g., static and dynamic call information. To implement these metrics, we need an accurate Call Graph and a Control Flow Graph as well.

5.1. Length of recursion and recursive complexity

We have already mentioned that we can extract graphs on two levels of abstraction from the OCFG: the CFGs representing the separate functions, and the call graph representing the communication between the functions. Let us observe a directed graph where the nodes represent the functions of the source code and the branches represent function calls, i.e., the Static Call Graph (SCG). We can say that there is a recursion in the observed source code if there exists a cyclic path through the graph. We can restrict the set of nodes and the set of branches to the nodes and branches existing in a certain recursion (determined by a single function call). Then we observe a sub-graph of the source code representing only the functions and function calls participating in the given recursion chain. We observe only single closed chains without sub-chains; all sub-recursions are observed independently (Fig. 5) [54].
The main characteristic of these chains is their length. Therefore, we introduce the Length of Recursion (LOR):

LOR = the number of nodes in the recursive chain.

In Fig. 5, on the left-hand side, we can see a call graph for three involved functions. From this graph, we can determine three separate recursive chains: A → A of length 1, A → C → A of length 2, and A → B → C → A of length 3 (Fig. 5, right).
The described approach can be demonstrated by the example presented in Listing 4 and Fig. 6. The figure illustrates the OCFG for the source code given in the listing, for the special case when the function f is observed as the entry point. In this example, the case expression is used to express branching similar to the function clause.
Fig. 5. Splitting SCG to separate recursive chains that will be inbuilt in OCFG.
Listing 4. A program (module) written in Erlang.
If we observe the whole fragment that defines the function h, we find one branching with two function clauses and another nested two-branch branching in the second clause. An alternative way to write this would be to replace the fragment starting with h(N) -> case N of... with the following one: f(1) -> 1; f(2) -> f(h(N-1)). This would appear as one branching with three branches without nested levels. However, these are still three basic predicates in both cases.
If we observe the graph representation of this module, we can notice the following recursive call chains: f → f, g → g, h → h, f → g → h → f, and f → calc1 → f. These chains are of length 1, 1, 1, 3 and 2, respectively. Based on the Length of Recursion (LOR), we can define the Recursive Complexity (RC) of the recursive chain, which will be assigned to each function in the chain:

RC = LOR²

The rationale behind this definition is that the entry point to the program may not be fixed, as is the case in Erlang. Respectively, if we observe the static level, we can enter a recursive chain from each node participating in it (there are LOR of them). Independently of the node from which we start our walk through the recursion, we pass through all the nodes in the chain. Thus, for LOR entry nodes, we pass a path of LOR nodes, which is LOR². For the chains in Fig. 5, this yields RC values of 1, 4, and 9, respectively.
Both measures, LOR and RC, are related to a specific function call statement and its participation in a concrete recursive chain.
Therefore, LOR and RC cannot be expressed at the level of a function but at the level of the observed statement: each statement that is a recursive function call has an assigned LOR and RC value. Actually, when observing recursive chains, we cannot stay at the level of the static call graph, but must represent it at a deeper level, connecting function call statement nodes with the corresponding functions. Thus, each function call node participates in a separate chain and has its own LOR and RC value.
A more general approach to recursion would be to introduce a recursive complexity at the function level, as is the case with CC. The problem with this approach is that it is not precise enough to use the same recursive complexity of a function at every point where the function is called, because the recursive complexity of a call varies with the execution path on which we evaluate it. Therefore, we have introduced a path-dependent length of recursion and recursive complexity, as described above. These two metrics, together with CC and ICC, are the basis for the derivation of the new complexity metric, the Overall Path Complexity (OPC).

5.2. Overall Path Complexity

Overall Path Complexity (OPC) has its foundation in the CC metric and is based on its extension, similarly to ICC, but additionally taking into account LOR and RC for each recursive call. The main novelty that this new metric brings is that it expresses control flow complexity on a global level, based on the Overall Control Flow Graph representation of the code, taking into account all loops and branchings, but also following chains of function calls with a special focus on closed (recursive) ones. We observe each recursive chain separately and take its length into account. In this way, the OPC is affected by:
• the control-flow complexity of the observed function;
• the control-flow complexity of each function in a call chain which starts with some function call in the observed function;
• the length of a call chain which starts with some function call in the observed function, if the chain is closed (recursive).
This metric does not consider recursive chains that do not affect control flow, such as type-level recursions. Furthermore, how concurrency or dynamic constructs such as higher-order functions or polymorphic functions are considered strongly depends on the expressiveness of the internal representation and its capability to capture dynamic details during code translation.
Fig. 6. Erlang module (Listing 4) represented by OCFG.
When we are talking about polymorphic calls, we have different options:
• The easiest way is to require a proper call graph. Once we have the proper call graph, we can use that information during the inlining.
• An intermediate option is to build a static call graph with possibly imprecise static information about polymorphic calls, in which case we will meet multiple candidates during the inlining. The choice of candidates will be observed as a branching, regardless of the fact that it appears above the function definition level.
• Finally, we can be conservative and consider polymorphism at the method level like the function clauses in functional languages. Since we do not have static information about which clause will be evaluated, we consider the clauses as a branching.
Decisions about what will be considered by the consumer of the metric results, such as which modules will be included in the analysis (e.g., libraries or modules/functions implemented outside of the observed project), are project-specific. Since we define a metric that can be applied independently of the selected components or projects, these specifics are not in the focus of this article.
The main algorithm is based on traversing the OCFG. We start traversing from the entry function, taking into account its CC. We continue through its statements, and if a statement is a function call, we take into account the CC of the called function and the LOR or RC of that specific chain (if the chain is a recursive one). In this case, we follow the same calculation for the called function. If the call is not part of a recursive chain, we just take into account the CC values of the functions through the sub-graph; in this case, the value of the OPC for this specific sub-graph will be equal to ICC. This procedure is repeated until we have passed all branches of all chains exactly once. We would like to note here that the described iterative calculation of the CC results in the calculation of ICC enhanced with the consideration of the RC or LOR values.
In the explained calculation procedure we recognized two dilemmas:
1. Should OPC take into account LOR or RC?
2. Should the CC of the called function be increased or multiplied by the chosen metric?
In order to explore the possibilities we have already mentioned, we defined four modified versions of the main algorithm for the OPC calculation, starting from an observed function f, given by the following four formulas:
OPC_IL: for each function call we take into account ICC increased by LOR:
OPC_IL(f) = CC(f) + Σ_g (ICC(g) + LOR(g))
OPC_IR: for each function call we take into account ICC increased by RC:
OPC_IR(f) = CC(f) + Σ_g (ICC(g) + RC(g))
OPC_ML: for each function call we take into account ICC multiplied by LOR:
OPC_ML(f) = CC(f) + Σ_g (ICC(g) ∗ LOR(g))
OPC_MR: for each function call we take into account ICC multiplied by RC:
OPC_MR(f) = CC(f) + Σ_g (ICC(g) ∗ RC(g))
where g iterates over all function calls in f, i.e., each iteration corresponds to a single function call in f. Consequently, there are as many iterations as there are function calls in the observed function. Each call is processed independently, and each chain is traversed as many times as it is generated as a consequence of a call.
Fig. 7. OPC_IL(f).
Fig. 8. contextLOR(g, f).
Namely, a function is involved in the sum (as a chain initiator) as many times as it is called in f, while it can also appear in the middle of other chains initiated by other calls.
The algorithm in Fig. 7 shows the calculation of OPC_IL. The method functionCalls returns all the function applications in the body of the given function. The method referredFun returns the function called by an application. cc and icc stand for the complexity metric calculations, and finally, contextLOR returns the length of a recursive call chain started from function f through function g. Changing the calculation in line 7 of the algorithm in Fig. 7 to
• initVal ← initVal + (iccVal + lorVal²), or
• initVal ← initVal + (iccVal ∗ max(1, lorVal)), or
• initVal ← initVal + (iccVal ∗ max(1, lorVal²))
gives us the algorithms for OPC_IR, OPC_ML and OPC_MR, respectively. The modification of the original formula by adding max(1, lorVal) in the calculation of OPC_ML and max(1, lorVal²) in the formula for OPC_MR was introduced after the analysis of the trivial cases (Section 6.2.1).
The algorithm in Fig. 8 describes the context-based length of recursion calculation. This algorithm does not operate on the OCFG but on the Call Graph (CG) of the program. We traverse the CG starting from the function f.⁷ The next step is to filter the call chains that are recursive. Then we select, from the recursive ones, the call chain (contextRec) that contains both f and g.⁸ Based on the result, we return either zero, when no such chain is found, or the length of the chain.
The algorithm in Fig. 9 shows the calculation of ICC for a given function f. This recursive algorithm takes into account the cyclomatic complexity of the function f and adds the ICC values for each function call detected in the OCFG. Whenever we find a recursive call,⁹ we stop the inlining and add the cc value of the function.
⁷ We would like to note here that this is the function whose OPC value is calculated by the former algorithm.
⁸ It may happen that more than one call chain fulfils the above condition. In this case, we select the longest one.
⁹ The recursive calls are detected by storing the context of the touched functions during the traversal of the OCFG. For the sake of simplicity, we have not noted this extra argument in the algorithm.
Fig. 9. ICC(f).

6. Demonstration and evaluation of the algorithm

As a demonstration of the presented methodology, we have introduced the CC, Interprocedural CC (ICC), Length of Recursion (LOR) and Recursive Complexity (RC) metrics. Based on them, we have introduced four versions of the OPC metric, as described in Section 5.2.
To demonstrate and evaluate the new metric, we start from the motivating example (sum(a, b), Section 2). As there are no invocations in the first version of the fragment (Fig. 1), CC = ICC = OPC = 5. In the second implementation of the same problem (Fig. 2), we extracted a code fragment into a separate function that is invoked twice. As explained in Section 2, CC(sum) = 3 and ICC(sum) = 7. However, there are no recursive calls, and all four versions of OPC remain the same as ICC.
The third implementation (Figs. 3 and 4) involves recursion. We have already calculated the CC and ICC values for the sum function (3 and 9). In the calculation of OPC_IL, we add LOR (which is 1 for self-recursive calls) to ICC for each invocation. This results in the value of 23 for OPC_IL. Since LOR is 1, RC is also 1, so we get the same value (23) for OPC_IR. Finally, the calculation of OPC_ML and OPC_MR multiplies ICC for each invocation by the LOR value 1 and the RC value 1, respectively. Hence, the values of these two versions of OPC are also 23. We can conclude that in the basic cases there are no differences between the four versions of the new metric calculation algorithm.
Different types of recursion were considered in Listings 2 and 3. All the OPC values are the same for the tail recursive and the primitive recursive function definitions: OPC_IL = OPC_IR = 7 and OPC_ML = OPC_MR = 6. The Fibonacci implementation (Listing 1) has more execution paths and two recursive calls, thus the OPC values are higher: OPC_IL = OPC_IR = 23 and OPC_ML = OPC_MR = 21.
For the purpose of a deeper exploration, all these metrics were implemented and integrated into RefactorErl. We used the tool to measure code written in Erlang. The concrete implementation calculates the OPC of a function by following the edges of the OCFG of the function:
• Once a branching expression is found, we summarize the OPC metrics calculated in the branches.
• Once a function call is found, we calculate the OPC of the function recursively by tracking the chain of already visited functions.
• Once a closed function call chain is found during the calculation, we use the LOR or RC to integrate the complexity of the recursive function call.

6.1. An illustrative example

If we observe the module (presented in Listing 4) graphically illustrated in Fig. 6, one entry point to the module is the function f. Therefore, we start traversing the graph from it. We have to perform the following steps to calculate OPC:
• Take into account its CC, which is 2.
• Collect all function calls in its body, which are the calls to f, g and calc1.
• Consider the self-recursive call (the call to f), and take into account its ICC = 19 and LOR = RC = 1. Thus, we increment initVal according to the algorithm in Fig. 7: initVal = 2 + 19 + 1 = 22.
• Come to the call of function g, which is part of a recursive chain (LOR = 3, RC = 9), and take into account its ICC (19). The updated OPC_IL value is initVal = 22 + 19 + 3 = 44.
• Come to the call of calc1 (ICC = 18, LOR = 2, RC = 4). Update the calculation: initVal = 44 + 18 + 2 = 64.
• End of the calculation, because there are no other function calls in f. Therefore, the final OPC_IL value is 64.
When considering the calculation of the other variants, we alter the values:
• OPC_IR: initVal = 2 + 19 + 1² + 19 + 3² + 18 + 2² = 72
• OPC_ML: initVal = 2 + 19 ∗ 1 + 19 ∗ 3 + 18 ∗ 2 = 114
• OPC_MR: initVal = 2 + 19 ∗ 1² + 19 ∗ 3² + 18 ∗ 2² = 264
In this algorithm, ICC is calculated as described in Section 5.2, based on the formula presented in Section 2. For example, to calculate ICC for the function f, we accumulate CC for every function called in f. The steps in the calculation are the following:
• Initialize ICC to the value of CC(f) = 2.
• Come to the self-recursion, add CC = 2. As here we close the chain (the current function is the same as the starting one), we skip to the next statement.
• Come to the call of function g, add its CC (2) and go inside.
  − Come to the self-recursion, add CC = 2.
  − Come to the call of function h, add CC = 3, and go inside.
    ∗ Come to the self-recursive call, add CC = 3.
    ∗ Come to the call of function f and add CC = 2. As f was the start function, this is the end of the second closed chain and we go to the next statement in f.
• Come to the call of calc1, add CC = 1, and go inside.
  − Come to the call of function f and add CC = 2. As f was the start function, this is the end of the third recursive chain. Furthermore, as there are no more statements in f, this was the last step in the calculation, and the total value is ICC(f) = 19.
In the case of an open chain, the only difference is related to the determination of the end of the chain.
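The calculations of Figs. 7–9 and the walkthrough above can be condensed into a small executable Erlang sketch over a deliberately simplified program model: a map from function name to its CC value and its ordered list of callees. The model, the helper names and the chain-selection details are assumptions made for illustration; this is a sketch consistent with the text, not the authors' RefactorErl implementation, which operates on the full OCFG.

%% Sketch of ICC (Fig. 9), contextLOR (Fig. 8) and OPC_IL (Fig. 7) over a
%% simplified model: FunName => #{cc => CC, calls => [Callee]}.
-module(opc_sketch).
-export([opc_il/2, icc/2, context_lor/3]).

cc(F, Funs)    -> maps:get(cc, maps:get(F, Funs)).
calls(F, Funs) -> maps:get(calls, maps:get(F, Funs)).

%% ICC: inline the complexity of every callee; when a call closes a recursive
%% chain (the callee is already on the current chain), add only its CC.
icc(F, Funs) -> icc(F, Funs, [F]).

icc(F, Funs, Chain) ->
    lists:foldl(
      fun(G, Acc) ->
              case lists:member(G, Chain) of
                  true  -> Acc + cc(G, Funs);               % recursive call: stop inlining
                  false -> Acc + icc(G, Funs, [G | Chain])  % inline the callee
              end
      end,
      cc(F, Funs),
      calls(F, Funs)).

%% contextLOR: length of a recursive call chain that starts in F, goes through
%% the call to G and returns to F; 0 if the call is not part of such a chain.
%% If several chains qualify, the longest one is taken (cf. footnote 8).
context_lor(G, F, Funs) ->
    case rec_chains(G, F, Funs, [F]) of
        []     -> 0;
        Chains -> lists:max([length(C) || C <- Chains])
    end.

rec_chains(F, F, _Funs, Visited) -> [Visited];
rec_chains(Cur, F, Funs, Visited) ->
    case lists:member(Cur, Visited) of
        true  -> [];
        false -> lists:append([rec_chains(Next, F, Funs, [Cur | Visited])
                               || Next <- calls(Cur, Funs)])
    end.

%% OPC_IL(F) = CC(F) + sum over every call G in F of (ICC(G) + LOR of the
%% chain started by that call). The other variants only change this per-call
%% update (cf. "line 7" of Fig. 7): IR adds Lor*Lor instead of Lor, ML uses
%% Icc*max(1, Lor), MR uses Icc*max(1, Lor*Lor).
opc_il(F, Funs) ->
    lists:foldl(
      fun(G, Acc) -> Acc + icc(G, Funs) + context_lor(G, F, Funs) end,
      cc(F, Funs),
      calls(F, Funs)).

Under this model, the module of Listing 4 can be described as #{f => #{cc => 2, calls => [f, g, calc1]}, g => #{cc => 2, calls => [g, h]}, h => #{cc => 3, calls => [h, f]}, calc1 => #{cc => 1, calls => [f]}} (the structure implied by the recursive chains listed in Section 5.1 and the CC values of Table 1). With this input, icc/2 returns 18, 19, 19 and 20 for calc1, f, g and h, and opc_il/2 returns 22, 64, 45 and 46, matching the module m rows of Table 1 and the walkthrough above.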
6.2. Analysis and discussion of the algorithms

In this section, we analyse and discuss the versions of the algorithms and their behavior in the trivial (LOR = RC = 0 and LOR = RC = 1) and general cases (LOR > 1, RC = LOR²).

6.2.1. Trivial cases

In order to decide which algorithm is the most suitable to be accepted as the new metric calculation algorithm, we have to deal with borderline application cases. Here we have to analyze two situations:
• There are no recursive chains among the observed functions (LOR = RC = 0).
• Some of the recursive chains are of length 1 (LOR = RC = 1).
LOR = RC = 0. If we observe each version of the OPC calculation algorithm, we can conclude the following:
OPC_IL and OPC_IR: CC increased by 0 will have no effect, which is expected and correct, because there is nothing to increase the initial complexity.
OPC_ML and OPC_MR: CC multiplied by 0 will reset the complexity for that call to 0. In this case, even ICC would not be taken into account as a factor of complexity. Therefore, OPC_ML and OPC_MR could be eliminated. For generality, we kept them in the calculation by taking the maximum of 1 and the LOR or RC value in the multiplication.
LOR = RC = 1. If we observe the behavior of each version of the OPC calculation algorithm, we can conclude the following:
OPC_IL and OPC_IR: for each function call initiating a recursive chain of length 1, CC will be increased by 1, which has the same effect in both cases.
OPC_ML and OPC_MR: for each function call initiating a recursive chain of length 1, CC multiplied by 1 will have no impact on the result, which is not expected, because there is a recursion that should increase the initial complexity.
This confirms that OPC_ML and OPC_MR are not suitable to be selected as the OPC calculation algorithm. However, for exploration purposes, in the rest of the text we will continue to observe all four versions of the defined algorithm. Therefore, in the critical case (OPC_ML and OPC_MR when LOR = RC = 0), we will keep only the basic complexity value instead of multiplying it by 0. In the provided formula for OPC_ML, we replace LOR(g) by max(1, LOR(g)), and in the formula for OPC_MR, we replace RC(g) with max(1, RC(g)). The explanation for this decision is that if the recursion does not exist, it should not affect the complexity (neither increase nor decrease it). Thus, we get the final version of the corresponding formulas, which corresponds to the implemented algorithm (Fig. 7):

OPC_ML(f) = CC(f) + Σ_g (ICC(g) ∗ max(1, LOR(g)))
OPC_MR(f) = CC(f) + Σ_g (ICC(g) ∗ max(1, RC(g)))

Regarding this decision, we still have to keep in mind that the results will be the same for OPC_ML and OPC_MR when LOR = RC = 0 and when LOR = RC = 1, which is another limitation of these versions of the algorithm.
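The per-call update rules of the four variants, in the final (max-guarded) form derived above, can be summarized in a few lines of Erlang; representing a function by its CC value and a list of per-call (ICC, LOR) pairs is an assumption made only for this sketch.

%% The four OPC variants as folds over the (ICC, LOR) pairs of the calls in f.
%% Illustrative sketch of the final formulas above.
-module(opc_variants).
-export([opc/3]).

opc(Variant, CCf, Calls) ->
    lists:foldl(fun({Icc, Lor}, Acc) -> Acc + term(Variant, Icc, Lor) end,
                CCf, Calls).

term(il, Icc, Lor) -> Icc + Lor;
term(ir, Icc, Lor) -> Icc + Lor * Lor;
term(ml, Icc, Lor) -> Icc * max(1, Lor);
term(mr, Icc, Lor) -> Icc * max(1, Lor * Lor).

For the function f of module m (Table 1), the calls have the (ICC, LOR) pairs (19, 1), (19, 3) and (18, 2) and CC(f) = 2, so opc(il, 2, [{19,1}, {19,3}, {18,2}]) evaluates to 64, while the ir, ml and mr variants give 72, 114 and 264, the values listed in Table 1.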
Table 1
Metric values for the different versions of an Erlang module (modules: m in Listing 4; m_changed1 and m_changed2 in Fig. 10).

Module      Function  CC  ICC  OPC_IL  OPC_IR  OPC_ML  OPC_MR
m           calc1/1   1   18   22      24      39      77
m           f/1       2   19   64      72      114     264
m           g/1       2   19   45      51      81      201
m           h/1       3   20   46      52      80      196
m_changed1  calc1/1   1   19   26      32      67      199
m_changed1  calc2/1   1   50   50      50      50      50
m_changed1  calc3/1   1   24   24      24      24      24
m_changed1  f/1       2   22   77      89      162     438
m_changed1  g/1       2   27   78      90      155     407
m_changed1  h/1       3   23   52      58      92      224
m_changed2  calc1/1   1   21   28      34      73      217
m_changed2  calc2/1   1   56   56      56      56      56
m_changed2  calc3/1   1   26   26      26      26      26
m_changed2  f/1       2   24   85      97      182     494
m_changed2  g/1       3   31   87      99      172     448
m_changed2  h/1       3   25   56      62      100     244
6.2.2. General cases
Table 1 shows the values of all four versions of the OPC metric measured for the Erlang module presented in Listing 4 and illustrated by the diagram shown in Fig. 6. Furthermore, in order to provide more insight into the results, the table provides the values of these metrics for two modifications of this module (Fig. 10). Even though we previously concluded that versions 3 and 4 of the algorithm have certain weaknesses, we still observe all four versions, with special attention to the first two.
In these examples, we pay attention to the change of the complexity metric values when we add a recursive call and/or a branch to the code. We observe two changed versions (Fig. 10) in comparison with the original one (Listing 4). A comparative illustration of all three versions is displayed in Fig. 11.
The injected differences between the original code (Listing 4) and the first changed version (Fig. 10, left) are:
• Two new functions (calc2 and calc3) that do not participate in the recursion but call the recursive functions f, g and h, which affects their complexity. Still, their OPC is equal to their ICC.
• One additional call of calc1 in function g. This significantly affects the complexity of all functions that have the function g in their recursive or non-recursive call chains.
The injected difference between the first and the second changed version (Fig. 10) is one new branching in function g, but without new function calls. The calls that existed in the first changed version (Fig. 10, left) are nested in the injected branches in the second one (Fig. 10, right). This change resulted in a small increase of complexity for all functions that have function g in their recursive or non-recursive call chains.
If we observe all changes between the original code (Listing 4) and the second changed version (Fig. 10, right), the injected changes contain all three characteristic cases:
• Two new functions (calc2 and calc3) that do not participate in the recursion but call the recursive functions f, g, and h, which affects their complexity.
• One new branching in function g.
• One additional call of calc1 in function g, added to the previously added new branching.
If we observe the results in Table 1, illustrated by the chart in Fig. 12, we can conclude that OPC grows faster when adding a new recursive call than when adding a new non-recursive call or a new branch. However, a notable growth of the values is noticed when a combination of these complexity factors is injected.

7. Validation and results
All four versions of the OPC algorithm were evaluated and validated based on the Framework for Evaluation and Validation of Software Complexity Measures [18]. In the Framework, validation is defined in three steps:
1. Evaluate the practical utility features of the metric,
2. Evaluate the metric against measurement theory, and
3. Evaluate the practical utility of the metric empirically, on two levels:
• preliminary, on test cases and examples, and
• empirical validation on real-life examples.
After these three basic validation steps, a set of desirable properties should be checked to prove the usefulness and robustness of the observed metric, and a self-evaluation should be done to summarise the identified strengths and weaknesses of the metric.

7.1. Practical utility features

The evaluation framework defines five important features of a measure.
Fig. 10. Modified Erlang module (m_changed1 and m_changed2).
Fig. 11. Illustration of Erlang modules m, m_changed1, and m_changed2.
Fig. 12. Illustration of the metric values (Table 1) for the different versions of an Erlang module (modules: m in Listing 4; m_changed1 and m_changed2 in Fig. 10).
Objective or goal of the measure: Our goal is to provide a missing element in the overall software product quality picture. The main aim is to support developers in their everyday activities: coding, testing, reading the code, debugging, and changing the code. This objective is to be accomplished by providing them with information about potentially problematic code fragments, observed globally at the level of collaboration between functions. This information consists of the point in the code (a function) where the metric is high, which means that the function is a starting point of a complicated and potentially risky function collaboration.
Users and scope of the measure: Primarily, the metric is applicable to source code or to an adequate representation of it. Consequently, its users are primarily software developers, but also all team members involved in development and maintenance.
Entities and attributes to be measured: The metric reflects the quality of a software product by observing separate functions and groups of functions based on their collaboration. The observed attributes are branches, iterations, and function calls, with a focus on two important aspects of each function call: (1) whether the call participates in an open or in a closed (recursive) invocation chain, and (2) how long the observed invocation chain is.
Definition of the metric and its measuring methods and instruments: The metric is defined in Section 5 of this article. To informally summarise it: for the observed function, the metric calculation algorithm passes through its execution paths and increases the value for each branch and each iteration, while for every function call statement it steps into the called function and repeats the same algorithm for it. Additionally, the value is affected by the length of the invocation chain (increased or multiplied by the length of the chain or its square, depending on the candidate algorithm). One implementation of the metric is developed in Erlang and can currently be used to evaluate Erlang code.
Relationship between the attributes and the metric: The metric definition and the calculation algorithm indicate that the metric value is directly proportional to the measured attributes.
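As an illustration of the chain-length aspect just mentioned, the sketch below enumerates the lengths of the closed chains starting from a given function, over the same hypothetical call-graph map used in the earlier ICC sketch. It assumes that a chain closes when it returns to the start function and that cycles not passing through the start function are ignored, which reproduces the chain lengths 1, 3 and 2 used for f in the worked example above; it is not the authors' implementation.

```erlang
%% Illustrative sketch: return the LOR of every closed (recursive) chain
%% that starts and ends in Start, over Graph :: #{Fun => {CC, [Callee]}}.
chain_lengths(Start, Graph) ->
    step(Start, Start, [Start], Graph).

step(Current, Start, Path, Graph) ->
    {_CC, Callees} = maps:get(Current, Graph),
    lists:flatmap(
      fun(Callee) when Callee =:= Start ->
              [length(Path)];                  %% chain closed: its LOR is the path length
         (Callee) ->
              case lists:member(Callee, Path) of
                  true  -> [];                 %% a cycle not through Start: ignored here
                  false -> step(Callee, Start, Path ++ [Callee], Graph)
              end
      end, Callees).

%% With the call graph of module m, chain_lengths(f, Graph) yields [1, 3, 2].
```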
7.2. Evaluation against measurement theory

A direct relation between Measurement Theory (MT) and the evaluation criteria for software complexity measures has been established and accepted by the community. Therefore, it is necessary to verify whether the metric follows the main rules of MT. The following aspects should be considered:
Empirical Relational System (ERS): The components of the entity quantification system are the values of the attributes targeted for quantification, binary relations show dependencies among them, and a binary operation is used to produce new values from the existing ones. If we consider our OPC metric, the components are functions (i.e., their bodies). In general, this can be abstracted to any program body, or even to any program block. The only applicable binary operation is concatenation. The only parameter to be compared among fragments is complexity, so we observe only the binary relation more_or_equally_complex. If we observe two concatenated program bodies, then the OPC value for this fragment can only be increased compared to the OPC value for the first body. Thus the relation more_or_equally_complex can obviously be established between the concatenated fragment and each of the initial bodies.
Numerical Relational System (NRS):
The set of possible values for OPC is the set of positive integers, the binary relation is ≥, and the numerical binary operation is the addition (+) of positive integers.
Mapping of entities to values: The OPC metric maps the entities to the values without mapping any empirical or numerical knowledge, which is an important requirement.
Representation condition: There are two parts of the representation condition. The first part requires that, for any two observed entities (e1 and e2), if one of the entities is more complex, then we get a higher value of OPC for that entity than for the other one. Namely, if the relation e1 more_or_equally_complex e2 holds, the relation OPC(e1) ≥ OPC(e2) also holds, and vice versa. Obviously, declaring one fragment to be more complex than another is a subjective decision in practice. However, we discuss this aspect in Section 7.3 and conclude that it is satisfied to a reasonable extent. The second part of the representation condition observes an entity obtained by the concatenation of two entities. In this case, it is required that the OPC of the concatenated entities is equal to the sum of the OPC values of the separate entities. Namely, the relation OPC(e1e2) = OPC(e1) + OPC(e2) should be satisfied. The initial complexity of an entity that does not contain branches, iterations, and invocations is equal to the basic value of CC, that is 1 (there is always one execution path), while each of these constructs may increase OPC. After the concatenation, the concatenated entity has one basic execution path that corresponds to the two concatenated basic paths, so its initial complexity is 1, instead of the 2 that we had before the concatenation (1 for e1 and 1 for e2). However, each of the constructs in both entities will still increase OPC in the same way as before the concatenation. Thus, if we denote OPC(e1) = 1 + x1, OPC(e2) = 1 + x2 and OPC(e1e2) = 1 + x1 + x2, we get OPC(e1) + OPC(e2) = 2 + x1 + x2 > 1 + x1 + x2 = OPC(e1e2). Therefore, the second condition is not satisfied. However, this weakness is rooted in CC, which is taken as the initial measure for our new metric.
Admissible transformation: As a scale is defined as a measure that satisfies ERS, NRS, and the representation condition, and OPC does not fully satisfy the last one, we cannot say that OPC is a scale. Therefore, we do not discuss the kind of scale.

7.3. Evaluation of practical utility

In order to evaluate the practical utility of the OPC metric, we conduct a two-level validation:
1. As a preliminary evaluation, all four versions of the OPC algorithm were examined on a large set of different implementations of the QuickSort algorithm. These implementations were written in Erlang by a group of students and are therefore of very different quality and complexity.
2. The OPC metric, mainly the selected version, was examined on a set of larger (industrial-size), real-life open-source examples.
However, the described observation might be subjective. A real observation should encompass the opinions and behaviour of developers. The experiments should be done on a large sample of participants, where possible approaches include their rating of complexity according to some scale. More advanced experimentation may include a combination of fault injection and measurement of the time and effort needed for elimination of the injected faults by the participating developers.
Fig. 13. Graphical illustration of complexity of all 12 implementations of QuickSort algorithm.
Fig. 14. Module qs2.
Finally, in a real-life experiment we might observe different parameters of the project, such as the time spent on reading or debugging specific functions or the time needed for bug elimination, and their correlation with the corresponding OPC metric values.
7.3.1. Preliminary validation of practical utility
We examined all four versions of the new metric on a large set of students' implementations of the QuickSort algorithm in Erlang. However, we focus our discussion here on 12 illustrative ones. Fig. 13 illustrates a simplified execution flow through these programs, focusing only on nodes that affect the complexity and compressing every chain of simple statements into a single node. Table 2 contains the values of CC, ICC, OPC_IL, OPC_IR, OPC_ML and OPC_MR for these solutions.
If we observe these graphs, we expected solution qs2 to be the simplest, and this was indeed the case according to our OPC metric. This solution is implemented with two very simple functions. The implementation has two branches, and only one branch contains a self-recursive call. The real complexity is the recursion built into list comprehensions and library calls, which does not affect the readability of this particular code fragment (Fig. 14). To understand how this code works, we will usually check each branch once, then go into the recursive call and again check both branches.
The more interesting observation is how OPC changes with the growth of the complexity of the solutions. For example, the functions illustrated by the graphs qs7 and qs9 look very similar (Fig. 15). However, according to the results of the OPC metrics, the complexity of these solutions differs considerably. For the most complex functions of these solutions (sep_er/3 and partition/6), OPC_IL ranges from 23 to 80, OPC_IR from 23 to 82, OPC_ML from 21 to 101, and OPC_MR from 21 to 151. This big difference is partially expected because solution qs9 has one more recursive call, and this recursive chain is of length 2. This means that in the process of understanding the logic and debugging the code, we will need to make at least one more cycle through all these complex control structures, which multiplies the total complexity of the code logic.
Furthermore, an interesting point to observe was solution qs6, which was expected to be very complex. However, this solution is divided into seven functions which communicate mainly by iterative call chains, and only four of them are recursive, with chains of length 1 (Fig. 16). If we want to understand the implemented logic and check whether it is correctly implemented, it is not hard to follow and analyse these structures. Therefore, this solution was not noted as a complex one.
The most complex solution is qs11 (Fig. 16), consisting of two branchings with three and four branches.
Fig. 15. Modules qs7 and qs9.
Fig. 16. Modules qs6, qs11 and qs5.
Even six of the seven branches contain a self-recursive call, which gives the value of 307 to OPC_IL and OPC_IR and 301 to OPC_ML and OPC_MR. According to the previous discussion of the borderline case LOR = RC = 1, the values of the pairs of OPC metrics are the same because all the recursive chains have length 1.
Another interesting solution concerning OPC is qs5 (Fig. 16); its values for OPC_IL, OPC_IR, OPC_ML and OPC_MR are 139, 143, 176, and 264, respectively. This solution is very complex because it combines several branchings, three self-recursions, and two recursive chains of length 2. However, in this solution the complexity is distributed among the functions, and therefore it has lower complexity than qs11.
If we observe the correlation between the ICC and OPC values, it depends on the applied formula for the calculation of OPC. The lowest correlation is 0.5172 with OPC_MR, while the highest is 0.6066 with OPC_IL. The correlation between ICC and OPC_IR is 0.6049, while ICC correlates with OPC_ML with a factor of 0.5938. These numbers show us that the OPC metric brings new information into complexity measurement. However, this is a small sample, and we will come back to correlation analysis on real-life, industrial-size examples in the following section.
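The article does not state which correlation coefficient is used; assuming a standard Pearson correlation over per-function metric values, the computation might look like the following sketch (the function name is illustrative):

```erlang
%% Pearson correlation between two equally long lists of metric values,
%% e.g., ICC and OPC_IR per function.
pearson(Xs, Ys) ->
    N  = length(Xs),
    MX = lists:sum(Xs) / N,
    MY = lists:sum(Ys) / N,
    Cov = lists:sum([(X - MX) * (Y - MY) || {X, Y} <- lists:zip(Xs, Ys)]),
    SX  = math:sqrt(lists:sum([(X - MX) * (X - MX) || X <- Xs])),
    SY  = math:sqrt(lists:sum([(Y - MY) * (Y - MY) || Y <- Ys])),
    Cov / (SX * SY).
```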
7.3.2. Evaluation on real-life examples
We have analyzed our approach on the source code of the following Erlang open-source projects:
• Useful algorithms and data structures in Erlang10
• Mnesia based graph library11
• Image processing library12
• Logging framework13
• Library to construct SQL queries14
• Priority queue implementations15
• Distributed DBMS16
10 Data structures and algorithms, 2019, https://github.com/aggelgian/erlang-algorithms.
11 Mnesia based graph library, 2019, https://github.com/kbarber/erlang-mdigraph.
12 Image processing library, 2019, https://github.com/evanmiller/erl_img.
13 Logging framework, 2019, https://github.com/basho/lager.
14 Library to construct SQL queries, 2019, https://github.com/ddosia/mekao.
15 Priority queue implementations, 2019, https://github.com/okeuday/pqueue.
16 Distributed DBMS, 2019, http://erlang.org/doc/man/mnesia.html.
Table 2
QuickSort metric values.

Module  Function     CC  ICC  OPC_IL  OPC_IR  OPC_ML  OPC_MR
qs1     qsort/1      1   21   21      21      21      21
qs1     qs/1         2   20   39      39      39      39
qs1     process/3    4   16   55      55      52      52
qs2     qsort/1      2   6    6       6       6       6
qs2     helper/1     2   4    7       7       6       6
qs3     qsort/1      3   19   53      53      51      51
qs3     qsort/2      9   10   10      10      10      10
qs3     qsort/4      3   9    23      23      21      21
qs4     qs/1         1   41   41      41      41      41
qs4     qsh/1        2   22   43      43      42      42
qs4     split/3      6   18   44      44      42      42
qs5     quicksort/1  2   24   24      24      24      24
qs5     qs_acc/2     2   22   32      34      58      114
qs5     split_acc/4  4   28   139     143     176     264
qs6     qs/1         3   31   31      31      31      31
qs6     qs_tail/3    2   28   55      55      54      54
qs6     qs_tail/1    2   24   24      24      24      24
qs6     select/2     1   10   10      10      10      10
qs6     simplify/3   4   12   12      12      12      12
qs6     simplify/1   2   2    2       2       2       2
qs6     select/4     3   9    23      23      21      21
qs7     qs/1         1   26   26      26      26      26
qs7     qs2/2        4   25   91      91      88      88
qs7     sep_er/3     3   9    23      23      21      21
qs8     qs/2         2   22   64      64      62      62
qs8     split/4      4   16   55      55      52      52
qs9     qsort/1      1   26   26      26      26      26
qs9     qs/2         4   25   82      84      102     150
qs9     partition/6  3   24   80      82      101     151
qs10    quicksort/1  1   19   19      19      19      19
qs10    qs/2         3   18   50      50      48      48
qs10    sort/3       3   9    23      23      21      21
qs11    qs/1         2   51   51      51      51      51
qs11    part/4       7   49   307     307     301     301
qs12    qs/1         2   17   17      17      17      17
qs12    qs2/2        2   15   23      25      40      78
qs12    sort/4       3   19   77      81      101     161
We analyzed small libraries having less than a thousand lines of code (mdigraph) as well as large applications (mnesia) having more than twenty thousand lines of code. In total, the analysis encompassed 47,012 LOC containing 2899 functions. We found 758 recursive paths, of which 602 were independent. For details, see Table 3.
The most interesting findings of our investigation are summarized in Table 4. We counted the number of different LOR values: the majority (71.60%) of the recursive execution paths have length 1, but we found LOR values of 13 and 14 as well. We discussed in Section 6.2.1 the borderline case when LOR = RC = 1: the multiplications in the OPC_ML and OPC_MR calculations do not affect the metric values. Therefore, we suggest using OPC_IL or OPC_IR for Erlang developers, while further investigation is still planned.
We also measured the ratio of recursive functions in the analyzed projects. Table 5 contains the data for the largest analyzed application, Mnesia, and for all seven analyzed projects. The percentage of recursive functions in the analyzed projects is around 20%. For the recursive functions, we also measured the ratio of functions that take part in more than one recursive chain (16.61%) and the ratio of functions that take part in recursive chains longer than 1 (9.9%).
In Table 6, we list the metric values for two functions from the mnesia_schema module. The function get_table_properties/1 was included in a recursive path that has length 3. The function version/0 was included in several recursive paths that have lengths 5, 6, and 14. Even though the bodies of these functions have similar complexity, the chains starting with calls in version/0 go through several heavily used library calls. The called functions also have several branches, and it is not obvious when each branch executes, thus it is not easy to follow the possible execution paths. Therefore, the OPC metric values for this function are considerably higher.
Table 6 could lead us to the conclusion that the growth of the OPC values is proportional to the growth of the ICC metric. However, if we observe the correlation between the ICC and OPC_IR values, we see that this is not the case, since the correlation is 0.4930 in module mnesia_schema. In Table 7, we selected some OPC values which show that functions with similar CC and ICC values can have different OPC values. Even observing CC, ICC and LOR together (e.g., cvt_ycbcr_row/4 and get/1), but as separate values, does not bring us enough information, because this complexity metric combines the individual complexity of the observed basic constructs in a specific way.
Overall trends in the data generated for the analysed projects are illustrated graphically in Fig. 17. It shows the different fluctuations of the ICC and OPC_IR values for the 200 functions with the highest complexity according to both criteria. Two functions with extremely high values were excluded from the chart. Including these values would decrease the visibility of the charts because it would require a very wide interval on the y axis. However, this does not affect the integrity of the illustration, because the ICC and OPC_IR values are the same for these functions: ICC(eval_gl/2) = OPC_IR(eval_gl/2) = 16,255,036 and ICC(log_even/2) = OPC_IR(log_even/2) = 5,418,344. The next highest values, actually the highest values displayed on the charts, are OPC_IR(format_reason/1) = 895,855 and ICC(handle_info/2) = 464,668. For similar reasons, we exclude functions with lower values from the charts, as at some point the fluctuations visually disappear. Effectively, the values keep fluctuating proportionally: for lower values the fluctuations are smaller, and therefore not really visible on the charts. Both charts represent the same set of data. Chart (a) illustrates the fluctuations of OPC following the trend of ICC, while chart (b) shows the opposite: the fluctuations of ICC following the trend of OPC. We can conclude that the functionality implemented by the functions in which we have peaks could be the weak, or at least the most difficult, points for maintenance of our projects.
Table 8 summarizes the correlation between the ICC and OPC_IR values for the small-scale QuickSort experiment and for the real-life applications. For the QuickSort examples, the correlation was a bit higher (0.6049), which could be explained by the smaller sample, while we should always trust the larger one more.
Finally, the main statement of this article is that theory and practice do not provide a metric that considers the overall complexity of the problem solution logic and its implementation. Consequently, we propose a metric that enables us to understand the complexity of the solution in terms of understanding its logic, following the logical execution, comprehension, and debugging. We introduce a new factor that affects the difficulty of these processes: the complexity of (recursive) call chains.
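As a simple usage illustration of how the peaks discussed above can be located, the following sketch ranks per-function results by OPC_IR and keeps the N most complex entries; the {FunId, OPC} pair format is an assumption for illustration, not the authors' data layout.

```erlang
%% Rank functions by OPC_IR and keep the N most complex ones, e.g.,
%% top_n(Rows, 200) to select functions for charts such as Fig. 17.
top_n(Rows, N) ->
    Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end, Rows),
    lists:sublist(Sorted, N).
```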
As the OPC metric defines new parameters compared to other complexity and maintainability metrics and tries to measure a different dimension of complexity, we do not get a high correlation between OPC and previously introduced metrics, such as the Maintainability Index. We have implemented the Maintainability Index using both the CC and the ICC values. The correlation was between −0.25 and −0.3. The correlation with Halstead Volume, Halstead Effort and UCM is around 0.6. The correlation with Halstead Difficulty is a bit lower, 0.32. However, this was expected, since that metric takes into account only the numbers of operators and operands and no semantic information, such as recursion. The summary of the measured correlations can be found in Table 9. The other two metrics comparable to OPC (Section 4) were not directly applicable to Erlang code, and the mapping of concepts, interpretation of their meaning, and an adapted implementation could influence the reliability of the analysis results.
Table 3
Analyzed applications.

Application        Num. of modules  LOC     Num. of functions  Num. of recursive paths  Max LOR
erlang-algorithms  11               1,622   104                22                       2
erlang-mdigraph    2                802     104                17                       1
erl_img-master     15               4,767   396                157                      4
lager              20               6,741   367                107                      4
mekao              2                704     50                 11                       1
pqueue             4                8,301   81                 17                       1
mnesia             28               24,075  1,796              427                      14
Total              82               47,012  2,899              758
Table 4
LOR values.

LOR  Num. of paths  Ratio (%)  Num. of independent paths  Ratio (%)
1    539            71.11      539                        89.53
2    66             8.71       33                         5.48
3    39             5.15       13                         2.16
4    28             3.69       7                          1.16
5    20             2.64       4                          0.66
6    12             1.58       2                          0.33
13   26             3.43       2                          0.33
14   28             3.69       2                          0.33
This leads us to the conclusion that this dimension of complexity is not measured and expressed by the available metrics, while the usefulness of this perspective has to be evaluated in practical usage.
We also implemented cognitive metric variants and compared them to OPC. We have implemented the BCS-based cognitive method weight for Erlang functions, and the modified recursion-aware one as well. In the latter case, we have defined two versions: the first one does not inline the weight for local function calls, while the second one inlines the local weights as well. For functional languages, where the problem is usually decomposed into functions, the latter is more feasible. We analysed the lager project and found that the correlation between OPC_IR and the cognitive method weight is around 0.6.
The modified versions of the cognitive complexity metrics introduced in Section 3 (CFS, CICM, MCCM, CPCM, NCCoP) could be implemented for Erlang and compared with our OPC metric. However, some of them require modifications. Cognitive Functional Size (CFS) is calculated as the number of inputs and outputs of the software multiplied by the sum of the cognitive weights (CFS = (Ni + No) ∗ Wc). Wc accounts for function calls and recursive function calls with a fixed number (2) in the Wc calculation and does not take into account the depth of the call chains or the length of the recursive paths. Therefore, the defined OPC metric describes the complexity more precisely. In the CPCM calculation, the number of input and output variables is considered and incremented by the sum of the cognitive weights (CPCM = Ni + No + Wc). In Erlang, we do not have input/output variables, but they could be replaced by the function arguments and the return value. However, the issue raised by the Wc calculation remains the same as in the case of CFS. Moreover, the same argument holds for the Modified Cognitive Complexity Metric (MCCM), the Cognitive Information Complexity Measure (CICM), and the New Cognitive Complexity of Program (NCCoP). MCCM takes into account the number of operands and operators, like the Halstead metrics, and multiplies it by the cognitive weight. Our previous measurements showed only a slight correlation between our metric and the Halstead metrics. NCCoP takes into account the number of variables in a particular line of code; therefore, a variable affects the calculation as many times as it occurs in the code. However, variables are immutable in Erlang, and therefore value changes do not affect the complexity.
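For clarity, the two definitions quoted above can be written down directly; the Wc value is assumed to be computed elsewhere according to the cited cognitive-weight literature and is not reproduced here.

```erlang
%% Direct transcription of the quoted formulas; Ni and No are the numbers of
%% inputs and outputs, Wc is the total cognitive weight of the function.
cfs(Ni, No, Wc)  -> (Ni + No) * Wc.
cpcm(Ni, No, Wc) -> Ni + No + Wc.
```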
Table 5
Analysis of recursive functions.

                                                                         Mnesia  All 7 applications
Ratio of recursive functions (%)                                         19.71   21.59
Ratio of recursive functions participating in chains longer than 1 (%)   9.32    9.9
Ratio of recursive functions participating in more than one chain (%)    14.69   16.61
Fig. 17. Trends of ICC and OPC for analysed projects.
Table 6
Various results in mnesia_schema.

Function                CC  ICC      OPC_IL   OPC_IR   OPC_ML     OPC_MR
get_table_properties/1  3   692      875      893      2,558      7,643
version/0               5   319,910  454,549  454,731  6,013,439  83,838,095
Table 7
Various results from applications mnesia, erl_img and lager.

Function          CC  ICC     LORs         OPC_IL  OPC_IR  OPC_ML   OPC_MR
mfa_to_string/3   1   14      3            29      35      76       226
cvt_ycbcr_row/4   2   14      1            27      27      26       26
get/1             2   14      1            14      14      14       14
tr_pixels/7       3   14      1            26      26      25       25
mask_to_levels/3  3   15      1            28      28      27       27
start_ok/2        4   14      1,1          22      26      32       60
unpack_bits/2     4   16      1            55      55      52       52
zip/5             4   31      1,1          136     138     161      221
pr/3              3   30      2            40      42      69       131
tuple_contents/3  1   11,255  3,4          11,319  11,331  45,251   180,995
record_fields/3   4   10,308  1,2          20,789  20,793  31,254   52,194
list_body/4       8   11,312  2,3,3,3,4,4  29,090  29,150  116,256  465,000
Table 8
ICC–OPC correlations.

                Correlation(ICC, OPC_IR)
Quicksort       0.6049
Real-life code  0.5584

Table 9
Correlations on real-life code.

Metric                         ICC      OPC_IR
Unique Complexity Metric       0.7253   0.5747
Halstead Effort                0.7478   0.6226
Halstead Volume                0.7442   0.5825
Halstead Difficulty            0.5981   0.3219
Maintainability Index (CC)     −0.3522  −0.2579
Maintainability Index (ICC)    −0.5923  −0.3014
Cognitive (local_not_inlined)  0.2229   0.1402
Cognitive (local_inlined)      0.6865   0.6164

Table 10
Correlations for QuickSort examples.

Metric                         ICC      OPC_IR
Unique Complexity Metric       0.5500   0.1900
Halstead Effort                0.5882   0.8459
Halstead Volume                0.8309   0.3361
Halstead Difficulty            0.6586   0.3507
Maintainability Index (CC)     −0.2500  −0.6400
Maintainability Index (ICC)    −0.4300  −0.7100
Cognitive (simple)             −0.0536  0.3012
Cognitive (local_not_inlined)  −0.0697  0.3080
Cognitive (local_inlined)      0.2687   0.0772

Some other cognitive metrics (MC, CWC, AC, CLC, CC, AMC, ACC, ACF, AAC) from the suite defined in [40] are not feasible to implement for Erlang. Therefore, we are not able to compare our OPC metric with them. They are applicable to OO programs, and we plan to evaluate them in the future, when we have a Java-based implementation of our metric as well.
QuickSort correlations. Table 10 summarizes the correlations between different source code metrics and the newly introduced OPC_IR and ICC metrics. Due to the small set of examples, the numbers differ from those obtained on the larger set. However, negative values appear when we compare them with the Maintainability Index.

7.4. Property-based evaluation

To evaluate the usefulness and robustness of the OPC metric (the selected version, OPC_IR), we explore whether it satisfies the essential properties desirable for every complexity metric.
Property 1. The metric should be simple. It should be based on a simple calculation that does not involve complex mathematical functions. Therefore, we conclude that this property is satisfied by OPC.
Property 2. The measurement should be language independent. OPC is calculated based on elementary program constructs supported by any language. Its implementation is possible on the source code or on a static representation of it that may be produced independently of the programming language. Therefore, this property is also satisfied by OPC.
Property 3. The measure should be developed on a proper scale. Currently, we do not clearly define upper and lower bounds, but they can be clarified through practical usage. However, based on the current results, we can say that there are some borders that can be set naturally. Therefore, we can conclude that this property is satisfiable.
Property 4. Metrics in a metric suite should be consistent. The OPC metric is derived from the calculation of CC, ICC, LOR and RC, and therefore these metrics are always calculated together. Growth of any of the four related metrics causes growth of the OPC. However, the simultaneous growth of several of them and the occurrence of multiple complex invocations can cause a more accelerated growth of the OPC. Finally, the OPC includes all these metrics and can be observed independently, while observing all the metrics together may bring us additional information, such as the main cause of the high values. In any case, the OPC satisfies this property.
Property 5. The metric should have some foundation that can be explained and visualised. The OPC reflects the complexity of a combination of control structures and invocations and moves the observation from the simple control-flow aspects to the cognitive level. In this article, we give
multiple explanations and some visualisation options, while there are additional possibilities. Therefore, we conclude that this property is also satisfied.
Property 6. The metric should give the complexity as a positive number. This property is fully satisfied.
Property 7. The metric should differentiate between the complexities of the basic program constructs. The OPC differentiates between different constructs based on their internal complexity, observed on both the local and the global level. For example, the complexity of a branch depends on the complexity of every single statement of its body, while the complexity of a function call depends on the complexity of all functions on the invocation chain. Therefore, this property is satisfied.
Property 8. The metric should differentiate between a sequence of the same constructs and a nesting of them or equivalent constructs. The selected version of the OPC metric is calculated by summing the values of all participating constructs. Therefore it is not sensitive to nesting, and this property is not satisfied.
Property 9. The metric should consider the modular complexity. In general, the OPC metric is sensitive to the addition, deletion or replacement of a module, because it reflects and includes the complexity of the interaction between modules. These aspects explain why the OPC also satisfies this property.
In conclusion, our OPC metric fully satisfies six properties. Three properties are partially satisfied. The first one is related to the threshold values: currently we have not defined boundary values, but we expect to define them in practice. However, based on the experiments conducted on real-life projects (Section 7), some natural borders can preliminarily be identified. The other two, relatively interconnected, properties that are not fully satisfied are related to differentiating between basic program constructs and to differentiating between nested and concatenated constructs. These properties are satisfied at the level of the complexity factors that we have introduced with this metric, while the elements of our metric that do not satisfy them are inherited from the CC metric.
Four other metrics are evaluated through the same set of properties in [18]. The best results are obtained for the CFS metric, which did not satisfy only the consistency of measures in the metric suite, as it does not come with a suite. The Halstead Effort Measure satisfies five properties (1, 4, 5 and 6), while CC satisfies only four properties (1, 2, 5, and 6), as does Statement count (1, 5, 6, and 9). We conclude that, out of the five metrics evaluated through these nine properties, the OPC metric takes second place after the CFS.

7.5. Self-evaluation

As “every measure has its own advantages and disadvantages” [18], the elements of the self-evaluation are the strengths and weaknesses of the introduced metric. The main strengths of the OPC metric are:
• The metric is introduced with clearly defined objectives: to support developers in everyday development and maintenance activities by providing them with a novel perspective on the complexity of the code.
• The metric brings new information compared to the available metrics, which is demonstrated through the empirical observation of the obtained values and their relatively weak correlation to the values of previously available metrics.
• The information that a high OPC value brings is that the observed function is an entry point of a complicated and potentially problematic fragment of the solution, which may encompass the functions interconnected by the invocation chains the function participates in. This may be valuable information in debugging, testing, or refactoring.
• The metric can be applied independently of the input language, based on an expressive static representation of the observed code, such as the OCFG.
• The metric is derived from four simpler metrics: CC, ICC, LOR and RC. While a high value of the metric brings the information about which fragment is potentially problematic, observing these separate values may give us insight into the cause of the high complexity.
The observed drawbacks of the OPC metric are:
• The OPC metric values may be very high, and some approach to normalisation may be necessary.
• Threshold values are still not defined.
• The metric does not satisfy all requirements of MT, but the root of this drawback is in CC, which is used as a basic building block of the new metric.
• The metric is not always sensitive to differences between similar constructs or to the nesting of constructs, which is, again, a consequence of the reliance on CC. However, the OPC is sensitive to inlining or extracting a function if it is on a recursive chain, which was the intent of this metric.
Some of these drawbacks might be resolved as a part of our future work (Section 8). However, a significant part of the future work is to ensure acceptance from the industry. For these purposes, tool support for multiple languages is necessary.

8. Conclusion and future work

The main contribution of this article is a new complexity metric, the Overall Path Complexity (OPC). It is built on the basis of Cyclomatic Complexity (CC). The metric is defined on the Overall Control Flow Graph (OCFG) code representation, taking into account all loops and branchings, but also following chains of function calls, with a special focus on closed (recursive) ones. We observe each recursive chain separately and take into account its length and the complexity of each function on it. In this way, OPC is affected by the number of call chains (primarily recursive ones) and their length, and by the control-structure complexity at the local (CC) and global (ICC) level for all functions in each chain. For this purpose, we defined the Length of Recursion (LOR) and the Recursive Complexity (RC) metrics, which were necessary for the definition of OPC.
Furthermore, we proposed four different versions of the OPC calculation algorithm and explored their suitability. In order to test the applicability, all four versions were implemented and applied to different Erlang programs, from students' solutions to industrial cases. After a detailed exploration and discussion of general and trivial cases, we selected the second version (OPC_IR) as the most appropriate according to the current results, with the (still open) possibility of changing this decision after further validation. Finally, based on a correlation analysis with comparable available metrics, we conclude that OPC introduces a new dimension of complexity which is not measured by the available metrics, while the usefulness of this perspective has to be confirmed in practical usage.
Erlang is not a purely functional programming language. Therefore, further investigation is needed for other functional programming languages, such as Clean17 [55] or Haskell18 [56], to decide which metric is the most appropriate for functional programming. We would like to mention here that the way recursion is used is not just programming-language-dependent but also depends on the habits and style of the developer. Recursion is a broadly used mechanism, and therefore the applicability of OPC to code written in other languages has to be investigated as well.
17 CLEAN, 2019, http://clean.cs.ru.nl/Clean.
18 Haskell, 2019, https://www.haskell.org/.
As a part of future work, we plan to integrate the implementation of this metric into the SSQSA framework [14] and to test the results on different
languages and paradigms. This metric is designed to be language and paradigm independent. Its application on heterogeneous code should show us different additional anomalies in the code such as mixed programming approaches and styles, unintentional recursions, etc. Furthermore, we plan to extend this metric with additional information about the nature of the call chains. For example, it should be taken into account if chains pass the borders of units and packages. Finally, for languages where more than one entry points to the program or to the function are enabled, a normalized version of OPC could be designed based on the possible entrance branches to each observed function. Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. References [1] R.D. Banker, S.M. Datar, C.F. Kemerer, D. Zweig, Software complexity and maintenance costs, Commun. ACM 36 (11) (1993) 81–95. [2] V. Antinyan, M. Staron, A. Sandberg, Evaluating code complexity triggers, use of complexity measures and the influence of code complexity on maintenance time, Emp. Softw. Eng. 22 (6) (2017) 3057–3087. [3] D. Kafura, G.R. Reddy, The use of software complexity metrics in software maintenance, IEEE Trans. Softw. Eng. SE-13 (3) (1987) 335–343. [4] J.P. Kearney, R.L. Sedlmeyer, W.B. Thompson, M.A. Gray, M.A. Adler, Software complexity measurement, Commun. ACM 29 (11) (1986) 1044–1050. [5] R. Malhotra, A. Chug, Software maintainability: systematic literature review and current trends, Int. J. Softw. Eng. Knowl. Eng. 26 (08) (2016) 1221–1253. [6] S.H. Kan, Metrics and Models in Software Quality Engineering, Addison-Wesley Longman Publishing Co., Inc., 2002. [7] GitHub Statistics, 2019, https://madnight.github.io/githut//pull_requests/2019/2. [8] Elixir-stat, 2019, A collection of companies using Elixir in production, Elixir, https://elixir-companies.com/en. [9] S. Blau, J. Rooth, J. Axell, F. Hellstrand, M. Buhrgard, T. Westin, G. Wicklund, {AXD} 301: A new generation {ATM} switching system, Comput. Netw. 31 (6) (1999) 559–582. [10] F.E. Allen, Control flow analysis, SIGPLAN Not. 5 (7) (1970) 1–19. [11] T. McCabe, A complexity measure, IEEE Trans. Softw. Eng. SE–2 (4) (1976) 308–320. [12] J.A. Stafford, A.L. Wolf, A formal, language-independent, and compositional approach to interprocedural control dependence analysis, University of Colorado, 2000 Ph.D. thesis. [13] P. Lokuciejewski, P. Marwedel, Worst-Case Execution Time Aware Compilation Techniques for Real-Time Systems, Springer Science & Business Media, 2010. [14] G. Rakić, Extendable and Adaptable Framework for Input Language Independent Static Analysis, Universiti of Novi Sad, Faculty of Sciences, 2015 Ph.D. thesis. [15] W. Harrison, K. Magel, R. Kluczny, A. DeKock, Applying software complexity metrics to program maintenance, Computer 15 (9) (1982) 65–79. [16] E.J. Weyuker, Evaluating software complexity measures, IEEE Trans. Softw. Eng. 14 (9) (1988) 1357–1365. [17] S. Misra, Modified set of Weyuker’s properties, in: Proceedings of the 5th IEEE International Conference on Cognitive Informatics, 1, 2006, pp. 242–247. [18] R.C.-P. S. Misra I. Akman, Framework for evaluation and validation of software complexity measures, IET Softw. 6 (2012). 323–334(11) [19] K. Lakshmanan, S. Jayaprakash, P. Sinha, Properties of control-flow complexity measures, IEEE Trans. Softw. Eng. 17 (12) (1991) 1289–1295. [20] J.J. Vinju, M.W. 
Godfrey, What does control flow really look like? eyeballing the cyclomatic complexity metric, in: Proceedings of the IEEE 12th International Working Conference on Source Code Analysis and Manipulation (SCAM), IEEE, 2012, pp. 154–163. [21] S.D. Sheetz, D.P. Tegarden, D.E. Monarchi, Measuring object-oriented system complexity, in: Proceedings of the 1st Workshop on information Technologies and Systems, 23 pages, 1991. [22] W. Li, S. Henry, Object-oriented metrics that predict maintainability, J. Syst. Softw. 23 (2) (1993) 111–122. [23] K.R. Kiraly Roland, Metrics based optimization of functional source code, Annal. Math. et Inf. 38 (2011) 59–74. [24] A. Sipos, N. Pataki, Z. Porkoláb, On multiparadigm software complexity metrics, in: Proceedings of the MaCS 6th Joint Conference on Mathematics and Computer Science, 2006, p. 85. [25] M.H. Halstead, Elements of Software Science (Operating and Programming Systems Series), Elsevier Science Inc., New York, NY, USA, 1977. [26] J. Shao, Y. Wang, A new measure of software complexity based on cognitive weights, Canad. J. Electr. Comput. Eng. 28 (2) (2003) 69–74. [27] D.S. Kushwaha, A.K. Misra, Evaluating cognitive information complexity measure, in: Proceedings of the 13th Annual IEEE International Symposium and Workshop on Engineering of Computer-Based Systems (ECBS’06), IEEE, 2006, pp. 2–pp.
[28] D.S. Kushwaha, A.K. Misra, Improved cognitive information complexity measure: a metric that establishes program comprehension effort, ACM SIGSOFT Softw. Eng. Notes 31 (5) (2006) 1–7. [29] S. Misra, A complexity measure based on cognitive weights, Int. J. Theoret. Appl. Comput. Sci. 1 (1) (2006). [30] S. Misra, Modified cognitive complexity measure, in: Proceedings of the International Symposium on Computer and Information Sciences, Springer, 2006, pp. 1050–1059. [31] S. Misra, Cognitive program complexity measure, in: Proceedings of the 6th IEEE International Conference on Cognitive Informatics, 2007, pp. 120–125. [32] A. Jakhar, K. Rajnish, A new cognitive approach to measure the complexity of software’s, Int. J. Softw. Eng. Appl. 8 (2014) 185–198. [33] S. Misra, An object oriented complexity metric based on cognitive weights, in: Proceedings of the 6th IEEE International Conference on Cognitive Informatics, in: COGINF ’07, IEEE Computer Society, Washington, DC, USA, 2007, pp. 134–139. [34] S. Misra, I. Akman, Weighted class complexity: a measure of complexity for object oriented system, J. Inf. Sci. Eng. 24 (2008) 1689–1708. [35] S. MISRA, I. AKMAN, M. KOYUNCU, An inheritance complexity metric for object-oriented code: a cognitive approach, Sadhana 36 (3) (2011) 317. [36] S. Misra, I. Akman, A unique complexity metric, in: O. Gervasi, B. Murgante, A. Laganà, D. Taniar, Y. Mun, M.L. Gavrilova (Eds.), Computational Science and Its Applications – ICCSA 2008, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 641–651. [37] S. Misra, Measurement of cognitive functional sizes of software, Int. J. Softw. Sci. Comput. Intell. (IJSSCI) 1 (2) (2009) 91–100. [38] S. Misra, I. Akman, F. Cafer, A multi-paradigm complexity metric (MCM), in: Proceedings of the International Conference on Computational Science and Its Applications, Springer, 2011, pp. 342–354. [39] S. Misra, M. Koyuncu, M. Crasso, C. Mateos, A. Zunino, A suite of cognitive complexity metrics, in: B. Murgante, O. Gervasi, S. Misra, N. Nedjah, A.M.A.C. Rocha, D. Taniar, B.O. Apduhan (Eds.), Computational Science and Its Applications – ICCSA 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 234–247. [40] S. Misra, A. Adewumi, L. Fernandez-Sanz, R. Damasevicius, A suite of object oriented cognitive complexity metrics, IEEE Access 6 (2018) 8782–8796. [41] M. Crasso, C. Mateos, A. Zunino, S. Misra, P. Polvorín, Assessing cognitive complexity in java-based object-oriented systems: metrics and tool support, Comput. Inf. 35 (3) (2016) 497–527. [42] S. Misra, A.K. Misra, Evaluating cognitive complexity measure with Weyuker properties, in: Proceedings of the Third IEEE International Conference on Cognitive Informatics, 2004, pp. 103–108. [43] S. Misra, Weyuker’s properties, language independency and object oriented metrics, in: Proceedings of the International Conference on Computational Science and Its Applications: Part II, in: ICCSA ’09, Springer-Verlag, Berlin, Heidelberg, 2009, pp. 70–81. [44] S. Misra, An analysis of Weyukers properties and measurement theory, Proc. Ind. Natl. Sci. Acad. 76 (2) (2010) 55–66. [45] D. De Silva, N. Kodagoda, Applicability of Weyuker’s properties using three complexity metrics, in: Proceedings of the 8th International Conference on Computer Science & Education, IEEE, 2013, pp. 685–690. [46] D. De Silva, N. Kodagoda, S. Kodituwakku, A.
Pinidiyaarachchi, Analysis and enhancements of a cognitive based complexity measure, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT), IEEE, 2017, pp. 241–245. [47] N. Chapin, T.S. Lau, Effective size: an example of use from legacy systems, J. Softw. Evolut. Process 8 (2) (1996) 101–116. [48] N. Gorla, A.C. Benander, B.A. Benander, Debugging effort estimation using software metrics, IEEE Trans. Softw. Eng. 16 (2) (1990) 223–231. [49] M. Fowler, Refactoring: Improving the Design of Existing Code, Pearson Education India, 2009. [50] I. Bozó, D. Horpácsi, Z. Horváth, R. Kitlei, J. Kőszegi, M. Tejfel, M. Tóth, RefactorErl – Source Code Analysis and Refactoring in Erlang, in: Proceedings of the 12th Symposium on Programming Languages and Software Tools, 2011, pp. 138–148. Tallin, Estonia [51] M. Tóth, I. Bozó, Static analysis of complex software systems implemented in Erlang, in: Proceedings of the 4th Summer School Conference on Central European Functional Programming School, in: CEFP’11, Springer-Verlag, Berlin, Heidelberg, 2012, pp. 440–498. [52] J. Armstrong, Programming Erlang: software for a concurrent world, Pragmatic Bookshelf, 2007. [53] Z. Horváth, L. Lövei, T. Kozsik, R. Kitlei, A.N. Víg, T. Nagy, M. Tóth, R. Király, Modeling semantic knowledge in Erlang for refactoring, in: Proceedings of the International Conference on Knowledge Engineering, Principles and Techniques, KEPT Knowledge Engineering: Principles and Techniques, in: Studia Universitatis Babe-Bolyai, Series Informatica, 54(2009) Sp. Issue, 2009, pp. 7–16. Cluj-Napoca, Romania [54] G. Rakić, Z. Budimac, K. Bothe, Introducing recursive complexity, in: Proceedings of the 11th International Conference of Numerical Analysis and Applied Mathematics: ICNAAM, 1558, AIP Publishing, 2013, pp. 357–361. [55] P. Koopman, R. Plasmeijer, M. van Eekelen, S. Smetsers, Functional programming in clean, draft, 2001, https://clean.cs.ru.nl/download/papers/cleanbook/ CleanBookI.pdf. [56] G. Hutton, Programming in Haskell, Cambridge University Press, 2016.