Information and Software Technology 39 (1997) 497-509
Metrics for quality analysis and improvement of object-oriented software

Christof Ebert a,*, Ivan Morschel b

a Alcatel Telecom, Switching Systems Division, B-2000 Antwerpen, Belgium
b Daimler-Benz Research Centre, Ulm, Germany

* Corresponding author. Tel.: +32 3 240 4081; fax: +32 3 240 9935; e-mail: [email protected]
Received 13 August 1996; revised 15 November 1996; accepted 21 January 1997
Abstract

Software metrics play an important role in analysing and improving the quality of software work products during their development. Measuring aspects of software complexity for object-oriented software strongly helps to improve the quality of such systems during their development, especially with respect to reusability and maintainability. It is widely accepted that more widespread use of object-oriented techniques can only come about when there are tool systems that provide development support beyond visualising code. Unfortunately, many object-oriented metrics are defined and applied to classroom projects, but no evidence is given that the metrics are useful and applicable, both from an experience viewpoint and from a tools viewpoint, for industrial object-oriented development. Distinct complexity metrics are developed and integrated in a Smalltalk development support system called SmallMetric. Thus, we achieve a basis for software analysis (metrics) and development support (critique) of Smalltalk systems. The main concepts of the environment including the underlying metrics are explained, its use and operation is discussed, and some results of the implementation and its application to several industrial projects are given with examples. © 1997 Elsevier Science B.V.

Keywords: Development support; Maintainability; Metrics; Object-oriented metrics; Quality control; Smalltalk
1. Introduction

Software metrics are measures of development processes and the resulting work products. In this context we will focus on metrics that are applicable to software developed in Smalltalk. We will further concentrate on such metrics as can be used as quality indicators during the development process, hence providing support for the developers. These metrics are often classified as product metrics because their inputs are products of the development process. We will not distinguish metrics and measures from a mathematical point of view. When referring to complexity metrics we use this phrase for a group of software metrics that measure structural or volume aspects of products that are intuitively related to parts difficult to understand. Such difficult, complex components have been proved to cause high error rates, high testing effort and bad maintainability (for further details on metrics see [1]).

Object-oriented programming practices per se will never make poor programmers into good ones. The opposite is the
case, especially considering current surveys which show that corporate adoption of object-oriented technology has tripled since 1994 while, at the same time, it is considered as having the lowest success rate among new technologies. Or in the words of one recent workshop summary: "Realistically, object-orientation will deliver benefits, but fewer of them, and with more effort required on the part of the organisation, than the promises of its most zealous advocates imply." Typical class libraries consist of look-alike classes that drown programmers, classes that cannot delete unwanted properties, encapsulation at the class level that severely limits reuse (i.e. hiding too many details of overly specialised classes), or static class hierarchies undermined by quickly evolving information domains. As a result most programmers rather copy a class, change it, and add a new member to the library. It is highly necessary to give, during design, insight into design decisions that improve reuse instead of hiding valuable concepts in a flood of case-specific details. Because of the extreme subjectivity of quality attributes per se, it is important to select metrics that can be applied to the specific objectives of a project, that have been derived from the project's requirements and can be used to prove consistency, that can be applied during several phases of the
development process on resulting products (design, code, documentation, etc.), and that can be collected and evaluated automatically by the development environment in use.

This article introduces complexity-based classification techniques as a basis for constructing quality models that can identify outlying software components that might cause potential quality problems. Such models are based on the experience that typically a rather small number of components (e.g. methods) has a high failure rate and is most difficult to test and maintain. Our own project experiences, for instance, recently showed that 20% of all components in telecommunication projects were the origin of over 40% of all field failures with high priority. Even worse is the fact that we could also show that it is not so difficult to identify these components ex ante, either by asking designers and testers and grouping their subjective ratings, or by applying classification rules based on simple structural software metrics [2]. Effects of applying complexity-based criticality prediction to a new project can be summarised as follows:

• 20% of all modules in the project were predicted as most critical (after coding), and
• these modules contained over 40% of all faults (up to release time).

Knowing that

• 60% of all faults can theoretically be detected before system integration, and
• fault correction during module test and code reading costs less than 10% compared to fault correction during system test.
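These figures combine multiplicatively; as a worked check, on one plausible reading where each early-detected fault costs only 10% of its system-test correction cost:

\[ 0.40 \times 0.60 = 0.24 \]
\[ 0.24 \times (1 - 0.10) \approx 0.22 \approx 20\% \]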
Twenty-four percent of all faults can thus be detected early by investigating 20% of all modules more intensively, with 10% of the effort compared to fault correction during system test, therefore yielding a 20% total cost reduction for fault correction. Additional costs for providing the statistical analysis are in the range of two person days per project. Necessary tools are off the shelf and account for even less per project.

In this context the paper addresses typical questions often asked in object-oriented software engineering projects:

• How can I identify early the relatively small number of critical components that mainly contribute to bad quality identified later in the life cycle?
• Which components should better be redesigned because their maintainability is bad and their overall criticality to the project's success is high?
• What is the benefit of introducing a metrics program that investigates structural properties of software?
• Are there structural properties that can be measured early in the code to predict quality attributes?
• Can I use the (often heuristic) design and test know-how on trouble identification and risk assessment to build up knowledge-based systems that help to identify critical components early in the development process?
• Last, but surely not least, is it after all feasible to automate such classification in an object-oriented development environment?
The paper is organised as follows. The introductory Section 2 presents a brief overview of background and problems associated with metric-based classification models in the context of object-oriented development. Section 3 gives a very brief overview of the basic concepts of object-orientation and of Smalltalk. We will describe a programming analysis environment for Smalltalk-80 [3], which we selected because of the language's perceived uniformity and elegance. The syntax of Smalltalk is easy to understand, it possesses a small number of operators (in contrast to C++), and it completely supports the notions of object, class, and inheritance. This article presents a basic set of metrics to support the development of object-oriented programs as well as a tool to automatically measure and judge programs written in Smalltalk. Sections 4 and 5 describe the selection of metrics for object-oriented software and a tool environment called SmallMetric. Results from applying the analysis environment in industrial projects are presented in Section 6. Conclusions and a brief summary with an outlook on further work are given in Sections 7 and 8.
2. Metrics for quality models

Although striving for high quality standards, only a few organisations apply true quality management. Quality management consists of proactively comparing observed quality with expected quality, hence minimising the effort expended on correcting the sources of defects. In order to achieve software quality, software must be developed in an organised form by using defined methods and techniques and applying them consistently. In order to achieve an indication of software quality, software must be subjected to measurement. This is accomplished through the use of metrics and statistical evaluation techniques that relate specific quantified product requirements to some attributes of quality.

The approach of integrating software measures and statistical techniques is shown in Fig. 1. The object-oriented CASE environment provides the formal description of different products developed during the software life-cycle with its necessarily defined methods and the underlying process. Multivariate statistical techniques provide feedback about relationships between components (e.g. factor analysis, cluster analysis) [2,4]. Classification techniques help to determine outliers (e.g. error-prone components) [4-6]. Finally, detailed diagrams and tables provide insight into the reasons why distinct components are potential outliers and how to improve them [1,4].
Fig. 1. Measures and statistical techniques in software engineering.
Product metrics are used to supply models for [1,2,5-7]:

• estimating effort and costs of new projects;
• evaluating the productivity impact of introducing new technologies (together with their methods and tools);
• measuring and improving software quality;
• forecasting and reducing testing and maintenance effort.
Quality or productivity factors to be predicted during the development of a software system are affected by many product and process attributes, e.g. software design characteristics or the underlying development process and its environment [1,5,6]. Quality models are based upon former project experiences and combine the quantification of aspects of software components with a framework of rules (e.g. limits for metrics, appropriate ranges, etc.). They are generated by the combination and statistical analysis of product metrics (e.g. complexity measures) and product or process attributes (e.g. quality characteristics, effort, etc.) [5,6]. These models are evaluated by applying and comparing exactly those invariant figures they are intended to predict, the quality metrics (e.g. maintainability, error rate,
etc.). Iterative repetition of this process can refine the quality models, hence allowing their use as predictors for similar environments and projects. For assessing overall quality or productivity, it is suitable to break it down into its component factors (e.g. maintainability), thus arriving at several aspects of software that can be analysed quantitatively. Typical problems connected to data collection, analysis, and quality modelling are addressed and discussed comprehensively in [1,5,6].

Classification or clustering algorithms are mathematical tools for detecting similarities between members of a collection of objects. Classification algorithms can be loosely categorised by the underlying principle (objective function, graph-theoretical, hierarchical) or model type (deterministic, probabilistic, statistical, fuzzy). Information about the objects (e.g. software components) to be analysed is input to classification algorithms in the form of metric vectors. The elements of a metric vector are the measurements of distinct software features that have been chosen as a basis for comparing a specific object to other objects. The output of a classification or clustering algorithm can then be used to
classify the objects into subsets or clusters. The classification of metric vectors can be performed with the information about different classes (e.g. errors, change-rate). The training of a classification algorithm using this kind of information is called supervised. If the algorithm classifies the data autonomously, the training is unsupervised. We will further focus on supervised learning because quality metrics are provided within training data sets. Metric vectors assigned to the same cluster are in some sense similar to each other, more so than they are to metric vectors not assigned to that cluster. Instead of predicting numbers of errors or changes (i.e. algorithmic relationships) we are considering assignments to groups (e.g. 'high maintainability'). While the first goal has been achieved more or less with regression models or neural networks, predominantly for finished projects, the latter goal seems to be adequate for predicting potential outliers in running projects, where preciseness is too expensive and unnecessary for decision support.

Due to successful application in many projects such metrics obviously should be available for object-oriented environments. The goals are the same as for procedural systems, primarily indicating potentially troublesome classes that should be improved before being introduced to the class libraries. The object-oriented paradigm could directly profit from metrics as a vehicle to instruct staff who are new to this approach. Furthermore, software metrics could be used to measure the problems of introducing this paradigm and its acceptance, as well as to set design standards for an organisation. Traditional metrics for procedural approaches are not adequate for evaluating object-oriented software, primarily because they are not designed to measure basic elements like classes, objects, polymorphism, and message-passing. Even when adjusted to syntactically analyse object-oriented software they can only capture a small part of such software and so can provide only weak quality indication [8-10]. Even dedicated research on metrics for object-oriented programs gave only rough guidelines, such as limiting the size of methods in Smalltalk to 12 lines, without any indication on how to tailor project-specific design guidelines [10]. It is hence important to define customised metrics for object-oriented programs. Additionally the characteristics of the target language should be considered. Some languages directly support the object-oriented approach (C++, Smalltalk, Eiffel) and others just to some extent (Ada). Other factors like the size and contents of the class library and the semantics and syntactical form of particular commands should also be considered.
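To make the notion of classifying metric vectors concrete, the following Smalltalk sketch implements a minimal supervised nearest-centroid rule (all numbers, labels and the three-element vectors are hypothetical; this simple rule merely stands in for the classification techniques cited above):

    | training centroids classify |
    training := Dictionary new.
    training at: 'high maintainability' put: #(#(12 3 1) #(9 2 0)).
    training at: 'low maintainability' put: #(#(70 15 9) #(55 11 7)).
    "Compute one centroid (mean vector) per quality label."
    centroids := Dictionary new.
    training keysAndValuesDo: [:label :vectors |
        | sum |
        sum := Array new: 3 withAll: 0.
        vectors do: [:v |
            1 to: 3 do: [:i | sum at: i put: (sum at: i) + (v at: i)]].
        centroids at: label put: (sum collect: [:s | s / vectors size])].
    "Assign a new metric vector to the label of the nearest centroid."
    classify := [:v |
        | best bestDist |
        bestDist := nil.
        centroids keysAndValuesDo: [:label :c |
            | d |
            d := 0.
            1 to: 3 do: [:i | d := d + ((v at: i) - (c at: i)) squared].
            (bestDist isNil or: [d < bestDist])
                ifTrue: [bestDist := d. best := label]].
        best].
    Transcript show: (classify value: #(60 12 8)); cr   "prints 'low maintainability'"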
3. Object-orientation and Smalltalk

Object-oriented modelling and programming is based on four fundamental concepts, namely abstraction, inheritance, encapsulation within classes, and polymorphism. For better
understanding of the approaches described later, we will give a rather brief summary of the interesting features of object-orientation with respect to the Smalltalk programming language. In this paper we use the Smalltalk terminology.

As can easily be imagined, the term 'object' plays a central role in object-oriented programs. An object encompasses data structures that describe its state, and methods that realise its functionality. Data structures are encapsulated and provide information hiding with respect to their object, which means that they only offer access functions, called methods, but no direct use of the internal data structures. Objects communicate with each other via message passing, which means that one method starts a method in another object. Mechanisms to hierarchically structure objects in classes exist in all object-oriented languages. Instances can be derived from classes and differ from other objects only on the basis of associated states. Another important characteristic is the possibility of incrementally defining class hierarchies. This is done by the inheritance mechanism. From a superclass, a subclass inherits all its data structures and methods. In the subclass, new data structures and methods can be defined, or existing ones can be rewritten. Smalltalk's inheritance mechanism, for example, is designed to model software evolution as well as to classify.

Smalltalk supports the object-oriented concepts fully. It manipulates classes and objects and implements a single inheritance mechanism. It does not include multiple inheritance, prototypes, delegation, or concurrency in its standard version. In addition to its programming language, Smalltalk and its current derivatives and flavours include an open programming environment to develop object-oriented programs. It offers a comfortable graphical user interface, several helpful tools and a vast class library. Programming language and environment coexist in a homogeneous form, where concepts at the programming level are reproduced in the environment. An example is the message passing mechanism. In Smalltalk programs, it means the activation of a method. The same strategy is used in the user interface to identify a selected object. A message passing consists of: an object (the receiver of the message), a message selector and optional arguments as parameters. Object behaviour is described by the mentioned methods, which have a selector and include Smalltalk commands. In different classes, methods with the same selector can exist. This is called polymorphism. The status of an object is captured through class and instance variables that might be accessed from outside by suitable methods. Class variables are defined at the class level and instance variables at the object level to store objects' states.
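As a minimal illustration of these concepts (the class Account and its selectors are hypothetical, written in the style of Fig. 3 below; ifNil: is a convenience of modern Smalltalk dialects):

    Object subclass: #Account
        instanceVariableNames: 'balance'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Examples'

    Account methodsFor: 'accessing'
    balance
        ^balance

    Account methodsFor: 'transactions'
    deposit: anAmount
        balance := (balance ifNil: [0]) + anAmount

    "A message send names a receiver, a selector and optional arguments:"
    | a |
    a := Account new.
    a deposit: 100.
    Transcript show: a balance printString; cr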
4. Metric analysis and development support for object-oriented software

Goals such as quality improvement, increasing productivity
or maturity certification are of growing interest in industry. Navigating the way with metrics is one important approach to ensure that a company stays on course towards achieving these goals. Though the search for underlying structures and rules in a set of observations is performed in software quality control, and effective solutions to refine forecasting methods based on past data have been suggested, so far their applicability to object-oriented software development has been restricted. There is a growing awareness that such approaches could also support the object-oriented software development process. Anybody starting with object-oriented software raises similar questions that also serve as guidelines for developing a measurement tool environment for quality control:

• What is good style in object-oriented programs?
• Are there any rules that can be applied to develop a good object-oriented program?
• Which metrics could be employed in order to determine if a program is 'good' or not?
• What contributes to the complexity of an object-oriented system?

Based on such questions there has been substantial work concerning the definition of metrics for object-oriented programs; however, most approaches have been based on number crunching after the facts, and only a few industrial environments that include complete tool support have been described. With the broader application of this paradigm, both analytic (i.e. with metrics) and constructive (i.e. by providing design and help facilities) quality control are of increasing importance.

One of the first attempts to investigate quality aspects of object-oriented programs was done by Lieberherr and colleagues [11]. They defined a set of design rules that restricts the message-sending structure of methods, called the Law of Demeter. Informally, the law says that each method can send messages to only a limited set of objects: to argument objects, to the self pseudovariable, and to the immediate subparts of self (self being the object or class itself). The Law of Demeter thus attempts to minimise the coupling between classes.

Many applications of metrics for object-oriented programs originated from transforming metric concepts for procedural programs (e.g. message passing, calling structures, cyclomatic complexity) [7,9,12]. It is of course clear that many concepts hold for object-oriented software as well and thus might be considered as influencing parameters when it comes to measuring the achievement of objectives (e.g. volume, class tree). However, such metrics do not completely cover the relevant aspects of coupling, such as inheritance or polymorphism. Other approaches suggest metrics that go beyond such transformations and really focus on object-oriented descriptions, but do not offer any guidelines for using the metrics in practical projects [8,13]. Sharble and Cohen compare two object-oriented development methods using an object-oriented brewery as example [14]. They
suggest indicators to enhance the software quality by increasing cohesion, reducing coupling, increasing polymorphism and eliminating redundancy. Many metrics have been defined and applied to a toy environment, but no evidence was given that the metrics are useful and applicable, both from an experience viewpoint and from a tools viewpoint, for practical object-oriented development.

Our approach for selecting appropriate metrics is goal-oriented, rather than following already published literature that often measured what seemed measurable. Such a mere definition of a metrics suite combined with statistical number crunching without intuitive background would result in the same acceptance problems procedural metrics applications ran into during the eighties [1]. As long as the objectives of an object-oriented development process are not stated and supported with tailored methods, metrics would be of no practical help. We therefore focused on the product and its inherent quality attributes and then determined how to measure their achievement during design. The process for building a development support environment thus closely followed measurement theory [1]:

1. Identify and define intuitive and well-understood attributes of software quality and productivity that should be achieved during the project. Here we selected reuse and maintainability.
2. Determine metrics that clearly specify these desired attributes. We built this set of quality metrics (as opposed to quality indicators that are determined in a later step) upon interviews with designers and retrieved few reproducible quality metrics related to maintainability and reuse. Most were based on ranking classes of past projects.
3. Specify precisely the underlying documents, structures, or attributes of these documents to be measured. We identified Smalltalk programs, from the beginning of a project onwards, as the objects to be measured.
4. Determine formal models or abstractions which relate the quality attributes to properties of the underlying documents and their individual attributes. Based on the interviews with designers we built relations between what reusability and maintainability mean and how they relate to distinct product properties.
5. Define metrics that measure these selected product properties. These metrics must be available during design because otherwise they cannot be used as indicators for the already selected quality metrics. We thus call them quality indicators.
6. Validate these metrics based on past projects.

To overcome the stated problems related to practical applicability and necessary tool support we introduce SmallMetric, a tool to evaluate and meliorate object-oriented programs written in Smalltalk. It is embedded in an environment for the learning of object-oriented programming [15].
5. A description of the object-oriented metrics framework
SmallMetric analyses object-oriented programs by applying construction rules that distinguish between (Fig. 2):
• the static and dynamic structure of a class or an object;
• the static and dynamic relationships between classes and/or objects.

Fig. 2. Taxonomy for SmallMetric: intra-class/object metrics (naming, number of variables, number of methods, number of parameters per method, number of message passings, cohesion via predefined protocols) and inter-class/object metrics (used inherited variables, used inherited methods, external use of methods, use of protocols, coupling via the protocol 'private', abstract classes).
The metrics that are presented in the following represent different aspects of object-oriented software. We will describe the underlying intuition of the metrics as well as a comprehensive summary of our observations from object-oriented development projects.

5.1. Metric 1: Volume

The volume of an object is a basic size measure that is intuitively connected with the amount of information inside a class. Many empirical investigations of metrics showed relationships among size metrics and comprehensibility or number of errors [7,10,12]. Volume thus is a potential indicator of the effort to develop an object as well as of its future maintenance. The larger the number of variables and methods, the more specific the object is for one application. In other words, the object's reusability is likely to be small with increasing volume. Volume can be measured by:
• number of variables (class and instance variables);
• number of methods (class and instance methods).
Both metrics measure mere volume and do not account for the impact of inheritance and reuse, which will only be seen within a whole suite of metrics.
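In Smalltalk both counts can be read directly off a class via reflection; a minimal sketch (the class OrderedCollection is chosen arbitrarily as the measured object):

    | cls |
    cls := Smalltalk at: #OrderedCollection.
    Transcript
        show: 'instance variables: ', cls instVarNames size printString; cr;
        show: 'class variables: ', cls classVarNames size printString; cr;
        show: 'instance methods: ', cls selectors size printString; cr;
        show: 'class methods: ', cls class selectors size printString; cr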
5.2. Metric 2: Method structure

The internal structure of an object, based on its methods and the information that is accessed by them, is an indicator of its functionality. If the methods are overloaded with information to pass back and forth, there is good reason to assume that the object or class should be broken into several objects or classes. Method metrics are used to forecast the effort for debugging and testing early. Method structure can be measured by:

• number of parameters per method;
• number of temporary variables per method;
• number of message passings per method.
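The first two counts are directly available on selectors and compiled methods in most Smalltalk dialects (counting message passings needs a walk over the method's parse tree and is omitted here); a sketch:

    | cls |
    cls := Smalltalk at: #OrderedCollection.
    cls selectors do: [:sel |
        | m |
        m := cls compiledMethodAt: sel.
        Transcript
            show: sel printString;
            show: ' parameters: ', sel numArgs printString;
            show: ' temporaries: ', m numTemps printString; cr]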
5.3. Metric 3: Cohesion

The term cohesion is frequently used in software engineering to designate a mechanism for keeping related things together. Cohesion can be defined as the degree of similarity of methods. The higher the degree of similarity of methods in one class or object, the greater the cohesiveness of the methods. Cohesion in Smalltalk means the organisation of methods which set or access the value of a class or instance variable under predefined schemes (protocols). These protocols are predetermined in Smalltalk. The programmer can use them to manipulate the variables of an object. Such methods are called accessors [16]. The intuitive basis is that direct reference to class and instance variables limits inheritance by fixing storage decisions in the superclass that cannot be changed in a subclass. Besides, modifications in the structure of these variables are then not visible to other methods, just to the accessors. Hence, the effort to extend or to modify a given program is minimised. As an example consider an instance variable instVar of an object anObject. To access class and instance variables it is necessary to define two kinds of methods:

• one method for getting the value of an instance variable:

    instVar
        ^instVar

• and another for setting it:

    instVar: aValue
        instVar := aValue
This solution forces all accesses to variables to go through an accessor method. Therefore, information hiding with respect to variables and methods in a class is enforced [17]. SmallMetric examines a Smalltalk program to find accesses to variables outside of the predefined protocols. This is called a cohesion violation of an object.

5.4. Metric 4: Coupling

Coupling designates the interaction between objects that are not related through inheritance. Excessive coupling between objects besides inheritance is detrimental to modular design and prevents reuse. The more independent an object, the easier it is to reuse it in another project [1,11]. The suggested metric is:

• number of invoked classes.
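One way to approximate this count is to scan each compiled method's literal frame for global bindings that refer to classes; a sketch (this is not the tool's implementation, and it assumes the CompiledMethod>>literals accessor of modern Smalltalk dialects):

    | cls invoked |
    cls := Smalltalk at: #OrderedCollection.
    invoked := Set new.
    cls selectors do: [:sel |
        (cls compiledMethodAt: sel) literals do: [:lit |
            (lit isVariableBinding and: [lit value isKindOf: Behavior])
                ifTrue: [invoked add: lit value]]].
    Transcript show: 'invoked classes: ', invoked size printString; cr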
A predefined scheme in Smalltalk is the protocol private. It comprehends methods that should only be activated inside an object. The Smalltalk compiler or interpreter does not check these specific accesses. When a message from another object starts a method under this protocol, undesirable effects can occur because such access had not been anticipated during development. SmallMetric tries to identify such references.

5.5. Metric 5: Inheritance tree

This group of metrics analyses the amount of inherited variables and methods used by a class. The use of inherited methods and data in a class indicates the difficulty of changing superior classes. On a low level of the inheritance tree, variables and methods available to a class could have been changed in meaning several times on higher levels, thus increasing complexity even more. It is hence necessary to provide information about how many methods and variables are available to a distinct class. The metrics are:
• inherited variables used;
• inherited methods used.
In Smalltalk, an instance variable can be directly set by an object of a subclass. This can reduce the reuse of a class in other applications. SmallMetric nominates this an 'information hiding violation' (Fig. 3).

5.6. Metric 6: Class organisation

This group of analyses captures three comprehensibility indicators: naming, checking of comments, and the use of predefined protocols. Naming analyses all identifiers of a class. SmallMetric informs the developer about their distribution. This metric has documentation purposes only. The existence of comments within an object is also checked. In Smalltalk, one can define a global comment to clarify the intent and functionality of an object. SmallMetric warns when no such comment is provided. It is clearly impossible to check comments based on contents, so we focused only on existence. The programmer may organise the methods of an object under predefined protocols. The Smalltalk environment advises the developer to use these recommendations, which is checked by SmallMetric. For novices, these protocols can help to elucidate some aspects of a Smalltalk program.
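The comment and protocol indicators reduce to simple reflective queries; a minimal sketch (the organization accessor for protocol categories is available in Smalltalk-80 descendants such as Squeak and Pharo):

    | cls |
    cls := Smalltalk at: #OrderedCollection.
    (cls comment isNil or: [cls comment isEmpty])
        ifTrue: [Transcript show: 'warning: no class comment'; cr].
    Transcript show: 'protocols in use: ', cls organization categories printString; cr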
6. Experiences with SmallMetric for Smalltalk program analysis

Upon starting SmallMetric a window is opened, which inquires the name of the class to be analysed. Wildcards (*) can be used. When the given class is found, a new window is created (Fig. 4). It presents the following information:

1. number and list of all variables;
2. number of methods;
3. buttons to switch between class and instance;
4. predefined protocols used;
5. naming;
6. violations of SmallMetric metrics.
Four buttons are provided to select a new class, to print the information of a class, to switch between different dialogue languages (currently English and German) and to activate Help. The critique window can of course be adjusted to specific standards and process guidelines of an organisation. It has a menu, which presents the design limits for development support.

SmallMetric comprises a basic set of guidelines for metric-based development support of Smalltalk applications. On the basis of the metrics above as applied to Smalltalk projects with available quality data, we extracted some design guidelines to enhance the quality of object-oriented programs written in Smalltalk. Because one of the main reasons for using object-oriented technology is reusability, we focused our evaluations on maintainability and reusability. Such guidelines should be understood as recommendations and not as restrictions of a programmer's creativity.

    Object subclass: #Superclass
        instanceVariableNames: 'text'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'SmallMetric'

    Superclass subclass: #Subclass
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'SmallMetric'

    Subclass class methodsFor: 'instance creation'
    new
        ^super new initialize

    Subclass methodsFor: 'initialize release'
    initialize
        text := 'This is an example of an information hiding violation !!'

Fig. 3. An information hiding violation.

The projects being analysed ranged in size from a few classes to 400 classes of a commercially available Smalltalk-based tool, thus covering efforts of up to 30 person years. Our approach for extracting guidelines of metrics that can serve as indicators of poor quality is based on analysing the classes with respect to complexity metrics and quality data. Since the metrics are applied on different scales (complexity metrics: at least interval scale; quality metrics: ordinal scale) we performed non-parametric statistical methods for correlations and factor analysis. Unlike in other approaches we do not discard outliers, because it is ex ante unknown what classes or metrics are outliers. Instead, all metrics are normalised with a quadratic approach before comparing or ranking them. To show the practical application of SmallMetric we took
a Smalltalk project related to a development environment with simulation on an Occam real-time engine. The whole project was performed by a five-person core development team in a timeframe of several years. 71 classes were provided and we investigated these classes with SmallMetric.
Fig. 4. The user interface of SmallMetric.
Fig. 5. Scatterplot of number of methods (abscissa) and cohesion (ordinate) with respect to maintainability (shape of dots on a 5-level scale) for all samples.
Before the object-oriented metric collection, however, we interviewed the development team to find out what the ranking of the classes in terms of maintainability is. This subjective, yet experience-driven, ranking was further exploited for analysing the structural metrics on their validity as indicators of maintainability. Maintainability involves a longitudinal study, which we achieved by interviewing the designers about the first design (which we investigated) when they had already been maintaining it for more than a year. The approach can be generalised because it follows exactly what has been proposed by measurement theory (see Section 4).

Fig. 5 shows all 71 samples that were investigated in the course of this analysis (abscissa: volume in terms of methods; ordinate: cohesion metric; the shape represents the perceived maintainability). The scatterplot relates volume in terms of number of methods and cohesion on the two axes, together with maintainability as the shape of the dots. Fig. 6 shows, for one metric (number of methods), the
variance related to the five levels of maintainability assigned to each sample. The box is drawn with the mean in its centre and one standard deviation to both directions; outliers are indicated by the whiskers. In the case of few samples the standard deviation is not very meaningful, which results in whiskers inside the box.

Not all object-oriented metrics calculated by SmallMetric can be discussed in the context of this article due to space restrictions. We will further concentrate on the following metrics that have been condensed with factor analysis (i.e. factor analysis has been applied to reduce the dimensionality of the original metrics by focusing on orthogonal factors; factor analysis typically replaces the original metrics, which is why in two cases so-called hybrids were added that mainly reflect structure and cohesion):

• vol-meth: number of methods;
• vol-var: number of variables;
• str-meth: method structure (a hybrid of structural metrics regarding the code of a method);
• cohesion: cohesion (a hybrid of primarily the mentioned cohesion metrics, mainly focusing on access to variables outside the predefined protocols);
• comments: comments (this metric was only added because we wanted to investigate the importance of comments).
Fig. 6. Box-Whisker plots for number of methods and the related maintainability levels of all samples. Boxes are drawn with the mean in the centre and one standard deviation length in both directions, which explains why in cases with few samples the outliers are within the boxes.
Table 1
Results of the Spearman rank correlation

            vol-var   str-meth   cohesion   comments   quality
vol-meth    0.800     0.607      0.721      -0.131     0.691
vol-var               0.413      0.754      -0.194     0.569
str-meth                         0.414       0.077     0.439
cohesion                                    -0.101     0.675
comments                                                0.023
All metrics but the last are at least on an interval scale; comments are on an ordinal scale. The related quality metric, maintainability, is on an ordinal scale as well, ranging from 1 to 5, where 5 is the highest maintainability.

6.1. Analysis 1: Relationships between the metrics
A Spearman rank correlation was performed to investigate relationships among the metrics (Table 1). Significance levels for most correlations were far below 0.0005; only comments had a significance level above this limit. The highest random correlation coefficient that we generated in 1000 trials with random metric generation, based on the given set of metric observations and their distribution, was 0.244. This means that correlation coefficients higher than this limit are meaningful, because even many trials with random, however fitting, data would not generate higher correlations. The p value based on the given r-coefficients higher than 0.50 with a significance level of α = 0.05 is in the interval [0.43, 0.73]. An orthogonal factor rotation of all metrics clearly separates three groups, namely the object-oriented metrics, comments, and quality. For better insight the single linkage values of building clusters of 'similar' metrics are given. Single linkage values combine vol-meth and vol-var on a level of 0.35, both with cohesion on a level of 0.42, then quality on a level of 0.50, str-meth on 0.59 and finally comments on 0.92.
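For reference, the coefficient used is the standard Spearman rank statistic; with d_i the rank difference of sample i and n = 71 samples:

\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \]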
6.2. Analysis 2: Regression model of maintainability
The second step of the analyses concerned the predictability of the maintainability factor. For the complete regression analysis the dependent variable is quality, while the independent variables are vol-meth, vol-var, str-meth, cohesion and comments. The probability > F is equal to 2E-07. There are 5 independent variables and one dependent variable in the regression model. Probability > F, commonly known as the p-value, indicates the significance of
a regression. The smaller the probability, the more significant the regression. The very small value of probability > F indicates, with a high degree of confidence, that some prediction is possible. R-square is the quality indicator of a regression test that measures the quality of the predictions. It shows how much variance in the dependent variable is accounted for in the sample of 71 observations. Adjusted R-square measures the same aspect as R-square, but in the population, with the adjustment depending on both sample size and the number of independent variables. R-square is 0.450 and adjusted R-square is 0.408. The residual is 88.2. In our sample more than 45% of the total variance in maintainability is accounted for by the metrics, and more than 40% in the population.

It is interesting to investigate the same regression without comments as an independent variable. R-square and residual values remain unchanged, while the probability > F is a bit smaller. Obviously comments do not account for much in the prediction model, which was already clear from the cluster analysis. The standard error in both cases for all predictions is below 1.165, which is a good result given the scale of [1, 5] for the quality variable. We conclude that the prediction of maintainability from the given metrics is possible.
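As a consistency check, the reported adjusted value follows from the usual formula with n = 71 samples and k = 5 predictors:

\[ R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - 0.550 \cdot \frac{70}{65} \approx 0.408 \]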
6.3. Analysis 3: Discriminant analysis
The third step of the statistical analyses that gives insight into the validity of the given metrics as quality indicators is typically a real prediction based on some kind of discriminant analysis. Although more advanced analyses are feasible, we will stick to one because the effects are similar [2]. The chosen discriminant analysis tries to fit hyperplanes in a space built up by the independent variables. The planes separate the 5 classes of the dependent variable, i.e. maintainability (Table 2). In total 52.11% of all cases were classified correctly. Below 10% of all samples were predicted as belonging to the opposite side of the dependent variable (i.e. difference between real value and predicted value > 2).

Table 2
Results of discriminant analysis

Maintainability (real)    N     Calculated: 1    2    3    4    5
1                         12                7    1    1    3    0
2                          5                0    1    2    1    1
3                         11                0    0    7    1    3
4                         12                0    1    2    9    0
5                         31                1    2    0   15   13
7. Discussion and further results

Based on the three analyses we conclude that there is a
strong statistical relationship between the object-oriented metrics used within SmallMetric for decision support during development and maintainability. Maintainability can be predicted from combinations of metrics collected from Smalltalk. Based on the described set of metrics we investigated different projects both from industry and academia to provide practical guidelines. Our experiences with quantitative support during the development of object-oriented software projects can be summarised as follows:

• Provide continuous integration of new components instead of one big integration effort near the delivery date.
• Attack risks during the whole development process actively, based on early assessments and especially on quantitative feedback. Both risk and progress are measured in the product and less in supporting documentation. Quantitative feedback on the product to the engineers and the project manager helps in much better scheduling of the next steps (e.g. testing, integration, supportive tool delivery, configuration management).
• Metrics used as quality indicators during design should cover at least the diversity of object-oriented software in terms of classes (i.e. volume, complexity, structure), messages (i.e. interface structure, communication between entities), and processes (i.e. dynamic structure, run-time dependencies, method usage).
• Rapid changes are a source of risk because they are a sign of unstable architectures. Quantitative feedback on changes in each class helps design co-ordinators during design, and, even more importantly, test co-ordinators during test to immediately detect critical areas that need thorough investigation. For instance, changes in several classes with the same frequency clearly indicate inherent ripple effects that are better removed early than during maintenance.
• Industrial software projects, such as switching systems, are typically functionally structured. Since object-oriented software emphasises data structuring, many data structures that were originally part of a module become new classes. It is therefore impossible to reuse quantitative design heuristics of former projects when moving to object-oriented technology in parts of legacy software (as is the case for most large industrial projects).
Our investigations of further classes of Smalltalk projects provide some generalised suggestions related to the design and implementation of classes that, because more projects were analysed, go beyond the described analysis. They seem very clear in theory; however, practical use often shows the opposite (a mechanical check of some of these limits is sketched after the list):

• Volume:
  - number of object attributes or instance variables per class;
  - number of methods per class: maximum 30.
• Structure:
  - number of message passings per method: maximum 30;
  - cyclomatic complexity of methods: maximum 5;
  - nesting depth within methods: maximum 5.
• Cohesion and coupling:
  - existence of an accessor outside of a predefined protocol;
  - number of external message passings per method: maximum 5;
  - external accesses of methods under the protocol private: maximum 5.
• Inheritance:
  - number of predecessor classes: maximum 5;
  - number of successor classes: minimum 2, maximum 10;
  - number of changes to inherited methods: maximum 5 (in fact zero overriding would be best; however, in some cases it cannot be avoided).
• Class organisation:
  - number of characters of an identifier: minimum 7;
  - comments must be given; suggested comment density: minimum 0.2;
  - editing distance of identifiers: minimum 3.
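Several of these limits can be checked mechanically with the reflective queries shown earlier; a minimal sketch (the class under test is arbitrary, and the limits are the ones suggested above):

    | cls depth c |
    cls := Smalltalk at: #OrderedCollection.
    cls selectors size > 30
        ifTrue: [Transcript show: 'more than 30 methods per class'; cr].
    depth := 0.
    c := cls.
    [c superclass notNil] whileTrue: [depth := depth + 1. c := c superclass].
    depth > 5
        ifTrue: [Transcript show: 'more than 5 predecessor classes'; cr].
    cls subclasses size > 10
        ifTrue: [Transcript show: 'more than 10 successor classes'; cr]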
The measured values were analysed with respect to boundaries (minimum, maximum), intervals, deviations from the average, and nonparametric correlations between them. The interpretation was performed according to these criteria and used as follows:
1. Sustain a high comprehensibility level by providing a sufficient length of descriptive parts in all design objects and by giving object names with meanings, rather than enumerations such as 'class-1'. The descriptions should include subclasses or inheritance relations, changes of inherited methods or variables, functionality, related objects, used data items, date, author, test cases to be performed, requirements fulfilled, and management activities and staff connected with this project.
2. During class and object design, the metrics and their statistical evaluation (regarding similar projects) are taken to distinguish between different designs (e.g. alternative approaches, division into subclasses). Rules that can be tailored to support design include size limits (e.g. lines per method) or functionality limits (e.g. methods per class, variables per class). It should be clear that for distinct applications and libraries these limits must be adjusted, taking into consideration for instance the design experience, the available resources, or the requested quality.
3. During reviews at the end of design and coding, the metrics are taken as indicators for weak components (e.g. inadequate inheritance hierarchy, unsatisfying object description) and as indicators for process management (timely ordered number of classes or volume metrics).

After applying such metrics to several projects, the results obtained can be used to define intervals or limits for project-specific metrics in order to achieve more tailored quality indicators.
8. Summary

Most complexity metrics have been designed without regard to the problem domain and the programming environment. There are many aspects of complexity and a lot of design decisions which influence the complexity of a product. This paper presents an approach to integrate software metrics with design support for object-oriented techniques based on Smalltalk. A tool environment for program analysis called SmallMetric, which incorporates metrics and guidelines for improving programs, has been developed. Based on this set of metrics we investigated different projects both from industry and academia to provide practical guidelines. This approach to integrating a measurement tool system into Smalltalk illustrates a way to minimise the effort for implementation and maintenance of such a tool and shows how to cope with changes in future requirements for such tools and their individual interfaces. By transforming the object-oriented information representation into another language it is possible to integrate such measurement techniques into other environments as well.

Collecting metrics in running projects helps in building up a historical database that can be further used for
better estimates and risk assessments in following projects. Be careful, however, not to measure everything; 6 to 12 metrics should be sufficient, which is substantiated by the factor analysis we described that reduces the factor dimensionality to 5 and less. It is harder to tell what is important than to say what can be measured. Clearly the traditional complexity metrics, such as cyclomatic complexity or function size, are not measuring the significant aspects of complexity in object-oriented systems. Sticking to lines of code only does not help in identifying the maximum size of methods. We rarely see high cyclomatic numbers or numerous lines of code in methods of Smalltalk programs, but this does not mean that Smalltalk programs are better than C programs. It only indicates that Smalltalk utilises a paradigm that distributes functionality into smaller units.

With an early analysis of software products we are able to provide developers with helpful hints to improve their designs and code during the development process, and not at the end when it will be much more expensive. By following the given suggestions we could improve designs and achieve better programs in terms of such quality items as understandability, reusability and maintainability. Of course, much more research is necessary in order to provide complete guidelines for achieving high quality designs. We consider this approach also as a vehicle towards measuring productivity and estimating effort early in the analysis and design of object-oriented software. The basic step, however, still is the measurement and evaluation of software complexity as early as possible: during the software development process, when the most expensive faults are induced (e.g. inheritance trees). By making software engineers aware that there are suitable techniques and tools for analysing their programs, even when they are object-oriented, this could be a small step towards avoiding a software crisis similar to what we are currently facing in procedural environments.
Acknowledgements

The assistance of the Landis & Gyr corporation, Switzerland, in providing product and process data of object-oriented projects is gratefully acknowledged. Several discussions with A. Riegg of Debis in Stuttgart contributed to the proposed guidelines.
References

[1] N.E. Fenton, S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, 2nd edition, Thomson Computer Press, London, UK, 1996.
[2] C. Ebert, Evaluation and application of complexity-based criticality models, Proc. of the 3rd Int. Software Metrics Symposium (METRICS 96), IEEE Computer Soc. Press, Los Alamitos, CA, USA, 1996, pp. 174-185.
[3] A. Goldberg, D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983.
[4] C. Ebert, Visualization techniques for analysing and evaluating software measures, IEEE Transactions on Software Engineering 18 (11) (1992) 1029-1034.
[5] G. Stark, R.C. Durst, C.W. Vowell, Using metrics in management decision making, IEEE Computer 27 (9) (1994) 42-48.
[6] B.A. Kitchenham, S.G. Linkman, D.T. Law, Critical review of quantitative assessment, Software Engineering Journal 9 (3) (1994) 43-53.
[7] S.R. Chidamber, C.F. Kemerer, Towards a metrics suite for object oriented design, Proc. of Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Sigplan Notices 26 (11) (1991).
[8] W. LaLonde, J. Pugh, Gathering metric information using metalevel facilities, Journal of Object Oriented Programming 6 (1994) 33-37.
[9] B. Henderson-Sellers, J. Edwards, BOOKTWO of Object-Oriented Knowledge: The Working Object, Prentice Hall, Sydney, Australia, 1994.
[10] M. Lorenz, J. Kidd, Object-Oriented Software Metrics, Prentice Hall Object-Oriented Series, Englewood Cliffs, USA, 1994.
[11] K.J. Lieberherr, I.M. Holland, Assuring good style for object-oriented programs, IEEE Software 6 (9) (1989) 38-48.
[12] S. Karunanithi, J.M. Bieman, Candidate reuse metrics for object oriented and Ada software, Proc. Int. Software Metrics Symposium, IEEE Computer Society Press, New York, 1993, pp. 120-128.
[13] N.I. Churcher, M.J. Shepperd, Towards a conceptual framework for object-oriented software metrics, Software Engineering Notes 20 (2) (1995) 69-75.
[14] R. Sharble, S. Cohen, The object-oriented brewery: A comparison of two object-oriented development methods, Software Engineering Notes 18 (2) (1993).
[15] I. Morschel, An intelligent tutoring system for the learning of object-oriented programming, Proc. EAEEIE'93, Prague, 1993.
[16] K. Beck, To accessor or not to accessor?, The Smalltalk Report 2 (8) (1993).
[17] D.L. Parnas, P.C. Clements, D.M. Weiss, The modular structure of complex systems, IEEE Transactions on Software Engineering 11 (3) (1985) 259-266.