Deriving structurally based software measures


J. SYSTEMS SOFTWARE 1990; 12: 177-187

Norman Fenton
Centre for Software Reliability, The City University, London, United Kingdom

Austin Melton
Department of Computing and Information Sciences, Kansas State University, Manhattan, Kansas

Many software engineering methods place internal structural constraints on the documents (including specifications, designs, and code) that are produced. Examples of such structural constraints are low coupling, high cohesion, reuse in designs and code, and control structuredness and data-abstraction in code. The use of these methods is supposed to increase the likelihood that the resulting software will have desirable external attributes, like reliability and maintainability. For this reason, we believe that the software engineering community needs to know how to measure internal attributes and needs to understand the relationships between internal and external software attributes. This can only be done if we have rigorous measures of the supposedly key internal attributes. We believe that measurement theory provides an appropriate basis for defining such measures. By way of example, we show how it is used to define a measure of coupling.

1. INTRODUCTION

1.1 Measures and the Life Cycle

Historically, the functions that measure the static “complexity” of software have been called metrics. We prefer to use the term measures for these functions. Reasons for our preference are grounded in measurement theory, which we introduce later. Our feeling is that the term measure emphasizes that we want our collection of measurements to preserve relationships that exist between the entities being measured. For a more complete discussion of the terms measure and metric, see Baker et al. [1].

Most software measures are defined on executable program text, i.e., on code. The reasons for this limited applicability probably include that until recently software often consisted only of code, and when other types of documents were included, there was little uniformity as to the format of these extra documents or even as to what these other documents were. However, it is now clear that for each type of life cycle document, there are interesting and useful measures that can be defined on that type. In fact, many of the efforts to estimate the values of code measures from specifications or requirements involve attempts at defining measures on specification or requirement documents [3, 13]. In this article, we introduce a measure of coupling, and this definition is valid on each document type that contains information about modules, their calling structure, and their interfaces.

Address correspondence to Austin C. Melton, Department of Computing and Information Sciences, 234 Nichols Hall, Kansas State University, Manhattan, KS 66506.
© Elsevier Science Publishing Co., Inc., 655 Avenue of the Americas, New York, NY 10010

1.2 Processes, Products, and Resources

Measurement is usually defined to be the assignment of numbers to objects in such a way as to describe the objects in light of certain properties or attributes. Thus, measurement requires the identification of objects (which we shall also refer to as entities) and attributes, and the “amount” of the attributes in each entity is what is being measured. The first obligation in any software measurement activity is therefore to identify the entities and attributes of interest.

We argue that for software measurement, there are three basic classes of entities of interest: processes, products, and resources. A detailed account and justification of this classification is provided in Bush and Fenton [2]. This approach enables us to understand software measurement in terms of measurement theory.

Processes are activities having a time dimension. An example of a process is constructing a specification document. Products are deliverables, artefacts, or in general documents


that arise from activities (or processes) performed during the software life cycle. Examples of products include specifications and design documents at various levels of detail, representations of the source or object code, and test strategy documents. Resources are the miscellany of items that are inputs to processes. Examples include personnel (individuals or teams), materials (including offices), and tools (software, hardware, and methods).

Some items are resources for some processes and products of other processes. For example, a CASE tool is clearly a major product of the software company that developed it, whereas a company that uses such a tool for its own design process is using it as a resource. Although many tools may be considered as both products and resources, it will generally be clear from the context which view is relevant.

In this article, we restrict our attention to products. There appear to be two broad classes of attributes of products that we are interested in measuring; we shall refer to these as internal and external attributes. External attributes are those product attributes that are dependent on one or more entities in addition to the product in question, e.g.:

• The reliability of program code is dependent on both the machine on which the program is run and on its operational usage (which also includes an implicitly understood specification).
• The understandability of a specification document is dependent on the person who is trying to understand it.
• The maintainability of source code is dependent on both the person performing the maintenance and any tools available to support this.

Internal attributes are those that are not dependent on any entity other than the product in question, i.e., static attributes. In the case of specification documents, examples include length, functionality, modularity, reuse, redundancy, and syntactic correctness (if the language specification is assumed known). For formal designs and code, all of these attributes are also pertinent. Later we shall argue that internal product attributes have a crucial role to play in software engineering and measurement.

1.3 Measuring Software

We propose that anything that we might wish to measure about “software” can be classified as an identifiable attribute of some process, product, or resource. We cannot say that this classification provides a perfect division of all entities ever likely to be considered in software production, but we do claim that to date all software entities clearly fit into this classification scheme.

Many misunderstandings surrounding software measures can be attributed to the failure to make clear the distinction between (a) product/process/resource attributes and measures, and (b) internal and external product attributes and measures. The very nature of an external, behavioral, product attribute is such that we cannot measure it directly from the product alone. We are generally forced to resort to using process measures as approximate measures for the product attribute.

There are well-documented instances in which external attributes like maintainability and reliability have been defined in terms of the internal attributes of size and structure. In fact, this preempts one of the objectives of software measurement research, which is to determine which internal attributes strongly influence which external attributes. It would certainly be useful if we could obtain an accurate prediction of, e.g., maintainability in terms of internal attributes such as modularity, structural complexity, and size, because we could then determine in advance which resources we would need to maintain our software. However, to define maintainability in terms of internal attribute measures is to miss the point of rigorous measurement.

1.4 Models

It is clear that we must have models of the various software processes, products, and resources before we can discuss these concepts; in fact, our standard documents or “representations” of these items are models. To be able to reason precisely about these concepts, however, we need unambiguous, formal models. Furthermore, as there are different attributes of each software document type, there are different models (or abstractions) of each document type. In general, we need to choose the model that most clearly emphasizes the attribute(s) in question.

The casual reader of this document may question the need for our degree of formality. We acknowledge that industrial producers of software systems consider “software metrics” to be nontheoretical, and we agree that the use of software metrics is entirely nontheoretical. However, we also argue that the lack of a sound theoretical basis for software measures has been a major reason for the fragmented state of the art and poor take-up of these tools. We believe that research work in software measures must strike a balance between developing an appropriate theoretical basis and using empirical and observational studies in real environments. Unfortunately, the software engineering journals do not appear to share this view. The emphasis has always been that articles reporting actual measurement activity and experiments are preferred to theoretical work. We draw attention to an observation by Hoyle [7] about Sir Hermann Bondi and his colleagues’ views on the balance between theory

and observation in astronomy, and ask the reader to accept the analogy for software engineering and measures (and bemoan the fact that software engineering has yet to produce a Hermann Bondi!):

Through the late 1940’s and early 1950’s, Lyttleton, Bondi, and I were perpetually at war with the Royal Astronomical Society over its refereeing policy. Our view was that mistakes in theory rarely do much harm, whereas mistakes in claimed observations can do untold harm. Yet the RAS behaved as if the situation was the other way around. The Society treated papers in theory with severity and those in observation were scarcely refereed at all. Bondi undertook the task of surveying the astronomical literature generally so as to build up a catalogue of blunders, both theoretical blunders and observational blunders. Rather to our surprise he found more of the latter, and submitted a paper to this effect to the RAS. The paper caused great embarrassment because the majority of the observational blunders were non-British, and it was felt that giving them the publicity which Bondi had done would cause international offense.

2. MEASUREMENT THEORY AND SOFTWARE MEASURES

We present an introduction to the (mathematical) representation theory of measurement as presented in works such as that by Roberts [14]. A basic assumption in measurement theory is that before measurement is done there must be a clearly identified set of entities and a clearly defined attribute. The approach is

• to determine axioms that capture intuitive understanding and empirical observations about the attribute.
• (Representation Theorem) to show that the attribute can be represented in an appropriate number system by a mapping which preserves the axioms.
• (Uniqueness Theorem) to show that any two functions that are defined from the set of entities to the set of numbers and that faithfully represent the attribute are related in certain ways.

We are careful to talk about representation in a number system rather than in the restricted case of the real numbers. One of the major criticisms of the mathematical approach (see, e.g., [4]) has been a perceived reliance on the real numbers for all representations. It has been shown (in [6, 14], e.g.) that the mathematical approach very easily generalizes to arbitrary types of number systems. We assume this generalization here. The representational theory of measurement is initially concerned with the direct measurement of attributes, i.e., measurement that does not depend on any other attribute (in contrast to indirect measurement). Where no previous measurement has been performed, this constitutes a natural process of understanding the attribute and the entities that possess it. It does not preclude the possibility that more accurate measurement will subsequently be achieved indirectly.

Let us assume that for a given set of objects $C$, we have identified an attribute $Q$ that is potentially possessed by each member of $C$. Additionally, let us assume that this attribute $Q$ induces a set $\mathcal{R}$ of empirical relations $R_1, R_2, \ldots, R_n$ on $C$, i.e., our understanding of $Q$ leads to the observation that these relations hold. For example, our set of objects could be people, and the attribute in which we are interested might be height. Two natural relations for this set and attribute are the relation “taller than,” i.e., if we let $R_1$ denote this relation, then $(x, y)$ is in $R_1$ if person $x$ is taller than person $y$, and the relation “tall,” i.e., if we let $R_2$ denote this relation, then $z$ is in $R_2$ if person $z$ is tall. We are interested in the ordered pair $(C, \mathcal{R})$, which is also denoted by $\mathcal{C}$.

In order to have measurement of an attribute, we need to have a mapping from the set of objects possessing the attribute into a “number system,” e.g., the real numbers $\mathbb{R}$. More generally, we need a numerical relational system that consists of a set together with one or more relations on that set. We let $N$ represent a set and let

$$\mathcal{P} = (P_1, P_2, \ldots, P_n)$$

be a set of relations defined on $N$. Then the numerical relational system $\mathcal{N}$ is the pair

$$\mathcal{N} = (N, \mathcal{P}).$$

For example, we may have $N = \mathbb{R}$ (the set of real numbers) and we may have the relations:

• $P_1 \subseteq \mathbb{R} \times \mathbb{R}$ given by “$(x, y) \in P_1$ if $x > y$.”
• $P_2 \subseteq \mathbb{R}$ given by “$x \in P_2$ if $x > 70$.”

In order to have a measure $M$ for our attribute, we need to identify a numerical relational system $\mathcal{N} = (N, \mathcal{P})$ whose relations $\mathcal{P}$ “correspond” to the empirical relations $\mathcal{R}$ under the mapping $M : C \to N$. (For our height example, we could have relation $P_1$ correspond to relation $R_1$ and relation $P_2$ correspond to relation $R_2$.) Thus, $M$ maps objects in $C$ to elements in $N$, and it also maps relations (in $\mathcal{R}$) into the appropriate relations (in $\mathcal{P}$). This mapping $M$ must ensure that all the empirical relations are preserved in the numerical relational system. Such an $M$ is called a homomorphism.

For the formal definition of measurement, we let $C$ be a set of objects each containing attribute $Q$ such that

$$\mathcal{R} = (R_1, \ldots, R_n)$$

is the set of relations in $C$ defined by $Q$ and

$$\mathcal{P} = (P_1, \ldots, P_n)$$

is a set of relations in a numerical relational system $\mathcal{N}$. Then

$$M : \mathcal{C} \to \mathcal{N}, \quad \text{i.e.,} \quad M : (C, \mathcal{R}) \to (N, \mathcal{P}),$$

is a measure for $Q$ if

• $M$ is a function from $C$ to $N$,
• $M(R_i) = P_i$, and
• $R_i(x_1, \ldots, x_k)$ if and only if $P_i(M(x_1), \ldots, M(x_k))$.

The last condition is called the representation condition. When this condition is fulfilled, then the attribute in question has been genuinely captured. The representation condition requires that measurement establish a correspondence between the objects in $C$ (or more appropriately between the elements in a model of $C$) and the numbers in such a way that the relations induced by $Q$ on $C$ imply and are implied by the relations between their images in the number set.

Having obtained a homomorphism $M$ (also called a representation), the triple $(\mathcal{C}, \mathcal{N}, M)$ is called a scale. When $\mathcal{C}$ and $\mathcal{N}$ are obvious from the context, we sometimes refer to $M$ as the scale.

The first basic problem of measurement theory is the representation problem: Given an observed relational system $\mathcal{C}$ and a “matching” numerical relational system $\mathcal{N}$, find necessary and sufficient conditions for the existence of a homomorphism from $\mathcal{C}$ into $\mathcal{N}$. The conditions are called the axioms for the representation, and the theorem stating their sufficiency is called a representation theorem.

The second basic problem of measurement theory is the uniqueness problem. The homomorphism given in the representation condition is not in general unique. In the height example, the numbers may be assigned according to centimeters or inches among many choices of scale. In fact, if $M$ is a measure for height, then so is any positive scalar multiple $\alpha M$. We say that $\alpha M$ is a rescaling of $M$. Just which rescalings are allowed depends on the properties of the relational system $(C, \mathcal{R})$, and it is these rescalings (characterized by a uniqueness theorem) that determine what kind of scale $M$ is. A uniqueness theorem puts limitations on the kind of mathematical manipulations that can be performed on the measurement values, i.e., on the images of $M$ in $N$. Although one can always perform operations like adding, averaging, and taking logarithms of real numbers, the key question is whether the results of these operations give meaningful results about the objects being measured.

2.1 Regular Scales and Meaningfulness

The theory of scale types and meaningfulness of operations provides a mechanism within which to reason about when statements about measurement make sense. Consider the following statements:

Statement 1: The number of errors discovered during integration testing of program X was at least 100.
Statement 2: The cost of fixing each error in program X is at least 100.
Statement 3: A semantic error takes twice as long to fix as a syntactic error.
Statement 4: A semantic error is twice as complex as a syntactic error.

Arguing at a purely intuitive level, statement 1 seems to make sense but statement 2 does not, for the number of errors may be specified without reference to a particular scale, whereas the cost of fixing an error cannot. Statement 3 seems to make sense (even if we think it cannot possibly be true) whereas statement 4 does not, for the ratio of time taken is the same regardless of the scale of measurement used (if a semantic error takes twice as many minutes to repair as a syntactic error, it also takes twice as many hours, seconds, years, etc.). But the ratio of “complexity” (which is ambiguous anyway in this statement) is not necessarily the same.

Measurement theory allows us to determine the meaningfulness of statements like those above. This determination is based on the uniqueness of the homomorphisms satisfying the representation condition. We have seen that it is possible for there to be more than one homomorphism from an empirical relational system $\mathcal{C}$ into a numerical relational system $\mathcal{N}$. What we want to determine is the allowable ways of transforming one acceptable homomorphism into another acceptable homomorphism. This information will tell us, for example, when statements like number 3 above are meaningful.

To make our discussions simpler, we restrict our attention to numerical relational systems whose underlying set is $\mathbb{R}$. In this case the scale $(\mathcal{C}, \mathcal{N}, M)$ is called simply a numerical scale.

We need to introduce the notion of an admissible transformation. Consider the example of measuring (the usual notion of) length of physical bodies. Given any numerical assignment $M$, i.e., a mapping that satisfies the representation condition for all the relations of length, it is easy to see that any scalar multiple $\alpha M$ of $M$ also satisfies the representation condition (provided $\alpha > 0$). The transformation $\phi$ that maps $M$ into $\alpha M$ is called admissible because under the transformation the numerical assignment still preserves the representation condition. Generally and formally we define:

Definition 2.1. Suppose $M : \mathcal{C} \to \mathcal{N}$ is a homomorphism where $\mathcal{C} = (C, \mathcal{R})$ and $\mathcal{N} = (N, \mathcal{P})$. Suppose that $\phi$ is a function that maps the range of $M$, the set $M(C) = \{M(c) : c \in C\}$, into the set $N$. Then the composition $\phi \circ M$ is a function from $C$ into $N$. If $\phi \circ M$ is a homomorphism from $\mathcal{C}$ into $\mathcal{N}$, we call $\phi$ an admissible transformation of scale.

Definition 2.2. If $(\mathcal{C}, \mathcal{N}, M)$ is a scale such that for every scale $(\mathcal{C}, \mathcal{N}, M')$ there is a transformation $\phi : M(C) \to N$ for which $M' = \phi \circ M$, then the scale $(\mathcal{C}, \mathcal{N}, M)$ is called regular. If every homomorphism $M$ from $\mathcal{C}$ into $\mathcal{N}$ is regular, we call the representation $\mathcal{C} \to \mathcal{N}$ regular.

Thus, a representation $\mathcal{C} \to \mathcal{N}$ is regular if, given any two scales $M$ and $M'$, each can be mapped into the other by an admissible transformation. Most scales encountered within software measurement are regular. For regular representations, we have the following definition of meaningfulness.

Definition 2.3. A statement involving numerical scales is meaningful provided its truth is invariant under all admissible transformations.

If a representation $\mathcal{C} \to \mathcal{N}$ is regular, i.e., all scales $(\mathcal{C}, \mathcal{N}, M)$ are regular, then the class of admissible transformations defines how unique each scale is, and these admissible transformations can be used to define the scale type. The most important scale types are summarized in Figure 1. Thus, for example, if the collection of admissible transformations is closed under all 1-1 mappings, then the scale type of the measure is nominal; and if it is closed under all scalar multiplications (as for height), then the scale type is ratio. Now we can decide which operations can be meaningfully performed on given measurements.

Scale type   Admissible transformations                          Examples
Nominal      M' = F(M), F a 1-1 mapping                          Labels
Ordinal      M' = F(M), F monotonic increasing,                  Preference, hardness, air quality,
             i.e., M(x) > M(y) => M'(x) > M'(y)                  intelligence tests (raw scores)
Interval     M' = aM + b (a > 0)                                 Time (calendar), temperature (Fahrenheit,
                                                                 Centigrade), intelligence tests (“standard scores”)
Ratio        M' = aM (a > 0)                                     Time interval, length, temperature (absolute)
Absolute     M' = M                                              Counting

Figure 1. Scales of measurement.

The mathematical aspects of measurement theory are largely concerned with theorems that assert conditions under which certain scales of direct measurement are possible for certain (numerical) relational systems. A typical example of such a theorem, due to Cantor (see [14]), gives necessary and sufficient conditions for ordinal measurement in an important class of relational systems over a countable set:

Theorem 2.4. Suppose $C$ is a countable set and $R$ is a binary relation on $C$. Then there is a real-valued function $M$ on $C$ satisfying

$$xRy \iff M(x) > M(y)$$

if and only if $(C, R)$ is a strict weak order. Moreover, if there is such an $M$, then $\mathcal{C} = (C, R) \to \mathcal{N} = (\mathbb{R}, >)$ is a regular representation and $(\mathcal{C}, \mathcal{N}, M)$ is an ordinal scale.

Note: A strict weak order $R$ over $C$ is asymmetric ($xRy \Rightarrow \neg(yRx)$ for all $x, y \in C$) and negatively transitive ($xRy \Rightarrow xRz$ or $zRy$ for all $x, y, z \in C$).

This abstract result has important ramifications for so-called complexity measures. Many attempts have been made to find a single real-valued function to characterize program complexity. The most well known is McCabe’s [10] cyclomatic complexity. Programs form a countable class of objects $C$ and, thus, the complexity functions defined are (ordinal) measures of “complexity” if and only if the relation $R$: “$x$ is more complex than $y$” is a strict weak order. It is our contention that no general notion of complexity can give rise to such an order because negative transitivity is not a reasonable expectation. In Figure 2, it seems plausible that $xRy$ but that neither $xRz$ nor $zRy$ holds. However, if we were to assume that McCabe’s cyclomatic complexity $M$ were a valid measure of complexity, then since $M(x) = 3$ and $M(y) = M(z) = 2$, it would be the case that $xRz$, i.e., $x$ is defined to be more “complex” than $z$. But whether $x$ is or is not more complex than $z$ is not clear; therefore, we do not want our “complexity measure” to tell us that it is.

[Figure 2. Complexity relation not negatively transitive? (three flowgraphs labeled x, y, and z)]

In addition to giving us insight into the usability of complexity measures, Theorem 2.4 can also be used to construct ordinal scale measures of specific complexity attributes (like control-flow path structuredness) that do reasonably lead to strict weak orders. It is worth noting that the real task is first to establish the relation, i.e., the document ordering, since the ordinal measure then follows naturally.

One of the most powerful applications of the theory of scale types and meaningfulness is in determining what types of operations or statistical analyses can be sensibly applied to particular types of measures. Rather than presenting a comprehensive account of this work (the reader can find such accounts in Ellis [4] and Roberts [14]), we restrict our attention to an important example that has significant ramifications for many software measures.

One of the most common operations that we wish to apply to a set of measurements is to determine the average in some sense. Suppose that $M$ is a measure and that $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_m\}$ are two sets of objects for which we know $M(x_i)$ and $M(y_j)$ in each case. We want to be able to determine whether the average measure in $X$ is greater than the average measure in $Y$. Thus, we want the statement

The average of the $M(x_i)$’s is greater than the average of the $M(y_j)$’s

to be meaningful. The most common way to compute averages is to use the arithmetic mean. In this case we would want the statement

$$\frac{1}{n}\sum_{i=1}^{n} M(x_i) > \frac{1}{m}\sum_{j=1}^{m} M(y_j) \qquad (1)$$

to be meaningful. We know that it is meaningful if for all admissible transformations $\phi$, Eq. (1) holds if and only if

$$\frac{1}{n}\sum_{i=1}^{n} (\phi \circ M)(x_i) > \frac{1}{m}\sum_{j=1}^{m} (\phi \circ M)(y_j). \qquad (2)$$

It is easy to see that if $\phi$ is a transformation of type $\phi(x) = \alpha x$, $\alpha > 0$, or even of type $\phi(x) = \alpha x + \beta$, $\alpha > 0$, then Eq. (1) holds if and only if Eq. (2) holds. Thus, Eq. (1) is meaningful if $M$ is either a ratio or interval scale, and we can meaningfully take the arithmetic mean of such measures. However, Eq. (1) is not meaningful if the scale of $M$ is ordinal. To see this, suppose we have an ordinal scale for software “complexity” that is a ranking of five different classes of complexity:

Class               M1    M2
Trivial              1     1
Simple               2     2
Moderate             3     3
Complex              4     4
Incomprehensible     5    10

The last two columns represent two perfectly acceptable ordinal scales of measurement in this case, which we shall call $M_1$ and $M_2$, respectively. There is an admissible transformation mapping $M_1$ into $M_2$, i.e., the monotonic mapping $\{1 \to 1, 2 \to 2, 3 \to 3, 4 \to 4, 5 \to 10\}$. Now suppose $S_1, S_2, S'_1, S'_2$ are software systems for which $M_1(S_1) = 3$, $M_1(S_2) = 4$, $M_1(S'_1) = 1$, $M_1(S'_2) = 5$. We hope to be able to say which of the sets $\{S_1, S_2\}$ and $\{S'_1, S'_2\}$ has the greatest average complexity; however, this is not meaningful because

$$3.5 = \tfrac{1}{2}\bigl(M_1(S_1) + M_1(S_2)\bigr) > \tfrac{1}{2}\bigl(M_1(S'_1) + M_1(S'_2)\bigr) = 3$$

whereas

$$3.5 = \tfrac{1}{2}\bigl(M_2(S_1) + M_2(S_2)\bigr) < \tfrac{1}{2}\bigl(M_2(S'_1) + M_2(S'_2)\bigr) = 5.5.$$

Since the latter comparison is derived from the former by an admissible transformation for the scale $M_1$, and the inequality does not hold in the latter, the former is not meaningful. All is not lost because there is a measure of “average” value that is meaningful for ordinal scale measures. This is the median value, i.e., the value of the middle ranked item. Note also that for nominal scale measures, the median is not a meaningful measure of average, but the mode (most commonly occurring class of item) is meaningful.
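Two of the checks in this section can be tried out on small finite examples. The following Python sketch is illustrative only and is not part of the original article: the function names are hypothetical, the class assignments for the four systems are chosen to match the averages 3.5, 3, and 5.5 in the example above, and `median_low` is used as one way of taking "the value of the middle ranked item" for an even number of items. Agreement on two scales does not prove invariance under every monotonic transformation; it only shows that the mean already fails on this pair while the median does not.

```python
from statistics import mean, median_low

# The two ordinal scales M1 and M2 from the five-class ranking.
M1 = {"Trivial": 1, "Simple": 2, "Moderate": 3, "Complex": 4, "Incomprehensible": 5}
M2 = {"Trivial": 1, "Simple": 2, "Moderate": 3, "Complex": 4, "Incomprehensible": 10}

# Assumed class assignments: M1(S1)=3, M1(S2)=4, M1(S'1)=1, M1(S'2)=5.
GROUP_A = ["Moderate", "Complex"]           # S1, S2
GROUP_B = ["Trivial", "Incomprehensible"]   # S'1, S'2


def exceeds(stat, scale, xs, ys):
    """True if stat over xs is greater than stat over ys on the given scale."""
    return stat([scale[c] for c in xs]) > stat([scale[c] for c in ys])


def same_verdict(stat, xs, ys, scales):
    """True if 'stat(xs) > stat(ys)' has the same truth value on every scale."""
    return len({exceeds(stat, s, xs, ys) for s in scales}) == 1


def is_strict_weak_order(elements, rel):
    """Check asymmetry and negative transitivity of a finite relation.

    rel is a set of ordered pairs (x, y) meaning xRy.
    """
    asymmetric = all((y, x) not in rel for (x, y) in rel)
    negatively_transitive = all(
        (x, z) in rel or (z, y) in rel
        for (x, y) in rel
        for z in elements
    )
    return asymmetric and negatively_transitive


# The arithmetic mean gives opposite verdicts on M1 and M2 ...
mean_is_stable = same_verdict(mean, GROUP_A, GROUP_B, [M1, M2])
# ... whereas the (low) median gives the same verdict on both.
median_is_stable = same_verdict(median_low, GROUP_A, GROUP_B, [M1, M2])

# Intuitive relation discussed for Figure 2: only xRy holds.
intuitive = is_strict_weak_order({"x", "y", "z"}, {("x", "y")})
# Relation induced by M(x)=3, M(y)=M(z)=2: xRy and xRz.
induced = is_strict_weak_order({"x", "y", "z"}, {("x", "y"), ("x", "z")})

print(mean_is_stable, median_is_stable, intuitive, induced)
```

The intuitive relation fails negative transitivity (neither xRz nor zRy holds), so by Theorem 2.4 no ordinal measure can represent it, whereas the relation induced by a numeric function is a strict weak order by construction.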


3. INTERNAL SOFTWARE STRUCTURES

Most of the software engineering methods that have been proposed and developed in the last 25 years provide rules, tools, and heuristics for producing software products, and in particular these methods show how to provide structure. This structure is found at two levels:

• In the development process, i.e., certain products need to be produced at certain relative stages.
• In the products themselves, i.e., the products must conform to certain internal structural principles.

In the second case, we emphasize that the use of software engineering methods leads to the construction of products whose distinguishing properties are internal and almost entirely structural, e.g., modularity, reuse, low coupling, high cohesiveness, redundancy, D-structuredness, hierarchy, and data-abstraction. Furthermore, there is an almost axiomatic assumption among software engineering experts that the existence of these internal structural attributes will assure the existence of the external quality attributes expected by software users, e.g., reliability, maintainability, and usability, and the process attributes expected by managers, e.g., productivity and cost-effectiveness. If this assumption is incorrect, then almost all software engineering research and development has been worthless. A more thorough argument to support this thesis may be found in Johnson and Loomes [8].

However, in spite of the important connections between the internal structure of software documents and the external and process attributes of the software, there have actually been very few scientific attempts to establish specific relationships between the internal and external attributes. An important reason for this is the lack of understanding of how to measure important internal attributes of software products. We believe that measurement theory as introduced above can provide the relevant basis for deriving measures of internal attributes. The authors have already applied these ideas to measures of control flow attributes [5, 11], but these are only really applicable to source code. In the next section, we illustrate how these ideas can be used to derive a measure of an important attribute of designs, i.e., coupling.

J. SYSTEMS SOFTWARE 1990; 12: 177-187

183

4. COUPLING

The properties considered in this section are attributes of any software product for which the underlying modules are apparent. Examples of such products are the source code and, perhaps more importantly, any system design document (and even certain types of specification documents) in which the system modules and their interrelationships are described. Since a design or specification document describes a software system before the system is implemented, measures of important attributes of such documents may provide useful information about the system at an early phase of development. Unless otherwise stated, we shall assume that the products in question are design documents detailing the module interrelationships.

Texts such as that by Pressman [12] define coupling as a measure of the degree of interdependence between modules. The definition is ambiguous, since it is unclear whether coupling is an attribute of the design as a whole or of each pair of modules. We assume the latter, since this is implicit in Pressman [12] and, in any case, the former, which we shall refer to as global coupling, ought to be derivable from the latter.

There is something else that is unsatisfactory about the above definition. Although coupling is defined as a “measure,” there appears to have been no serious attempt to provide a numerical characterization of this attribute. This is particularly strange because there appear to be well-established empirical relations about coupling that indicate at least an ordinal scale of measurement. First, we mention some binary relations that exist on pairs of modules $x$, $y$:

$R_5$: $(x, y) \in R_5$ if $x$ refers to the inside of $y$, i.e., it branches into, changes data in, or alters a statement in $y$. The type of coupling characterized by the relation $R_5$ is called content coupling.

$R_4$: $(x, y) \in R_4$ if $x$ and $y$ refer to the same global data. The type of coupling characterized by the relation $R_4$ is called common coupling. This type of coupling is undesirable because if the format of the global data needs to be changed, then all common coupled modules must also be changed.

$R_3$: $(x, y) \in R_3$ if $x$ passes a parameter to $y$ with the intention of controlling its behavior, i.e., the parameter is a flag. The type of coupling characterized by the relation $R_3$ is called control coupling.

$R_2$: $(x, y) \in R_2$ if $x$ and $y$ accept the same record type as a parameter. The type of coupling characterized by the relation $R_2$ is called stamp coupling. This type of coupling may manufacture an interdependency between otherwise unrelated modules.

$R_1$: $(x, y) \in R_1$ if $x$ and $y$ communicate by parameters, each one being either a single data element or a homogeneous set of data items that do not incorporate any control element. The type of coupling characterized by the relation $R_1$ is called data coupling. This type of coupling is necessary for any communication between modules.


$R_0$: $(x, y) \in R_0$ if $x$ and $y$ have no communication, i.e., are totally independent. The type of coupling characterized by the relation $R_0$ is called no coupling.

With this classification of types of coupling, there is an empirical ordering of the types as shown in Figure 3, where the order of worst type of coupling to best runs from top to bottom.

Figure 3. Types of coupling, from bad (top) to good (bottom):
5. Content coupling
4. Common coupling
3. Control coupling
2. Stamp coupling
1. Data coupling
0. No coupling

At this point, it is desirable to have a precise model for studying coupling. The model we choose is a directed, labeled multigraph, i.e., a directed and labeled graph that may have many arcs between two nodes. The label on each arc is an ordered pair in which the first component represents a type of coupling as shown in Figure 3, and the second component represents the number of times the given type of coupling occurs between the nodes (in a directed sense). In Figure 4, we have part of a coupling-model graph. This graph represents four modules where

• modules $M_1$ and $M_2$ share two common record types.
• module $M_1$ passes to module $M_3$ a parameter that acts as a flag in $M_3$.
• module $M_2$ branches into module $M_4$ and also passes two parameters that act as flags in $M_4$.

Figure 4. A coupling-model graph.

We say that there is a relation $\prec$ defined on the set of pairs of modules; the relation is: if $(x, y)$ and $(x', y')$ are pairs of modules, then

$$(x, y) \prec (x', y')$$

when there is a stronger level of coupling between $(x', y')$ than between $(x, y)$. This relation has the property that if $(x_i, y_i)$ is any pair of modules in $R_i$ for $i = 0, \ldots, 5$, then

$$(x_0, y_0) \prec (x_1, y_1) \prec (x_2, y_2) \prec (x_3, y_3) \prec (x_4, y_4) \prec (x_5, y_5),$$

i.e., any pair of modules that are content coupled have a stronger level of coupling than any pair that are common coupled, which in turn have a stronger level of coupling than those that are control coupled, etc. The simplest measure of pairwise coupling that preserves the relations identified would be a mapping like

$$M(x, y) = i \quad \text{if } (x, y) \in R_i, \quad i = 0, \ldots, 5,$$

i.e., the measure of coupling for those modules that have no coupling is 0, for those that have data coupling is 1, ..., and for those with content coupling is 5. A problem with this approach is that we are unable to take account of the different degrees of coupling that can occur inside the six classes identified. For example, it seems intuitive that if $(x, y)$ and $(x', y')$ are both pairs of content coupled modules, i.e., are both in $R_5$, but $x$ makes only one reference to the inside of $y$ and $x'$ makes several references to the inside of $y'$, then the level of coupling between $(x', y')$ is stronger than between $(x, y)$, i.e.,

$$(x, y) \prec (x', y').$$

To implement such a measure, we need to count the number of interconnections of each type between pairs of modules. This would allow us to give measures of each type of coupling, e.g., the amount of content coupling between modules $x$ and $y$ could be seven interconnections; but this counting method does not easily suggest a single measure of coupling between modules. If we want a single measure of coupling between modules $x$, $y$ that preserves all the relations identified so far, then the following is such a measure on an ordinal scale:

$$M(x, y) = i + \frac{n}{n + 1}$$

where $i$ is the above measure of greatest coupling type, e.g., $i = 3$ if $x$, $y$ have control coupling, and $n$ is the number of interconnections between $x$ and $y$.

Having established some intuitively reasonable measures of coupling in the sense of pairwise coupling between modules, we turn our attention to the case of global coupling, which may also be thought of as a notion of connectivity of the module design chart. We again consider empirical relations that exist for this attribute. One such relation can be encapsulated as an intuitive axiom for coupling:

Axiom 1: If $D$ and $D'$ are module design charts (hereafter referred to simply as structure charts), such that the only difference between $D'$ and $D$ is the inclusion of an extra interconnection in $D'$, i.e., there is an extra arc in $D'$, then $D'$ has a greater level of coupling than $D$.
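The coupling-model graph and the pairwise measure discussed before Axiom 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CouplingType`, `CouplingGraph`, and `pairwise` are hypothetical, the measure is assumed to have the form $M(x, y) = i + n/(n+1)$, arc direction is ignored when a pair is evaluated, and $n$ is taken to be the total number of interconnections of all types. The example arcs loosely follow the description of Figure 4; the source module of the content-coupling arc is our reading of that description.

```python
from collections import defaultdict
from enum import IntEnum
from fractions import Fraction


class CouplingType(IntEnum):
    """Coupling classes numbered as in Figure 3 (a higher number is worse)."""
    NONE = 0
    DATA = 1
    STAMP = 2
    CONTROL = 3
    COMMON = 4
    CONTENT = 5


class CouplingGraph:
    """Directed, labeled multigraph; each arc is labeled (type, count)."""

    def __init__(self):
        # (source, target) -> {coupling type: number of interconnections}
        self._arcs = defaultdict(dict)

    def add_arc(self, source, target, ctype, count=1):
        labels = self._arcs[(source, target)]
        labels[ctype] = labels.get(ctype, 0) + count

    def pairwise(self, x, y):
        """Assumed measure M(x, y) = i + n/(n + 1); direction is ignored."""
        labels = []
        for key in ((x, y), (y, x)):
            labels.extend(self._arcs.get(key, {}).items())
        if not labels:
            return Fraction(0)  # no coupling between x and y
        i = max(int(ctype) for ctype, _ in labels)  # worst type present
        n = sum(count for _, count in labels)       # all interconnections
        return i + Fraction(n, n + 1)


# Hypothetical reading of the four-module example.
g = CouplingGraph()
g.add_arc("M1", "M2", CouplingType.STAMP, 2)
g.add_arc("M1", "M3", CouplingType.CONTROL, 1)
g.add_arc("M2", "M4", CouplingType.CONTENT, 1)
g.add_arc("M2", "M4", CouplingType.CONTROL, 2)

for pair in [("M1", "M2"), ("M1", "M3"), ("M2", "M4"), ("M3", "M4")]:
    print(pair, g.pairwise(*pair))
```

Because $n/(n+1)$ is always less than 1, the count of interconnections refines the ordering inside a class without ever moving a pair into the next class. Exact fractions are used so that comparisons reflect the ordering only, in keeping with the ordinal scale.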

If we look for a measure that preserves this axiom, then the simplest sort would be the sum, taken over all pairs of modules in the structure chart, of the coupling measures defined above. This is in fact a sophisticated version of some crude global coupling measures that have been proposed and that are simple counts of the total number of arcs. However, such an approach appears to be mixing in a separate notion, i.e., size. Global coupling seems to be understood as a measure of a kind of “average” level of pairwise coupling, which could be characterized by the axiom

Axiom 2: Suppose $S$ is a system consisting of two modules $D_1$ and $D_2$, so that the global coupling of $S$ is just the pairwise coupling between $D_1$, $D_2$. If $S$ is extended to $S'$ by the addition of module $D_3$, and if the pairwise coupling between $D_1$, $D_3$ and between $D_2$, $D_3$ are both equal to the pairwise coupling between $D_1$, $D_2$, then the global coupling of $S$ and $S'$ are the same.
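The way a summed measure mixes in size can be seen in a small sketch. The function name `total_coupling` and the pairwise value used are hypothetical; the system is the one described in Axiom 2, where every pair of modules has the same pairwise coupling.

```python
from fractions import Fraction
from itertools import combinations


def total_coupling(modules, pairwise):
    """Sum of pairwise coupling over all unordered pairs of modules."""
    return sum(pairwise(a, b) for a, b in combinations(modules, 2))


# Hypothetical value shared by every pair, as in the setting of Axiom 2.
SAME = Fraction(3, 2)


def uniform(a, b):
    return SAME


s = total_coupling(["D1", "D2"], uniform)
s_prime = total_coupling(["D1", "D2", "D3"], uniform)
print(s, s_prime)
```

Adding the third module triples the sum even though no pair is more strongly coupled than before, so a summed measure satisfies Axiom 1 but cannot satisfy Axiom 2.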

Note that the “average” is increased if exactly one arc is added (as in Axiom 1) but is not necessarily increased if another module with some new interconnections is added. In fact, the overall level of coupling may decrease. For example, control coupling between two modules may be “reduced” to data coupling by the introduction of an extra module and two extra arcs. There are two ways to think of the global coupling as having been reduced:

1. because the total control coupling is now zero, and control coupling “dominates” data coupling.
2. because the “average” coupling between module pairs is reduced.

The second approach seems more appealing, but there is a problem. We have acknowledged that at present we cannot do better than an ordinal scale for the general notion of coupling, and we have noted in the last section that there is no meaningful notion of average in the arithmetic mean sense for such measures. However, there is a meaningful notion of average if we consider median values. Thus, we may define an intuitively reasonable measure of global, i.e., average system level, coupling as follows:

The global coupling of a system $S$ consisting of modules $\{D_1, \ldots, D_n\}$ is given by the median value of the set $\{M(D_i, D_j) : 1 \le i < j \le n\}$, where $M$ is the measure of pairwise coupling given above.
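The median-based definition can be sketched as follows. The function names and the two small systems are hypothetical, and the pairwise values assume a measure of the form $i + n/(n+1)$. We use `statistics.median_low` so that the result is always one of the observed pairwise values; this is our choice for an ordinal scale, since the ordinary median averages the two middle values when the number of pairs is even.

```python
from fractions import Fraction
from itertools import combinations
from statistics import median_low


def global_coupling(modules, pairwise):
    """Median (low) of the pairwise coupling over all unordered pairs."""
    values = [pairwise(a, b) for a, b in combinations(modules, 2)]
    return median_low(values)  # raises StatisticsError if there are no pairs


def lookup(table):
    """Turn a table keyed by unordered pairs into a pairwise function."""
    return lambda a, b: table[frozenset({a, b})]


# Hypothetical "before": A and B are control coupled by one flag.
before = {frozenset({"A", "B"}): Fraction(7, 2)}

# Hypothetical "after": an extra module C and two data-coupling arcs
# replace the control coupling; A and B no longer communicate directly.
after = {
    frozenset({"A", "B"}): Fraction(0),
    frozenset({"A", "C"}): Fraction(3, 2),
    frozenset({"B", "C"}): Fraction(3, 2),
}

g_before = global_coupling(["A", "B"], lookup(before))
g_after = global_coupling(["A", "B", "C"], lookup(after))
print(g_before, g_after)
```

In this sketch the global coupling falls although a module and two arcs were added, which matches the second of the two readings above.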


In one of the few references in which a serious attempt has been made to relate coupling to some measures, Troy and Zweben [15] identify a range of different counts that are obtainable from the structure chart and suggest that all of these “contribute” to coupling. While this is useful in helping us to further our understanding of coupling, it moves away from the idea that coupling is itself a measurable attribute. It is also interesting to note that most of the proposed counts suggest a view of coupling as being a property of individual modules (rather than pairs), i.e., the extent to which each module is coupled to others. In this sense, the attribute is dealt with elsewhere, where some of these measures have a useful value in their own right. The measures of Troy and Zweben, which interpret coupling in the global sense as an “average” view per module as opposed to pairs of modules as suggested here, include:

• the maximum number of interconnections per module.
• the average number of interconnections per module.
• the total number of interconnections per module.
• the number of modules accessing control interconnections.
• the number of data structure interconnections to the top level module.

Troy and Zweben have attempted to test the following hypothesis:

Programs with high coupling contain more errors than programs with low coupling.

The problem is that without a proposed measure of coupling, the hypothesis is untestable. What Troy and Zweben have done is attempt to find linear correlations of all the various “counts” contributing to coupling against the number of errors recorded for various programs. This does not test the hypothesis above! With the measure of coupling we suggest, a more interesting hypothesis (which gets rid of many of the experimental design flaws) could be properly tested:

For programs of similar length and levels of functionality, those with higher coupling contain more errors.

In fact, a number of other similar hypotheses could be tested, e.g., “contain more errors” could be replaced by “are more difficult to maintain.” What is interesting and novel about our approach is that it may lead to measures of coupling and the like being used constructively in design. For example, we must dispel the idea of simple linear correlations with errors; it seems far more plausible that for a program of a given size, there is an optimal level of coupling with regard to likely errors (and this optimal level will certainly not be zero!). It is this kind of numerical standard that may be achievable.


It should be noted that our approach differs significantly from that of Kafura and Henry [9], even though the objects for measurement, i.e., module design structure charts, are effectively the same. An important difference is that we are not attempting to provide a general design “complexity” measure, but rather we are defining a measure of a very specific and yet important property of these design structures. Moreover, unlike the Kafura and Henry measure, the measure proposed here is based on (we hope) unarguable axioms about the property being measured.

5. CONCLUSIONS AND FUTURE WORK

One of our major claims is that researchers in software measurement have been overlooking an important claim of software engineering, i.e., that there are strong connections and relationships between internal attributes of software and important external attributes. This observation is potentially very important for software measurement because it is much easier to get direct measurements of internal attributes than of external ones. Thus, if we can define good, accurate measures for internal attributes, and if we can determine the relationships between the internal and external attributes, then we can begin to get accurate measurements (or at least good estimates) of the external attributes that are so important to many software engineers.

We also claim that researchers in software measurement have failed to take advantage of measurement theory. As we show in this article, measurement theory can help us see how much we can reasonably expect our measures to do. For example, just because we have a function defined from a set of program documents to the set of real numbers, it does not follow that we can average the values associated with a subset of the documents and get a useful value.

Measurement theory’s most important contribution to our work may be as a guide in the way we approach our research. From measurement theory, we realize that it is clearly not enough to define functions from document sets to a set of numbers. Before we try to define functions, we need to understand the properties of what we are trying to measure. Of course, we must have a well-defined set of documents, and we must decide which attributes we want to measure. Furthermore, we need to understand the properties of this attribute or of these attributes. Last, we find or define a number system that has these same properties. If we use the real numbers as our answer set, then we cannot perform more operations on these answers than are allowable from the relations and axioms defined on the document set.

As we try to determine the relationships between internal and external attributes, the “spirit” of measurement theory can still be helpful. We are not just trying to define functions or relations between the internal and external attributes; we must determine what properties each attribute has and have some theory that can relate the attributes and their properties. This direction of work holds promise for software measurement, and it may also help to explain why certain software engineering methods work so well. In a real sense, software measurement is not just following software engineering and trying to put numbers on what software engineering does. Software measurement is understanding the foundations of software engineering, and then our supplying of the “numbers” is a natural and relatively easy task.

If we want to really evaluate the efficacy of software engineering methods, then we have an initial obligation to determine the extent to which a given method has actually been applied. As we have argued, this invariably means determining the extent to which certain documents and internal properties of these documents are present. Thus, rigorous measurement of specific internal attributes is a necessary first step in software methods evaluation, as well as in the evaluation of the relationship between internal and external attributes.

ACKNOWLEDGMENTS

Research for this work is supported in part by NATO Collaborative Research Grant 034/88. Research (N. F.) is supported in part by ESPRIT project PDCS and by British Telecom. Research (A. M.) is supported in part by ONR Grant N00014-88-K-0455.

REFERENCES

1. A. L. Baker, J. M. Bieman, N. E. Fenton, D. A. Gustafson, A. C. Melton, and R. W. Whitty, A philosophy for software measurement, J. Syst. Software 12, 277-281 (1990).
2. M. Bush and N. E. Fenton, Software measurement: A conceptual framework, J. Syst. Software 12, 223-231 (1990).
3. T. DeMarco, Controlling Software Projects, Prentice-Hall, Englewood Cliffs, New Jersey, 1982.
4. B. Ellis, Basic Concepts of Measurement, Cambridge University Press, Cambridge, England, 1966.
5. N. E. Fenton, Software measurement: Theory, tools and validation, IEE Software Engineering J. 4, 56-68 (1990).
6. L. Finkelstein, Representation by symbol systems as an extension of the concept of measurement, Kybernetes 4, 215-223 (1975).
7. F. Hoyle, Sir Hermann Bondi-70th birthday: Reminiscences from the impressionable years, Bulletin IMA 25, 282-284 (1989).
8. J. Johnson and M. Loomes, eds., The Mathematical Revolution Inspired by Computing, in The Mathematics of Complexity of Software Engineering and Computer Science, Oxford University Press, London, 1990.
9. D. Kafura and S. Henry, Software quality metrics based on interconnectivity, J. Syst. Software 2, 121-131 (1981).
10. T. J. McCabe, A complexity measure, IEEE Trans. Software Engineering SE-2, 308-320 (1976).
11. A. C. Melton, D. A. Gustafson, J. M. Bieman, and A. L. Baker, Mathematical perspective of software measures research, IEE Software Engineering J. 1990, in press.
12. R. S. Pressman, Software Engineering: A Practitioner’s Approach, McGraw Hill, New York, 1987.
13. Proc. Joint SHARE/GUIDE Symp., Measuring Application Development Productivity, 1979.
14. F. S. Roberts, Measurement Theory with Applications to Decisionmaking, Utility, and the Social Sciences, Addison Wesley, Reading, Massachusetts, 1979.
15. D. A. Troy and S. H. Zweben, Measuring the quality of structured design, J. Syst. Software 2, 113-120 (1981).