
COMPLEXITY ASSESSMENT: A DESIGN AND MANAGEMENT TOOL FOR INFORMATION SYSTEM DEVELOPMENT

JEFFREY E. KOTTEMANN and BENN R. KONSYNSKI
Management Information Systems, University of Arizona, Tucson, AZ 85721, U.S.A.

(Received 7 October 1981; in revised form 1 June 1982)

Abstract-Information system complexity is related to many software engineering metrics. Design, construction and maintenance difficulty can be associated with their respective complexity metrics. Complexity measurement algorithms for information system schemas are considered. A set of properties is proposed that serves as a basis for complexity assessment throughout the life cycle. Heretofore, the emphasis in system engineering has been the analysis of software program complexity. Current work in software complexity, as well as new alternative measures, is discussed. Complexity measurement is extended to include other major phases of the system life cycle. Benefits include the evaluation of complexity propagation between system development life cycle phases. Complexity projection provides for improved management of the development process. The implementation of the Dynamic Complexity Monitor (DYCOM) is discussed. DYCOM is comprised of analyzers and models useful in both research and industrial environments. DYCOM supports complexity assessment for requirement definitions, logical and physical design schemata, and implementation and maintenance. The system determines complexity projection functions between life cycle phases. Further, projection functions are automatically updated as the system is used: DYCOM "learns" from system development experience. Also, proposed complexity models can easily be added to DYCOM in order to evaluate their utility. DYCOM serves as a laboratory for examination of alternative complexity metrics.

INTRODUCTION

Decisions are based on information, value systems and evaluation procedures. Information is frequently presented in terms of "measurements" that communicate such things as "status", "environment", "performance", etc. Measurements reflect the "values" of the observer (individual or system), and may bias the decision-making process and consequently the resulting decision. This paper addresses complexity issues related to the decision-making process in the information system development process. The intent of the discussion is to explore complexity measurements that may be useful in the information system design process and to discuss the implementation of a tool, the Dynamic Complexity Monitor, which is used to automate complexity assessment. The emphasis of the inquiry is on the technical system characteristics, both static and dynamic. Nevertheless, many of the issues and concerns presented are applicable to important behavioral issues.

The nature of complexity

Complexity is a phenomenon of observation. It is easily recognised but difficult to formally define. One feels that one can make relative comparisons and partial orderings, yet explicit quantification eludes us. If one were to ask an automobile mechanic, "What is complexity?", she may well answer, "The complexity of what?" If one were to ask her, however, "What is more complex, a carburetor system or a fuel-injection system?", she would answer, "A fuel-injection system." Asked why, she might say, "Because when I have to trouble-shoot a fuel-injection system, it usually takes me twice as long."

Complexity as measurement

It is less difficult to address complexity when we introduce a context of evaluation, a viewpoint and focus for observation. By providing a context (i.e. by specifying certain tasks to be performed with certain objects), the meaning of complexity becomes more formal, and one can determine the relative difficulty of performing tasks. Furthermore, the measurement of relative difficulty can be made in terms of a unit that is meaningful in the context of the application (e.g. time to do the task). Complexity is closely allied to the notion of difficulty. Within a given context, complexity is recognisable as a measurable cost. The determination of the contexts of system complexity and the respective cost functions is one tool in the process of moving program/system design from (basically) an art to (basically) a science, or at least an engineering discipline.

Information system complexity is related to many software engineering metrics. Design, construction, and maintenance difficulty can be associated with their respective complexity measures. Throughout the system life cycle, measures of complexity can serve as design quality metrics. Further, design quality can be viewed as the potential propagation of complexity from a given life cycle phase to subsequent phases. This paper discusses complexity in the system development life cycle. A grouping of phases in the life cycle is proposed that forms realms of common complexity concerns; within each of the realms, a basis of complexity properties is discussed which offers a unified view of complexity throughout the life cycle. The majority of research into complexity focuses on software complexity. Selected metrics for assessing software complexity are discussed in Section 2. The following sections discuss complexity assessment in other life cycle phases, the implementation of an automated system (DYCOM) for complexity measurement and forecasting, and the relationship between complexity and productivity measurement.

AN IS COMPLEXITY MODEL FRAMEWORK

The system life cycle is one useful model of system development and evolution. The components that occur in the system life cycle are shown below.

(1) Requirements definition. (2) Logical design. (3) Physical design. (4) Construction. (5) Testing. (6) Implementation. (7) Operation. (8) Maintenance and modification.

For purposes of this discussion, and for the purpose of implementing the Dynamic Complexity Monitor discussed below, we have partitioned these life cycle components into five sets which we call complexity realms, denoted by R. The choice of realm partitions is based upon temporal and operational relationships between life cycle phases.

R1 = [1] Requirement definition.
R2 = [2] Logical design.
R3 = [3, 4, 5, 6] Physical design through implementation.
R4 = [7] System performance.
R5 = [8] Maintenance and modification.

R1 deals with the complexity of the requirement specifications. R2 is concerned with the complexity of the logical design of the system. R3 involves the complexity of designing, constructing, and implementing a physical design (software) that is complete and consistent with the logical design. R4 is concerned with the complexity with respect to the overall operating efficiency of the object system; in the context of a program, R4 is akin to computational complexity. R5 addresses the flexibility and reliability inherent in the system implementation.

Partitioning the system development life cycle into realms is justified for several reasons. First, the sets of attributes applicable to the different realms are to some extent disjoint. In other words, some attributes are relevant to only a subset of the realms. Separation into realms takes into account the context dependencies of complexity. Second, several attributes span realms. This spanning phenomenon stems from context independencies and may result in conflicting objectives, e.g. minimizing complexity in R4 often implies a sub-optimal level of R3 and R5 complexity, minimizing R5 complexity often implies a sub-optimal level of R3 and R4, etc. Separation of the life cycle into realms assists the designer/manager in identifying attributes which cause conflicts in complexity concerns. A simple example of this type of complexity conflict is illustrated below. The code was constructed to optimize run-time efficiency: using multiplication, the 16th power of a variable X is calculated. Although this code is computationally efficient and may be desirable in R4, it is undesirable in R5.

    Y := X;
    FOR I := 1 TO 4 DO
        Y := Y * Y;
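
To make the conflict concrete, here is a minimal Python sketch (an illustration added to this discussion, not part of the original figure): the first formulation favors maintainability (R5), while the repeated-squaring formulation above favors run-time efficiency (R4).

    # Straightforward formulation: easy to read and to modify (favors R5).
    def pow16_clear(x):
        return x ** 16

    # Repeated-squaring formulation, mirroring the code above: four
    # multiplications compute the 16th power, trading clarity for speed (favors R4).
    def pow16_fast(x):
        y = x
        for _ in range(4):
            y = y * y
        return y

    assert pow16_clear(3) == pow16_fast(3) == 43046721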

A third justification of realm partitioning is based on the temporal nature of the life cycle. The detail level of information that is available at each life cycle stage varies. The information space enlarges at successive phases of the system life cycle and so constrains the measures that can be derived. The realms, then, are partitioned to reflect this information differential. Further, as discussed below, complexity projections can be made from a given realm into subsequent realms. Projection offers, perhaps, the largest benefit of complexity analysis.

Realms are defined in terms of properties. The properties describe the basis for the "complexity space". It is desirable to identify properties that are orthogonal. Orthogonality allows us to focus on the several dimensions of complexity, making complexity analysis a modular process. Further, orthogonality ensures that the several dimensions of complexity can be explained (measured) with a minimum of information. Let the properties of complexity be denoted by P, where

P1 = Volume
P2 = Distribution
P3 = Location.

Volume, P1, is a measure of the size of an entity. An entity may be a FORTRAN subroutine in R3, a PSL process description in R2, etc. Distribution, P2, is a measure of the interrelatedness of components within an entity. This measure addresses the structural complexity of an entity. Location, P3, is a measure of the inter-entity interface complexity for a given entity. The location measure deals with the relative functional/conceptual distance between entities. In general, location is a measure of "environmental" interaction and provides a mechanism for abstracting views of complexity. The orthogonality of these properties in software has been supported empirically [1]. These three types of complexity serve as a basis for the assessment of complexity in all phases of the life cycle. In requirements definition they involve the complexity of the individual requirements, of the interdependencies between requirements, and of the volatility of the host environment with respect to the requirements. In software development they are the complexity (volume) of the source statements, of the interrelations between statements (program structure), and of the interface with other modules.

Each property is defined in terms of its attributes. Let the set of attributes be denoted by A. A given element in A is an attribute which belongs to property P in realm R, and functions of A form complexity metrics for properties within realms. There exist two basic types of attributes. There are attributes of the system itself (e.g. the number of discrete system processes), and there are attributes of the resources applied to realize the system (e.g. the qualifications of the project members). The first type of attribute is a result of design, whereas the second is a result of management (resource allocation). This distinction is important from a system development standpoint. Complexity levels are controlled via the attributes that contribute to system complexity. The control mechanisms used will depend on the type of attribute in question. If the attribute is one of design, system design methodologies are used as control mechanisms. If, on the other hand, the attributes are resource related, then possible control mechanisms include resource allocation and scheduling.

The majority of research into information system complexity focuses on R3: software construction and testing. In the following section, therefore, a complexity model is synthesized for realm R3. The effort metric of software science serves as the Volume property. An alternative to McCabe's cyclomatic complexity, v(G), is proposed for the Distribution property. In addressing the Location property, the concept of information hiding [2] serves as a representative example. In the following sections complexity assessment is extended to other realms. It is suggested that the complexity models developed for use in R3 are applicable, to varying degrees, to the other realms. Finally, a tool, the Dynamic Complexity Monitor (DYCOM), is discussed which has been designed to automate the concepts highlighted in this paper.

A COMPLEXITY MODEL FOR R3

P1: Volume

Halstead [3] developed a linguistic-like model of software, called software science. Software science offers several program characteristic measurements which are based upon counts, and estimations of counts, of operators and operands. Software science metrics are derived using the following parameters: η1 = the number of unique operators, η2 = the number of unique operands, N1 = the total number of occurrences of all operators, and N2 = the total number of occurrences of all operands. The vocabulary, η, of an algorithm is defined to be η1 + η2. The length of an algorithm, N, is defined to be N1 + N2. Given that η tokens are used N times, the minimum number of bits needed to represent an algorithm can be expressed as (N1 + N2) log2 (η1 + η2), or simply N log2 η. This measure, based on information theory, is referred to as the volume, V, of an algorithm. The value of V depends upon whether an algorithm is implemented in a "high" or "low" level language.


The most succinct (highest level) representation of an algorithm is a call to a function or procedure. A FORTRAN program which calculates the sine of a variable X can be written as simply SIN(X). The minimum, potential volume V* of an algorithm requires two operators (N1* = η1* = 2): one operator is needed to name the function and the other to serve as an assignment or grouping symbol. The minimum number of unique operands η2* is equal to the number of input/output parameters. Since each unique operand need occur only once, we know that N2* = η2*. Potential operator and operand measures can be used to derive the minimal potential volume for a given algorithm:

    V* = (N1* + N2*) log2 (η1* + η2*)

and, given N1* = η1* = 2 and N2* = η2*, by substitution

    V* = (2 + η2*) log2 (2 + η2*).

Potential volume offers insight into the level of a program. The level of a program is defined as the ratio of potential volume to actual volume; given V* and V, program level, L, can be defined as V*/V. Two observations can be made based on this formulation. First, if a given algorithm is translated into a different implementation language, as the volume increases (decreases) the level decreases (increases) proportionally. Second, the product of L and V* remains constant for a given language.

Halstead uses the measures of volume and program level to derive a measure of programming effort. The implementation of an algorithm entails N selections from a vocabulary η. Halstead reasoned that if a "mental binary search" is used to make the N selections, then on the average N log2 η mental comparisons are required to generate a program. The number of elementary mental discriminations necessary to make each comparison is dependent on the program level L; programming difficulty is inversely proportional to the level L. Therefore, the total number of mental discriminations E (for effort) necessary to write a given program can be given by E = V/L, and as cited earlier L = V*/V, thus E = V²/V*.
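
As a concrete rendering of the software science quantities above, the following Python sketch computes vocabulary, length, volume, potential volume, level and effort from token counts; the function and the example counts are illustrative assumptions, and the token counting itself is presumed to be done by a separate lexer.

    from math import log2

    def halstead_metrics(eta1, eta2, N1, N2, eta2_star):
        # eta1, eta2: unique operators and operands; N1, N2: total occurrences;
        # eta2_star: number of input/output parameters (minimum unique operands).
        eta = eta1 + eta2                                  # vocabulary
        N = N1 + N2                                        # length
        V = N * log2(eta)                                  # volume
        V_star = (2 + eta2_star) * log2(2 + eta2_star)     # potential volume
        L = V_star / V                                     # program level
        E = V / L                                          # effort = V*V / V_star
        return {"vocabulary": eta, "length": N, "volume": V,
                "potential volume": V_star, "level": L, "effort": E}

    # Illustrative counts only; a real analyzer would derive them from source text.
    print(halstead_metrics(eta1=10, eta2=7, N1=28, N2=22, eta2_star=3))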


Of the many software science equations, the EFFORT equation cited above is most closely allied to a notion of complexity. The attributes used in the EFFORT model are language-defined and user-defined tokens. This choice of factors to be considered, as with all selections of factors, directly impacts the interrelationships that can be determined. Several questionable assumptions are those related to unity and linearity. Implicit in the counting strategy is the assumption that operators are of equal significance. Two typical COBOL statements are shown below. As the counts for η1, η2, N1 and N2 are equal for both statements, software science metrics will derive equality in the complexity assessment of the two statements.

    IF A GREATER B AND NOT C LESS D.
    COMPUTE A = (B + C) * D.

Fig. 1. Nodal complexity.

The counting strategy is linear as well as unitary. It is evident that software science metrics do not explicitly reflect differences in program structure. Illustrated below are two skeletal code segments. The first depicts a recursive (nested) decision construct and the second depicts a linear case construct. Given that the counts for each code segment are identical, the software science measure of effort will again be equal for the two segments.

    IF IF IF IF ELSE ELSE ELSE ELSE        (nested)
    IF ELSE IF ELSE IF ELSE IF ELSE        (case)

Despite the fact that software science makes many subjective explicit and implicit assumptions regarding language semantics, it has several qualities valuable to our research activities. First, the counting of operators and operands is easily automated. Second, the model accommodates a view of abstraction level via the notion of language level. It is possible, therefore, to use software science metrics to assess logical designs described in a language such as PSL [4]. Third, by manipulating the counting strategy, software science can be made selective. Software science metrics are useful in assessing complexities arising from system volume (P1). As we have seen, however, they do not explicitly measure structural complexity.

P2: Distribution (structure)

Process structures are frequently represented as graphs, with the nodes of the graphs representing control transfer points in a computer program, the discrete programs of a software system, the discrete systems in a distributed network, etc. The interrelationships between nodes are represented as connective arcs. In general, the arcs indicate some form of communication or relation between nodes. The interprocess communications include such relations as control transfers, exchanges of data sets, etc. In assessing the complexity of a given graph structure, a major concern is the complexity of the collection of interrelations: the configuration of arcs. Figure 1 illustrates three nodes with their incident arcs. How can the complexity of each node be quantified?

If it is presumed that complexity is a noncombinatorial phenomenon, then a simple count of the incident arcs would suffice. In the context of a graph this measure simplifies to the counting of arcs. Using an arc counting strategy, the complexities of the graphs in Fig. 1 are 2, 4 and 6 respectively. In order to get a useful measure of system complexity, the relationships between nodes and arcs must also be addressed. The cyclomatic number of a graph has properties that suggest its utility as a complexity metric, and it has been proposed as a measure of complexity in computer programs (McCabe [5]). The cyclomatic numbers of all the graphs in Fig. 1 are 1.

All of the above approaches assume that complexity is a linear function. The assumption of linear complexity increase is suspect. Intuitively, the entry of new complexity factors frequently has more than an additive effect on overall complexity. We would anticipate, then, that linear metrics either neglect important dimensions of complexity or do not effectively reflect complexity growth. Given that structural complexity is heavily influenced by the combinatorial relationships between nodes, it is desirable to derive a function that reflects the combinatorics. One function that satisfies this criterion is

    nodal complexity = din(ni) * dout(ni)

where dout(ni) is the number of arcs which have ni as their start-node, and din(ni) is the number of arcs which have ni as their end-node. The measures of complexity for the three nodes in Fig. 1 are 1, 4 and 9 respectively.

As discussed earlier, process structures may be represented as graphs. In turn, precedence graphs are frequently represented as n by n boolean matrices. Given a matrix representation of a graph, B, we have

    din(nj) = Σi Bij for all j,
    dout(ni) = Σj Bij for all i.

The ordered set of indegrees will be referred to as the CONVERGENCE of a graph; the ordered set of outdegrees will be termed the DIVERGENCE of a graph. The convergence of the graph forms an n-member (non-boolean) vector, C; the divergence of the graph forms an n-member vector, D. The vector product CD is an approximation of the connective complexity in a graph and, thus, of the underlying system being modeled.

The structural complexity (distribution) measured by the product CD reflects only first-order connectivity. The true complexity of a process is a function of the density of the transitive closure of the process as well. It may be desirable, then, to include higher order connectivity in the analysis. By adding the identity matrix and taking successive powers of B, the product CD can be derived for all orders (dimensions) of connectivity. Further, as mentioned earlier, divergence enters into program complexity more so than does convergence. A measure of divergent "density" can be obtained by taking a running product of the vector D:

    divergent density = Π (i = 1 to n) Di.
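
A minimal Python sketch of the first-order calculation, assuming the graph is supplied as a 0/1 adjacency matrix (the example matrix is invented for illustration):

    def cd_metrics(B):
        # B[i][j] = 1 if an arc runs from node i to node j.
        n = len(B)
        D = [sum(B[i][j] for j in range(n)) for i in range(n)]   # divergence (out-degrees)
        C = [sum(B[i][j] for i in range(n)) for j in range(n)]   # convergence (in-degrees)
        nodal = [C[i] * D[i] for i in range(n)]                  # per-node d_in * d_out
        CD = sum(nodal)                                          # first-order CD
        density = 1
        for d in D:
            if d:              # skip zero out-degrees (e.g. the exit node) so the
                density *= d   # running product is not trivially zero
        return {"C": C, "D": D, "nodal": nodal, "CD": CD, "divergent density": density}

    # Illustrative graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
    B = [[0, 1, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
    print(cd_metrics(B))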

A frequently cited control flow complexity metric is the graph cyclomatic number. McCabe developed a complexity metric based upon the cyclomatic number of a program flow graph. The cyclomatic number of a graph is the cardinality of the cycle basis and represents the number of linearly independent elementary cycles in the graph. These basic paths, when taken in combination, can be used to generate all possible paths through the graph. The metric defines the complexity of a program as

    v(G) = a - n + 2p

where n = the number of nodes, a = the number of arcs, and p = the number of connected components. Two assumptions underlie McCabe's use of v(G). Given a program flow graph:


(1) Associate with it a digraph that has a unique entry node and a unique exit node (the graph must be well-formed). (2) An arc must exist (or be introduced) which branches from the exit node to the entry node (the graph must be strongly connected).

To transform a given program flow graph into a strongly connected structure (assumption 2), cycles must be introduced. Cycles, however, are critical factors in themselves when assessing program complexity. The validity of "inventing" such cycles is questionable. Figure 2 depicts one such transformation. The invention of the single return arc has an unbounded impact on the number of paths through the graph but has only a minor impact on the value v(G). The large increase in the number of paths would have a significant influence on many aspects of program analysis, exhaustive testing for example. Due to the undesirable impact of assumption 2, we will maintain only assumption 1. Note that well-formedness (assumption 1) is a major premise of structured programming (one entrance, one exit), while the introduction of cycles in the form of "backward gotos" (assumption 2) is discouraged.

Figures 3-6 are reproductions of several program flow graphs used by McCabe in illustrating the "behavior" of the cyclomatic complexity metric. The figures also include the values of v(G) and first-order CD for each of the graphs. The relative ordinal rankings of the graph complexities are similar between v(G) and CD. The relative differences (ratios) in measured complexities between each of the graphs, on the other hand, vary for the v(G) and CD metrics.

Figures 7 and 8 present two "extreme" program graph topologies and their respective skeletal code representations. Figure 7 depicts a program graph of a recursive decision structure: a completely balanced nested IF block. Figure 8 presents a program graph of a linear decision structure: a case IF block.

Fig. 2. "Invention" of a return arc.

Figs. 3-5. Program flow graphs reproduced from McCabe [5].

As v(G) does not recognize the role of convergence, divergence, or propagation effects, it will judge both programs to be of equal complexity: v(G) in Fig. 7 is eight, and v(G) in Fig. 8 is eight. The measure CD, on the other hand, does recognize the distinction between these two decision structures: CD in Fig. 7 is twenty, CD in Fig. 8 is eight. As the number of arcs is dependent on the number of nodes in the above topologies, the functions v(G) and CD can be reduced to functions of the number of nodes. A plot of v(G) for each topology is presented in Fig. 9, and a plot of CD is presented in Fig. 10.

Fig. 6.

Fig. 7. A recursive (nested) program flow graph and code representation.

Based upon the plot of measured complexity as a function of the number of nodes in Fig. 9, the measure v(G) asserts that: (1) given an equal number of nodes (n > 4), the complexity of a case structure is greater than that of a nested structure; and (2) as the number of nodes increases, the complexity of a case structure increases at twice the rate of a nested structure. Based upon the plot in Fig. 10, the measure CD asserts that: (1) given an equal number of nodes (n > 4), the complexity of a case structure is less than that of a nested structure; and (2) as the number of nodes increases, the complexity of a nested decision structure increases at one and one half times the rate of a linear case structure. Also, when the analysis of CD is extended to higher orders of connectivity, the measured complexity of the recursive decision structure of Fig. 7 increases, whereas higher order analysis does not increase the measured complexity of the linear case structure of Fig. 8.
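
The contrast can be checked on small illustrative graphs; the two flow graphs in the Python sketch below are stand-ins assumed for this illustration (not reproductions of Figs. 7 and 8), yet they show the same behavior: v(G) is identical for both, while first-order CD separates the nested structure from the case structure.

    def v_of_g(nodes, edges):
        # Cyclomatic complexity of a single-component flow graph: a - n + 2.
        return len(edges) - len(nodes) + 2

    def first_order_cd(nodes, edges):
        indeg = {v: 0 for v in nodes}
        outdeg = {v: 0 for v in nodes}
        for i, j in edges:
            outdeg[i] += 1
            indeg[j] += 1
        return sum(indeg[v] * outdeg[v] for v in nodes)

    # Nested IFs: decision nodes 0, 1, 2; innermost body 3; common exit 4.
    nested_nodes = [0, 1, 2, 3, 4]
    nested_edges = [(0, 4), (0, 1), (1, 4), (1, 2), (2, 4), (2, 3), (3, 4)]

    # Case-like fan-out: selector 0; alternative bodies 1-4; common exit 5.
    case_nodes = [0, 1, 2, 3, 4, 5]
    case_edges = [(0, 1), (0, 2), (0, 3), (0, 4),
                  (1, 5), (2, 5), (3, 5), (4, 5)]

    print("nested: v(G) =", v_of_g(nested_nodes, nested_edges),
          " CD =", first_order_cd(nested_nodes, nested_edges))
    print("case:   v(G) =", v_of_g(case_nodes, case_edges),
          " CD =", first_order_cd(case_nodes, case_edges))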

The distribution complexity (P2) of a module is a measure of intra-modular structure. It should measure the density of connection between components in a module. A simple count of arcs does not reflect relative degrees of connective density. The measure v(G) treats density as a global, linear property of a graph. CD treats the density of connection of a given node as a nonlinear property. Using first-order CD, the overall complexity is reflected as the sum of individual nodal complexities. Full-order CD, on the other hand, treats both nodal and overall graph complexities as nonlinear progressions.

Given Halstead's effort metric as a measure of the complexity of "describing" the components of a module and CD as a measure of the complexity of intra-modular connectivity, we must now address the complexity of a module's interface with other modules.

P3: Location (interaction)

Parnas recognized that changes in a completed system are simplified when the connections between modules are designed to contain as little information as possible [2]. The basic thesis of Parnas' work in this area is that the designers who specify the system structure should control the distribution of design information, hiding details that are likely to change.


Fig. 8. A linear (case) program flow graph and code representation.

Fig. 9. A plot of v(G).

Information hiding relates directly to location (P3) complexity. The notion of information hiding addresses the design process as much as it addresses the product of design. In hiding design information, we also make the functional responsibilities clear, which affects the understandability and changeability of the system. A location complexity measure must reflect the amount of knowledge that the system modules have of each other concerning the nature of their activities. By minimizing the amount of information shared between modules, one expects that the difficulty of system construction and modification will be lessened.


Fig. 10. A plot of CD.


In terms of the location property, we are concerned with the type and amount of inter-module communication. The communication relations between processes can be represented by an m by m by n matrix, E, where m = the number of processes (modules) and n = the number of communication item types. An entry in E, eijk, is the number of communication items of type k that are communicated between process i and process j. Representative communication item types are data variables, control variables, concurrent process semaphores, etc. Given the matrix E and weights wk for each communication type (as is discussed below, the actual weights can be determined statistically by the DYCOM system), a measure of the location complexity property for the overall system is given as

    P3 = Σi Σj Σk wk * eijk.

The location complexity for a given process is the summation for a fixed value of i.
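
A sketch of this weighted interface count, assuming the matrix E is held sparsely as a Python dictionary and the type weights are given (in DYCOM they would be fitted statistically rather than fixed by hand):

    def location_complexity(E, weights):
        # E[(i, j)][k] = number of items of communication type k passed from
        # process i to process j; weights[k] = weight of type k.
        total = 0.0
        per_process = {}
        for (i, j), items in E.items():
            contribution = sum(weights[k] * count for k, count in items.items())
            total += contribution
            per_process[i] = per_process.get(i, 0.0) + contribution
        return total, per_process

    # Illustrative data: two processes exchanging data and control variables.
    E = {("A", "B"): {"data": 3, "control": 1},
         ("B", "A"): {"data": 1}}
    weights = {"data": 1.0, "control": 2.5}
    print(location_complexity(E, weights))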

To derive a nonlinear distribution, the vector product CD can be used in the analysis of location complexity. A similar approach has been used successfully in the analysis of maintenance complexity for the UNIX operating system (Henry and Kafura [6]). The above analysis is essentially unidimensional. By partitioning the graph into component subgraphs, location complexity measurement addresses higher order communication schemata. Examples of the schemata analyzed by DYCOM include, given communication items x, y and z, and given processes A, B, C and D: (1) x, y and z communicated from A to B; (2) x communicated from A to B, C and D; (3) A communicates x, y and z to B, C and D, respectively; (4) D receives x, y and z from A, B and C, respectively.

SYNTHESIS OF AN R3 COMPLEXITY MODEL

The above complexity metrics are synthesized to form a complexity function for realm R3. The EFFORT metric of software science serves as the Volume property, the measure CD as a Distribution property metric, and information hiding as the Location property. A weighted combination is used to measure complexity in R3 (again, the weights are determined by DYCOM):

    C(R3) = ΣP WP * f(AP)

where AP is the set of attributes belonging to property P, f(AP) is the function defining the relationship between the attributes in P, and WP is the relative weight (coefficient) of property P in R3.

A thorough model of program complexity serves as a guideline for software construction as well as an evaluation mechanism for software designs. Testing strategies, furthermore, can be formulated which will concentrate testing on the appropriate (most complex) modules. By extending the evaluation of complexity into other life cycle phases, assorted design guidelines and formalisms can be determined.
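
Read as a formula, the synthesis is simply a weighted sum of the three property metrics once they have been computed for a module; a minimal sketch, with invented values and weights standing in for the regression-determined coefficients:

    def r3_complexity(effort, cd, location, weights):
        # weights = (W_volume, W_distribution, W_location); in DYCOM these
        # coefficients are fitted from project history, not fixed in advance.
        w1, w2, w3 = weights
        return w1 * effort + w2 * cd + w3 * location

    print(r3_complexity(effort=12500.0, cd=18, location=6.5,
                        weights=(0.001, 0.4, 1.2)))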


Further, complexity in a given life cycle realm, R2 for example, can be used to forecast levels of complexity in subsequent realms, R3 for example. The following discussion concerns complexity assessment in other realms and the design of an automated tool for complexity assessment and forecasting (projection).

COMPLEXITY MEASUREMENT IN OTHER REALMS

Existing software complexity models are concerned in large part with the stages of physical design and construction. These stages, however, are relatively inexpensive in terms of resource consumption. Further, the complexities manifest at these stages are often a direct result of complexities introduced at earlier stages of the life cycle, and complexities introduced in early stages manifest themselves heavily in the later phases of operation and maintenance. Problems of logical design complexity and maintenance complexity are more acute and costly. If a complexity model is to serve as a robust tool for software engineering and software project management it must encompass the entire life cycle.

The measures of complexity discussed above for R3 can be applied to varying degrees in the realms R1 and R2. In R1, volume is a measure of the complexity of the individual system requirements, distribution is a measure of the interdependencies of the requirements, and location is a measure of the complexity of the requirements with respect to the host environment. In R2, the complexity of the process descriptions reflects volume, the complexity of inter-process structural dependencies reflects distribution, and the complexity of the logical system's interface with the environment reflects location.

The formulation of complexity models in each realm is constrained by the amount and nature of information available in the realm. The attributes available in each realm are candidates for inclusion in the complexity model. Further, attributes that are available in more than one realm offer the potential for complexity projection. The following sections discuss the implementation of a tool to support complexity evaluation and projection. The Dynamic Complexity Monitor (DYCOM) is designed around the ISDOS development system [4] and the PLEXSYS system [7]. Complexity analysis is performed on the Requirement Specification Language, RSL, the Problem Statement Language, PSL, and source code processes in COBOL and Fortran. Since these design tools are implemented as languages, variants of the complexity property measures discussed above for R3 are used for assessing complexity in R1 and R2. The complexity projection in DYCOM is influenced by the work presented in [8, 9]. The structure of DYCOM is influenced by work presented in [10, 11].

COMPLEXITY PROJECTION

Complexity measures within a given realm are useful in the evaluation of the productivity of the realm activities.


The measures serve as feedback mechanisms and guidelines for programmers, analysts and designers. A robust complexity model, however, must reflect complexity holistically; that is, for example, the complexity of a given logical design can only be judged on the basis of the complexity which propagates into the realms of implementation and maintenance. Further, to be of use to project managers a complexity model must support project scheduling and resource allocation. Given, for example, information available at the end of the logical design phase, R2, it would be useful to project the complexity in R3.

Chrysler [9] has conducted research related to complexity projection. Chrysler used step-wise multiple regression to correlate program and programmer characteristics with the time taken to complete a programming task. Using a sample of 31 COBOL programs, five independent variables were found that resulted in a multiple correlation coefficient of 0.836:

(1) Programmer experience at the facility.
(2) Number of input files.
(3) Number of control breaks and totals.
(4) Number of input edits.
(5) Number of input fields.

The importance of these five independent variables lies in the fact that their values are all available at the conclusion of the detailed logical design phase of the life cycle, and they are, therefore, available for projecting complexity from R2 to R3. Further, the attributes can be broken into the two classes discussed above: design attributes and resource attributes. Whereas variables two through five are design attributes and must be controlled during logical design, the first attribute is a human resource attribute and can be controlled at the end of R2. A similar approach is used in DYCOM to formulate complexity projection functions between all realms. For example, property measures derived from PSL logical design descriptions in R2 are used to project complexity into R3, R4 and R5.

Boehm has developed the Constructive Cost Model, COCOMO [8]. As with Chrysler's, Boehm's model is empirically based: COCOMO is a set of projection equations derived empirically from 63 software development projects. Both Chrysler's and Boehm's models serve as starting points for a complexity assessment and projection system. The two major improvements possible upon these models are (1) the automation of model building, data collection, and model use, and (2) the incorporation of a dynamic view of models. In order to be useful in varying environments, the models must be treated as dynamic entities and must "learn from experience".
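
A Chrysler-style projection function can be sketched as an ordinary least-squares fit over completed projects; the attribute values and effort figures below are invented for illustration, and the fitted coefficients would be re-estimated as real project data accumulates.

    import numpy as np

    # Columns: programmer experience, input files, control breaks and totals,
    # input edits, input fields -- all known at the end of R2.
    X = np.array([[24, 3, 5, 10, 40],
                  [ 6, 1, 2,  4, 15],
                  [36, 5, 8, 12, 60],
                  [12, 2, 3,  6, 25]], dtype=float)
    y = np.array([80.0, 30.0, 120.0, 45.0])   # observed R3 effort, e.g. hours

    # Fit y ~ X b + intercept by least squares.
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    def project_r3(attrs):
        # Project R3 complexity from end-of-R2 attribute values.
        return float(np.dot(np.append(np.asarray(attrs, dtype=float), 1.0), coeffs))

    print(project_r3([18, 2, 4, 8, 30]))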


One system that attempts to fulfil the first objective is the Automated Measurement Tool, AMT [11]. AMT takes quality measures during the requirements, design, and implementation life cycle phases. At present, data gathering is automated only during software construction; AMT takes measurements from COBOL source code. These measures, moreover, are concerned largely with the Volume property, P1. AMT does not support projection and does not automate model refinement. DYCOM is implemented such that model building, data collection, and model usage are largely automated. Also, DYCOM incorporates the view of models as dynamic entities which are automatically updated (refined).

AN AUTOMATED TOOL FOR COMPLEXITY ASSESSMENT AND PROJECTION

Complexity can be perceived as an impediment to system quality. Measures of complexity serve as true design quality assessment mechanisms. Further, they can be used in assessing the true productivity of a given system development effort, as these measures reflect the inherent difficulty of the development. Many models of complexity have been proposed for R3 (see e.g. [12]). Experimental validation of complexity models, however, is limited. Further, given the lack of validation within an industrial setting, these models are not in wide use. What is needed is a tool that can serve both as a research tool to experiment with different complexity models and as a dynamic, learning system to be used in actual system development environments. The Dynamic Complexity Monitor, DYCOM, has been developed to serve these two needs.

The structure of DYCOM is fashioned after the Generalized Model Management System, GMMS, developed by Konsynski [10]. GMMS was developed for the dynamic management of econometric models. The basic premises behind its development are (1) that models (processes) can be managed as data, and (2) that the updating of models can be automated. Thus, as new data (economic data in this case) is received by GMMS, the econometric models are automatically updated to reflect the new information. This management of models as dynamic entities is incorporated in DYCOM. The basic structure of DYCOM is represented in Fig. 11. The system has been implemented in conjunction with the automated information system development system developed by the ISDOS project [13] and the PLEXSYS system (Konsynski and Nunamaker [7]).

DYCOM consists of a collection of analyzers, model bases, and data bases. The two major analyzers in DYCOM are the realm complexity analyzers and the complexity projection analyzers. The model bases contain the modeling forms of each realm (PSL models during logical design, for example), the complexity models for each realm, and the complexity projection models for each inter-realm mapping. The DYCOM data bases contain current complexity assessment and project information as well as archival complexity assessment and project information. Complexity models can be built by the user with the aid of the analyzer builder, AB. These models can then be tested against actual system development experiences to validate the proposed model. AB is used to specify which attributes are to be extracted for each property in each realm. The complexity analyzers, then, are used to extract the attributes from the development model base for a given realm modeling form, for example PSL.

Fig. 11. The DYCOM system.

As with Chrysler's approach, the actual complexity models are determined by weighting the various attributes using multiple regression. This basic approach is also used to determine the complexity projection models. It must be emphasized that the modelling forms RSL, PSL, COBOL and Fortran are all language oriented forms. In the current implementation of DYCOM, the measures discussed for realm R3 are also used for complexity measures in R1 and R2. Software science metrics, the structural metrics CD and v(G), as well as the analysis of inter-entity coupling, are applicable to the requirements and logical designs defined using RSL and PSL.

During any given phase of the life cycle, complexity reports can be generated. At the completion of a given life cycle phase, say R2, complexity projection reports can be generated for R3, R4 and R5. At the completion of a subsequent realm, say R3, DYCOM provides for (1) a refined estimate of projected complexity into R4 and R5, (2) the dynamic refinement of the R1-to-R3 and R2-to-R3 projection models, and (3) the generation of complexity projection model performance statistics. In this way the model "learns" from experiences accrued during system development efforts.

At the heart of this "learning" is the model refinement process. If a complexity model is to reflect the evolutionary nature of system development, the model must "learn". Model refinement via feedback ensures that the model reflects reality. When, for example, the actual maintenance complexity for a given system is obtained, this information is used by DYCOM to dynamically update the forecasting/projection functions that map the earlier realms into R5.

Currently DYCOM is implemented to analyze requirement definitions in the RSL modeling form, logical design in the PSL modeling form, and construction in the COBOL and Fortran modeling forms.


The system is developed for both research and industrial use. It is hoped that the structure of DYCOM, specifically its dynamic "learning" aspects, will encourage complexity research in industrial system development environments.

COMPLEXITY AND PRODUCTIVITY MEASURES: A FINAL NOTE

Productivity measures and models, in general, involve the ratio of outputs to inputs. Traditionally, the ratio is units of production output divided by input resource consumption, where output is measured in a meaningful unit and input resources include labor, materials and capital. Productivity measures are used (1) to determine the performance of a given production system over time, and (2) to assess the relative utility of different production tools and methodologies. More importantly, productivity models serve to isolate factors that can be manipulated in order to effect improvements in productivity. Productivity models, then, provide direction; they indicate the factors that are perceived as influencing productivity, and hence, they indicate factors that can possibly be controlled. To realize the best possible gains, moreover, productivity models must undergo continuing refinement to reflect sources of complexity. In the analysis of software development productivity, there is a continuing refinement in the definition of "a unit of output" so as to reflect the notion of software complexity. This, in turn, leads to the development of tools and methodologies that address individual production factors in order to improve productivity. The most common productivity ratio is

    number of statements produced / number of man-hours.


A more refined measure of software development productivity involves weighting different categories of statements:

    (0.8 * (the number of house-keeping code statements)) + (1.0 * (the number of algorithmic statements)).

As reflected by the 0.8 weighting, house-keeping code is perceived to be less complex than algorithmic code. The continuing refinement of productivity models in order to reflect sources of software complexity has several benefits. It leads to productivity models that are more accurate, that better indicate ways to improve productivity, and that help uncover conflicting productivity factors in the context of the system life cycle. Incorporating more accurate reflections of complexity in productivity models is an ongoing process that is supported by systems such as DYCOM. In so doing, we are better able to assess productivity trends, to uncover the sources of these trends, to identify loci for productivity improvement through design methodologies and automated tools, and to assess productivity trade-offs that stem from conflicting concerns among phases of the system life cycle.
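
As a worked instance of the weighted ratio (the 0.8 and 1.0 weights follow the text; the statement counts and man-hours are invented), a short Python sketch:

    def weighted_productivity(housekeeping_stmts, algorithmic_stmts, man_hours):
        weighted_output = 0.8 * housekeeping_stmts + 1.0 * algorithmic_stmts
        return weighted_output / man_hours

    # 400 house-keeping and 600 algorithmic statements over 160 man-hours:
    # (0.8 * 400 + 1.0 * 600) / 160 = 5.75 weighted statements per man-hour.
    print(weighted_productivity(400, 600, 160))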

CONCLUSIONS

The authors have examined the characteristics of current alternative complexity measures with the objective of determining useful measures during different stages in the system design process. Both linguistic and graph theoretic models offer useful information at different stages. Further, both have limitations that make them insufficient as stand-alone measures. Five complexity realms were discussed along with a common set of properties for complexity assessment within the realms. These realms deal with initial requirements specification and analysis, logical design, the construction and implementation process, the system in operation, and the maintenance and modification activities. A general framework was proposed to realize an overall system development complexity model which will support complexity assessment and projection. Further, an automated system, DYCOM, designed within this framework was discussed. DYCOM is designed as both a research vehicle and as an industry production tool for complexity assessment and projection. The relationship between complexity and software development productivity models was noted to indicate the instrumentality of complexity measures.

REFERENCES

[1] M. M. Tanik: A comparison of program complexity prediction models. ACM SIGSOFT Software Engng Notes 5(4), 10-16 (1980).
[2] D. L. Parnas: On the criteria to be used in decomposing systems into modules. CACM 15(12), 1053-1058 (1972).
[3] M. H. Halstead: Elements of Software Science. North-Holland, New York (1977).
[4] D. Teichroew and E. A. Hershey III: PSL/PSA: A computer-aided technique for structured documentation and analysis of information processing systems. In Advanced System Development/Feasibility Techniques (Edited by J. Daniel Couger, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982).
[5] T. J. McCabe: A complexity measure. IEEE Trans. on Software Engng SE-2(4), 308-320 (1976).
[6] S. Henry and D. Kafura: Software structure metrics based on information flow. IEEE Trans. on Software Engng SE-7(5), 510-518 (1981).
[7] B. R. Konsynski and J. F. Nunamaker: PLEXSYS: A system development system. In Advanced System Development/Feasibility Techniques (Edited by J. Daniel Couger, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982).
[8] B. W. Boehm: Software Engineering Economics. Prentice-Hall, Englewood Cliffs, New Jersey (1981).
[9] E. Chrysler: Some basic determinants of computer programming productivity. CACM 21, 471-483 (1978).
[10] B. R. Konsynski: Model management in decision support systems. Presented at the NATO Advanced Study Institute on Data Base and Decision Support Systems, Estoril, Portugal (June 1981).
[11] J. McCall, D. Markham, M. Stosick and R. McGindly: The automated measurement of software quality. Proc. COMPSAC, pp. 52-58. IEEE Computer Society Press (1981).
[12] S. N. Mohanty: Models and measurements for quality assessment of software. Computing Surveys 11(3), 251-275 (1979).
[13] D. Teichroew, E. A. Hershey III and Y. Yamamoto: The PSL/PSA approach to computer-aided analysis and documentation. In Advanced System Development/Feasibility Techniques (Edited by J. Daniel Couger, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982).

of program complexity prediction models. ACM ~ZG~O~~4), IO-16 (1980). D. L. Pamas: On the criteria to be used in decomposing systemsintomodules. CRCMI5(12), 1053-lOSS(l972). M. Ii. Halstead: EIPmenzs oysofrware Science. NorthHolland, New York (1977). D. Teichroew and E. A. Hershey III: PSL/PSA: A computer-aided technique for structured documentation and analysis of information processing systems. In Advanced Syslem Developmenr /Feasibility Tech niques (Edited by I. Daniel Cougar, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982). PI T. J. McCabe: A complexity measure. IEEE T on Sqfrware Engng S&2(4), 308-320 ( 1978). 161S. Henry and D. Kafura: Software structure metrics based on info~at~n llow. iEEE Ton Software Engng SE-7(S), 510-518 (1981). [71 8. R. Konsynski and J. F. Nunamaker: Plexsys: A system devetopment system. In Advanced Sysrem Development/Feasibility Techniques (Edited by J. Daniel Cougar, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982). 181B. E. Boehm: Sofrware Engineering Economics. Prentice-Hall, Englewood Cliffs, New Jersey (I 98 1). 191E. Chrysler: Some basic determinants of computer programming productivity. CACM 21, 471483 (1978). WI B. R. Konsynski: Model management in decision sup port systems. Pres. NATO Adwnced Study Insriwe on Dara Base and Decision Support Systems, Estoril, Portugal (June 1981). [IIf J. McCall, D. Markham, M. Stosick and R. McGindly: The automated measurement of software quality. Proc. COMPSACII, pp. 52-58. IEEE Computer Society Press ( I98 1). u21 S. N. Mohanty: Models and measurements for quality assessment of software. Computing Surveys 11(3), 251-275 (1979). 1131D. Teichroew, E. A. Hershey III and Y. Yamamoto: The PSLjPSA approach to computer-aided analysis In Advanced System Ofvi and documentation. elopmenr/Feasibility Techniques (Edited by J. Daniel Cougar, Mel A. Colter and Robert W. Knapp). Wiley, New York (1982).