Information and Software Technology 38 (1996) 719-721
Is modularization always a good idea?
Les Hatton, Programming Research Ltd., Glenbrook House, 1-11 Molesey Road, Hersham, Surrey KT12 4RH, UK
Abstract
This polemic paper questions the long-held belief that 'modularization', or structural decomposition, improves system quality. The author also subscribed to this view until the last year or so, when the results of error rates in three very disparate systems were collated. Each of these studies exhibits the same behaviour: that small components tend to have a disproportionately larger number of bugs than bigger components. This is completely counter-intuitive to general beliefs about the need for and benefits of modularization. As it stands, this measurement-based evidence strongly suggests that a smaller number of larger components is preferable to a bigger number of smaller components if the most reliable system is required. A number of mitigating factors exist which may partially restore the intuitive view; these include whether or not modularization actually makes the system smaller, whether or not interface consistency was automatically enforced, and whether or not corrective maintenance is a dominant part of the overall maintenance overhead. The case for each of these mitigating factors will be argued. It is concluded that the view that modularization is an essential element of design quality in systems is at best too simplistic, and that in some cases modularization may actually be harmful. This is an excellent example where an intuitively attractive design principle on the one hand, and reliability measurements on real systems on the other, are currently in serious conflict.

Keywords: Modularization; Design quality; Reliability
1. Introduction

In software engineering, the lack of experimental evidence often means that anecdotal, intuitive or sometimes plain apocryphal arguments become surprisingly well entrenched. In spite of the huge invested effort in object-oriented technology, for example, there seems little, if any, solid, repeatable evidence to prove that it delivers the benefits it promised on intuitive grounds. This is a relatively modern example, of course. There are many such examples over the 40-year history of software engineering. CASE in the 1980s provided similar extravagant claims which it was unable to deliver, and database technology in the 1960s and 1970s went through many traumas before eventually realizing some of its original promise, when people finally realized that relations were simple and not complicated. Knowledge-based systems and formal methods have been similarly oversold and are just beginning to recover from the hype. These are all symptoms of a relatively immature discipline. So if whole application areas have shaky foundations, it should perhaps not be surprising if more fundamentally accepted principles are similarly unsupported by any repeatable measurement.
One such principle which the author has subscribed to for many years is the belief that modularization always improves systems. This is so widespread as to be almost unchallengeable. It is responsible for the important programming language concept of compilation models which are either separate (e.g. C++, 'new-style' C, Ada and Modula-2) or independent (e.g. 'old-style' C and Fortran), whereby a system can be built in pieces and glued together later. It is a very attractive concept with strong roots in traditional engineering: 'divide and conquer'. Of course, the proof of any engineering concept relies on substantiation by the observation and measurement of real systems. For example, in conventional engineering systems, the need for reliability suggests splitting an overall design into pieces. This makes sense for intellectual tractability, but it has long been known that if designs are split into too many small pieces, reliability may be prejudiced. This is embodied in the celebrated and highly pragmatic KISS engineering principle (Keep It Simple Stupid). By analysing the results of recent measurements of the reliability of software systems, this paper will show that the same applies in software and that modularization may not always lead to better systems.
2. Case histories
The central point of this paper revolves around measurements recently reported [1-3]. In essence, each of these authors reported the same phenomenon, viz. that small components tend to have a disproportionately larger number of bugs than bigger components. These case histories are worth describing in a little more detail owing to their very disparate nature. Hatton and Hopkins [1] studied the internationally famous NAG Fortran scientific subroutine library, comprising some 1600 routines totalling around 250,000 executable lines. The NAG library is very attractive to the software experimentalist because it has been through some 15 separate releases in the more than 20 years since its first appearance. In addition, there is a complete bug and maintenance history embedded in machine-extractable form in the header of each component routine. In essence, Hatton and Hopkins [1] found that the number of bugs in the library was well predicted by the formula:
N_bugs = ρ log(Ω C)    (1)

where ρ and Ω are scalars and C is a measure of the complexity, in this case the static path count (i.e. the path count assuming sequential paths multiply and parallel paths add, and that all predicates are independent). Although not explicitly stated in this form, it follows immediately from its less than linear relationship that smaller components contain proportionately more bugs than larger components. Both authors were perturbed
by these results but were forced to accept this rather non-intuitive conclusion on the grounds of the extremely strong experimental evidence. For reference, they used the values ρ = Ω = 1. Some four years later, following a detailed analysis of a well-measured but very different engineering project, i.e. a different programming language (C) and a different part of the life-cycle (development), Davey et al. [2] reported precisely the same phenomenon. Once again, it was the smaller components which contained proportionately more errors. In this case, however, the measure of complexity used was a count of the source code lines. Finally, in the same year, Moller [3], analysing yet another different type of development, this time several versions of operating systems written primarily in assembler, again reported the same phenomenon, that the smaller components were proportionately more unreliable. Again, complexity was measured in terms of lines. Now if the logarithmic prediction of Eq. (1) is used with the data of both Davey and Moller, the diagram shown in Fig. 1 results. Considering the radically different nature of the software systems, and that different measures of complexity were used by Hatton and Hopkins, this resemblance
between logarithmic prediction and each of the three datasets shown in Fig. 1 is remarkable. In Davey's data, the hump in the smaller modules could be explained by the fact that Davey's data was extracted during the development stage as opposed to the more mature data used by Moller. In order to match Davey's data, a value of Ω = 1 was used in Eq. (1), whereas for Moller's data, values of Ω = 0.18 and 0.23 were used for new and changed code respectively. This in itself is qualitatively interesting, as Moller's data was for assembler, whereas Davey's data was for C. This would suggest that the data would correspond very closely indeed were some 5 assembler instructions equivalent in 'essential complexity' (the complexity of the underlying functionality) to 1 C instruction. As stated earlier, the author originally believed that the results reported in Hatton and Hopkins [1] must be anomalous, but the above case histories suggest that a rather more general principle is at work. Before investigating the origins of this, some implications of the logarithmic growth will be investigated.

[Fig. 1 (plot not reproduced). A comparison of the bugs reported by Davey et al. [2] and Moller [3] (his Figs. 6 and 7) with the predictions based on Eq. (1); the plotted series are the actual and predicted bug counts for the Davey data and for the Siemens new and changed ('chg') code.]
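As an aside for the reader who wishes to experiment, the following short sketch (not part of any of the cited studies; the helper names and the choice of Python are purely illustrative, and a base-10 logarithm is assumed on the strength of the worked example in Section 3) evaluates Eq. (1) for a few component sizes, and also illustrates the 'sequential paths multiply, parallel paths add' rule used for the static path count in the NAG study:

    import math

    # Static path count rule quoted above: counts for sequential
    # constructs multiply, counts for parallel branches add.
    def sequential(*path_counts):
        return math.prod(path_counts)

    def parallel(*path_counts):
        return sum(path_counts)

    # Example: two if/else blocks in sequence followed by a three-way
    # if/elif/else gives 2 * 2 * 3 = 12 static paths.
    example_paths = sequential(parallel(1, 1), parallel(1, 1), parallel(1, 1, 1))

    # Eq. (1): N_bugs = rho * log(Omega * C), assuming a base-10 logarithm.
    def predicted_bugs(complexity, rho=1.0, omega=1.0):
        return rho * math.log10(omega * complexity)

    # rho = Omega = 1 as used for the NAG and Davey data; Omega = 0.18 and
    # 0.23 were the values quoted for Moller's new and changed code.
    for c in (example_paths, 100, 1000):
        print(c, round(predicted_bugs(c), 2), round(predicted_bugs(c, omega=0.18), 2))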
3. Implications of logarithmic growth

It is important generally to distinguish between changing an existing system and designing a new one, and the following argument is very simplistic, although it retains the essential features. Suppose that a particular functionality requires 1000 'lines' to implement, where a 'line' is some measure of complexity. The immediate implication of the earlier discussion is that, on reliability grounds, it is far better to implement it as one 1000-line component rather than, for example, as 10 × 100-line components. The former would lead to perhaps log10(1000) = 3 bugs, whilst the latter would lead to 10 × log10(100) = 20 bugs. This inescapable but unpleasant conclusion runs completely counter to conventional wisdom, although the intuitive viewpoint might be restored by some combination of the following three mitigating factors:
• If splitting the system up into small components reduced the necessary number of lines by the mechanism of re-use. However, a small calculation reveals that the reduction in size would have to be dramatic indeed.
• If the additional unreliability due to splitting up the system into small components is due to simple interface inconsistencies. This is considered to be important by Moller, but was not a factor in Hatton and Hopkins, who found that the interface consistency in the NAG library was much better than the average.
• Corrective maintenance (i.e. removing bugs) is only 17% of the overall maintenance cost according to Lientz and Swanson [5]. It may be that the overall maintenance cost is reduced by modularization even though the corrective component is higher. (This might, however, be unacceptable in safety-critical systems where reliability is of paramount importance.)
A considerably more sophisticated argument to the above can be found in Hatton [4].
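To make the re-use point concrete, the following sketch (again purely illustrative and not taken from the cited work; the function name is invented) repeats the arithmetic of Section 3 under Eq. (1) with ρ = Ω = 1 and a base-10 logarithm, and then solves for the component size at which ten equal components would match the single 1000-line component's predicted bug count:

    import math

    # Total predicted bugs for a system of n equally sized components,
    # each of the given size, under Eq. (1) with rho = Omega = 1.
    def total_predicted_bugs(n_components, size):
        return n_components * math.log10(size)

    print(total_predicted_bugs(1, 1000))   # one 1000-line component -> 3.0 bugs
    print(total_predicted_bugs(10, 100))   # ten 100-line components -> 20.0 bugs

    # Break-even through re-use: solve 10 * log10(size) = 3.
    # size = 10 ** 0.3, i.e. roughly 2 lines per component, or about
    # 20 lines in total instead of 1000: the 'dramatic' reduction
    # referred to in the first mitigating factor above.
    size = 10 ** (3 / 10)
    print(size, 10 * size)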
4. Interpretation

As this is a polemic paper, the author feels free to be rather more speculative than usual in discussing the underlying mechanics of this behaviour. Although the author's analysis is still at a relatively early stage, the behaviour described in this paper is overwhelmingly similar to that pertaining in classical thermodynamics. For example, Eq. (1) is tantalizingly like Boltzmann's famous relationship (which he never explicitly wrote down, although it forms his epitaph):

S = K log(W)    (2)
Here, S is the entropy whilst W is the essential complexity, or the number of ways the insides of a system can be re-arranged without changing the exterior view (Feynman et al. [6]). This is highly evocative of software systems and, if the number of bugs is associated directly with the entropy S, the relationship is complete. A more complete analysis, including a description of the corresponding duals of dependent variables and their implications for maintenance, will be the subject of another paper. Amongst other things, by assessing the relative magnitudes of the partial derivatives of the duals, this analysis already indicates in which 'direction' an existing system should be changed to minimize the impact on its reliability using thermodynamic arguments only [4].
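Stated side by side (an informal restatement for comparison only; the identification of terms is the author's speculative analogy, and reading the scaled complexity ΩC as playing the role of W is one plausible interpretation rather than a derivation):

    % Informal side-by-side restatement of Eqs. (1) and (2).
    \[
      N_{\mathrm{bugs}} \;=\; \rho \,\log(\Omega\, C)
      \qquad\qquad
      S \;=\; K \,\log(W)
    \]
    % Speculative correspondence: N_bugs plays the role of the entropy S,
    % while the scaled complexity \Omega C plays the role of W, the number
    % of internal re-arrangements leaving the exterior view unchanged.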
5. Conclusion

The above data and analysis suggest that the price for modularization will be an inevitable reduction in overall reliability. It may well be that guaranteed interface consistency would ameliorate this, although the existing case histories do not really support this view. It is also unlikely that re-use through modularization would provide an answer, as a simple calculation shows that the necessary reduction in system size is probably unachievable [4]. Probably the most likely avenue supporting the conventional wisdom that modularization is a sound design concept is that it may ease adaptive and perfective maintenance, although, as is shown above, this would be at the expense of corrective maintenance, and therefore system reliability. Overall, however, it is possible that the sum of all three maintenance overheads may reduce. Further experiments and analysis will be necessary to prove or disprove this conjecture.
References

[1] L. Hatton and T.R. Hopkins, Experiences with Flint, a software metrication tool for Fortran 77, Symposium on Software Tools, Napier Polytechnic, Edinburgh.
[2] S. Davey, D. Huxford et al., Metrics Collection in Code and Unit Test as part of Continuous Quality Improvement, EuroStar'93, London, BCS, 1993.
[3] K.-H. Moller, An Empirical Investigation of Software Fault Distribution, CSR'93, Amsterdam, Chapman-Hall, 1993.
[4] L. Hatton, Safer C: Developing for High-Integrity and Safety-Critical Systems, McGraw-Hill, 1995.
[5] B.P. Lientz and E.B. Swanson, Software Maintenance Management, Addison-Wesley, Reading, MA, 1980.
[6] R.P. Feynman, R.B. Leighton et al., The Feynman Lectures on Physics, Vol. 1, Addison-Wesley, Reading, MA, 1977.