Application of spreadsheet software in software engineering measurement technology


P Kokol

The application of measurement technology to software engineering and science is relatively new, and therefore the benefits of using it are not sufficiently recognised. The development and use of adequate measures of software and its development process are essential, however, to the production of cost-effective and reliable software. The main objective of the paper is to show that spreadsheet software, hybrid metrics, and metrics life-cycle models can improve the situation.

Keywords: measurement, software metrics, hybrid metrics, reliability, spreadsheet software, life-cycle models

Measurement technology serves as a foundation for all scientific and engineering disciplines¹. The benefits of using measurement are well recognised, as are its costs. Almost all advances in fields such as physics, biology, and chemistry have occurred through the interaction between measures of objects and events in the real world and their abstractions in the world of models and explanations. The application of measurement technology to software engineering and science, on the other hand, is relatively new, and therefore its benefits are not recognised by most people who work with computers. The development and use of adequate measures of software and its development process are essential, however, to the production of cost-effective and reliable software.

Faculty of Technology Sciences, Smetanova 17, 62000 Maribor, Yugoslavia

vol 31 no 9 november 1989

Spreadsheet software

One way to make software measurement technology more attractive is to introduce software tools that help both researchers and practitioners to perform their work more effectively. To be successful, such tools must in general satisfy the following constraints:

• They must be easy to learn and teach.
• They must be easy to use without previous 'software' knowledge.
• They must enable the user to store and access considerable amounts of data effectively.
• They must be widely recognised and accessible on all kinds of computers (especially personal computers).
• They must be powerful enough to enable various models, strategies, etc., to be built and tested.

The author considers spreadsheet software²⁻¹⁰ to be one such tool. It combines the convenience and familiarity of a pocket calculator with the power of the computer. The computer's display becomes a window onto a large electronic spreadsheet, which is divided into rows and columns. The intersection of a row and a column is called a cell, which can contain a number, a character label, or a formula. A formula can reference other cells in the spreadsheet, so that changing one value automatically changes all the values that depend on it. Numbers, formulas, labels, and whole rows or columns can be inserted, deleted, moved, or copied, and as changes occur the spreadsheet software automatically restructures the spreadsheet to reflect them. Graphics and printing features enable the user to print or plot various types of graphs, tables, reports, etc., in the desired sizes and formats. In addition, spreadsheet software usually contains word-processing, database, statistical, mathematical, and financial functions. Finally, a special built-in language enables users to extend the basic sets of functions and commands and so create their own application environments. These facilities make spreadsheet software a powerful tool for modelling, planning, forecasting, and analysis, and they make decision making, comparing different strategies, optimization, and testing 'what if' questions easy.
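The recalculation behaviour just described (a formula cell changes whenever a cell it references changes) can be illustrated with a short Python sketch. It is a hypothetical toy, not the internal design of any spreadsheet product; the class `ToySheet` and its methods are invented names. Each formula lists the cells it depends on, and after every change all formulas are re-evaluated in dependency order, found by a depth-first topological sort that also detects circular references.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A formula is (list of referenced cells, function of their values)
Formula = Tuple[List[str], Callable[..., float]]


class ToySheet:
    """Hypothetical toy model of spreadsheet recalculation (illustration only).

    A cell holds either a constant or a formula over other cells. After every
    change, all formula cells are re-evaluated in dependency order.
    """

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}
        self.formulas: Dict[str, Formula] = {}

    def set_value(self, cell: str, value: float) -> None:
        self.formulas.pop(cell, None)  # a constant replaces any formula in the cell
        self.values[cell] = value
        self.recalculate()

    def set_formula(self, cell: str, deps: List[str], fn: Callable[..., float]) -> None:
        previous: Optional[Formula] = self.formulas.get(cell)
        self.formulas[cell] = (deps, fn)
        try:
            self.recalculate()
        except ValueError:
            # Undo the change that introduced a circular reference, then re-raise.
            if previous is None:
                del self.formulas[cell]
            else:
                self.formulas[cell] = previous
            raise

    def _evaluation_order(self) -> List[str]:
        # Depth-first topological sort of the formula cells.
        order: List[str] = []
        state: Dict[str, str] = {}

        def visit(cell: str) -> None:
            if state.get(cell) == "done":
                return
            if state.get(cell) == "active":
                raise ValueError(f"circular reference involving {cell}")
            state[cell] = "active"
            deps = self.formulas[cell][0] if cell in self.formulas else []
            for dep in deps:
                visit(dep)
            state[cell] = "done"
            if cell in self.formulas:
                order.append(cell)  # appended only after all its dependencies

        for cell in self.formulas:
            visit(cell)
        return order

    def recalculate(self) -> None:
        for cell in self._evaluation_order():
            deps, fn = self.formulas[cell]
            self.values[cell] = fn(*(self.values[d] for d in deps))


sheet = ToySheet()
sheet.set_value("A1", 2)
sheet.set_value("A2", 3)
sheet.set_formula("A3", ["A1", "A2"], lambda a, b: a * b)  # A3 = A1 * A2
sheet.set_formula("A4", ["A3"], lambda c: c + 1)           # A4 = A3 + 1
sheet.set_value("A1", 10)                                  # change one input...
print(sheet.values["A3"], sheet.values["A4"])              # ...dependents follow: 30 31
```

Keeping the dependency lists explicit is what makes the 'what if' style of use cheap: one input changes and every dependent cell is brought up to date without the user re-running anything.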

Objectives

This paper has three main objectives:

• to show that spreadsheet software can be used as a valuable tool in software measurement technology
• to show that hybrid metrics⁴,⁶ are better at estimating complexity than single metrics
• to show that the metrics life-cycle model⁶ can be used, first, to present more clearly the activities performed during the development and use of new metrics and, second, to analyse and compare how well different tools support metrics life-cycle activities

0950-5849/89/090477-09 $3.00 © 1989 Butterworth & Co (Publishers) Ltd

METRICS LIFE-CYCLE MODEL

In this section the metrics life-cycle model is presented (see Figure 1). It enables a more systematic presentation of the basic activities in the metrics life-cycle. Figure 1 shows that the process of developing and using new metrics consists of the following phases: modelling, empirical measurement, evaluation, implementation, and practical use.

Modelling

In this phase a metrics model is formulated with the help of mathematical, statistical, or other appropriate techniques, for explanatory or predictive purposes. According to the technique selected, three types of model¹¹ are distinguished:

• theoretical models, which are based on hypothesized relations among variables
• data-driven models, which are based on statistical analyses with little concern for the validity of, or intuitive justification for, the formulas developed
• combined models, in which intuition is used to determine the basic shape of the model and data analysis is used to determine the model's constants

The model can be described by the following equation:

$$y = f(x_1, x_2, \ldots, x_n) \tag{1}$$

where the dependent variable y is a complexity metric of interest. It is a function of the independent variables $x_i$ and can be either product or process related.
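For a combined model whose shape is chosen to be linear, y = a·x + b, determining the constants is an ordinary least-squares problem. The Python sketch below fits such a line; the function name `fit_line` is hypothetical, and treating ordinary least squares as a stand-in for a spreadsheet's built-in linear regression is an assumption of the sketch. As an illustration it uses the McCabe V(G) and volume V values of the 19 programs in Figure 2, from the case study later in the paper.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# McCabe V(G) and Halstead volume V for programs 1-19 (Figure 2)
VG: List[int] = [103, 182, 195, 261, 270, 313, 313, 347, 367, 398,
                 403, 407, 436, 448, 475, 519, 573, 603, 740]
V: List[int] = [33405, 42716, 62370, 88989, 91615, 95540, 104376, 116905, 159375, 133873,
                156204, 151232, 148253, 178220, 203759, 188994, 145676, 185474, 206359]

slope, intercept = fit_line(VG, V)
print(f"M_V(G) = {slope:.1f} * V(G) + {intercept:.0f}")
```

On these data the fit comes out at roughly 302.7 for the slope and 14065 for the intercept, close to the constants 303 and 14066 of equation (6).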

Figure 1. Metrics life-cycle model (diagram of the modelling, empirical measurement, evaluation, and practical use phases, linked by 'exercise model', 'model improvements', and 'data collections' loops)

Empirical measurement and evaluation

In these two phases the model constructed in the modelling phase is validated empirically. The measure calculated with the metrics model is compared, according to an evaluation model, with an independent measure obtained through software analysers, report forms, interviews, etc. Note that the terms metrics model and evaluation model differ in that the metrics model is general, whereas the evaluation model relates the metric to one or more specific real-world attributes (e.g., complexity, number of errors, development time). The independent measure represents a quantified criterion, and the purpose of evaluation is to determine how well the model can predict this criterion and the degree of relationship between calculated and actual measurements. After the metrics model has been validated, it is usually compared with other models from similar fields. The significance of differences between models can be assessed with various statistical tests.
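The degree of relationship between calculated and actual measurements is commonly expressed with Pearson's correlation coefficient r, or its square R². A minimal Python sketch follows; the function name `pearson_r` is hypothetical, and the four estimated/actual pairs in the example are invented purely for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(calculated: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation between metric values and an independent criterion."""
    n = len(calculated)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_c = sum(calculated) / n
    mean_a = sum(actual) / n
    s_ca = sum((c - mean_c) * (a - mean_a) for c, a in zip(calculated, actual))
    s_cc = sum((c - mean_c) ** 2 for c in calculated)
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    return s_ca / math.sqrt(s_cc * s_aa)


# Hypothetical example: estimated vs. actual development times (hours)
estimated = [120, 250, 310, 480]
actual = [100, 280, 300, 500]
r = pearson_r(estimated, actual)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

A value of r close to 1 indicates that the metric ranks and scales the objects much as the criterion does; it does not by itself say that the metric's absolute values are accurate, which is why error-based criteria are used alongside it.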

Implementation and practical use

In this last phase the metrics model, having successfully passed the previous tests, is implemented (as a program, a manual procedure, etc.) and transferred to software engineers, project managers, and other professionals, who can use it as a tool in their everyday work. Useful information that can lead to model improvements, or can extend the database of empirical measurements, is also collected during the practical use of the implemented model.

Spreadsheet software and metrics life-cycle

The facilities of spreadsheet software listed previously show that spreadsheet software supports almost all activities performed during the metrics life-cycle. It enables:

• storing and accessing considerable amounts of data in a simple and effective way
• building quick prototypes of models
• easy modification of models
• optimization of models
• effective and diverse evaluations of models
• effective and diverse comparisons of models
• easy implementation and use of models

CASE STUDY: HYBRID METRIC

In this section the application of spreadsheet software to develop a new class of software metrics, called hybrid metrics⁶, is presented.

Background

The development of hybrid metrics (H_ metrics for short) was motivated by the fact that different complexity metrics can measure different complexities for the same program. For example, a long program with few decisions and loops has a large lines-of-code (LOC) count¹¹ or Halstead's E measure¹²,¹³ but a small McCabe cyclomatic complexity V(G)¹³,¹⁴. It is therefore reasonable to combine such metrics into a single metric, which should smooth out the differences and give more accurate results. In situations like this one there is a set of objects O on which a set of measurements with metrics $M_i$ can be performed. For every object $O_j \in O$, therefore, a subset of metric values $M_{ij}$ (i = 1, ..., n) is obtained, where n is the number of metrics. To combine them into a single metric and to avoid difficulties with different units (for example, Halstead's E is expressed in [mental discriminations] and McCabe's V(G) in [linearly independent paths]), the following model is proposed:

$$\mathrm{H\_M}_j = \frac{M_j + R_{MM_1} M_{M_1 j} + R_{MM_2} M_{M_2 j} + \cdots + R_{MM_n} M_{M_n j}}{1 + R_{MM_1} + R_{MM_2} + \cdots + R_{MM_n}} \tag{2}$$

$$M_{M_i j} = f_i(M_{ij}), \quad i = 1, \ldots, n; \quad j = 1, \ldots, k$$

where M is the basic metric, $M_i$ are the other metrics of interest, $f_i$ are functions that calculate basic-metric values $M_{M_i j}$ from the values $M_{ij}$ of the other metrics, n is the number of metrics of interest (excluding the basic metric), and k is the number of objects O. The functions $f_i$ and the coefficients $R_{MM_i}$ are calculated by regression analysis¹¹ between the subset of metric values $M_{ij}$ and the subset of basic metric values $M_j$, for each pair $(M, M_i)$ separately. Experiments with various regression models have shown that the best results are attained with a linear model, which is also the easiest to compute. Equation (2) is a weighted average of the selected metrics. If there is no correlation between the basic metric and another metric, that other metric has no impact on the final score. If, on the other hand, there is a perfect correlation between the basic and the other metrics, equation (2) reduces to the ordinary average. Actual results lie between these two extremes.

Model: H_T metric

The H_T metric⁴ is used to estimate the time needed to develop programs. It is based on Halstead's software science metrics and is calculated as follows:

$$\mathrm{H\_T} = \frac{\mathrm{H\_V}}{S \cdot L} \tag{3}$$

$$\mathrm{H\_V} = \frac{V + R_{V,V(G)}\,M_{V(G)} + R_{V,LOC}\,M_{LOC}}{1 + R_{V,V(G)} + R_{V,LOC}} \tag{4}$$

$$L = \left(\frac{\lambda}{\mathrm{H\_V}}\right)^{1/2} \tag{5}$$

$$M_{V(G)} = f_{V(G)}(V(G)) = 303\,V(G) + 14066 \tag{6}$$

$$M_{LOC} = f_{LOC}(LOC) = 53.2\,LOC - 1778 \tag{7}$$

$$V = (N_1 + N_2)\log_2(n_1 + n_2) \tag{8}$$

$$\hat{T} = \frac{n_1 N_2 V}{2 S n_2} \tag{9}$$
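Equation (2) is simple enough to check outside the spreadsheet. The Python sketch below (the function names are hypothetical) implements the weighted average together with the linear conversions of equations (6) and (7), and checks the two limiting cases discussed above. One observation from re-running the numbers: the text gives $R_{V,V(G)} = 0.95$ and $R_{V,LOC} = 0.82$, but the H_V column of Figure 2 (for example 43572 for program 1 and 218541 for program 19) is reproduced when 0.82 weights $M_{V(G)}$ and 0.95 weights $M_{LOC}$, so the sketch passes the weights in explicitly rather than fixing them.

```python
from typing import Sequence


def hybrid(basic: float, converted: Sequence[float], weights: Sequence[float]) -> float:
    """Equation (2): weighted average of the basic metric value M_j and the other
    metrics already converted to basic-metric units (M_{M_i j}), weighted by R_{MM_i}."""
    if len(converted) != len(weights):
        raise ValueError("need exactly one weight per converted metric")
    numerator = basic + sum(r * m for r, m in zip(weights, converted))
    return numerator / (1 + sum(weights))


def m_vg(vg: float) -> float:
    """Equation (6): McCabe V(G) converted to volume units."""
    return 303 * vg + 14066


def m_loc(loc: float) -> float:
    """Equation (7): lines of code converted to volume units."""
    return 53.2 * loc - 1778


# The two limiting cases discussed in the text
print(hybrid(100.0, [200.0, 300.0], [0.0, 0.0]))  # no correlation: basic metric only, 100.0
print(hybrid(100.0, [200.0, 300.0], [1.0, 1.0]))  # perfect correlation: plain mean, 200.0

# H_V for program 1 of Figure 2 (V = 33405, V(G) = 103, LOC = 1026),
# with 0.82 on M_V(G) and 0.95 on M_LOC, the assignment that matches Figure 2
h_v1 = hybrid(33405, [m_vg(103), m_loc(1026)], [0.82, 0.95])
print(round(h_v1))  # 43572
```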

In equations (3)-(9), λ = 1.53 is the language level for PASCAL¹¹; S = 18 is Stroud's number¹¹,¹²; $R_{V,V(G)} = 0.95$ is the correlation coefficient between the set of V metric values and the set of V(G) metric values; $R_{V,LOC} = 0.82$ is the correlation coefficient between the set of V metric values and the set of LOC metric values; $n_1$ is the number of distinct operators; $n_2$ is the number of distinct operands; $N_1$ is the total number of operators; $N_2$ is the total number of operands; V is the program volume and serves as the basic metric; H_V is the hybrid volume metric; and H_T is the hybrid development time metric.

The above equations were modelled with the spreadsheet shown in Figure 2. The columns n1, n2, N1, N2, LOC, and McCabe V(G) were imported from the file produced by the analyser program. H_V, V(LOC), V(V(G)), V, and T^ were calculated with equations (3)-(9). The correlation coefficients, constants, and multipliers for equations (6) and (7) were calculated with the linear regression analysis built into the spreadsheet software. Each equation was entered only in the first numeric cell of its column and then simply copied into the other cells. Results were immediately visible, as were the effects of any changes, and all this was achieved without any conventional programming, control, or data structures.

Program  LOC   V(G)  n1  n2    N1     N2    V       V(V(G))  V(LOC)  H_V     H_T   T^
1        1026  103   55  360   2335   1506  33405   45275    52805   43572   113   59
2        1118  182   57  331   3042   1925  42716   69212    57700   55698   164   109
3        1064  195   64  876   3873   2442  62370   73151    54827   62975   197   86
4        1664  261   64  952   5482   3427  88989   93149    86747   89452   334   158
5        1912  270   67  306   6781   3943  91615   95876    99940   95732   370   610
6        1549  313   67  930   5939   3652  95540   108905   80629   94383   362   194
7        1680  313   68  1070  6344   3937  104376  108905   87598   99962   394   202
8        2237  347   74  924   7264   4470  116905  119207   117230  117696  504   323
9        3397  367   63  627   10017  6883  159375  125267   178942  155989  769   850
10       2523  398   74  996   8168   5135  133873  134660   132446  133617  609   394
11       2736  403   72  1100  9281   6041  156204  136175   143777  146013  696   477
12       2935  407   71  975   9373   5704  151232  137387   154364  148208  712   485
13       2685  436   75  997   9136   5592  148253  146174   141064  145172  690   481
14       3145  448   73  951   10989  6833  178220  149810   165536  165460  840   721
15       3711  475   78  1049  12495  7603  203759  157991   195647  187428  1012  889
16       3641  519   71  992   11339  7459  188994  171323   191923  184767  991   779
17       3043  573   64  475   9605   6449  145676  187685   160110  163062  822   977
18       3383  603   77  1010  11393  6996  185474  196775   178198  186324  1003  763
19       4062  740   65  595   13434  8598  206359  238286   214320  218541  1275  1496

Figure 2. Modelling spreadsheet for H_T and T^ metrics

Empirical measurement and evaluation

To evaluate the H_T metric, the software of an industrial controller¹⁵, which consists of 19 PASCAL programs, was analysed. H_T metrics were calculated for these programs and compared with a quantitative criterion T obtained by interviewing the programmers; T includes the time for design, coding, testing, and correcting errors. The evaluation was performed with the spreadsheet shown in Figure 3, using two evaluation criteria: the magnitude of relative error (column MRE), |T - T^|/T for each program, and prediction at level 25% (column PRED(.25)), which is 1 when the MRE does not exceed 0.25 and 0 otherwise. Data for the T^ column were imported from the modelling spreadsheet. Results obtained for the H_T, NEW_CT⁴, and Halstead's T^ metrics were exported to the utility spreadsheet. (Note that the NEW_CT metric is a variation of the H_T metric.) The utility spreadsheet was used together with the spreadsheet's graphics capabilities to present the differences between the calculated estimates more clearly (see Figures 4 and 5). Note that importing and exporting are built-in spreadsheet functions. To compare the performance of the hybrid metric with that of the individual metrics of which it is composed (according to correlation coefficients), the spreadsheet shown in Figure 6 was constructed in a similar way to the previous ones. A close look at Figures 4-6 shows that the hybrid metrics H_T and NEW_CT perform better than Halstead's original T^ metric and also better than the individual metrics that form them.

T^    T     MRE   PRED(.25)
59    100   0.41  0
109   120   0.09  1
86    150   0.43  0
158   280   0.44  0
194   300   0.35  0
610   350   0.74  0
202   400   0.50  0
323   480   0.33  0
394   550   0.28  0
481   600   0.20  1
477   600   0.21  1
485   650   0.25  0
850   800   0.06  1
977   800   0.22  1
721   850   0.15  1
779   950   0.18  1
763   950   0.20  1
889   1000  0.11  1
1496  1200  0.25  1
Mean MRE: 0.28    PRED(.25): 0.53

Figure 3. Evaluation spreadsheet (example for T^)

Figure 4. Comparison between H_T, NEW_CT, and T^ metrics (bar chart of MRE and PRED(.25) for each metric)

Figure 5. Comparison between H_T, T^, and actual times T for programs 1-19

information and software technology
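As a cross-check of the modelling spreadsheet, equations (3), (5), (8), and (9) can be evaluated directly. The Python sketch below (function names hypothetical) uses λ = 1.53 and S = 18 from the text and reads equation (5) as L = (λ/H_V)^{1/2}, i.e. Halstead's relation λ = L²V with H_V in place of V. Under these assumptions the formulas yield seconds, and the H_T and T^ columns of Figure 2 are matched after dividing by 3600, so those columns appear to be in hours.

```python
import math

LAMBDA = 1.53  # language level for PASCAL, as given in the text
S = 18         # Stroud number, as given in the text


def volume(n1: int, n2: int, N1: int, N2: int) -> float:
    """Equation (8): Halstead program volume V."""
    return (N1 + N2) * math.log2(n1 + n2)


def halstead_time(n1: int, n2: int, N2: int, V: float) -> float:
    """Equation (9): Halstead's time estimate T^, in seconds."""
    return n1 * N2 * V / (2 * S * n2)


def hybrid_time(h_v: float) -> float:
    """Equations (3) and (5): H_T = H_V / (S * L), with L = sqrt(lambda / H_V), in seconds."""
    level = math.sqrt(LAMBDA / h_v)
    return h_v / (S * level)


def hours(seconds: float) -> float:
    return seconds / 3600


# Program 1 of Figure 2: n1 = 55, n2 = 360, N1 = 2335, N2 = 1506, H_V = 43572
v1 = volume(55, 360, 2335, 1506)
print(round(v1))                                       # 33405
print(round(hours(halstead_time(55, 360, 1506, v1))))  # 59
print(round(hours(hybrid_time(43572))))                # 113
```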

LOC   V(G)  V       H_T   T
1026  103   33405   76    100
1118  182   42716   110   120
1064  195   62370   194   150
1664  261   88989   331   280
1912  270   91615   346   300
1549  313   95540   368   350
1680  313   104376  421   400
2237  347   116905  499   480
2523  398   133873  611   550
3043  573   145676  694   600
2685  436   148253  712   600
2935  407   151232  734   650
2736  403   156204  770   800
3397  367   159375  794   800
3145  448   178220  939   850
3383  603   185474  997   950
3641  519   188994  1025  950
3711  475   203759  1148  1000
4062  740   206359  1170  1200
R squared: LOC 0.94; V(G) 0.81; V 0.96; H_T 0.98; T 1

Figure 6. Spreadsheet for evaluating correlation coefficients between metrics and measured time T

'What if' questions

The above spreadsheets can easily be used to answer 'what if' questions. It is only necessary to change the data or equations to obtain other models, criteria, metrics, etc. For example, the following questions can be answered:

• What will happen if a nonlinear regression model is used?
• Will the H_T metric still have the best R² value if data from another project are used?
• Will another evaluation criterion give similar results?
• Will another hybrid metric (for example, H_E) give better results?
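The two criteria of Figure 3 take only a few lines to express, which also shows how the 'what if' question about another evaluation criterion reduces to changing one parameter. In the Python sketch below (function names hypothetical), MRE is taken as |T - T^|/T, which reproduces the MRE column of Figure 3, and PRED(level) as the fraction of programs whose MRE does not exceed the level.

```python
from typing import List, Sequence, Tuple


def mre(estimated: float, actual: float) -> float:
    """Magnitude of relative error |actual - estimated| / actual."""
    return abs(actual - estimated) / actual


def evaluate(estimated: Sequence[float], actual: Sequence[float],
             level: float = 0.25) -> Tuple[float, float]:
    """Return (mean MRE, PRED(level)), where PRED(level) is the fraction of
    estimates whose MRE does not exceed `level`."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors: List[float] = [mre(e, a) for e, a in zip(estimated, actual)]
    mean_mre = sum(errors) / len(errors)
    pred = sum(1 for err in errors if err <= level) / len(errors)
    return mean_mre, pred


# T^ and T columns of Figure 3 (hours)
T_HAT = [59, 109, 86, 158, 194, 610, 202, 323, 394, 481,
         477, 485, 850, 977, 721, 779, 763, 889, 1496]
T = [100, 120, 150, 280, 300, 350, 400, 480, 550, 600,
     600, 650, 800, 800, 850, 950, 950, 1000, 1200]

mean_mre, pred25 = evaluate(T_HAT, T)
print(f"mean MRE = {mean_mre:.2f}, PRED(.25) = {pred25:.2f}")  # 0.28, 0.53

# 'What if' another criterion is used: only the level changes
print(evaluate(T_HAT, T, level=0.30))
```

The printed 0.28 and 0.53 are the values at the foot of Figure 3; 10 of the 19 programs fall within 25% of the actual time.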

SPREADSHEET AND CONVENTIONAL PROGRAMMING

The case study indicates that spreadsheet software is a powerful problem-solving tool in the field of measurement technology, but it is also clear that every problem it can solve can also be solved by writing an ordinary computer program. Paradoxically, not having to write a program is exactly the reason for using spreadsheet software: instead, the user simply keys in the relevant data, specifies the relations between the data, and defines the answers to be computed. At first glance, spreadsheet and conventional programming seem similar. To examine the differences between them, the actions for each basic activity were therefore entered in the metrics life-cycle model for the spreadsheet (see Figure 7) and for conventional programming (see Figure 8); the differences are summarised in Table 1.

Table 1 shows that spreadsheet software is data oriented and truly interactive. Progress towards the desired answers is achieved in a stepwise fashion, with the results visible at every step, so any error (logical, syntax, or semantic) can be corrected immediately. In contrast, a computer program is command oriented, and even in interactive environments the user can react interactively only to syntax errors. Spreadsheet software displays input and output data in their natural relationship. It is excellent for analysing 'what if' situations, for example what would happen if some data or a relation were changed: there is no need to invoke an editor and rerun a program; the user just changes the desired formula or data and the results are immediately visible. In addition, spreadsheet software is designed to be easy to learn and teach. Its language level is high⁵ (see Figure 9); according to Halstead's software science¹¹,¹²,¹⁴, the level of a language is λ = L²V, the product of the program volume V and the square of the program level L, where L = 2n2/(n1N2). All the above facilities show that spreadsheet software forms a coherent environment, which enables users to build their applications more effectively, more reliably, and in less time than with conventional programming techniques. (Peabody¹⁰ reports that a simulation program for induction machines, which took two man-months to develop in FORTRAN, was developed in only five man-days with spreadsheet software.)

Figure 7. Augmented metrics life-cycle model for spreadsheet programming (actions shown per phase include entering and changing data and equations, 'what if' changes, entering data via the built-in editor, and accessing data and measurements via cell references)

Figure 8. Augmented metrics life-cycle model for conventional programming (actions shown per phase include writing, compiling, changing, and running or rerunning the program and a separate input program, entering data, parameter passing, and accessing measurements via read and open statements)

Table 1. Spreadsheet against conventional programming

Property                          Spreadsheet programming                                Conventional programming
Type                              Data oriented                                          Command oriented
Level of interaction              Truly interactive; results are immediately visible     Interactive or noninteractive
Errors corrected interactively    Syntax, semantic, and logic errors                     Syntax errors only
Coherence                         Input, output, and relations are located in one place  Input, output, and relations are dislocated
'What if' question analysis       Any changes are immediately visible                    Editing and rerunning of data and program are needed
Ease of use                       Very easy                                              Moderately easy
Time of learning basic functions  Few hours                                              Few days
Language level                    Very high (Halstead's level > 5)                       On average high (Halstead's level < 2)

Figure 9. Language levels (bar chart comparing the Halstead language level of spreadsheet software with that of conventional languages such as Pascal, Fortran, Algol 68, and assembler)
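Halstead's language level, λ = L²V with the estimated program level L = 2n2/(n1N2) and the volume V of equation (8), can be computed from the four basic counts. The Python sketch below (hypothetical function name) applies it to program 1 of Figure 2. A single program gives only a rough value (about 2.52 here, against the 1.53 used for PASCAL in the case study); published language levels are averages over samples of programs, so one program's figure should not be read as the level of the language.

```python
import math


def language_level(n1: int, n2: int, N1: int, N2: int) -> float:
    """Halstead language level lambda = L**2 * V, with the estimated program level
    L = 2 * n2 / (n1 * N2) and volume V = (N1 + N2) * log2(n1 + n2)."""
    level = 2 * n2 / (n1 * N2)
    volume = (N1 + N2) * math.log2(n1 + n2)
    return level ** 2 * volume


# Program 1 of Figure 2 (a PASCAL program): n1 = 55, n2 = 360, N1 = 2335, N2 = 1506
print(f"{language_level(55, 360, 2335, 1506):.2f}")
```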

Analysis of actions

The augmented metrics life-cycle models of Figures 7 and 8 can be used to estimate the number of actions N needed to develop and use a software metric, for both spreadsheet and conventional programming, so that the intuitive arguments of the previous section can be supported more formally. Using M for the number of modelling loops, E for the number of evaluation loops, I for the number of improvements, and C for the number of data collections gives the following two equations:

$$N_{prog} = I\,\bigl(E\,(4M + 3) + 1\bigr) + 2 + 2C \tag{10}$$

$$N_{spread} = I\,\bigl(E\,(2M + 2) + 1\bigr) + C \tag{11}$$

To enable a direct comparison between $N_{prog}$ and $N_{spread}$, the following relations are introduced:

$$E = bI; \quad C = cI; \quad M = aE = abI \tag{12}$$

which are then substituted into equations (10) and (11). Dividing equation (10) by equation (11) gives:

$$D = \frac{N_{prog}}{N_{spread}} = \frac{4ab^2I^3 + 3bI^2 + I(2c + 1) + 2}{2ab^2I^3 + 2bI^2 + I(c + 1)} \approx 2 \tag{13}$$

Since the dominant terms are $4ab^2I^3$ and $2ab^2I^3$, equation (13) shows that approximately twice as many actions are needed with conventional programming as with spreadsheet programming to develop and use new software metrics. Some empirical tests have shown that:

$$a = 3; \quad b = 5; \quad c = 10 \tag{14}$$

Substituting equation (14) into equations (10) and (11) finally gives:

$$N_{prog} = 300I^3 + 15I^2 + 21I + 2 \tag{15}$$

$$N_{spread} = 150I^3 + 10I^2 + 11I \tag{16}$$

which are presented graphically in Figure 10.
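Equations (10)-(16) can be checked mechanically. In the Python sketch below (function names hypothetical), equation (11) is written with E(2M + 2), the form whose expansion gives the coefficients 150, 10, and 11 of equation (16); with E(2M + 3) the I² coefficient would be 15 rather than 10. Evaluating the counts for increasing I shows the ratio D approaching 2.

```python
from typing import Tuple


def n_prog(I: int, E: int, M: int, C: int) -> int:
    """Equation (10): number of actions with conventional programming."""
    return I * (E * (4 * M + 3) + 1) + 2 + 2 * C


def n_spread(I: int, E: int, M: int, C: int) -> int:
    """Equation (11), written with E(2M + 2) so that it expands to equation (16)."""
    return I * (E * (2 * M + 2) + 1) + C


def actions(I: int, a: int = 3, b: int = 5, c: int = 10) -> Tuple[int, int]:
    """Apply relations (12), E = bI, C = cI, M = aE = abI, with the constants of (14)."""
    E, C = b * I, c * I
    M = a * E
    return n_prog(I, E, M, C), n_spread(I, E, M, C)


for I in (1, 3, 5, 7, 9, 11, 13):
    p, s = actions(I)
    print(f"I={I:2d}  N_prog={p:7d}  N_spread={s:7d}  D={p / s:.3f}")
```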

Figure 10. Number of actions (N_prog for conventional programming and N_spread for spreadsheet programming, plotted against the number of main cycles I)

CONCLUSION

This paper presents some applications of spreadsheet software in developing and using software metrics. The selected examples show that spreadsheet software forms a coherent environment that enables users to build applications more effectively, more reliably, and in less time than with conventional programming techniques. Data and relations are entered in the worksheet, where they can be changed to research or test various alternatives. This enables the user to build a quick prototype of a model and makes the model easy to modify, so the user can ask 'what if' questions to search for the most promising alternatives or models. To conclude, the three main findings are:

• Spreadsheet software effectively supports all of the main activities of the metrics life-cycle.
• Hybrid metrics estimate development times better than single metrics and also better than Halstead's original T^.
• The metrics life-cycle model can be used to compare spreadsheet and conventional programming and to analyse the performance of both.

REFERENCES

1 Dunham, J R and Kreusi, E 'The measurement task area' Computer Vol 16 No 11 (November 1983) pp 47-54
2 Dubash, M 'Borland Quattro: after 1-2-3' Practical Comput. Vol 10 No 12 (December 1987) pp 36-38
3 Hagler, M 'Spreadsheet solution to partial differential equations' IEEE Trans. Education Vol 30 No 3 (August 1987) pp 130-134
4 Kokol, P et al. 'A combination of three software effort metrics' Soft. Eng. Notes Vol 13 No 3 (July 1988)
5 Kokol, P 'Spreadsheet language level: how high is it' SIGPLAN Notices Vol 23 No 6 (June 1988) pp 121-134
6 Kokol, P and Zumer, V 'Controlling complexity and cost of software projects with spreadsheet software' in Proc. IFAC/IFIP Workshop Experience with the Management of Software Projects Pergamon Press, Oxford, UK (September 1988)
7 Kokol, P 'Some applications of spreadsheet programs in software engineering' Soft. Eng. Notes Vol 12 No 3 (July 1987) pp 45-50
8 Kokol, P and Novak, B 'Microcomputer spreadsheet program applications in engineering' Soft. Eng. Workstations Vol 4 No 2 (April 1988) pp 108-112
9 Landram, F G and Cook, J R 'Spreadsheet calculations of probabilities from the F, t, χ², and normal distribution' Commun. ACM Vol 29 No 11 (November 1986) pp 1090-1093
10 Peabody, F, Nyberg, D W and Dunford, W G 'The use of a spreadsheet program to design motors on a personal computer' IEEE Trans. Ind. Applic. Vol 23 No 3 (May/June 1987) pp 520-525
11 Conte, S D, Dunsmore, H E and Shen, V Y Software engineering metrics and models Benjamin/Cummings, Menlo Park, CA, USA (1986)
12 Halstead, M H Elements of software science Elsevier North-Holland, New York, NY, USA (1977)
13 McCabe, T J 'A complexity measure' IEEE Trans. Soft. Eng. Vol 2 No 4 (December 1976) pp 308-320
14 Kitchenham, B A 'Measures of programming complexity' ICL Tech. J. (May 1981) pp 298-316
15 Kokol, P et al. 'IMCL: industrial microcomputer language' Proc. ICS'88 Tamkang University, Taiwan, Republic of China (1988) pp 303-308

BIBLIOGRAPHY

Curtis, B et al. 'Measuring psychological complexity of software maintenance tasks with the Halstead and McCabe metrics' IEEE Trans. Soft. Eng. Vol 5 No 2 (March 1979) pp 95-105