European Journal of Operational Research 172 (2006) 1040–1050
www.elsevier.com/locate/ejor

O.R. Applications

Scale economies and production function estimation for object-oriented software component and source code documentation size

Parag C. Pendharkar *

School of Business Administration, Penn State Harrisburg, 777 West Harrisburg Pike, Middletown, PA 17057, USA

* Tel.: +1 717 948 6028; fax: +1 717 948 6456. E-mail address: [email protected]

Received 27 January 2004; accepted 30 October 2004; available online 25 December 2004

Abstract

In this study, we investigate the factors that influence object-oriented (OO) component size and source code documentation size. For multiple inputs and multiple outputs, we use data envelopment analysis to illustrate that non-linear variable returns to scale (VRS) economies exist for OO component size and source code documentation size. The existence of non-linear VRS economies indicates that non-linear regression models will perform better than linear regression models. Using empirical data, we compare the performance of a non-linear artificial neural network (ANN) forecasting model and a linear regression model. Our results indicate that the ANN model performs well when VRS economies exist between multiple inputs and multiple outputs.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Data envelopment analysis; Artificial neural networks; Production function estimation

1. Introduction

Software development can be characterized as an economic production process whose output is software size and whose inputs are software complexity, software labor, and software development tools and techniques [4,7]. Several models are used to learn the production function of the software development process; among the popular ones are parametric linear regression models, non-parametric machine learning models, and probabilistic Bayesian network models [28,30]. A typical use of a learned production function model for the software development process is forecasting the software size of future projects. Several researchers have compared the performance of software size forecasting models. However, none of these studies focused on a systematic analysis of the underlying software production function process, which may govern the choice of the forecasting model.


A standard property exhibited by a production function is monotonicity. Pendharkar and Rodger [27] recently showed that an artificial neural network (ANN) is a good forecasting model to use when the data are expected to satisfy the monotonicity property and the forecasting relationship between inputs and output(s) is non-linear.

The contributions of this study are as follows. First, using the available literature and real-world data on 152 object-oriented (OO) software components, we identify the factors that impact OO component and documentation size. Second, using data envelopment analysis (DEA), we illustrate that variable-returns-to-scale (VRS) economies exist between the predictor variables and the predicted variables. Third, we illustrate that a non-linear ANN model can provide better forecasts than a linear regression model for estimating OO component and documentation size.

The rest of the paper is organized as follows. In Section 2, we summarize the research on scale economies in software development. In Section 3, we review the software engineering literature to develop a causal model for predicting software size and source code documentation size. In Section 4, we use real-world data on 152 OO components and test the causal model using structural equation modeling. In Section 5, we show that variable returns to scale economies exist for OO component size and OO component source code documentation size. In Section 6, we compare the performance of ANN and linear regression for forecasting OO component and source code documentation size. In Section 7, we conclude the paper with a summary of our findings.

2. Scale economies in software development

For the purpose of cost estimation and productivity evaluation, researchers and practitioners have characterized software development as an economic production process [4]. Software size, software effort and software productivity are common output variables of the software production process, and the inputs vary from the type of programming language to the tools used and the development environment [4,2,7]. The type of relationship between the inputs and output(s) of the software production process is important, as knowledge of this relationship allows managers to scale future projects to maximize the productivity of software development effort [4]. Banker et al. [4] have noted the existence of non-linear economies and diseconomies of scale in software development. According to Banker and Kemerer [7], a production process exhibits local increasing returns to scale if, at a fixed volume, the marginal returns of additional input exceed the average returns. Among the reasons for economies of scale in the software development process are labor specialization (learning curves) and the use of CASE tools [7]. Among the reasons for diseconomies of scale are larger team sizes, which lead to increased communication requirements and increased conflicts, project complexity, and increased project overhead activities such as planning and documentation [7,15,20].

Researchers have used several different models to test for economies and diseconomies of scale in software development. These models can be categorized into two production function analysis (PFA) approaches: the parametric PFA model and the non-parametric PFA model. Using a simple parametric PFA model, the increasing-returns-to-scale hypothesis can be tested by estimating a function of the form $\text{Effort} = a(\text{FP})^q$ [7], where Effort is software development effort and FP is software size in function points. The parameters $a$ and $q$ are estimated by taking logarithms on both sides and solving the following equation using linear regression:

$$\ln(\text{Effort}) = \ln(a) + q \ln(\text{FP}).$$

The simple parametric PFA model is criticized for its limitations. Banker and Kemerer [7] argue that it does not allow for the possibility of increasing returns for some projects and decreasing returns for others. To relax some of these restrictions, Banker and Kemerer [7] used the second-order twice-continuously-differentiable production function model (SOTPFM).

The SOTPFM for testing economies of scale between software effort and software size in FP can be written as:

$$\ln(\text{Effort}) = \beta_0 + \beta_1 \ln(\text{FP}) + \beta_2 (\ln(\text{FP}))^2.$$

Both the simple parametric PFA model and the SOTPFM are criticized for imposing considerable untested structure on the production function [7,8]. Several researchers have argued that flexible parametric functional forms frequently violate regularity conditions, such as a monotonically increasing relationship between the inputs and the outputs [7]. Banker and Kemerer [7] argue that "...the limited a priori knowledge about the functional form of the production process underlying software development, specifying a parametric form for the production correspondence is difficult to substantiate and validate statistically". Using production economics theory as a guideline, Banker and Kemerer [7] propose the non-parametric DEA approach for production function estimation. They argue that the DEA approach is superior to parametric approaches because it does not impose a particular form on the production function; it assumes only a monotonically increasing and convex relationship between inputs and outputs, which are standard economic production function assumptions [7]. The robustness of DEA models has been demonstrated in several independent studies: Banker et al. [3] and Banker et al. [5] compared the performance of DEA with that of other frontier regression techniques, such as corrected ordinary least squares (COLS), and found that DEA models generally perform better than COLS for non-classical efficiency distributions.
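To make the two parametric PFA forms concrete, the following sketch (ours, not the cited authors') fits both models by ordinary least squares on log-transformed data; the effort and FP arrays are hypothetical placeholders. Under the simple model, a fitted q below 1 is commonly read as increasing returns to scale, since effort then grows less than proportionally with size.

```python
import numpy as np

# Hypothetical project data, for illustration only.
fp = np.array([120.0, 250.0, 400.0, 610.0, 880.0, 1200.0])          # size in function points
effort = np.array([900.0, 1650.0, 2400.0, 3300.0, 4400.0, 5600.0])  # development effort

ln_fp, ln_effort = np.log(fp), np.log(effort)

# Simple parametric PFA: ln(Effort) = ln(a) + q ln(FP).
X1 = np.column_stack([np.ones_like(ln_fp), ln_fp])
(ln_a, q), *_ = np.linalg.lstsq(X1, ln_effort, rcond=None)
print(f"simple PFA: a = {np.exp(ln_a):.2f}, q = {q:.3f}")

# SOTPFM: ln(Effort) = b0 + b1 ln(FP) + b2 (ln(FP))^2, which allows
# returns to scale to vary with project size.
X2 = np.column_stack([np.ones_like(ln_fp), ln_fp, ln_fp ** 2])
(b0, b1, b2), *_ = np.linalg.lstsq(X2, ln_effort, rcond=None)
print(f"SOTPFM: b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```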

3. Factors impacting OO component size and source code documentation size

The "size" of the software source code documentation is measured in documentation lines of code (DLOC), and software size can be measured in source lines of code (SLOC). Generally, the factors that lead to an increase in software size also tend to increase the documentation size [29]. Software size in SLOC is directly related to software complexity. A few researchers measure software complexity using function points (FP) [6].

Laranjeira [22] showed that SLOC and FP are mathematically related; the relationship can be represented as KLOC = (B × FP) − A, where KLOC is software size in thousands of lines of code and A and B are constants. Thus, it can be concluded that the size of an object-oriented system depends on the complexity of the system.

There are three levels of complexity for an object-oriented system: method, class and system [24]. Chidamber and Kemerer [14] note that the number of methods used in a class determines complexity at the method level. They state that the number of methods increases the amount of time and effort required for developing and maintaining the class. Further, a large number of methods in a class has a great impact on sub-classes, as sub-classes inherit all the methods of the parent class; many methods may therefore impose restrictions on, or cause difficulties in, the ability to inherit [14]. The number of parameters used in a method increases cognitive complexity [37], but the impact of the number of parameters on component size is not well established [23,24].

Class level complexity depends on the complexity of the given class itself, the complexity of any inherited class, and the number of inherited classes [24,14]. Class complexity may be explained by the complexity of its methods [24]. The number of sub-classes that inherit the methods of a parent class also increases complexity. Chidamber and Kemerer [14] identify the following reasons for the increase in complexity with the number of sub-classes: as the number of sub-classes increases, the level of reuse increases, and more testing is necessary to ensure proper reliability.

System level complexity includes non-object-oriented parts that may be too relevant to be neglected [24]. System level non-object-oriented parts include a set of global definitions of types, structures and unions, and global declarations of variables. Complexity at the system level may be measured as: the number of system functions/procedures, the number of global definitions, the number of global variables, the number of graphical user interface (GUI) elements in a component, and the number of events and state changes handled by a window [24,12].

Earlier component-based software development studies divided components based on their functionality. Verner and Tate [36] and Dolado [17] divided software components, for Informix-4GL, into three categories: menus, input, and report/inquiries. Dolado's [17] study illustrated that component type is an important factor impacting software size. Damiani et al. [16], highlighting the importance of classifying software components by type, note "...we believe that correct component classification can help to address several other problems, besides reuse, such as code comprehension for reverse engineering, dynamic domain modeling, evaluation of programming language dependencies, and usage patterns". Damiani et al. [16] suggest the following six principles for classifying object-oriented software components:

1. descriptor-based behavioral classification,
2. controlled granularity,
3. language independence,
4. trainable user-adaptive response,
5. support for both query and navigational interfaces, and
6. thesaurus-based controlled vocabulary.

Bielak [12], in his study, categorizes an application into seven different object-oriented packages of related behavioral functionality. These seven categories (called component types hereafter) are:

(1) Application components: high-level user interface components that allow user data display and editing, object property dialogs, and calculation setup dialogs.
(2) Data components: abstract data object factories which provide GUI components for user data selection.
(3) Filter components: GUI components which can be used for importing and exporting "foreign" data formats.
(4) GUI components: simple GUI components such as text fields, radio buttons, etc.

(5) Business components: basic abstract business objects.
(6) Concrete business input/output components: concrete input/output components for abstract business components.
(7) Application support components: user display and editing, object property dialogs, and calculation setup dialogs.

The different types of components represent different levels of complexity. For example, Bielak [12] writes, "At business object level (Obj), components are relatively compact, with behaviors generally restricted to read, write, and data accessors. Indeed, several 'set' and 'get' functions consist of nothing more than one or two lines of code. At the other extreme, user interface member functions are on average several SLOC larger, carrying the burden of code required to process widget instantiation and configuration, event handling, and display updating".

[Fig. 1. A causal model of factors impacting software and source code documentation size: the inputs Met., Sub-C, Event and GUI point to the outputs SLOC and DLOC.]

Fig. 1 illustrates the causal model used for our research. There are two OO component size outputs (dependent variables) that depend on four input (independent) variables. The output variables are SLOC and DLOC. The input variables are the number of methods (Met.), the number of sub-classes (Sub-C), the number of events (Event), and the number of GUI elements (GUI) in a component.

4. Data collection and model validation

We use structural equation modeling (SEM) to empirically test the effects of the independent variables on source code documentation size and software size. The results of the SEM allow us to test the significance, magnitude and direction of the relationships between the independent and the dependent variables.

We use data on 152 OO components developed in the C++ programming language. The data have been used in the previous software engineering literature [26,12], and the details of the data are available in Bielak [12]. Table 1 reports the means and standard deviations of the independent and dependent variables in the data set.

Table 1
Descriptive statistics of the variables

Variable        Mean     Standard deviation
DLOC            479.93   473.42
SLOC            463.41   539.26
Methods         19.1     17.6
Sub-classes     7.1      10.9
GUI elements    11.51    17.3
Events          7.5      11.1

SEM provides a stronger basis for analyzing the relationships between the independent and dependent variables [21]. We use the LISREL program for specifying and testing the proposed causal model in Fig. 1. LISREL considers and solves for all the relationships simultaneously, whereas in regression analysis a separate model has to be solved for each dependent variable and the set of independent variables. Among the other advantages of SEM are the consideration of residuals that are not included in the model and the availability of "goodness of fit" indices for determining the validity of the proposed causal model.

Using the variables identified in Table 1, the data set of 152 OO components, the LISREL package and the causal model from Fig. 1, we conducted the SEM analysis. Fig. 2 and Table 2 illustrate the results. To evaluate the LISREL model, we consider the Akaike [1] information criterion (AIC) and Bozdogan's [13] consistent information criterion (CAIC). As shown in Table 2, the model of interest yielded lower AIC and CAIC values than the independence model. The Bentler–Bonett normed index, the Bentler–Bonett non-normed index and the comparative fit index were 0.88, 0.80 and 0.90, respectively; when the values of these indices are near 0.9, the model is considered a good fit. The goodness of fit indices (GFI = 0.90 and AGFI = 1.85) confirm a good fit as well. The inter-factor correlation of 0.944 shows good convergent validity between the two dependent variables SLOC and DLOC. All the independent variables significantly explained the variances in SLOC and DLOC at the 0.05 level of significance. Fig. 2 shows the significant coefficients for each of the independent variables; the positive coefficients indicate a monotonically increasing relationship between all of the input and output variables.

[Fig. 2. Significant path structural equation model: path coefficients from each input (Met., GUI, Sub-C, Event) to SLOC and DLOC, with an inter-factor correlation of 0.944 between SLOC and DLOC.]

Table 2
Structural model fit indicators

Parameter                              Value
Independence AIC                       843.2
Model AIC                              111.2
Independence CAIC                      867.3
Model CAIC                             191.6
Bentler–Bonett normed fit index        0.88
Bentler–Bonett non-normed fit index    0.80
Comparative fit index                  0.90
Saturated AIC                          42.00
Saturated CAIC                         126.50
Goodness of fit index                  0.90
Adjusted goodness of fit index         1.85
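The paper specifies and fits this model in LISREL. As an illustration of how the same causal model could be expressed in an open-source tool, the sketch below uses the semopy package; the file name and column names are our hypothetical stand-ins for the Table 1 variables, and this is not the authors' code.

```python
import pandas as pd
from semopy import Model  # assumes the semopy SEM package is installed

# Hypothetical CSV with one row per OO component and columns
# Met, SubC, GUI, Event, SLOC, DLOC (mirroring Table 1).
data = pd.read_csv("oo_components.csv")

# Both size outputs regressed on the four inputs, estimated
# simultaneously; SLOC ~~ DLOC frees their error covariance,
# mirroring the inter-factor correlation reported above.
desc = """
SLOC ~ Met + SubC + GUI + Event
DLOC ~ Met + SubC + GUI + Event
SLOC ~~ DLOC
"""
model = Model(desc)
model.fit(data)
print(model.inspect())  # path coefficients, standard errors, p-values
```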

5. The DEA variable returns to scale economy experiments

DEA is a tool used for evaluating the performance of manufacturing and service operations.

DEA is a multi-factor productivity model for measuring the efficiency of a homogeneous set of decision-making units (DMUs). Assuming that there are n DMUs, each with m inputs and s outputs, the relative inefficiency score for the pth DMU, p ∈ {1, ..., n}, can be obtained using the following linear program [35]:

$$\begin{aligned} \min\quad & \theta^{C}_{p} \\ \text{s.t.}\quad & \sum_{i=1}^{n} \lambda_i x_{ji} - \theta^{C}_{p} x_{jp} \le 0 \quad \forall j \in \{1,\ldots,m\}, \\ & \sum_{i=1}^{n} \lambda_i y_{ki} - y_{kp} \ge 0 \quad \forall k \in \{1,\ldots,s\}, \\ & \lambda_i \ge 0 \quad \forall i \in \{1,\ldots,n\}, \end{aligned}$$

where $\theta^{C}_{p}$ is the Charnes, Cooper and Rhodes (CCR) [11] inefficiency score for the pth DMU, $x_{ji}$ is the amount of input j used by DMU i, $y_{ki}$ is the amount of output k produced by DMU i, and the $\lambda_i$ are dual variables. Under the CCR formulation, a test DMU p is inefficient if a composite unit (a linear combination of all the units in the set) can be shown to consume less input than the test DMU while maintaining the same output levels. For each inefficient DMU, DEA allows the user to calculate the improvements in the DMU's inputs and outputs that are necessary for it to become efficient. A total of n linear programs are solved to calculate the inefficiency scores of the n DMUs.

[Fig. 3. An example of a CCR efficient frontier, shown with a regression line for comparison.]

Fig. 3 illustrates an example of a CCR efficient frontier; the DMUs on the frontier are efficient. The CCR model assumes constant returns to scale (CRS) economies, which means that doubling output exactly doubles costs. Banker, Charnes and Cooper (BCC) [9] proposed the following model to account for variable returns to scale economies:

$$\begin{aligned} \min\quad & \theta^{B}_{p} \\ \text{s.t.}\quad & \sum_{i=1}^{n} \lambda_i x_{ji} - \theta^{B}_{p} x_{jp} \le 0 \quad \forall j \in \{1,\ldots,m\}, \\ & \sum_{i=1}^{n} \lambda_i y_{ki} - y_{kp} \ge 0 \quad \forall k \in \{1,\ldots,s\}, \\ & \sum_{i=1}^{n} \lambda_i = 1, \\ & \lambda_i \ge 0 \quad \forall i \in \{1,\ldots,n\}, \end{aligned}$$

where $\theta^{B}_{p}$ is the BCC inefficiency score for DMU p.

[Fig. 4. An example of a BCC efficient frontier.]

Fig. 4 illustrates an example of a BCC efficient frontier. Under variable returns to scale economies, outputs and inputs have a non-linear increasing or decreasing relationship; for example, if the production function exhibits decreasing returns to scale, then an increase in output requires a greater proportional increase in all inputs.

For our VRS economy experiments, each OO software component was considered a DMU. The inputs for a DMU were the number of methods, the number of sub-classes, the number of GUI elements, and the number of events; the outputs were SLOC and DLOC. Using the data for the 152 components, two inefficiency scores were obtained for each DMU, one from the CCR model and one from the BCC model. Using the methodology described in Banker and Slaughter [2], we tested for the existence of a VRS economy between the input and output variables.
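The two envelopment models can be solved with any LP solver. The sketch below, assuming SciPy is available, computes the input-oriented CCR or BCC score for a single DMU; it is our illustration of the formulations above, not code from the study.

```python
import numpy as np
from scipy.optimize import linprog

def dea_score(X, Y, p, vrs=False):
    """Input-oriented envelopment score theta for DMU p.

    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    vrs=False gives the CCR model; vrs=True adds the BCC
    convexity constraint sum(lambda) = 1.
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [lambda_1, ..., lambda_n, theta].
    c = np.zeros(n + 1)
    c[-1] = 1.0  # minimize theta
    # Inputs: sum_i lambda_i x_ji - theta x_jp <= 0 for all j.
    A_in = np.hstack([X.T, -X[p].reshape(m, 1)])
    # Outputs: sum_i lambda_i y_ki >= y_kp, rewritten as <=.
    A_out = np.hstack([-Y.T, np.zeros((s, 1))])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[p]])
    A_eq = np.ones((1, n + 1)) if vrs else None
    if vrs:
        A_eq[0, -1] = 0.0  # theta is excluded from the convexity constraint
    b_eq = np.array([1.0]) if vrs else None
    bounds = [(0, None)] * n + [(None, None)]  # lambda_i >= 0, theta free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.fun

# For the study's setting: X would be 152 x 4 (methods, sub-classes,
# GUI elements, events) and Y 152 x 2 (SLOC, DLOC); solving the LP
# twice per component yields the CCR and BCC scores.
```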

Tables 3 and 4 illustrate the results of our experiments under different distribution assumptions.

Table 3
Variable returns to scale DEA tests on software size

Distribution    Test statistic                                                                      F value
Exponential     $\sum_{i=1}^{152} (\theta^{C}_{i} - 1) / \sum_{i=1}^{152} (\theta^{B}_{i} - 1)$      4.97**
Half-normal     $\sum_{i=1}^{152} (\theta^{C}_{i} - 1)^2 / \sum_{i=1}^{152} (\theta^{B}_{i} - 1)^2$  8.82**

** Significant at α = 0.05.

Table 4
Variable returns to scale DEA tests on documentation size

Distribution    Test statistic                                                                      F value
Exponential     $\sum_{i=1}^{152} (\theta^{C}_{i} - 1) / \sum_{i=1}^{152} (\theta^{B}_{i} - 1)$      2.58**
Half-normal     $\sum_{i=1}^{152} (\theta^{C}_{i} - 1)^2 / \sum_{i=1}^{152} (\theta^{B}_{i} - 1)^2$  2.81**

** Significant at α = 0.05.
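Banker and Slaughter's procedure is detailed in [2]; as a rough sketch of how statistics of the form in Tables 3 and 4 could be computed, assuming scores expressed so that θ ≥ 1 and θ − 1 is the inefficiency deviation, and assuming the F(2n, 2n) and F(n, n) reference distributions of Banker's asymptotic DEA tests (our reading, not a statement of the paper's exact procedure):

```python
import numpy as np
from scipy import stats

def vrs_test(theta_c, theta_b, dist="exponential"):
    """Ratio test of CCR (CRS) versus BCC (VRS) inefficiency deviations."""
    n = len(theta_c)
    dc, db = np.asarray(theta_c) - 1.0, np.asarray(theta_b) - 1.0
    if dist == "exponential":
        stat = dc.sum() / db.sum()                 # tables' first-row statistic
        dof = 2 * n                                # assumed F(2n, 2n) reference
    else:
        stat = (dc ** 2).sum() / (db ** 2).sum()   # half-normal form
        dof = n                                    # assumed F(n, n) reference
    return stat, stats.f.sf(stat, dof, dof)        # statistic and p-value
```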

The results indicate that the hypothesis of a VRS economy was supported at the 95% confidence level. The existence of a VRS economy means that SLOC and DLOC do not increase linearly with the values of the independent variables; a non-linear relationship exists. This indicates that non-linear models for forecasting DLOC and SLOC may fare better than the linear multiple regression approach. The use of non-linear forecasting models has the potential to estimate software and documentation size accurately, which managers may find useful in preparing budgets for future OO software projects. Further, better data fitting and more accurate heuristics can be developed if a non-linear model is used to learn the forecasting function.

The selection of the non-linear model is important. For example, if a non-linear quadratic regression model is chosen, then the researcher is assuming a quadratic relationship between the independent and the dependent variables. Artificial neural networks (ANNs) can learn non-linear forecasting models without making any assumption about the type of relationship that maps a set of independent variables to the corresponding dependent variables. Additionally, multiple dependent variables can be used in an ANN regression model.

ANNs have been applied to numerous non-parametric and non-linear classification and forecasting problems [19,34]. In an ANN model, a neuron is an elemental processing unit that forms part of a larger network. There are two basic types of ANNs: a single-layer (of connections) network and a double-layer (of connections) network. A single-layer network uses the perceptron convergence procedure [33]. A modification of the perceptron convergence procedure can be used to minimize the mean-square error between the actual and desired outputs in a two-layer network [18], which yields a non-linear, multivariate forecasting model. The backpropagation learning algorithm [32], most commonly used to train multilayer networks, implements a gradient search to minimize the squared error between realized and desired outputs.

[Fig. 5. An ANN model for forecasting SLOC and DLOC: input nodes Met., GUI, Sub-C and Event feed a hidden layer connected to the output nodes SLOC and DLOC.]

Fig. 5 illustrates a three-layer (of nodes) ANN that may be used for multivariate forecasting. The number of input layer nodes corresponds to the number of independent variables describing the data. The number of nodes in the hidden layer determines the complexity of the forecasting model and needs to be determined empirically to best suit the data: larger networks tend to overfit the data, while too few hidden layer nodes can hinder learning of an adequate separating region. Although having more than one hidden layer provides no advantage in terms of forecasting accuracy, it can in certain cases provide faster learning [32]. The number of output nodes represents the number of dependent variables. The backpropagation algorithm is a supervised learning algorithm, which needs training data to learn the connection weights of an ANN. The data set used to learn the connection weights is called the training data.


After the connection weights are learnt, the performance of an ANN can be validated using the test data. The test data typically contain examples different from those in the training data, but with similar characteristics (variables, distributions, number of examples, etc.).

For a given data set, let ASLOC and ADLOC represent the actual values of SLOC and DLOC for an OO component, and let PSLOC and PDLOC represent the ANN-predicted values. The differences between the predicted and actual values, ESLOC and EDLOC, can be used to evaluate the performance of the ANN: lower values of the squares of ESLOC and EDLOC usually mean better performance and a better production function fit.
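As a concrete, hedged illustration of such a network, the sketch below uses scikit-learn's MLPRegressor as a stand-in for the custom backpropagation implementation described in Section 6 (the file names are hypothetical, and details such as the activation function are the library's defaults rather than the paper's):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Hypothetical arrays: X columns are (Methods, GUI, Sub-classes, Events),
# y columns are (SLOC, DLOC); one row per OO component.
X_train, y_train = np.load("train_X.npy"), np.load("train_y.npy")
X_test, y_test = np.load("test_X.npy"), np.load("test_y.npy")

# Scale the outputs, echoing the scaled-output convergence criterion
# described in Section 6.
y_scaler = MinMaxScaler().fit(y_train)

# Four inputs -> one hidden layer -> two outputs, trained by gradient
# descent (backpropagation); settings echo the paper's design choices.
ann = MLPRegressor(hidden_layer_sizes=(8,), solver="sgd",
                   learning_rate_init=0.1, max_iter=10000)
ann.fit(X_train, y_scaler.transform(y_train))

E = y_scaler.inverse_transform(ann.predict(X_test)) - y_test  # E_SLOC, E_DLOC
rms = np.sqrt(np.mean(E ** 2))  # lower RMS: better production function fit
print(f"test RMS error: {rms:.2f}")
```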

6. Comparing the performance of ANN and linear regression for forecasting OO component and source code documentation size

We use the backpropagation (BP) algorithm for training the ANN. We randomly divide our data set of 152 examples into two non-overlapping sets of 76 examples each, using the first set for training the ANN and the second set for testing its performance. For the ANN procedure to work well, the training and test data should have similar characteristics. We therefore performed simple t-tests to check for differences in the means of the two dependent variables, SLOC and DLOC, between the training and test data. Tables 5 and 6 illustrate the results; the values in the first two columns have the format Mean (Standard deviation).

Table 5
The difference-of-means t-test for SLOC

SLOC (testing)    SLOC (training)    t-value    Significance
419.7 (438.8)     422.6 (432.4)      0.04       0.97

Table 6
The difference-of-means t-test for DLOC

DLOC (testing)    DLOC (training)    t-value    Significance
431.7 (452.8)     495 (615.1)        0.72       0.47

The results in Tables 5 and 6 indicate that there is no difference in the mean software and source code documentation sizes between the training and test data sets.
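The comparability check itself is a standard two-sample t-test; a minimal sketch, with the same hypothetical file names as before:

```python
import numpy as np
from scipy import stats

# 76-example halves of the dependent variables (columns: SLOC, DLOC).
y_train, y_test = np.load("train_y.npy"), np.load("test_y.npy")

for k, name in enumerate(["SLOC", "DLOC"]):
    t, p = stats.ttest_ind(y_test[:, k], y_train[:, k])
    print(f"{name}: t = {t:.2f}, p = {p:.2f}")  # large p: similar means
```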

In order to get good results, we carefully selected an appropriate design for the ANN. The design issues were as follows:

1. Learning, generalizability, and overlearning issues: The network convergence criterion (stopping criterion) and the learning rate determine how well and how quickly a network learns. A lower learning rate increases the time it takes for the network to converge, but it tends to find a better solution. The learning rate was set to 0.1. The convergence criterion was set as follows:

IF (|Scaled Actual_Output − Scaled Predicted_Output| ≤ 0.1 for all examples) OR (Training iterations ≥ Maximum iterations) THEN Convergence = Yes ELSE Convergence = No.

We selected this convergence criterion to account for the high variability of the dependent variables. A stricter convergence criterion was possible; however, it raised the issue of overfitting the network on the training data. Evidence in the literature shows that overfitting minimizes the sum of squared error on the training set at the expense of performance on the test set [10]. We believed that the above convergence criterion would make the network's learning more generalizable [26]. A standard backpropagation algorithm, which uses minimization of the sum of squared error as its optimization criterion, was used to learn the weights. The maximum number of iterations was set to 10,000.

2. Network structural issues: The network structure chosen for the current study was similar to the one shown in Fig. 5. The ANN had four input nodes and two output nodes, with a three-layer (of nodes) architecture for modeling a non-linear relationship between the independent variables and the dependent variables.


The number of hidden nodes was set to twice the number of input nodes, a common heuristic for smaller sample sizes [10,31]; for larger sample sizes, a higher number of hidden nodes is recommended [25].

The connection weights for the ANN were learnt using the training data. On the training data, learning terminated at the end of 10,000 iterations with a root-mean-square (RMS) error of 193.67; on the test data, the RMS error was 251.25. Using the training data, we also developed two linear regression models, one for each of the dependent variables SLOC and DLOC, using all the independent variables. The linear forecasting models learnt from the training data are shown below; Tables 7 and 8 report the overall significance of the models.

$$\text{SLOC} = 12.533(\text{Events}) + 23.852(\text{GUI}) + 7.310(\text{Methods}) + 2.367(\text{SubClasses}) - 38.571,$$

$$\text{DLOC} = 8.612(\text{Events}) + 17.182(\text{GUI}) + 9.462(\text{Methods}) + 2.197(\text{SubClasses}) + 45.069.$$

The RMS errors of the linear regression models on the training and test data were 290.75 and 338.64, respectively.

[Fig. 6. The sum of squared error for each component in the training data.]

[Fig. 7. The sum of squared error for each component in the test data.]

Figs. 6 and 7 plot the sum of squared error for the ANN and for linear regression for each component in the training and the test data. The RMS values for the ANN were lower than those for linear regression on both the training and the test data sets, and it can be seen from Figs. 6 and 7 that the ANN outperformed linear regression on both.
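The two linear baselines can be reproduced with ordinary least squares; a sketch using numpy, with the same hypothetical file names, which also reports the RMS errors compared above:

```python
import numpy as np

X_train, y_train = np.load("train_X.npy"), np.load("train_y.npy")
X_test, y_test = np.load("test_X.npy"), np.load("test_y.npy")

# Append an intercept column; lstsq fits both dependent variables
# (SLOC, DLOC) at once, one coefficient column per output.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
A_test = np.column_stack([X_test, np.ones(len(X_test))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

for split, A, y in [("train", A_train, y_train), ("test", A_test, y_test)]:
    rms = np.sqrt(np.mean((A @ coef - y) ** 2))
    print(f"{split} RMS error: {rms:.2f}")
```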

Table 7
The regression summary table for SLOC

            Sum of squares   DF   Mean square     F value   Significance
Regression  25,291,180       4    6,322,795.04    145.356   0.000*
Residual    3,088,416.5      71   43,498.824
Total       28,379,597       75

* Significant at 1%; R-square = 0.89.

Table 8
The regression summary table for DLOC

            Sum of squares   DF   Mean square     F value   Significance
Regression  17,052,463       4    4,263,115.82    90.723    0.000*
Residual    3,336,317.7      71   46,990.39
Total       20,388,781       75

* Significant at 1%; R-square = 0.84.

7. Summary and conclusions

In the current research, we used tools from production economics to guide the choice of a production function estimation (forecasting) model. Using empirical data and SEM, we identified the factors that impact OO component and source code documentation size. We used DEA to illustrate that an increasing variable returns to scale economy exists between the predictor and predicted variables of OO component and source code documentation size. The existence of a variable returns to scale economy indicated that a non-linear forecasting model should provide better estimates than a linear regression model. Our experiments comparing the performance of ANN and multiple linear regression confirm the superior performance of the non-linear ANN model over the linear multiple regression model, on both the training and the validation (testing) data sets.

For production processes that generally satisfy the monotonicity property between inputs and output(s), we believe that DEA may be a useful tool for identifying an appropriate forecasting model with which to learn the underlying production function. For example, if DEA experiments indicate that a constant-returns-to-scale (CRS) economy exists between the predictor and predicted variable(s), then a linear regression model may be appropriate; if a VRS economy exists, then an ANN may be the appropriate forecasting model.

References

[1] H. Akaike, Factor analysis and AIC, Psychometrika 52 (1987) 317–332.
[2] R.D. Banker, S. Slaughter, A field study of scale economies in software maintenance, Management Science 43 (12) (1997) 1709–1725.

[3] R.D. Banker, H.H. Chang, W.W. Cooper, Simulation studies of efficiency, returns to scale and misspecification with nonlinear functions in DEA, Annals of Operations Research 66 (1996) 233–253.
[4] R.D. Banker, H.H. Chang, C.F. Kemerer, Evidence on economies of scale in software development, Information and Software Technology 36 (5) (1994) 275–282.
[5] R.D. Banker, V.M. Gadh, W.L. Gorr, A Monte-Carlo comparison of two production frontier estimation methods: Corrected ordinary least-squares and data envelopment analysis, European Journal of Operational Research 67 (3) (1993) 332–342.
[6] R.D. Banker, S.M. Datar, C.F. Kemerer, A model to evaluate variables impacting the productivity of software maintenance projects, Management Science 37 (1) (1991) 1–18.
[7] R.D. Banker, C.F. Kemerer, Scale economies in new software development, IEEE Transactions on Software Engineering 15 (10) (1989) 1199–1205.
[8] R.D. Banker, A. Maindiratta, Piecewise loglinear estimation of efficiency production surfaces, Management Science 32 (1) (1986) 126–135.
[9] R.D. Banker, A. Charnes, W.W. Cooper, Some models for estimating technical and scale inefficiencies in DEA, Management Science 20 (9) (1984) 1078–1092.
[10] S. Bhattacharyya, P.C. Pendharkar, Inductive, evolutionary and neural techniques for discrimination: A comparative study, Decision Sciences 29 (4) (1998) 871–900.
[11] A. Charnes, W.W. Cooper, E. Rhodes, Evaluating program and managerial efficiency: An application of data envelopment analysis to program follow through, Management Science 27 (6) (1981) 668–697.
[12] J. Bielak, Improving size estimates using historical data, IEEE Software (2000) 27–35.
[13] H. Bozdogan, Model selection and Akaike's information criteria (AIC): The general theory and its analytical extensions, Psychometrika 52 (1987) 345–370.
[14] S.R. Chidamber, C.F. Kemerer, A metrics suite for object oriented design, IEEE Transactions on Software Engineering 20 (6) (1994) 476–493.
[15] S. Conte, H. Dunsmore, V. Shen, Software Engineering Metrics and Models, Benjamin/Cummings, Reading, MA, 1986.
[16] E. Damiani, M.G. Fugini, C. Bellettini, A hierarchy-aware approach to faceted classification of object-oriented components, ACM Transactions on Software Engineering and Methodology 8 (4) (1999) 425–472.
[17] J.J. Dolado, A validation of the component-based method for software size estimation, IEEE Transactions on Software Engineering 28 (10) (2000) 1006–1021.
[18] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.
[19] B.A. Jain, B.N. Nag, Artificial neural network models for pricing initial public offerings, Decision Sciences 26 (3) (1995) 283–302.
[20] C. Jones, Programming Productivity, McGraw-Hill, New York, 1986.
[21] K.G. Joreskog, D. Sorbom, LISREL 7: A Guide to the Program and Applications, SPSS Inc., Chicago, IL, 1989.
[22] L.A. Laranjeira, Software size estimation of object-oriented systems, IEEE Transactions on Software Engineering 16 (5) (1990) 510–522.
[23] M. Lorenz, J. Kidd, Object-Oriented Software Metrics: A Practical Guide, Prentice Hall, Englewood Cliffs, NJ, 1994.
[24] P. Nesi, T. Querci, Effort estimation and prediction of object-oriented systems, The Journal of Systems and Software 42 (1) (1998) 89–102.
[25] E. Patuwo, M.Y. Hu, M.S. Hung, Two-group classification using neural networks, Decision Sciences 24 (4) (1993) 825–845.
[26] P.C. Pendharkar, An exploratory study of object-oriented software component size determinants and the application of regression tree forecasting models, Information and Management 42 (1) (2004) 61–73.
[27] P.C. Pendharkar, J.A. Rodger, Technical efficiency-based selection of learning cases to improve forecasting accuracy of neural networks under monotonicity assumption, Decision Support Systems and Electronic Commerce 36 (1) (2003) 117–136.
[28] P.C. Pendharkar, G.H. Subramanian, J.A. Rodger, A probabilistic model for predicting software development effort, Lecture Notes in Computer Science 2668 (2003) 581–588.
[29] P.C. Pendharkar, J.A. Rodger, An empirical study of factors impacting the size of object-oriented component code documentation, in: Proceedings of the 20th ACM International Conference on Systems Documentation, Toronto, Canada, October 2002, pp. 152–157.
[30] P.C. Pendharkar, G.H. Subramanian, Connectionist models for learning, discovering and forecasting software effort: An empirical study, The Journal of Computer Information Systems 43 (1) (2002) 7–14.
[31] P.C. Pendharkar, An empirical study of design and testing of hybrid evolutionary-neural approach for classification, Omega: An International Journal of Management Science 29 (4) (2001) 361–374.
[32] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, in: D.E. Rumelhart, J.L. McClelland (Eds.), Parallel Distributed Processing: Exploration in the Microstructure of Cognition, vol. 1: Foundations, MIT Press, Cambridge, MA, 1986.
[33] R.J. Schalkoff, Artificial Neural Networks, McGraw-Hill, New York, 1997.
[34] J.W. Shavlik, R.J. Mooney, G.G. Towell, Symbolic and neural learning algorithms: An experimental comparison, Machine Learning 6 (1991) 111–143.
[35] S. Talluri, Data envelopment analysis: Models and extensions, Decision Line 31 (3) (2000) 8–11.
[36] J. Verner, G.A. Tate, A software size model, IEEE Transactions on Software Engineering 18 (4) (1992) 265–278.
[37] H. Zuse, Software Complexity: Measures and Methods, Walter de Gruyter, Berlin, 1991.